Getting started¶
This guide walks through some of the important (and some of the less important) features of kiara. The goal is to introduce new users to the overall framework, so they can get a feeling for what it can do and whether it might be useful for their own usage scenarios.
Setting up kiara¶
In order to use kiara, we'll need to install it into a Python virtual (or conda) environment, along with all the plugins we might want to use. For the purpose of this tutorial, we'll use conda to create such an environment, but you can of course use a 'normal' virtualenv if you prefer. How to install conda itself is out of scope for this tutorial, but you should have no trouble finding instructions online.
One simple way is to install Anaconda (Individual Edition), then use the Anaconda Navigator to create a new environment, install the 'git' package in it if your system does not already have it (for example by running conda install -c anaconda git in your terminal), and use the 'Open Terminal' option of that environment to start a terminal that has that virtual/conda environment already activated.
Here's how to create the environment, activate it, and install the necessary dependencies (assuming conda is installed). At some point in the process, the terminal may prompt you to confirm before proceeding (generally by typing "y" and pressing Enter).
conda create -n kiara_tutorial python=3.9
conda activate kiara_tutorial
conda install -c conda-forge mamba
mamba install -c conda-forge -c dharpa kiara kiara_plugin.core_types kiara_plugin.tabular kiara_plugin.network_analysis
Note
We are using mamba as our package manager here, instead of 'pure' conda. This is optional, but recommended since it makes things a lot faster.
Getting some example data¶
For this tutorial, we'll need some example data, so we can use kiara against it. We've prepared a git repository for that purpose:
git clone https://github.com/DHARPA-Project/kiara.examples.git
cd kiara.examples
Specifically, here we'll be using two CSV files that were created by my colleague Lena Jaskov: files
The files contain information about connections (edges) between medical journals (JournalEdges1902.csv), as well as additional metadata for the journals themselves (JournalNodes1902.csv). We'll use that data to create table and graph structures with kiara.
Checking for available operations¶
First, let's have a look at which operations are available, and what we can do with them:
kiara operation list
╭─ Available operations ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Id Type(s) Description │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ create.database.from.file create_from Create a database from a file. │
│ create.database.from.file_bundle create_from Create a database from a file_bundle value. │
│ create.database.from.table create_from Create a database value from a table. │
│ create.network_data.from.files pipeline Create table values from files containing edges and │
│ node data, then assemble those to the network_data │
│ result. │
│ create.network_data.from.tables Create a graph object from one or two tables. │
│ create.table.from.file create_from Create a table from a file, trying to auto-determine │
│ the format of said file. │
│ create.table.from.file_bundle create_from Create a table value from a text file_bundle. │
│ date.check_range Check whether a date falls within a specified date │
│ range. │
│ date.extract_from_string Extract a date object from a string. │
│ export.file.as.file export_as -- n/a -- │
│ export.network_data.as.csv_files export_as Export network data as 2 csv files (one for edges, one │
│ for nodes. │
│ export.network_data.as.graphml_file export_as Export network data as graphml file. │
│ export.network_data.as.sql_dump export_as Export network data as a sql dump file. │
│ export.network_data.as.sqlite_db export_as Export network data as a sqlite database file. │
│ export.table.as.csv_file export_as Export a table as csv file. │
│ extract.date_array.from.table pipeline Extract a date array from a table column. │
│ file_bundle.pick.file Pick a single file from a file_bundle value. │
│ file_bundle.pick.sub_folder Pick a sub-folder from a file_bundle, resulting in a │
│ new file_bundle. │
│ filter.table Filter a table. │
│ import.database.from.local_file_path pipeline Import a database from a csv file. │
│ import.local.file Import a file from the local filesystem. │
│ import.local.file_bundle Import a folder (file_bundle) from the local │
│ filesystem. │
│ import.network_data.from.local_file_paths pipeline Onboard the edges and nodes from local files, create │
│ table values from them, then assemble those to the │
│ network_data result. │
│ import.table.from.local_file_path pipeline Import a table from a file on the local filesystem. │
│ import.table.from.local_folder_path pipeline Import a table from a local folder containing text │
│ files. │
│ kiara_plugin.my_kiara_module.my_kiara_module.tutorial_module -- n/a -- │
│ list.contains Check whether an element is in a list. │
│ logic.and Returns 'True' if both inputs are 'True'. │
│ logic.nand pipeline Returns 'False' if both inputs are 'True'. │
│ logic.nor pipeline Returns 'True' if both inputs are 'False'. │
│ logic.not Negates the input. │
│ logic.or Returns 'True' if one of the inputs is 'True'. │
│ logic.xor pipeline Returns 'True' if exactly one of it's two inputs is │
│ 'True'. │
│ my_kiara_module.example A very simple example module; concatenate two strings. │
│ parse.date_array Create an array of date objects from an array of │
│ strings. │
│ query.database Execute a sql query against a (sqlite) database. │
│ query.table Execute a sql query against an (Arrow) table. │
│ string_filter.tokens filter -- n/a -- │
│ table.pick.column Pick one column from a table, returning an array. │
│ table_filter.drop_columns filter -- n/a -- │
│ table_filter.select_columns filter -- n/a -- │
│ table_filter.select_rows filter -- n/a -- │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Note
In this guide we'll use the term operation to indicate an entity that transforms data in some way or form. kiara also has the concept of a module (the differences are explained in more detail here), and in most cases the meaning of 'module' and 'operation' is roughly the same, especially in the context of this 'Getting started' guide. Nonetheless, keep in mind that technically the two terms refer to different things.
Importing data, and creating a table¶
Tables are arguably the most used (and useful) data structures in data science and data engineering. They come in different forms; some people call them spreadsheets, or dataframes. We're not fancy, so we won't do that: we'll call them tables.
A depressingly large amount of (tabular) data comes in CSV files, which is why we'll use one as an example here. Specifically, we will use JournalNodes1902.csv. As stated above, this file contains information about historical medical journals (name, type, where it was from, etc.), and we'll later use it as the table which will provide node information in a network graph. We want to convert this file into a 'proper' table structure, because that will make subsequent processing faster, and also simpler in a lot of cases. 'Proper', in this case, means we'll convert it into a better format for internal use, for example one containing information about the data type of each column, among other things.
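To make the idea of a 'proper' table a bit more concrete, here is a minimal sketch in plain Python (standard library only) of what attaching a data type to each column could look like. It is not how kiara does this internally: the infer_type and infer_column_types helpers are hypothetical names, and the sample rows are a reduced selection of columns in the shape of JournalNodes1902.csv.

```python
import csv
import io

# A few rows in the shape of JournalNodes1902.csv (columns reduced for brevity).
SAMPLE_CSV = """Id,Label,City,Latitude,Longitude
36,The American Journal of Insanity,Baltimore,39.289444,-76.615278
295,Die Krankenpflege,Berlin,52.52,13.405
10,Berliner klinische Wochenschrift,Berlin,52.52,13.405
"""


def infer_type(values):
    """Return the narrowest of int, float or str that fits every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def infer_column_types(csv_text):
    """Hypothetical helper: map each column name to an inferred Python type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


column_types = infer_column_types(SAMPLE_CSV)
for name, column_type in column_types.items():
    print(f"{name}: {column_type.__name__}")
```

A raw CSV file carries none of this information: every cell is just text, so each consumer would have to guess the types again. Storing the types once, alongside the data, is one of the things a table structure can offer.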
Finding the right command, and how to use it¶
kiara likes its data 'onboarded' (or: 'imported'), meaning it prefers to work with data that was imported into its internal data store. This effectively duplicates a file on the user's filesystem (and, depending on the filesystem used, this could double the hard-disk space required for that particular dataset). The reason behind this preference is that it ensures the data won't be modified by an external application after import, which enables kiara to employ some techniques to save memory, hard-disk space, and CPU resources down the line.
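The effect of onboarding can be pictured with a small, self-contained sketch. The TinyStore class below is hypothetical and far simpler than kiara's real data store; it only illustrates the idea that an imported copy is unaffected when the original file changes afterwards.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class TinyStore:
    """Hypothetical stand-in for a data store: keeps private copies of files."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def onboard(self, source: Path) -> str:
        """Copy the file into the store and return a freshly generated id."""
        value_id = str(uuid.uuid4())
        shutil.copyfile(source, self.root / value_id)
        return value_id

    def read(self, value_id: str) -> str:
        """Return the text content of the stored copy."""
        return (self.root / value_id).read_text()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "nodes.csv"
    original.write_text("Id,Label\n75,Psychiatrische en neurologische bladen\n")

    store = TinyStore(Path(tmp) / "store")
    value_id = store.onboard(original)

    # An external application changes the original after the import ...
    original.write_text("Id,Label\n")

    # ... but the onboarded copy still holds the data as it was imported.
    stored_text = store.read(value_id)
    print(stored_text)
```

The price for this guarantee is the duplicated disk space mentioned above.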
In most cases, then, the first thing you (as a user) want to do is 'import' the source data you want to work with. Let's run the operation list command again, but this time filter using the term 'import':
kiara operation list import
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Id Type(s) Description │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ import.database.from.local_file_path pipeline Import a database from a csv file. │
│ import.local.file Import a file from the local filesystem. │
│ import.local.file_bundle Import a folder (file_bundle) from the local filesystem. │
│ import.network_data.from.local_file_paths pipeline Onboard the edges and nodes from local files, create table values from them, │
│ then assemble those to the network_data result. │
│ import.table.from.local_file_path pipeline Import a table from a file on the local filesystem. │
│ import.table.from.local_folder_path pipeline Import a table from a local folder containing text files. │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Importing the 'raw' file¶
After looking at the kiara operation list output, it looks like the import.local.file module might be just what we need (to be honest, import.table.from.local_file_path is what we'd really use if we weren't stuck in this getting-started guide, but doing that would skip over a few important basics that are worth understanding).
kiara has the run sub-command, which is used to execute operations. If we only provide a module name, and not any input, this command will tell us what it expects:
kiara run import.local.file
╭─ Run info: import.local.file ────────────────────────────────────────────────╮
│ │
│ Can't run operation: invalid or insufficient input(s) │
│ │
│ ──────────────────────────────────────────────────────────────────────────── │
│ │
│ Operation: import.local.file │
│ │
│ Import a file from the local filesystem. │
│ │
│ Inputs: │
│ │
│ field name status type description required default │
│ ────────────────────────────────────────────────────────────────────────── │
│ path not set string The local path to yes │
│ the file. │
│ │
│ │
│ Outputs: │
│ │
│ field name type description │
│ ────────────────────────────────────────────────────────────────────────── │
│ file file The loaded files. │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
As you might expect, we need to provide a path input, of type string, letting kiara know where to pick up the file. The kiara command-line interface can take complex inputs like dictionaries, but fortunately this is not necessary here. If you ever run into a situation where you need that, check out this section.
For simple inputs such as strings, all we need to do is provide the input name, followed by '=' and the value itself:
kiara run import.local.file path=examples/data/journals/JournalNodes1902.csv
╭─ Result ─────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────────────────────── │
│ file file Id,Label,JournalType,City,CountryNetworkTime,Prese… │
│ 75,Psychiatrische en neurologische │
│ bladen,specialized: psychiatry and │
│ neurology,Amsterdam,Netherlands,Netherlands,52.3666… │
│ 36,The American Journal of Insanity,specialized: │
│ psychiatry and neurology,Baltimore,United │
│ States,United States,39.289444,-76.615278,English │
│ 208,The American Journal of Psychology,specialized: │
│ psychology,Baltimore,United States,United │
│ States,39.289444,-76.615278,English │
│ 295,Die Krankenpflege,specialized: │
│ therapy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 296,Die deutsche Klinik am Eingange des zwanzigsten │
│ Jahrhunderts,general medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 300,Therapeutische Monatshefte,specialized: │
│ therapy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 1,Allgemeine Zeitschrift für │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 7,Archiv für Psychiatrie und │
│ Nervenkrankheiten,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 10,Berliner klinische Wochenschrift,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 13,Charité Annalen,general medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 21,Monatsschrift für Psychiatrie und │
│ Neurologie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 29,Virchows Archiv,"specialized: anatomy, physiology │
│ and pathology",Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 31,Zeitschrift für pädagogische Psychologie und │
│ Pathologie,specialized: psychology and │
│ pedagogy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 42,Vierteljahrsschrift für gerichtliche Medizin und │
│ öffentliches Sanitätswesen,"specialized: │
│ anthropology, criminology and │
│ forensics",Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 47,Centralblatt für Nervenheilkunde und │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 50,Russische medicinische Rundschau,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 76,Deutsche Aerzte-Zeitung,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 87,Monatsschrift für Geburtshülfe und │
│ Gynäkologie,specialized: gynecology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 108,Archiv für klinische Chirurgie,specialized: │
│ surgery,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 113,Zeitschrift für klinische Medicin,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 159,Deutsche militärärztliche │
│ Zeitschrift,specialized: military │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 162,Jahresbericht über die Leistungen und │
│ Fortschritte auf dem Gebiete der Neurologie und │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 192,Ärztliche Sachverständigen-Zeitung,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 198,Zeitschrift für die Behandlung Schwachsinniger │
│ und Epileptischer,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 258,Der Pfarrbote,news media,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 71,Correspondenz-Blatt für Schweizer Aerzte,general │
│ medicine,Bern,Switzerland,Switzerland,46.948056,7.4… │
│ 6,Archiv für mikroskopische Anatomie,"specialized: │
│ anatomy, physiology and pathology",Bonn,German │
│ Empire,Germany,50.733333,7.1,German │
│ 203,The Journal of Abnormal Psychology,specialized: │
│ psychology,Boston,United States,United │
│ States,42.358056,-71.063611,English │
│ 273,"Correspondenz-Blatt der Deutschen Gesellschaft │
│ für Anthropologie, Ethnologie und │
│ Urgeschichte","specialized: anthropology, │
│ criminology and forensics",Braunschweig,German │
│ Empire,Germany,52.266667,10.516667,German │
│ 303,Policlinique de Bruxelles,general │
│ medicine,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 306,Annales de la Société Belge de │
│ Neurologie,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 19,Journal de neurologie,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 25,"Revue internationale d'électrothérapie, de │
│ physiologie, de médecine, de chirurgie, │
│ d'obstétrique, de thérapeutique, de chimie et de │
│ pharmacie",general │
│ medicine,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 35,Bulletin de la Société de Médecine Mentale de │
│ Belgique,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ ... │
│ │
│ ... │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
As you can see from the terminal output, this produced one piece of output data: file (referring to the imported file), and it displays a preview of the file in question for us. By itself, this doesn't do anything yet; it just reads the file and then stops. What we want in this case is to 'save' the file, so we can refer to it again later. The process of 'saving' a value in kiara persists the file (or rather: its content and some metadata) into the kiara data store, gives it an internal unique id (string), and allows the user to 'tag' the value with one or multiple aliases. Aliases are names that are meaningful to the user, and they make it easy to refer to datasets later on.
kiara supports saving any of the output values of a kiara run command via the --save flag. This --save parameter takes a single string as argument, and can be used in two ways:
- if you want to save all output fields of a run, you can just provide a single string (for example imported_journal_csv) as the parameter. In this case, kiara will store all result items with an auto-generated alias in the form of [save_argument].[field_name]. In our case this would result in one item being stored in the data store, with the alias imported_journal_csv.file.
- if you want to save only a subset of result values, or want to have more control over the aliases those results get, you can use the --save parameter for every field you want to persist. In this case the argument to --save must be in the form of: [field_name]=[alias]. You can use the --save parameter multiple times, with different field names.
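The two forms can be summarised in a few lines of Python. This is only a sketch of the aliasing rules described above, not kiara's actual argument handling; the resolve_save_aliases function is a hypothetical name introduced for illustration.

```python
def resolve_save_aliases(save_args, output_fields):
    """Hypothetical sketch: map output field names to the aliases they get.

    save_args:     the values passed via one or more --save options
    output_fields: the names of the operation's output fields
    """
    aliases = {}
    for arg in save_args:
        if "=" in arg:
            # Second form: [field_name]=[alias], saves only the named field.
            field_name, alias = arg.split("=", 1)
            if field_name not in output_fields:
                raise ValueError(f"Unknown output field: {field_name}")
            aliases[field_name] = alias
        else:
            # First form: one base name, saves every output field
            # under the alias [save_argument].[field_name].
            for field_name in output_fields:
                aliases[field_name] = f"{arg}.{field_name}"
    return aliases


all_fields = resolve_save_aliases(["imported_journal_csv"], ["file"])
one_field = resolve_save_aliases(["file=journal_nodes_file"], ["file"])
print(all_fields)  # {'file': 'imported_journal_csv.file'}
print(one_field)   # {'file': 'journal_nodes_file'}
```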
In our case, let's opt for the second option:
kiara run --save file=journal_nodes_file import.local.file path=examples/data/journals/JournalNodes1902.csv
╭─ Result ─────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────────────────────── │
│ file file Id,Label,JournalType,City,CountryNetworkTime,Prese… │
│ 75,Psychiatrische en neurologische │
│ bladen,specialized: psychiatry and │
│ neurology,Amsterdam,Netherlands,Netherlands,52.3666… │
│ 36,The American Journal of Insanity,specialized: │
│ psychiatry and neurology,Baltimore,United │
│ States,United States,39.289444,-76.615278,English │
│ 208,The American Journal of Psychology,specialized: │
│ psychology,Baltimore,United States,United │
│ States,39.289444,-76.615278,English │
│ 295,Die Krankenpflege,specialized: │
│ therapy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 296,Die deutsche Klinik am Eingange des zwanzigsten │
│ Jahrhunderts,general medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 300,Therapeutische Monatshefte,specialized: │
│ therapy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 1,Allgemeine Zeitschrift für │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 7,Archiv für Psychiatrie und │
│ Nervenkrankheiten,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 10,Berliner klinische Wochenschrift,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 13,Charité Annalen,general medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 21,Monatsschrift für Psychiatrie und │
│ Neurologie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 29,Virchows Archiv,"specialized: anatomy, physiology │
│ and pathology",Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 31,Zeitschrift für pädagogische Psychologie und │
│ Pathologie,specialized: psychology and │
│ pedagogy,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 42,Vierteljahrsschrift für gerichtliche Medizin und │
│ öffentliches Sanitätswesen,"specialized: │
│ anthropology, criminology and │
│ forensics",Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 47,Centralblatt für Nervenheilkunde und │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 50,Russische medicinische Rundschau,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 76,Deutsche Aerzte-Zeitung,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 87,Monatsschrift für Geburtshülfe und │
│ Gynäkologie,specialized: gynecology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 108,Archiv für klinische Chirurgie,specialized: │
│ surgery,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 113,Zeitschrift für klinische Medicin,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 159,Deutsche militärärztliche │
│ Zeitschrift,specialized: military │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 162,Jahresbericht über die Leistungen und │
│ Fortschritte auf dem Gebiete der Neurologie und │
│ Psychiatrie,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 192,Ärztliche Sachverständigen-Zeitung,general │
│ medicine,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 198,Zeitschrift für die Behandlung Schwachsinniger │
│ und Epileptischer,specialized: psychiatry and │
│ neurology,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 258,Der Pfarrbote,news media,Berlin,German │
│ Empire,Germany,52.52,13.405,German │
│ 71,Correspondenz-Blatt für Schweizer Aerzte,general │
│ medicine,Bern,Switzerland,Switzerland,46.948056,7.4… │
│ 6,Archiv für mikroskopische Anatomie,"specialized: │
│ anatomy, physiology and pathology",Bonn,German │
│ Empire,Germany,50.733333,7.1,German │
│ 203,The Journal of Abnormal Psychology,specialized: │
│ psychology,Boston,United States,United │
│ States,42.358056,-71.063611,English │
│ 273,"Correspondenz-Blatt der Deutschen Gesellschaft │
│ für Anthropologie, Ethnologie und │
│ Urgeschichte","specialized: anthropology, │
│ criminology and forensics",Braunschweig,German │
│ Empire,Germany,52.266667,10.516667,German │
│ 303,Policlinique de Bruxelles,general │
│ medicine,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 306,Annales de la Société Belge de │
│ Neurologie,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 19,Journal de neurologie,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 25,"Revue internationale d'électrothérapie, de │
│ physiologie, de médecine, de chirurgie, │
│ d'obstétrique, de thérapeutique, de chimie et de │
│ pharmacie",general │
│ medicine,Brussels,Belgium,Belgium,50.85,4.35,French │
│ 35,Bulletin de la Société de Médecine Mentale de │
│ Belgique,specialized: psychiatry and │
│ neurology,Brussels,Belgium,Belgium,50.85,4.35,French │
│ ... │
│ │
│ ... │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Stored result value ────────────────────────────────────────────────────────╮
│ │
│ field data type stored id alias(es) │
│ ────────────────────────────────────────────────────────────────────────── │
│ file file 8bb90738-ab11-4cfb-8ada-f43549… journal_nodes_file │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Checking the data store¶
To check whether that worked, we can list all of our items in the data store, and see if the one we just created is in there:
kiara data list
╭─ Available aliases ──────────────────────────────────────────────────────────╮
│ │
│ alias type size │
│ ────────────────────────────────────── │
│ journal_nodes_file file 33.43 KB │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
All right! Looks like this worked.
Creating a table from an imported CSV file¶
CSV files are usually not much use by themselves; in most cases we want to create a table-like structure from them, so we can efficiently query the data. Creating the table usually also verifies that the structure and format of the file are valid.
Let's ask kiara what 'create' related operations it has available:
kiara operation list create
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Id Type(s) Description │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ create.database.from.file create_from Create a database from a file. │
│ create.database.from.file_bundle create_from Create a database from a file_bundle value. │
│ create.database.from.table create_from Create a database value from a table. │
│ create.network_data.from.files pipeline Create table values from files containing edges and node data, │
│ then assemble those to the network_data result. │
│ create.network_data.from.tables Create a graph object from one or two tables. │
│ create.table.from.file create_from Create a table from a file, trying to auto-determine the format │
│ of said file. │
│ create.table.from.file_bundle create_from Create a table value from a text file_bundle. │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Righto, looks like create.table.from.file might be our ticket! Let's see what it does:
kiara operation explain create.table.from.file
╭─ Operation: create.table.from.file ──────────────────────────────────────────╮
│ │
│ Documentation Create a table from a file, trying to auto-determine the │
│ format of said file. │
│ │
│ Inputs │
│ field │
│ name type descripti… Required Default │
│ ────────────────────────────────────────────────────── │
│ file file The source yes -- no │
│ value (of default │
│ type -- │
│ 'file'). │
│ │
│ │
│ Outputs │
│ field name type description │
│ ────────────────────────────────────────────────────── │
│ table table The result value (of type │
│ 'table'). │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
So, it needs an input named file of type ... file, and will return an output named table of type, well ... table. Looks good. Here is how we run this:
kiara run create.table.from.file file=alias:journal_nodes_file
╭─ Result ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ table table │
│ Id Label JournalType City CountryNetworkTime PresentDayCountry Latitude Longitude Language │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ 75 Psychiatrische en neurologische specialized: psychiatry and neuro Amsterdam Netherlands Netherlands 52.366667 4.9 Dutch │
│ 36 The American Journal of Insanity specialized: psychiatry and neuro Baltimore United States United States 39.289444 -76.615278 English │
│ 208 The American Journal of Psycholo specialized: psychology Baltimore United States United States 39.289444 -76.615278 English │
│ 295 Die Krankenpflege specialized: therapy Berlin German Empire Germany 52.52 13.405 German │
│ 296 Die deutsche Klinik am Eingange general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 300 Therapeutische Monatshefte specialized: therapy Berlin German Empire Germany 52.52 13.405 German │
│ 1 Allgemeine Zeitschrift für Psych specialized: psychiatry and neuro Berlin German Empire Germany 52.52 13.405 German │
│ 7 Archiv für Psychiatrie und Nerve specialized: psychiatry and neuro Berlin German Empire Germany 52.52 13.405 German │
│ 10 Berliner klinische Wochenschrift general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 13 Charité Annalen general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 21 Monatsschrift für Psychiatrie un specialized: psychiatry and neuro Berlin German Empire Germany 52.52 13.405 German │
│ 29 Virchows Archiv specialized: anatomy, physiology Berlin German Empire Germany 52.52 13.405 German │
│ 31 Zeitschrift für pädagogische Psy specialized: psychology and pedag Berlin German Empire Germany 52.52 13.405 German │
│ 42 Vierteljahrsschrift für gerichtl specialized: anthropology, crimin Berlin German Empire Germany 52.52 13.405 German │
│ 47 Centralblatt für Nervenheilkunde specialized: psychiatry and neuro Berlin German Empire Germany 52.52 13.405 German │
│ 50 Russische medicinische Rundschau general medicine Berlin German Empire Germany 52.52 13.405 German │
│ ... ... ... ... ... ... ... ... ... │
│ ... ... ... ... ... ... ... ... ... │
│ 277 L'arte medica general medicine Turin Italy Italy 45.079167 7.676111 Italian │
│ 288 Allgemeine österreichische Geric specialized: anthropology, crimin Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 18 Jahrbücher für Psychiatrie specialized: psychiatry and neuro Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 30 Wiener klinische Rundschau general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 44 Wiener klinische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 45 Wiener medizinische Wochenschrif general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 72 Wiener medizinische Presse general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 81 Monatsschrift für Gesundheitspfl general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 93 Klinisch-therapeutische Wochensc general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 151 Medicinisch-chirurgisches Centra specialized: surgery Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 199 Der Militärazt specialized: military medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 261 Медицинская беседа general medicine Voronezh Russian Empire Russia 51.671667 39.210556 Russian │
│ 77 Medycyna general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish │
│ 150 Kronika Lekarska general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish │
│ 86 Grenzfragen des Nerven- und Seel specialized: psychiatry and neuro Wiesbaden German Empire Germany 50.0825 8.24 German │
│ 206 Ergebnisse der Allgemeinen Patho specialized: anatomy, physiology Wiesbaden German Empire Germany 50.0825 8.24 German │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Note
In this example we prefix the right side of the file= argument with alias:. This is necessary to make it clear to kiara that we mean a dataset that lives in its data store, and that we want to refer to it via its alias. Otherwise, kiara would have just interpreted the input as a string, and since that is of the wrong input type (we needed a file), it would have thrown an error.
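The distinction made in this note can be sketched as follows. The resolve_input function and the in-memory aliases dictionary are hypothetical and only mimic the behaviour described here; they are not part of kiara's API.

```python
ALIAS_PREFIX = "alias:"


def resolve_input(raw_value, aliases):
    """Hypothetical sketch: decide whether an input is a reference or a string.

    Returns a (kind, value) tuple, where kind is 'stored' or 'string'.
    """
    if raw_value.startswith(ALIAS_PREFIX):
        alias = raw_value[len(ALIAS_PREFIX):]
        if alias not in aliases:
            raise KeyError(f"No value stored under alias: {alias}")
        return ("stored", aliases[alias])
    # Without the prefix, the input is taken literally as a plain string.
    return ("string", raw_value)


# Stand-in for the data store: alias -> value id (placeholder, not a real id).
aliases = {"journal_nodes_file": "<id of the stored file value>"}

with_prefix = resolve_input("alias:journal_nodes_file", aliases)
without_prefix = resolve_input("journal_nodes_file", aliases)
print(with_prefix)     # ('stored', '<id of the stored file value>')
print(without_prefix)  # ('string', 'journal_nodes_file')
```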
The table output looks good, right? Much more table-y than before. The only thing is: we again want to 'save' this output, so we can use it directly later. No big deal, just like last time:
kiara run --output silent --save table=journal_nodes_table create.table.from.file file=alias:journal_nodes_file
╭─ Stored result value ────────────────────────────────────────────────────────╮
│ │
│ field data type stored id alias(es) │
│ ────────────────────────────────────────────────────────────────────────── │
│ table table 8ef2a5ea-031a-4370-93ac-c8d42… journal_nodes_table │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Note
Here we use the --output silent command line option to suppress any output of values. We've seen this already in the last invocation of this command. kiara will still tell us the id of the value it just saved.
Checking the data store, again¶
Now, let's look again at the content of the kiara data store:
kiara data list
╭─ Available aliases ──────────────────────────────────────────────────────────╮
│ │
│ alias type size │
│ ──────────────────────────────────────── │
│ journal_nodes_file file 33.43 KB │
│ journal_nodes_table table 42.79 KB │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
As you can see, there are 2 items now: one file, and one table. If you ever want to get more details about any of the items in the data store, you can use one of these commands:
Display information about the data: kiara data explain¶
kiara data explain alias:journal_nodes_table
╭─ Value details for: alias:journal_nodes_table ───────────────────────────────╮
│ │
│ value_id 8ef2a5ea-031a-4370-93ac-c8d42dc1ea3b │
│ kiara_id bc41cc78-899c-433b-8e8d-33d0c9990791 │
│ │
│ ──────────────────────────────────────────────────── │
│ data_type_info │
│ data_type_name table │
│ data_type_config {} │
│ characteristics { │
│ "is_scalar": false, │
│ "is_json_serializable": │
│ false │
│ } │
│ data_type_class │
│ python_cla… TableType │
│ python_mod… kiara_plug… │
│ full_name kiara_plug… │
│ │
│ │
│ destiny_backlinks {} │
│ enviroments None │
│ property_links { │
│ "metadata.python_class": │
│ "aae815fc-07fb-48ef-a8e3-cc18473d8389", │
│ "metadata.table": │
│ "30bec5f0-d69a-4b2e-bda2-e99a411d6463" │
│ } │
│ value_hash zdpuAn89Et1ENzfoASJRYcWEceyfRiPg664mN4nnHLFnjRLyg │
│ value_schema │
│ type table │
│ type_config {} │
│ default __not_set__ │
│ optional False │
│ is_constant False │
│ doc The result value (of type │
│ 'table'). │
│ │
│ value_size 42.79 KB │
│ value_status -- set -- │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
This command prints out the metadata kiara has stored about a value item. It supports displaying several internally important metadata details of stored datasets; check out the available options with kiara data explain --help. One option that is particularly interesting is --properties, which displays all the metadata properties kiara has collected about a value. We will experiment with this option a bit later in this tutorial.
Display the data itself: kiara data load¶
kiara data load -s alias:journal_nodes_table
Id Label JournalType City CountryNetworkTime PresentDayCountry Latitude Longitude Language
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
75 Psychiatrische en neurologische bladen specialized: psychiatry and neurology Amsterdam Netherlands Netherlands 52.366667 4.9 Dutch
36 The American Journal of Insanity specialized: psychiatry and neurology Baltimore United States United States 39.289444 -76.615278 English
208 The American Journal of Psychology specialized: psychology Baltimore United States United States 39.289444 -76.615278 English
295 Die Krankenpflege specialized: therapy Berlin German Empire Germany 52.52 13.405 German
296 Die deutsche Klinik am Eingange des zwanzigste general medicine Berlin German Empire Germany 52.52 13.405 German
300 Therapeutische Monatshefte specialized: therapy Berlin German Empire Germany 52.52 13.405 German
1 Allgemeine Zeitschrift für Psychiatrie specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German
7 Archiv für Psychiatrie und Nervenkrankheiten specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German
10 Berliner klinische Wochenschrift general medicine Berlin German Empire Germany 52.52 13.405 German
13 Charité Annalen general medicine Berlin German Empire Germany 52.52 13.405 German
21 Monatsschrift für Psychiatrie und Neurologie specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German
29 Virchows Archiv specialized: anatomy, physiology and pathology Berlin German Empire Germany 52.52 13.405 German
31 Zeitschrift für pädagogische Psychologie und P specialized: psychology and pedagogy Berlin German Empire Germany 52.52 13.405 German
42 Vierteljahrsschrift für gerichtliche Medizin u specialized: anthropology, criminology and fore Berlin German Empire Germany 52.52 13.405 German
47 Centralblatt für Nervenheilkunde und Psychiatr specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German
50 Russische medicinische Rundschau general medicine Berlin German Empire Germany 52.52 13.405 German
... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ...
277 L'arte medica general medicine Turin Italy Italy 45.079167 7.676111 Italian
288 Allgemeine österreichische Gerichts-Zeitung specialized: anthropology, criminology and fore Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
18 Jahrbücher für Psychiatrie specialized: psychiatry and neurology Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
30 Wiener klinische Rundschau general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
44 Wiener klinische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
45 Wiener medizinische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
72 Wiener medizinische Presse general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
81 Monatsschrift für Gesundheitspflege general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
93 Klinisch-therapeutische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
151 Medicinisch-chirurgisches Centralblatt specialized: surgery Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
199 Der Militärazt specialized: military medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German
261 Медицинская беседа general medicine Voronezh Russian Empire Russia 51.671667 39.210556 Russian
77 Medycyna general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish
150 Kronika Lekarska general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish
86 Grenzfragen des Nerven- und Seelenlebens specialized: psychiatry and neurology Wiesbaden German Empire Germany 50.0825 8.24 German
206 Ergebnisse der Allgemeinen Pathologie und Path specialized: anatomy, physiology and pathology Wiesbaden German Empire Germany 50.0825 8.24 German
Note
If you omit the -s flag, this command will let you browse the table (or any other supported data type) interactively, similar to a pager application.
This command loads the actual data, and prints out its content (or a representation of it that makes sense in a terminal context).
Querying the table data¶
This section is a bit more advanced, so you can skip it if you want. It's just to show an example of what can be done with a stored table data item.
We'll be using the SQL query language to find the names and types of all journals from Berlin. The query for this is:
select Label, JournalType from data where City='Berlin'
The kiara module we are going to use is called query.table. Let's check which parameters this module expects:
kiara run query.table
╭─ Run info: query.table ──────────────────────────────────────────────────────╮
│ │
│ Can't run operation: invalid or insufficient input(s) │
│ │
│ ──────────────────────────────────────────────────────────────────────────── │
│ │
│ Operation: query.table │
│ │
│ Execute a sql query against an (Arrow) table. │
│ │
│ The default relation name for the sql query is 'data', but can be modified │
│ by the 'relation_name' config option/input. │
│ │
│ If the 'query' module config option is not set, users can provide their own │
│ query, otherwise the pre-set one will be used. │
│ │
│ Inputs: │
│ │
│ field name status type description required default │
│ ────────────────────────────────────────────────────────────────────────── │
│ query not set string The query, use yes │
│ the value of the │
│ 'relation_name' │
│ input as table, │
│ e.g. 'select * │
│ from data'. │
│ relation_name valid string The name the no data │
│ table is │
│ referred to in │
│ the sql query. │
│ table not set table The table to yes │
│ query │
│ │
│ │
│ Outputs: │
│ │
│ field name type description │
│ ────────────────────────────────────────────────────────────────────────── │
│ query_result table The query result. │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Aha. table and query are required. Good, we have both. In this example we'll use the data item we've stored as input for another workflow. That goes like this:
kiara run query.table table=alias:journal_nodes_table query="select Label, JournalType from data where City='Berlin'"
╭─ Result ─────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────────────────────── │
│ query_result table │
│ Label JournalType │
│ ─────────────────────────────────────────── │
│ Die Krankenpflege specialized: therap │
│ Die deutsche Klinik general medicine │
│ Therapeutische Mona specialized: therap │
│ Allgemeine Zeitschr specialized: psychi │
│ Archiv für Psychiat specialized: psychi │
│ Berliner klinische general medicine │
│ Charité Annalen general medicine │
│ Monatsschrift für P specialized: psychi │
│ Virchows Archiv specialized: anatom │
│ Zeitschrift für päd specialized: psycho │
│ Vierteljahrsschrift specialized: anthro │
│ Centralblatt für Ne specialized: psychi │
│ Russische medicinis general medicine │
│ Deutsche Aerzte-Zei general medicine │
│ Monatsschrift für G specialized: gyneco │
│ Archiv für klinisch specialized: surger │
│ Zeitschrift für kli general medicine │
│ Deutsche militärärz specialized: milita │
│ Jahresbericht über specialized: psychi │
│ Ärztliche Sachverst general medicine │
│ Zeitschrift für die specialized: psychi │
│ Der Pfarrbote news media │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Note how we use the alias: prefix again here, to signify to kiara that what follows is indeed a reference to a dataset, and not a string.
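To get a feel for what the query does independently of kiara, here is a minimal sketch that runs the same SQL statement with Python's built-in sqlite3 module. This is only a stand-in: kiara executes the query against an Arrow table, not SQLite, and the four rows below are a hand-picked subset of the journals table shown earlier. The table is named data to match the default relation name.

```python
import sqlite3

# A few rows copied from the journals table above (Id, Label, JournalType, City).
rows = [
    (295, "Die Krankenpflege", "specialized: therapy", "Berlin"),
    (10, "Berliner klinische Wochenschrift", "general medicine", "Berlin"),
    (30, "Wiener klinische Rundschau", "general medicine", "Vienna"),
    (77, "Medycyna", "general medicine", "Warsaw"),
]

conn = sqlite3.connect(":memory:")
# 'data' mirrors the default relation name used by query.table.
conn.execute("CREATE TABLE data (Id INTEGER, Label TEXT, JournalType TEXT, City TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

query = "select Label, JournalType from data where City='Berlin'"
berlin = conn.execute(query).fetchall()
conn.close()

for label, journal_type in berlin:
    print(f"{label}: {journal_type}")
```

Only the two Berlin rows are returned, and only the two selected columns are kept, which is the same shape as the query_result table above.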
Saving the result of the query¶
As it is, the result of this query won't be saved anywhere. This might be fine for exploratory queries, but in some cases we might want to store the result of our work, similar to how we imported the original table in the first place. The kiara run command can do that, using the --save option. It takes a string as its argument. If that string contains a '=', it is interpreted as a key-value pair where the key is the name of the output field we want to save, and the value is the alias we want to save it under. Here is how that goes:
kiara run query.table --output=silent --save query_result=berlin_journals table=alias:journal_nodes_table query="select Label, JournalType from data where City='Berlin'"
╭─ Stored result value ────────────────────────────────────────────────────────╮
│ │
│ field data type stored id alias(es) │
│ ────────────────────────────────────────────────────────────────────────── │
│ query_result table f32438b9-ab95-4d18-b15e-22… berlin_journals │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
From the output, it seems that saving our result has worked. We can make sure by letting kiara 'explain' to us the data that is stored under the alias 'berlin_journals'. This time, let's also display the result table's properties (by using the --properties flag):
kiara data explain --properties alias:berlin_journals
╭─ Value details for: alias:berlin_journals ───────────────────────────────────╮
│ │
│ value_id f32438b9-ab95-4d18-b15e-22f6e168454d │
│ kiara_id bc41cc78-899c-433b-8e8d-33d0c9990791 │
│ │
│ ──────────────────────────────────────────────────── │
│ data_type_info │
│ data_type_name table │
│ data_type_config {} │
│ characteristics { │
│ "is_scalar": false, │
│ "is_json_serializable": │
│ false │
│ } │
│ data_type_class │
│ python_cla… TableType │
│ python_mod… kiara_plug… │
│ full_name kiara_plug… │
│ │
│ │
│ destiny_backlinks {} │
│ enviroments None │
│ properties │
│ field value │
│ ────────────────────────────────────────────────── │
│ metadata.python_class { │
│ "python_class": { │
│ "python_class_name"… │
│ "python_module_name… │
│ "full_name": "kiara… │
│ } │
│ } │
│ metadata.table { │
│ "table": { │
│ "column_names": [ │
│ "Label", │
│ "JournalType" │
│ ], │
│ "column_schema": { │
│ "Label": { │
│ "type_name": "s… │
│ "metadata": { │
│ "arrow_type_i… │
│ } │
│ }, │
│ "JournalType": { │
│ "type_name": "s… │
│ "metadata": { │
│ "arrow_type_i… │
│ } │
│ } │
│ }, │
│ "rows": 22, │
│ "size": 1672 │
│ } │
│ } │
│ │
│ property_links { │
│ "metadata.python_class": │
│ "e71ccadc-8eee-49cb-93e1-ee42faeaad96", │
│ "metadata.table": │
│ "ca42d233-c95c-4d80-a690-f1acd840cf9f" │
│ } │
│ value_hash zdpuAq5Ty5hNtUaKWouPmS75LxteiQQv6Ue6Jsq9v39QoMPyw │
│ value_schema │
│ type table │
│ type_config {} │
│ default __not_set__ │
│ optional False │
│ is_constant False │
│ doc The query result. │
│ │
│ value_size 2.63 KB │
│ value_status -- set -- │
│ │
│ ──────────────────────────────────────────────────── │
│ │
│ properties │
│ metadata.python_class { │
│ "python_class": { │
│ "python_class_name"… │
│ "python_module_name… │
│ "full_name": "kiara… │
│ } │
│ } │
│ metadata.table { │
│ "table": { │
│ "column_names": [ │
│ "Label", │
│ "JournalType" │
│ ], │
│ "column_schema": { │
│ "Label": { │
│ "type_name": "s… │
│ "metadata": { │
│ "arrow_type_i… │
│ } │
│ }, │
│ "JournalType": { │
│ "type_name": "s… │
│ "metadata": { │
│ "arrow_type_i… │
│ } │
│ } │
│ }, │
│ "rows": 22, │
│ "size": 1672 │
│ } │
│ } │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Generating a network graph¶
Our goal for this tutorial is to create a network graph, and investigate its properties. Network graphs are usually created from one or two pieces of data (both tabular in nature):
- edges (mandatory): information about what nodes exist, and if and how they are connected
- nodes information (optional): information about attributes of each node
Note
In this tutorial we'll go through all the steps necessary to create a network graph object from two CSV files, one by one. This is a bit cumbersome, but it'll help you understand what actually happens. In a later tutorial we'll show how to create a kiara pipeline to combine all those steps into one.
Importing edges data, creating a table item from it¶
We already have our nodes imported into kiara (with the alias journal_nodes_table). Now we need to do the same for our edges. Similar to what we have done above, we want to import the file into the kiara data store, and then convert it into a table. This time, let's just use a prepared (so-called) pipeline operation, which basically runs both operations in one go, and feeds the right output(s) into the right input(s):
kiara operation explain import.table.from.local_file_path
╭─ Operation: import.table.from.local_file_path ───────────────────────────────╮
│ │
│ Documentation Import a table from a file on the local filesystem. │
│ │
│ Inputs │
│ field │
│ name type descrip… Required Default │
│ ────────────────────────────────────────────────────── │
│ path string The yes -- no │
│ local default │
│ path to -- │
│ the │
│ file. │
│ │
│ │
│ Outputs │
│ field name type description │
│ ────────────────────────────────────────────────────── │
│ imported_file file The loaded files. │
│ table table The result value (of type │
│ 'table'). │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
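The idea of such a pipeline can be sketched in plain Python. This is an illustration only, not how kiara implements pipelines: the three function names are hypothetical, and the CSV parsing uses the standard library instead of kiara's table type. The point is that the output of the first step becomes the input of the second, and that both results are exposed as outputs.

```python
import csv
import io
import tempfile
from pathlib import Path


def import_file(path: str) -> str:
    """Step 1 (hypothetical): read the file content from the local filesystem."""
    return Path(path).read_text(encoding="utf-8")


def create_table(file_content: str) -> list:
    """Step 2 (hypothetical): parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(file_content)))


def import_table_from_path(path: str) -> dict:
    """Pipeline: the output of step 1 is fed into step 2."""
    imported_file = import_file(path)
    table = create_table(imported_file)
    # Both the intermediate and the final result are returned,
    # mirroring the two outputs listed above.
    return {"imported_file": imported_file, "table": table}


# Small demo file using the same header as the edges file in this tutorial.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "edges.csv"
    demo.write_text("Source,Target,weight\n1,1,11\n1,5,1\n", encoding="utf-8")
    result = import_table_from_path(str(demo))

print(result["table"][0])
```

Note that csv.DictReader keeps every value as a string; a real table type would typically infer column types such as integers.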
So, let's see:
kiara run --save journal_edges import.table.from.local_file_path path=examples/data/journals/JournalEdges1902.csv
╭─ Results ────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────── │
│ imported_file file Source,Target,weight │
│ 1,1,11 │
│ 1,5,1 │
│ 1,7,6 │
│ 1,8,15 │
│ 1,10,24 │
│ 1,13,1 │
│ 1,14,2 │
│ 1,15,8 │
│ 1,18,7 │
│ 1,20,48 │
│ 1,21,7 │
│ 1,22,4 │
│ 1,23,75 │
│ 1,24,1 │
│ 1,26,8 │
│ 1,29,1 │
│ 1,30,14 │
│ 1,35,16 │
│ 1,36,23 │
│ 1,37,4 │
│ 1,38,5 │
│ 1,39,4 │
│ 1,40,10 │
│ 1,41,2 │
│ 1,42,4 │
│ 1,43,2 │
│ 1,44,1 │
│ 1,45,5 │
│ 1,46,7 │
│ 1,47,2 │
│ 1,56,1 │
│ 1,58,34 │
│ 1,61,9 │
│ 1,63,12 │
│ ... │
│ │
│ ... │
│ table table │
│ Source Target weight │
│ ────────────────────────── │
│ 1 1 11 │
│ 1 5 1 │
│ 1 7 6 │
│ 1 8 15 │
│ 1 10 24 │
│ 1 13 1 │
│ 1 14 2 │
│ 1 15 8 │
│ 1 18 7 │
│ 1 20 48 │
│ 1 21 7 │
│ 1 22 4 │
│ 1 23 75 │
│ 1 24 1 │
│ 1 26 8 │
│ 1 29 1 │
│ ... ... ... │
│ ... ... ... │
│ 51 108 1 │
│ 51 109 5 │
│ 51 110 1 │
│ 51 111 1 │
│ 51 112 1 │
│ 51 113 1 │
│ 51 114 2 │
│ 51 115 2 │
│ 51 116 1 │
│ 51 118 3 │
│ 51 119 2 │
│ 51 120 1 │
│ 51 121 1 │
│ 63 102 1 │
│ 147 27 11 │
│ 147 241 1 │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Stored result values ───────────────────────────────────────────────────────╮
│ │
│ field data type stored id alias(es) │
│ ────────────────────────────────────────────────────────────────────────── │
│ imported_file file d524e19e-be9b-4f56-b… journal_edges.impor… │
│ table table 0aea06f2-f729-4ec5-b… journal_edges.table │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Note
Here we've used a simple string (without '=') with the --save option, and as you can see, kiara created two namespaced aliases for the result items.
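The two ways of using --save seen so far can be summarised in a short sketch. The resolve_save_aliases function is hypothetical and only reproduces the behaviour visible in the outputs of this tutorial (a field=alias pair saves one field, a plain string becomes a namespace for all output fields); it makes no claim about how kiara handles other cases.

```python
def resolve_save_aliases(save_arg: str, output_fields: list) -> dict:
    """Map output field names to aliases for a single --save argument (sketch)."""
    if "=" in save_arg:
        # 'field=alias': save exactly one output field under the given alias.
        field, alias = save_arg.split("=", 1)
        if field not in output_fields:
            raise ValueError(f"Unknown output field: {field}")
        return {field: alias}
    # Plain string: use it as a namespace and append each output field name.
    return {field: f"{save_arg}.{field}" for field in output_fields}


single = resolve_save_aliases("query_result=berlin_journals", ["query_result"])
namespaced = resolve_save_aliases("journal_edges", ["imported_file", "table"])
print(single)
print(namespaced)
```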
At this stage we'll have two relevant tables in our store: journal_edges.table and journal_nodes_table (note how the two use different naming schemes, because we used the --save option differently in each case):
kiara data list
╭─ Available aliases ──────────────────────────────────────────────────────────╮
│ │
│ alias type size │
│ ──────────────────────────────────────────────── │
│ journal_edges.table table 9.13 KB │
│ journal_nodes_file file 33.43 KB │
│ journal_nodes_table table 42.79 KB │
│ journal_edges.imported_file file 3.02 KB │
│ berlin_journals table 2.63 KB │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Creating the graph¶
Now that we have the edges data in kiara in a useful format, we can create the graph object. The data type for graphs in kiara is called network_data, so let's check out all the operations kiara has to offer related to network_data:
kiara operation list network_data
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Id Type(s) Description │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ create.network_data.from.files pipeline Create table values from files containing edges and node data, then assemble those to the network_data result. │
│ create.network_data.from.tables Create a graph object from one or two tables. │
│ export.network_data.as.csv_files export_as Export network data as 2 csv files (one for edges, one for nodes. │
│ export.network_data.as.graphml_file export_as Export network data as graphml file. │
│ export.network_data.as.sql_dump export_as Export network data as a sql dump file. │
│ export.network_data.as.sqlite_db export_as Export network data as a sqlite database file. │
│ import.network_data.from.local_file_paths pipeline Onboard the edges and nodes from local files, create table values from them, then assemble those to the network_data │
│ result. │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Hm, create.network_data.from.tables looks good, right? Let's see that operation's interface:
kiara operation explain create.network_data.from.tables
╭─ Operation: create.network_data.from.tables ─────────────────────────────────────────────────────────────────────────╮
│ │
│ Documentation Create a graph object from one or two tables. │
│ │
│ This module needs at least one table as input, providing the edges of the resulting network data │
│ set. │
│ If no further table is created, basic node information will be automatically created by using │
│ unique values from │
│ the edges source and target columns. │
│ │
│ Inputs │
│ field name type description Required Default │
│ ────────────────────────────────────────────────────────────────────────────────────────────── │
│ edges table A table that contains the edges yes -- no default -- │
│ data. │
│ source_column_name string The name of the source column no source │
│ name in the edges table. │
│ target_column_name string The name of the target column no target │
│ name in the edges table. │
│ edges_column_map dict An optional map of original no -- no default -- │
│ column name to desired. │
│ nodes table A table that contains the nodes no -- no default -- │
│ data. │
│ id_column_name string The name (before any potential no id │
│ column mapping) of the │
│ node-table column that contains │
│ the node identifier (used in the │
│ edges table). │
│ label_column_name string The name of a column that no -- no default -- │
│ contains the node label (before │
│ any potential column name │
│ mapping). If not specified, the │
│ value of the id value will be │
│ used as label. │
│ nodes_column_map dict An optional map of original no -- no default -- │
│ column name to desired. │
│ │
│ │
│ Outputs │
│ field name type description │
│ ────────────────────────────────────────────────────────────────────────────────────────────── │
│ network_data network_data The network/graph data. │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
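The documentation above mentions that basic node information is derived from the edges when no nodes table is given. A minimal sketch of that idea, assuming plain Python lists of dictionaries instead of kiara tables, could look as follows. The build_network function and its parameter names are hypothetical; only the behaviour (unique source and target values become nodes, and the id doubles as label) is taken from the description.

```python
def build_network(edges, nodes=None, source="source", target="target", id_column="id"):
    """Return (node_list, edge_list); derive basic nodes if none are given."""
    edge_list = [(row[source], row[target]) for row in edges]
    if nodes is None:
        # Unique values from the source and target columns become the node ids.
        ids = sorted({node_id for pair in edge_list for node_id in pair})
        # Without a label column, the id value is used as the label.
        node_list = [{"id": node_id, "label": str(node_id)} for node_id in ids]
    else:
        # Normalise the identifier column name to 'id', keep the other columns.
        node_list = []
        for row in nodes:
            node = {"id": row[id_column]}
            node.update({k: v for k, v in row.items() if k != id_column})
            node_list.append(node)
    return node_list, edge_list


# Three edges copied from the edges table imported earlier.
edges = [
    {"Source": 1, "Target": 1, "weight": 11},
    {"Source": 1, "Target": 5, "weight": 1},
    {"Source": 147, "Target": 27, "weight": 11},
]
nodes, edge_list = build_network(edges, source="Source", target="Target")
print([node["id"] for node in nodes])
```

In this tutorial a nodes table is available, so the derived-nodes branch is not needed; it only shows what the operation can fall back to.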
From this information we can assemble our command, using alias:journal_edges.table as the main input, and saving the result under the alias journals_graph. We can figure out the values for the other inputs by running kiara data explain --properties journal_edges.table, which will give us the edge column names, among other things (and, subsequently, kiara data explain --properties journal_nodes_table for the nodes table). So, here goes nothing:
kiara run --save network_data=journals_graph create.network_data.from.tables edges=alias:journal_edges.table source_column_name=Source target_column_name=Target nodes=alias:journal_nodes_table id_column_name=Id label_column_name=Label
╭─ Result ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ field data_type value │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ network_data network_data │
│ Table: edges │
│ │
│ source target weight │
│ ────────────────────────── │
│ 1 1 11 │
│ 1 5 1 │
│ 1 7 6 │
│ 1 8 15 │
│ 1 10 24 │
│ 1 13 1 │
│ 1 14 2 │
│ 1 15 8 │
│ 1 18 7 │
│ 1 20 48 │
│ 1 21 7 │
│ 1 22 4 │
│ 1 23 75 │
│ 1 24 1 │
│ 1 26 8 │
│ 1 29 1 │
│ ... ... ... │
│ ... ... ... │
│ 51 108 1 │
│ 51 109 5 │
│ 51 110 1 │
│ 51 111 1 │
│ 51 112 1 │
│ 51 113 1 │
│ 51 114 2 │
│ 51 115 2 │
│ 51 116 1 │
│ 51 118 3 │
│ 51 119 2 │
│ 51 120 1 │
│ 51 121 1 │
│ 63 102 1 │
│ 147 27 11 │
│ 147 241 1 │
│ │
│ Table: nodes │
│ │
│ id label JournalType City CountryNetworkTime PresentDayCountry Latitude Longitude Language │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ 75 Psychiatrische en neurologische bladen specialized: psychiatry and neurology Amsterdam Netherlands Netherlands 52.366667 4.9 Dutch │
│ 36 The American Journal of Insanity specialized: psychiatry and neurology Baltimore United States United States 39.289444 -76.615278 English │
│ 208 The American Journal of Psychology specialized: psychology Baltimore United States United States 39.289444 -76.615278 English │
│ 295 Die Krankenpflege specialized: therapy Berlin German Empire Germany 52.52 13.405 German │
│ 296 Die deutsche Klinik am Eingange des zwanzigsten general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 300 Therapeutische Monatshefte specialized: therapy Berlin German Empire Germany 52.52 13.405 German │
│ 1 Allgemeine Zeitschrift für Psychiatrie specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German │
│ 7 Archiv für Psychiatrie und Nervenkrankheiten specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German │
│ 10 Berliner klinische Wochenschrift general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 13 Charité Annalen general medicine Berlin German Empire Germany 52.52 13.405 German │
│ 21 Monatsschrift für Psychiatrie und Neurologie specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German │
│ 29 Virchows Archiv specialized: anatomy, physiology and pathology Berlin German Empire Germany 52.52 13.405 German │
│ 31 Zeitschrift für pädagogische Psychologie und Pat specialized: psychology and pedagogy Berlin German Empire Germany 52.52 13.405 German │
│ 42 Vierteljahrsschrift für gerichtliche Medizin und specialized: anthropology, criminology and fore Berlin German Empire Germany 52.52 13.405 German │
│ 47 Centralblatt für Nervenheilkunde und Psychiatrie specialized: psychiatry and neurology Berlin German Empire Germany 52.52 13.405 German │
│ 50 Russische medicinische Rundschau general medicine Berlin German Empire Germany 52.52 13.405 German │
│ ... ... ... ... ... ... ... ... ... │
│ ... ... ... ... ... ... ... ... ... │
│ 277 L'arte medica general medicine Turin Italy Italy 45.079167 7.676111 Italian │
│ 288 Allgemeine österreichische Gerichts-Zeitung specialized: anthropology, criminology and fore Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 18 Jahrbücher für Psychiatrie specialized: psychiatry and neurology Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 30 Wiener klinische Rundschau general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 44 Wiener klinische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 45 Wiener medizinische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 72 Wiener medizinische Presse general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 81 Monatsschrift für Gesundheitspflege general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 93 Klinisch-therapeutische Wochenschrift general medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 151 Medicinisch-chirurgisches Centralblatt specialized: surgery Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 199 Der Militärazt specialized: military medicine Vienna Austro-Hungarian Empire Austria 48.2 16.366667 German │
│ 261 Медицинская беседа general medicine Voronezh Russian Empire Russia 51.671667 39.210556 Russian │
│ 77 Medycyna general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish │
│ 150 Kronika Lekarska general medicine Warsaw Russian Empire Poland 52.233333 21.016667 Polish │
│ 86 Grenzfragen des Nerven- und Seelenlebens specialized: psychiatry and neurology Wiesbaden German Empire Germany 50.0825 8.24 German │
│ 206 Ergebnisse der Allgemeinen Pathologie und Pathol specialized: anatomy, physiology and pathology Wiesbaden German Empire Germany 50.0825 8.24 German │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Stored result value ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ field data type stored id alias(es) │
│ ───────────────────────────────────────────────────────────────────────────────────── │
│ network_data network_data aea3b645-09a7-4c80-b782-608c03d188d5 journals_graph │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
To confirm our graph data is created, let's check the data store:
kiara data explain --properties alias:journals_graph
╭─ Value details for: alias:journals_graph ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ value_id aea3b645-09a7-4c80-b782-608c03d188d5 │
│ kiara_id bc41cc78-899c-433b-8e8d-33d0c9990791 │
│ │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ data_type_info │
│ data_type_name network_data │
│ data_type_config {} │
│ characteristics { │
│ "is_scalar": false, │
│ "is_json_serializable": false │
│ } │
│ data_type_class │
│ python_class_name NetworkDataType │
│ python_module_name kiara_plugin.network_analysis.data_types │
│ full_name kiara_plugin.network_analysis.data_types.NetworkDataType │
│ │
│ │
│ destiny_backlinks {} │
│ enviroments None │
│ properties │
│ field value │
│ ───────────────────────────────────────────────────────────────────────────────────────────────── │
│ metadata.database { │
│ "tables": { │
│ "edges": { │
│ "column_names": [ │
│ "source", │
│ "target", │
│ "weight" │
│ ], │
│ "column_schema": { │
│ "source": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "target": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "weight": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ } │
│ }, │
│ "rows": 321, │
│ "size": 4096 │
│ }, │
│ "nodes": { │
│ "column_names": [ │
│ "id", │
│ "label", │
│ "JournalType", │
│ "City", │
│ "CountryNetworkTime", │
│ "PresentDayCountry", │
│ "Latitude", │
│ "Longitude", │
│ "Language" │
│ ], │
│ "column_schema": { │
│ "id": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "label": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "JournalType": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "City": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "CountryNetworkTime": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "PresentDayCountry": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Latitude": { │
│ "type_name": "REAL", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Longitude": { │
│ "type_name": "REAL", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Language": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ } │
│ }, │
│ "rows": 276, │
│ "size": 40960 │
│ } │
│ } │
│ } │
│ metadata.graph_properties { │
│ "number_of_nodes": 276, │
│ "properties_by_graph_type": [ │
│ { │
│ "graph_type": "directed", │
│ "number_of_edges": 321 │
│ }, │
│ { │
│ "graph_type": "undirected", │
│ "number_of_edges": 313 │
│ }, │
│ { │
│ "graph_type": "directed-multi", │
│ "number_of_edges": 321 │
│ }, │
│ { │
│ "graph_type": "undirected-multi", │
│ "number_of_edges": 321 │
│ } │
│ ] │
│ } │
│ metadata.python_class { │
│ "python_class": { │
│ "python_class_name": "NetworkData", │
│ "python_module_name": "kiara_plugin.network_analysis.models", │
│ "full_name": "kiara_plugin.network_analysis.models.NetworkData" │
│ } │
│ } │
│ │
│ property_links { │
│ "metadata.database": "3c33c6b7-e152-4e52-b7bd-dfc28efb7045", │
│ "metadata.graph_properties": "b8d152d7-0381-4ced-a104-900ceeb1e1d3", │
│ "metadata.python_class": "3574d1d5-d9ed-435b-8b7a-395a43a4d4a1" │
│ } │
│ value_hash zdpuB17oZEahwMpecZvwQWEGDB17D9ppcHWaUQ6pWNLWsWKNX │
│ value_schema │
│ type network_data │
│ type_config {} │
│ default __not_set__ │
│ optional False │
│ is_constant False │
│ doc The network/graph data. │
│ │
│ value_size 61.44 KB │
│ value_status -- set -- │
│ │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ │
│ properties │
│ metadata.database { │
│ "tables": { │
│ "edges": { │
│ "column_names": [ │
│ "source", │
│ "target", │
│ "weight" │
│ ], │
│ "column_schema": { │
│ "source": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "target": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "weight": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ } │
│ }, │
│ "rows": 321, │
│ "size": 4096 │
│ }, │
│ "nodes": { │
│ "column_names": [ │
│ "id", │
│ "label", │
│ "JournalType", │
│ "City", │
│ "CountryNetworkTime", │
│ "PresentDayCountry", │
│ "Latitude", │
│ "Longitude", │
│ "Language" │
│ ], │
│ "column_schema": { │
│ "id": { │
│ "type_name": "INTEGER", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "label": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "JournalType": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "City": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "CountryNetworkTime": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "PresentDayCountry": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Latitude": { │
│ "type_name": "REAL", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Longitude": { │
│ "type_name": "REAL", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ }, │
│ "Language": { │
│ "type_name": "TEXT", │
│ "metadata": { │
│ "nullable": false, │
│ "primary_key": false │
│ } │
│ } │
│ }, │
│ "rows": 276, │
│ "size": 40960 │
│ } │
│ } │
│ } │
│ metadata.graph_properties { │
│ "number_of_nodes": 276, │
│ "properties_by_graph_type": [ │
│ { │
│ "graph_type": "directed", │
│ "number_of_edges": 321 │
│ }, │
│ { │
│ "graph_type": "undirected", │
│ "number_of_edges": 313 │
│ }, │
│ { │
│ "graph_type": "directed-multi", │
│ "number_of_edges": 321 │
│ }, │
│ { │
│ "graph_type": "undirected-multi", │
│ "number_of_edges": 321 │
│ } │
│ ] │
│ } │
│ metadata.python_class { │
│ "python_class": { │
│ "python_class_name": "NetworkData", │
│ "python_module_name": "kiara_plugin.network_analysis.models", │
│ "full_name": "kiara_plugin.network_analysis.models.NetworkData" │
│ } │
│ } │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
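All good. Also, take a look at the metadata kiara already holds about the graph: the schemas and row counts of the edges and nodes tables, and the number of edges for each graph type.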
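The edge counts in the output above differ by graph type: 321 for the directed graph, 313 for the undirected one. One plausible reading is that reciprocal pairs (A→B and B→A) collapse into a single edge once direction is ignored. This is an assumption, since the output does not show how kiara computes these numbers. Here is a minimal sketch of that idea in plain Python. The `edge_counts` function and the sample data are made up for illustration and are not part of kiara's API:

```python
def edge_counts(edges):
    """Count edges under four graph interpretations.

    `edges` is a list of (source, target) pairs, one per row of an
    edge table. This is a hypothetical helper, not kiara code.
    """
    # Directed: (1, 2) and (2, 1) are different edges; exact duplicates merge.
    directed = {(s, t) for s, t in edges}
    # Undirected: direction is ignored, so (1, 2) and (2, 1) become one edge.
    undirected = {frozenset((s, t)) for s, t in edges}
    return {
        "directed": len(directed),
        "undirected": len(undirected),
        # Multi-graphs keep every row as its own edge.
        "directed-multi": len(edges),
        "undirected-multi": len(edges),
    }


# Made-up sample: one reciprocal pair (1 <-> 2) plus two one-way edges.
sample = [(1, 2), (2, 1), (2, 3), (3, 4)]
counts = edge_counts(sample)
print(counts)
```

Under this reading, the difference between 321 and 313 would mean that eight pairs of nodes are connected in both directions.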
Side-note: investigating the graph value lineage¶
kiara keeps track of all the modules and inputs that went into producing a value: basically, its entire ancestry. Why that is useful, and how powerful it can be, is beyond the scope of this section. But if you ever want to know what went into creating a particular value, you can find out with:
kiara data explain --lineage alias:journals_graph
╭─ Value details for: alias:journals_graph ────────────────────────────────────╮
│ │
│ value_id aea3b645-09a7-4c80-b782-608c03d188d5 │
│ kiara_id bc41cc78-899c-433b-8e8d-33d0c9990791 │
│ │
│ ──────────────────────────────────────────────────── │
│ data_type_info │
│ data_type_name network_data │
│ data_type_config {} │
│ characteristics { │
│ "is_scalar": false, │
│ "is_json_serializable": │
│ false │
│ } │
│ data_type_class │
│ python_cla… NetworkDat… │
│ python_mod… kiara_plug… │
│ full_name kiara_plug… │
│ │
│ │
│ destiny_backlinks {} │
│ enviroments None │
│ property_links { │
│ "metadata.database": │
│ "3c33c6b7-e152-4e52-b7bd-dfc28efb7045", │
│ "metadata.graph_properties": │
│ "b8d152d7-0381-4ced-a104-900ceeb1e1d3", │
│ "metadata.python_class": │
│ "3574d1d5-d9ed-435b-8b7a-395a43a4d4a1" │
│ } │
│ value_hash zdpuB17oZEahwMpecZvwQWEGDB17D9ppcHWaUQ6pWNLWsWKNX │
│ value_schema │
│ type network_data │
│ type_config {} │
│ default __not_set__ │
│ optional False │
│ is_constant False │
│ doc The network/graph data. │
│ │
│ value_size 61.44 KB │
│ value_status -- set -- │
│ │
│ ──────────────────────────────────────────────────── │
│ │
│ lineage create.network_data.from.tables │
│ ├── input: edges (table) = │
│ │ 0aea06f2-f729-4ec5-b4dc-707606dd7269 │
│ │ └── create.table │
│ │ └── input: file (file) = │
│ │ d524e19e-be9b-4f56-bcd1-103a3cc13f9f │
│ │ └── import.local.file │
│ │ └── input: path (string) = │
│ │ 837a0090-f20b-436d-8125-a3076df… │
│ ├── input: edges_column_map (dict) = │
│ │ 1142536c-72ee-4e00-accf-55b3448ab1ae │
│ ├── input: id_column_name (string) = │
│ │ 48227a2a-a772-4653-8bed-92621f06fa6d │
│ ├── input: label_column_name (string) = │
│ │ bb52e30e-0b80-479f-8233-25cc069f1c5e │
│ ├── input: nodes (table) = │
│ │ 8ef2a5ea-031a-4370-93ac-c8d42dc1ea3b │
│ │ └── create.table │
│ │ └── input: file (file) = │
│ │ 8bb90738-ab11-4cfb-8ada-f43549cd6d20 │
│ │ └── import.local.file │
│ │ └── input: path (string) = │
│ │ 89ea8b61-7e84-42db-ac7f-0d3751c… │
│ ├── input: nodes_column_map (dict) = │
│ │ bcc75a9e-e20d-4758-a48a-81637975a8cc │
│ ├── input: source_column_name (string) = │
│ │ e30154e1-22eb-4efd-8969-a42113770a7d │
│ └── input: target_column_name (string) = │
│ aa33a9c1-d880-4054-a4a6-6f74bf4c8172 │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
As you can see, this describes what we've done so far to get to this stage. You could now run kiara data explain value:<value_id> on each of the value ids you see here, if you were so inclined.
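If you want to inspect several of those ids, copying them by hand gets tedious. Below is a minimal sketch that pulls the complete value ids out of the lineage text and prints one command per id, using the same `kiara data explain` form as the command above. The `lineage_value_ids` helper is hypothetical and the sample is a shortened excerpt of the output shown earlier. Ids that the terminal truncated with "…" are skipped deliberately, since they cannot be recovered from the text:

```python
import re

# A complete id has the usual 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def lineage_value_ids(text):
    """Return the complete value ids in `text`, in order, without repeats."""
    seen = []
    for match in UUID_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Shortened excerpt of the lineage section; the second id is truncated.
sample = """
├── input: edges (table) =
│   0aea06f2-f729-4ec5-b4dc-707606dd7269
│       └── input: path (string) =
│           837a0090-f20b-436d-8125-a3076df…
├── input: edges_column_map (dict) =
│   1142536c-72ee-4e00-accf-55b3448ab1ae
"""

ids = lineage_value_ids(sample)
for value_id in ids:
    # Only prints the command; running it is left to you.
    print(f"kiara data explain value:{value_id}")
```

Pass in only the lineage part of the output. The full output also contains the value's own id, the kiara id, and the property links, which match the same pattern.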
More¶
... to come ...