
Writing your own kiara module - the basics

Preparation

Check out the 'kiara getting started guide'

If you haven't already, it makes sense to go through the kiara getting started guide first. It gives you a good overview of the relevant kiara features, and of how the module(s) you are going to write fit in.

Setting up development environment

To get going, we need a Python virtual environment to develop in. We'll use conda here, but a regular Python virtual environment works as well. As a first step, install conda (if you haven't already). Then run:

conda create -n my_kiara_module python=3.9
conda activate my_kiara_module
conda install -c conda-forge mamba   # optional, but makes installing packages much faster; if you skip it, replace 'mamba' with 'conda' below
mamba install -c conda-forge -c dharpa kiara kiara_plugin.core_types kiara_plugin.tabular

Note

On Linux, if you run into errors, you may also need to run: mamba update -c conda-forge libstdcxx-ng.

After this, the kiara command-line application should be available. You can check that it works by running, for example, kiara operation list.
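If you'd like to double-check the installation from Python as well, the following sketch uses only the standard library's importlib.metadata to report which of the packages installed above are present. It is not part of kiara; the distribution names are copied from the install command, and how such names are matched against installed package metadata can vary slightly between Python versions, so treat a "NOT INSTALLED" result as a hint rather than proof.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the mamba install command above.
PACKAGES = ["kiara", "kiara_plugin.core_types", "kiara_plugin.tabular"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not found}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

The kiara operation list command remains the authoritative check; this only confirms that the packages are visible to the active Python interpreter.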

Creating a kiara plugin project

For this tutorial, we'll use a project template to create a bare-bones kiara plugin project, which we will augment with our own module(s).

First we need to install the cruft conda package, which we will use to create our project stub:

mamba install -c conda-forge cruft

Now, run cruft against the template git repository. Feel free to change any of the answers to the questions you are asked:

cruft create https://github.com/DHARPA-Project/kiara_plugin.develop.git

full_name []: Markus Binsteiner
email []: markus@frkl.io
project_name [my-kiara-plugin]: my-kiara-module
project_slug [my_kiara_module]: my_kiara_module
project_short_description [my-kiara-module]: A kiara plugin project for learning to create kiara modules.
github_user [DHARPA-Project]:
anaconda_user [dharpa]:

This should have created a new folder named kiara_plugin.my_kiara_module. Next, we initialize a git repository for the new plugin project and install the project into our conda environment:

cd kiara_plugin.my_kiara_module
git init
git checkout -b develop
pip install -e .

Note

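cd changes into the new project folder. git init creates a new git repository there, and git checkout -b develop creates and switches to a branch called develop. Finally, pip install -e . installs the plugin project in editable (development) mode: Python loads the code directly from your project folder, so changes you make to the source are picked up without reinstalling, and kiara can find the plugin's modules.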
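Installing a project also records its metadata in the environment, including any entry points it declares, which is a common way for Python applications to discover plugins. The sketch below lists the entry points registered under a given group using importlib.metadata. The group name kiara.modules is an assumption, not something this tutorial confirms; check the entry-point section of your generated project's packaging configuration for the actual group name.

```python
from importlib.metadata import entry_points

# ASSUMPTION: kiara discovers plugin modules through this entry-point group.
# Verify against the packaging configuration of your generated project.
KIARA_MODULES_GROUP = "kiara.modules"


def list_entry_points(group):
    """Return sorted (name, value) pairs for all entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ selection API
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: entry_points() returns a dict of group -> entries
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, value in list_entry_points(KIARA_MODULES_GROUP):
        print(f"{name} -> {value}")
```

The version check makes the same function work on the Python 3.9 environment created above and on newer interpreters.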

Once the installation has finished, you should see a new operation called my_kiara_module.example:

kiara operation list example
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   Id                        Type(s)   Description                                                                                        │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   my_kiara_module.example             A very simple example module; concatenate two strings.                                             │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Note

The token example at the end of the command above acts as a filter: only operations that match it are listed.
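Judging from the listings in this tutorial (for example, the token filter further down matches filter.table as well as table_filter.select_rows), the token appears to be matched as a substring of the operation id. The sketch below models that assumed rule with a hypothetical helper, filter_operation_ids; kiara's actual matching logic may be more involved, so use it only to reason about which operations a token will select.

```python
def filter_operation_ids(ids, token):
    """Keep the ids that contain `token` as a substring (an assumed matching rule)."""
    return [op_id for op_id in ids if token in op_id]


# Operation ids as they appear in the listings of this tutorial.
ids = [
    "my_kiara_module.example",
    "filter.table",
    "string_filter.tokens",
    "table_filter.drop_columns",
    "table_filter.select_columns",
    "table_filter.select_rows",
]

print(filter_operation_ids(ids, "example"))  # ['my_kiara_module.example']
```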

The my_kiara_module.example module comes with the project template as example code, and is located in the modules/__init__.py Python file. It only serves as an example and blueprint for your own modules, and you can delete its class from that file if you wish.

Pre-loading a table dataset

In this tutorial we'll create a module that filters a table. To do this, we first need to pre-seed our kiara data store with a tabular dataset. Run the following command (with the project root as the working directory):

kiara run --save table=journal_nodes_table import.table.from.local_file_path path=examples/data/journals/JournalNodes1902.csv
╭─ Results ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                                                              │
│   field           data_type   value                                                                                                                                                                                                          │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   imported_file   file        Id,Label,JournalType,City,CountryNetworkTime,PresentDayCountry,Latitude,Longitude,Language                                                                                                                    │
│                               75,Psychiatrische en neurologische bladen,specialized: psychiatry and neurology,Amsterdam,Netherlands,Netherlands,52.366667,4.9,Dutch                                                                          │
│                               36,The American Journal of Insanity,specialized: psychiatry and neurology,Baltimore,United States,United States,39.289444,-76.615278,English                                                                   │
│                               208,The American Journal of Psychology,specialized: psychology,Baltimore,United States,United States,39.289444,-76.615278,English                                                                              │
│                               295,Die Krankenpflege,specialized: therapy,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                                    │
│                               296,Die deutsche Klinik am Eingange des zwanzigsten Jahrhunderts,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                             │
│                               300,Therapeutische Monatshefte,specialized: therapy,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                           │
│                               1,Allgemeine Zeitschrift für Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                                                                                │
│                               7,Archiv für Psychiatrie und Nervenkrankheiten,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                                                                          │
│                               10,Berliner klinische Wochenschrift,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                          │
│                               13,Charité Annalen,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                                           │
│                               21,Monatsschrift für Psychiatrie und Neurologie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                                                                         │
│                               29,Virchows Archiv,"specialized: anatomy, physiology and pathology",Berlin,German Empire,Germany,52.52,13.405,German                                                                                           │
│                               31,Zeitschrift für pädagogische Psychologie und Pathologie,specialized: psychology and pedagogy,Berlin,German Empire,Germany,52.52,13.405,German                                                               │
│                               42,Vierteljahrsschrift für gerichtliche Medizin und öffentliches Sanitätswesen,"specialized: anthropology, criminology and forensics",Berlin,German Empire,Germany,52.52,13.405,German                         │
│                               47,Centralblatt für Nervenheilkunde und Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                                                                     │
│                               50,Russische medicinische Rundschau,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                          │
│                               76,Deutsche Aerzte-Zeitung,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                                   │
│                               87,Monatsschrift für Geburtshülfe und Gynäkologie,specialized: gynecology,Berlin,German Empire,Germany,52.52,13.405,German                                                                                     │
│                               108,Archiv für klinische Chirurgie,specialized: surgery,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                       │
│                               113,Zeitschrift für klinische Medicin,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                        │
│                               159,Deutsche militärärztliche Zeitschrift,specialized: military medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                      │
│                               162,Jahresbericht über die Leistungen und Fortschritte auf dem Gebiete der Neurologie und Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                   │
│                               192,Ärztliche Sachverständigen-Zeitung,general medicine,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                       │
│                               198,Zeitschrift für die Behandlung Schwachsinniger und Epileptischer,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German                                                    │
│                               258,Der Pfarrbote,news media,Berlin,German Empire,Germany,52.52,13.405,German                                                                                                                                  │
│                               71,Correspondenz-Blatt für Schweizer Aerzte,general medicine,Bern,Switzerland,Switzerland,46.948056,7.4475,German                                                                                              │
│                               6,Archiv für mikroskopische Anatomie,"specialized: anatomy, physiology and pathology",Bonn,German Empire,Germany,50.733333,7.1,German                                                                          │
│                               203,The Journal of Abnormal Psychology,specialized: psychology,Boston,United States,United States,42.358056,-71.063611,English                                                                                 │
│                               273,"Correspondenz-Blatt der Deutschen Gesellschaft für Anthropologie, Ethnologie und Urgeschichte","specialized: anthropology, criminology and forensics",Braunschweig,German                                 │
│                               Empire,Germany,52.266667,10.516667,German                                                                                                                                                                      │
│                               303,Policlinique de Bruxelles,general medicine,Brussels,Belgium,Belgium,50.85,4.35,French                                                                                                                      │
│                               306,Annales de la Société Belge de Neurologie,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French                                                                                 │
│                               19,Journal de neurologie,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French                                                                                                      │
│                               25,"Revue internationale d'électrothérapie, de physiologie, de médecine, de chirurgie, d'obstétrique, de thérapeutique, de chimie et de pharmacie",general                                                     │
│                               medicine,Brussels,Belgium,Belgium,50.85,4.35,French                                                                                                                                                            │
│                               35,Bulletin de la Société de Médecine Mentale de Belgique,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French                                                                     │
│                               ...                                                                                                                                                                                                            │
│                                                                                                                                                                                                                                              │
│                               ...                                                                                                                                                                                                            │
│   table           table                                                                                                                                                                                                                      │
│                                 Id    Label                                              JournalType                                         City        CountryNetworkTime        PresentDayCountry   Latitude    Longitude    Language     │
│                                ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                                 75    Psychiatrische en neurologische bladen             specialized: psychiatry and neurology               Amsterdam   Netherlands               Netherlands         52.366667   4.9          Dutch        │
│                                 36    The American Journal of Insanity                   specialized: psychiatry and neurology               Baltimore   United States             United States       39.289444   -76.615278   English      │
│                                 208   The American Journal of Psychology                 specialized: psychology                             Baltimore   United States             United States       39.289444   -76.615278   English      │
│                                 295   Die Krankenpflege                                  specialized: therapy                                Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 296   Die deutsche Klinik am Eingange des zwanzigsten    general medicine                                    Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 300   Therapeutische Monatshefte                         specialized: therapy                                Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 1     Allgemeine Zeitschrift für Psychiatrie             specialized: psychiatry and neurology               Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 7     Archiv für Psychiatrie und Nervenkrankheiten       specialized: psychiatry and neurology               Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 10    Berliner klinische Wochenschrift                   general medicine                                    Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 13    Charité Annalen                                    general medicine                                    Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 21    Monatsschrift für Psychiatrie und Neurologie       specialized: psychiatry and neurology               Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 29    Virchows Archiv                                    specialized: anatomy, physiology and pathology      Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 31    Zeitschrift für pädagogische Psychologie und Pat   specialized: psychology and pedagogy                Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 42    Vierteljahrsschrift für gerichtliche Medizin und   specialized: anthropology, criminology and forens   Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 47    Centralblatt für Nervenheilkunde und Psychiatrie   specialized: psychiatry and neurology               Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 50    Russische medicinische Rundschau                   general medicine                                    Berlin      German Empire             Germany             52.52       13.405       German       │
│                                 ...   ...                                                ...                                                 ...         ...                       ...                 ...         ...          ...          │
│                                 ...   ...                                                ...                                                 ...         ...                       ...                 ...         ...          ...          │
│                                 277   L'arte medica                                      general medicine                                    Turin       Italy                     Italy               45.079167   7.676111     Italian      │
│                                 288   Allgemeine österreichische Gerichts-Zeitung        specialized: anthropology, criminology and forens   Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 18    Jahrbücher für Psychiatrie                         specialized: psychiatry and neurology               Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 30    Wiener klinische Rundschau                         general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 44    Wiener klinische Wochenschrift                     general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 45    Wiener medizinische Wochenschrift                  general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 72    Wiener medizinische Presse                         general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 81    Monatsschrift für Gesundheitspflege                general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 93    Klinisch-therapeutische Wochenschrift              general medicine                                    Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 151   Medicinisch-chirurgisches Centralblatt             specialized: surgery                                Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 199   Der Militärazt                                     specialized: military medicine                      Vienna      Austro-Hungarian Empire   Austria             48.2        16.366667    German       │
│                                 261   Медицинская беседа                                 general medicine                                    Voronezh    Russian Empire            Russia              51.671667   39.210556    Russian      │
│                                 77    Medycyna                                           general medicine                                    Warsaw      Russian Empire            Poland              52.233333   21.016667    Polish       │
│                                 150   Kronika Lekarska                                   general medicine                                    Warsaw      Russian Empire            Poland              52.233333   21.016667    Polish       │
│                                 86    Grenzfragen des Nerven- und Seelenlebens           specialized: psychiatry and neurology               Wiesbaden   German Empire             Germany             50.0825     8.24         German       │
│                                 206   Ergebnisse der Allgemeinen Pathologie und Pathol   specialized: anatomy, physiology and pathology      Wiesbaden   German Empire             Germany             50.0825     8.24         German       │
│                                                                                                                                                                                                                                              │
│                                                                                                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭─ Stored result values ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                                                              │
│   field           data type   stored id                              alias(es)                                                                                                                                                               │
│  ────────────────────────────────────────────────────────────────────────────────────────                                                                                                                                                    │
│   imported_file   file        64dbc562-b5ed-4d09-89aa-d8d7d41bd3b3                                                                                                                                                                           │
│   table           table       f4bda52f-5dc1-4441-adfd-109dbdf357d0   journal_nodes_table                                                                                                                                                     │
│                                                                                                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

This should have created an item with alias journal_nodes_table in the kiara data store, which you can confirm with kiara data list.
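Before writing a filter, it helps to know which columns and values the dataset contains. The sketch below reads the same CSV file with Python's standard csv module, independently of kiara, and counts the values in one column (here City). The helper name summarize_csv is hypothetical, and the UTF-8 encoding is an assumption based on the non-ASCII titles in the file.

```python
import csv
import os
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_csv(lines: Iterable[str], column: str) -> Tuple[List[str], int, Counter]:
    """Return (header, number of data rows, value counts for `column`)."""
    reader = csv.DictReader(lines)  # handles quoted fields that contain commas
    counts = Counter(row[column] for row in reader)
    return list(reader.fieldnames or []), sum(counts.values()), counts


if __name__ == "__main__":
    # Path from the import command above, relative to the project root.
    path = "examples/data/journals/JournalNodes1902.csv"
    if os.path.exists(path):
        # ASSUMPTION: the file is UTF-8 encoded.
        with open(path, newline="", encoding="utf-8") as f:
            header, n_rows, by_city = summarize_csv(f, "City")
        print(header)
        print(n_rows, "rows")
        print(by_city.most_common(3))
    else:
        print(f"{path} not found; run this from the project root")
```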

Writing the kiara module

Ok, let's get started and create a kiara module that filters a table, using different filter criteria.

Module skeleton

In most cases you'd delete the example module mentioned above and create your module either in its place or in a new Python file in the "modules" folder. For this tutorial, we'll leave the example module in place, because it serves as a quick reference for our own module. Using the editor of your choice, paste the following code below the existing code in modules/__init__.py:

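from kiara import KiaraModule

class TutorialModule(KiaraModule):

    def create_inputs_schema(self):
        # one input field, 'table_input', which expects a value of type 'table'
        return {
            "table_input": {
                "type": "table"
            }
        }

    def create_outputs_schema(self):
        # one output field, 'table_output', which will hold a value of type 'table'
        return {
            "table_output": {
                "type": "table"
            }
        }

    def process(self, inputs, outputs) -> None:
        # no processing logic yet; the filtering will be implemented here
        pass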

This module skeleton describes a kiara module that takes a dataset of type table as input (with the input field name table_input) and produces another table dataset as output (with the output field name table_output). In your own modules you'd probably use the field name table for both input and output, but in this tutorial we use the longer names to avoid any confusion.
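Conceptually, the finished module will map table_input to a table_output that contains only some of the input rows. The plain-Python sketch below illustrates that contract, with rows represented as dictionaries (a stand-in: kiara's table type is not a list of dicts) and a few rows taken from the journals dataset. It does not use any kiara API, and filter_rows is a hypothetical name.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def filter_rows(table_input: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Return a new 'table' containing only the rows for which `predicate` is True."""
    return [row for row in table_input if predicate(row)]


rows = [
    {"Id": "75", "City": "Amsterdam", "Language": "Dutch"},
    {"Id": "295", "City": "Berlin", "Language": "German"},
    {"Id": "36", "City": "Baltimore", "Language": "English"},
]

table_output = filter_rows(rows, lambda row: row["Language"] == "German")
print(table_output)  # [{'Id': '295', 'City': 'Berlin', 'Language': 'German'}]
```

Note that the input is left untouched and a new list is returned, which matches the idea of producing a separate output value rather than modifying the input.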

On the next kiara run, the new module should be picked up by the operation list command:

kiara operation list tutorial_module
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   Id                                                             Type(s)   Description                                                   │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   kiara_plugin.my_kiara_module.my_kiara_module.tutorial_module             -- n/a --                                                     │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The id of the module was autogenerated from the full Python path of its class: kiara_plugin.my_kiara_module.my_kiara_module.tutorial_module.
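The last part of that id, tutorial_module, is the class name TutorialModule converted to snake_case. The sketch below shows one common way to do that conversion; it only illustrates the naming pattern and is not kiara's own code (the package-derived prefix of the id is not reproduced).

```python
import re


def to_snake_case(class_name: str) -> str:
    """Convert a CamelCase class name to snake_case."""
    # Step 1: split before a capitalized word, e.g. 'CSVTable' -> 'CSV_Table'.
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Step 2: split a lowercase letter or digit from a following capital,
    # e.g. 'getHTTP' -> 'get_HTTP'.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


print(to_snake_case("TutorialModule"))  # tutorial_module
```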

Module id and description

In most cases, we don't want such a long and unwieldy id. We can assign a custom, meaningful id by setting the _module_type_name class attribute. We'll also want to add some documentation about the module and its functionality for kiara to display to the user; for this, we use a regular Python docstring on the class. For this tutorial we'll only add a single sentence, but in most cases you'll want a multi-paragraph markdown text here. Taking all that into account, edit the module code to include:

...
...
class TutorialModule(KiaraModule):
    """Filter a table."""

    _module_type_name = "filter.table"

    def create_inputs_schema(self):
        return {
...
...
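Both the id and the documentation live on the class itself, so a framework can read them through introspection. The sketch below is a hypothetical illustration of that idea, not kiara's implementation: BaseModule stands in for KiaraModule, describe is an invented helper, and the fallback to a snake_case class name and the "-- n/a --" placeholder only mimic what the operation listings in this tutorial show.

```python
import inspect
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase name to snake_case (handles acronyms like 'CSVTable')."""
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s).lower()


class BaseModule:
    """Hypothetical stand-in for KiaraModule, so this sketch runs without kiara."""


class TutorialModule(BaseModule):
    """Filter a table."""

    _module_type_name = "filter.table"


class Undocumented(BaseModule):
    pass


def describe(cls):
    """Return (id, documentation) the way a framework *might* derive them."""
    # Only look at the class itself, so a value set on a base class is not inherited.
    type_name = vars(cls).get("_module_type_name") or to_snake_case(cls.__name__)
    # A class's __doc__ is None when it has no docstring of its own.
    doc = inspect.cleandoc(cls.__doc__) if cls.__doc__ else "-- n/a --"
    return type_name, doc


print(describe(TutorialModule))  # ('filter.table', 'Filter a table.')
print(describe(Undocumented))    # ('undocumented', '-- n/a --')
```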

The output for our new module in the operation list is much prettier now:

kiara operation list filter
╭─ Filtered operations ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   Id                            Type(s)   Description                                                                                    │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   filter.table                            Filter a table.                                                                                │
│   string_filter.tokens          filter    -- n/a --                                                                                      │
│   table_filter.drop_columns     filter    -- n/a --                                                                                      │
│   table_filter.select_columns   filter    -- n/a --                                                                                      │
│   table_filter.select_rows      filter    -- n/a --                                                                                      │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

We can also ask kiara what it knows about the operation itself:

kiara operation explain filter.table
╭─ Operation: filter.table ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   Documentation   Filter a table.                                                                                                        │
│                                                                                                                                          │
│   Inputs                                                                                                                                 │
│                     field name    type    description                                                    Required   Default              │
│                    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                     table_input   table   -- n/a --                                                      yes        -- no default --     │
│                                                                                                                                          │
│                                                                                                                                          │
│   Outputs                                                                                                                                │
│                     field name     type    description                                                                                   │
│                    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                     table_output   table   -- n/a --                                                                                     │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Input/output field documentation

As you can see in the explain output above, the information presented to the user is still sparse: none of the input or output fields has a description ('-- n/a --'). In most cases, we'll want to tell the user something about the input(s) they are supposed to provide, and about what the outputs actually mean. For both, we can add a doc key to the schema of each input and output field:

    ...
    ...
    def create_inputs_schema(self):
        return {
            "table_input": {
                "type": "table",
                "doc": "The table to filter."
            }
        }

    def create_outputs_schema(self):
        return {
            "table_output": {
                "type": "table",
                "doc": "The filtered table."
            }
        }
    ...
    ...
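As a module grows, it's easy to forget a doc entry. Below is a minimal sketch of a hypothetical helper, undocumented_fields (it is not part of kiara), that walks a schema dict of the shape used above and lists the fields that still lack a description. You could call something like it from your own unit tests.

```python
from typing import Any, Dict, List


def undocumented_fields(schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of all fields in a schema dict without a non-empty 'doc' entry.

    Hypothetical helper for illustration only; kiara does not provide it.
    """
    return [
        field_name
        for field_name, field_schema in schema.items()
        if not field_schema.get("doc")
    ]


inputs_schema = {
    "table_input": {"type": "table", "doc": "The table to filter."},
}
outputs_schema = {
    "table_output": {"type": "table"},  # 'doc' forgotten here on purpose
}

print(undocumented_fields(inputs_schema))   # []
print(undocumented_fields(outputs_schema))  # ['table_output']
```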

Run the explain command again to check what kiara thinks of our module now:

kiara operation explain filter.table
╭─ Operation: filter.table ──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   Documentation   Filter a table.                                                                                                        │
│                                                                                                                                          │
│   Inputs                                                                                                                                 │
│                     field name    type    description                                                    Required   Default              │
│                    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                     table_input   table   The table to filter.                                           yes        -- no default --     │
│                                                                                                                                          │
│                                                                                                                                          │
│   Outputs                                                                                                                                │
│                     field name     type    description                                                                                   │
│                    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                     table_output   table   The filtered table.                                                                           │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Processing the inputs

Specifying the inputs (and outputs) is an important part of designing your module: together they are basically the module's 'public API', and you want to avoid changing them (too much, or at all) as your module evolves over time. But of course, the actual processing is where the interesting stuff happens. In kiara, that happens in the process method of every module. The arguments to this method are called inputs and outputs; both are basically dicts that use the field names specified in create_inputs_schema / create_outputs_schema as keys, and Python objects of class Value as values.

One thing to understand is that a Value object is not the same as the actual data. Instead, it's a reference to it (a means to retrieve it), and it also contains metadata about its provenance (pedigree/lineage) and other properties.
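To make the distinction between a value reference and its data more concrete, here is a deliberately simplified sketch of the idea. This is not kiara's Value class: LazyValue, its fields, and the loader callable are made up for illustration. The wrapper carries an id, a type name, some metadata and a way to load the data, and only loads the data the first time .data is accessed.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LazyValue:
    """Toy stand-in for a value reference: metadata plus a way to retrieve the data."""

    data_type: str
    loader: Callable[[], Any]  # how to retrieve the actual data
    metadata: Dict[str, Any] = field(default_factory=dict)  # e.g. provenance info
    value_id: uuid.UUID = field(default_factory=uuid.uuid4)
    _cache: Any = field(default=None, init=False, repr=False)
    _loaded: bool = field(default=False, init=False, repr=False)

    @property
    def data(self) -> Any:
        # Load the data only once, on first access, then reuse it.
        if not self._loaded:
            self._cache = self.loader()
            self._loaded = True
        return self._cache


load_count = 0


def load_numbers():
    global load_count
    load_count += 1
    return [1, 2, 3]


value = LazyValue(data_type="list", loader=load_numbers, metadata={"created_by": "example"})
print(value)       # the repr shows the reference and its metadata, not the data
print(load_count)  # 0: nothing has been loaded yet
print(value.data)  # [1, 2, 3]
print(value.data)  # [1, 2, 3], served from the cache
print(load_count)  # 1: the loader ran only once
```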

This is the signature of the process method, including type hints (which we will omit after this):

from kiara.models.values.value import ValueMap, ValueMapWritable

    def process(self, inputs: ValueMap, outputs: ValueMapWritable) -> None:
        ...
        ...

The inputs argument to the process method is of type ValueMap (outputs is a ValueMapWritable, as the signature above shows); the two main methods to access input data are:

  • inputs.get_value_obj(field_name): retrieve the (wrapper) Value object for a field
  • inputs.get_value_data(field_name): retrieve the data object for a field

In addition, you can retrieve the data object via the value wrapper:

value = inputs.get_value_obj("field_name")
data = value.data

The class/type of the data depends on the data type of the value, so you'll have to consult the documentation of that data type to know what to expect.

The important methods for setting outputs are:

  • outputs.set_value(field_name, result_data): set a single output field
  • outputs.set_values(field_name_1=result_data_1, field_name_2=result_data_2, ...): set multiple result values at once
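Because process only talks to its arguments through these few methods, one way to try out a module's logic without running kiara is to pass in small stand-in objects. The sketch below does exactly that, under a clear assumption: FakeValue and FakeValueMap are hypothetical test doubles that mimic only the methods listed above (plus a .data attribute on values). They are not part of kiara, and the real classes do considerably more.

```python
from typing import Any, Dict


class FakeValue:
    """Hypothetical stand-in for a kiara Value: it only wraps the data."""

    def __init__(self, data: Any) -> None:
        self.data = data


class FakeValueMap:
    """Hypothetical stand-in for ValueMap / ValueMapWritable, for local experiments only."""

    def __init__(self, **data: Any) -> None:
        self._values: Dict[str, FakeValue] = {k: FakeValue(v) for k, v in data.items()}

    def get_value_obj(self, field_name: str) -> FakeValue:
        return self._values[field_name]

    def get_value_data(self, field_name: str) -> Any:
        return self._values[field_name].data

    def set_value(self, field_name: str, data: Any) -> None:
        # Accept either raw data or an already wrapped value.
        self._values[field_name] = data if isinstance(data, FakeValue) else FakeValue(data)

    def set_values(self, **values: Any) -> None:
        for field_name, data in values.items():
            self.set_value(field_name, data)


def process(inputs, outputs) -> None:
    # A process function that simply passes its input table through,
    # written as a plain function so it runs without a module class.
    table_obj = inputs.get_value_obj("table_input")
    outputs.set_value("table_output", table_obj)


inputs = FakeValueMap(table_input={"City": ["Berlin", "Vienna"]})
outputs = FakeValueMap()
process(inputs, outputs)
print(outputs.get_value_data("table_output"))  # {'City': ['Berlin', 'Vienna']}
```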

With all that out of the way, let's start implementing our table filter. We'll do it in stages, so that we can cover all the important aspects in a way that (hopefully) makes intuitive sense.

To that end, let's first write some code that does... nothing. The first iteration of our module takes the input table and immediately sets it as the output:

def process(self, inputs, outputs):

    table_obj = inputs.get_value_obj("table_input")

    # some debug output is usually useful while developing, something like:
    print(f"Filter module, table input value: {table_obj}")
    print(f"Table data instance: {table_obj.data}")

    # pass the input value through unchanged
    outputs.set_value("table_output", table_obj)

If we run our module in this state, we should see our debug output, as well as the resulting table (which will be the unmodified input):

kiara run filter.table table_input=alias:journal_nodes_table
Filter module, table input value: Value(id=f4bda52f-5dc1-4441-adfd-109dbdf357d0, type=table, status=set, initialized=True optional=False)
Table data instance: KiaraTable(model_id=-- n/a --, category=kiara_table, fields=[data_path])

╭─ Result ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   field          data_type   value                                                                                                       │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   table_output   table                                                                                                                   │
│                                Id    Label        JournalTyp   City        CountryNe   PresentDay   Latitude    Longitude   Language     │
│                               ───────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                                75    Psychiatri   specialize   Amsterdam   Netherlan   Netherland   52.366667   4.9         Dutch        │
│                                36    The Americ   specialize   Baltimore   United St   United Sta   39.289444   -76.61527   English      │
│                                208   The Americ   specialize   Baltimore   United St   United Sta   39.289444   -76.61527   English      │
│                                295   Die Kranke   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                296   Die deutsc   general me   Berlin      German Em   Germany      52.52       13.405      German       │
│                                300   Therapeuti   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                1     Allgemeine   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                7     Archiv für   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                10    Berliner k   general me   Berlin      German Em   Germany      52.52       13.405      German       │
│                                13    Charité An   general me   Berlin      German Em   Germany      52.52       13.405      German       │
│                                21    Monatsschr   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                29    Virchows A   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                31    Zeitschrif   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                42    Vierteljah   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                47    Centralbla   specialize   Berlin      German Em   Germany      52.52       13.405      German       │
│                                50    Russische    general me   Berlin      German Em   Germany      52.52       13.405      German       │
│                                ...   ...          ...          ...         ...         ...          ...         ...         ...          │
│                                ...   ...          ...          ...         ...         ...          ...         ...         ...          │
│                                277   L'arte med   general me   Turin       Italy       Italy        45.079167   7.676111    Italian      │
│                                288   Allgemeine   specialize   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                18    Jahrbücher   specialize   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                30    Wiener kli   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                44    Wiener kli   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                45    Wiener med   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                72    Wiener med   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                81    Monatsschr   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                93    Klinisch-t   general me   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                151   Medicinisc   specialize   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                199   Der Militä   specialize   Vienna      Austro-Hu   Austria      48.2        16.366667   German       │
│                                261   Медицинска   general me   Voronezh    Russian E   Russia       51.671667   39.210556   Russian      │
│                                77    Medycyna     general me   Warsaw      Russian E   Poland       52.233333   21.016667   Polish       │
│                                150   Kronika Le   general me   Warsaw      Russian E   Poland       52.233333   21.016667   Polish       │
│                                86    Grenzfrage   specialize   Wiesbaden   German Em   Germany      50.0825     8.24        German       │
│                                206   Ergebnisse   specialize   Wiesbaden   German Em   Germany      50.0825     8.24        German       │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Now it's time to drill a bit deeper into our input table and figure out how to access the information it contains. kiara wraps data that shares a common schema/structure into so-called 'data types'. You can list the data types available in your current kiara environment with the data-type list sub-command:

kiara data-type list
╭─ Available data types ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   type name      type lineage   (qualifier) profiles   description                                                                       │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   any            -- n/a --      -- n/a --              'Any' type, the parent type for most other types.                                 │
│                                                                                                                                          │
│   array          -- n/a --      -- n/a --              An array, in most cases used as a column within a table.                          │
│                                                                                                                                          │
│   boolean        -- n/a --      -- n/a --              A boolean.                                                                        │
│                                                                                                                                          │
│   bytes          -- n/a --      -- n/a --              An array of bytes.                                                                │
│                                                                                                                                          │
│   database       -- n/a --      -- n/a --              A database, containing one or several tables.                                     │
│                                                                                                                                          │
│   date           -- n/a --      -- n/a --              A date.                                                                           │
│                                                                                                                                          │
│   dict           -- n/a --      -- n/a --              A dictionary.                                                                     │
│                                                                                                                                          │
│   file           -- n/a --      -- n/a --              A file.                                                                           │
│                                                                                                                                          │
│   file_bundle    -- n/a --      -- n/a --              A bundle of files (like a folder, zip archive, etc.).                             │
│                                                                                                                                          │
│   float          -- n/a --      -- n/a --              A float.                                                                          │
│                                                                                                                                          │
│   integer        -- n/a --      -- n/a --              An integer.                                                                       │
│                                                                                                                                          │
│   list           -- n/a --      -- n/a --              A list.                                                                           │
│                                                                                                                                          │
│   network_data   -- n/a --      -- n/a --              Data that can be assembled into a graph.                                          │
│                                                                                                                                          │
│   string         -- n/a --      -- n/a --              A string.                                                                         │
│                                                                                                                                          │
│   table          -- n/a --      -- n/a --              Tabular data (table, spreadsheet, data_frame, what have you).                     │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

To find out more about a specific data type, you can use data-type explain:

kiara data-type explain table
╭─ Data type: table ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   type_name     table                                                                                                                    │
│   type_config   {}                                                                                                                       │
│                                                                                                                                          │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│                                                                                                                                          │
│  lineage              table                                                                                                              │
│                       any                                                                                                                │
│  qualifier profile(s) -- n/a --                                                                                                          │
│  Documentation                                                                                                                           │
│                         Tabular data (table, spreadsheet, data_frame, what have you).                                                    │
│                                                                                                                                          │
│                         The table data is organized in sets of columns (arrays of data of the same type), with each column having a      │
│                         string identifier.                                                                                               │
│                                                                                                                                          │
│                         kiara uses an instance of the [KiaraTable][kiara_plugin.tabular.models.table.KiaraTable] class to manage the     │
│                         table data, which let's developers access it in different formats (Apache Arrow Table, Pandas dataframe,         │
│                         Python dict of lists, more to follow...).                                                                        │
│                                                                                                                                          │
│                         Please consult the API doc of the KiaraTable class for more information about how to access and query the        │
│                         data:                                                                                                            │
│                                                                                                                                          │
│                          • KiaraTable API doc                                                                                            │
│                                                                                                                                          │
│                         Internally, the data is stored in Apache Feather format -- both in memory and on disk when saved, which          │
│                         enables some advanced usage to preserve memory and compute overhead.                                             │
│                                                                                                                                          │
│  Author(s)                                                                                                                               │
│                         Markus Binsteiner   markus@frkl.io                                                                               │
│                                                                                                                                          │
│  Context                                                                                                                                 │
│                         Tags         tabular                                                                                             │
│                         Labels       package: kiara_plugin.tabular                                                                       │
│                         References   source_repo: https://github.com/DHARPA-Project/kiara_plugin.tabular                                 │
│                                      documentation: https://DHARPA-Project.github.io/kiara_plugin.tabular/                               │
│                                                                                                                                          │
│  Python class                                                                                                                            │
│                         python_class_name    TableType                                                                                   │
│                         python_module_name   kiara_plugin.tabular.data_types.table                                                       │
│                         full_name            kiara_plugin.tabular.data_types.table.TableType                                             │
│                                                                                                                                          │
│  Config class                                                                                                                            │
│                         python_class_name    DataTypeConfig                                                                              │
│                         python_module_name   kiara.data_types                                                                            │
│                         full_name            kiara.data_types.DataTypeConfig                                                             │
│                                                                                                                                          │
│  Value class                                                                                                                             │
│                         python_class_name    KiaraTable                                                                                  │
│                         python_module_name   kiara_plugin.tabular.models.table                                                           │
│                         full_name            kiara_plugin.tabular.models.table.KiaraTable                                                │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Reading this, and following some of the included links, shows us that we can retrieve the table data as a Pandas dataframe using the to_pandas() method. As the documentation states, this loads all of the data into memory, which is something we should try to avoid; but in a lot of cases (esp. when dealing with data smaller than a few hundred megabytes) it's a perfectly acceptable approach. So let's do this and use our existing knowledge of Pandas: we retrieve the list of column names from the table the user provided, and print it out debug-style:

def process(self, inputs, outputs) -> None:

    table_obj = inputs.get_value_obj("table_input")

    print(f"Filter module, table input value: {table_obj}")
    print(f"Table data instance: {table_obj.data}")

    pandas_df = table_obj.data.to_pandas()
    print(f"Column names: {pandas_df.columns}")

    outputs.set_value("table_output", table_obj)

Again, let's run it and see what's what (this time suppressing the result output we don't need right now, using --output silent):

kiara run --output silent filter.table table_input=alias:journal_nodes_table
Filter module, table input value: Value(id=f4bda52f-5dc1-4441-adfd-109dbdf357d0, type=table, status=set, initialized=True optional=False)
Table data instance: KiaraTable(model_id=-- n/a --, category=kiara_table, fields=[data_path])
Column names: Index(['Id', 'Label', 'JournalType', 'City', 'CountryNetworkTime',
       'PresentDayCountry', 'Latitude', 'Longitude', 'Language'],
      dtype='object')

OK, now we filter. For a start, let's say our module only accepts tables that contain a 'City' column, and returns all rows that have 'Berlin' as the value in that column:

def process(self, inputs, outputs) -> None:

    from kiara.exceptions import KiaraProcessingException

    table_obj = inputs.get_value_obj("table_input")
    pandas_df = table_obj.data.to_pandas()

    column_names = pandas_df.columns
    if "City" not in column_names:
        raise KiaraProcessingException("Invalid table, does not contain a column named 'City'.")

    berlin_df = pandas_df.loc[pandas_df['City'] == "Berlin"]
    outputs.set_value("table_output", berlin_df)
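The filtering itself is plain Pandas, so it can be tried out on its own before wiring it into the module. Below is a minimal standalone sketch, assuming only pandas and a small made-up dataframe (the rows are invented, and the function name filter_berlin is hypothetical). It raises a plain ValueError in place of KiaraProcessingException so it runs without kiara. It also calls reset_index(drop=True): boolean filtering keeps the original row labels, and when such a dataframe is converted to Apache Arrow, a non-default index is stored as an extra column. That is likely where the __index_ column in the result below comes from.

```python
import pandas as pd


def filter_berlin(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose 'City' column equals 'Berlin'."""
    if "City" not in df.columns:
        # ValueError stands in for kiara's KiaraProcessingException here.
        raise ValueError("Invalid table, does not contain a column named 'City'.")
    berlin_df = df.loc[df["City"] == "Berlin"]
    # Replace the original row labels with a fresh 0..n-1 index,
    # so they don't end up as an extra column after conversion.
    return berlin_df.reset_index(drop=True)


example_df = pd.DataFrame(
    {
        "Id": [75, 295, 296, 277],
        "City": ["Amsterdam", "Berlin", "Berlin", "Turin"],
    }
)
result = filter_berlin(example_df)
print(result["Id"].tolist())  # [295, 296]
print(list(result.index))     # [0, 1]
```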

And again, we run our module using our example dataset, and now we actually get something that is filtered:

kiara run filter.table table_input=alias:journal_nodes_table
╭─ Result ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   field          data_type   value                                                                                                       │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   table_output   table                                                                                                                   │
│                                Id    Label      JournalT   City     CountryN   PresentD   Latitude   Longitude   Language   __index_     │
│                               ───────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                                295   Die Kran   speciali   Berlin   German E   Germany    52.52      13.405      German     3            │
│                                296   Die deut   general    Berlin   German E   Germany    52.52      13.405      German     4            │
│                                300   Therapeu   speciali   Berlin   German E   Germany    52.52      13.405      German     5            │
│                                1     Allgemei   speciali   Berlin   German E   Germany    52.52      13.405      German     6            │
│                                7     Archiv f   speciali   Berlin   German E   Germany    52.52      13.405      German     7            │
│                                10    Berliner   general    Berlin   German E   Germany    52.52      13.405      German     8            │
│                                13    Charité    general    Berlin   German E   Germany    52.52      13.405      German     9            │
│                                21    Monatssc   speciali   Berlin   German E   Germany    52.52      13.405      German     10           │
│                                29    Virchows   speciali   Berlin   German E   Germany    52.52      13.405      German     11           │
│                                31    Zeitschr   speciali   Berlin   German E   Germany    52.52      13.405      German     12           │
│                                42    Viertelj   speciali   Berlin   German E   Germany    52.52      13.405      German     13           │
│                                47    Centralb   speciali   Berlin   German E   Germany    52.52      13.405      German     14           │
│                                50    Russisch   general    Berlin   German E   Germany    52.52      13.405      German     15           │
│                                76    Deutsche   general    Berlin   German E   Germany    52.52      13.405      German     16           │
│                                87    Monatssc   speciali   Berlin   German E   Germany    52.52      13.405      German     17           │
│                                108   Archiv f   speciali   Berlin   German E   Germany    52.52      13.405      German     18           │
│                                113   Zeitschr   general    Berlin   German E   Germany    52.52      13.405      German     19           │
│                                159   Deutsche   speciali   Berlin   German E   Germany    52.52      13.405      German     20           │
│                                162   Jahresbe   speciali   Berlin   German E   Germany    52.52      13.405      German     21           │
│                                192   Ärztlich   general    Berlin   German E   Germany    52.52      13.405      German     22           │
│                                198   Zeitschr   speciali   Berlin   German E   Germany    52.52      13.405      German     23           │
│                                258   Der Pfar   news med   Berlin   German E   Germany    52.52      13.405      German     24           │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Of course, a module like this is of very limited value: the tables it accepts as inputs must contain a column named 'City', and it can only filter for a single hardcoded string. Ideally, we'd want the user to provide both the column name and the filter string, along with the table to filter. Let's add those module inputs, and adjust the processing method accordingly:

    def create_inputs_schema(self):
        return {
            "table_input": {
                "type": "table",
                "doc": "The table to filter."
            },
            # 'column_name' has a default value, so users can omit it
            "column_name": {
                "type": "string",
                "doc": "The column containing the element to use as filter.",
                "default": "City"
            },
            # no default here, so kiara treats this input as required
            "filter_string": {
                "type": "string",
                "doc": "The string to use as filter."
            }
        }

    def process(self, inputs, outputs) -> None:

        from kiara.exceptions import KiaraProcessingException

        # the table is retrieved as a value object, the two strings as plain Python data
        table_obj = inputs.get_value_obj("table_input")
        column_name = inputs.get_value_data("column_name")
        filter_string = inputs.get_value_data("filter_string")

        pandas_df = table_obj.data.to_pandas()

        # fail early, with a helpful message, if the requested column doesn't exist
        column_names = pandas_df.columns
        if column_name not in column_names:
            available = ", ".join(str(c) for c in column_names)
            raise KiaraProcessingException(f"Invalid table, does not contain a column named '{column_name}'. Available column names: {available}.")

        # keep only the rows whose value in the chosen column equals the filter string
        filtered_df = pandas_df.loc[pandas_df[column_name] == filter_string]
        outputs.set_value("table_output", filtered_df)
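Since the filtering step itself is plain pandas, one way to sanity-check it is to pull it out of the module and run it on a small DataFrame, outside kiara. The following is a minimal sketch, assuming pandas is available in your environment; the function name filter_table is hypothetical, a ValueError stands in for KiaraProcessingException, and the three rows are made up rather than taken from the tutorial data.

```python
import pandas as pd


def filter_table(df: pd.DataFrame, column_name: str, filter_string: str) -> pd.DataFrame:
    """Hypothetical helper: keep only rows where `column_name` equals `filter_string`.

    Mirrors the logic of the module's process method, without the kiara plumbing.
    """
    if column_name not in df.columns:
        available = ", ".join(str(c) for c in df.columns)
        # the kiara module raises KiaraProcessingException here; ValueError stands in for it
        raise ValueError(
            f"Invalid table, does not contain a column named '{column_name}'. "
            f"Available column names: {available}."
        )
    # boolean mask: True for every row whose value in the column matches exactly
    return df.loc[df[column_name] == filter_string]


# a tiny, made-up stand-in for the journal nodes table
journals = pd.DataFrame(
    {
        "Id": [1, 2, 3],
        "Label": ["Journal A", "Journal B", "Journal C"],
        "City": ["Berlin", "Amsterdam", "Berlin"],
    }
)

print(filter_table(journals, "City", "Amsterdam"))
```

Keeping the core logic in a plain function like this makes it easy to unit-test without running kiara at all; structuring a real module that way is a design choice, not something kiara requires.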

In create_inputs_schema, I've given the column_name input a default value ('City'). For this particular module that doesn't make a whole lot of sense, but it shows how to set defaults for input fields, which in many cases is useful. Let's run the operation without the filter_string argument, to see what the kiara command-line interface has to say about that:

kiara run filter.table table_input=alias:journal_nodes_table
╭─ Run info: filter.table ─────────────────────────────────────────────────────╮
│                                                                              │
│ Can't run operation: invalid or insufficient input(s)                        │
│                                                                              │
│ ──────────────────────────────────────────────────────────────────────────── │
│                                                                              │
│ Operation: filter.table                                                      │
│                                                                              │
│ Filter a table.                                                              │
│                                                                              │
│ Inputs:                                                                      │
│                                                                              │
│   field name      status    type     description        required   default   │
│  ──────────────────────────────────────────────────────────────────────────  │
│   column_name     valid     string   The column         no         City      │
│                                      containing the                          │
│                                      element to use                          │
│                                      as filter.                              │
│   filter_string   not set   string   The string to      yes                  │
│                                      use as filter.                          │
│   table_input     valid     table    The table to       yes                  │
│                                      filter.                                 │
│                                                                              │
│                                                                              │
│ Outputs:                                                                     │
│                                                                              │
│   field name     type    description                                         │
│  ──────────────────────────────────────────────────────────────────────────  │
│   table_output   table   The filtered table.                                 │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯

As you can see, kiara complains about the missing filter_string input, but uses 'City' as the default for the column_name input, and is therefore fine with the user not providing it. Ok, one more time, let's look for 'Amsterdam':

kiara run filter.table table_input=alias:journal_nodes_table filter_string=Amsterdam
╭─ Result ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                          │
│   field          data_type   value                                                                                                       │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   table_output   table                                                                                                                   │
│                                Id   Label      JournalT   City       CountryN   PresentD   Latitude   Longitud   Language   __index_     │
│                               ───────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                                75   Psychiat   speciali   Amsterda   Netherla   Netherla   52.36666   4.9        Dutch      0            │
│                                                                                                                                          │
│                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
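The input check kiara performed in the failed run above (filling in 'City' for column_name, and refusing to run because filter_string was missing) can be modeled roughly as follows. This is a hypothetical, simplified sketch, not kiara's implementation: resolve_inputs and INPUTS_SCHEMA are illustrative names, and the sketch only treats a field as required when it has no default, whereas kiara's own schema handling does more than this.

```python
from typing import Any

# copy of the module's input schema, as returned by create_inputs_schema
INPUTS_SCHEMA: dict[str, dict[str, Any]] = {
    "table_input": {"type": "table", "doc": "The table to filter."},
    "column_name": {
        "type": "string",
        "doc": "The column containing the element to use as filter.",
        "default": "City",
    },
    "filter_string": {"type": "string", "doc": "The string to use as filter."},
}


def resolve_inputs(
    schema: dict[str, dict[str, Any]], user_inputs: dict[str, Any]
) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical model: fill in defaults and collect missing required fields."""
    values: dict[str, Any] = {}
    missing: list[str] = []
    for field, spec in schema.items():
        if field in user_inputs:
            # user-provided values always win
            values[field] = user_inputs[field]
        elif "default" in spec:
            # no user value, but the schema provides a fallback
            values[field] = spec["default"]
        else:
            # no user value and no default: this input is required
            missing.append(field)
    return values, missing


values, missing = resolve_inputs(INPUTS_SCHEMA, {"table_input": "alias:journal_nodes_table"})
print(values)
print(missing)
```

With only table_input supplied, column_name resolves to 'City' and filter_string ends up in the missing list, which matches the "valid" and "not set" statuses in the run info output.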

This should give you a good basis to work on your own kiara module(s). Stay tuned for part II of this tutorial!