kiara_modules.core.array

ArrayMetadataModule

Extract metadata from an 'array' value.

MapModule

Map a list of values into another list of values.

This module must be configured with the type (and, optionally, the configuration) of another kiara module. This 'child' module is then used to compute the array items of the result.
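As a rough illustration of the mapping behaviour described above, the following sketch applies a child callable to every item of a list. It is plain Python only: `map_items`, `child`, `add`, and the keyword-argument convention are hypothetical stand-ins chosen for this example, not part of the kiara API.

```python
from typing import Any, Callable, Dict, List, Optional


def map_items(
    items: List[Any],
    child: Callable[..., Any],
    input_name: str,
    extra_inputs: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Apply `child` to each item and return a list of the same length."""
    extra = dict(extra_inputs or {})
    results = []
    for item in items:
        # Each item is passed under the configured input name; all other
        # inputs stay constant across calls.
        results.append(child(**{input_name: item}, **extra))
    return results


def add(a: int, b: int) -> int:
    """Hypothetical 'child' computation with two inputs."""
    return a + b


mapped = map_items([1, 2, 3], add, input_name="a", extra_inputs={"b": 10})
print(mapped)
```

The result has one entry per input item, which mirrors the "array of equal length" described for the module's output.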

create_input_schema(self)

Abstract method to be implemented by child classes; returns a description of the input schema of this module.

If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):

{
    "[input_field_name]": {
        "type": "[value_type]",
        "doc*": "[a description of this input]",
        "optional*": [boolean whether this input is optional or required (defaults to 'False')]
    },
    "[other_input_field_name]": {
        "type": ...
        ...
    }
}
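A concrete instance of this format may make it easier to read. The sketch below is an assumption-laden example: the field names (`text`, `lowercase`) and the value types (`string`, `boolean`) are invented for illustration, and `is_optional` is a hypothetical helper that only demonstrates the documented default of `False`.

```python
from typing import Any, Dict, Mapping

# Hypothetical schema; only the keys "type", "doc" and "optional"
# come from the format description.
example_input_schema: Dict[str, Dict[str, Any]] = {
    "text": {
        "type": "string",
        "doc": "The text to process.",
    },
    "lowercase": {
        "type": "boolean",
        "doc": "Whether to lowercase the text first.",
        "optional": True,
    },
}


def is_optional(schema: Mapping[str, Mapping[str, Any]], field: str) -> bool:
    """Return the 'optional' flag of a field, defaulting to False."""
    return bool(schema[field].get("optional", False))


print(is_optional(example_input_schema, "text"))
print(is_optional(example_input_schema, "lowercase"))
```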

Source code in core/array/__init__.py
def create_input_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:

    inputs: typing.Dict[
        str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
    ] = {
        "array": {
            "type": "array",
            "doc": "The array containing the values the filter is applied on.",
        }
    }
    for input_name, schema in self.child_module.input_schemas.items():
        assert input_name != "array"
        if input_name == self.module_input_name:
            continue
        inputs[input_name] = schema
    return inputs
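The merging logic in the listing above can be tried in isolation. The following is a minimal sketch using plain dictionaries instead of kiara's `ValueSchema` objects; `build_map_input_schema` and the sample child schema are hypothetical names introduced for this example.

```python
from typing import Any, Dict, Mapping


def build_map_input_schema(
    child_inputs: Mapping[str, Mapping[str, Any]],
    mapped_input_name: str,
) -> Dict[str, Mapping[str, Any]]:
    """Combine the fixed 'array' input with the child's remaining inputs."""
    inputs: Dict[str, Mapping[str, Any]] = {
        "array": {"type": "array", "doc": "The array to map over."}
    }
    for name, schema in child_inputs.items():
        if name == "array":
            # The name 'array' is reserved for the map module's own input.
            raise ValueError("Child module must not have an input named 'array'.")
        if name == mapped_input_name:
            # This input is fed from the array items, so it is not exposed.
            continue
        inputs[name] = schema
    return inputs


child_schema = {
    "a": {"type": "integer"},
    "b": {"type": "integer"},
}
merged = build_map_input_schema(child_schema, mapped_input_name="a")
print(list(merged))
```

The child input that receives the array items disappears from the combined schema, while every other child input is passed through unchanged.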

create_output_schema(self)

Abstract method to be implemented by child classes; returns a description of the output schema of this module.

If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):

{
    "[output_field_name]": {
        "type": "[value_type]",
        "doc*": "[a description of this output]"
    },
    "[other_output_field_name]": {
        "type": ...
        ...
    }
}

Source code in core/array/__init__.py
def create_output_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:

    outputs = {
        "array": {
            "type": "array",
            "doc": "An array of equal length to the input array, containing the 'mapped' values.",
        }
    }
    return outputs

module_instance_doc(self)

Return documentation for this instance of the module.

If not overridden, this returns the result of the class's doc() method.

Source code in core/array/__init__.py
def module_instance_doc(self) -> str:

    config: MapModuleConfig = self.config  # type: ignore

    module_type = config.module_type
    module_config = config.module_config

    m = self._kiara.create_module(
        module_type=module_type, module_config=module_config
    )
    type_md = m.get_type_metadata()
    doc = type_md.documentation.full_doc
    link = type_md.context.get_url_for_reference("module_doc")
    if not link:
        link_str = f"``{module_type}``"
    else:
        link_str = f"[``{module_type}``]({link})"

    result = f"""Map the values of the input list onto a new list of the same length, using the {link_str} module."""

    if doc and doc != "-- n/a --":
        result = result + f"\n\n``{module_type}`` documentation:\n\n{doc}"
    return result

MapModuleConfig pydantic-model

input_name: str pydantic-field

The name of the child module's input that will receive the items from our input array. Can be omitted if the configured module only has a single input.

module_config: Dict[str, Any] pydantic-field

The config for the kiara module that is applied to each array item.

module_type: str pydantic-field required

The name of the kiara module to apply to each item of the input array.

output_name: str pydantic-field

The name of the child module's output whose value provides the items of the result array. Can be omitted if the configured module only has a single output.
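The "can be omitted" rule for `input_name` and `output_name` could be implemented along the following lines. This is a sketch under the assumption that an omitted name falls back to the child module's only field; `resolve_field_name` is a hypothetical helper, not a kiara function.

```python
from typing import List, Optional


def resolve_field_name(
    configured: Optional[str], available: List[str], kind: str
) -> str:
    """Pick the field to use, inferring it when only one field exists."""
    if configured is not None:
        if configured not in available:
            raise ValueError(f"Unknown {kind} name: {configured}")
        return configured
    if len(available) == 1:
        # Nothing configured, but the choice is unambiguous.
        return available[0]
    raise ValueError(
        f"Module has {len(available)} {kind}s, so the {kind} name must be set."
    )


print(resolve_field_name(None, ["text"], "input"))
print(resolve_field_name("b", ["a", "b"], "input"))
```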

SampleArrayModule

Sample an array.

Samples are used to randomly select a subset of a dataset, which helps test queries and workflows on smaller versions of the original data, to adjust parameters before a full run.
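To make the idea of random sampling concrete, here is a small sketch using only the Python standard library. It does not reflect how kiara samples Arrow arrays internally; `sample_items`, `sample_size`, and `seed` are hypothetical names used for illustration.

```python
import random
from typing import Any, List, Optional


def sample_items(
    items: List[Any], sample_size: int, seed: Optional[int] = None
) -> List[Any]:
    """Return a random subset of `items` without replacement."""
    rng = random.Random(seed)
    # Never request more items than are available.
    k = min(sample_size, len(items))
    # Note: the order of the returned items is random, not the input order.
    return rng.sample(items, k)


data = list(range(100))
subset = sample_items(data, 5, seed=42)
print(len(subset))
```

Passing a fixed seed makes the subset reproducible, which is convenient when tuning parameters on the smaller dataset before a full run.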

get_value_type() classmethod

Return the value type for this sample module.

Source code in core/array/__init__.py
@classmethod
def get_value_type(cls) -> str:
    return "array"

StoreArrayTypeModule

Save an Arrow array to a file.

This module wraps the input array into an Arrow Table, and saves this table as a feather file.

The output of this module is a dictionary representing the configuration to be used with kiara to re-assemble the array object from disk.

store_value(self, value, base_path)

Save the value, and return the load config needed to load it again.

Source code in core/array/__init__.py
def store_value(self, value: Value, base_path: str):

    import pyarrow as pa
    from pyarrow import feather

    array: pa.Array = value.get_value_data()
    # folder = inputs.get_value_data("folder_path")
    # file_name = inputs.get_value_data("file_name")
    # column_name = inputs.get_value_data("column_name")

    path = os.path.join(base_path, ARRAY_SAVE_FILE_NAME)
    if os.path.exists(path):
        raise KiaraProcessingException(
            f"Can't write file, path already exists: {path}"
        )

    os.makedirs(os.path.dirname(path))

    table = pa.Table.from_arrays([array], names=[ARRAY_SAVE_COLUM_NAME])
    feather.write_feather(table, path)

    load_config = {
        "module_type": "array.restore",
        "inputs": {
            "base_path": base_path,
            "rel_path": ARRAY_SAVE_FILE_NAME,
            "format": "feather",
            "column_name": ARRAY_SAVE_COLUM_NAME,
        },
        "output_name": "array",
    }
    return load_config
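The returned load config carries everything needed to find the file again. The sketch below shows one way such a dictionary might be consumed; it is not the implementation of the `array.restore` module. The path, file name, and column name in `example_load_config` are hypothetical placeholders (the real values of `ARRAY_SAVE_FILE_NAME` and `ARRAY_SAVE_COLUM_NAME` are not shown here), and `resolve_load_path` and `restore_array` are invented helper names. Only `pyarrow.feather.read_table` and `Table.column` are real pyarrow calls.

```python
import os
from typing import Any, Dict

# Hypothetical values, shaped like the dictionary returned by store_value().
example_load_config: Dict[str, Any] = {
    "module_type": "array.restore",
    "inputs": {
        "base_path": "/tmp/kiara_example",  # hypothetical directory
        "rel_path": "array.feather",  # hypothetical file name
        "format": "feather",
        "column_name": "array",  # hypothetical column name
    },
    "output_name": "array",
}


def resolve_load_path(load_config: Dict[str, Any]) -> str:
    """Join base path and relative path into the full file path."""
    inputs = load_config["inputs"]
    return os.path.join(inputs["base_path"], inputs["rel_path"])


def restore_array(load_config: Dict[str, Any]):
    """Read the stored column back from disk (requires pyarrow)."""
    # Imported lazily so the path helper works without pyarrow installed.
    from pyarrow import feather

    inputs = load_config["inputs"]
    if inputs["format"] != "feather":
        raise ValueError(f"Unsupported format: {inputs['format']}")
    table = feather.read_table(resolve_load_path(load_config))
    # Table.column() returns a ChunkedArray holding the saved values.
    return table.column(inputs["column_name"])


print(resolve_load_path(example_load_config))
```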