kiara_modules.core.array¶
        ArrayMetadataModule
¶
    Extract metadata from an 'array' value.
        MapModule
¶
    Map a list of values into another list of values.
This module must be configured with the type (and, optionally, the configuration) of another kiara module. This 'child' module is then used to compute the items of the result array.
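The idea can be illustrated in plain Python, without the kiara API. In the sketch below, `map_items`, `add`, and `item_input_name` are hypothetical names: `add` stands in for the child module, and `item_input_name` for the child input that receives each array item.

```python
from typing import Any, Callable, Dict, List


def map_items(
    items: List[Any],
    child: Callable[..., Any],
    item_input_name: str,
    **other_inputs: Any,
) -> List[Any]:
    """Apply `child` to every item; the result has the same length as `items`."""
    results = []
    for item in items:
        # Each item is passed to the child under the configured input name;
        # all other inputs are forwarded unchanged on every call.
        kwargs: Dict[str, Any] = {item_input_name: item, **other_inputs}
        results.append(child(**kwargs))
    return results


def add(a: int, b: int) -> int:
    # Hypothetical 'child' with two inputs, `a` and `b`.
    return a + b


mapped = map_items([1, 2, 3], add, item_input_name="a", b=10)
print(mapped)  # [11, 12, 13]
```

Here the input `a` is fed from the list, while `b` stays constant across all calls.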
create_input_schema(self)
¶
    Abstract method that child classes must implement. Returns a description of the input schema of this module.
If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):
{
      "[input_field_name]": {
          "type": "[value_type]",
          "doc*": "[a description of this input]",
          "optional*": [boolean, whether this input is optional or required (defaults to 'False')]
      },
      "[other_input_field_name]": {
          "type": ...
          ...
      }
}
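A concrete instance of this format might look like the following sketch. The field names, the `integer` value type, and the `is_optional` helper are assumptions made for illustration only; they are not taken from kiara.

```python
from typing import Any, Dict

# A hypothetical schema with one required and one optional input.
input_schema: Dict[str, Dict[str, Any]] = {
    "array": {
        "type": "array",
        "doc": "The array to process.",
    },
    "sample_size": {
        "type": "integer",  # assumed value type name
        "doc": "How many items to keep.",
        "optional": True,
    },
}


def is_optional(schema: Dict[str, Dict[str, Any]], field_name: str) -> bool:
    # 'optional' defaults to False when the key is absent.
    return bool(schema[field_name].get("optional", False))


print(is_optional(input_schema, "array"))        # False
print(is_optional(input_schema, "sample_size"))  # True
```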
Source code in core/array/__init__.py
def create_input_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:
    inputs: typing.Dict[
        str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
    ] = {
        "array": {
            "type": "array",
            "doc": "The array containing the values the filter is applied on.",
        }
    }
    for input_name, schema in self.child_module.input_schemas.items():
        assert input_name != "array"
        if input_name == self.module_input_name:
            continue
        inputs[input_name] = schema
    return inputs
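The merging logic above can be tried out with plain dictionaries. The following is a minimal sketch, not the library's code: `build_map_inputs` and the child schemas are hypothetical, and a `ValueError` replaces the `assert` so that the check also runs when Python is started with `-O`.

```python
from typing import Any, Dict, Mapping


def build_map_inputs(
    child_input_schemas: Mapping[str, Mapping[str, Any]],
    module_input_name: str,
) -> Dict[str, Mapping[str, Any]]:
    # The map module always exposes an 'array' input of its own.
    inputs: Dict[str, Mapping[str, Any]] = {
        "array": {"type": "array", "doc": "The array to map over."}
    }
    for name, schema in child_input_schemas.items():
        if name == "array":
            raise ValueError("child input 'array' clashes with the map input")
        if name == module_input_name:
            # This child input is fed from the array items, so it is not
            # exposed as an input of the map module itself.
            continue
        inputs[name] = schema
    return inputs


# Hypothetical child module with two inputs, 'a' and 'b'.
child_schemas = {
    "a": {"type": "boolean"},
    "b": {"type": "boolean"},
}
print(sorted(build_map_inputs(child_schemas, "a")))  # ['array', 'b']
```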
create_output_schema(self)
¶
    Abstract method that child classes must implement. Returns a description of the output schema of this module.
If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):
{
      "[output_field_name]": {
          "type": "[value_type]",
          "doc*": "[a description of this output]"
      },
      "[other_output_field_name]": {
          "type": ...
          ...
      }
}
Source code in core/array/__init__.py
def create_output_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:
    outputs = {
        "array": {
            "type": "array",
            "doc": "An array of equal length to the input array, containing the 'mapped' values.",
        }
    }
    return outputs
module_instance_doc(self)
¶
    Return documentation for this instance of the module.
If not overridden, this returns the result of the class's doc() method.
Source code in core/array/__init__.py
def module_instance_doc(self) -> str:
    config: MapModuleConfig = self.config  # type: ignore
    module_type = config.module_type
    module_config = config.module_config
    m = self._kiara.create_module(
        module_type=module_type, module_config=module_config
    )
    type_md = m.get_type_metadata()
    doc = type_md.documentation.full_doc
    link = type_md.context.get_url_for_reference("module_doc")
    if not link:
        link_str = f"``{module_type}``"
    else:
        link_str = f"[``{module_type}``]({link})"
    result = f"""Map the values of the input list onto a new list of the same length, using the {link_str} module."""
    if doc and doc != "-- n/a --":
        result = result + f"\n\n``{module_type}`` documentation:\n\n{doc}"
    return result
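To see what the generated text looks like, the string-building part can be isolated into a standalone function. This is only a sketch: `build_instance_doc` is a hypothetical helper, and the module type `logic.and` and the example URL are placeholders rather than values taken from kiara.

```python
from typing import Optional


def build_instance_doc(
    module_type: str, link: Optional[str], doc: Optional[str]
) -> str:
    # Without a link, the module type is rendered as inline code only.
    if not link:
        link_str = f"``{module_type}``"
    else:
        link_str = f"[``{module_type}``]({link})"
    result = (
        "Map the values of the input list onto a new list of the same length, "
        f"using the {link_str} module."
    )
    # The child documentation is appended unless it is the placeholder text.
    if doc and doc != "-- n/a --":
        result += f"\n\n``{module_type}`` documentation:\n\n{doc}"
    return result


print(build_instance_doc("logic.and", None, "-- n/a --"))
print(build_instance_doc("logic.and", "https://example.org/logic.and", "Some doc"))
```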
        MapModuleConfig (pydantic-model)
¶
input_name: str (pydantic-field)
¶
    The name of the child module's input that receives the items of the input array. Can be omitted if the configured module has only a single input.
module_config: Dict[str, Any] (pydantic-field)
¶
    The configuration for the kiara child module that is applied to each item.
module_type: str (pydantic-field, required)
¶
    The type name of the kiara module that is applied to each item of the input array.
output_name: str (pydantic-field)
¶
    The name of the child module's output that provides the items of the result array. Can be omitted if the configured module has only a single output.
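The "can be omitted" rule for the input and output names can be sketched as follows. A standard-library dataclass stands in for the pydantic model here; `MapConfigSketch`, `resolve_field_name`, the default values, and the module type `logic.not` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MapConfigSketch:
    # Only module_type is required; the other defaults are assumed.
    module_type: str
    module_config: Dict[str, Any] = field(default_factory=dict)
    input_name: Optional[str] = None
    output_name: Optional[str] = None


def resolve_field_name(
    configured: Optional[str], available: List[str], kind: str
) -> str:
    """Pick the child field to use, falling back to the only one available."""
    if configured is not None:
        if configured not in available:
            raise ValueError(f"Unknown {kind} '{configured}', available: {available}")
        return configured
    if len(available) == 1:
        return available[0]
    raise ValueError(
        f"Child module has {len(available)} {kind}s, so '{kind}_name' must be set"
    )


config = MapConfigSketch(module_type="logic.not")
print(resolve_field_name(config.input_name, ["a"], "input"))  # a
```

With more than one candidate field and no configured name, the helper raises an error instead of guessing.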
        SampleArrayModule
¶
    Sample an array.
Sampling randomly selects a subset of a dataset. This makes it possible to test queries and workflows on a smaller version of the original data, and to adjust parameters before a full run.
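The general technique can be sketched with the standard library alone. This is not how the module itself is implemented (its sampling code is not shown on this page); `sample_items` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_items(items: Sequence[T], sample_size: int, seed: int = 0) -> List[T]:
    """Return up to `sample_size` randomly chosen items, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    k = min(sample_size, len(items))  # never ask for more items than exist
    return rng.sample(list(items), k)


data = list(range(1000))
subset = sample_items(data, 10)
print(len(subset))  # 10
```

Using a seeded generator means repeated test runs see the same subset, which helps when comparing parameter changes.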
get_value_type() (classmethod)
¶
    Return the value type for this sample module.
Source code in core/array/__init__.py
@classmethod
def get_value_type(cls) -> str:
    return "array"
        StoreArrayTypeModule
¶
    Save an Arrow array to a file.
This module wraps the input array in an Arrow Table and saves this table as a Feather file.
The output of this module is a dictionary representing the configuration that kiara uses to re-assemble the array object from disk.
store_value(self, value, base_path)
¶
    Save the value, and return the load config needed to load it again.
Source code in core/array/__init__.py
def store_value(self, value: Value, base_path: str):
    import pyarrow as pa
    from pyarrow import feather
    array: pa.Array = value.get_value_data()
    # folder = inputs.get_value_data("folder_path")
    # file_name = inputs.get_value_data("file_name")
    # column_name = inputs.get_value_data("column_name")
    path = os.path.join(base_path, ARRAY_SAVE_FILE_NAME)
    if os.path.exists(path):
        raise KiaraProcessingException(
            f"Can't write file, path already exists: {path}"
        )
    os.makedirs(os.path.dirname(path))
    table = pa.Table.from_arrays([array], names=[ARRAY_SAVE_COLUM_NAME])
    feather.write_feather(table, path)
    load_config = {
        "module_type": "array.restore",
        "inputs": {
            "base_path": base_path,
            "rel_path": ARRAY_SAVE_FILE_NAME,
            "format": "feather",
            "column_name": ARRAY_SAVE_COLUM_NAME,
        },
        "output_name": "array",
    }
    return load_config
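The returned load config carries everything needed to locate the stored file again. The sketch below shows how a consumer might resolve the file path from it. The constant values and the helpers `build_load_config` and `resolve_data_path` are assumptions; the real values of `ARRAY_SAVE_FILE_NAME` and `ARRAY_SAVE_COLUM_NAME` are not shown on this page.

```python
import os
from typing import Any, Dict

# Hypothetical stand-ins for the library constants.
ARRAY_SAVE_FILE_NAME = "array.feather"
ARRAY_SAVE_COLUMN_NAME = "array"


def build_load_config(base_path: str) -> Dict[str, Any]:
    # Mirrors the structure of the dictionary returned by store_value().
    return {
        "module_type": "array.restore",
        "inputs": {
            "base_path": base_path,
            "rel_path": ARRAY_SAVE_FILE_NAME,
            "format": "feather",
            "column_name": ARRAY_SAVE_COLUMN_NAME,
        },
        "output_name": "array",
    }


def resolve_data_path(load_config: Dict[str, Any]) -> str:
    # The file location is split into a base path and a relative path.
    inputs = load_config["inputs"]
    return os.path.join(inputs["base_path"], inputs["rel_path"])


load_config = build_load_config("store")
print(resolve_data_path(load_config))
```

Keeping the relative path separate from the base path means the stored folder can be moved as long as the base path in the config is updated.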