Skip to content

kiara.operations.sample

SampleValueModule

Base class for operations that take samples of data.

create_input_schema(self)

Abstract method to implement by child classes, returns a description of the input schema of this module.

If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):

{ "[input_field_name]: { "type": "[value_type]", "doc*": "[a description of this input]", "optional*': [boolean whether this input is optional or required (defaults to 'False')] "[other_input_field_name]: { "type: ... ... }

Source code in kiara/operations/sample.py
def create_input_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:

    input_name = self.get_value_type()
    if input_name == "any":
        input_name = "value_item"

    return {
        input_name: {
            "type": self.get_value_type(),
            "doc": "The value to sample.",
        },
        "sample_size": {
            "type": "integer",
            "doc": "The sample size.",
            "default": 10,
        },
    }

create_output_schema(self)

Abstract method to implement by child classes, returns a description of the output schema of this module.

If returning a dictionary of dictionaries, the format of the return value is as follows (items with '*' are optional):

{ "[output_field_name]: { "type": "[value_type]", "doc*": "[a description of this output]" "[other_input_field_name]: { "type: ... ... }

Source code in kiara/operations/sample.py
def create_output_schema(
    self,
) -> typing.Mapping[
    str, typing.Union[ValueSchema, typing.Mapping[str, typing.Any]]
]:
    return {
        "sampled_value": {"type": self.get_value_type(), "doc": "The sampled value"}
    }

get_value_type() classmethod

Return the value type for this sample module.

Source code in kiara/operations/sample.py
@classmethod
@abc.abstractmethod
def get_value_type(cls) -> str:
    """Return the value type for this sample module."""

retrieve_module_profiles(kiara) classmethod

Retrieve a collection of profiles (pre-set module configs) for this kiara module type.

This is used to automatically create generally useful operations (incl. their ids).

Source code in kiara/operations/sample.py
@classmethod
def retrieve_module_profiles(
    cls, kiara: Kiara
) -> typing.Mapping[str, typing.Union[typing.Mapping[str, typing.Any], Operation]]:

    all_metadata_profiles: typing.Dict[
        str, typing.Dict[str, typing.Dict[str, typing.Any]]
    ] = {}

    value_type: str = cls.get_value_type()

    if value_type not in kiara.type_mgmt.value_type_names:
        log_message(
            f"Ignoring sample operation for source type '{value_type}': type not available"
        )
        return {}

    for sample_type in cls.get_supported_sample_types():

        op_config = {
            "module_type": cls._module_type_id,  # type: ignore
            "module_config": {"sample_type": sample_type},
            "doc": f"Sample value of type '{value_type}' using method: {sample_type}.",
        }
        key = f"sample.{value_type}.{sample_type}"
        if key in all_metadata_profiles.keys():
            raise Exception(f"Duplicate profile key: {key}")
        all_metadata_profiles[key] = op_config

    return all_metadata_profiles

SampleValueModuleConfig pydantic-model

sample_type: str pydantic-field required

The sample method.

SampleValueOperationType

Operation type for sampling data of different types.

This is useful to reduce the size of some datasets in previews and test-runs, while adjusting parameters and the like. Operations of this type can implement very simple or complex ways to take samples of the data they are fed. The most important ones are sampling operations relating to tables and arrays, but it might also make sense to sample texts, image-sets and so on.

Operations of this type take the value to be sampled as input, as well as a sample size (integer), which can be, for example, a number of rows, or percentage of the whole dataset, depending on sample type.

Modules that implement sampling should inherit from SampleValueModule, and will get auto-registered with operation ids following this template: <VALUE_TYPE>.sample.<SAMPLE_TYPE_NAME>, where SAMPLE_TYPE_NAME is a descriptive name what will be sampled, or how sampling will be done.

get_operations_for_value_type(self, value_type)

Find all operations that serialize the specified type.

The result dict uses the serialization type as key, and the operation itself as value.

Source code in kiara/operations/sample.py
def get_operations_for_value_type(
    self, value_type: str
) -> typing.Dict[str, Operation]:
    """Find all operations that serialize the specified type.

    The result dict uses the serialization type as key, and the operation itself as value.
    """

    result: typing.Dict[str, Operation] = {}
    for o_id, op in self.operations.items():
        sample_op_module_cls: typing.Type[SampleValueModule] = op.module_cls  # type: ignore
        source_type = sample_op_module_cls.get_value_type()
        if source_type == value_type:
            target_type = op.module_config["sample_type"]
            if target_type in result.keys():
                raise Exception(
                    f"Multiple operations to sample '{source_type}' using {target_type}"
                )
            result[target_type] = op

    return result