
kiara_modules.core.metadata_schemas

This module contains the metadata models that are used in the kiara_modules.core package.

Metadata models are convenience wrappers that make it easier for kiara to find, create, manage and version metadata that is attached to data, as well as kiara modules. It is possible to register metadata using a JSON schema string, but it is recommended to create a metadata model, because it is much easier overall.

Metadata models must be sub-classes of kiara.metadata.MetadataModel.

ArrayMetadata pydantic-model

Describes properties of the 'array' type.

length: int pydantic-field required

The number of elements the array contains.

size: int pydantic-field required

Total number of bytes consumed by the elements of the array.

ColumnSchema pydantic-model

Describes properties of a single column of the 'table' data type.

arrow_type_id: int pydantic-field required

The arrow type id of the column.

arrow_type_name: str pydantic-field required

The arrow type name of the column.

metadata: Dict[str, Any] pydantic-field

Other metadata for the column.

FolderImportConfig pydantic-model

exclude_dirs: List[str] pydantic-field

A list of strings; every folder whose name ends with one of these strings is excluded.

exclude_files: List[str] pydantic-field

A list of strings; every file that matches one of these strings is excluded (takes precedence over 'include_files'). Defaults to: ['.DS_Store'].

include_files: List[str] pydantic-field

A list of strings; every file whose filename ends with one of these strings is included.
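The interaction of the three filters can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `select_files` is hypothetical, and it assumes that all three lists are matched as suffixes and that an empty 'include_files' list means "include everything".

```python
from typing import Iterable, List


def select_files(
    paths: Iterable[str],
    include_files: List[str],
    exclude_files: List[str],
    exclude_dirs: List[str],
) -> List[str]:
    """Hypothetical helper: apply the three filters to relative file paths."""
    selected = []
    for path in paths:
        parts = path.split("/")
        dirs, filename = parts[:-1], parts[-1]
        # Assumption: a file is skipped if any parent folder name ends with
        # one of the 'exclude_dirs' strings.
        if any(d.endswith(suffix) for d in dirs for suffix in exclude_dirs):
            continue
        # 'exclude_files' takes precedence over 'include_files'.
        if any(filename.endswith(suffix) for suffix in exclude_files):
            continue
        # Assumption: an empty 'include_files' list means "include everything".
        if include_files and not any(filename.endswith(s) for s in include_files):
            continue
        selected.append(path)
    return selected


paths = ["data/a.csv", "data/.DS_Store", "data/.git/config.csv", "notes/readme.txt"]
result = select_files(
    paths,
    include_files=[".csv"],
    exclude_files=[".DS_Store"],
    exclude_dirs=[".git"],
)
```

With these inputs only "data/a.csv" survives: the other entries are dropped by the file exclusion, the folder exclusion, and the include filter respectively.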

KiaraFile pydantic-model

Describes properties for the 'file' value type.

import_time: str pydantic-field required

The time when the file was imported.

is_onboarded: bool pydantic-field

Whether the file is imported into the kiara data store.

mime_type: str pydantic-field required

The mime type of the file.

orig_filename: str pydantic-field required

The original filename of this file at the time of import.

orig_path: str pydantic-field

The original path to this file at the time of import.

path: str pydantic-field required

The archive path of the file.

size: int pydantic-field required

The size of the file in bytes.

__repr__(self) special

Return repr(self).

Source code in core/metadata_schemas.py
def __repr__(self):
    return f"FileMetadata(name={self.file_name})"

__str__(self) special

Return str(self).

Source code in core/metadata_schemas.py
def __str__(self):
    return self.__repr__()

load_file(source, target=None, incl_orig_path=False) classmethod

Utility method to read metadata of a file from disk and optionally copy it into a data archive location.

Source code in core/metadata_schemas.py
@classmethod
def load_file(
    cls,
    source: str,
    target: typing.Optional[str] = None,
    incl_orig_path: bool = False,
):
    """Utility method to read metadata of a file from disk and optionally copy it into a data archive location."""

    import mimetypes

    import filetype

    if not source:
        raise ValueError("No source path provided.")

    if not os.path.exists(os.path.realpath(source)):
        raise ValueError(f"Path does not exist: {source}")

    if not os.path.isfile(os.path.realpath(source)):
        raise ValueError(f"Path is not a file: {source}")

    orig_filename = os.path.basename(source)
    orig_path: str = os.path.abspath(source)
    file_import_time = datetime.datetime.now().isoformat()  # TODO: timezone

    file_stats = os.stat(orig_path)
    size = file_stats.st_size

    if target:
        if os.path.exists(target):
            raise ValueError(f"Target path exists: {target}")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(source, target)
    else:
        target = orig_path

    r = mimetypes.guess_type(target)
    if r[0] is not None:
        mime_type = r[0]
    else:
        _mime_type = filetype.guess(target)
        if not _mime_type:
            mime_type = "application/octet-stream"
        else:
            mime_type = _mime_type.MIME

    if not incl_orig_path:
        _orig_path: typing.Optional[str] = None
    else:
        _orig_path = orig_path

    m = KiaraFile(
        orig_filename=orig_filename,
        orig_path=_orig_path,
        import_time=file_import_time,
        mime_type=mime_type,
        size=size,
        file_name=orig_filename,
        path=target,
    )
    return m
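The MIME type detection in this method works as a fallback chain: an extension-based lookup first, then content sniffing, then a generic default. The standard-library part of that chain can be sketched on its own. The helper name `guess_mime_type` is hypothetical, and the sketch deliberately leaves out the content-sniffing step that the method performs with the third-party 'filetype' package.

```python
import mimetypes


def guess_mime_type(path: str, default: str = "application/octet-stream") -> str:
    """Hypothetical helper: extension-based MIME lookup with a generic fallback."""
    mime_type, _encoding = mimetypes.guess_type(path)
    # The method documented above additionally tries content sniffing
    # before giving up; this sketch skips that step.
    return mime_type if mime_type is not None else default


known = guess_mime_type("notes.txt")
unknown = guess_mime_type("blob.no-such-extension")
```

Here `known` is resolved from the '.txt' extension, while `unknown` falls through to the default because the extension is not registered.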

read_content(self, as_str=True, max_lines=-1)

Read the content of a file.

Source code in core/metadata_schemas.py
def read_content(
    self, as_str: bool = True, max_lines: int = -1
) -> typing.Union[str, bytes]:
    """Read the content of a file."""

    mode = "r" if as_str else "rb"

    with open(self.path, mode) as f:
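        if max_lines <= 0:
            # A non-positive max_lines (the default) reads the whole file.
            content = f.read()
        else:
            # Read at most max_lines lines; stops cleanly at end of file.
            lines = [line for _, line in zip(range(max_lines), f)]
            content = "".join(lines) if as_str else b"".join(lines)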
    return content
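Reading only the first lines of a text file can be sketched with `itertools.islice`, which stops quietly when the file has fewer lines than requested. The helper name `read_head` is hypothetical, and the sketch assumes that a non-positive 'max_lines' means "read everything".

```python
import itertools
import os
import tempfile


def read_head(path: str, max_lines: int = -1) -> str:
    """Hypothetical helper: return the first 'max_lines' lines of a text file."""
    with open(path, "r") as f:
        if max_lines <= 0:
            # Assumption: a non-positive limit means no limit.
            return f.read()
        return "".join(itertools.islice(f, max_lines))


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("one\ntwo\nthree\n")
    head = read_head(sample, max_lines=2)
    everything = read_head(sample)
    beyond = read_head(sample, max_lines=10)
```

Asking for more lines than the file contains (`beyond`) simply returns the whole file rather than raising an error.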

KiaraFileBundle pydantic-model

Describes properties for the 'file_bundle' value type.

bundle_name: str pydantic-field required

The name of this bundle.

import_time: str pydantic-field required

The time when the bundle was imported.

included_files: Dict[str, kiara_modules.core.metadata_schemas.KiaraFile] pydantic-field required

A map of all the included files, incl. their properties.

is_onboarded: bool pydantic-field

Whether this bundle is imported into the kiara data store.

number_of_files: int pydantic-field required

How many files are included in this bundle.

orig_bundle_name: str pydantic-field required

The original name of this folder at the time of import.

orig_path: str pydantic-field

The original path to this folder at the time of import.

path: str pydantic-field required

The archive path of the folder.

size: int pydantic-field required

The size of all files in this folder, combined.

__repr__(self) special

Return repr(self).

Source code in core/metadata_schemas.py
def __repr__(self):
    return f"FileBundle(name={self.bundle_name})"

__str__(self) special

Return str(self).

Source code in core/metadata_schemas.py
def __str__(self):
    return self.__repr__()
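The bundle-level fields 'number_of_files' and 'size' are aggregates over the files in the folder. A minimal sketch of how such numbers could be collected with the standard library is shown below. The helper name `summarize_folder` is hypothetical, it returns a plain dict rather than a KiaraFileBundle, and keying files by their path relative to the folder is an assumption.

```python
import os
import tempfile
from typing import Dict


def summarize_folder(path: str) -> Dict[str, object]:
    """Hypothetical helper: collect bundle-level numbers for a folder."""
    sizes: Dict[str, int] = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Assumption: each file is keyed by its path relative to the folder.
            rel = os.path.relpath(full, path)
            sizes[rel] = os.path.getsize(full)
    return {
        "bundle_name": os.path.basename(os.path.abspath(path)),
        "number_of_files": len(sizes),
        "size": sum(sizes.values()),  # combined size in bytes
    }


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"12345")
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "b.txt"), "wb") as f:
        f.write(b"123")
    summary = summarize_folder(tmp)
```

For the two sample files (5 and 3 bytes) the summary reports two files and a combined size of 8 bytes.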

TableMetadata pydantic-model

Describes properties for the 'table' data type.

column_names: List[str] pydantic-field required

The names of the columns of the table.

column_schema: Dict[str, kiara_modules.core.metadata_schemas.ColumnSchema] pydantic-field required

The schema description of the table.

rows: int pydantic-field required

The number of rows the table contains.

size: int pydantic-field required

The table's size in bytes.