kiara plugin: language_processing
This package contains a set of commonly used modules, pipelines, types and metadata schemas for kiara.
Description
Language-processing kiara modules and data types.
Package content
module_types
- generate.LDA.for.tokens_array: Perform Latent Dirichlet Allocation on a tokenized corpus.
- tokenize.string: Tokenize a string.
- tokenize.texts_array: Split sentences into words or words into characters.
- create.stopwords_list: Create a list of stopwords from one or multiple sources.
- preprocess.tokens_array: Preprocess lists of tokens, including lowercasing, removing special characters, etc.
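To make the module descriptions more concrete, here is a minimal standard-library sketch of the three text-preparation steps they name: tokenizing texts, merging stopword sources, and preprocessing token lists. The function names, parameters and defaults (`tokenize_texts`, `create_stopwords`, `preprocess_tokens`, `by_word`, `min_length`) are hypothetical and chosen for illustration only; this is not the plugin's implementation or kiara's API.

```python
import re
import string

# NOTE: all names below are illustrative assumptions, not kiara's API.

def tokenize_texts(texts, by_word=True):
    """Split each text into words, or into characters if by_word is False."""
    if by_word:
        return [re.findall(r"\w+", text) for text in texts]
    return [list(text) for text in texts]

def create_stopwords(*sources):
    """Merge several stopword sources into one sorted, de-duplicated list."""
    merged = set()
    for source in sources:
        merged.update(word.lower() for word in source)
    return sorted(merged)

def preprocess_tokens(tokens_array, stopwords=(), lowercase=True, min_length=1):
    """Lowercase tokens, strip ASCII punctuation, drop stopwords and short tokens."""
    stop = set(stopwords)
    strip_punct = str.maketrans("", "", string.punctuation)
    result = []
    for tokens in tokens_array:
        cleaned = []
        for token in tokens:
            if lowercase:
                token = token.lower()
            token = token.translate(strip_punct)
            if token and len(token) >= min_length and token not in stop:
                cleaned.append(token)
        result.append(cleaned)
    return result

texts = ["The cat sat on the mat.", "Dogs & cats: friends?"]
tokens = tokenize_texts(texts)
stopwords = create_stopwords(["the", "on"], ["The", "a"])
clean = preprocess_tokens(tokens, stopwords=stopwords)
print(clean)
```

The steps are kept as separate functions so that each one maps onto one of the listed modules and can be swapped or skipped independently.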
operations
- create.stopwords_list: Create a list of stopwords from one or multiple sources.
- generate.LDA.for.tokens_array: Perform Latent Dirichlet Allocation on a tokenized corpus.
- preprocess.tokens_array: Preprocess lists of tokens, including lowercasing, removing special characters, etc.
- tokenize.string: Tokenize a string.
- tokenize.texts_array: Split sentences into words or words into characters.
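As background for the LDA operation: Latent Dirichlet Allocation is usually fitted on a bag-of-words corpus, where each document is a list of `(token_id, count)` pairs. The sketch below shows how a tokens array could be converted into that form using only the standard library. It assumes nothing about how `generate.LDA.for.tokens_array` works internally; the helper names `build_vocabulary` and `to_bag_of_words` are hypothetical.

```python
from collections import Counter

# NOTE: hypothetical helpers for illustration; not part of kiara or this plugin.

def build_vocabulary(tokens_array):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in tokens_array:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bag_of_words(tokens_array, vocab):
    """Convert each document into sorted (token_id, count) pairs."""
    corpus = []
    for tokens in tokens_array:
        counts = Counter(vocab[t] for t in tokens if t in vocab)
        corpus.append(sorted(counts.items()))
    return corpus

docs = [["cat", "sat", "mat", "cat"], ["dogs", "cats", "cat"]]
vocab = build_vocabulary(docs)
bow = to_bag_of_words(docs, vocab)
print(vocab)
print(bow)
```

Word order is discarded in this representation, which is why the tokenization and preprocessing choices made earlier largely determine what a topic model can pick up.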