kiara plugin: language_processing

This package contains a set of commonly used modules, pipelines, data types and metadata schemas for kiara.

Description

Language-processing kiara modules and data types.

Package content

module_types

generate.LDA.for.tokens_array: Perform Latent Dirichlet Allocation on a tokenized corpus.
tokenize.string: Tokenize a string.
tokenize.texts_array: Split sentences into words or words into characters.
create.stopwords_list: Create a list of stopwords from one or multiple sources.
preprocess.tokens_array: Preprocess lists of tokens, incl. lowercasing, removing special characters, etc.
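
To give a rough idea of what the tokenizing, stopword and preprocessing modules listed above do to a corpus, here is a minimal plain-Python sketch. It does not use kiara and is not the plugin's implementation; the function names tokenize_text, preprocess_tokens and create_stopwords_list are hypothetical stand-ins chosen for illustration, and the exact inputs and options of the real modules may differ.

```python
import re


# Hypothetical helpers for illustration only; none of these are part of the kiara API.
def tokenize_text(text, by_word=True):
    """Split a text into word tokens, or into single non-whitespace characters."""
    if by_word:
        return re.findall(r"\w+", text)
    return [ch for ch in text if not ch.isspace()]


def create_stopwords_list(*sources):
    """Merge several stopword collections into one sorted, de-duplicated list."""
    merged = set()
    for source in sources:
        merged.update(word.lower() for word in source)
    return sorted(merged)


def preprocess_tokens(tokens, stopwords=()):
    """Lowercase tokens, strip special characters, and drop stopwords."""
    cleaned = []
    for token in tokens:
        token = re.sub(r"[^\w]", "", token.lower())
        if token and token not in stopwords:
            cleaned.append(token)
    return cleaned


texts = ["The quick brown fox!", "A lazy dog, and the fox."]
stopwords = create_stopwords_list(["the", "a"], ["and", "The"])

# One list of cleaned tokens per input text, i.e. a "tokens array".
tokens_array = [preprocess_tokens(tokenize_text(t), stopwords) for t in texts]
print(tokens_array)
```

A tokens array of this shape (one list of cleaned tokens per document) is the kind of input a topic-modelling step such as LDA would typically consume.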

operations

create.stopwords_list: Create a list of stopwords from one or multiple sources.
generate.LDA.for.tokens_array: Perform Latent Dirichlet Allocation on a tokenized corpus.
preprocess.tokens_array: Preprocess lists of tokens, incl. lowercasing, removing special characters, etc.
tokenize.string: Tokenize a string.
tokenize.texts_array: Split sentences into words or words into characters.

Links