value.data_profile

                                                                                
 Documentation                                                                  
                          Generate a data profile report for a dataset.         
                                                                                
                          This uses the DataProfiler Python library; see its    
                          documentation for more details.                       
                                                                                
 Origin                                                                         
                          Authors   Markus Binsteiner (markus@frkl.io)          
                                                                                
 Context                                                                        
                          Tags         core                                     
                          Labels       package: kiara_modules.core              
                          References   source_repo:                             
                                       https://github.com/DHARPA-Project/kia…   
                                       documentation:                           
                                       https://dharpa.org/kiara_modules.core/   
                                       module_doc:                              
                                       https://dharpa.org/kiara_modules.core…   
                                       source_url:                              
                                       https://github.com/DHARPA-Project/kia…   
                                                                                
 Module config                                                                  
                          Field        Type     Description          Required   
                         ─────────────────────────────────────────────────────  
                          constants    object   Value constants      no         
                                                for this module.                
                          defaults     object   Value defaults for   no         
                                                this module.                    
                          value_type   string   The value type to    yes        
                                                profile.                        
                                                                                
 Module config          -- no config --                                         
 Python class                                                                   
                          class_name    DataProfilerModule                      
                          module_name   kiara_modules.core.value                
                          full_name     kiara_modules.core.value.DataProfile…   
                                                                                
 Processing source code  ─────────────────────────────────────────────────────  
                          def process(self, inputs: ValueSet, outputs: Value…   
                                                                                
                              import pyarrow as pa                              
                              from dataprofiler import Data, Profiler, Profi…   
                                                                                
                              set_verbosity(logging.WARNING)                    
                                                                                
                              value_type = self.get_config_value("value_type…   
                                                                                
                              profile_options = ProfilerOptions()               
                              profile_options.structured_options.data_labele…   
                              profile_options.unstructured_options.data_labe…   
                                                                                
                              if value_type == "table":                         
                                  table_item: pa.Table = inputs.get_value_da…   
                                  pd = table_item.to_pandas()                   
                                  profile = Profiler(                           
                                      pd, options=profile_options               
                                  )  # Calculate Statistics, Entity Recognit…   
                                  report = profile.report()                     
                                                                                
                              elif value_type == "file":                        
                                  file_item: KiaraFile = inputs.get_value_da…   
                                  data = Data(file_item.path)                   
                                  profile = Profiler(data, options=profile_o…   
                                  report = profile.report()                     
                              else:                                             
                                  raise KiaraProcessingException(               
                                      f"Data profiling of value type '{value…   
                                  )                                             
                                                                                
                              outputs.set_value("report", report)               
                                                                                
                         ─────────────────────────────────────────────────────
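
The processing code above picks a profiling path from the configured `value_type` and rejects anything other than "table" or "file". The control flow can be sketched without kiara or DataProfiler installed. Everything named below (`ProcessingError`, `profile_table`, `profile_file`, `PROFILERS`, `build_report`) is a hypothetical stand-in for illustration, not part of the kiara or DataProfiler APIs, and the returned dictionaries are placeholders rather than real profile reports.

```python
from typing import Any, Callable, Dict, Mapping


class ProcessingError(Exception):
    """Hypothetical stand-in for the KiaraProcessingException in the source."""


def profile_table(table: Any) -> Dict[str, Any]:
    # Stand-in for the "table" branch, which converts the table to pandas
    # and hands it to the profiler. Here we only record the row count.
    return {"kind": "table", "rows": len(table)}


def profile_file(path: str) -> Dict[str, Any]:
    # Stand-in for the "file" branch, which loads the file path and
    # hands the loaded data to the profiler. Here we only record the path.
    return {"kind": "file", "path": path}


# Map each supported value type to its profiling routine.
PROFILERS: Mapping[str, Callable[[Any], Dict[str, Any]]] = {
    "table": profile_table,
    "file": profile_file,
}


def build_report(config: Mapping[str, Any], item: Any) -> Dict[str, Any]:
    # "value_type" is the only required field in the module config.
    if "value_type" not in config:
        raise ValueError("module config requires 'value_type'")

    value_type = config["value_type"]
    try:
        profiler = PROFILERS[value_type]
    except KeyError:
        # Mirrors the else-branch of the source: unsupported types fail loudly.
        raise ProcessingError(f"Unsupported value type: {value_type!r}") from None

    return profiler(item)


if __name__ == "__main__":
    print(build_report({"value_type": "table"}, [1, 2, 3]))
    print(build_report({"value_type": "file"}, "data.csv"))
```

A lookup table is used here instead of the `if`/`elif` chain in the source only to keep the sketch short; the behaviour for the two supported types and for unsupported ones follows the same shape.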