RDEToolKit
RDEToolKit is a foundational Python package for building workflows for RDE structured processing programs. Its modules let you build the process that registers research and experimental data in RDE. RDEToolKit primarily handles the pre-processing and post-processing that surround user-defined structured processing. Combined with the Python modules you already use for research and experimental data, it covers the whole path from data registration through processing and graphing, so the entire data science workflow (data cleansing, transformation, aggregation, and visualization) can be managed in one place.
Challenges and Background
Research data management and sharing face several challenges:
- Data Format Standardization: Different data formats and file structures across researchers
- Metadata Standardization: Inconsistent metadata descriptions
- Process Automation: Manual burden of data conversion and organization tasks
- Reproducibility: Difficulty in documenting and standardizing processing procedures
Key Concepts
Structured Processing Workflow
RDEToolKit executes "structured processing", which converts research data into the standardized RDE format in three phases:
graph LR
Initialization --> Custom_Processing[Custom Structured Processing]
Custom_Processing --> Finalization
- Initialization: Directory creation, file loading, mode detection
- Custom Structured Processing: User-defined data transformation and analysis
- Finalization: Validation, thumbnail generation, metadata description
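The three phases above can be sketched in plain Python. This is a minimal illustration of the control flow only, not RDEToolKit's implementation: the functions `initialize`, `finalize`, and `run`, and the dictionary used as a context, are hypothetical names introduced for this example.

```python
from typing import Callable, Optional


def initialize(input_files: list[str]) -> dict:
    """Phase 1 (hypothetical): collect inputs and record a default mode."""
    return {"inputs": list(input_files), "mode": "invoice", "steps": ["initialization"]}


def finalize(ctx: dict) -> dict:
    """Phase 3 (hypothetical): run a trivial validation and mark completion."""
    ctx["valid"] = len(ctx["inputs"]) > 0
    ctx["steps"].append("finalization")
    return ctx


def run(input_files: list[str],
        custom: Optional[Callable[[dict], None]] = None) -> dict:
    """Run initialization, the optional user-defined step, then finalization."""
    ctx = initialize(input_files)
    if custom is not None:
        custom(ctx)  # Phase 2: user-defined structured processing
        ctx["steps"].append("custom")
    return finalize(ctx)


def count_inputs(ctx: dict) -> None:
    """Example user-defined step: store the number of input files."""
    ctx["n_inputs"] = len(ctx["inputs"])


with_custom = run(["sample.csv"], custom=count_inputs)
without_custom = run(["sample.csv"])
print(with_custom["steps"])
print(without_custom["steps"])
```

Passing the user-defined step as a callable keeps initialization and finalization identical whether or not custom processing is supplied.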
Four Processing Modes
RDEToolKit provides four processing modes based on data type and usage:
Mode | Purpose | Features |
---|---|---|
Invoice Mode | Single data file | Default mode, basic structured processing |
Excel Invoice Mode | Excel format invoices | Automatic processing of Excel invoice files |
Multi Data Tile Mode | Multiple data files | Batch processing, error skip functionality |
RDE Format Mode | RDE standard format | Reprocessing of existing RDE data |
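As a rough sketch of how one of the four modes might be chosen, the function below picks a mode from an explicit setting and the input file names. The selection rules are assumptions made for illustration: the `extended_mode` argument, the `_excel_invoice.xlsx` suffix, and the `Mode` enum are hypothetical, and RDEToolKit's actual detection logic may differ.

```python
from enum import Enum
from typing import Optional, Sequence


class Mode(Enum):
    INVOICE = "Invoice"
    EXCEL_INVOICE = "ExcelInvoice"
    MULTI_DATA_TILE = "MultiDataTile"
    RDE_FORMAT = "RDEFormat"


def select_mode(filenames: Sequence[str],
                extended_mode: Optional[str] = None) -> Mode:
    """Pick a processing mode (assumed rules, for illustration only)."""
    # Assumption: an explicit setting takes priority over file inspection.
    if extended_mode is not None:
        normalized = extended_mode.lower()
        if normalized == "rdeformat":
            return Mode.RDE_FORMAT
        if normalized == "multidatatile":
            return Mode.MULTI_DATA_TILE
        raise ValueError(f"unknown extended mode: {extended_mode!r}")
    # Assumption: an Excel invoice is recognized by its file name suffix.
    if any(name.lower().endswith("_excel_invoice.xlsx") for name in filenames):
        return Mode.EXCEL_INVOICE
    # Invoice Mode is the default, as stated in the table above.
    return Mode.INVOICE


print(select_mode(["data.csv"]))
print(select_mode(["batch_excel_invoice.xlsx", "a.csv"]))
print(select_mode(["a.csv", "b.csv"], extended_mode="MultiDataTile"))
```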
Configuration Files
Processing behavior can be flexibly controlled through configuration files (rdeconfig.yaml or pyproject.toml).
Installation
RDEToolKit is provided as a Python package and can be installed with pip: pip install rdetoolkit
Code Sample
- Sample1: With User-Defined Structured Processing
- Sample2: Without User-Defined Structured Processing
Key Features
Automation Features
- Automatic Directory Structure Generation: Folder structure compliant with RDE standards
- Automatic File Format Detection: Processing mode selection based on input data
- Automatic Metadata Extraction: Metadata generation from file information
- Automatic Thumbnail Creation: Representative image generation from Main images
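Automatic directory generation can be pictured with a short standard-library sketch. The folder names in `OUTPUT_DIRS` and the function `create_output_dirs` are hypothetical and chosen only for this example; the layout RDEToolKit actually creates is defined by the RDE standard, not by this code.

```python
import tempfile
from pathlib import Path

# Assumed folder names for illustration; not taken from this page.
OUTPUT_DIRS = ("raw", "structured", "main_image", "thumbnail", "meta")


def create_output_dirs(base: Path, names=OUTPUT_DIRS) -> list[Path]:
    """Create each named folder under base and return the created paths."""
    created = []
    for name in names:
        path = base / name
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    dirs = create_output_dirs(Path(tmp) / "data")
    created_names = sorted(p.name for p in dirs)
    all_exist = all(p.is_dir() for p in dirs)

print(created_names)
```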
Validation Features
- Schema Validation: Data structure validation using JSON Schema
- File Integrity Check: Verification of required file existence
- Metadata Validation: Consistency check with metadata-def.json
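The file integrity and metadata checks listed above can be approximated with the standard library alone. This is a simplified sketch under stated assumptions: the helper names, the required-file list, and the flat key layout given to metadata-def.json here are hypothetical, and RDEToolKit's own validators (including its JSON Schema validation) are not reproduced.

```python
import json
import tempfile
from pathlib import Path


def missing_files(base: Path, required: list[str]) -> list[str]:
    """Return the required file names that do not exist under base."""
    return [name for name in required if not (base / name).is_file()]


def undefined_metadata_keys(metadata: dict, definition: dict) -> list[str]:
    """Return metadata keys that have no entry in the definition."""
    return sorted(key for key in metadata if key not in definition)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Assumed, simplified definition file: one entry per allowed key.
    (base / "metadata-def.json").write_text(
        json.dumps({"temperature": {"type": "number"}}), encoding="utf-8"
    )
    # invoice.json is deliberately not created, so it is reported missing.
    missing = missing_files(base, ["metadata-def.json", "invoice.json"])
    definition = json.loads(
        (base / "metadata-def.json").read_text(encoding="utf-8")
    )
    undefined = undefined_metadata_keys(
        {"temperature": 300, "pressure": 1.0}, definition
    )

print(missing)
print(undefined)
```

Both helpers return lists of problems rather than raising, so a caller can report every issue in one pass.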
Extensibility
- Custom Processing Integration: Integration of user-defined functions
- Plugin Functionality: Addition of custom processing logic
- Configuration Flexibility: Detailed settings in YAML/TOML format
Summary
RDEToolKit offers the following key benefits:
- Efficiency: Significant time reduction through automation of manual tasks
- Standardization: Unified conversion processing to RDE format
- Flexibility: Support for diverse research data formats
- Reliability: Quality assurance through validation features
- Extensibility: Easy integration of custom processing
Next Steps
To get started with RDEToolKit:
- Installation Guide - Environment setup procedures
- Quick Start - Experience your first structured processing
- User Guide - Detailed usage instructions