What is Structured Processing
Purpose
This page provides a detailed explanation of "structured processing," the core concept of RDEToolKit. By understanding the background, mechanisms, and implementation methods of structured processing, you will be able to build effective data transformation workflows.
Challenges and Background
Research Data Management Challenges
Research activities faced several challenges in data management and sharing:
- Format Diversity: Different data formats across researchers and research fields
- Metadata Inconsistency: Inconsistent description methods and items
- Process Personalization: Data conversion procedures dependent on individuals
- Reproducibility Difficulties: Poor documentation and sharing of processing procedures
Need for RDE (Research Data Express)
To solve these challenges, RDE was developed as a platform for managing and sharing research data in standardized formats. However, converting existing diverse data to RDE format was complex and time-consuming.
Key Concepts
Definition of Structured Processing
Structured Processing refers to a series of processes that convert research data to RDE standard format, add appropriate metadata, and make it ready for sharing.
graph TD
A[Raw Data] --> B[Structured Processing]
B --> C[RDE Standard Format Data]
C --> D[Data Sharing & Publication]
B --> B1[Data Transformation]
B --> B2[Metadata Addition]
B --> B3[Quality Verification]
B --> B4[File Organization]
Three-Phase Workflow
RDEToolKit's structured processing is executed in the following three phases:
1. Initialization
- Automatic Directory Structure Generation: Folder structure compliant with RDE standards
- Input Data Loading: Automatic file format detection
- Processing Mode Determination: Mode selection based on data characteristics
- Configuration File Analysis: Loading of operation parameters
graph TD
init1[Initialization] --> init2[Directory Creation]
init2 --> init3{Mode Selection}
init3 -->|default| init6[Invoice Mode]
init3 --> init7[Excel Invoice Mode]
init3 --> init8[Multi Data Tile]
init3 --> init9[RDE Format Mode]
init6 --> init10[Custom Structured Processing]
init7 --> init10
init8 --> init10
init9 --> init10
2. Custom Structured Processing
The stage where user-defined custom processing logic is executed:
- Data Transformation: Format conversion of research data
- Analysis Processing: Statistical analysis, visualization, machine learning
- Metadata Extraction: Automatic extraction from file information
- Quality Check: Data consistency verification
custom_processing_example.py | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 |
|
3. Finalization
Verification of processing results and final file organization:
- Validation: Structure verification using JSON Schema
- Thumbnail Generation: Automatic creation of representative images
- Metadata Description: Automatic generation of dataset descriptions
- File Organization: Finalization of directory structure
graph TD
end1[Custom Structured Processing] --> end2[Validation]
end2 --> end3{Setting:save_thumbnail_image}
end3 -->|False| end6[Automatic Description Generation]
end3 -->|True| end5[Thumbnail Generation from Main Images]
end5 --> end6
end6 --> end7[Completion]
Four Processing Modes
RDEToolKit provides four processing modes based on data characteristics and usage:
Mode | Activation Condition | Features | Usage |
---|---|---|---|
Invoice Mode | Default | Basic structured processing | Single data file |
Excel Invoice Mode | *._excel_invoice.xlsx file |
Automatic Excel invoice processing | Excel format invoices |
Multi Data Tile | extended_mode: 'MultiDataTile' |
Batch processing, error skip | Multiple data files |
RDE Format Mode | extended_mode: 'rdeformat' |
Reprocessing of existing RDE data | RDE standard format |
Custom Processing Function Implementation Patterns
In structured processing, custom processing is implemented using the following patterns:
processing_patterns.py | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
Summary
Key values of structured processing:
- Automation: Significant reduction of manual work
- Standardization: Unified conversion to RDE format
- Quality Assurance: Reliability through validation features
- Extensibility: Flexible response through custom processing
- Reproducibility: Documentation and sharing of processing procedures
Important Points
- Single Responsibility: Each processing phase has clearly separated responsibilities
- Configuration-Driven: Flexible behavior control through configuration files
- Error Handling: Robust exception handling and log output
- Testability: Structure that enables unit testing
Next Steps
To deepen your understanding of structured processing:
- Configuration Files - Details of processing modes and behavior settings
- CLI Tools - Detailed command line operations
- Validation - Data quality verification methods
- API Reference - Check all available features