Directory Structure Specification
Purpose
This document explains the technical specifications of directory structures supported by RDE structured processing. It provides detailed definitions of directories required for project setup and execution.
Input Directory Specification
The following directories must exist before structured processing is executed. Create them manually for local execution. In the RDE environment they are generated automatically.
| Directory Name | Data Type | Required | Description |
|---|---|---|---|
| inputdata | Input Data | ○ | Stores the raw data files to be processed |
| invoice | Invoice Data | ○ | Stores the invoice metadata file (`invoice.json`) |
| tasksupport | Configuration Files | ○ | Stores schema definitions, metadata definition files, and the processing configuration file |
Input Directory Contents
```text
data/
├── inputdata/
│   ├── sample_data.csv      # Target data for processing
│   ├── measurement.xlsx     # Measurement data
│   └── archive.zip          # Compressed data
├── invoice/
│   └── invoice.json         # Invoice metadata
└── tasksupport/
    ├── invoice.schema.json  # Invoice schema definition
    ├── metadata-def.json    # Metadata definition
    └── rdeconfig.yaml       # Processing configuration file
```
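For local execution these input directories have to be created by hand. The following is a minimal sketch using only the standard library. It assumes the project root is the current working directory. The helper name `prepare_input_dirs` is hypothetical and not part of rdetoolkit.

```python
from pathlib import Path

# Input directories that must exist before structured processing runs locally.
INPUT_DIRS = ("inputdata", "invoice", "tasksupport")


def prepare_input_dirs(base: str = "data") -> list[Path]:
    """Create data/inputdata, data/invoice and data/tasksupport if missing.

    Hypothetical helper: rdetoolkit does not provide this function.
    Returns the created (or already existing) directory paths in order.
    """
    root = Path(base)
    paths = []
    for name in INPUT_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```

After calling `prepare_input_dirs()`, copy the raw files into `data/inputdata/`, `invoice.json` into `data/invoice/`, and the schema, metadata definition, and configuration files into `data/tasksupport/`.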
Output Directory Specification
Directories generated automatically as a result of executing structured processing.
Standard Output Directories
| Directory Name | Data Type | Auto-Generated | RDE Registration | Description |
|---|---|---|---|---|
| raw | Raw Data | ○ | ○ | Copy of the input data (shareable) |
| structured | Structured Data | ○ | ○ | Processed data files |
| meta | Metadata | ○ | ○ | `metadata.json` file |
| main_image | Images | ○ | ○ | Images for the dataset detail display |
| other_image | Images | ○ | ○ | Images for the file list display |
| thumbnail | Images | ○ | ○ | Thumbnails for the dataset list display |
| nonshared_raw | Non-shared Data | ○ | ○ | Non-shareable file groups |
| attachment | Attachment Files | - | ○ | Manually placed attachment files |
| logs | Logs | ○ | - | Processing log files |
| temp | Temporary Files | ○ | - | Temporary work files |
Important Notes
The `attachment` directory is not generated automatically by rdetoolkit. Create it manually if needed.
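The output table can also serve as a checklist after a run. Below is a minimal standard-library sketch that assumes the default `data/` base directory. `AUTO_GENERATED`, `missing_output_dirs`, and `ensure_attachment_dir` are hypothetical names, not rdetoolkit APIs.

```python
from pathlib import Path

# Output directories marked as auto-generated in the output table above.
# "attachment" is deliberately absent: rdetoolkit does not create it.
AUTO_GENERATED = (
    "raw", "structured", "meta", "main_image", "other_image",
    "thumbnail", "nonshared_raw", "logs", "temp",
)


def missing_output_dirs(base: str = "data") -> list[str]:
    """Return the auto-generated directory names that do not exist under base."""
    root = Path(base)
    return [name for name in AUTO_GENERATED if not (root / name).is_dir()]


def ensure_attachment_dir(base: str = "data") -> Path:
    """Create data/attachment by hand, for runs that need attachment files."""
    path = Path(base) / "attachment"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

An empty result from `missing_output_dirs()` means every auto-generated directory is present. The function does not inspect their contents.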
Post-Execution Directory Example
```text
data/
├── inputdata/
│   └── sample_data.csv
├── invoice/
│   └── invoice.json
├── logs/
│   └── rdesys.log
├── main_image/
│   └── chart.png
├── meta/
│   └── metadata.json
├── nonshared_raw/
├── other_image/
│   ├── detail1.png
│   └── detail2.png
├── raw/
│   └── sample_data.csv
├── structured/
│   └── processed_data.csv
├── tasksupport/
│   ├── invoice.schema.json
│   └── metadata-def.json
├── temp/
│   └── temp_file.json
└── thumbnail/
    └── preview.png
```
Multi-Dataset Directory
Specification for the `divided` directory used in ExcelInvoice mode and MultiDataTile mode.
Divided Directory Specification
| Item | Specification |
|---|---|
| Naming Convention | `data/divided/{index}/` |
| Index Format | 4-digit zero-padded (e.g., 0001, 0002, 0029) |
| Subdirectories | Same structure as the standard output directories |
| Excluded Directories | `inputdata`, `invoice`, `tasksupport` |
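The naming convention reduces to zero-padding the index to four digits. The following minimal sketch assumes that indices start at 1 and fit in four digits, and it uses a `data/` base. `divided_dir` is a hypothetical helper. Its rejection of the excluded directories mirrors the table, not any documented rdetoolkit behavior.

```python
from pathlib import Path

# Directories that never appear under data/divided/{index}/.
EXCLUDED = {"inputdata", "invoice", "tasksupport"}


def divided_dir(index: int, subdir: str, base: str = "data") -> Path:
    """Build data/divided/{index:04d}/{subdir}.

    Hypothetical helper. Assumes indices start at 1 and fit in four digits.
    """
    if not 1 <= index <= 9999:
        raise ValueError(f"index must be in 1..9999, got {index}")
    if subdir in EXCLUDED:
        raise ValueError(f"{subdir!r} is not created inside divided directories")
    return Path(base) / "divided" / f"{index:04d}" / subdir
```

For example, `divided_dir(29, "structured")` returns `Path("data/divided/0029/structured")`.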
Divided Directory Structure Example
```text
data/
├── divided/
│   ├── 0001/
│   │   ├── structured/
│   │   ├── meta/
│   │   ├── thumbnail/
│   │   ├── main_image/
│   │   ├── other_image/
│   │   ├── nonshared_raw/
│   │   └── raw/
│   └── 0002/
│       ├── structured/
│       ├── meta/
│       ├── thumbnail/
│       ├── main_image/
│       ├── other_image/
│       ├── nonshared_raw/
│       └── raw/
├── inputdata/
├── invoice/
├── tasksupport/
└── (other standard directories)
```
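To process every dataset in a tree like the one above, you can enumerate the index directories and sort them numerically. The following is a standard-library sketch. `list_divided_indices` is hypothetical, and it skips any entry whose name is not exactly four digits.

```python
import re
from pathlib import Path

# Matches the 4-digit zero-padded index names, e.g. "0001".
INDEX_NAME = re.compile(r"[0-9]{4}")


def list_divided_indices(base: str = "data") -> list[int]:
    """Return the sorted integer indices found under base/divided/."""
    divided = Path(base) / "divided"
    if not divided.is_dir():
        return []  # no divided directory: nothing to enumerate
    return sorted(
        int(entry.name)
        for entry in divided.iterdir()
        if entry.is_dir() and INDEX_NAME.fullmatch(entry.name)
    )
```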
DirectoryOps API Specification
Technical specification for directory operations using the `rdetoolkit.core.DirectoryOps` class.
Basic Usage
DirectoryOps Basic Operations

```python
from rdetoolkit.core import DirectoryOps

# Create an instance rooted at the data/ directory
dir_ops = DirectoryOps("data")

# Get the path of a standard directory
structured_path = dir_ops.structured.path
print(structured_path)  # data/structured

# Get the path of an indexed (divided) directory
divided_path = dir_ops.structured(2).path
print(divided_path)  # data/divided/0002/structured
```
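The following toy model uses only the standard library to make the difference between `dir_ops.structured` and `dir_ops.structured(2)` concrete. It reproduces the path convention shown above. `ToyDirectoryOps` and `_DirHandle` are hypothetical illustrations. They are not rdetoolkit's implementation, and they do not create any directories.

```python
from pathlib import Path
from typing import Optional


class _DirHandle:
    """A single output directory; calling it with an index selects its divided copy."""

    def __init__(self, base: Path, name: str, index: Optional[int] = None):
        self._base = base
        self._name = name
        self._index = index

    @property
    def path(self) -> Path:
        if self._index is None:
            return self._base / self._name  # e.g. data/structured
        # e.g. data/divided/0002/structured
        return self._base / "divided" / f"{self._index:04d}" / self._name

    def __call__(self, index: int) -> "_DirHandle":
        # Return a new handle so the unindexed one stays reusable.
        return _DirHandle(self._base, self._name, index)


class ToyDirectoryOps:
    """Hypothetical stand-in exposing a few directories as attributes."""

    def __init__(self, base: str):
        root = Path(base)
        self.structured = _DirHandle(root, "structured")
        self.meta = _DirHandle(root, "meta")
        self.raw = _DirHandle(root, "raw")
```

On POSIX systems, `print(ToyDirectoryOps("data").structured(2).path)` prints `data/divided/0002/structured`, which matches the documented output. This page does not state whether rdetoolkit is structured the same way internally.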
Batch Operation Methods
Batch Directory Operations

```python
from rdetoolkit.core import DirectoryOps

dir_ops = DirectoryOps("data")

# Create all standard directories at once and get their paths
all_paths = dir_ops.all()
print(all_paths)
# ['data/invoice', 'data/attachment', 'data/tasksupport',
#  'data/structured', 'data/meta', 'data/thumbnail',
#  'data/main_image', 'data/other_image', 'data/nonshared_raw', 'data/raw']

# Also create the divided directories for index 1
indexed_paths = dir_ops.all(1)
print(indexed_paths)
# Above + ['data/divided/0001/structured', 'data/divided/0001/meta', ...]
```
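When rdetoolkit is not available, for example in a quick test fixture, `pathlib` can approximate this batch behavior. The following minimal sketch assumes the directory list printed above. `create_all` is hypothetical, and it returns strings in the same order as that output.

```python
from pathlib import Path
from typing import Optional

# Directory names taken from the dir_ops.all() output shown above.
TOP_LEVEL = ["invoice", "attachment", "tasksupport"]
PER_DATASET = ["structured", "meta", "thumbnail", "main_image",
               "other_image", "nonshared_raw", "raw"]


def create_all(base: str = "data", index: Optional[int] = None) -> list[str]:
    """Create the standard directories, plus divided/{index:04d}/ ones if given."""
    root = Path(base)
    targets = [root / name for name in TOP_LEVEL + PER_DATASET]
    if index is not None:
        divided = root / "divided" / f"{index:04d}"
        targets += [divided / name for name in PER_DATASET]
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)  # idempotent
    return [t.as_posix() for t in targets]
```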
File List Retrieval Methods
File List Retrieval

```python
from rdetoolkit.core import DirectoryOps

dir_ops = DirectoryOps("data")

# List the files in a standard directory
files = dir_ops.structured.list()
print(files)
# ['data/structured/file1.csv', 'data/structured/file2.csv']

# List the files in a divided directory
divided_files = dir_ops.structured(2).list()
print(divided_files)
# ['data/divided/0002/structured/file1.csv', ...]
```
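With only the standard library, `Path.iterdir()` can produce a comparable listing. This page does not say whether `DirectoryOps.list()` recurses into subdirectories or sorts its result. The hypothetical `list_files` below therefore makes its own choices: it returns files only, from the top level only, sorted by path.

```python
from pathlib import Path


def list_files(directory: str) -> list[str]:
    """Return the files directly inside directory, sorted, as POSIX strings."""
    path = Path(directory)
    if not path.is_dir():
        return []  # missing directory: treat as empty
    return sorted(p.as_posix() for p in path.iterdir() if p.is_file())
```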
Path Specification
Absolute and Relative Paths
- Base Directory: the `data/` directory at the project root
- Relative Path: path notation relative to `data/`
- Absolute Path: absolute path notation on the host system
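`pathlib` can convert between the two notations. The following sketch assumes that `data/` sits directly under a known project root. `to_absolute` and `to_relative` are hypothetical helpers. `to_relative` raises `ValueError` if the path lies outside `data/`.

```python
from pathlib import Path


def to_absolute(relative: str, project_root: str) -> Path:
    """Turn a path relative to data/ (e.g. 'meta/metadata.json') into an absolute path."""
    return (Path(project_root) / "data" / relative).resolve()


def to_relative(absolute: str, project_root: str) -> str:
    """Express an absolute path relative to data/, in POSIX notation."""
    base = (Path(project_root) / "data").resolve()
    # relative_to raises ValueError when the path is not inside data/
    return Path(absolute).resolve().relative_to(base).as_posix()
```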
File Naming Conventions
| File Type | Naming Convention | Example |
|---|---|---|
| Metadata | `metadata.json` | `data/meta/metadata.json` |
| Invoice | `invoice.json` | `data/invoice/invoice.json` |
| Logs | `rdesys.log` | `data/logs/rdesys.log` |
| Schema | `*.schema.json` | `data/tasksupport/invoice.schema.json` |
| Configuration | `rdeconfig.yaml` | `data/tasksupport/rdeconfig.yaml` |
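These conventions can be checked mechanically, for instance before packaging a dataset. The following sketch uses `fnmatch.fnmatchcase`, so matching is case-sensitive on every platform. The `CONVENTIONS` mapping and `follows_convention` are hypothetical and cover only the patterns in the table above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File type -> filename pattern, as listed in the naming table above.
CONVENTIONS = {
    "metadata": "metadata.json",
    "invoice": "invoice.json",
    "logs": "rdesys.log",
    "schema": "*.schema.json",
    "configuration": "rdeconfig.yaml",
}


def follows_convention(file_type: str, path: str) -> bool:
    """True if the file name at the end of path matches the pattern for file_type."""
    pattern = CONVENTIONS[file_type.lower()]  # KeyError for unknown types
    return fnmatchcase(PurePosixPath(path).name, pattern)
```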
Next Steps
After understanding the directory structure specifications, refer to the following documents: