
Directory Structure Specification

Purpose

This document explains the technical specifications of directory structures supported by RDE structured processing. It provides detailed definitions of directories required for project setup and execution.

Input Directory Specification

These directories must exist before structured processing runs. Create them manually for local execution; RDE environments generate them automatically.

Required Input Directories

Directory Name  Data Type            Required  Description
inputdata       Input Data           Yes       Stores raw data files to be processed
invoice         Invoice Data         Yes       Stores the metadata file (invoice.json)
tasksupport     Configuration Files  Yes       Stores schema definitions and metadata definition files

Input Directory Contents

data/
├── inputdata/
│   ├── sample_data.csv      # Target data for processing
│   ├── measurement.xlsx     # Measurement data
│   └── archive.zip          # Compressed data
├── invoice/
│   └── invoice.json         # Invoice metadata
└── tasksupport/
    ├── invoice.schema.json  # Invoice schema definition
    ├── metadata-def.json    # Metadata definition
    └── rdeconfig.yaml       # Processing configuration file
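
For local execution, the three input directories have to exist before processing starts. The following is a minimal sketch using only the Python standard library; the helper name `create_input_dirs` is hypothetical and is not part of rdetoolkit.

```python
from pathlib import Path

# Input directories listed in the specification above.
INPUT_DIRS = ("inputdata", "invoice", "tasksupport")


def create_input_dirs(base: str = "data") -> list:
    """Create the input directories under `base` (hypothetical helper)."""
    base_path = Path(base)
    created = []
    for name in INPUT_DIRS:
        target = base_path / name
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


if __name__ == "__main__":
    for path in create_input_dirs("data"):
        print(path)
```

The files themselves (raw data, invoice.json, schema and metadata definitions) still have to be placed into these directories by hand.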

Output Directory Specification

These directories are generated automatically when structured processing runs.

Standard Output Directories

Directory Name  Data Type         Auto-Generated  RDE Registration  Description
raw             Raw Data          Yes             Yes               Copy of input data (shareable)
structured      Structured Data   Yes             Yes               Processed data files
meta            Metadata          Yes             Yes               metadata.json file
main_image      Images            Yes             Yes               Images for dataset detail display
other_image     Images            Yes             Yes               Images for file list display
thumbnail       Images            Yes             Yes               Thumbnails for dataset list display
nonshared_raw   Non-shared Data   Yes             Yes               Non-shareable file groups
attachment      Attachment Files  -               Yes               Manually placed attachment files
logs            Logs              Yes             -                 Processing log files
temp            Temporary Files   Yes             -                 Temporary work files

Important Notes

The attachment directory is not automatically generated by rdetoolkit. Create it manually if needed.
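
Since the attachment directory has to be created by hand, a local setup script can add it with the standard library. This is a sketch only; `ensure_attachment_dir` is a hypothetical helper name, not an rdetoolkit function.

```python
from pathlib import Path


def ensure_attachment_dir(base: str = "data") -> Path:
    """Create {base}/attachment if it is missing (hypothetical helper)."""
    attachment = Path(base) / "attachment"
    # parents=True also creates `base` when it does not exist yet.
    attachment.mkdir(parents=True, exist_ok=True)
    return attachment
```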

Post-Execution Directory Example

data/
├── inputdata/
│   └── sample_data.csv
├── invoice/
│   └── invoice.json
├── logs/
│   └── rdesys.log
├── main_image/
│   └── chart.png
├── meta/
│   └── metadata.json
├── nonshared_raw/
├── other_image/
│   ├── detail1.png
│   └── detail2.png
├── raw/
│   └── sample_data.csv
├── structured/
│   └── processed_data.csv
├── tasksupport/
│   ├── invoice.schema.json
│   └── metadata-def.json
├── temp/
│   └── temp_file.json
└── thumbnail/
    └── preview.png
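
After a local run it can be useful to check which of the output directories from the example tree are present. The sketch below assumes that every directory in `EXPECTED_OUTPUT_DIRS` should exist after a run, which may not hold for every configuration; the constant and the function name `missing_output_dirs` are hypothetical.

```python
from pathlib import Path

# Output directories from the example tree above.
# attachment is left out because it is created manually.
EXPECTED_OUTPUT_DIRS = (
    "raw", "structured", "meta", "main_image", "other_image",
    "thumbnail", "nonshared_raw", "logs", "temp",
)


def missing_output_dirs(base: str = "data") -> list:
    """Return the names of expected output directories absent under `base`."""
    base_path = Path(base)
    return [
        name for name in EXPECTED_OUTPUT_DIRS
        if not (base_path / name).is_dir()
    ]
```

An empty list means every directory in the assumed set was found.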

Multi-Dataset Directory

This section specifies the divided directory, which is used in ExcelInvoice mode and MultiDataTile mode.

Divided Directory Specification

Item                  Specification
Naming Convention     data/divided/{index}/
Index Format          4-digit zero-padded (e.g., 0001, 0002, 0029)
Subdirectories        Same structure as standard output directories
Excluded Directories  inputdata, invoice, tasksupport
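
The naming convention can be illustrated with a small path builder. This is a sketch under stated assumptions: `divided_dir` is a hypothetical helper, and the accepted index range of 0 to 9999 is inferred only from the 4-digit format, not from rdetoolkit itself.

```python
from pathlib import Path

# Directories that do not appear under data/divided/{index}/.
EXCLUDED_DIRS = {"inputdata", "invoice", "tasksupport"}


def divided_dir(index: int, name: str, base: str = "data") -> Path:
    """Build {base}/divided/{index}/{name} with a 4-digit zero-padded index."""
    if not 0 <= index <= 9999:
        # Assumption: the index must fit into four digits.
        raise ValueError(f"index out of range for 4-digit format: {index}")
    if name in EXCLUDED_DIRS:
        raise ValueError(f"{name} is not part of a divided directory")
    return Path(base) / "divided" / f"{index:04d}" / name
```

For example, `divided_dir(29, "meta")` yields the path data/divided/0029/meta.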

Divided Directory Structure Example

data/
├── divided/
│   ├── 0001/
│   │   ├── structured/
│   │   ├── meta/
│   │   ├── thumbnail/
│   │   ├── main_image/
│   │   ├── other_image/
│   │   ├── nonshared_raw/
│   │   └── raw/
│   └── 0002/
│       ├── structured/
│       ├── meta/
│       ├── thumbnail/
│       ├── main_image/
│       ├── other_image/
│       ├── nonshared_raw/
│       └── raw/
├── inputdata/
├── invoice/
├── tasksupport/
└── (other standard directories)
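
Given a layout like the one above, the divided indexes that exist on disk can be discovered by scanning data/divided/ for 4-digit directory names. The function name `list_divided_indexes` is hypothetical, and the sketch relies only on the standard library rather than on rdetoolkit.

```python
import re
from pathlib import Path

# Matches the 4-digit zero-padded index format, e.g. 0001 or 0029.
_INDEX_PATTERN = re.compile(r"\d{4}")


def list_divided_indexes(base: str = "data") -> list:
    """Return the sorted integer indexes found under {base}/divided/."""
    divided_root = Path(base) / "divided"
    if not divided_root.is_dir():
        return []
    return sorted(
        int(entry.name)
        for entry in divided_root.iterdir()
        if entry.is_dir() and _INDEX_PATTERN.fullmatch(entry.name)
    )
```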

DirectoryOps API Specification

This section specifies directory operations that use the rdetoolkit.core.DirectoryOps class.

Basic Usage

DirectoryOps Basic Operations
from rdetoolkit.core import DirectoryOps

# Instance creation
dir_ops = DirectoryOps("data")

# Get standard directory path
structured_path = dir_ops.structured.path
print(structured_path)  # data/structured

# Get indexed directory path
divided_path = dir_ops.structured(2).path
print(divided_path)  # data/divided/0002/structured

Batch Operation Methods

Batch Directory Operations
# Create all directories at once
all_paths = dir_ops.all()
print(all_paths)
# ['data/invoice', 'data/attachment', 'data/tasksupport', 
#  'data/structured', 'data/meta', 'data/thumbnail', 
#  'data/main_image', 'data/other_image', 'data/nonshared_raw', 'data/raw']

# Create indexed directories at once
indexed_paths = dir_ops.all(1)
print(indexed_paths)
# Above + ['data/divided/0001/structured', 'data/divided/0001/meta', ...]

File List Retrieval Methods

File List Retrieval
# File list in specified directory
files = dir_ops.structured.list()
print(files)
# ['data/structured/file1.csv', 'data/structured/file2.csv']

# File list in divided directory
divided_files = dir_ops.structured(2).list()
print(divided_files)
# ['data/divided/0002/structured/file1.csv', ...]

Path Specification

Absolute and Relative Paths

  • Base Directory: data/ directory at project root
  • Relative Path: Path notation relative to data/
  • Absolute Path: System absolute path notation
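
The relationship between the two notations can be sketched with pathlib. The helper names `to_absolute` and `to_relative` are hypothetical, and the sketch assumes that the base directory is the data/ directory directly under the project root, as described above.

```python
from pathlib import Path


def to_absolute(relative: str, project_root: str = ".") -> Path:
    """Convert a path relative to data/ into a system absolute path."""
    return (Path(project_root) / "data" / relative).resolve()


def to_relative(absolute: Path, project_root: str = ".") -> Path:
    """Convert an absolute path back into a path relative to data/."""
    base = (Path(project_root) / "data").resolve()
    # relative_to raises ValueError if the path lies outside data/.
    return Path(absolute).resolve().relative_to(base)
```

Resolving both sides before comparing avoids mismatches caused by symbolic links or `..` segments in the project root.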

File Naming Conventions

File Type      Naming Convention  Example
Metadata       metadata.json      data/meta/metadata.json
Invoice        invoice.json       data/invoice/invoice.json
Logs           rdesys.log         data/logs/rdesys.log
Schema         *.schema.json      invoice.schema.json
Configuration  rdeconfig.yaml     rdeconfig.yaml
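
The naming conventions above can be expressed as glob patterns and checked with the standard library. This is an illustrative sketch; the mapping `NAMING_PATTERNS` and the function `classify_file` are hypothetical and do not come from rdetoolkit.

```python
from fnmatch import fnmatchcase
from typing import Optional

# File type -> naming convention, taken from the table above.
NAMING_PATTERNS = {
    "Metadata": "metadata.json",
    "Invoice": "invoice.json",
    "Logs": "rdesys.log",
    "Schema": "*.schema.json",
    "Configuration": "rdeconfig.yaml",
}


def classify_file(filename: str) -> Optional[str]:
    """Return the file type whose naming convention matches, else None."""
    for file_type, pattern in NAMING_PATTERNS.items():
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatchcase(filename, pattern):
            return file_type
    return None
```

Only the file name is checked here, not the directory the file is stored in.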

Next Steps

After understanding the directory structure specifications, refer to the following documents: