Directory Path Retrieval Methods

Purpose

This guide explains how to manage directory paths used for file I/O in RDE structured processing. The new RdeDatasetPaths class unifies RdeInputDirPaths and RdeOutputResourcePath; new implementations should prefer this single-object interface while legacy code can continue to rely on the separate classes.

RdeDatasetPaths bundles input- and output-side paths into one object so dataset callbacks can accept a single argument. Configuration accessors and helper shortcuts are exposed on the same instance, keeping callback code concise.

Preferred Dataset Signature

Preferred signature
from rdetoolkit.models.rde2types import RdeDatasetPaths


def dataset(paths: RdeDatasetPaths) -> None:
    # List incoming CSV files
    for csv_file in paths.inputdata.glob("*.csv"):
        print(f"Input CSV: {csv_file}")

    # Structured data output directory
    struct_dir = paths.struct
    print(f"Structured output: {struct_dir}")

Frequently Used Properties

  • paths.inputdata: Input data directory; works with Path.glob() and similar utilities.
  • paths.invoice: Input-side invoice directory.
  • paths.tasksupport: Directory containing auxiliary data such as metadata-def.json.
  • paths.struct: Structured data output directory.
  • paths.meta: Metadata output directory.
  • paths.rawfiles: Collected input files (per tile) after extraction or copying. Use this to determine the exact processing targets.
  • paths.raw / paths.nonshared_raw: Output locations for raw data.
  • paths.main_image, paths.other_image, paths.thumbnail: Image output locations.
  • paths.logs: Directory for workflow log files.
  • paths.metadata_def_json: Shortcut to tasksupport/metadata-def.json.
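The properties above can be combined freely. Below is a minimal duck-typed sketch that summarizes the input side of a tile; summarize_inputs is a hypothetical helper name, and the attribute names (rawfiles, metadata_def_json) are taken from the list above. The parameter is left unannotated so the sketch stands alone without importing the toolkit.

```python
from pathlib import Path


def summarize_inputs(paths) -> dict:
    # `paths` is expected to expose the attributes listed above
    # (an RdeDatasetPaths instance in real code; duck-typed here).
    return {
        # rawfiles holds the finalized input artifacts for this tile
        "num_rawfiles": len(list(paths.rawfiles)),
        # metadata_def_json is the tasksupport/metadata-def.json shortcut
        "has_metadata_def": paths.metadata_def_json.exists(),
    }
```

Logging such a summary at the start of a callback makes it easy to spot tiles that arrived with no inputs or with a missing metadata definition.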

Example: Reading Input Files

Reading with RdeDatasetPaths
import pandas as pd
from rdetoolkit.models.rde2types import RdeDatasetPaths


def read_inputs(paths: RdeDatasetPaths) -> None:
    # rawfiles contains the finalized list of input artifacts for this tile
    for source in paths.rawfiles:
        df = pd.read_csv(source)
        print(f"{source.name} loaded: {df.shape}")

Example: Writing Outputs

Saving with RdeDatasetPaths
import json
from rdetoolkit.models.rde2types import RdeDatasetPaths


def save_results(paths: RdeDatasetPaths, payload: dict) -> None:
    output_path = paths.struct / "results.json"
    output_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )

    meta_path = paths.meta / "metadata.json"
    meta_path.write_text(
        json.dumps({"count": len(payload)}, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )

Compatibility and Legacy Style

Existing callbacks may still accept two arguments (RdeInputDirPaths, RdeOutputResourcePath). The toolkit keeps this signature for backward compatibility, but new structured processing code should adopt the unified single-argument form.

Legacy Signature Example

Legacy usage (maintenance only)
from rdetoolkit.models.rde2types import RdeInputDirPaths, RdeOutputResourcePath


def dataset(srcpaths: RdeInputDirPaths, resource_paths: RdeOutputResourcePath) -> None:
    print(srcpaths.inputdata)
    print(resource_paths.struct)

Splitting Back into Legacy Arguments

When you must call older helpers, use as_legacy_args() to recover the original pair.

Bridging to legacy helpers
from rdetoolkit.models.rde2types import RdeDatasetPaths


def dataset(paths: RdeDatasetPaths) -> None:
    # legacy_dataset is an existing two-argument callback defined elsewhere
    srcpaths, resource_paths = paths.as_legacy_args()
    legacy_dataset(srcpaths, resource_paths)

Verifying Outputs

The same RdeDatasetPaths properties support output verification: check that each directory exists and count the files it contains.

Output directory verification
from rdetoolkit.models.rde2types import RdeDatasetPaths


def verify_outputs(paths: RdeDatasetPaths) -> None:
    for name, directory in {
        "structured": paths.struct,
        "meta": paths.meta,
        "raw": paths.raw,
        "main_image": paths.main_image,
    }.items():
        if directory.exists():
            print(f"{name} directory: {len(list(directory.iterdir()))} items")
        else:
            print(f"⚠️ {name} directory is missing")