Validation Features

Overview

RDEToolKit provides validation features that check the integrity of RDE-related files. Running these checks during local development lets you catch errors before registering data with RDE.

Prerequisites

  • RDEToolKit installation
  • Basic understanding of template files
  • Python 3.9 or higher

Validation Target Files

Main files subject to validation in RDEToolKit:

  • invoice.schema.json: Invoice schema file
  • invoice.json: Invoice data file
  • metadata-def.json: Metadata definition file
  • metadata.json: Metadata file

Important

Structured processing can modify these files, so validating them beforehand is important.

Related Documentation

About Template Files
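
Before running any validator, it can help to confirm which of the four target files are actually present. The following is a minimal sketch using only the standard library. The helper name find_target_files is hypothetical, and the directory layout is an assumption taken from the complete example later on this page (tasksupport/ for definition files, invoice/ for invoice.json, metadata.json at the project root); adjust the paths to match your project.

```python
from pathlib import Path

# Assumed layout (hypothetical; adjust to your project).
TARGET_FILES = {
    "invoice.schema.json": Path("tasksupport") / "invoice.schema.json",
    "invoice.json": Path("invoice") / "invoice.json",
    "metadata-def.json": Path("tasksupport") / "metadata-def.json",
    "metadata.json": Path("metadata.json"),
}


def find_target_files(project_dir: Path) -> dict[str, bool]:
    """Report, for each validation target, whether the file exists."""
    return {
        name: (project_dir / relative).is_file()
        for name, relative in TARGET_FILES.items()
    }


# Usage: list the status of each file under the current directory.
for name, present in find_target_files(Path(".")).items():
    print(f"{name}: {'found' if present else 'missing'}")
```

Files reported as missing can then be skipped or flagged before any validator is constructed.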

invoice.schema.json Validation

Overview

invoice.schema.json is the schema file that configures the RDE screens. RDEToolKit provides a check that verifies the required fields are defined, whether the schema is modified during structured processing or created locally as a definition file.

Basic Usage

invoice.schema.json Validation
import json
from pydantic import ValidationError

from rdetoolkit.validation import InvoiceValidator
from rdetoolkit.exceptions import InvoiceSchemaValidationError

# Schema definition
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://rde.nims.go.jp/rde/dataset-templates/dataset_template_custom_sample/invoice.schema.json",
    "description": "RDE dataset template sample custom information invoice",
    "type": "object",
    "required": ["custom", "sample"],
    "properties": {
        "custom": {
            "type": "object",
            "label": {"ja": "固有情報", "en": "Custom Information"},
            "required": ["sample1"],
            "properties": {
                "sample1": {
                    "label": {"ja": "サンプル1", "en": "sample1"},
                    "type": "string",
                    "format": "date",
                    "options": {"unit": "A"}
                },
                "sample2": {
                    "label": {"ja": "サンプル2", "en": "sample2"},
                    "type": "number",
                    "options": {"unit": "b"}
                },
            },
        },
        "sample": {
            "type": "object",
            "label": {"ja": "試料情報", "en": "Sample Information"},
            "properties": {
                "generalAttributes": {
                    "type": "array",
                    "items": [
                        {
                            "type": "object",
                            "required": ["termId"],
                            "properties": {
                                "termId": {
                                    "const": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e"
                                }
                            }
                        }
                    ],
                },
                "specificAttributes": {"type": "array", "items": []},
            },
        },
    },
}

# Data example
data = {
    "datasetId": "1s1199df4-0d1v-41b0-1dea-23bf4dh09g12",
    "basic": {
        "dateSubmitted": "",
        "dataOwnerId": "0c233ef274f28e611de4074638b4dc43e737ab993132343532343430",
        "dataName": "test-dataset",
        "instrumentId": None,
        "experimentId": None,
        "description": None,
    },
    "custom": {"sample1": "2023-01-01", "sample2": 1.0},
    "sample": {
        "sampleId": "",
        "names": ["test"],
        "composition": None,
        "referenceUrl": None,
        "description": None,
        "generalAttributes": [
            {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": None}
        ],
        "specificAttributes": [],
        "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439",
    },
}

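# Save schema file (the "temp" directory must already exist).
# UTF-8 is set explicitly because ensure_ascii=False writes the Japanese labels as-is.
with open("temp/invoice.schema.json", "w", encoding="utf-8") as f:
    json.dump(schema, f, ensure_ascii=False, indent=2)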

# Execute validation
validator = InvoiceValidator("temp/invoice.schema.json")
try:
    validator.validate(obj=data)
    print("Validation successful")
except ValidationError as validation_error:
    raise InvoiceSchemaValidationError from validation_error

Handling Validation Errors

When invoice.schema.json validation errors occur, pydantic_core._pydantic_core.ValidationError is raised.

Reading Error Messages

Error messages display the following information:

  • Field causing the error
  • Error type
  • Error message

Error Example
1. Field: required.0
   Type: literal_error
   Context: Input should be 'custom' or 'sample'

In this example, the first entry of the required array (required.0) is not an allowed value: each entry must be either custom or sample.
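
To print errors in the same Field / Type / Context layout yourself, one option is to format the entries returned by pydantic's ValidationError.errors(), each of which is a dict with the keys loc, type, and msg. The sketch below is an illustration only: format_errors is a hypothetical helper, not part of RDEToolKit, and it takes plain dicts so it can be tried without triggering a real validation error.

```python
from typing import Any


def format_errors(errors: list[dict[str, Any]]) -> str:
    """Render error entries in a Field / Type / Context layout.

    Each entry is assumed to carry the keys "loc", "type" and "msg",
    as in the list returned by pydantic's ValidationError.errors().
    """
    blocks = []
    for number, err in enumerate(errors, start=1):
        # "loc" is a tuple such as ("required", 0); join it as "required.0".
        field = ".".join(str(part) for part in err["loc"])
        prefix = f"{number}. "
        indent = " " * len(prefix)
        blocks.append(
            f"{prefix}Field: {field}\n"
            f"{indent}Type: {err['type']}\n"
            f"{indent}Context: {err['msg']}"
        )
    return "\n".join(blocks)


# Example input shaped like one item of ValidationError.errors()
sample_errors = [
    {
        "loc": ("required", 0),
        "type": "literal_error",
        "msg": "Input should be 'custom' or 'sample'",
    },
]
print(format_errors(sample_errors))
```

In an except ValidationError as e: block, calling format_errors(e.errors()) would produce the same layout for real errors.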

Common Errors and Fixes

Error Example:

Problematic Schema
{
    "required": ["custom"], // sample is defined but not included
    "properties": {
        "custom": { /* ... */ },
        "sample": { /* ... */ }
    }
}

Fix:

Corrected Schema
{
    "required": ["custom", "sample"], // Include both
    "properties": {
        "custom": { /* ... */ },
        "sample": { /* ... */ }
    }
}
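
The mismatch shown above can be caught with a small pre-check before the validator runs. The following sketch assumes, based only on this example, that custom and sample should appear in the top-level required array whenever they are defined under properties. The function name missing_required is hypothetical.

```python
def missing_required(schema: dict, expected=("custom", "sample")) -> list[str]:
    """Return expected top-level properties that are defined but not listed in "required"."""
    defined = schema.get("properties", {})
    required = schema.get("required", [])
    return [name for name in expected if name in defined and name not in required]


problematic = {"required": ["custom"], "properties": {"custom": {}, "sample": {}}}
corrected = {"required": ["custom", "sample"], "properties": {"custom": {}, "sample": {}}}

print(missing_required(problematic))  # ['sample']
print(missing_required(corrected))    # []
```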

invoice.json Validation

Overview

invoice.json validation requires the corresponding invoice.schema.json. It checks data integrity according to constraints defined in the schema.

Basic Usage

invoice.json Validation
# Using the schema and data from above
validator = InvoiceValidator("temp/invoice.schema.json")
try:
    validator.validate(obj=data)
    print("invoice.json validation successful")
except ValidationError as validation_error:
    print(f"Validation error: {validation_error}")

Sample Information Validation

When developing structured processing in a local environment, you need to prepare invoice.json in advance. Sample information in that file is expected to take one of the following two forms:

1. Adding New Sample Information

In this case, sampleId, names, and ownerId in the sample field are required.

New Sample Information
"sample": {
    "sampleId": "de1132316439",
    "names": ["test"],
    "composition": null,
    "referenceUrl": null,
    "description": null,
    "generalAttributes": [
        {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
    ],
    "specificAttributes": [],
    "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}

2. Referencing Existing Sample Information

In this case, only sampleId in the sample field is required.

Existing Sample Information Reference
"sample": {
    "sampleId": "de1132316439",
    "names": [],
    "composition": null,
    "referenceUrl": null,
    "description": null,
    "generalAttributes": [
        {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
    ],
    "specificAttributes": [],
    "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}

Sample Information Validation Errors

If neither of the above two cases is satisfied, validation errors will occur.

Sample Information Error Example
Error: Error in validating system standard field.
Please correct the following fields in invoice.json
Field: sample
Type: anyOf
Context: {'sampleId': '', 'names': 'test', 'generalAttributes': [...], 'specificAttributes': [], 'ownerId': ''} is not valid under any of the given schemas
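
The two cases can be approximated locally with a small classifier. This sketch follows only the rules stated in this section (non-empty sampleId, names, and ownerId for a new sample; non-empty sampleId alone for a reference to an existing sample). The actual schema may apply further constraints, such as string patterns, so treat it as a quick pre-check rather than a replacement for the validator. The function name classify_sample is hypothetical.

```python
def classify_sample(sample: dict) -> str:
    """Classify a "sample" block as "new", "existing" or "invalid"."""
    sample_id = sample.get("sampleId")
    names = sample.get("names")
    owner_id = sample.get("ownerId")

    has_id = isinstance(sample_id, str) and sample_id != ""
    # names must be a non-empty list; a bare string such as 'test' does not count.
    has_names = isinstance(names, list) and len(names) > 0
    has_owner = isinstance(owner_id, str) and owner_id != ""

    if has_id and has_names and has_owner:
        return "new"
    if has_id:
        return "existing"
    return "invalid"


new_sample = {"sampleId": "de1132316439", "names": ["test"], "ownerId": "de17c7b3"}
existing_sample = {"sampleId": "de1132316439", "names": [], "ownerId": "de17c7b3"}
broken_sample = {"sampleId": "", "names": "test", "ownerId": ""}

for candidate in (new_sample, existing_sample, broken_sample):
    print(classify_sample(candidate))
```

The third input mirrors the error example above: an empty sampleId, a names value that is a string instead of a list, and an empty ownerId.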

Other Validation Errors

When basic items in invoice.json are missing or hold invalid values, a jsonschema validation error occurs.

Basic Information Error Example
Error: Error in validating system standard item in invoice.schema.json.
Please correct the following fields in invoice.json
Field: basic.dataOwnerId
Type: pattern
Context: String does not match expected pattern

metadata-def.json Validation

Overview

metadata-def.json is a file that defines the structure and constraints of metadata. Validation of this file ensures the integrity of metadata schemas.

Basic Usage

metadata-def.json Validation
from pydantic import ValidationError

from rdetoolkit.validation import MetadataValidator

# Metadata definition file validation
metadata_validator = MetadataValidator("path/to/metadata-def.json")
try:
    metadata_validator.validate_schema()
    print("metadata-def.json validation successful")
except ValidationError as e:
    print(f"Metadata definition validation error: {e}")

metadata.json Validation

Overview

metadata.json is the actual metadata file based on the schema defined in metadata-def.json.

Basic Usage

metadata.json Validation
# Metadata file validation (reuses metadata_validator from the metadata-def.json example)
try:
    metadata_validator.validate_data("path/to/metadata.json")
    print("metadata.json validation successful")
except ValidationError as e:
    print(f"Metadata validation error: {e}")

Integrated Validation

Automatic Validation in Workflows

Validation is automatically executed when running RDEToolKit workflows:

Workflow Integrated Validation
from rdetoolkit import workflows

def my_dataset_function(rde):
    # Data processing logic
    rde.set_metadata({"status": "processed"})
    return 0

# Automatic validation is executed during workflow execution
try:
    result = workflows.run(custom_dataset_function=my_dataset_function)
    print("Workflow execution successful")
except Exception as e:
    print(f"Workflow execution error (including validation): {e}")

Best Practices

Validation Strategy During Development

  1. Staged Validation
       • Validate schema files first
       • Validate data files later

  2. Continuous Checking
       • Automatic validation on file changes
       • Validation in CI/CD pipelines

  3. Error Handling
       • Utilize detailed error messages
       • Gradual error correction
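
Staged validation and CI/CD use can be combined in one small runner that executes checks in order, stops at the first failure, and returns an exit code. The sketch below is generic: run_stages is a hypothetical helper, and each stage is any zero-argument callable that raises on failure, for example a lambda wrapping one of the validator calls shown earlier.

```python
from typing import Callable


def run_stages(stages: list[tuple[str, Callable[[], None]]]) -> int:
    """Run checks in order and stop at the first failure.

    Returns 0 if every stage passes, 1 otherwise, so the result
    can be passed straight to sys.exit() in a CI job.
    """
    for name, check in stages:
        try:
            check()
        except Exception as exc:  # a real script would catch the validator's specific exception
            print(f"FAILED {name}: {exc}")
            return 1
        print(f"ok {name}")
    return 0


def always_passes() -> None:
    """Placeholder stage that succeeds."""


# Usage with a placeholder stage; schema checks would come first, data checks after.
exit_code = run_stages([("schema", always_passes), ("data", always_passes)])
print(f"exit code: {exit_code}")
```

Ordering schema stages before data stages means a broken schema is reported once, instead of surfacing as a series of confusing data errors.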

Troubleshooting

Common Issues and Solutions

  1. Schema Syntax Errors
       • Check JSON syntax
       • Verify required fields

  2. Data Type Mismatches
       • Compare with types defined in schema
       • Check default values

  3. Reference Errors
       • Verify file paths
       • Check file existence
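
File existence and JSON syntax, two of the checks listed above, can be done with the standard library before any RDEToolKit validator is involved. The following is a minimal sketch, and check_json_file is a hypothetical helper name. It returns None when the file is readable JSON and a short message otherwise.

```python
import json
from pathlib import Path
from typing import Optional


def check_json_file(path: Path) -> Optional[str]:
    """Return None if the file exists and parses as JSON, else a short message."""
    if not path.is_file():
        return f"{path}: file not found"
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as exc:
        # lineno and colno point at the position where parsing failed.
        return f"{path}: invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Usage: a path that does not exist yields a "file not found" message.
print(check_json_file(Path("no_such_dir") / "invoice.json"))
```

Running this first separates plain syntax or path problems from genuine schema violations, which makes the validator's own error messages easier to interpret.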

Practical Example

Complete Validation Workflow

Complete Validation Example
import json
from pathlib import Path

from pydantic import ValidationError

from rdetoolkit.validation import InvoiceValidator, MetadataValidator
from rdetoolkit.exceptions import InvoiceSchemaValidationError

def validate_all_files(project_dir: Path):
    """Validate all files in the project"""

    # 1. invoice.schema.json validation
    schema_path = project_dir / "tasksupport" / "invoice.schema.json"
    invoice_path = project_dir / "invoice" / "invoice.json"

    try:
        invoice_validator = InvoiceValidator(schema_path)
        print("✓ invoice.schema.json validation successful")

        # 2. invoice.json validation
        with open(invoice_path) as f:
            invoice_data = json.load(f)

        invoice_validator.validate(obj=invoice_data)
        print("✓ invoice.json validation successful")

    except ValidationError as e:
        print(f"✗ Invoice validation error: {e}")
        return False

    # 3. metadata-def.json validation
    metadata_def_path = project_dir / "tasksupport" / "metadata-def.json"
    metadata_path = project_dir / "metadata.json"

    try:
        metadata_validator = MetadataValidator(metadata_def_path)
        metadata_validator.validate_schema()
        print("✓ metadata-def.json validation successful")

        # 4. metadata.json validation
        if metadata_path.exists():
            metadata_validator.validate_data(metadata_path)
            print("✓ metadata.json validation successful")

    except ValidationError as e:
        print(f"✗ Metadata validation error: {e}")
        return False

    print("🎉 All file validation completed")
    return True

# Usage example
project_directory = Path("./my_rde_project")
validate_all_files(project_directory)

Next Steps