
Validation Features

Overview

RDEToolKit implements comprehensive validation features to ensure the integrity and quality of RDE-related files. By performing pre-checks during local development, you can prevent errors when registering with RDE.

Prerequisites

  • RDEToolKit installation
  • Basic understanding of template files
  • Python 3.9 or higher

Validation Target Files

Main files subject to validation in RDEToolKit:

  • invoice.schema.json: Invoice schema file
  • invoice.json: Invoice data file
  • metadata-def.json: Metadata definition file
  • metadata.json: Metadata file

Important

These files can be modified within structured processing, making pre-validation crucial.
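
As a quick pre-check before running any validator, it can help to confirm which of the four target files are actually present in a project. The following is a minimal sketch using only the standard library; the helper name `find_rde_files` and the recursive search are illustrative assumptions and are not part of RDEToolKit.

```python
from pathlib import Path

# File names taken from the list of validation targets above.
RDE_TARGET_FILES = (
    "invoice.schema.json",
    "invoice.json",
    "metadata-def.json",
    "metadata.json",
)


def find_rde_files(project_dir):
    """Map each target file name to the first matching path, or None.

    Hypothetical helper: it searches recursively because no particular
    directory layout is assumed here.
    """
    root = Path(project_dir)
    found = {}
    for name in RDE_TARGET_FILES:
        matches = sorted(root.rglob(name))
        found[name] = matches[0] if matches else None
    return found
```

A `None` value in the returned mapping marks a file that should be created or located before validation is attempted.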

Related Documentation

About Template Files

invoice.schema.json Validation

Overview

invoice.schema.json is a schema file that configures RDE screens. RDEToolKit provides a check that verifies the necessary fields are defined, whether the file is modified during structured processing or created locally as a definition file.

Basic Usage

invoice.schema.json Validation
import json
from pydantic import ValidationError

from rdetoolkit.validation import InvoiceValidator
from rdetoolkit.exceptions import InvoiceSchemaValidationError

# Schema definition
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://rde.nims.go.jp/rde/dataset-templates/dataset_template_custom_sample/invoice.schema.json",
    "description": "RDE dataset template sample custom information invoice",
    "type": "object",
    "required": ["custom", "sample"],
    "properties": {
        "custom": {
            "type": "object",
            "label": {"ja": "固有情報", "en": "Custom Information"},
            "required": ["sample1"],
            "properties": {
                "sample1": {
                    "label": {"ja": "サンプル1", "en": "sample1"},
                    "type": "string",
                    "format": "date",
                    "options": {"unit": "A"}
                },
                "sample2": {
                    "label": {"ja": "サンプル2", "en": "sample2"},
                    "type": "number",
                    "options": {"unit": "b"}
                },
            },
        },
        "sample": {
            "type": "object",
            "label": {"ja": "試料情報", "en": "Sample Information"},
            "properties": {
                "generalAttributes": {
                    "type": "array",
                    "items": [
                        {
                            "type": "object",
                            "required": ["termId"],
                            "properties": {
                                "termId": {
                                    "const": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e"
                                }
                            }
                        }
                    ],
                },
                "specificAttributes": {"type": "array", "items": []},
            },
        },
    },
}

# Data example
data = {
    "datasetId": "1s1199df4-0d1v-41b0-1dea-23bf4dh09g12",
    "basic": {
        "dateSubmitted": "",
        "dataOwnerId": "0c233ef274f28e611de4074638b4dc43e737ab993132343532343430",
        "dataName": "test-dataset",
        "instrumentId": None,
        "experimentId": None,
        "description": None,
    },
    "custom": {"sample1": "2023-01-01", "sample2": 1.0},
    "sample": {
        "sampleId": "",
        "names": ["test"],
        "composition": None,
        "referenceUrl": None,
        "description": None,
        "generalAttributes": [
            {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": None}
        ],
        "specificAttributes": [],
        "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439",
    },
}

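# Save schema file (the temp/ directory must already exist;
# UTF-8 is set explicitly because the labels contain Japanese text)
with open("temp/invoice.schema.json", "w", encoding="utf-8") as f:
    json.dump(schema, f, ensure_ascii=False, indent=2)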

# Execute validation
validator = InvoiceValidator("temp/invoice.schema.json")
try:
    validator.validate(obj=data)
    print("Validation successful")
except ValidationError as validation_error:
    raise InvoiceSchemaValidationError from validation_error

Handling Validation Errors

When invoice.schema.json validation fails, pydantic_core._pydantic_core.ValidationError is raised.

Reading Error Messages

Error messages display the following information:

  • Field causing the error
  • Error type
  • Error message
Error Example
1. Field: required.0
   Type: literal_error
   Context: Input should be 'custom' or 'sample'

This example indicates that the required field must contain custom or sample.
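
The three pieces of information listed above can be rendered from a list of error dictionaries. The sketch below assumes each dictionary carries the keys `loc`, `type`, and `msg`, which is the shape returned by pydantic's `ValidationError.errors()`. The function name `format_errors` and the mapping of `msg` to the "Context" line are assumptions for illustration, not RDEToolKit's own formatter.

```python
def format_errors(errors):
    """Render error dicts as numbered Field / Type / Context entries.

    Each dict is expected to carry the keys "loc", "type", and "msg".
    """
    lines = []
    for index, error in enumerate(errors, start=1):
        # "loc" is a tuple such as ("required", 0); join it with dots.
        field = ".".join(str(part) for part in error["loc"])
        prefix = f"{index}. "
        indent = " " * len(prefix)
        lines.append(f"{prefix}Field: {field}")
        lines.append(f"{indent}Type: {error['type']}")
        lines.append(f"{indent}Context: {error['msg']}")
    return "\n".join(lines)


# Hand-written sample that mirrors the error example above.
sample_errors = [
    {
        "loc": ("required", 0),
        "type": "literal_error",
        "msg": "Input should be 'custom' or 'sample'",
    }
]
print(format_errors(sample_errors))
```

With a real exception, the same function could be called as `format_errors(validation_error.errors())`.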

Common Errors and Fixes

Error Example:

Problematic Schema
{
    "required": ["custom"], // sample is defined but not included
    "properties": {
        "custom": { /* ... */ },
        "sample": { /* ... */ }
    }
}

Fix:

Corrected Schema
{
    "required": ["custom", "sample"], // Include both
    "properties": {
        "custom": { /* ... */ },
        "sample": { /* ... */ }
    }
}
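
The mismatch shown in the problematic schema can be spotted before running the validator. This is a minimal sketch under the assumption that only the top-level `custom` and `sample` sections matter for this rule; the function name `sections_missing_from_required` is hypothetical and the check is not a substitute for InvoiceValidator.

```python
def sections_missing_from_required(schema):
    """Return top-level sections that are defined but not listed as required.

    Only "custom" and "sample" are considered, following the error
    message in the example above.
    """
    defined = schema.get("properties", {})
    required = schema.get("required", [])
    return [
        name
        for name in ("custom", "sample")
        if name in defined and name not in required
    ]


problematic = {"required": ["custom"], "properties": {"custom": {}, "sample": {}}}
corrected = {"required": ["custom", "sample"], "properties": {"custom": {}, "sample": {}}}

print(sections_missing_from_required(problematic))  # ['sample']
print(sections_missing_from_required(corrected))    # []
```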

invoice.json Validation

Overview

invoice.json validation requires the corresponding invoice.schema.json. It checks data integrity according to constraints defined in the schema.

Basic Usage

invoice.json Validation
# Using the schema and data from above
validator = InvoiceValidator("temp/invoice.schema.json")
try:
    validator.validate(obj=data)
    print("invoice.json validation successful")
except ValidationError as validation_error:
    print(f"Validation error: {validation_error}")

Sample Information Validation

When developing structured processing in a local environment, you need to prepare invoice.json in advance. When defining sample information, the following two cases are expected:

1. Adding New Sample Information

In this case, sampleId, names, and ownerId in the sample field are required.

New Sample Information
"sample": {
    "sampleId": "de1132316439",
    "names": ["test"],
    "composition": null,
    "referenceUrl": null,
    "description": null,
    "generalAttributes": [
        {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
    ],
    "specificAttributes": [],
    "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}

2. Referencing Existing Sample Information

In this case, only sampleId in the sample field is required.

Existing Sample Information Reference
"sample": {
    "sampleId": "de1132316439",
    "names": [],
    "composition": null,
    "referenceUrl": null,
    "description": null,
    "generalAttributes": [
        {"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
    ],
    "specificAttributes": [],
    "ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}

Sample Information Validation Errors

If neither of the above two cases is satisfied, validation errors will occur.

Sample Information Error Example
Error: Error in validating system standard field.
Please correct the following fields in invoice.json
Field: sample
Type: anyOf
Context: {'sampleId': '', 'names': 'test', 'generalAttributes': [...], 'specificAttributes': [], 'ownerId': ''} is not valid under any of the given schemas
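
The two cases can be expressed as a small classification function, which may help when preparing invoice.json by hand. This is a sketch only: it reads "required" as "present and non-empty", which is an assumption, and the function name `classify_sample` is hypothetical. The actual rules are enforced by the validator.

```python
def classify_sample(sample):
    """Return "new", "existing", or None for a sample dict.

    Assumption: a required field counts as satisfied only when it is
    present and non-empty.
    """
    def filled(key):
        return bool(sample.get(key))

    # Case 1: adding new sample information.
    if filled("sampleId") and filled("names") and filled("ownerId"):
        return "new"
    # Case 2: referencing existing sample information.
    if filled("sampleId"):
        return "existing"
    # Neither case applies, so validation would be expected to fail.
    return None


print(classify_sample({"sampleId": "de1132316439", "names": ["test"], "ownerId": "abc"}))
print(classify_sample({"sampleId": "de1132316439", "names": []}))
print(classify_sample({"sampleId": "", "names": "test", "ownerId": ""}))
```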

Other Validation Errors

When basic items in invoice.json are missing or hold invalid values, a jsonschema validation error occurs.

Basic Information Error Example
Error: Error in validating system standard item in invoice.schema.json.
Please correct the following fields in invoice.json
Field: basic.dataOwnerId
Type: pattern
Context: String does not match expected pattern

metadata-def.json Validation

Overview

metadata-def.json is a file that defines the structure and constraints of metadata. Validation of this file ensures the integrity of metadata schemas.

Basic Usage

metadata-def.json Validation
from pydantic import ValidationError

from rdetoolkit.validation import MetadataValidator

# Metadata definition file validation
metadata_validator = MetadataValidator("path/to/metadata-def.json")
try:
    metadata_validator.validate_schema()
    print("metadata-def.json validation successful")
except ValidationError as e:
    print(f"Metadata definition validation error: {e}")

metadata.json Validation

Overview

metadata.json is the actual metadata file based on the schema defined in metadata-def.json.

Basic Usage

metadata.json Validation
# Metadata file validation
try:
    metadata_validator.validate_data("path/to/metadata.json")
    print("metadata.json validation successful")
except ValidationError as e:
    print(f"Metadata validation error: {e}")

Integrated Validation

Automatic Validation in Workflows

Validation is automatically executed when running RDEToolKit workflows:

Workflow Integrated Validation
from rdetoolkit import workflows

def my_dataset_function(rde):
    # Data processing logic
    rde.set_metadata({"status": "processed"})
    return 0

# Automatic validation is executed during workflow execution
try:
    result = workflows.run(my_dataset_function)
    print("Workflow execution successful")
except Exception as e:
    print(f"Workflow execution error (including validation): {e}")

CLI Validation Commands

RDEToolKit provides command-line validation tools with standardized exit codes for CI/CD integration.

Available Validation Commands

The validate command provides five subcommands for comprehensive validation:

  • invoice-schema - Validate invoice schema JSON structure
  • metadata-def - Validate metadata definition JSON structure
  • invoice - Validate invoice.json against its schema
  • metadata - Validate metadata.json against metadata-def.json
  • all - Discover and validate all standard RDE files in a project

Exit Codes

All validation commands use standardized exit codes to enable integration with CI/CD pipelines and automation scripts:

Exit Code | Status | Description
0 | Success | All validations passed successfully
1 | Validation Failure | One or more validations failed due to data or schema issues
2 | Usage Error | Invalid command arguments, missing files, or configuration errors
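
The same exit codes can also be interpreted from Python, for example in a test harness. The sketch below uses only the standard library; the names `EXIT_STATUS`, `describe_exit_code`, and `run_and_describe` are hypothetical, and a stand-in command is used so that the example runs without RDEToolKit installed.

```python
import subprocess
import sys

# Meanings taken from the exit code table above.
EXIT_STATUS = {
    0: "Success",
    1: "Validation Failure",
    2: "Usage Error",
}


def describe_exit_code(code):
    """Translate an exit code into its documented status name."""
    return EXIT_STATUS.get(code, f"Unexpected exit code: {code}")


def run_and_describe(command):
    """Run a command and return (exit_code, status name)."""
    completed = subprocess.run(command, check=False)
    return completed.returncode, describe_exit_code(completed.returncode)


# Stand-in command that exits with 1. In a project, the command would be
# something like ["rdetoolkit", "validate", "all", "./rde-project"].
code, status = run_and_describe([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code, status)
```

Passing `check=False` keeps `subprocess.run` from raising on a non-zero status, so the caller can branch on the code itself.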

Usage in CI/CD Pipelines

The exit codes enable robust error handling in shell scripts and CI/CD pipelines:

CI/CD Validation Script Example
#!/bin/bash
# Example CI/CD validation script

rdetoolkit validate all ./rde-project

EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
    echo "✓ Validation passed - proceeding with deployment"
    exit 0
elif [ $EXIT_CODE -eq 1 ]; then
    echo "✗ Validation failed - check data/schema for errors"
    exit 1
elif [ $EXIT_CODE -eq 2 ]; then
    echo "✗ Configuration error - check command arguments and file paths"
    exit 2
else
    echo "✗ Unexpected exit code: $EXIT_CODE"
    exit $EXIT_CODE
fi

CLI Usage Examples

Validate Invoice Schema (Success - Exit Code 0)

$ rdetoolkit validate invoice-schema tasksupport/invoice.schema.json
✓ VALID: tasksupport/invoice.schema.json

$ echo $?
0

Validate Invoice Data (Validation Failure - Exit Code 1)

$ rdetoolkit validate invoice raw/invoice.json --schema tasksupport/invoice.schema.json
✗ INVALID: raw/invoice.json

Errors:
  1. Field: basic.dataOwnerId
     Type: pattern
     Message: String does not match pattern ^[a-zA-Z0-9]{56}$

$ echo $?
1
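
The pattern reported in the error message can be checked locally before invoking the CLI. This is a minimal sketch that copies the regular expression from the output above; the function name `is_valid_data_owner_id` is hypothetical, and the authoritative pattern is whatever the validator reports.

```python
import re

# Pattern copied from the CLI error message above.
DATA_OWNER_ID_PATTERN = re.compile(r"^[a-zA-Z0-9]{56}$")


def is_valid_data_owner_id(value):
    """Return True if value is a string of exactly 56 ASCII alphanumerics."""
    return isinstance(value, str) and DATA_OWNER_ID_PATTERN.fullmatch(value) is not None


# Example value taken from the invoice data earlier in this document.
print(is_valid_data_owner_id("0c233ef274f28e611de4074638b4dc43e737ab993132343532343430"))
# A value that is too short does not match.
print(is_valid_data_owner_id("abc123"))
```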

Missing File (Usage Error - Exit Code 2)

$ rdetoolkit validate invoice-schema /nonexistent/schema.json
Error: Schema file not found: /nonexistent/schema.json

$ echo $?
2

Validate All Files in Project

$ rdetoolkit validate all ./rde-project
✓ Validating invoice schema...
✓ Validating metadata definition...
✓ Validating invoice data...
✓ Validating metadata...
✓ All validations passed

$ echo $?
0

CI/CD Integration Examples

GitHub Actions Example

GitHub Actions Workflow
name: RDE Validation

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install rdetoolkit
        run: pip install rdetoolkit

      - name: Validate RDE Project
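        run: |
          # Capture the exit code once: the default shell exits on the first
          # failing command, and $? is overwritten by each [ ... ] test.
          exit_code=0
          rdetoolkit validate all ./rde-project || exit_code=$?
          if [ "$exit_code" -eq 1 ]; then
            echo "::error::RDE validation failed - check data and schema"
            exit 1
          elif [ "$exit_code" -eq 2 ]; then
            echo "::error::RDE validation configuration error"
            exit 2
          fi
          exit "$exit_code"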

GitLab CI Example

GitLab CI Configuration
validate:
  stage: test
  image: python:3.12
  script:
    - pip install rdetoolkit
    - rdetoolkit validate all ./rde-project
  allow_failure: false
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

Shell Script Example

Standalone Validation Script
#!/bin/bash
set -e

validate_rde_project() {
    local project_dir="$1"

    echo "Validating RDE project: $project_dir"
    # Capture the exit code without letting `set -e` abort on a non-zero status
    local exit_code=0
    rdetoolkit validate all "$project_dir" || exit_code=$?
    case $exit_code in
        0)
            echo "✓ Validation successful"
            return 0
            ;;
        1)
            echo "✗ Validation failed - data or schema errors"
            return 1
            ;;
        2)
            echo "✗ Configuration error - check arguments and files"
            return 2
            ;;
        *)
            echo "✗ Unexpected exit code: $exit_code"
            return $exit_code
            ;;
    esac
}

validate_rde_project "./rde-project"

Best Practices

Validation Strategy During Development

  1. Staged Validation
       • Validate schema files first
       • Validate data files later

  2. Continuous Checking
       • Automatic validation on file changes
       • Validation in CI/CD pipelines using exit codes

  3. Error Handling
       • Utilize detailed error messages
       • Gradual error correction
       • Distinguish between validation failures (exit 1) and configuration errors (exit 2)
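
The staged strategy above, with schema files checked first and data files afterwards, can be sketched as a small runner that stops at the first failing stage. The function name `run_staged` and the placeholder checks are assumptions for illustration; real checks would call the validators or the CLI.

```python
def run_staged(stages):
    """Run (name, check) pairs in order and stop at the first failure.

    Each check is a zero-argument callable returning True on success.
    Returns the names of the stages that passed and the name of the
    failing stage (None if every stage passed).
    """
    passed = []
    for name, check in stages:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Placeholder checks: schema files come first, data files later.
stages = [
    ("invoice.schema.json", lambda: True),
    ("metadata-def.json", lambda: True),
    ("invoice.json", lambda: False),  # simulated failure
    ("metadata.json", lambda: True),  # not reached
]
passed, failed = run_staged(stages)
print(passed, failed)
```

Stopping early keeps the error output focused: a data file is not reported as invalid when the real problem is the schema it depends on.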

Troubleshooting

Common Issues and Solutions

  1. Schema Syntax Errors
       • Check JSON syntax
       • Verify required fields

  2. Data Type Mismatches
       • Compare with types defined in schema
       • Check default values

  3. Reference Errors
       • Verify file paths
       • Check file existence
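
Two of the checks above, file existence and JSON syntax, can be automated with the standard library before any schema validation runs. This is a minimal sketch; the function name `check_json_file` and its message format are hypothetical.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return None if the file exists and parses as JSON, else a message."""
    file_path = Path(path)
    if not file_path.is_file():
        return f"File not found: {file_path}"
    try:
        json.loads(file_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as error:
        # lineno and colno point at the position where parsing failed.
        return f"JSON syntax error at line {error.lineno}, column {error.colno}: {error.msg}"
    return None
```

A call such as `check_json_file("tasksupport/invoice.schema.json")` returns `None` when the file is readable JSON, so any non-`None` result can be printed directly as a troubleshooting hint.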

Practical Example

Complete Validation Workflow

Complete Validation Example
import json
from pathlib import Path

from pydantic import ValidationError

from rdetoolkit.validation import InvoiceValidator, MetadataValidator

def validate_all_files(project_dir: Path):
    """Validate all files in the project"""

    # 1. invoice.schema.json validation
    schema_path = project_dir / "tasksupport" / "invoice.schema.json"
    invoice_path = project_dir / "invoice" / "invoice.json"

    try:
        invoice_validator = InvoiceValidator(schema_path)
        print("✓ invoice.schema.json validation successful")

        # 2. invoice.json validation
        with open(invoice_path) as f:
            invoice_data = json.load(f)

        invoice_validator.validate(obj=invoice_data)
        print("✓ invoice.json validation successful")

    except ValidationError as e:
        print(f"✗ Invoice validation error: {e}")
        return False

    # 3. metadata-def.json validation
    metadata_def_path = project_dir / "tasksupport" / "metadata-def.json"
    metadata_path = project_dir / "metadata.json"

    try:
        metadata_validator = MetadataValidator(metadata_def_path)
        metadata_validator.validate_schema()
        print("✓ metadata-def.json validation successful")

        # 4. metadata.json validation
        if metadata_path.exists():
            metadata_validator.validate_data(metadata_path)
            print("✓ metadata.json validation successful")

    except ValidationError as e:
        print(f"✗ Metadata validation error: {e}")
        return False

    print("🎉 All file validation completed")
    return True

# Usage example
project_directory = Path("./my_rde_project")
validate_all_files(project_directory)

Next Steps