Validation Features
Overview
RDEToolKit implements comprehensive validation features to ensure the integrity and quality of RDE-related files. By performing pre-checks during local development, you can prevent errors when registering with RDE.
Prerequisites
- RDEToolKit installation
- Basic understanding of template files
- Python 3.9 or higher
Validation Target Files
Main files subject to validation in RDEToolKit:
- invoice.schema.json: Invoice schema file
- invoice.json: Invoice data file
- metadata-def.json: Metadata definition file
- metadata.json: Metadata file
Important
These files can be modified within structured processing, making pre-validation crucial.
invoice.schema.json Validation
Overview
invoice.schema.json
is a schema file that configures RDE screens. It provides check functionality to verify that necessary fields are defined when modifying during structured processing or creating definition files locally.
Basic Usage
invoice.schema.json Validation |
---|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93 | import json
from pydantic import ValidationError
from rdetoolkit.validation import InvoiceValidator
from rdetoolkit.exceptions import InvoiceSchemaValidationError
# Schema definition
schema = {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://rde.nims.go.jp/rde/dataset-templates/dataset_template_custom_sample/invoice.schema.json",
"description": "RDE dataset template sample custom information invoice",
"type": "object",
"required": ["custom", "sample"],
"properties": {
"custom": {
"type": "object",
"label": {"ja": "固有情報", "en": "Custom Information"},
"required": ["sample1"],
"properties": {
"sample1": {
"label": {"ja": "サンプル1", "en": "sample1"},
"type": "string",
"format": "date",
"options": {"unit": "A"}
},
"sample2": {
"label": {"ja": "サンプル2", "en": "sample2"},
"type": "number",
"options": {"unit": "b"}
},
},
},
"sample": {
"type": "object",
"label": {"ja": "試料情報", "en": "Sample Information"},
"properties": {
"generalAttributes": {
"type": "array",
"items": [
{
"type": "object",
"required": ["termId"],
"properties": {
"termId": {
"const": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e"
}
}
}
],
},
"specificAttributes": {"type": "array", "items": []},
},
},
},
}
# Data example
data = {
"datasetId": "1s1199df4-0d1v-41b0-1dea-23bf4dh09g12",
"basic": {
"dateSubmitted": "",
"dataOwnerId": "0c233ef274f28e611de4074638b4dc43e737ab993132343532343430",
"dataName": "test-dataset",
"instrumentId": None,
"experimentId": None,
"description": None,
},
"custom": {"sample1": "2023-01-01", "sample2": 1.0},
"sample": {
"sampleId": "",
"names": ["test"],
"composition": None,
"referenceUrl": None,
"description": None,
"generalAttributes": [
{"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": None}
],
"specificAttributes": [],
"ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439",
},
}
# Save schema file
with open("temp/invoice.schema.json", "w") as f:
json.dump(schema, f, ensure_ascii=False, indent=2)
# Execute validation
validator = InvoiceValidator("temp/invoice.schema.json")
try:
validator.validate(obj=data)
print("Validation successful")
except ValidationError as validation_error:
raise InvoiceSchemaValidationError from validation_error
|
Handling Validation Errors
When invoice.schema.json
validation errors occur, pydantic_core._pydantic_core.ValidationError
is raised.
Reading Error Messages
Error messages display the following information:
- Field causing the error
- Error type
- Error message
Error Example |
---|
| 1. Field: required.0
Type: literal_error
Context: Input should be 'custom' or 'sample'
|
This example indicates that the required
field must contain custom
or sample
.
Common Errors and Fixes
Error Example:
Problematic Schema |
---|
| {
"required": ["custom"], // sample is defined but not included
"properties": {
"custom": { /* ... */ },
"sample": { /* ... */ }
}
}
|
Fix:
Corrected Schema |
---|
| {
"required": ["custom", "sample"], // Include both
"properties": {
"custom": { /* ... */ },
"sample": { /* ... */ }
}
}
|
invoice.json Validation
Overview
invoice.json
validation requires the corresponding invoice.schema.json
. It checks data integrity according to constraints defined in the schema.
Basic Usage
invoice.json Validation |
---|
| # Using the schema and data from above
validator = InvoiceValidator("temp/invoice.schema.json")
try:
validator.validate(obj=data)
print("invoice.json validation successful")
except ValidationError as validation_error:
print(f"Validation error: {validation_error}")
|
When developing structured processing in a local environment, you need to prepare invoice.json
(invoice) in advance. When defining sample information, the following two cases are expected:
In this case, sampleId
, names
, and ownerId
in the sample
field are required.
New Sample Information |
---|
1
2
3
4
5
6
7
8
9
10
11
12 | "sample": {
"sampleId": "de1132316439",
"names": ["test"],
"composition": null,
"referenceUrl": null,
"description": null,
"generalAttributes": [
{"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
],
"specificAttributes": [],
"ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}
|
In this case, only sampleId
in the sample
field is required.
Existing Sample Information Reference |
---|
1
2
3
4
5
6
7
8
9
10
11
12 | "sample": {
"sampleId": "de1132316439",
"names": [],
"composition": null,
"referenceUrl": null,
"description": null,
"generalAttributes": [
{"termId": "3adf9874-7bcb-e5f8-99cb-3d6fd9d7b55e", "value": null}
],
"specificAttributes": [],
"ownerId": "de17c7b3f0ff5126831c2d519f481055ba466ddb6238666132316439"
}
|
If neither of the above two cases is satisfied, validation errors will occur.
Sample Information Error Example |
---|
| Error: Error in validating system standard field.
Please correct the following fields in invoice.json
Field: sample
Type: anyOf
Context: {'sampleId': '', 'names': 'test', 'generalAttributes': [...], 'specificAttributes': [], 'ownerId': ''} is not valid under any of the given schemas
|
Other Validation Errors
When there are deficiencies or invalid values in the basic
items of invoice.json
, jsonschema
validation errors occur.
Basic Information Error Example |
---|
| Error: Error in validating system standard item in invoice.schema.json.
Please correct the following fields in invoice.json
Field: basic.dataOwnerId
Type: pattern
Context: String does not match expected pattern
|
Overview
metadata-def.json
is a file that defines the structure and constraints of metadata. Validation of this file ensures the integrity of metadata schemas.
Basic Usage
metadata-def.json Validation |
---|
| from rdetoolkit.validation import MetadataValidator
# Metadata definition file validation
metadata_validator = MetadataValidator("path/to/metadata-def.json")
try:
metadata_validator.validate_schema()
print("metadata-def.json validation successful")
except ValidationError as e:
print(f"Metadata definition validation error: {e}")
|
Overview
metadata.json
is the actual metadata file based on the schema defined in metadata-def.json
.
Basic Usage
metadata.json Validation |
---|
| # Metadata file validation
try:
metadata_validator.validate_data("path/to/metadata.json")
print("metadata.json validation successful")
except ValidationError as e:
print(f"Metadata validation error: {e}")
|
Integrated Validation
Automatic Validation in Workflows
Validation is automatically executed when running RDEToolKit workflows:
Workflow Integrated Validation |
---|
1
2
3
4
5
6
7
8
9
10
11
12
13 | from rdetoolkit import workflows
def my_dataset_function(rde):
# Data processing logic
rde.set_metadata({"status": "processed"})
return 0
# Automatic validation is executed during workflow execution
try:
result = workflows.run(my_dataset_function)
print("Workflow execution successful")
except Exception as e:
print(f"Workflow execution error (including validation): {e}")
|
Best Practices
Validation Strategy During Development
- Staged Validation
- Validate schema files first
-
Validate data files later
-
Continuous Checking
- Automatic validation on file changes
-
Validation in CI/CD pipelines
-
Error Handling
- Utilize detailed error messages
- Gradual error correction
Troubleshooting
Common Issues and Solutions
- Schema Syntax Errors
- Check JSON syntax
-
Verify required fields
-
Data Type Mismatches
- Compare with types defined in schema
-
Check default values
-
Reference Errors
- Verify file paths
- Check file existence
Practical Example
Complete Validation Workflow
Complete Validation Example |
---|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51 | import json
from pathlib import Path
from rdetoolkit.validation import InvoiceValidator, MetadataValidator
from rdetoolkit.exceptions import InvoiceSchemaValidationError
def validate_all_files(project_dir: Path):
"""Validate all files in the project"""
# 1. invoice.schema.json validation
schema_path = project_dir / "tasksupport" / "invoice.schema.json"
invoice_path = project_dir / "invoice" / "invoice.json"
try:
invoice_validator = InvoiceValidator(schema_path)
print("✓ invoice.schema.json validation successful")
# 2. invoice.json validation
with open(invoice_path) as f:
invoice_data = json.load(f)
invoice_validator.validate(obj=invoice_data)
print("✓ invoice.json validation successful")
except ValidationError as e:
print(f"✗ Invoice validation error: {e}")
return False
# 3. metadata-def.json validation
metadata_def_path = project_dir / "tasksupport" / "metadata-def.json"
metadata_path = project_dir / "metadata.json"
try:
metadata_validator = MetadataValidator(metadata_def_path)
metadata_validator.validate_schema()
print("✓ metadata-def.json validation successful")
# 4. metadata.json validation
if metadata_path.exists():
metadata_validator.validate_data(metadata_path)
print("✓ metadata.json validation successful")
except ValidationError as e:
print(f"✗ Metadata validation error: {e}")
return False
print("🎉 All file validation completed")
return True
# Usage example
project_directory = Path("./my_rde_project")
validate_all_files(project_directory)
|
Next Steps