Compressed Controller API
Purpose
This module implements compressed-file processing in RDEToolKit. It provides extraction, validation, and information retrieval for compressed files, along with temporary file management.
Key Features
Compressed File Processing
- Support for various compression formats including ZIP, TAR, GZ
- Compressed file extraction and validation
- Proper handling of Japanese file names
File Management
- Temporary directory management
- Organization of extracted files
- Cleanup processing
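As a rough illustration of the extract, organize, and clean-up lifecycle listed above, the following sketch uses only the Python standard library (`zipfile`, `tempfile`, `pathlib`). It is not RDEToolKit's implementation; `extract_to_temp` and the sample archive are hypothetical names introduced for this example.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_to_temp(archive: Path) -> list[str]:
    """Extract a ZIP into a temporary directory and return the relative file names.

    The temporary directory is deleted automatically when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp_path)
        # Collect names while the directory still exists; cleanup follows the return.
        return sorted(
            p.relative_to(tmp_path).as_posix()
            for p in tmp_path.rglob("*")
            if p.is_file()
        )


# Build a small sample archive (hypothetical content) and run the helper on it.
with tempfile.TemporaryDirectory() as work:
    sample = Path(work) / "sample.zip"
    with zipfile.ZipFile(sample, "w") as zf:
        zf.writestr("data/a.txt", "alpha")
        zf.writestr("data/b.txt", "beta")
    names = extract_to_temp(sample)

print(names)  # ['data/a.txt', 'data/b.txt']
```

Using `tempfile.TemporaryDirectory` as a context manager ties cleanup to scope exit, so the extracted files are removed even if an exception is raised during processing.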
src.rdetoolkit.impl.compressed_controller.CompressedFlatFileParser(xlsx_invoice)
Bases: ICompressedFileStructParser
Parser for compressed flat files, providing functionality to read and extract the contents.
This parser handles compressed archives whose files are stored flat (without subfolders). It extracts the files and ensures they match the expected structure described in an Excelinvoice.
Attributes:
Name | Type | Description |
---|---|---|
xlsx_invoice | DataFrame | DataFrame representing the expected structure or content description of the compressed files. |
xlsx_invoice: Incomplete = xlsx_invoice (instance attribute)
read(zipfile, target_path)
Extracts the contents of the zipfile to the target path and checks their existence against the Excelinvoice.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
zipfile | Path | Path to the compressed flat file to be read. | required |
target_path | Path | Destination directory where the zipfile will be extracted to. | required |
Returns:
Type | Description |
---|---|
list[tuple[Path, ...]] | A list of tuples containing file paths. Each tuple represents files from the compressed archive that matched the xlsx_invoice structure. |
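The extract-and-match step described for `read` can be illustrated without the toolkit. In this standard-library sketch, `read_flat` is a hypothetical helper, and the plain list `expected_names` stands in for the file names that the Excelinvoice DataFrame would supply. Returning one single-element tuple per matched file is an assumption made for the example, not a statement about the real return shape.

```python
import tempfile
import zipfile
from pathlib import Path


def read_flat(
    archive: Path, target_path: Path, expected_names: list[str]
) -> list[tuple[Path, ...]]:
    """Extract `archive` into `target_path` and keep only files named in `expected_names`."""
    target_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_path)
    # Index extracted files by base name (a flat archive has no name clashes).
    extracted = {p.name: p for p in target_path.rglob("*") if p.is_file()}
    # Keep the order of the expected list; silently skip names that are absent.
    return [(extracted[name],) for name in expected_names if name in extracted]


with tempfile.TemporaryDirectory() as work:
    work_dir = Path(work)
    archive = work_dir / "flat.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("sample1.txt", "a")
        zf.writestr("sample2.txt", "b")
        zf.writestr("unlisted.txt", "c")
    matched = read_flat(
        archive, work_dir / "out", ["sample1.txt", "sample2.txt", "missing.txt"]
    )
    matched_names = [tuple(p.name for p in group) for group in matched]

print(matched_names)  # [('sample1.txt',), ('sample2.txt',)]
```

The sketch skips expected names that are missing from the archive; whether the real parser skips them or reports an error is not stated in this reference.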
src.rdetoolkit.impl.compressed_controller.CompressedFolderParser(xlsx_invoice)
Bases: ICompressedFileStructParser
Parser for compressed folders, extracting contents and ensuring they match an expected structure.
This parser is specifically designed for compressed folders. It extracts the content and verifies against a provided xlsx invoice structure.
Attributes:
Name | Type | Description |
---|---|---|
xlsx_invoice | DataFrame | DataFrame representing the expected structure or content description of the compressed folder contents. |
xlsx_invoice: Incomplete = xlsx_invoice (instance attribute)
read(zipfile, target_path)
Extracts the contents of the zipfile and returns validated file paths.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
zipfile | Path | Path to the compressed folder to be read. | required |
target_path | Path | Destination directory where the zipfile will be extracted. | required |
Returns:
Type | Description |
---|---|
list[tuple[Path, ...]] | A list of tuples containing file paths that have been validated based on unique directory names. |
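To make the `list[tuple[Path, ...]]` return shape concrete, here is a minimal standard-library sketch that extracts an archive and groups the files by their containing folder, one tuple per folder. `read_by_folder` is a hypothetical stand-in, and grouping by immediate parent directory is an assumption; the real parser's grouping rule is not spelled out in this reference.

```python
import tempfile
import zipfile
from pathlib import Path


def read_by_folder(archive: Path, target_path: Path) -> list[tuple[Path, ...]]:
    """Extract `archive` and return one tuple of file paths per containing folder."""
    target_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_path)
    groups: dict[Path, list[Path]] = {}
    for path in sorted(target_path.rglob("*")):
        if path.is_file():
            # Files sharing a parent directory end up in the same group.
            groups.setdefault(path.parent, []).append(path)
    return [tuple(groups[folder]) for folder in sorted(groups)]


with tempfile.TemporaryDirectory() as work:
    work_dir = Path(work)
    archive = work_dir / "folders.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("run1/a.txt", "a")
        zf.writestr("run1/b.txt", "b")
        zf.writestr("run2/c.txt", "c")
    result = read_by_folder(archive, work_dir / "out")
    folder_names = [tuple(p.name for p in group) for group in result]

print(folder_names)  # [('a.txt', 'b.txt'), ('c.txt',)]
```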
validation_uniq_fspath(target_path, exclude_names)
Check if there are any non-unique directory names under the target directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_path | Union[str, Path] | The directory path to scan. | required |
exclude_names | list[str] | File names to exclude from the scan. | required |
Raises:
Type | Description |
---|---|
StructuredError | Raised when duplicate directory names are detected. |
Returns:
Type | Description |
---|---|
dict[str, list[Path]] | The unique directory names, each mapped to the list of files under that directory. |
Note
This function checks for folders whose names differ only in case (e.g., 'folder1' and 'Folder1'). On a Unix-based filesystem, such folders can coexist and be packed into the same zip file. Windows, however, does not allow them to coexist, so unzipping the downloaded archive fails there. It is therefore necessary to check for folders whose names differ only in case.
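The case-only collision described in the note can be detected from path names alone. This sketch is an illustration rather than the toolkit's code: `find_case_collisions` is a hypothetical helper that works on a list of relative member paths and returns the colliding names instead of raising `StructuredError`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_collisions(member_names: list[str]) -> dict[str, list[str]]:
    """Return directory paths that differ only by letter case, keyed by lowercase form."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name in member_names:
        # Every ancestor directory of a member is a candidate for a collision.
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                seen[str(parent).lower()].add(str(parent))
    # A lowercase key with more than one spelling marks a case-only collision.
    return {key: sorted(variants) for key, variants in seen.items() if len(variants) > 1}


members = ["folder1/a.txt", "Folder1/b.txt", "other/c.txt"]
collisions = find_case_collisions(members)
print(collisions)  # {'folder1': ['Folder1', 'folder1']}
```

Working on names rather than on an extracted directory means the check can run on any platform, including those whose filesystem could not hold both folders at once.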
src.rdetoolkit.impl.compressed_controller.parse_compressedfile_mode(xlsx_invoice)
Parses the mode of a compressed file and returns the corresponding parser object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
xlsx_invoice | DataFrame | The invoice data in Excel format. | required |
Returns:
Name | Type | Description |
---|---|---|
ICompressedFileStructParser | ICompressedFileStructParser | An instance of the compressed file structure parser. |
Practical Usage
Basic Compressed File Processing
basic_compressed_processing.py
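A minimal sketch of how a script might drive a parser through the `read(zipfile, target_path)` method documented above. The `StubParser` class, the `process_archive` helper, and the file names are hypothetical, so the example runs without RDEToolKit installed. With the toolkit available, the stub would be replaced by a parser built from an Excelinvoice DataFrame, for example `CompressedFlatFileParser(xlsx_invoice)`.

```python
from pathlib import Path
from typing import Protocol


class SupportsRead(Protocol):
    """Anything exposing the documented read(zipfile, target_path) method."""

    def read(self, zipfile: Path, target_path: Path) -> list[tuple[Path, ...]]: ...


def process_archive(parser: SupportsRead, zipfile: Path, target_path: Path) -> list[str]:
    """Run the parser and summarise each returned tuple as a comma-separated line."""
    groups = parser.read(zipfile, target_path)
    return [", ".join(p.name for p in group) for group in groups]


class StubParser:
    """Hypothetical stand-in for a real parser; it returns fixed paths."""

    def read(self, zipfile: Path, target_path: Path) -> list[tuple[Path, ...]]:
        return [(target_path / "sample1.txt",), (target_path / "sample2.txt",)]


summary = process_archive(StubParser(), Path("data.zip"), Path("out"))
print(summary)  # ['sample1.txt', 'sample2.txt']
```

Because both documented parsers share the same `read` signature, a driver written against that method should work unchanged with either one.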
Folder Structure Compressed File Processing
folder_compressed_processing.py
Compressed File Mode Analysis
compressed_mode_analysis.py