Compressed Controller Module

The rdetoolkit.impl.compressed_controller module handles compressed files and archive creation. It provides parsers for different compressed file structures, archive creation utilities, and encoding-aware extraction designed for RDE (Research Data Exchange) workflows.

Overview

The compressed controller module offers specialized archive management capabilities:

  • Compressed File Parsing: Extract and validate contents from ZIP files with different structural modes
  • Archive Creation: Create ZIP and TAR.GZ archives with customizable exclusion patterns
  • Encoding Handling: Robust filename encoding detection and correction during extraction
  • Structure Validation: Ensure extracted files match expected structures and avoid naming conflicts

Classes

CompressedFlatFileParser

Parser for compressed flat files that validates extracted contents against an Excel invoice structure.

Constructor

CompressedFlatFileParser(xlsx_invoice: pd.DataFrame)

Parameters:

  • xlsx_invoice (pd.DataFrame): DataFrame representing the expected structure or content description of the compressed files

Attributes

  • xlsx_invoice (pd.DataFrame): The Excel invoice DataFrame used for validation

Methods

read(zipfile, target_path)

Extract ZIP file contents and validate against the Excel invoice structure.

def read(zipfile: Path, target_path: Path) -> list[tuple[Path, ...]]

Parameters:

  • zipfile (Path): Path to the compressed flat file to be read
  • target_path (Path): Destination directory where the zipfile will be extracted

Returns:

  • list[tuple[Path, ...]]: List of tuples containing file paths that matched the xlsx_invoice structure

Example:

import pandas as pd
from pathlib import Path
from rdetoolkit.impl.compressed_controller import CompressedFlatFileParser

# Create invoice DataFrame
invoice_df = pd.DataFrame({'data_file_names/name': ['file1.txt', 'file2.csv']})

# Parse compressed flat file
parser = CompressedFlatFileParser(invoice_df)
matched_files = parser.read(
    zipfile=Path("data.zip"),
    target_path=Path("extracted")
)

CompressedFolderParser

Parser for compressed folders that extracts contents and validates unique directory structures.

Constructor

CompressedFolderParser(xlsx_invoice: pd.DataFrame)

Parameters:

  • xlsx_invoice (pd.DataFrame): DataFrame representing the expected structure or content description of the compressed folder contents

Attributes

  • xlsx_invoice (pd.DataFrame): The Excel invoice DataFrame used for validation

Methods

read(zipfile, target_path)

Extract ZIP file contents and return validated file paths based on unique directory names.

def read(zipfile: Path, target_path: Path) -> list[tuple[Path, ...]]

Parameters:

  • zipfile (Path): Path to the compressed folder to be read
  • target_path (Path): Destination directory where the zipfile will be extracted

Returns:

  • list[tuple[Path, ...]]: List of tuples containing file paths validated based on unique directory names

Example:

from pathlib import Path
from rdetoolkit.impl.compressed_controller import CompressedFolderParser

# Parse compressed folder (invoice_df as defined in the previous example)
parser = CompressedFolderParser(invoice_df)
folder_files = parser.read(
    zipfile=Path("folder_archive.zip"),
    target_path=Path("extracted_folders")
)

validation_uniq_fspath(target_path, exclude_names)

Check that directory names are unique, detecting duplicates that differ only in case.

def validation_uniq_fspath(target_path: str | Path, exclude_names: list[str]) -> dict[str, list[Path]]

Parameters:

  • target_path (str | Path): The directory path to scan
  • exclude_names (list[str]): List of filenames to exclude from validation

Returns:

  • dict[str, list[Path]]: Dictionary mapping unique directory names to lists of files

Raises:

  • StructuredError: When duplicate directory names are detected (case-insensitive)

Example:

# Validate unique folder structure (using the parser from the previous example)
unique_folders = parser.validation_uniq_fspath(
    target_path="extracted_data",
    exclude_names=["invoice_org.json", ".DS_Store"]
)

ZipArtifactPackageCompressor

Archive compressor for creating ZIP files with customizable exclusion patterns and case-insensitive duplicate detection.

Constructor

ZipArtifactPackageCompressor(source_dir: str | Path, exclude_patterns: list[str])

Parameters:

  • source_dir (str | Path): Source directory to be archived
  • exclude_patterns (list[str]): List of regex patterns for files/directories to exclude

Attributes

  • source_dir (Path): The source directory path
  • exclude_patterns (list[str]): List of exclusion patterns (property with getter/setter)
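
Since exclude_patterns is exposed as a property with a getter and setter, the pattern list can be inspected or replaced between archive runs without rebuilding the compressor. A minimal sketch (paths are illustrative):

from rdetoolkit.impl.compressed_controller import ZipArtifactPackageCompressor

compressor = ZipArtifactPackageCompressor("project_files", [r"\.git"])

# Read the current patterns via the property getter
print(compressor.exclude_patterns)

# Replace the pattern list via the property setter before the next run
compressor.exclude_patterns = [r"\.git", r"__pycache__"]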

Methods

archive(output_zip)

Create a ZIP archive from the source directory.

def archive(output_zip: str | Path) -> list[Path]

Parameters:

  • output_zip (str | Path): Path to the output ZIP file

Returns:

  • list[Path]: List of top-level directories included in the archive

Raises:

  • StructuredError: When case-insensitive duplicate paths are detected

Example:

from pathlib import Path
from rdetoolkit.impl.compressed_controller import ZipArtifactPackageCompressor

# Create ZIP archive
compressor = ZipArtifactPackageCompressor(
    source_dir="project_files",
    exclude_patterns=[r"\.git", r"__pycache__", r"\.pyc$"]
)
included_dirs = compressor.archive("project_archive.zip")

TarGzArtifactPackageCompressor

Archive compressor for creating TAR.GZ files with customizable exclusion patterns.

Constructor

TarGzArtifactPackageCompressor(source_dir: str | Path, exclude_patterns: list[str])

Parameters:

  • source_dir (str | Path): Source directory to be archived
  • exclude_patterns (list[str]): List of regex patterns for files/directories to exclude

Attributes

  • source_dir (Path): The source directory path
  • exclude_patterns (list[str]): List of exclusion patterns (property with getter/setter)

Methods

archive(output_tar)

Create a TAR.GZ archive from the source directory.

def archive(output_tar: str | Path) -> list[Path]

Parameters:

  • output_tar (str | Path): Path to the output TAR.GZ file

Returns:

  • list[Path]: List of top-level directories included in the archive

Raises:

  • StructuredError: When case-insensitive duplicate paths are detected

Example:

from rdetoolkit.impl.compressed_controller import TarGzArtifactPackageCompressor

# Create TAR.GZ archive
compressor = TarGzArtifactPackageCompressor(
    source_dir="project_files",
    exclude_patterns=[r"\.git", r"node_modules", r"\.log$"]
)
included_dirs = compressor.archive("project_archive.tar.gz")

Functions

parse_compressedfile_mode

Factory function to determine the appropriate parser based on Excel invoice structure.

def parse_compressedfile_mode(xlsx_invoice: pd.DataFrame) -> ICompressedFileStructParser

Parameters:

  • xlsx_invoice (pd.DataFrame): The invoice data in Excel format

Returns:

  • ICompressedFileStructParser: Instance of either CompressedFlatFileParser or CompressedFolderParser

Example:

import pandas as pd
from rdetoolkit.impl.compressed_controller import parse_compressedfile_mode

# Auto-detect parser type
invoice_df = pd.DataFrame({'data_file_names/name': ['file1.txt']})
parser = parse_compressedfile_mode(invoice_df)  # Returns CompressedFlatFileParser

folder_invoice_df = pd.DataFrame({'folder_structure': ['data']})
folder_parser = parse_compressedfile_mode(folder_invoice_df)  # Returns CompressedFolderParser

get_artifact_archiver

Factory function to get the appropriate archiver based on the specified format.

def get_artifact_archiver(fmt: str, source_dir: str | Path, exclude_patterns: list[str]) -> IArtifactPackageCompressor

Parameters:

  • fmt (str): The format of the archive ('zip', 'tar.gz', 'targz', 'tgz')
  • source_dir (str | Path): The source directory to be archived
  • exclude_patterns (list[str]): List of patterns to exclude

Returns:

  • IArtifactPackageCompressor: Instance of the appropriate archiver class

Raises:

  • ValueError: If the format is not supported

Example:

from rdetoolkit.impl.compressed_controller import get_artifact_archiver

# Get ZIP archiver
zip_archiver = get_artifact_archiver(
    fmt="zip",
    source_dir="my_project",
    exclude_patterns=[r"\.git", r"__pycache__"]
)

# Get TAR.GZ archiver
tar_archiver = get_artifact_archiver(
    fmt="tar.gz",
    source_dir="my_project",
    exclude_patterns=[r"\.git", r"__pycache__"]
)

Complete Usage Examples

Basic Archive Creation

from pathlib import Path
from rdetoolkit.impl.compressed_controller import get_artifact_archiver

def create_project_archive(project_dir: str, output_file: str, format_type: str = "zip"):
    """Create a project archive with standard exclusions."""

    # Standard exclusion patterns
    exclude_patterns = [
        r"\.git",           # Git repository
        r"__pycache__",     # Python cache
        r"\.pyc$",          # Python compiled files
        r"node_modules",    # Node.js modules
        r"\.env$",          # Environment files
        r"\.log$",          # Log files
        r"\.tmp$",          # Temporary files
        r"\.DS_Store$",     # macOS metadata
    ]

    try:
        # Get appropriate archiver
        archiver = get_artifact_archiver(
            fmt=format_type,
            source_dir=project_dir,
            exclude_patterns=exclude_patterns
        )

        # Create archive (archive() returns the top-level directories included)
        included_dirs = archiver.archive(output_file)

        print(f"✅ Archive created: {output_file}")
        print(f"📁 Top-level directories included: {len(included_dirs)}")

        return included_dirs

    except Exception as e:
        print(f"❌ Archive creation failed: {e}")
        raise

# Usage
included = create_project_archive(
    project_dir="my_python_project",
    output_file="backup.zip",
    format_type="zip"
)

Advanced Archive with Custom Exclusions

from pathlib import Path
from rdetoolkit.impl.compressed_controller import ZipArtifactPackageCompressor

class CustomArchiver:
    """Custom archiver with advanced exclusion logic."""

    def __init__(self, source_dir: str):
        self.source_dir = Path(source_dir)
        self.base_exclusions = [
            r"\.git",
            r"__pycache__",
            r"\.pyc$",
            r"\.pyo$",
        ]

    def create_source_archive(self, output_path: str) -> list[Path]:
        """Create archive with source code only."""

        source_exclusions = self.base_exclusions + [
            r"build",
            r"dist",
            r"\.egg-info",
            r"coverage",
            r"\.pytest_cache",
            r"\.mypy_cache",
        ]

        compressor = ZipArtifactPackageCompressor(
            source_dir=self.source_dir,
            exclude_patterns=source_exclusions
        )

        return compressor.archive(output_path)

    def create_deployment_archive(self, output_path: str) -> list[Path]:
        """Create archive for deployment (includes dependencies)."""

        deployment_exclusions = self.base_exclusions + [
            r"tests?",          # Test directories
            r"\.pytest_cache",
            r"\.coverage",
            r"docs?",           # Documentation
            r"examples?",       # Example files
            r"\.md$",           # Markdown files
        ]

        compressor = ZipArtifactPackageCompressor(
            source_dir=self.source_dir,
            exclude_patterns=deployment_exclusions
        )

        return compressor.archive(output_path)

    def create_full_backup(self, output_path: str) -> list[Path]:
        """Create complete backup archive."""

        minimal_exclusions = [
            r"\.git/objects",   # Exclude large git objects only
            r"__pycache__",
            r"\.pyc$",
        ]

        compressor = ZipArtifactPackageCompressor(
            source_dir=self.source_dir,
            exclude_patterns=minimal_exclusions
        )

        return compressor.archive(output_path)

# Usage
archiver = CustomArchiver("my_project")

# Create different types of archives
source_files = archiver.create_source_archive("source_code.zip")
deployment_files = archiver.create_deployment_archive("deployment.zip")
backup_files = archiver.create_full_backup("full_backup.zip")

print(f"Source archive: {len(source_files)} files")
print(f"Deployment archive: {len(deployment_files)} files")
print(f"Full backup: {len(backup_files)} files")

Compressed File Extraction and Validation

import pandas as pd
from pathlib import Path
from rdetoolkit.impl.compressed_controller import parse_compressedfile_mode

def extract_and_validate_archive(archive_path: str, invoice_file: str, extract_dir: str):
    """Extract archive and validate contents against invoice."""

    archive_path = Path(archive_path)
    extract_path = Path(extract_dir)

    # Ensure extraction directory exists
    extract_path.mkdir(parents=True, exist_ok=True)

    try:
        # Load invoice DataFrame
        invoice_df = pd.read_excel(invoice_file)

        # Get appropriate parser
        parser = parse_compressedfile_mode(invoice_df)

        # Extract and validate
        validated_files = parser.read(archive_path, extract_path)

        print(f"✅ Archive extracted successfully")
        print(f"📁 Extraction directory: {extract_path}")
        print(f"✔️  Validated file groups: {len(validated_files)}")

        # Display validated files
        for i, file_group in enumerate(validated_files):
            print(f"  Group {i+1}: {len(file_group)} files")
            for file_path in file_group:
                print(f"    - {file_path}")

        return validated_files

    except Exception as e:
        print(f"❌ Extraction/validation failed: {e}")
        raise

# Usage
validated = extract_and_validate_archive(
    archive_path="data_archive.zip",
    invoice_file="invoice.xlsx",
    extract_dir="extracted_data"
)

Batch Archive Processing

from pathlib import Path
from rdetoolkit.impl.compressed_controller import get_artifact_archiver

def batch_archive_directories(base_dir: str, output_dir: str, format_type: str = "zip"):
    """Create archives for multiple directories."""

    base_path = Path(base_dir)
    output_path = Path(output_dir)

    # Ensure output directory exists
    output_path.mkdir(parents=True, exist_ok=True)

    # Find all subdirectories
    subdirs = [d for d in base_path.iterdir() if d.is_dir()]

    if not subdirs:
        print(f"No subdirectories found in {base_path}")
        return

    print(f"Found {len(subdirs)} directories to archive")

    # Standard exclusions
    exclude_patterns = [
        r"\.git", r"__pycache__", r"\.pyc$",
        r"node_modules", r"\.DS_Store$"
    ]

    results = []

    for subdir in subdirs:
        archive_name = f"{subdir.name}_archive.{format_type}"
        archive_path = output_path / archive_name

        print(f"📦 Archiving: {subdir.name}")

        try:
            # Get archiver
            archiver = get_artifact_archiver(
                fmt=format_type,
                source_dir=subdir,
                exclude_patterns=exclude_patterns
            )

            # Create archive (archive() returns the top-level directories included)
            included_dirs = archiver.archive(archive_path)

            results.append({
                'directory': subdir.name,
                'archive': archive_path,
                'dirs_count': len(included_dirs),
                'success': True,
                'error': None
            })

            print(f"  ✅ Success: {len(included_dirs)} top-level dirs")

        except Exception as e:
            results.append({
                'directory': subdir.name,
                'archive': None,
                'dirs_count': 0,
                'success': False,
                'error': str(e)
            })

            print(f"  ❌ Failed: {e}")

    # Summary
    successful = sum(1 for r in results if r['success'])
    total_dirs = sum(r['dirs_count'] for r in results)

    print("\n📊 Batch archiving complete:")
    print(f"  ✅ Successful: {successful}/{len(subdirs)}")
    print(f"  📁 Total top-level directories archived: {total_dirs}")

    return results

# Usage
results = batch_archive_directories(
    base_dir="projects",
    output_dir="archives",
    format_type="zip"
)

Encoding-Aware Archive Extraction

import pandas as pd
from pathlib import Path
from rdetoolkit.impl.compressed_controller import CompressedFlatFileParser

def extract_international_archive(archive_path: str, extract_dir: str):
    """Extract archive with international filenames."""

    # Create a simple invoice for validation
    invoice_df = pd.DataFrame({
        'data_file_names/name': ['*']  # Accept all files
    })

    parser = CompressedFlatFileParser(invoice_df)

    try:
        # The parser handles encoding detection automatically
        validated_files = parser.read(
            zipfile=Path(archive_path),
            target_path=Path(extract_dir)
        )

        print(f"✅ Archive with international filenames extracted")
        print(f"📁 Files extracted: {len(validated_files)}")

        # Display files with their detected encoding
        for file_group in validated_files:
            for file_path in file_group:
                print(f"  📄 {file_path}")

                # Check if filename contains non-ASCII characters
                try:
                    file_path.name.encode('ascii')
                    print("    ✓ ASCII filename")
                except UnicodeEncodeError:
                    print("    🌐 International filename detected")

        return validated_files

    except Exception as e:
        print(f"❌ Extraction failed: {e}")
        raise

# Usage
international_files = extract_international_archive(
    archive_path="国際的なファイル.zip",  # Archive with Japanese filename
    extract_dir="extracted_international"
)

Error Handling

Common Exceptions

The compressed controller module may raise the following exceptions:

StructuredError

Raised when structural validation fails or duplicate paths are detected:

from rdetoolkit.impl.compressed_controller import ZipArtifactPackageCompressor
from rdetoolkit.exceptions import StructuredError

try:
    compressor = ZipArtifactPackageCompressor(
        source_dir="project_with_duplicates",
        exclude_patterns=[]
    )
    compressor.archive("output.zip")
except StructuredError as e:
    print(f"Structural error: {e}")
    # Handle case-insensitive duplicate paths

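Because archiving fails on case-insensitive duplicates, it can help to scan the source tree for collisions up front. A minimal pre-check sketch (find_case_collisions is a hypothetical helper, not part of the module):

from pathlib import Path

def find_case_collisions(source_dir: str | Path) -> dict[str, list[Path]]:
    """Group paths whose lowercased relative paths collide (hypothetical helper)."""
    seen: dict[str, list[Path]] = {}
    for path in Path(source_dir).rglob("*"):
        key = str(path.relative_to(source_dir)).lower()
        seen.setdefault(key, []).append(path)
    return {k: v for k, v in seen.items() if len(v) > 1}

collisions = find_case_collisions("project_with_duplicates")
if collisions:
    print(f"Case-insensitive collisions: {collisions}")
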
ValueError

Raised when unsupported archive formats are specified:

from rdetoolkit.impl.compressed_controller import get_artifact_archiver

try:
    archiver = get_artifact_archiver(
        fmt="unsupported_format",
        source_dir="project",
        exclude_patterns=[]
    )
except ValueError as e:
    print(f"Unsupported format: {e}")
    # Use supported formats: 'zip', 'tar.gz', 'targz', 'tgz'

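Because only a fixed set of format strings is accepted, user input can be normalized before calling the factory. A defensive sketch (normalize_archive_format is an illustrative helper, not part of the module):

SUPPORTED_FORMATS = {"zip", "tar.gz", "targz", "tgz"}

def normalize_archive_format(fmt: str) -> str:
    """Normalize a user-supplied archive format string (illustrative helper)."""
    normalized = fmt.strip().lower().lstrip(".")
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported archive format: {fmt!r}; use one of {sorted(SUPPORTED_FORMATS)}")
    return normalized

fmt = normalize_archive_format(" ZIP ")  # -> "zip"
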
UnicodeDecodeError

May be raised during filename encoding detection:

from rdetoolkit.impl.compressed_controller import CompressedFlatFileParser

try:
    parser = CompressedFlatFileParser(invoice_df)
    parser.read(archive_path, extract_path)
except UnicodeDecodeError as e:
    print(f"Encoding error: {e}")
    # The parser attempts automatic encoding detection
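
If automatic detection is not enough, ZIP entry names can also be repaired by hand. Python's zipfile decodes non-UTF-8 entry names as CP437, so re-encoding to CP437 and decoding with the suspected original encoding often recovers them. A standalone sketch (assumes the names were originally Shift_JIS):

import zipfile

def repair_zip_names(archive_path: str, original_encoding: str = "shift_jis") -> list[str]:
    """Recover mis-decoded entry names from a ZIP written without the UTF-8 flag."""
    repaired = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            name = info.filename
            if info.flag_bits & 0x800 == 0:  # UTF-8 flag unset: name was decoded as CP437
                name = name.encode("cp437").decode(original_encoding, errors="replace")
            repaired.append(name)
    return repaired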

Best Practices

  1. Validate paths before archiving:

    from pathlib import Path

    def validate_source_directory(source_dir: Path) -> bool:
        """Validate source directory before archiving."""
        if not source_dir.exists():
            print(f"Source directory does not exist: {source_dir}")
            return False
    
        if not source_dir.is_dir():
            print(f"Source path is not a directory: {source_dir}")
            return False
    
        # Check for files to archive
        files = list(source_dir.rglob("*"))
        if not files:
            print(f"No files found in source directory: {source_dir}")
            return False
    
        return True
    

  2. Handle exclusion patterns carefully:

    def create_safe_exclusions(custom_patterns: list[str]) -> list[str]:
        """Create safe exclusion patterns with validation."""
        import re
    
        safe_patterns = []
        for pattern in custom_patterns:
            try:
                # Test if pattern is valid regex
                re.compile(pattern)
                safe_patterns.append(pattern)
            except re.error as e:
                print(f"Invalid regex pattern '{pattern}': {e}")
    
        return safe_patterns
    

  3. Monitor disk space and permissions:

    import shutil
    from pathlib import Path
    
    def check_archive_requirements(source_dir: Path, output_path: Path) -> bool:
        """Check system requirements for archiving."""
    
        # Estimate source size
        total_size = sum(
            f.stat().st_size for f in source_dir.rglob("*") if f.is_file()
        )
    
        # Check available disk space (with 20% buffer)
        free_space = shutil.disk_usage(output_path.parent).free
        required_space = int(total_size * 1.2)  # Compression may not reduce size much
    
        if free_space < required_space:
            print(f"Insufficient disk space. Required: {required_space}, Available: {free_space}")
            return False
    
        # Check write permissions
        try:
            test_file = output_path.parent / ".test_write"
            test_file.touch()
            test_file.unlink()
        except (OSError, PermissionError):
            print(f"No write permission for: {output_path.parent}")
            return False
    
        return True
    

  4. Implement progress tracking for large archives:

    import os
    import re
    from pathlib import Path
    from typing import Callable

    from rdetoolkit.impl.compressed_controller import ZipArtifactPackageCompressor

    class ProgressTrackingArchiver:
        """Archiver with progress tracking capabilities."""

        def __init__(self, source_dir: Path, exclude_patterns: list[str]):
            self.source_dir = source_dir
            self.exclude_patterns = exclude_patterns
            self._compiled = [re.compile(p) for p in exclude_patterns]
            self.progress_callback: Callable[[int, int], None] | None = None

        def set_progress_callback(self, callback: Callable[[int, int], None]):
            """Set callback for progress updates."""
            self.progress_callback = callback

        def _is_excluded(self, path: str) -> bool:
            """Return True if any exclusion pattern matches the path."""
            return any(p.search(path) for p in self._compiled)

        def archive_with_progress(self, output_path: Path) -> list[Path]:
            """Create archive with progress tracking."""

            # Count the files that would be archived
            total_files = 0
            for root, _dirs, files in os.walk(self.source_dir):
                total_files += len([f for f in files if not self._is_excluded(os.path.join(root, f))])

            # Report initial progress; per-file updates would need hooks
            # inside the actual archiver
            if self.progress_callback:
                self.progress_callback(0, total_files)

            compressor = ZipArtifactPackageCompressor(
                self.source_dir, self.exclude_patterns
            )
            return compressor.archive(output_path)
    

Performance Notes

  • Archive creation performance scales with the number of files and their total size
  • Exclusion pattern matching uses compiled regex for optimal performance (see the sketch after this list)
  • Encoding detection during extraction may add overhead for files with international names
  • Memory usage is minimized by streaming file operations
  • TAR.GZ compression typically provides better compression ratios than ZIP but may be slower
  • Case-insensitive duplicate detection requires additional processing but prevents extraction issues
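
The compiled-regex point above can be illustrated in isolation: compiling each pattern once and reusing the compiled objects avoids re-parsing the patterns for every path. A minimal sketch of the technique (not the module's internal code):

import re

exclude_patterns = [r"\.git", r"__pycache__", r"\.pyc$"]

# Compile once; the re.Pattern objects are reused for every path check
compiled = [re.compile(p) for p in exclude_patterns]

def is_excluded(path: str) -> bool:
    return any(p.search(path) for p in compiled)

print(is_excluded("src/__pycache__/module.cpython-311.pyc"))  # True
print(is_excluded("src/module.py"))                           # False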

See Also