Compressed Controller API

Purpose

This module implements compressed file processing for RDEToolKit. It covers compressed file extraction, validation, information retrieval, and temporary file management.

Key Features

Compressed File Processing

  • Support for various compression formats including ZIP, TAR, GZ
  • Compressed file extraction and validation
  • Proper handling of Japanese file names
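
Japanese file names are a common failure point: ZIP tools on Japanese Windows often store names in CP932 (Shift_JIS) without setting the UTF-8 flag, and Python's zipfile module then decodes them as CP437. The sketch below shows one widely used repair. It is an illustration only, not necessarily what RDEToolKit does internally; `fix_zip_name` and `decoded_member_names` are hypothetical helper names.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def fix_zip_name(name: str, flag_bits: int, encoding: str = "cp932") -> str:
    """Undo zipfile's CP437 decoding for a name that was stored in `encoding`."""
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly as UTF-8
    try:
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # leave the name untouched if the round trip fails


def decoded_member_names(archive_path: str) -> list[str]:
    """Return the member names of a ZIP archive with CP932 names repaired."""
    with zipfile.ZipFile(archive_path) as zf:
        return [fix_zip_name(info.filename, info.flag_bits) for info in zf.infolist()]
```

The fallback branches keep plain ASCII and already-correct names unchanged, so the helper can be applied to every member without checking the archive's origin first.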

File Management

  • Temporary directory management
  • Organization of extracted files
  • Cleanup processing
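
As a rough illustration of temporary directory management with automatic cleanup, the following standard-library sketch extracts an archive into a temporary directory, lists what was extracted, and lets the directory be removed on exit. `extract_and_list` is a hypothetical helper; RDEToolKit's own temporary file handling may be organized differently.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_and_list(archive: Path) -> list[str]:
    """Extract a ZIP archive into a temporary directory and list its files."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        root = Path(tmp)
        # Collect relative paths before the directory is deleted on exit
        return sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )
```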

src.rdetoolkit.impl.compressed_controller.CompressedFlatFileParser(xlsx_invoice)

Bases: ICompressedFileStructParser

Parser for compressed flat files, providing functionality to read and extract the contents.

This parser handles compressed flat files. It extracts the files and ensures they match the expected structure described in the Excel invoice (xlsx_invoice).

Attributes:

  • xlsx_invoice (DataFrame): DataFrame representing the expected structure or content description of the compressed files.

xlsx_invoice: Incomplete = xlsx_invoice instance-attribute

read(zipfile, target_path)

Extracts the contents of the zipfile to the target path and checks the extracted files against the Excel invoice.

Parameters:

  • zipfile (Path, required): Path to the compressed flat file to be read.
  • target_path (Path, required): Destination directory into which the zipfile is extracted.

Returns:

  • list[tuple[Path, ...]]: A list of tuples containing file paths. Each tuple represents files from the compressed archive that matched the xlsx_invoice structure.
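
To make the return shape concrete, here is a minimal sketch of checking extracted files against a list of expected names and returning one tuple per match. It uses a plain list of names in place of the xlsx_invoice DataFrame, and `match_expected_files` is a hypothetical helper; the actual matching rules inside CompressedFlatFileParser may differ.

```python
from pathlib import Path


def match_expected_files(
    extracted: list[Path], expected_names: list[str]
) -> list[tuple[Path, ...]]:
    """Return one tuple per expected name, failing if any name is missing."""
    by_name = {path.name: path for path in extracted}
    missing = [name for name in expected_names if name not in by_name]
    if missing:
        raise FileNotFoundError(f"Not found in archive: {missing}")
    # Files that are extracted but not expected are simply ignored here
    return [(by_name[name],) for name in expected_names]
```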


src.rdetoolkit.impl.compressed_controller.CompressedFolderParser(xlsx_invoice)

Bases: ICompressedFileStructParser

Parser for compressed folders, extracting contents and ensuring they match an expected structure.

This parser handles compressed folders. It extracts the contents and verifies them against the provided Excel invoice (xlsx_invoice) structure.

Attributes:

  • xlsx_invoice (DataFrame): DataFrame representing the expected structure or content description of the compressed folder contents.

xlsx_invoice: Incomplete = xlsx_invoice instance-attribute

read(zipfile, target_path)

Extracts the contents of the zipfile and returns validated file paths.

Parameters:

  • zipfile (Path, required): Path to the compressed folder to be read.
  • target_path (Path, required): Destination directory into which the zipfile is extracted.

Returns:

  • list[tuple[Path, ...]]: A list of tuples containing file paths that have been validated based on unique directory names.

validation_uniq_fspath(target_path, exclude_names)

Check if there are any non-unique directory names under the target directory.

Parameters:

  • target_path (Union[str, Path], required): The directory path to scan.
  • exclude_names (list[str], required): Names of files to exclude.

Raises:

  • StructuredError: Raised when duplicate directory names are detected.

Returns:

  • dict[str, list[Path]]: The unique directory names, each mapped to the list of files under that directory.

Note

This function checks for directories whose names differ only in letter case (e.g., 'folder1' and 'Folder1'). On a Unix-based filesystem, such directories can coexist and can be zipped together. Windows does not allow them to coexist, so downloading and unzipping such an archive there can fail. Names that differ only in case are therefore treated as duplicates.
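
The case-only collision described in this note can be detected before extraction by comparing case-folded directory names. The sketch below works on archive member names (such as those returned by zipfile.ZipFile.namelist()) rather than on the file system. `find_case_collisions` is a hypothetical helper, not part of RDEToolKit.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_collisions(member_names: list[str]) -> dict[str, list[str]]:
    """Map each case-folded directory path to its conflicting spellings."""
    spellings: dict[str, set[str]] = defaultdict(set)
    for name in member_names:
        # Register every parent directory of the member, skipping the root "."
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                spellings[str(parent).casefold()].add(str(parent))
    # Keep only directories that appear with more than one spelling
    return {key: sorted(names) for key, names in spellings.items() if len(names) > 1}
```

An empty result means the archive is safe to extract on a case-insensitive file system as far as directory names are concerned.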


src.rdetoolkit.impl.compressed_controller.parse_compressedfile_mode(xlsx_invoice)

Parses the mode of a compressed file and returns the corresponding parser object.

Parameters:

  • xlsx_invoice (DataFrame, required): The invoice data in Excel format.

Returns:

  • ICompressedFileStructParser: An instance of the compressed file structure parser.
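
The function follows a factory pattern: it inspects the invoice and returns one of several parsers that share an interface. The sketch below illustrates that pattern with stand-in classes. The selection rule (presence of a `file_name` column) and all names in it are assumptions made for illustration; this page does not document the real criterion.

```python
from abc import ABC, abstractmethod


class StructParser(ABC):
    """Stand-in for ICompressedFileStructParser."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns

    @abstractmethod
    def mode(self) -> str:
        """Return a short label for the archive layout this parser handles."""


class FlatParser(StructParser):
    def mode(self) -> str:
        return "flat"


class FolderParser(StructParser):
    def mode(self) -> str:
        return "folder"


def select_parser(columns: list[str], file_column: str = "file_name") -> StructParser:
    # Assumption: a file-name column in the invoice signals a flat archive
    if file_column in columns:
        return FlatParser(columns)
    return FolderParser(columns)
```

Callers only depend on the shared interface, so they can call the returned object without knowing which concrete parser was chosen.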


Practical Usage

Basic Compressed File Processing

basic_compressed_processing.py
from pathlib import Path

import pandas as pd

from rdetoolkit.impl.compressed_controller import CompressedFlatFileParser

# The parser requires the Excel invoice as a DataFrame.
# Loading it with pd.read_excel from this placeholder path is an assumption.
xlsx_invoice = pd.read_excel("data/input/excel_invoice.xlsx")

# Use flat file parser
flat_parser = CompressedFlatFileParser(xlsx_invoice)

archive_path = Path("data/input/experiment_data.zip")
target_path = Path("data/temp")  # placeholder extraction directory

if archive_path.exists():
    try:
        # read() extracts the archive into target_path and returns the files
        # that matched the Excel invoice, grouped into tuples
        matched_files = flat_parser.read(archive_path, target_path)
        print(f"✓ Compressed file processing completed: {len(matched_files)} group(s)")

        for group in matched_files:
            for file_path in group:
                print(f"  - {file_path}")

    except Exception as e:
        print(f"✗ Compressed file processing error: {e}")

Folder Structure Compressed File Processing

folder_compressed_processing.py
from pathlib import Path

import pandas as pd

from rdetoolkit.impl.compressed_controller import CompressedFolderParser

# The parser requires the Excel invoice as a DataFrame.
# Loading it with pd.read_excel from this placeholder path is an assumption.
xlsx_invoice = pd.read_excel("data/input/excel_invoice.xlsx")

# Use folder parser
folder_parser = CompressedFolderParser(xlsx_invoice)

archive_path = Path("data/input/experiment_folder.zip")
target_path = Path("data/temp")  # placeholder extraction directory

if archive_path.exists():
    try:
        # read() extracts the archive and returns the validated file paths
        folder_data = folder_parser.read(archive_path, target_path)
        print(f"✓ Folder structure processing completed: {len(folder_data)} group(s)")

        # validation_uniq_fspath() raises StructuredError on duplicate directory
        # names; otherwise it returns a dict of directory name -> files.
        # The empty exclusion list is an illustrative value.
        directories = folder_parser.validation_uniq_fspath(target_path, exclude_names=[])
        print("✓ Directory name uniqueness validation successful")
        for dir_name, files in directories.items():
            print(f"  - {dir_name}: {len(files)} file(s)")

    except Exception as e:
        print(f"✗ Folder compressed file processing error: {e}")

Compressed File Mode Analysis

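compressed_mode_analysis.py
from pathlib import Path

import pandas as pd

from rdetoolkit.impl.compressed_controller import parse_compressedfile_mode

# parse_compressedfile_mode() takes the Excel invoice DataFrame, not an archive path.
# Loading it with pd.read_excel from this placeholder path is an assumption.
xlsx_invoice = pd.read_excel("data/input/excel_invoice.xlsx")

# Select the parser that corresponds to the invoice
parser = parse_compressedfile_mode(xlsx_invoice)
print(f"Mode: {type(parser).__name__}")

# Process multiple compressed files with the selected parser
archive_files = [
    Path("data/input/flat_data.zip"),
    Path("data/input/folder_structure.zip"),
    Path("data/input/mixed_content.tar.gz"),
]
target_root = Path("data/temp")  # placeholder extraction directory

for archive_path in archive_files:
    if archive_path.exists():
        try:
            # Extract each archive into its own subdirectory
            result = parser.read(archive_path, target_root / archive_path.stem)
            print(f"File: {archive_path.name}")
            print(f"Matched groups: {len(result)}")
            print("---")

        except Exception as e:
            print(f"✗ Mode analysis error {archive_path.name}: {e}")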