Treats a file too large to fit in memory as a single string, splits it on a given split string, and stores the resulting pieces as separate files.
large_file_splitter
Overview
- The tool treats a file too large to fit in memory as a single string, splits it on a given split string, and stores each resulting piece as a separate file.
- The documentation is still under construction.
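The idea in the overview can be sketched in plain Python. This is a minimal illustration under stated assumptions, not large_file_splitter's actual implementation: the function name `split_stream` and its parameters are hypothetical, and the sketch covers only the case where the split string is dropped from the output. It reads the input in fixed-size chunks and holds back a short tail of each chunk so that a split string straddling a chunk boundary is still found.

```python
def split_stream(src_path, delimiter, out_template, chunk_size=1024 * 1024):
    """Split the file at src_path on `delimiter` (bytes) without loading it whole.

    Hypothetical helper for illustration; returns the number of files written.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    keep = len(delimiter) - 1  # tail bytes that could be the start of a delimiter
    index = 0
    buf = b""
    out = open(out_template % index, "wb")
    try:
        with open(src_path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                buf += chunk
                # Emit every complete piece currently in the buffer.
                while True:
                    pos = buf.find(delimiter)
                    if pos == -1:
                        break
                    out.write(buf[:pos])
                    out.close()
                    index += 1
                    out = open(out_template % index, "wb")
                    buf = buf[pos + len(delimiter):]
                # Flush bytes that cannot be part of a delimiter; hold back the
                # tail in case a delimiter straddles the chunk boundary.
                safe = len(buf) - keep
                if safe > 0:
                    out.write(buf[:safe])
                    buf = buf[safe:]
        out.write(buf)  # whatever remains belongs to the last piece
    finally:
        out.close()
    return index + 1
```

The output directory named in `out_template` must already exist. Because data is flushed to the current output file as soon as it is known not to contain a split string, memory use stays close to `chunk_size` even when an individual piece is much larger than one chunk.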
Usage
import large_file_splitter
# Split a large file.
large_file_splitter.split(
    "dummy_large_file.txt",  # File to be split
    # Split string. Processed as binary internally, so a single character is not
    # recommended: it may split multi-byte characters in the wrong place.
    split_str="SPLIT_MARK\r\n",
    # Split-string handling. delete: dropped from the output; start: prepended
    # to the next chunk; end: appended to the previous chunk.
    div_mode="start",
    # Output filename template (%d is replaced with an integer automatically)
    output_filename_frame="./output/div_%d.txt",
    # Size in bytes of the data chunk held in memory; available memory must be
    # at least several times this size.
    cache_size=10 * 1024 * 1024,
)
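The three `div_mode` values are easiest to compare on a small in-memory example. The sketch below is an illustration only: `split_with_mode` is a hypothetical helper, not part of large_file_splitter, and how the real library treats empty leading or trailing chunks is an assumption here (this sketch simply follows `bytes.split`).

```python
def split_with_mode(data, delimiter, div_mode="delete"):
    """Split `data` (bytes) on `delimiter`, attaching it according to div_mode.

    Hypothetical helper that mirrors the documented meaning of the three modes.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    pieces = data.split(delimiter)
    if div_mode == "delete":
        # The split string is not included in the output.
        return pieces
    if div_mode == "start":
        # The split string is prepended to every chunk after the first.
        return [pieces[0]] + [delimiter + p for p in pieces[1:]]
    if div_mode == "end":
        # The split string is appended to every chunk except the last.
        return [p + delimiter for p in pieces[:-1]] + [pieces[-1]]
    raise ValueError("div_mode must be 'delete', 'start', or 'end'")


sample = b"headSPLIT_MARK\r\nbodySPLIT_MARK\r\ntail"
mark = b"SPLIT_MARK\r\n"
for mode in ("delete", "start", "end"):
    print(mode, split_with_mode(sample, mark, mode))
```

In this sketch, the `start` and `end` modes keep every byte of the input, so concatenating the chunks in order reproduces the original data; `delete` does not.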
Hashes for large-file-splitter-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b4703f7c17b2da7eec8831812d341c2cc2b18d6e97e4759d9b2b2f6af47b93a7
MD5 | 460b7f5a51e07437befb56dfa7179115
BLAKE2b-256 | 9f2b4eb449612bff6f6b4405b7b01cd73ebb555cea7c5cd36bfc09c710e64bc7
Hashes for large_file_splitter-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d9b3615a33d622e79429fd6fdde1f82fd94d6c07025b336c81a4ab64bd1500b4
MD5 | 891acbdf24c683869d4d8853d8c8dca8
BLAKE2b-256 | 500841accc62ab97f4dfe88f9cce88b76495f902c35a483a7ddf758a09e6c675