# MFPK-ENC-V5 Format Specification

## Encrypted Multi-File Container Format (Binary V5)

## 1. Introduction

MFPK-ENC-V5 is a compact binary container format for storing files and directories with authenticated encryption. The format features:

- Argon2id key derivation for modern, memory-hard security.
- Compact binary structure optimized for streaming operations.
- Authenticated encryption using AES-256-GCM.
- Resistance to GPU/ASIC-based password guessing through Argon2id's memory cost.
- Encrypted metadata, including file paths and timestamps.

## 2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document indicate requirement levels.

All multi-byte integers are encoded in network byte order (big-endian).

## 3. Cryptographic Primitives

- **Key Derivation**: Argon2id
  - Salt length: 32 bytes
  - Output length: 32 bytes (256 bits)
  - Iterations: 3
  - Memory cost: 64 MiB (65536 KiB)
  - Lanes: 4
  - Password encoding: UTF-8
- **AEAD**: AES-256-GCM
  - IV (nonce) length: 12 bytes
  - Tag length: 16 bytes
  - Associated Data (AAD): none in this version
- **Randomness**: IVs and salts MUST be generated using a CSPRNG.

## 4. Container Overview

A container is a binary file consisting of:

- A fixed-size global header containing the magic/version bytes, the salt, and a password verification record.
- A sequence of entries (files or directories). Each entry begins with a 4-byte binary sync word that enables scanning and recovery.

The format is append-only. Removal and password changes are performed by rewriting the container to a new file.

## 5. Global Header (fixed size: 72 bytes)

| Offset | Size | Description |
|--------|------|-------------|
| 0 | 4 | MAGIC_VERSION = 0x89 'M' 'F' 0x05 (bytes: 89 4D 46 05) |
| 4 | 32 | SALT |
| 36 | 12 | PWV_IV (12-byte AES-GCM IV for password verification) |
| 48 | 24 | PWV_CT (AES-GCM ciphertext + tag of the 8-byte marker: 8 + 16 bytes) |

**Notes:**

- The password verification plaintext marker is exactly 8 bytes: PWV_MARKER = 50 57 56 35 4D 41 52 4B (ASCII "PWV5MARK"). The marker is never stored in plaintext; only the IV and the ciphertext+tag are stored.
- Total header size = 4 + 32 + 12 + 24 = 72 bytes.

## 6. Entry Record

Each entry begins with a fixed 32-byte entry header, starting with the 4-byte sync word, that records the lengths of the encrypted fields and the logical file size. The encrypted fields follow, then (for files) the content chunks. All path strings are encrypted; no plaintext path appears in the container.

**Constants:**

- SYNC_WORD = A4 45 4E 54 (bytes: 0xA4, 'E', 'N', 'T')
- ENTRY_TYPE: 1 byte
  - 0x00 = file
  - 0x01 = directory
- CHUNK_SIZE = 1,048,576 bytes (1 MiB)
- IV_SIZE = 12 bytes, TAG_SIZE = 16 bytes

**Entry header layout (all big-endian):**

| Offset | Size | Field |
|--------|------|-------|
| 0 | 4 | SYNC_WORD (A4 45 4E 54) |
| 4 | 1 | ENTRY_TYPE (0x00 file, 0x01 dir) |
| 5 | 3 | RESERVED (MUST be zero) |
| 8 | 4 | FULLPATH_LEN (uint32): length N1 of ENCRYPTED_FULL_PATH |
| 12 | 8 | SIZE (uint64): logical plaintext file size in bytes; MUST be 0 for directories |
| 20 | 4 | BASEPATH_LEN (uint32): length N2 of ENCRYPTED_BASE_PATH |
| 24 | 2 | TS_LEN (uint16): length N3 of ENCRYPTED_TIMESTAMP; 0 for directories |
| 26 | 6 | RESERVED (MUST be zero) |
| 32 | N1 | ENCRYPTED_FULL_PATH |
| 32+N1 | N2 | ENCRYPTED_BASE_PATH |
| 32+N1+N2 | N3 | ENCRYPTED_TIMESTAMP (N3 == 0 for directories) |
| 32+N1+N2+N3 | variable | CONTENT (files only): sequence of encrypted chunks |

Encrypted fields use AES-GCM and are stored as IV || CIPHERTEXT+TAG, where the IV is 12 bytes and the TAG is 16 bytes.
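
A minimal Python sketch of packing and parsing the fixed 32-byte entry header with the standard `struct` module. The names `EntryHeader`, `pack_entry_header`, and `unpack_entry_header` are hypothetical helpers, not part of the format; the validation checks mirror the MUST rules in the entry header table of this section.

```python
import struct
from typing import NamedTuple

# Layout of the fixed entry header from the Section 6 table.
# ">" selects big-endian with no alignment padding; total size is 32 bytes.
# sync(4s) type(B) reserved(3s) N1(I) SIZE(Q) N2(I) N3(H) reserved(6s)
ENTRY_HEADER = struct.Struct(">4sB3sIQIH6s")
SYNC_WORD = bytes([0xA4, 0x45, 0x4E, 0x54])  # 0xA4 'E' 'N' 'T'
ENTRY_TYPE_FILE = 0x00
ENTRY_TYPE_DIRECTORY = 0x01


class EntryHeader(NamedTuple):
    """Hypothetical in-memory view of the plaintext header fields."""
    entry_type: int
    fullpath_len: int  # N1: length of ENCRYPTED_FULL_PATH
    size: int          # logical plaintext size; 0 for directories
    basepath_len: int  # N2: length of ENCRYPTED_BASE_PATH
    ts_len: int        # N3: length of ENCRYPTED_TIMESTAMP; 0 for directories


def pack_entry_header(h: EntryHeader) -> bytes:
    """Serialize the 32-byte header; RESERVED bytes are written as zero."""
    return ENTRY_HEADER.pack(SYNC_WORD, h.entry_type, bytes(3),
                             h.fullpath_len, h.size, h.basepath_len,
                             h.ts_len, bytes(6))


def unpack_entry_header(buf: bytes) -> EntryHeader:
    """Parse and validate a header at the start of buf (raises ValueError)."""
    if len(buf) < ENTRY_HEADER.size:
        raise ValueError("truncated entry header")
    sync, etype, res1, n1, size, n2, n3, res2 = ENTRY_HEADER.unpack_from(buf)
    if sync != SYNC_WORD:
        raise ValueError("missing SYNC_WORD")
    if etype not in (ENTRY_TYPE_FILE, ENTRY_TYPE_DIRECTORY):
        raise ValueError("unknown ENTRY_TYPE")
    if any(res1) or any(res2):
        raise ValueError("RESERVED bytes must be zero")
    if etype == ENTRY_TYPE_DIRECTORY and (size != 0 or n3 != 0):
        raise ValueError("directories need SIZE == 0 and TS_LEN == 0")
    return EntryHeader(etype, n1, size, n2, n3)


# Round trip for a hypothetical file entry.
hdr = EntryHeader(ENTRY_TYPE_FILE, fullpath_len=38, size=2_621_440,
                  basepath_len=32, ts_len=36)
raw = pack_entry_header(hdr)
assert ENTRY_HEADER.size == len(raw) == 32
assert unpack_entry_header(raw) == hdr
```

A reader that resynchronizes on SYNC_WORD can run the same checks on each candidate position and treat a `ValueError` as a false match.
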

The ciphertext length equals the plaintext length, so every stored field is exactly IV_SIZE + TAG_SIZE = 28 bytes longer than its plaintext. For example, the encrypted timestamp of a file (an 8-byte float64) has TS_LEN = 12 + 8 + 16 = 36.

## 7. File Content Chunking

- Files are stored as a sequence of chunks after the entry header.
- For each plaintext chunk of up to CHUNK_SIZE bytes:
  - Generate a fresh, random 12-byte IV.
  - Store IV || AES-GCM(key, chunk), where the ciphertext+tag length is chunk_len + TAG_SIZE.
- Every chunk except the last holds exactly CHUNK_SIZE plaintext bytes, and a zero-length file (SIZE = 0) has no chunks; the skip calculation below depends on this.
- To skip content without decryption, compute:

  ```
  num_chunks = ceil(SIZE / CHUNK_SIZE)
  encrypted_size = SIZE + num_chunks * (IV_SIZE + TAG_SIZE)
  ```

  and advance the file pointer by encrypted_size bytes.

## 8. Paths and Timestamps

- FULL_PATH format: POSIX-like absolute path, e.g., "/dir/a.txt".
- BASE_PATH is the parent directory path of FULL_PATH.
- All paths are UTF-8 strings and are stored only in encrypted form.
- TIMESTAMP (files only): the Unix mtime as a big-endian IEEE-754 float64 (8 bytes), encrypted as a single AES-GCM field of TS_LEN bytes.
- Directories MUST have TS_LEN = 0 and no content.

## 9. Root Directory Entry

A well-formed container SHOULD include an explicit root directory entry, written during initialization:

- ENTRY_TYPE = 0x01 (directory)
- FULL_PATH = "/"
- SIZE = 0
- BASE_PATH = "/"
- TS_LEN = 0 (no timestamp)

## 10. Indexing and Scanning

- Readers MAY scan sequentially, starting immediately after the 72-byte header:
  - Read 4 bytes. If they are not SYNC_WORD, resynchronize by scanning forward for the next occurrence of SYNC_WORD. Because ciphertext can contain those 4 bytes by chance, validate a candidate header (zero RESERVED bytes, known ENTRY_TYPE, declared lengths within the file, successful decryption of ENCRYPTED_FULL_PATH) before trusting it.
  - Read ENTRY_TYPE and the header fields (FULLPATH_LEN, SIZE, BASEPATH_LEN, TS_LEN), then the corresponding encrypted byte ranges.
  - Decrypt FULL_PATH and BASE_PATH to reconstruct the directory tree.
  - For files, decrypt TIMESTAMP to obtain mtime.
  - Content starts immediately after ENCRYPTED_TIMESTAMP, at offset 32 + N1 + N2 + N3 from the start of the entry. For files, skip it using the SIZE-based calculation in Section 7.
- Decryption is REQUIRED only to recover plaintext paths and (for files) timestamps; content decryption is not required for indexing.

## 11. Password Verification

To verify a password:

- Read SALT from the header.
- Derive a key from the UTF-8 encoded password via Argon2id with parameters:
  - iterations = 3
  - memory_cost = 65536 (64 MiB)
  - lanes = 4
  - salt = SALT
  - length = 32 bytes
- Read PWV_IV and PWV_CT from the header and attempt AES-256-GCM decryption with the derived key.
- If tag verification succeeds and the plaintext equals the 8-byte PWV_MARKER, the password is correct. A tag verification failure indicates a wrong password or a corrupted header.

## 12. Error Handling and Robustness

- Readers SHOULD tolerate trailing or partial entries by:
  - Resynchronizing on SYNC_WORD when an entry parse fails.
  - Validating that the bytes covered by every declared length are actually present in the file.
  - Treating incomplete or malformed entries as end-of-container, or skipping them safely.
- Writers SHOULD flush after chunk writes to reduce the risk of partial loss on interruption.
- Removal operations SHOULD rewrite to a temporary file and replace the original only on success, so that the operation is atomic.

## 13. Security Considerations

- AES-GCM requires a unique IV for every encryption under the same key. This format uses a fresh random IV for each encrypted field and each content chunk, so implementations MUST use a CSPRNG. All fields and chunks in a container are encrypted under one key, and NIST SP 800-38D limits randomly generated 96-bit IVs to at most 2^32 encryptions per key.
- Full and base paths are always encrypted to avoid leaking names or directory structure. The entry header itself is plaintext: ENTRY_TYPE, SIZE, and the field lengths are visible, which reveals the number of entries, which entries are directories, the exact size of each file, and the byte length of each path (N1 - 28).
- The plaintext header fields are not authenticated, and because no AAD is used, GCM does not bind an encrypted field or chunk to its position. AES-GCM alone therefore does not detect fields or chunks that are reordered or moved within a container, nor entries removed from it.
- The password verification marker confirms correct key derivation without exposing the key or file contents. The marker value is not a secret; secrecy is provided by encryption.
- The Argon2id parameters are chosen to make offline password guessing costly, particularly on GPUs and ASICs:
  - iterations = 3 sets the number of passes over memory (time cost).
  - memory_cost = 65536 (64 MiB) per derivation makes each parallel GPU guess memory-expensive.
  - lanes = 4 lets legitimate users take advantage of multi-core CPUs.
- A 32-byte salt from a CSPRNG carries 256 bits of entropy, which makes salt reuse across containers negligible and defeats precomputed tables.

## 14. File Format

- File extension: ".mfpk" (conventional).

## 15. Versioning

- MAGIC_VERSION carries the format version in its fourth byte (offset 3): 0x05 for V5.
- V5 provides no backward compatibility: V5 containers are incompatible with V4 and earlier implementations.

## 16. Constants Summary

- MAGIC_VERSION: 89 4D 46 05
- HEADER_SIZE: 72 bytes
- SALT_SIZE: 32 bytes
- PWV_MARKER: "PWV5MARK" (8 bytes; stored in the header only in encrypted form)
- SYNC_WORD: A4 45 4E 54
- ENTRY_TYPE_FILE: 0x00
- ENTRY_TYPE_DIRECTORY: 0x01
- IV_SIZE: 12 bytes
- TAG_SIZE: 16 bytes
- CHUNK_SIZE: 1,048,576 bytes (1 MiB)
- Argon2id parameters:
  - iterations: 3
  - memory_cost: 65536 (64 MiB)
  - lanes: 4
  - length: 32 bytes

## 17. Example Layout (File Entry, Schematic)

```
[GLOBAL HEADER 72B]
A4 45 4E 54 | 00 | 00 00 00 | [N1=fullpath_len u32] | [SIZE u64] | [N2=basepath_len u32] | [N3=ts_len u16] | 00..00 (6B)
ENC(full_path) [N1] | ENC(base_path) [N2] | ENC(timestamp) [N3] |
(IV || ENC(chunk1)) ... (IV || ENC(chunkN))
```
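
The schematic implies a size calculation that lets a reader jump from one entry to the next without decrypting anything. A minimal Python sketch, assuming the constants of Section 16; the function names are hypothetical, and `ENTRY_HEADER_SIZE = 32` is derived from the Section 6 table rather than listed as a named constant of the format.

```python
# Sizes from Section 16; ENTRY_HEADER_SIZE is derived from the Section 6
# table (offsets 0..31) and is not a named constant of the format.
HEADER_SIZE = 72
ENTRY_HEADER_SIZE = 32
IV_SIZE = 12
TAG_SIZE = 16
CHUNK_SIZE = 1_048_576


def encrypted_field_len(plaintext_len: int) -> int:
    """Stored length of one IV || CIPHERTEXT+TAG field (Section 6)."""
    return IV_SIZE + plaintext_len + TAG_SIZE


def encrypted_content_size(size: int) -> int:
    """Section 7 skip calculation for a file whose SIZE is `size` bytes."""
    # Integer ceiling division avoids float rounding for very large SIZE values.
    num_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
    return size + num_chunks * (IV_SIZE + TAG_SIZE)


def entry_total_size(n1: int, n2: int, n3: int, size: int, is_file: bool) -> int:
    """Bytes from this entry's SYNC_WORD to the next entry's SYNC_WORD."""
    content = encrypted_content_size(size) if is_file else 0
    return ENTRY_HEADER_SIZE + n1 + n2 + n3 + content


# Hypothetical file entry: N1 = 38, N2 = 32, N3 = 36, SIZE = 2.5 MiB (3 chunks).
print(entry_total_size(38, 32, 36, 2_621_440, is_file=True))  # 2621662
```

For example, the root directory entry of Section 9 ("/" encrypted twice, 1 + 28 = 29 bytes each, no timestamp) occupies 32 + 29 + 29 = 90 bytes, so a container holding only the root entry is 72 + 90 = 162 bytes long.
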