# MFPK-ENC-V5 Format Specification

## Encrypted Multi-File Container Format (Binary V5)

## 1. Introduction

MFPK-ENC-V5 is a compact binary container format for storing files and directories with authenticated encryption. The format features:

- Argon2id key derivation for modern, memory-hard password hashing.
- Compact binary structure optimized for streaming operations.
- Authenticated encryption using AES-256-GCM.
- Resistance to GPU/ASIC-based password guessing, provided by the memory-hard key derivation.
- Encrypted metadata, including file paths and timestamps.

## 2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document indicate requirement levels.

All multi-byte integers are network byte order (big-endian).

## 3. Cryptographic Primitives

- **Key Derivation**: Argon2id
  - Salt length: 32 bytes
  - Output length: 32 bytes (256 bits)
  - Iterations: 3
  - Memory cost: 64 MiB (65536 KiB)
  - Lanes: 4
  - Password encoding: UTF-8
- **AEAD**: AES-256-GCM
  - IV (nonce) length: 12 bytes
  - Tag length: 16 bytes
  - Associated Data (AAD): none in this version
- **Randomness**: IVs and salts MUST be generated using a CSPRNG.

## 4. Container Overview

A container is a binary file consisting of:

- A fixed global header containing magic, version, salt, and a password verification record.
- A sequence of entries (files or directories). Each entry begins with a 4-byte binary sync word allowing scanning and recovery.

The format is append-only. Removal and password change are performed by rewriting to a new file.

## 5. Global Header (fixed size: 72 bytes)

| Offset | Size | Description |
|--------|------|-------------|
| 0 | 4 | MAGIC_VERSION = 0x89 'M' 'F' 0x05 (bytes: 89 4D 46 05) |
| 4 | 32 | SALT (32 bytes) |
| 36 | 12 | PWV_IV (AES-GCM 12-byte IV for password verification) |
| 48 | 24 | PWV_CT (AES-GCM ciphertext+tag of an 8-byte marker) |

**Notes:**

- The password verification plaintext marker is exactly 8 bytes: PWV_MARKER = 50 57 56 35 4D 41 52 4B (ASCII "PWV5MARK"). This marker is NEVER stored in plaintext; only the IV and the ciphertext+tag are stored.
- Total header size = 4 + 32 + 12 + 24 = 72 bytes.

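The header layout above can be split with a single fixed-size unpack. The following is a minimal sketch, not a reference implementation; the names `GlobalHeader` and `parse_global_header` are hypothetical and chosen only for illustration.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 72
MAGIC_VERSION = bytes([0x89, 0x4D, 0x46, 0x05])
# 4-byte magic/version, 32-byte salt, 12-byte IV, 24-byte ciphertext+tag.
HEADER_STRUCT = struct.Struct(">4s32s12s24s")


class GlobalHeader(NamedTuple):
    """Hypothetical container for the three variable header fields."""
    salt: bytes
    pwv_iv: bytes
    pwv_ct: bytes


def parse_global_header(data: bytes) -> GlobalHeader:
    """Split the first 72 bytes of a container into its header fields."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, salt, pwv_iv, pwv_ct = HEADER_STRUCT.unpack_from(data, 0)
    if magic != MAGIC_VERSION:
        raise ValueError("not an MFPK-ENC-V5 container")
    return GlobalHeader(salt, pwv_iv, pwv_ct)
```

Checking the magic and version together before anything else lets a reader reject containers from other versions before spending time on key derivation.
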
## 6. Entry Record

Each entry begins with a 4-byte sync word followed by a compact entry header describing the lengths of encrypted fields and the logical file size. All path strings are encrypted; no plaintext path appears in the container.

**Constants:**

- SYNC_WORD = A4 45 4E 54 (bytes: 0xA4, 'E', 'N', 'T')
- ENTRY_TYPE: 1 byte
  - 0x00 = file
  - 0x01 = directory
- CHUNK_SIZE = 1,048,576 bytes (1 MiB)
- IV_SIZE = 12 bytes, TAG_SIZE = 16 bytes

**Entry header layout (all big-endian):**

| Offset | Size | Field |
|--------|------|-------|
| 0 | 4 | SYNC_WORD (A4 45 4E 54) |
| 4 | 1 | ENTRY_TYPE (0x00 file, 0x01 dir) |
| 5 | 3 | RESERVED (MUST be zero) |
| 8 | 4 | FULLPATH_LEN (uint32) - length N1 of ENCRYPTED_FULL_PATH |
| 12 | 8 | SIZE (uint64) - logical plaintext file size in bytes. For directories, SIZE MUST be 0 |
| 20 | 4 | BASEPATH_LEN (uint32) - length N2 of ENCRYPTED_BASE_PATH |
| 24 | 2 | TS_LEN (uint16) - length N3 of ENCRYPTED_TIMESTAMP (0 for dir) |
| 26 | 6 | RESERVED (MUST be zero) |
| 32 | N1 | ENCRYPTED_FULL_PATH (N1 bytes) |
| 32+N1 | N2 | ENCRYPTED_BASE_PATH (N2 bytes) |
| 32+N1+N2 | N3 | ENCRYPTED_TIMESTAMP (N3 bytes, N3 == 0 for directories) |
| 32+N1+N2+N3 | variable | CONTENT (files only): sequence of chunks |

Encrypted fields use AES-GCM and are stored as IV || CIPHERTEXT || TAG, where the IV is 12 bytes and the tag is 16 bytes. The ciphertext length equals the plaintext length, so each stored field is 28 bytes longer than its plaintext. FULLPATH_LEN, BASEPATH_LEN, and TS_LEN count the full stored length, including IV and tag.

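The fixed 32-byte part of the entry header maps directly onto one big-endian struct format. This is a sketch under stated assumptions: `EntryHeader` and `parse_entry_header` are hypothetical names, and rejecting non-zero reserved bytes on read is a strictness choice made here, since the specification only states that they MUST be zero.

```python
import struct
from typing import NamedTuple

SYNC_WORD = bytes([0xA4, 0x45, 0x4E, 0x54])
# sync(4) type(1) reserved(3) N1(4) SIZE(8) N2(4) N3(2) reserved(6) = 32 bytes
ENTRY_HEADER = struct.Struct(">4sB3sIQIH6s")


class EntryHeader(NamedTuple):
    """Hypothetical view of the plaintext entry header fields."""
    entry_type: int
    fullpath_len: int
    size: int
    basepath_len: int
    ts_len: int


def parse_entry_header(buf: bytes, offset: int = 0) -> EntryHeader:
    """Decode and sanity-check the 32-byte fixed entry header at offset."""
    if len(buf) - offset < ENTRY_HEADER.size:
        raise ValueError("truncated entry header")
    sync, etype, rsv1, n1, size, n2, n3, rsv2 = ENTRY_HEADER.unpack_from(buf, offset)
    if sync != SYNC_WORD:
        raise ValueError("missing sync word")
    if etype not in (0x00, 0x01):
        raise ValueError("unknown entry type")
    if any(rsv1) or any(rsv2):
        raise ValueError("reserved bytes must be zero")
    if etype == 0x01 and (size != 0 or n3 != 0):
        raise ValueError("directories must have SIZE = 0 and TS_LEN = 0")
    return EntryHeader(etype, n1, size, n2, n3)
```

The encrypted fields then start at `offset + 32` and occupy `fullpath_len`, `basepath_len`, and `ts_len` bytes in that order.
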
## 7. File Content Chunking

- Files are stored as a sequence of chunks after the entry header and its encrypted fields.
- For each plaintext chunk of up to CHUNK_SIZE bytes:
  - Generate a fresh, random 12-byte IV.
  - Store IV || AES-GCM(key, chunk), where the AES-GCM output (ciphertext plus tag) is chunk_len + TAG_SIZE bytes.
- To skip content without decryption:
  ```
  num_chunks = ceil(SIZE / CHUNK_SIZE)
  encrypted_size = SIZE + num_chunks * (IV_SIZE + TAG_SIZE)
  ```
  and advance the file pointer by encrypted_size bytes.

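The skip calculation can be checked with a few concrete sizes. The sketch below uses integer arithmetic for the ceiling so that large uint64 sizes are not rounded through floating point; the function name `encrypted_content_size` is hypothetical.

```python
CHUNK_SIZE = 1_048_576  # 1 MiB
IV_SIZE = 12
TAG_SIZE = 16


def encrypted_content_size(size: int) -> int:
    """Return the number of stored content bytes for a file of `size` plaintext bytes."""
    if size < 0:
        raise ValueError("size must be non-negative")
    num_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE  # integer ceil(size / CHUNK_SIZE)
    return size + num_chunks * (IV_SIZE + TAG_SIZE)


print(encrypted_content_size(0))              # 0 (an empty file stores no chunks)
print(encrypted_content_size(1))              # 29
print(encrypted_content_size(CHUNK_SIZE))     # 1048604
print(encrypted_content_size(CHUNK_SIZE + 1)) # 1048633
```

Each chunk adds a fixed 28 bytes of overhead, so a file one byte over a chunk boundary pays for a second IV and tag.
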
## 8. Paths and Timestamps

- FULL_PATH format: POSIX-like absolute path, e.g., "/dir/a.txt".
- BASE_PATH is the parent directory path of FULL_PATH.
- All paths are UTF-8 strings and are stored only in encrypted form.
- TIMESTAMP (files only): the Unix mtime encoded as a big-endian IEEE-754 float64 (8 bytes), then encrypted as a single AES-GCM blob of TS_LEN bytes (36 bytes: 12-byte IV, 8-byte ciphertext, 16-byte tag).
- Directories MUST have TS_LEN = 0 and no content.

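The plaintext forms of the timestamp and base path can be produced with the standard library before encryption. This is a minimal sketch; the helper names are hypothetical, and `base_path_of` assumes paths without a trailing slash.

```python
import posixpath
import struct


def encode_timestamp(mtime: float) -> bytes:
    """Encode a Unix mtime as the 8-byte big-endian float64 plaintext."""
    return struct.pack(">d", mtime)


def decode_timestamp(plaintext: bytes) -> float:
    """Decode the 8-byte timestamp plaintext back into a Unix mtime."""
    (mtime,) = struct.unpack(">d", plaintext)
    return mtime


def base_path_of(full_path: str) -> str:
    """Return the parent directory of an absolute POSIX-like path."""
    return posixpath.dirname(full_path) or "/"


print(base_path_of("/dir/a.txt"))  # /dir
print(base_path_of("/a.txt"))      # /
print(len(encode_timestamp(1700000000.5)))  # 8
```
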
## 9. Root Directory Entry

A well-formed container SHOULD include an explicit root directory entry during initialization:

- ENTRY_TYPE = 0x01 (directory)
- FULL_PATH = "/"
- SIZE = 0
- BASE_PATH = "/"
- TS_LEN = 0 (no timestamp)

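As an illustration, a directory entry such as the root can be serialized once its two path fields have been encrypted. In this sketch the encrypted blobs are passed in as opaque bytes, so the encryption step itself is out of scope; `build_directory_entry` is a hypothetical helper name.

```python
import struct

SYNC_WORD = bytes([0xA4, 0x45, 0x4E, 0x54])
ENTRY_TYPE_DIRECTORY = 0x01


def build_directory_entry(enc_full_path: bytes, enc_base_path: bytes) -> bytes:
    """Serialize a directory entry from already-encrypted path fields."""
    header = struct.pack(
        ">4sB3xIQIH6x",         # the 'x' pad codes write the zeroed RESERVED bytes
        SYNC_WORD,
        ENTRY_TYPE_DIRECTORY,
        len(enc_full_path),     # FULLPATH_LEN
        0,                      # SIZE is always 0 for directories
        len(enc_base_path),     # BASEPATH_LEN
        0,                      # TS_LEN is always 0 for directories
    )
    return header + enc_full_path + enc_base_path


# Placeholder blobs: encrypting the 1-byte path "/" yields 12 + 1 + 16 = 29 bytes.
root_entry = build_directory_entry(bytes(29), bytes(29))
print(len(root_entry))  # 90
```
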
## 10. Indexing and Scanning

- Readers MAY scan sequentially starting after the 72-byte header:
  - Read 4 bytes; if they are not SYNC_WORD, resynchronize by scanning forward for the next occurrence of SYNC_WORD.
  - Read ENTRY_TYPE and the header fields (FULLPATH_LEN, SIZE, BASEPATH_LEN, TS_LEN), then the corresponding encrypted byte ranges.
  - Decrypt FULL_PATH and BASE_PATH to reconstruct the directory tree.
  - For files, decrypt TIMESTAMP to obtain mtime.
  - Compute the content start position, which is immediately after ENCRYPTED_TIMESTAMP.
  - For files, skip content using the SIZE-based calculation in Section 7.
- Decryption is REQUIRED only to recover plaintext paths and (for files) timestamps. Content decryption is not required for indexing.

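The scan loop can be sketched over an in-memory buffer without any decryption. This is an illustrative sketch only: `scan_entries` is a hypothetical name, the whole container is assumed to fit in memory, and an implausible header is simply treated as a false sync match.

```python
import struct

HEADER_SIZE = 72
SYNC_WORD = bytes([0xA4, 0x45, 0x4E, 0x54])
ENTRY_HEADER = struct.Struct(">4sB3xIQIH6x")  # 32 bytes; reserved bytes are skipped
CHUNK_SIZE = 1_048_576
CHUNK_OVERHEAD = 12 + 16  # IV_SIZE + TAG_SIZE


def scan_entries(data: bytes):
    """Yield (offset, entry_type, n1, n2, n3, content_start, content_len) per entry."""
    pos = HEADER_SIZE
    while True:
        # find() returns pos itself when already aligned, otherwise it resynchronizes.
        pos = data.find(SYNC_WORD, pos)
        if pos < 0 or pos + ENTRY_HEADER.size > len(data):
            return
        _, etype, n1, size, n2, n3 = ENTRY_HEADER.unpack_from(data, pos)
        content_start = pos + ENTRY_HEADER.size + n1 + n2 + n3
        num_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
        content_len = size + num_chunks * CHUNK_OVERHEAD if etype == 0x00 else 0
        end = content_start + content_len
        if etype not in (0x00, 0x01) or end > len(data):
            pos += 1  # malformed or partial entry: keep scanning for the next sync word
            continue
        yield pos, etype, n1, n2, n3, content_start, content_len
        pos = end
```

Because content is skipped by the computed length, a byte sequence equal to SYNC_WORD inside ciphertext is only ever examined during resynchronization, where the plausibility checks filter most false matches.
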
## 11. Password Verification

- To verify a password:
  - Read SALT from the header.
  - Derive a key from the UTF-8 encoded password via Argon2id with parameters:
    - iterations = 3
    - memory_cost = 65536 KiB (64 MiB)
    - lanes = 4
    - salt = SALT
    - length = 32 bytes
  - Read PWV_IV and PWV_CT from the header and attempt decryption.
  - Successful decryption yielding the 8-byte PWV_MARKER indicates the password is correct. An authentication (tag) failure means the password is wrong or the header is corrupted.

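The verification steps can be sketched independently of any particular cryptography library. In the sketch below, `derive_key` and `aead_decrypt` are hypothetical callables supplied by the caller (an Argon2id implementation and an AES-256-GCM decryptor that raises on tag mismatch); their signatures are assumptions made for illustration, not the API of a specific package.

```python
from typing import Callable

PWV_MARKER = b"PWV5MARK"
ARGON2ID_PARAMS = {"iterations": 3, "memory_cost": 65536, "lanes": 4, "length": 32}


def verify_password(
    header: bytes,
    password: str,
    derive_key: Callable[..., bytes],
    aead_decrypt: Callable[[bytes, bytes, bytes], bytes],
) -> bool:
    """Return True if `password` decrypts the header's verification record."""
    if len(header) < 72:
        raise ValueError("truncated header")
    # Header layout: magic/version (4) | SALT (32) | PWV_IV (12) | PWV_CT (24)
    salt, pwv_iv, pwv_ct = header[4:36], header[36:48], header[48:72]
    key = derive_key(password.encode("utf-8"), salt=salt, **ARGON2ID_PARAMS)
    try:
        marker = aead_decrypt(key, pwv_iv, pwv_ct)
    except Exception:
        # Assumption: the supplied decryptor signals a tag mismatch by raising.
        return False
    return marker == PWV_MARKER
```

Passing the primitives in keeps the control flow visible and testable; a real reader would bind these two callables to its chosen Argon2id and AES-GCM implementations.
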
## 12. Error Handling and Robustness

- Readers SHOULD tolerate trailing or partial entries by:
  - Resynchronizing on SYNC_WORD when an entry parse fails.
  - Validating that every declared length fits within the remaining bytes of the file.
  - Treating incomplete or malformed entries as end-of-container or skipping them safely.
- Writers SHOULD flush after chunk writes to reduce the risk of partial loss on interruption.
- Removal operations SHOULD rewrite to a temporary file and replace the original on success to maintain atomicity.

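The rewrite-then-replace pattern for removal can be sketched with the standard library. This is one possible approach, assuming the temporary file lives in the same directory as the container so the final rename stays on one filesystem; `rewrite_atomically` and its `write_body` callback are hypothetical names.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def rewrite_atomically(path: str, write_body: Callable[[BinaryIO], None]) -> None:
    """Write a new container next to `path`, then swap it in on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_body(tmp)          # caller writes the header and the kept entries
            tmp.flush()
            os.fsync(tmp.fileno())   # push data to disk before the rename
        os.replace(tmp_path, path)   # replaces the original in a single step
    except BaseException:
        # On any failure the original container is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If `write_body` is interrupted, only the temporary file is affected and it is removed, so readers never observe a half-written container under the original name.
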
## 13. Security Considerations

- AES-GCM requires unique IVs per key; this format uses a fresh random IV per encrypted field and per content chunk. Implementations MUST use a CSPRNG.
- Full and base paths are always encrypted to avoid leaking names or structure. The entry header remains plaintext: ENTRY_TYPE, SIZE, and the length fields (FULLPATH_LEN, BASEPATH_LEN, TS_LEN) are visible, so entry types, file sizes, and encrypted path lengths can be observed without the key.
- The password verification marker confirms correct key derivation without exposing the key or file contents. The marker value is not a secret; secrecy is provided by encryption.
- Argon2id parameters are chosen to resist both CPU and GPU/ASIC-based password guessing:
  - iterations = 3 sets the number of passes over memory.
  - memory_cost = 65536 KiB (64 MiB) per derivation makes massively parallel GPU/ASIC attacks expensive.
  - lanes = 4 allows efficient use of modern multi-core systems.
- A 32-byte salt generated by a CSPRNG carries 256 bits of entropy, which is sufficient for all practical purposes.

## 14. File Format

- File extension: ".mfpk" (conventional).

## 15. Versioning

- MAGIC_VERSION embeds the format version in its fourth byte, at offset 3 (0x05 for V5).
- Backward compatibility with previous versions is NOT provided by V5.
- V5 containers are incompatible with V4 and earlier implementations.

## 16. Constants Summary

- MAGIC_VERSION: 89 4D 46 05
- HEADER_SIZE: 72 bytes
- SALT_SIZE: 32
- PWV_MARKER (plaintext, encrypted in header): "PWV5MARK" (8 bytes)
- SYNC_WORD: A4 45 4E 54
- ENTRY_TYPE_FILE: 0x00
- ENTRY_TYPE_DIRECTORY: 0x01
- IV_SIZE: 12
- TAG_SIZE: 16
- CHUNK_SIZE: 1,048,576
- Argon2id parameters:
  - iterations: 3
  - memory_cost: 65536 (64 MiB)
  - lanes: 4
  - length: 32 bytes

## 17. Example Layout (File Entry, Schematic)

```
[GLOBAL HEADER 72B]
A4 45 4E 54 | 00 | 00 00 00 | [N1=fullpath_len u32] |
[SIZE u64] | [N2=basepath_len u32] | [N3=ts_len u16] | 00..00 (6B)
ENC(full_path) [N1] | ENC(base_path) [N2] | ENC(timestamp) [N3] |
(IV || ENC(chunk1)) ... (IV || ENC(chunkN))
```