From bafbe96491b337dc34b5306cdc09fd68789da9ec Mon Sep 17 00:00:00 2001
From: Jayden
Date: Sat, 13 Dec 2025 05:43:10 +0000
Subject: [PATCH] first commit

---
 specification_V5.md | 197 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 specification_V5.md

diff --git a/specification_V5.md b/specification_V5.md
new file mode 100644
index 0000000..634abb0
--- /dev/null
+++ b/specification_V5.md
@@ -0,0 +1,197 @@
+# MFPK-ENC-V5 Format Specification
+
+## Encrypted Multi-File Container Format (Binary V5)
+
+## 1. Introduction
+
+MFPK-ENC-V5 is a compact binary container format for storing files and directories with authenticated encryption. The format features:
+- Argon2id key derivation for modern, memory-hard security.
+- Compact binary structure optimized for streaming operations.
+- Authenticated encryption using AES-256-GCM.
+- Strong resistance against GPU/ASIC-based attacks.
+- Encrypted metadata including file paths and timestamps.
+
+## 2. Notational Conventions
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document indicate requirement levels.
+
+All multi-byte integers are in network byte order (big-endian).
+
+## 3. Cryptographic Primitives
+
+- **Key Derivation**: Argon2id
+  - Salt length: 32 bytes
+  - Output length: 32 bytes (256 bits)
+  - Iterations: 3
+  - Memory cost: 64 MiB (65536 KiB)
+  - Lanes: 4
+  - Password encoding: UTF-8
+- **AEAD**: AES-256-GCM
+  - IV (nonce) length: 12 bytes
+  - Tag length: 16 bytes
+  - Associated Data (AAD): none in this version
+- **Randomness**: IVs and salts MUST be generated using a CSPRNG.
+
+## 4. Container Overview
+
+A container is a binary file consisting of:
+- A fixed global header containing magic, version, salt, and a password verification record.
+- A sequence of entries (files or directories). Each entry begins with a 4-byte binary sync word that allows scanning and recovery.
+
+The format is append-only. Removal and password change are performed by rewriting to a new file.
+
+## 5. Global Header (fixed size: 72 bytes)
+
+| Offset | Size | Description |
+|--------|------|-------------|
+| 0 | 4 | MAGIC_VERSION = 0x89 'M' 'F' 0x05 (bytes: 89 4D 46 05) |
+| 4 | 32 | SALT (32 bytes) |
+| 36 | 12 | PWV_IV (AES-GCM 12-byte IV for password verification) |
+| 48 | 24 | PWV_CT (AES-GCM ciphertext+tag of an 8-byte marker) |
+
+**Notes:**
+- The password verification plaintext marker is exactly 8 bytes: PWV_MARKER = 50 57 56 35 4D 41 52 4B (ASCII "PWV5MARK"). This marker is NEVER stored in plaintext; only the IV and the ciphertext+tag are stored.
+- Total header size = 4 + 32 + 12 + 24 = 72 bytes.
+
+## 6. Entry Record
+
+Each entry begins with a 4-byte sync word followed by a compact entry header describing the lengths of encrypted fields and the logical file size. All path strings are encrypted; no plaintext path appears in the container.
+
+**Constants:**
+- SYNC_WORD = A4 45 4E 54 (bytes: 0xA4, 'E', 'N', 'T')
+- ENTRY_TYPE: 1 byte
+  - 0x00 = file
+  - 0x01 = directory
+- CHUNK_SIZE = 1,048,576 bytes (1 MiB)
+- IV_SIZE = 12 bytes, TAG_SIZE = 16 bytes
+
+**Entry header layout (all big-endian):**
+
+| Offset | Size | Field |
+|--------|------|---------------------------------------------|
+| 0 | 4 | SYNC_WORD (A4 45 4E 54) |
+| 4 | 1 | ENTRY_TYPE (0x00 file, 0x01 dir) |
+| 5 | 3 | RESERVED (MUST be zero) |
+| 8 | 4 | FULLPATH_LEN (uint32) - length of ENCRYPTED_FULL_PATH |
+| 12 | 8 | SIZE (uint64) - logical plaintext file size in bytes. For directories, SIZE MUST be 0 |
+| 20 | 4 | BASEPATH_LEN (uint32) - length of ENCRYPTED_BASE_PATH |
+| 24 | 2 | TS_LEN (uint16) - length of ENCRYPTED_TIMESTAMP (0 for dir) |
+| 26 | 6 | RESERVED (MUST be zero) |
+| 32 | N1 | ENCRYPTED_FULL_PATH (N1 bytes) |
+| 32+N1 | N2 | ENCRYPTED_BASE_PATH (N2 bytes) |
+| 32+N1+N2 | N3 | ENCRYPTED_TIMESTAMP (N3 bytes, N3 == 0 for directories) |
+| ... | ... | CONTENT (files only): sequence of chunks |
+
+Encrypted fields use AES-GCM and are stored as IV || CIPHERTEXT+TAG, where the IV is 12 bytes and the TAG is 16 bytes. The ciphertext length equals the plaintext length, so each stored field is 28 bytes longer than its plaintext.
+
+## 7. File Content Chunking
+
+- Files are stored as a sequence of chunks after the entry header.
+- For each plaintext chunk of up to CHUNK_SIZE bytes:
+  - Generate a fresh, random 12-byte IV.
+  - Store IV || AES-GCM(key, chunk), where the ciphertext+tag length is chunk_len + TAG_SIZE.
+- To skip content without decryption, compute:
+  ```
+  num_chunks = ceil(SIZE / CHUNK_SIZE)
+  encrypted_size = SIZE + num_chunks * (IV_SIZE + TAG_SIZE)
+  ```
+  and advance the file pointer by encrypted_size bytes.
+
+## 8. Paths and Timestamps
+
+- FULL_PATH format: POSIX-like absolute path, e.g., "/dir/a.txt".
+- BASE_PATH is the parent directory path of FULL_PATH.
+- All paths are UTF-8 strings and are stored only in encrypted form.
+- TIMESTAMP (files only): big-endian IEEE-754 float64 of Unix mtime, then encrypted as a single AES-GCM blob of TS_LEN bytes (12 + 8 + 16 = 36 for the 8-byte plaintext).
+- Directories MUST have TS_LEN = 0 and no content.
+
+## 9. Root Directory Entry
+
+A well-formed container SHOULD include an explicit root directory entry, written during initialization:
+- ENTRY_TYPE = 0x01 (directory)
+- FULL_PATH = "/"
+- SIZE = 0
+- BASE_PATH = "/"
+- TS_LEN = 0 (no timestamp)
+
+## 10. Indexing and Scanning
+
+- Readers MAY scan sequentially starting after the 72-byte header:
+  - Read 4 bytes; if they do not match SYNC_WORD, resynchronize by scanning forward for the next SYNC_WORD.
+  - Read ENTRY_TYPE and header fields (FULLPATH_LEN, SIZE, BASEPATH_LEN, TS_LEN) and then the corresponding encrypted byte ranges.
+  - Decrypt FULL_PATH and BASE_PATH to reconstruct the directory tree.
+  - For files, decrypt TIMESTAMP to obtain mtime. Content decryption is optional for indexing.
+  - Compute content start position immediately after ENCRYPTED_TIMESTAMP.
+  - For files, skip content using the SIZE-based calculation in Section 7.
+- Decryption is REQUIRED only to recover plaintext paths and (for files) timestamps. Content decryption is not required for indexing.
+
+## 11. Password Verification
+
+- To verify a password:
+  - Read SALT from the header.
+  - Derive a key via Argon2id with parameters:
+    - iterations = 3
+    - memory_cost = 65536 (64 MiB)
+    - lanes = 4
+    - salt = SALT
+    - length = 32 bytes
+  - Read PWV_IV and PWV_CT from the header and attempt decryption.
+  - Successful decryption yielding the 8-byte PWV_MARKER indicates the password is correct.
+
+## 12. Error Handling and Robustness
+
+- Readers SHOULD tolerate trailing or partial entries by:
+  - Resynchronizing on SYNC_WORD when an entry parse fails.
+  - Validating that all declared lengths are present.
+  - Treating incomplete or malformed entries as end-of-container or skipping them safely.
+- Writers SHOULD flush after chunk writes to reduce risk of partial loss on interruption.
+- Removal operations SHOULD rewrite to a temporary file and replace on success to maintain atomicity.
+
+## 13. Security Considerations
+
+- AES-GCM requires unique IVs per key; this format uses a fresh random IV per encrypted field and per content chunk. Implementations MUST use a CSPRNG.
+- Full and base paths are always encrypted to avoid leaking names or structure. The entry header fields (ENTRY_TYPE, SIZE, and the encrypted-field lengths) are stored in plaintext, so entry types, file sizes, and path lengths remain observable.
+- The password verification marker confirms correct key derivation without exposing the key or file contents. The marker value is not a secret; secrecy is provided by encryption.
+- Argon2id parameters are chosen to provide strong resistance against both CPU and GPU/ASIC-based attacks:
+  - iterations = 3 provides a reasonable time cost
+  - memory_cost = 65536 (64 MiB) makes GPU attacks expensive
+  - lanes = 4 allows efficient use of modern multi-core systems
+- The salt length of 32 bytes provides 256 bits of entropy, which is sufficient for all practical purposes.
+
+## 14. File Format
+
+- File extension: ".mfpk" (conventional).
+
+## 15. Versioning
+
+- MAGIC_VERSION embeds the format version in its fourth byte (offset 3; 0x05 for V5).
+- Backward compatibility with previous versions is NOT provided by V5.
+- V5 containers are incompatible with V4 and earlier implementations.
+
+## 16. Constants Summary
+
+- MAGIC_VERSION: 89 4D 46 05
+- HEADER_SIZE: 72 bytes
+- SALT_SIZE: 32
+- PWV_MARKER (plaintext, encrypted in header): "PWV5MARK" (8 bytes)
+- SYNC_WORD: A4 45 4E 54
+- ENTRY_TYPE_FILE: 0x00
+- ENTRY_TYPE_DIRECTORY: 0x01
+- IV_SIZE: 12
+- TAG_SIZE: 16
+- CHUNK_SIZE: 1,048,576
+- Argon2id parameters:
+  - iterations: 3
+  - memory_cost: 65536 (64 MiB)
+  - lanes: 4
+  - length: 32 bytes
+
+## 17. Example Layout (File Entry, Schematic)
+
+```
+[GLOBAL HEADER 72B]
+A4 45 4E 54 | 00 | 00 00 00 | [N1=fullpath_len u32] |
+[SIZE u64] | [N2=basepath_len u32] | [N3=ts_len u16] | 00..00 (6B)
+ENC(full_path) [N1] | ENC(base_path) [N2] | ENC(timestamp) [N3] |
+(IV || ENC(chunk1)) ... (IV || ENC(chunkN))
+```
\ No newline at end of file
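
The fixed layouts in Sections 5 to 7 of the added specification can be sanity-checked with a short parser. The following is a minimal Python sketch using only the standard library; it is not part of the patch. The names `parse_global_header`, `parse_entry_header`, `encrypted_content_size`, and the dictionary keys they return are hypothetical. Rejecting nonzero RESERVED bytes outright is an assumption of this sketch, not behavior the specification mandates for readers. No key derivation or decryption is attempted.

```python
import struct

MAGIC_VERSION = bytes.fromhex("894D4605")
SYNC_WORD = bytes.fromhex("A4454E54")
IV_SIZE, TAG_SIZE, CHUNK_SIZE = 12, 16, 1_048_576

# Global header: MAGIC_VERSION(4) | SALT(32) | PWV_IV(12) | PWV_CT(24) = 72 bytes
GLOBAL_HEADER = struct.Struct(">4s32s12s24s")
# Entry header: SYNC(4) | TYPE(1) | RESERVED(3) | FULLPATH_LEN(u32) | SIZE(u64)
#               | BASEPATH_LEN(u32) | TS_LEN(u16) | RESERVED(6) = 32 bytes
ENTRY_HEADER = struct.Struct(">4sB3sIQIH6s")


def parse_global_header(buf: bytes) -> dict:
    """Split the 72-byte global header into its fields (no decryption)."""
    if len(buf) < GLOBAL_HEADER.size:
        raise ValueError("truncated global header")
    magic, salt, pwv_iv, pwv_ct = GLOBAL_HEADER.unpack_from(buf, 0)
    if magic != MAGIC_VERSION:
        raise ValueError("bad magic or unsupported version")
    return {"salt": salt, "pwv_iv": pwv_iv, "pwv_ct": pwv_ct}


def encrypted_content_size(size: int) -> int:
    """Bytes occupied on disk by a file's chunk sequence (Section 7)."""
    num_chunks = -(-size // CHUNK_SIZE)  # integer ceil(SIZE / CHUNK_SIZE)
    return size + num_chunks * (IV_SIZE + TAG_SIZE)


def parse_entry_header(buf: bytes, offset: int = 0) -> dict:
    """Decode one fixed 32-byte entry header and locate the next entry."""
    if len(buf) - offset < ENTRY_HEADER.size:
        raise ValueError("truncated entry header")
    sync, etype, rsv1, n1, size, n2, n3, rsv2 = ENTRY_HEADER.unpack_from(buf, offset)
    if sync != SYNC_WORD:
        raise ValueError("SYNC_WORD not found at offset")
    if etype not in (0x00, 0x01):
        raise ValueError("unknown ENTRY_TYPE")
    # Assumption: this sketch rejects violations outright; a tolerant reader
    # could instead resynchronize on the next SYNC_WORD (Section 12).
    if any(rsv1) or any(rsv2):
        raise ValueError("RESERVED bytes must be zero")
    if etype == 0x01 and (size != 0 or n3 != 0):
        raise ValueError("directory entries need SIZE = 0 and TS_LEN = 0")
    content_start = offset + ENTRY_HEADER.size + n1 + n2 + n3
    return {
        "is_dir": etype == 0x01,
        "size": size,
        "fullpath_len": n1,
        "basepath_len": n2,
        "ts_len": n3,
        "content_start": content_start,
        "next_entry": content_start + encrypted_content_size(size),
    }


# Usage: a synthetic header for a 5-byte file at "/dir/a.txt" (10 bytes) with
# base path "/dir" (4 bytes). Stored lengths are plaintext length + 28
# (12-byte IV + 16-byte tag); the encrypted timestamp is 8 + 28 = 36 bytes.
header = ENTRY_HEADER.pack(SYNC_WORD, 0x00, bytes(3), 10 + 28, 5, 4 + 28, 36, bytes(6))
info = parse_entry_header(header)
print(info["content_start"], info["next_entry"])
```

Because `next_entry` is derived only from plaintext header fields, an indexer built along these lines could walk the container without touching the key, decrypting just the path and timestamp blobs it needs.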