expand magic byte detection for common file formats

Add detection for:
- Images: BMP, TIFF, ICO
- Video: MP4, WebM, FLV, Matroska
- Audio: MP3, FLAC, OGG
- Documents: MS Office OLE (DOC/XLS/PPT)
- Executables: PE (EXE/DLL), ELF, Mach-O, WASM
- Archives: BZIP2, XZ, ZSTD, LZ4, 7z, RAR
- Data: SQLite

This improves REQUIRE_BINARY enforcement by detecting more
recognizable formats that should be encrypted before upload.
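
A minimal sketch of how prefix matching could feed REQUIRE_BINARY enforcement. The three-entry table is a subset of the signatures added below; the helper names `sniff` and `violates_require_binary`, and the policy that any recognized format counts as "not encrypted", are assumptions for illustration rather than the project's actual code.

```python
from typing import Optional

# Illustrative subset of the signature table (the full table is in the diff).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\x7fELF": "application/x-executable",
    b"SQLite format 3\x00": "application/x-sqlite3",
}


def sniff(content: bytes) -> Optional[str]:
    """Return the MIME type whose signature is a prefix of content, else None."""
    for magic, mime in SIGNATURES.items():
        if content.startswith(magic):
            return mime
    return None


def violates_require_binary(content: bytes) -> bool:
    """Assumed policy: a recognizable format means the payload is not ciphertext."""
    return sniff(content) is not None
```

Under this assumption an upload starting with `\x7fELF` would be rejected, while well-encrypted data would match a signature only by chance.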
Username
2025-12-25 19:47:33 +01:00
parent 9901649fd7
commit 764b831bb0
4 changed files with 52 additions and 10 deletions

@@ -6,7 +6,7 @@ A lightweight, secure pastebin REST API built with Flask.
- **Simple REST API** - Create, retrieve, list, and delete pastes via HTTP
- **Binary support** - Upload text, images, archives, and other binary content
-- **Automatic MIME detection** - Magic byte detection (PNG, JPEG, GIF, WebP, ZIP, PDF, GZIP)
+- **Automatic MIME detection** - Magic byte detection (images, video, audio, documents, executables, archives)
- **Client certificate authentication** - mTLS or header-based via reverse proxy
- **Tiered expiry** - 1 day (anon), 7 days (untrusted), 30 days (trusted PKI)
- **Size limits** - 3 MiB anonymous, 50 MiB authenticated

@@ -12,6 +12,7 @@ Unstructured intake buffer for ideas, issues, and observations. Items here are r
- Design: compress-then-encrypt only (not compress-only)
- Compressed data has high entropy → bypasses entropy enforcement
- Must enforce encryption when compression enabled (CLI-side)
+- Server detects compression formats via magic bytes (REQUIRE_BINARY)
- ETag support for conditional requests
- Neovim/Vim plugin for editor integration
- Webhook notifications for paste events
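
The reasoning in the compression notes above (compressed data has high entropy, so an entropy threshold alone cannot tell it apart from ciphertext, while its magic bytes can) can be illustrated with a short sketch. The function name `shannon_entropy` and the sample input are hypothetical; no threshold used by the server is implied.

```python
import gzip
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


# Plain ASCII digits and spaces: only 11 distinct byte values, so low entropy.
raw = " ".join(str(i * i) for i in range(5000)).encode("ascii")

# The same data after gzip: byte values are spread far more evenly.
compressed = gzip.compress(raw)

raw_entropy = shannon_entropy(raw)
compressed_entropy = shannon_entropy(compressed)

# An entropy check would wave the gzip blob through, but its first two
# bytes still identify it as GZIP.
looks_like_gzip = compressed.startswith(b"\x1f\x8b")
```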

@@ -39,14 +39,52 @@ MIME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9!#$&\-^_.+]*/[a-z0-9][a-z0-9!#$&\-^_
# Magic bytes for binary format detection
MAGIC_SIGNATURES: dict[bytes, str] = {
+# Images
b"\x89PNG\r\n\x1a\n": "image/png",
b"\xff\xd8\xff": "image/jpeg",
b"GIF87a": "image/gif",
b"GIF89a": "image/gif",
-b"RIFF": "image/webp",
-b"PK\x03\x04": "application/zip",
+b"RIFF": "image/webp", # RIFF container, verified as WEBP in detect_mime_type
+b"BM": "image/bmp",
+b"II\x2a\x00": "image/tiff", # Little-endian TIFF
+b"MM\x00\x2a": "image/tiff", # Big-endian TIFF
+b"\x00\x00\x01\x00": "image/x-icon",
+# Video/Audio containers (checked for subtype in detect_mime_type)
+b"\x1a\x45\xdf\xa3": "video/webm", # Matroska/WebM
+b"FLV\x01": "video/x-flv",
+b"\x00\x00\x00\x1c\x66\x74\x79\x70": "video/mp4", # ftyp box at standard offset
+b"\x00\x00\x00\x20\x66\x74\x79\x70": "video/mp4", # ftyp with different size
+b"\x00\x00\x00\x18\x66\x74\x79\x70": "video/mp4", # ftyp with different size
+# Audio
+b"ID3": "audio/mpeg", # MP3 with ID3 tag
+b"\xff\xfb": "audio/mpeg", # MP3 frame sync
+b"\xff\xfa": "audio/mpeg",
+b"\xff\xf3": "audio/mpeg",
+b"\xff\xf2": "audio/mpeg",
+b"fLaC": "audio/flac",
+b"OggS": "audio/ogg",
+# Documents
b"%PDF": "application/pdf",
+b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "application/msword", # OLE (DOC, XLS, PPT, MSI)
+b"PK\x03\x04": "application/zip", # ZIP, DOCX, XLSX, PPTX, ODT, JAR, APK
+# Executables
+b"MZ": "application/x-msdownload", # EXE, DLL
+b"\x7fELF": "application/x-executable", # ELF (Linux)
+b"\xfe\xed\xfa\xce": "application/x-mach-binary", # Mach-O 32-bit
+b"\xfe\xed\xfa\xcf": "application/x-mach-binary", # Mach-O 64-bit
+b"\xcf\xfa\xed\xfe": "application/x-mach-binary", # Mach-O 64-bit (reversed)
+b"\xca\xfe\xba\xbe": "application/x-mach-binary", # Mach-O fat/universal binary
+b"\x00asm": "application/wasm", # WebAssembly
+# Compression/Archives
b"\x1f\x8b": "application/gzip",
+b"BZh": "application/x-bzip2",
+b"\xfd7zXZ\x00": "application/x-xz",
+b"\x28\xb5\x2f\xfd": "application/zstd",
+b"\x04\x22\x4d\x18": "application/x-lz4",
+b"7z\xbc\xaf\x27\x1c": "application/x-7z-compressed",
+b"Rar!\x1a\x07": "application/vnd.rar",
+# Data
+b"SQLite format 3\x00": "application/x-sqlite3",
}
# Generic MIME types to override with detection
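
The comments in the table mention that container formats are checked for a subtype in `detect_mime_type`, whose body is not part of this diff. The sketch below shows one way such a check could look. The function name `refine_container` is hypothetical, and matching `ftyp` at offset 4 regardless of box size is an assumed alternative to listing the sizes 0x18, 0x1c, and 0x20 individually.

```python
from typing import Optional

# RIFF stores a four-character form type at bytes 8..12.
RIFF_FORMS = {
    b"WEBP": "image/webp",
    b"WAVE": "audio/wav",
    b"AVI ": "video/x-msvideo",
}


def refine_container(content: bytes) -> Optional[str]:
    """Resolve the concrete type of a RIFF or ISO BMFF container, else None."""
    if content[:4] == b"RIFF" and len(content) >= 12:
        # Unknown form types return None instead of being reported as WebP.
        return RIFF_FORMS.get(content[8:12])
    if content[4:8] == b"ftyp":
        # Bytes 0..4 hold the big-endian box size, which varies between files.
        # The brand at bytes 8..12 could narrow this further (e.g. audio-only
        # files); this sketch reports video/mp4 as the table above does.
        return "video/mp4"
    return None
```

Checking the form type avoids labelling WAV or AVI uploads as `image/webp` just because they share the `RIFF` prefix.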

@@ -464,13 +464,16 @@ X-SSL-Client-SHA1: a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2
+| Images | PNG, JPEG, GIF, WebP, BMP, TIFF, ICO |
+| Video | MP4, WebM, FLV, Matroska |
+| Audio | MP3, FLAC, OGG |
-- GZIP: `\x1f\x8b`
-2. **Explicit Content-Type header** (if not generic)
-3. **UTF-8 detection** (falls back to `text/plain`)
-4. **Binary fallback** (`application/octet-stream`)
+| Documents | PDF, MS Office (DOC/XLS/PPT), ZIP-based (DOCX/XLSX/ODT) |
+| Executables | EXE/DLL (PE), ELF (Linux), Mach-O (macOS), WASM |
+| Archives | ZIP, GZIP, BZIP2, XZ, ZSTD, LZ4, 7z, RAR |
+| Data | SQLite |
+2. **Explicit Content-Type header** (if not generic)
+3. **UTF-8 detection** (falls back to `text/plain`)
+4. **Binary fallback** (`application/octet-stream`)
---
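
The detection order documented above can be sketched as a single function. This assumes step 1 is magic byte detection, as the placement of the format table suggests. The name `resolve_mime`, the two-entry signature table, and the contents of `GENERIC_TYPES` are illustrative assumptions; the project's actual list of generic types is not shown in this diff.

```python
from typing import Optional

# Tiny illustrative subset of the signature table.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF": "application/pdf",
}

# Assumed set of "generic" declared types that detection may override.
GENERIC_TYPES = frozenset({"application/octet-stream", "text/plain"})


def resolve_mime(content: bytes, declared: Optional[str] = None) -> str:
    """Apply the four documented steps in order and return a MIME type."""
    # 1. Magic bytes win over anything the client declared.
    for magic, mime in MAGIC.items():
        if content.startswith(magic):
            return mime
    # 2. An explicit Content-Type is honoured unless it is generic.
    if declared:
        base = declared.split(";", 1)[0].strip().lower()
        if base and base not in GENERIC_TYPES:
            return base
    # 3. Valid UTF-8 is treated as plain text.
    try:
        content.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        # 4. Everything else is opaque binary.
        return "application/octet-stream"
```

For example, a body starting with `%PDF` resolves to `application/pdf` even when the client sends `Content-Type: text/plain`.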