Understanding GGML Files: A Comprehensive Visualization of File Structure
GGML Structure
Q8_0 Block Structure (34 bytes)
Basic block with highest precision
Block Header
• d (scale) - FP16
• 2 bytes
32 Quantized Values (8-bit each = 32 bytes)
q0 … q7 shown (of 32): each an 8-bit value, dequantized as q × d
Range: [-128 → 127] signed integer
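As a minimal sketch of the layout above, the following decodes one 34-byte block with w = q × d. The function name and the little-endian byte order are assumptions made for illustration; they are not taken from the GGML source.

```python
import struct

Q8_0_BLOCK_SIZE = 34  # 2-byte FP16 scale + 32 signed 8-bit values


def dequantize_q8_0(block: bytes) -> list[float]:
    """Decode one 34-byte Q8_0 block into 32 floats using w = q * d."""
    if len(block) != Q8_0_BLOCK_SIZE:
        raise ValueError(f"expected {Q8_0_BLOCK_SIZE} bytes, got {len(block)}")
    # "<e"  : FP16 scale d (little-endian is an assumption)
    # "32b" : 32 signed bytes, the quantized values q0..q31
    d, *qs = struct.unpack("<e32b", block)
    return [q * d for q in qs]


# Hypothetical block: scale 0.5, quantized values -16..15
example = struct.pack("<e32b", 0.5, *range(-16, 16))
weights = dequantize_q8_0(example)
print(weights[:4])  # [-8.0, -7.5, -7.0, -6.5]
```

The scale is read once per block, so the per-weight cost is a single multiplication.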
Q4_K_M Superblock Structure (160 bytes)
Advanced structure with separate scales and minimums for better accuracy
8 Block Scales (16 bytes)
scale_0 … scale_7: FP16, 2 bytes each (one per 32-value block)
8 Block Minimums (16 bytes)
min_0 … min_7: FP16, 2 bytes each (one per 32-value block)
256 Quantized Values (4-bit each = 128 bytes)
q0 … q11 shown (of 256): each a 4-bit value, scaled by its block's scale (s0 for the first block)
Range: [0 → 15] unsigned integer
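The superblock described above can be decoded roughly as follows. This is a sketch of the simplified layout shown here (8 FP16 scales, 8 FP16 minimums, 128 bytes of packed nibbles), not a byte-exact reader for files produced by any particular GGML version. The nibble order (low nibble first), the little-endian byte order, the sign convention w = q × scale + min, and the function name are all assumptions.

```python
import struct

SUPERBLOCK_SIZE = 160   # 16 B scales + 16 B minimums + 128 B packed 4-bit values
VALUES_PER_BLOCK = 32   # 256 values split across 8 blocks


def dequantize_superblock(data: bytes) -> list[float]:
    """Decode 256 weights from one superblock using w = q * scale + min."""
    if len(data) != SUPERBLOCK_SIZE:
        raise ValueError(f"expected {SUPERBLOCK_SIZE} bytes, got {len(data)}")
    scales = struct.unpack_from("<8e", data, 0)    # scale_0..scale_7
    mins = struct.unpack_from("<8e", data, 16)     # min_0..min_7
    packed = data[32:]                             # two 4-bit values per byte
    weights = []
    for i in range(256):
        byte = packed[i // 2]
        # Assumption: even indices sit in the low nibble, odd in the high nibble
        q = (byte >> 4) if i % 2 else (byte & 0x0F)
        block = i // VALUES_PER_BLOCK
        weights.append(q * scales[block] + mins[block])
    return weights


# Hypothetical data: every scale is 1.0, min_b is -b, every byte holds (1, 2)
example = struct.pack("<8e8e", *[1.0] * 8, *[float(-b) for b in range(8)])
example += bytes([0x21] * 128)
weights = dequantize_superblock(example)
print(weights[0], weights[1], weights[32], weights[255])  # 1.0 2.0 0.0 -5.0
```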
Q3_K_S Block Structure (14 bytes)
Simple 3-bit quantization with scaling
Block Header
• scale (FP16)
• 2 bytes
32 Quantized Values (3-bit each = 12 bytes)
q0 … q7 shown (of 32): each a 3-bit value, dequantized as q × scale
Range: [-4 → 3] signed integer, packed bits
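Because 3-bit values do not align to byte boundaries, the 12-byte payload has to be read as a bit stream. The sketch below shows one way to do that for the 14-byte block described above. The bit order (least significant bits first), the two's-complement encoding of the [-4 → 3] range, and the helper names are assumptions for illustration only.

```python
import struct


def unpack_bits(packed: bytes, bits: int, count: int) -> list[int]:
    """Read `count` unsigned fields of `bits` bits from a little-endian bit stream."""
    stream = int.from_bytes(packed, "little")
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


def dequantize_3bit_block(block: bytes) -> list[float]:
    """Decode a 14-byte block: FP16 scale followed by 32 packed 3-bit values."""
    if len(block) != 14:
        raise ValueError(f"expected 14 bytes, got {len(block)}")
    (scale,) = struct.unpack_from("<e", block, 0)
    raw = unpack_bits(block[2:], bits=3, count=32)
    # Assumption: two's complement, so raw codes 4..7 represent -4..-1
    qs = [r - 8 if r >= 4 else r for r in raw]
    return [q * scale for q in qs]


# Round trip with hypothetical data: the values -4..3 repeated four times
values = [v for _ in range(4) for v in range(-4, 4)]
stream = 0
for i, v in enumerate(values):
    stream |= (v & 0b111) << (3 * i)   # keep the low 3 bits of each value
example = struct.pack("<e", 0.25) + stream.to_bytes(12, "little")
decoded = dequantize_3bit_block(example)
print(decoded[:8])  # [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75]
```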
Q3_K_L Block Structure (16 bytes)
3-bit quantization with lookup table optimization
Block Headers
• scale (FP16)
• lookup (FP16)
• 4 bytes total
32 Quantized Values (3-bit each = 12 bytes)
q0 … q7 shown (of 32): each a 3-bit value, mapped through scale + lookup table
Range: [-4 → 3] with lookup table mapping
Q5_K_M Block Structure (24 bytes)
5-bit quantization with block minimum
Block Headers
• scale (FP16)
• min (FP16)
• 4 bytes total
32 Quantized Values (5-bit each = 20 bytes)
q0 … q7 shown (of 32): each a 5-bit value, scaled by the block scale
Range: [0 → 31] unsigned with minimum offset
Q5_1 Block Structure (24 bytes)
5-bit quantization with zero-point optimization
Block Headers
• scale (FP16)
• zero-point (FP16)
• 4 bytes total
32 Quantized Values (5-bit each = 20 bytes)
q0 … q7 shown (of 32): each a 5-bit value, scaled by the block scale
Range: [0 → 31] with zero-point adjustment
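To illustrate how a scale and zero-point pair is chosen, here is a hedged sketch that quantizes 32 floats into the layout described above (FP16 scale, FP16 zero-point, 20 bytes of packed 5-bit values) and decodes them again with w = q × scale + zero. The min/max fitting rule, the bit order, and the function names are assumptions; real quantizers may pick the scale differently.

```python
import struct


def quantize_5bit_block(weights: list[float]) -> bytes:
    """Pack 32 floats as FP16 scale + FP16 zero-point + 32 packed 5-bit values."""
    if len(weights) != 32:
        raise ValueError("a block holds exactly 32 weights")
    lo, hi = min(weights), max(weights)
    # Assumed fitting rule: spread the 32 codes (0..31) evenly over [lo, hi].
    # Round both parameters through FP16 so encoding matches what is stored.
    scale, zero = struct.unpack("<ee", struct.pack("<ee", (hi - lo) / 31, lo))
    if scale == 0.0:
        scale = 1.0  # constant block: any non-zero scale works
    stream = 0
    for i, w in enumerate(weights):
        q = max(0, min(31, round((w - zero) / scale)))  # clamp to [0, 31]
        stream |= q << (5 * i)
    return struct.pack("<ee", scale, zero) + stream.to_bytes(20, "little")


def dequantize_5bit_block(block: bytes) -> list[float]:
    """Decode a block produced by quantize_5bit_block using w = q * scale + zero."""
    scale, zero = struct.unpack_from("<ee", block, 0)
    stream = int.from_bytes(block[4:], "little")
    return [((stream >> (5 * i)) & 31) * scale + zero for i in range(32)]


# Hypothetical weights chosen so the round trip is exact in FP16
original = [i * 0.5 for i in range(32)]
block = quantize_5bit_block(original)
restored = dequantize_5bit_block(block)
print(len(block), restored[:4])  # 24 [0.0, 0.5, 1.0, 1.5]
```

For arbitrary inputs the round trip is lossy: each weight lands on the nearest of 32 evenly spaced levels, so the error is bounded by roughly half the scale.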
Memory Layout Summary
8-bit Format
• Q8_0: 34 bytes total
• 2B header + 32B data
• [-128 → 127] range
• w = q × d
4/5-bit Formats
• Q4_K_M: 160B superblock
• Q5_K_M: 24B block
• Q5_1: 24B block
• w = q × scale + min (Q4_K_M, Q5_K_M) or + zero-point (Q5_1)
3-bit Formats
• Q3_K_S: 14B simple
• Q3_K_L: 16B with LUT
• [-4 → 3] range
• w = q × scale (or LUT)
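The storage cost of each format is easiest to compare as bits per weight. The short sketch below derives block sizes from the header and payload sizes listed for each structure in this document. The table entries are transcribed from those descriptions and the helper names are hypothetical.

```python
# (name, header bytes, values per block, bits per value), as described above
FORMATS = [
    ("Q8_0", 2, 32, 8),
    ("Q4_K_M", 32, 256, 4),   # 16 B scales + 16 B minimums
    ("Q5_K_M", 4, 32, 5),
    ("Q5_1", 4, 32, 5),
    ("Q3_K_S", 2, 32, 3),
    ("Q3_K_L", 4, 32, 3),
]


def block_bytes(header: int, count: int, bits: int) -> int:
    """Header size plus the packed payload, rounded up to whole bytes."""
    return header + (count * bits + 7) // 8


def bits_per_weight(header: int, count: int, bits: int) -> float:
    """Effective storage cost per weight, including header overhead."""
    return block_bytes(header, count, bits) * 8 / count


for name, header, count, bits in FORMATS:
    size = block_bytes(header, count, bits)
    bpw = bits_per_weight(header, count, bits)
    print(f"{name:7} {size:4d} B  {bpw:.2f} bits/weight")
```

Larger blocks amortize the header over more values, which is why the 256-value superblock adds only one extra bit per weight on top of its 4-bit payload.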