Understanding GGML Files: A Comprehensive Visualization of File Structure

GGML Structure

Q8_0 Block Structure (34 bytes)

Basic block with highest precision

Block Header
• d (scale) - FP16
• 2 bytes
32 Quantized Values (8-bit each = 32 bytes)
• q0, q1, q2, ... q31 (32 total values)
• Each value is 8 bits and is multiplied by d
• Range: [-128 → 127] signed integer
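
The Q8_0 layout above can be decoded in a few lines. This is a minimal Python sketch, assuming little-endian byte order and that the FP16 scale comes first, as the diagram shows. The function name dequantize_q8_0 and the sample data are illustrative and are not taken from the ggml source.

```python
import struct

def dequantize_q8_0(block: bytes) -> list:
    """Decode one 34-byte block: an FP16 scale d, then 32 signed 8-bit values."""
    if len(block) != 34:
        raise ValueError("a Q8_0 block is expected to be 34 bytes")
    (d,) = struct.unpack_from("<e", block, 0)   # 2-byte FP16 scale (assumed little-endian)
    qs = struct.unpack_from("<32b", block, 2)   # 32 signed bytes in [-128, 127]
    return [q * d for q in qs]                  # w = q × d

# Hypothetical sample block: scale 0.5 and the quantized values -16 .. 15.
block = struct.pack("<e32b", 0.5, *range(-16, 16))
weights = dequantize_q8_0(block)
```

With this sample, the first weight is -16 × 0.5 and the last is 15 × 0.5.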

Q4_K_M Superblock Structure (160 bytes)

Advanced structure with a separate scale and minimum for each of its 8 sub-blocks of 32 values, for better accuracy

8 Block Scales (16 bytes)
• scale_0, scale_1, ... scale_7 (FP16, 2 bytes each)
8 Block Minimums (16 bytes)
• min_0, min_1, ... min_7 (FP16, 2 bytes each)
256 Quantized Values (4-bit each = 128 bytes)
• q0, q1, q2, ... q255 (256 total values)
• Each value is 4 bits; q0 to q31 are multiplied by scale_0 (s0), q32 to q63 by scale_1, and so on
• Range: [0 → 15] unsigned integer
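
To make the superblock arithmetic concrete, here is a hedged Python sketch that decodes the 160-byte layout exactly as drawn above: 8 FP16 scales, 8 FP16 minimums, then 128 bytes of packed 4-bit values. The nibble order (low nibble first) and the function name are assumptions; the diagram does not specify how two values share a byte.

```python
import struct

def dequantize_q4_superblock(sb: bytes) -> list:
    """Decode 256 weights from the 160-byte layout described in the diagram."""
    if len(sb) != 160:
        raise ValueError("expected a 160-byte superblock")
    scales = struct.unpack_from("<8e", sb, 0)    # bytes 0..15
    mins = struct.unpack_from("<8e", sb, 16)     # bytes 16..31
    packed = sb[32:]                             # bytes 32..159, two values per byte
    out = []
    for i in range(256):
        byte = packed[i // 2]
        # Assumption: even-indexed values sit in the low nibble, odd in the high nibble.
        q = (byte >> 4) if i % 2 else (byte & 0x0F)
        j = i // 32                              # sub-block index, 32 values each
        out.append(q * scales[j] + mins[j])      # w = q × scale + min
    return out

# Hypothetical sample: scales 0.25 .. 2.0, every minimum -1.0, nibbles alternating 0 and 1.
sample_scales = [0.25 * (j + 1) for j in range(8)]
sample_mins = [-1.0] * 8
superblock = struct.pack("<8e8e", *sample_scales, *sample_mins) + bytes([0x10] * 128)
weights = dequantize_q4_superblock(superblock)
```

Each sub-block uses its own scale and minimum, so the same 4-bit code maps to a different weight depending on which group of 32 it falls in.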

Q3_K_S Block Structure (14 bytes)

Simple 3-bit quantization with scaling

Block Header
• scale (FP16)
• 2 bytes
32 Quantized Values (3-bit each = 12 bytes)
• q0, q1, q2, ... q31 (32 total values)
• Each value is 3 bits and is multiplied by scale
• Range: [-4 → 3] signed integer, packed bits
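
Because 3-bit values do not align to byte boundaries, "packed bits" needs a concrete bit order before it can be decoded. The sketch below assumes the 32 values form one little-endian bit stream (value 0 in the lowest three bits) and use 3-bit two's complement for the sign. Both choices are assumptions for illustration, as are the helper names.

```python
import struct

BITS = 3
COUNT = 32

def pack_q3(values) -> bytes:
    """Pack 32 signed values in [-4, 3] into 12 bytes (assumed bit order)."""
    stream = 0
    for i, v in enumerate(values):
        if not -4 <= v <= 3:
            raise ValueError("value outside the signed 3-bit range")
        stream |= (v & 0b111) << (BITS * i)      # keep the low 3 bits of two's complement
    return stream.to_bytes(12, "little")         # 32 × 3 bits = 96 bits = 12 bytes

def dequantize_q3_block(block: bytes) -> list:
    """Decode a 14-byte block: FP16 scale, then 12 bytes of packed 3-bit values."""
    if len(block) != 14:
        raise ValueError("expected a 14-byte block")
    (scale,) = struct.unpack_from("<e", block, 0)
    stream = int.from_bytes(block[2:], "little")
    out = []
    for i in range(COUNT):
        raw = (stream >> (BITS * i)) & 0b111     # unsigned 0..7
        q = raw - 8 if raw >= 4 else raw         # sign-extend to -4..3
        out.append(q * scale)
    return out

# Hypothetical sample covering the whole range four times over.
qs = [-4, -3, -2, -1, 0, 1, 2, 3] * 4
block = struct.pack("<e", 0.5) + pack_q3(qs)
weights = dequantize_q3_block(block)
```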

Q3_K_L Block Structure (16 bytes)

3-bit quantization with lookup table optimization

Block Headers
• scale (FP16)
• lookup (FP16)
• 4 bytes total
32 Quantized Values (3-bit each = 12 bytes)
• q0, q1, q2, ... q31 (32 total values)
• Each value is 3 bits, multiplied by scale and combined with the lookup value (lut)
• Range: [-4 → 3] with lookup table mapping

Q5_K_M Block Structure (24 bytes)

5-bit quantization with block minimum

Block Headers
• scale (FP16)
• min (FP16)
• 4 bytes total
32 Quantized Values (5-bit each = 20 bytes)
• q0, q1, q2, ... q31 (32 total values)
• Each value is 5 bits and is multiplied by scale
• Range: [0 → 31] unsigned with minimum offset
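
The role of the block minimum is easiest to see from the encoding side. The sketch below shows one common way to produce such a block (min-max quantization to 32 levels) and to decode it again with w = q × scale + min. It is an illustration under stated assumptions, not ggml's quantizer: the contiguous little-endian 5-bit packing and both function names are hypothetical.

```python
import struct

def quantize_q5_block(values) -> bytes:
    """Encode 32 floats as FP16 scale + FP16 min + 20 bytes of packed 5-bit codes."""
    if len(values) != 32:
        raise ValueError("expected 32 values")
    lo, hi = min(values), max(values)
    # Round the header through FP16 first so encoder and decoder use identical numbers.
    scale, lo = struct.unpack("<2e", struct.pack("<2e", (hi - lo) / 31, lo))
    if scale == 0:
        scale = 1.0                               # constant block: any scale works
    stream = 0
    for i, x in enumerate(values):
        q = max(0, min(31, round((x - lo) / scale)))   # clamp to [0, 31]
        stream |= q << (5 * i)                    # assumed order: value 0 in the lowest bits
    return struct.pack("<2e", scale, lo) + stream.to_bytes(20, "little")

def dequantize_q5_block(block: bytes) -> list:
    """Decode the 24-byte block produced above."""
    scale, lo = struct.unpack_from("<2e", block, 0)
    stream = int.from_bytes(block[4:24], "little")
    return [((stream >> (5 * i)) & 31) * scale + lo for i in range(32)]

# Hypothetical sample: 32 evenly spaced weights from -2.0 to 1.875.
values = [i * 0.125 - 2.0 for i in range(32)]
block = quantize_q5_block(values)
restored = dequantize_q5_block(block)
```

Evenly spaced inputs like these round-trip exactly; real weights generally would not, and the error per value is bounded by about half of scale.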

Q5_1 Block Structure (24 bytes)

5-bit quantization with zero-point optimization

Block Headers
• scale (FP16)
• zero-point (FP16)
• 4 bytes total
32 Quantized Values (5-bit each = 20 bytes)
• q0, q1, q2, ... q31 (32 total values)
• Each value is 5 bits and is multiplied by scale
• Range: [0 → 31] with zero-point adjustment
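
The diagram does not say how the 20 data bytes hold 32 five-bit values. One plausible arrangement, assumed here purely for illustration, keeps the low four bits of each value in 16 nibble bytes and gathers the 32 high bits into a separate 4-byte word, so no value straddles a byte boundary. The field order, the bit assignment, and the function name are all assumptions.

```python
import struct

def dequantize_q5_split(block: bytes) -> list:
    """Decode 24 bytes: FP16 scale, FP16 zero-point, 4 bytes of high bits, 16 nibble bytes."""
    if len(block) != 24:
        raise ValueError("expected a 24-byte block")
    scale, zero = struct.unpack_from("<2e", block, 0)
    (high_bits,) = struct.unpack_from("<I", block, 4)   # bit i holds the 5th bit of value i
    nibbles = block[8:24]
    out = [0.0] * 32
    for j in range(16):
        # Assumption: the low nibble of byte j is value j, the high nibble is value j + 16.
        q_first = (nibbles[j] & 0x0F) | (((high_bits >> j) & 1) << 4)
        q_second = (nibbles[j] >> 4) | (((high_bits >> (j + 16)) & 1) << 4)
        out[j] = q_first * scale + zero                 # w = q × scale + zero
        out[j + 16] = q_second * scale + zero
    return out

# Hypothetical sample encoding q = 0 .. 31 in order, with scale 0.5 and zero-point 1.0.
header = struct.pack("<2eI", 0.5, 1.0, 0xFFFF0000)      # values 16..31 have their high bit set
block = header + bytes(j | (j << 4) for j in range(16))
weights = dequantize_q5_split(block)
```

Splitting the fifth bit out this way trades a slightly more involved decode for byte-aligned nibbles, which are cheap to unpack.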

Memory Layout Summary

8-bit Format
  • Q8_0: 34 bytes total
  • 2B header + 32B data
  • [-128 → 127] range
  • w = q × d
4/5-bit Formats
  • Q4_K_M: 160B superblock
  • Q5_K_M: 24B block
  • Q5_1: 24B block
  • w = q × scale + min/zero
3-bit Formats
  • Q3_K_S: 14B simple
  • Q3_K_L: 16B with LUT
  • [-4 → 3] range
  • w = q × scale (or LUT)
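
The block sizes in this summary translate directly into storage cost per weight. The following sketch computes bits per weight and total tensor size from the header and data byte counts given in each diagram above (for the 5-bit formats, 4 header bytes plus 20 data bytes). The table and helper names are hypothetical, and the figures describe the layouts as drawn here.

```python
# (header bytes, data bytes, values per block), read off the diagrams above
FORMATS = {
    "Q8_0":   (2, 32, 32),
    "Q4_K_M": (32, 128, 256),
    "Q5_K_M": (4, 20, 32),
    "Q5_1":   (4, 20, 32),
    "Q3_K_S": (2, 12, 32),
    "Q3_K_L": (4, 12, 32),
}

def bits_per_weight(name: str) -> float:
    """Average storage per weight, including the per-block header overhead."""
    header, data, count = FORMATS[name]
    return 8 * (header + data) / count

def tensor_bytes(name: str, n_weights: int) -> int:
    """Bytes needed for n_weights values, rounding up to whole blocks."""
    header, data, count = FORMATS[name]
    blocks = -(-n_weights // count)          # ceiling division
    return blocks * (header + data)

bpw = {name: bits_per_weight(name) for name in FORMATS}
q8_size = tensor_bytes("Q8_0", 4096 * 4096)  # a hypothetical 4096 × 4096 weight matrix
```

The header overhead is why Q8_0 costs 8.5 bits per weight rather than 8, and why the 4-bit superblock in this layout comes out at 5 bits per weight.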
