Understanding Image Encoding: Lossy vs. Lossless Compression

A detailed exploration of image encoding, covering the fundamental concepts, differences between lossy (like JPEG) and lossless (like PNG) compression techniques, and their underlying mechanisms.

Abhik Sarkar

Introduction: Why Encode Images?

At its core, an image is a grid of pixels (picture elements), each with color information (often represented by Red, Green, and Blue values - RGB). A raw, uncompressed image can be very large. For example, a 12-megapixel photo with 24 bits per pixel (8 bits for R, G, and B each) requires 12 million pixels * 3 bytes/pixel = 36 megabytes of storage!

Image encoding is the process of converting this raw pixel data into a standardized digital format. A primary goal of encoding is usually compression: reducing the file size to make images easier and cheaper to:

  • Store: Less disk space needed on servers or personal devices.
  • Transmit: Faster loading times on websites, quicker sending via email or messaging apps, reduced bandwidth consumption.
  • Process: Smaller files can sometimes be processed more quickly by software.
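
The storage arithmetic above is easy to verify in a couple of lines of Python:

```python
# Raw storage for a 12-megapixel image at 24 bits (3 bytes) per pixel.
pixels = 12_000_000
bytes_per_pixel = 3                 # 8 bits each for R, G, B
raw_bytes = pixels * bytes_per_pixel
print(raw_bytes / 1_000_000, "MB")  # 36.0 MB
```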

Encoding transforms the raw image data into a bitstream according to the rules of a specific image format (like JPEG, PNG, GIF, WebP, etc.). The reverse process, turning the encoded file back into viewable pixels, is called decoding.

There are two fundamental approaches to image compression during encoding: lossy and lossless.

Lossy Encoding: Trading Quality for Size

Lossy encoding achieves significant file size reduction by permanently discarding some image information that is considered less perceptible to the human eye. The key idea is that not all data in an image is equally important for visual perception.

How it Works (Example: JPEG)

The most widely used lossy format is JPEG (Joint Photographic Experts Group). Its compression process typically involves several steps:

  1. Color Space Transformation: The image is often converted from RGB to a luminance/chrominance space like YCbCr. Y represents brightness (luminance), while Cb and Cr represent color difference components (chrominance). This is done because human vision is much more sensitive to changes in brightness than changes in color.

  2. Chroma Subsampling: Leveraging the lower sensitivity to color, the Cb and Cr channels are often downsampled (e.g., storing one color sample for every 2x2 block of luminance samples). This immediately reduces the amount of color data to store, often with little visible impact.

  3. Block Splitting: The image (especially the Y channel) is divided into small blocks, typically 8x8 pixels.

  4. Discrete Cosine Transform (DCT): Each block undergoes a DCT. This mathematical transform converts the spatial pixel values (brightness levels within the block) into frequency coefficients. It separates the block's information into low-frequency components (representing gradual changes, the block's general appearance) and high-frequency components (representing sharp details, edges, textures, and noise). Most of the visual energy is usually concentrated in the low-frequency coefficients.

  5. Quantization: This is the primary lossy step. Each of the 64 frequency coefficients (from the 8x8 DCT) is divided by a corresponding value from a quantization table and rounded to the nearest integer. The quantization values are larger for higher frequencies, meaning fine details and potential noise (high-frequency components) are treated more coarsely – more information is discarded here. The level of compression is controlled by adjusting the values in this table (often via a "quality" setting from 1-100). Higher compression means larger divisors and more data loss.

  6. Entropy Coding: The resulting quantized coefficients (many of which are now zero, especially for high frequencies) are arranged and then compressed using a lossless algorithm (like Huffman coding or Arithmetic coding) to efficiently store the remaining data.
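
Steps 4 and 5 can be sketched with NumPy and SciPy. The quantization table below is illustrative, not one of the standardized JPEG tables; the point is simply that larger divisors at higher frequencies round most fine-detail coefficients down to zero:

```python
import numpy as np
from scipy.fft import dctn, idctn

# A synthetic 8x8 luminance block: a smooth horizontal gradient plus mild noise.
rng = np.random.default_rng(0)
block = np.tile(np.linspace(100, 140, 8), (8, 1)) + rng.normal(0, 2, (8, 8))

# Step 4: a 2-D DCT (orthonormal) moves the block into the frequency domain.
# JPEG level-shifts samples by -128 before the transform.
coeffs = dctn(block - 128, norm="ortho")

# Step 5: quantization. Divisors grow toward higher frequencies, so fine
# detail is represented more coarsely. This is where information is lost.
i, j = np.indices((8, 8))
qtable = 16 + 4 * (i + j)
quantized = np.round(coeffs / qtable).astype(int)

# Most high-frequency coefficients round to zero, which entropy coding
# (step 6) then stores very compactly.
print("nonzero coefficients:", np.count_nonzero(quantized), "of 64")

# Decoding: dequantize and invert the DCT.
reconstructed = idctn(quantized * qtable, norm="ortho") + 128
print("max abs error:", np.abs(reconstructed - block).max())
```

For a smooth block like this one, only a handful of coefficients survive quantization, yet the reconstruction stays visually close to the original.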

Trade-offs and Artifacts

  • Pros: Can achieve very high compression ratios (e.g., 10:1 or much higher). Ideal for photographs and complex images where perfect accuracy isn't paramount. Widely supported.
  • Cons: Information is permanently lost. At high compression levels, visible artifacts can appear, such as:
    • Blocking: The 8x8 block structure becomes visible.
    • Ringing/Mosquito Noise: Fuzzy or noisy edges around sharp details.
    • Color Bleeding: Colors can smear across sharp edges because the chroma channels are stored at lower resolution.
  • Other Formats: Newer lossy formats like WebP (lossy mode), HEIC/HEIF, and AVIF often provide better compression efficiency than JPEG at similar visual quality levels, using more advanced techniques.

Lossless Encoding: Perfect Fidelity

Lossless encoding reduces file size without discarding any information. The original image can be perfectly reconstructed from the compressed file, bit for bit.

How it Works (Example: PNG)

Lossless compression algorithms work by finding statistical redundancies in the data and representing them more efficiently.

The PNG (Portable Network Graphics) format is a widely used lossless format. It typically employs a two-stage process based on the DEFLATE algorithm:

  1. Prediction (Filtering): Before compression, PNG often applies a prediction filter. For each pixel, it predicts its value based on neighboring pixels (e.g., the pixel above, the pixel to the left). It then stores only the difference between the actual pixel value and the predicted value (the prediction error). If the prediction is good (e.g., in areas of solid color or smooth gradients), these difference values will be small or zero, making them highly compressible. PNG supports several filter types, and an encoder can choose the best filter for each scanline to maximize compressibility.

  2. DEFLATE Compression: The filtered data (prediction errors) is then compressed using DEFLATE, which itself combines:

    • LZ77 algorithm: Finds repeated sequences of bytes and replaces them with shorter references (pointers) to previous occurrences.
    • Huffman coding: Assigns shorter bit codes to frequently occurring byte values (or LZ77 symbols) and longer codes to less frequent ones.
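
The filter-then-DEFLATE idea can be sketched with PNG's simplest filter ("Sub", which stores each byte as the difference from its left neighbor) and Python's zlib module, which implements DEFLATE:

```python
import zlib
import numpy as np

# One scanline of a smooth horizontal gradient: byte values 0..255.
scanline = np.arange(256, dtype=np.uint8)

# PNG "Sub" filter: store each byte as the difference from its left neighbor.
# PNG filter arithmetic is modulo 256, which uint8 subtraction gives for free.
filtered = scanline.copy()
filtered[1:] = scanline[1:] - scanline[:-1]

raw = zlib.compress(scanline.tobytes(), 9)
sub = zlib.compress(filtered.tobytes(), 9)
print(len(raw), len(sub))  # the filtered residuals compress far better

# Lossless round trip: undo the filter and recover the original bit for bit.
restored = np.cumsum(filtered, dtype=np.uint8)  # cumulative sum inverts Sub
assert np.array_equal(restored, scanline)
```

The unfiltered gradient contains 256 distinct byte values and barely compresses at all; after filtering it is almost entirely a run of 1s, which DEFLATE shrinks to a few bytes, and the original is still perfectly recoverable.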

Use Cases

  • Pros: Perfect image quality preservation. Ideal for graphics with sharp lines, text, logos, icons, transparent backgrounds (PNG supports alpha channels), and medical or technical images where accuracy is crucial. Archival purposes.
  • Cons: Compression ratios are generally much lower than lossy methods, especially for complex photographic images (often only 2:1 or 3:1). Files can still be quite large.
  • Other Formats: WebP (lossless mode) often offers better lossless compression than PNG. GIF is lossless but limited to 256 colors. TIFF can use lossless compression (like LZW or ZIP/DEFLATE).

Lossy vs. Lossless: A Quick Comparison

Feature           | Lossy Encoding (e.g., JPEG)              | Lossless Encoding (e.g., PNG)
------------------|------------------------------------------|------------------------------
Goal              | Maximize file size reduction             | Preserve exact original data
Data Preservation | Discards data (irreversible)             | Retains all data (reversible)
File Size         | Significantly smaller                    | Moderately smaller (highly content-dependent)
Image Quality     | Reduced, potential for visible artifacts | Identical to the original
Compression Ratio | High (e.g., 10:1 to 100:1)               | Lower (e.g., 1.5:1 to 5:1)
Common Formats    | JPEG, WebP (lossy), HEIC, AVIF           | PNG, WebP (lossless), GIF, TIFF (lossless), BMP
Best Use Cases    | Photographs, complex natural scenes      | Logos, icons, line art, text, transparency, archives

The Role of the Discrete Cosine Transform (DCT)

As mentioned in the Lossy Encoding section, the DCT is a critical step in JPEG compression. It doesn't compress the data itself, but it transforms an 8x8 block of pixels from the spatial domain (where each value represents pixel brightness) to the frequency domain.

In the frequency domain:

  • The top-left coefficient (DC coefficient) represents the average brightness of the entire block.
  • Coefficients moving towards the bottom-right represent progressively higher spatial frequencies (finer details, edges, textures).

This transformation is powerful because:

  1. It concentrates energy: Most of the visually important information tends to be captured in a few low-frequency coefficients.

  2. It decorrelates data: The frequency coefficients are less correlated than the original pixel values, making them easier to compress individually.

  3. It aligns with human perception: The subsequent quantization step can aggressively discard high-frequency information, which the human eye is less sensitive to, minimizing perceived quality loss for a given amount of compression.
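
Energy concentration can be demonstrated directly: reconstruct a smooth block from only the low-frequency corner of its coefficient grid and watch the error shrink. A small sketch using SciPy's `dctn`/`idctn`:

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth 8x8 block (diagonal gradient), typical of natural-image content.
x = np.linspace(0, 1, 8)
block = 60 * np.add.outer(x, x) + 100

coeffs = dctn(block, norm="ortho")

# Keep only the k x k low-frequency corner and reconstruct from it.
for k in (1, 2, 4, 8):
    kept = np.zeros_like(coeffs)
    kept[:k, :k] = coeffs[:k, :k]
    approx = idctn(kept, norm="ortho")
    err = np.abs(approx - block).max()
    print(f"keeping {k*k:2d}/64 coefficients -> max abs error {err:.3f}")
```

Keeping only the DC coefficient reproduces the block's average brightness; each additional ring of low-frequency coefficients cuts the error further, and the full set reconstructs the block exactly.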

An image block can be reconstructed by summing weighted DCT basis patterns (each representing a different spatial frequency), starting from the most significant low-frequency components.

[Interactive visualization: "DCT Component Addition", showing how DCT components and their coefficients add up to form the image block]

Understanding the Values:

  • Each component has a coefficient that scales its basis pattern.
  • Larger coefficients mean a stronger contribution to the final image.
  • The DC component (0,0) usually has the largest coefficient.
  • Coefficient values tend to decrease for higher frequencies.

Conclusion

Image encoding is essential for managing digital images efficiently. The choice between lossy and lossless encoding depends entirely on the requirements:

  • Choose lossy (like JPEG, WebP-lossy, AVIF) when file size is a major concern and some loss of quality is acceptable (e.g., web photos).
  • Choose lossless (like PNG, WebP-lossless) when perfect fidelity is required (e.g., logos, technical diagrams, archival).

Understanding the basic principles behind these methods helps in selecting the right format and compression level for your specific needs.
