Complete Guide to PDF Compression

Choose the best compression solution based on PDF document type and usage scenario

PDF Basics

Before compressing a PDF, understand the basic structure of the file and decide what the compression is for. Both determine which compression strategy is most appropriate.

Different types of PDFs require different compression methods

Black and white text scans: The JBIG2 compression algorithm can reduce file size by 95-98% while keeping text clear and readable

Color text scans: MRC compression technology can reduce file size by 70-85% while preserving colors

Text-based PDFs: Native text content is already highly optimized; compression is mainly achieved by optimizing embedded images, typically reducing file size by 10-30%

Vector graphic documents (such as CAD drawings): Can be compressed by rasterizing to a bitmap and then applying black and white or grayscale encoding, typically reducing file size by 60-85% for black and white and 50-70% for grayscale

Choosing the right compression method is key to ensuring both reduced file size and preserved document quality.

Difference Between Text-Based and Image-Based PDFs

PDF files contain various elements, mainly text, images, and vector graphics. Sometimes content that appears to be text might actually be an image or vector graphic.

Quick tips for identifying PDF types

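Open the PDF in Chrome or Edge: content that can be selected and highlighted is actual text. One exception is a scan that has been run through OCR, where an invisible text layer is selectable even though the visible page is still an image.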

Text-based PDFs: Content consists of actual text elements that can be selected and searched

Image-based PDFs: Content exists as images, such as scanned documents or text presented as images/vectors
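The same check can be roughly automated. The sketch below is a crude heuristic using only the Python standard library, not a real PDF parser: it assumes content streams are either uncompressed or Flate-compressed and looks for text-showing operators (`Tj`, `TJ`) inside a `BT` block. The function name `looks_text_based` is hypothetical, and the check can be fooled by OCR text layers or by streams stored in other encodings.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text block (BT ...) that contains a text-showing operator.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.DOTALL)


def looks_text_based(pdf_bytes):
    """Heuristic: True if any stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes after the stream data
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate-compressed: inspect the raw bytes instead
        if TEXT_OP_RE.search(data):
            return True
    return False
```

A call such as `looks_text_based(open("report.pdf", "rb").read())` (with a hypothetical file name) gives a first guess; selecting text in a viewer remains the more reliable test.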

PDF Elements Suitable for Compression

Different elements in PDFs have varying compression potential:

  • Text elements: The PDF format already compresses these, leaving little room for further compression (typically only a 5-10% reduction)
  • Repeated objects, attachments, and fonts: Most compression tools handle these effectively, but the gain is limited (generally a 10-20% reduction)
  • Images and vectors: The core targets of PDF compression, offering the greatest potential (commonly a 50-90% size reduction)
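To see what these ranges mean for a whole file, the arithmetic can be sketched as follows. The reduction ranges are the ones quoted in the list above; the function name `estimate_size_mb`, the dictionary keys, and the 10 MB example file are hypothetical.

```python
# Reduction ranges from the list above, as (low, high) fractions.
REDUCTION = {
    "text": (0.05, 0.10),
    "fonts_and_objects": (0.10, 0.20),
    "images_and_vectors": (0.50, 0.90),
}


def estimate_size_mb(parts_mb):
    """Return (best_case, worst_case) output size in MB for a size breakdown."""
    best = sum(mb * (1 - REDUCTION[kind][1]) for kind, mb in parts_mb.items())
    worst = sum(mb * (1 - REDUCTION[kind][0]) for kind, mb in parts_mb.items())
    return round(best, 2), round(worst, 2)


# Hypothetical 10 MB file: 1 MB text, 1 MB fonts and objects, 8 MB images.
print(estimate_size_mb({"text": 1, "fonts_and_objects": 1, "images_and_vectors": 8}))
```

For this example the estimate is roughly 2.5 to 5.85 MB, and almost all of the saving comes from the image portion.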

Compression Strategies

Text-Based PDF Compression

For PDFs that are primarily text:

  1. Clean redundant data: Remove comments, form fields, repeated objects
  2. Font optimization: Embed font subsets (only the glyphs actually used) or rely on standard fonts
  3. Image quality adjustment: Reduce the resolution or quality of images in the document to a level that suits its purpose

Recommended tool: PDF Compression Tool

Note that text-based PDFs are usually already optimized when they are created, so most of the remaining savings come from the images and vector elements in the document.
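As an illustration of font subsetting and image downsampling, here is a minimal sketch that drives Ghostscript from Python. Ghostscript is a substitute chosen for the example, not the tool recommended above, and the sketch assumes the `gs` executable is installed and on the PATH. The function names and the default of 150 DPI are assumptions. It does not cover the first step (removing comments and form fields).

```python
import subprocess


def build_gs_command(src, dst, image_dpi=150):
    """Build a Ghostscript pdfwrite command that subsets fonts and downsamples images."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dSubsetFonts=true",                   # keep only the glyphs actually used
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={image_dpi}",  # target resolution for color images
        "-dDownsampleGrayImages=true",
        f"-dGrayImageResolution={image_dpi}",   # target resolution for gray images
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src, dst, image_dpi=150):
    """Run the command; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(build_gs_command(src, dst, image_dpi), check=True)
```

Building the command separately from running it makes the settings easy to inspect or log before any file is touched.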

Vector Graphics Compression

Vector graphics are essentially a series of drawing instructions rather than pixels, which is why they remain clear even when enlarged.

CAD drawings are typical examples containing large amounts of vectors. Although vectors themselves are difficult to compress, significant compression can be achieved by converting them to bitmaps (especially black and white images). Converting to black and white bitmaps typically reduces the original file by 60-85%, while converting to grayscale can reduce it by 50-70%.
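One way to see why rasterizing helps: a bitmap's size depends only on page dimensions, resolution, and bit depth, not on how many drawing instructions the page contains. The sketch below computes the uncompressed bitmap size before any image codec is applied. The function name, the A1 sheet, and the 300 DPI setting are assumptions chosen for illustration.

```python
import math

MM_PER_INCH = 25.4


def raw_raster_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size of a page bitmap, with each row padded to a whole byte."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


# Hypothetical A1 sheet (594 x 841 mm) rasterized at 300 DPI.
for label, bits in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    size_mb = raw_raster_bytes(594, 841, 300, bits) / 1_000_000
    print(f"{label}: {size_mb:.1f} MB before image compression")
```

The 1-bit bitmap starts at one eighth of the grayscale size and one twenty-fourth of the color size, and a bi-level codec then shrinks it much further.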

CAD Drawing Conversion Considerations

Complex vectors in PDFs can affect format conversion results.

PDFs containing CAD drawings are prone to errors when converted to Office formats because the large number of vectors in CAD drawings interferes with the conversion process.

The solution is to first rasterize pages containing CAD to images, then perform format conversion.

Image Compression Techniques

PDF supports multiple image compression algorithms; choose the appropriate algorithm based on different scenarios.

Black and White Document Compression

For documents that can be presented in black and white (such as pure text scans), the JBIG2 algorithm stores no color information and can reduce file size by 95-98%.

Best for: Text scans, line drawings
Compression ratio: 2-5% of original file size (95-98% reduction)

Processing steps:

Black and White Conversion Compression Processing
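Before a bi-level codec such as JBIG2 can run, each pixel has to be reduced to a single bit. The sketch below shows only that thresholding and bit-packing step in plain Python; it is not a JBIG2 encoder. The fixed threshold of 128 and the "1 = black" convention are assumptions.

```python
def to_bilevel(gray_rows, threshold=128):
    """Pack rows of 8-bit gray values into 1-bit rows (1 = black), MSB first."""
    packed_rows = []
    for row in gray_rows:
        packed = bytearray((len(row) + 7) // 8)   # 8 pixels per byte, rounded up
        for x, value in enumerate(row):
            if value < threshold:                 # dark pixel becomes a black bit
                packed[x // 8] |= 0x80 >> (x % 8)
        packed_rows.append(bytes(packed))
    return packed_rows
```

Eight gray pixels that previously took eight bytes now fit in one, and the color information is gone before the codec even starts.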

Grayscale Document Compression

Suitable for documents that will mainly be printed in black and white or grayscale. Compression is achieved by discarding color information.

Best for: Documents for printing, content that doesn't need to preserve colors
Compression ratio: 15-30% of original file size (70-85% reduction)

Processing steps:

Grayscale Conversion Compression Processing
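Discarding color can be sketched as a weighted sum of the red, green, and blue channels. The example below uses the common ITU-R BT.601 luma weights. That choice is an assumption, since the article does not say which formula its tool applies.

```python
def rgb_to_gray(pixels):
    """Convert (r, g, b) tuples to 8-bit luma values using BT.601 weights."""
    return bytes(
        min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in pixels
    )
```

Each pixel shrinks from three bytes to one before any further image compression is applied.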

Color Document Compression

For digital documents that need to preserve colors, Mixed Raster Content (MRC) technology is the best choice. MRC processes images by dividing them into three layers:

  • Mask layer: A bi-level (1-bit) image kept at high resolution that records the shapes of text and lines and decides, pixel by pixel, whether the foreground or the background shows through
  • Foreground layer: Holds the colors of text and lines; because the mask carries the sharp edges, this layer can be stored at low resolution
  • Background layer: Contains photos or image backgrounds, processed at lower resolution

Each layer can then be compressed with a method suited to its content, typically a bi-level codec such as JBIG2 for the mask and a lossy image codec for the two color layers. This makes MRC particularly suitable for color scans, significantly reducing file size while keeping text sharp. Compared to traditional single-layer compression methods, MRC technology typically produces 30-50% better compression results for color documents.
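The layer split can be illustrated with a toy example. The sketch below works on a small grayscale image rather than a color scan and uses a plain threshold to find text, both simplifying assumptions. It keeps the mask at full resolution, reduces the foreground to one representative value, and averages the background over blocks. The function name `split_mrc_layers` is hypothetical, and real MRC encoders use far more careful segmentation.

```python
def split_mrc_layers(gray, threshold=128, factor=2):
    """Toy MRC split of a grayscale image given as a list of equal-length rows.

    Returns (mask, foreground, background):
      mask        full-resolution rows of 0/1, 1 where a pixel is text-like (dark)
      foreground  one representative gray level for all masked pixels
      background  remaining content, averaged over factor x factor blocks
    """
    mask = [[1 if v < threshold else 0 for v in row] for row in gray]

    dark = [v for row in gray for v in row if v < threshold]
    light = [v for row in gray for v in row if v >= threshold]
    foreground = round(sum(dark) / len(dark)) if dark else 0
    fill = round(sum(light) / len(light)) if light else 255

    # Paint over text pixels so they do not bleed into the downsampled background.
    cleaned = [[fill if m else v for v, m in zip(row, mrow)]
               for row, mrow in zip(gray, mask)]

    height, width = len(cleaned), len(cleaned[0])
    background = []
    for y in range(0, height, factor):
        out_row = []
        for x in range(0, width, factor):
            block = [cleaned[yy][xx]
                     for yy in range(y, min(y + factor, height))
                     for xx in range(x, min(x + factor, width))]
            out_row.append(round(sum(block) / len(block)))
        background.append(out_row)
    return mask, foreground, background
```

The mask stays sharp because it is never downsampled, while the background, which holds most of the bytes in a color scan, shrinks by the square of the downsampling factor.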

Best for: Color scans, mixed documents containing text and images
Compression ratio: 15-40% of original file size (60-85% reduction)

Recommended tool: Powerful Compression (Select "Powerful Compression" level to enable MRC technology)

Compression Effect Comparison

The following table summarizes the expected effects of various compression methods for different types of documents:

| Document Type | Recommended Compression Method | Expected Compression Ratio | Quality Impact | Applicable Scenarios |
|---|---|---|---|---|
| Pure Text PDF | Standard Compression | 90-95% of original size | Almost no impact | Electronic documents primarily containing text |
| Text PDF with few images | Standard Compression | 70-90% of original size | Slight impact | Reports, articles |
| Black and white text scans | JBIG2 algorithm | 2-5% of original size | Text edges may be slightly blurred | Scanned documents, contracts |
| Grayscale documents | Grayscale conversion + compression | 15-30% of original size | Loss of color information | Documents for printing |
| Color scanned documents | MRC technology | 15-40% of original size | Slight loss of detail | Scanned magazines, color reports |
| CAD drawings | Rasterization + black and white conversion | 15-40% of original size | Loss of vector properties, cannot be edited | Engineering drawings for viewing only |
| PDF with mainly photos | Standard image compression | 30-60% of original size | Depends on compression level | Photo collections, product catalogs |
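The table states results as a percentage of the original size, while earlier sections quote percentage reductions. The small lookup below converts between the two and estimates an output size. The percentages are copied from the table; the shortened keys and function names are hypothetical.

```python
# "Expected Compression Ratio" column, as (low, high) percent of original size.
EXPECTED_PERCENT = {
    "pure text": (90, 95),
    "text with few images": (70, 90),
    "black and white scan": (2, 5),
    "grayscale": (15, 30),
    "color scan": (15, 40),
    "cad drawing": (15, 40),
    "photos": (30, 60),
}


def expected_size_kb(doc_type, original_kb):
    """Return the (smallest, largest) expected output size in KB."""
    low, high = EXPECTED_PERCENT[doc_type]
    return original_kb * low / 100, original_kb * high / 100


def reduction_percent(doc_type):
    """Return the same range expressed as a (min, max) percent reduction."""
    low, high = EXPECTED_PERCENT[doc_type]
    return 100 - high, 100 - low
```

For example, `reduction_percent("black and white scan")` gives the 95-98% reduction quoted for JBIG2 in the black and white section.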

Choosing the Best Solution

Based on PDF document characteristics and purpose, follow this decision flow to choose the best solution:

  • Documents primarily containing text: Use standard compression, manually adjust image resolution and compression method if necessary
  • Documents containing numerous CAD vectors: First rasterize vector pages, then convert to black and white or grayscale
  • Black and white scanned documents: Apply JBIG2 algorithm
  • Documents for grayscale printing: Convert to grayscale images
  • Documents requiring color preservation: Use MRC technology (powerful compression)
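The decision flow above can be written down as a small function. This is a sketch under stated assumptions: the `PdfProfile` fields are hypothetical, the order of the checks is one reasonable reading of the list, and falling back to JBIG2 when nothing else applies assumes the remaining case is a black and white scan.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    """Hypothetical description of a document; every field is an assumption."""
    many_cad_vectors: bool = False
    mostly_native_text: bool = False
    needs_color: bool = False
    for_grayscale_printing: bool = False


def choose_strategy(doc):
    """Map a document profile to one of the strategies listed above."""
    if doc.many_cad_vectors:
        return "rasterize vector pages, then convert to black and white or grayscale"
    if doc.mostly_native_text:
        return "standard compression"
    if doc.needs_color:
        return "MRC (powerful compression)"
    if doc.for_grayscale_printing:
        return "grayscale conversion"
    return "JBIG2 (black and white)"
```

For instance, `choose_strategy(PdfProfile(needs_color=True))` selects MRC, while a profile with no flags set is treated as a black and white scan.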

Compression Considerations

Important Risk Notice

Compression and quality trade-off: Higher compression rates usually mean lower image quality; find a balance between file size and quality

Back up original files: Keep a copy of the original file before applying high compression rates, since lossy compression cannot be undone

Choose strategy based on purpose: Select different compression strategies based on the document's final purpose (screen reading, printing, or archiving)
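The backup advice can be built into the workflow. The sketch below copies the original aside, runs a caller-supplied compression function, and keeps the result only if it is actually smaller. The function name `compress_safely`, the `.orig` and `.tmp` suffixes, and the `compress_fn(src, dst)` signature are all assumptions.

```python
import shutil
from pathlib import Path


def compress_safely(src, compress_fn, backup_suffix=".orig"):
    """Back up src, run compress_fn(src, tmp), keep the result only if smaller."""
    src = Path(src)
    backup = src.with_name(src.name + backup_suffix)
    shutil.copy2(src, backup)                  # the original survives as a backup
    tmp = src.with_name(src.name + ".tmp")
    compress_fn(src, tmp)
    if tmp.stat().st_size < src.stat().st_size:
        tmp.replace(src)                       # the compressed copy takes over
        return True
    tmp.unlink()                               # no gain: discard the attempt
    return False
```

Any compressor can be plugged in as `compress_fn`, as long as it reads the first path and writes the second.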

Summary

PDF compression is an art of balance, requiring the selection of the best compression method based on document type and purpose. By understanding the basic composition of PDFs and characteristics of different compression algorithms, we can maximize file size reduction while maintaining appropriate clarity.

Recommended strategies:

  • Text-based PDFs: Use standard compression
  • Black and white scans: Apply JBIG2 algorithm
  • Important color documents: Use MRC technology

Practical Tools