Identifying Fonts In PDF Documents

What Font is Used in My PDF?

When working with PDF documents, it is often necessary to identify the fonts used in order to edit or reuse the text, determine licensing requirements, or troubleshoot formatting issues. Unlike editable text documents such as Word or TextEdit files, PDFs fix the appearance of every page: text is drawn with glyphs from font programs that are either embedded in the file or referenced only by name, alongside vector graphics and raster images. Finding which fonts are used therefore means inspecting the font information stored in the file or, when that information is missing or unhelpful, examining the rendered glyphs themselves.

There are several techniques that can be used to determine some or all of the fonts used within a PDF document:

  • Copying text from the PDF and pasting it into another application
  • Using PDF parsing tools like pdffonts or pdf2txt
  • Inspecting the font definitions stored inside the PDF itself
  • Visually matching font glyphs to identify unknown fonts

The best approach depends on the goal: listing every font, spot-checking for a specific font, extracting reusable text, or confirming embedding details. This guide covers the key methods and tools for interrogating PDFs and identifying the fonts within them.

PDF Font Extraction Methods

A common need is to extract the textual content of a PDF while preserving font styling in some way. This usually involves an extraction or parsing tool to save the text along with font information. Here are some options:

Copy-paste text to extract fonts

The most basic approach is to manually select and copy text from the PDF using the mouse, then paste into a target document. This will bring along basic style and font information to the extent supported by the destination document type. For example, pasting into a Word document will attempt to map the fonts and styling used in the PDF - though it is limited by available fonts installed on the system. The result is a rough approximation for simple extraction needs.

Use pdf2txt to extract all text

For batch extraction, pdf2txt.py (the command-line tool that ships with the Python library pdfminer.six) converts the text of a PDF into other formats. Its plain-text output discards styling and font names entirely, which is fine when the goal is extracting raw PDF text en masse for search or index purposes. Its HTML and XML output modes (-t html, -t xml) go further: the XML output records the font name and size of every character, and the HTML output wraps text runs in styles that name the font, so they can show which font a particular passage uses.
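The per-character font names that pdf2txt.py writes in XML mode are also available from pdfminer.six's Python API. The sketch below assumes pdfminer.six is installed and that "document.pdf" stands in for your file; "iter_pdf_chars" and "count_chars_by_font" are hypothetical helper names. It walks the layout tree returned by extract_pages and tallies how many characters each font drew.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def count_chars_by_font(chars: Iterable[Tuple[str, str]]) -> Counter:
    """Tally how many characters were drawn with each font name."""
    return Counter(fontname for fontname, _text in chars)


def iter_pdf_chars(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (fontname, text) for every character pdfminer.six lays out."""
    # Imported here so count_chars_by_font works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTContainer

    def walk(obj):
        if isinstance(obj, LTChar):
            yield obj.fontname, obj.get_text()
        elif isinstance(obj, LTContainer):
            # Text boxes, text lines and figures are all containers.
            for child in obj:
                yield from walk(child)

    for page_layout in extract_pages(path):
        yield from walk(page_layout)
```

For example, looping over count_chars_by_font(iter_pdf_chars("document.pdf")).most_common() prints the fonts from most to least used. The recursive walk is used instead of a fixed box/line/char nesting so that characters inside figures are counted too.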

Extract fonts with pdffonts utility

To read font resource information directly from a PDF, run the pdffonts utility (part of the Poppler and Xpdf command-line tools) in a terminal. pdffonts lists every font referenced by the pages of the PDF, along with its name, type, encoding, whether it is embedded, whether it is a subset, and whether it carries a Unicode mapping. Because it reads the font definitions themselves rather than guessing from the rendered text, it is the most precise of these methods for identifying which fonts a PDF embeds or references.
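pdffonts prints a fixed table, so its output is easy to post-process in scripts. A minimal Python sketch, assuming Poppler's pdffonts is on the PATH and prints its usual columns (name, type, encoding, emb, sub, uni, object ID) and that font names contain no spaces; "list_fonts" and "parse_pdffonts" are hypothetical helper names.

```python
import subprocess
from typing import Dict, List


def parse_pdffonts(output: str) -> List[Dict[str, str]]:
    """Turn pdffonts' table into one dict per font row.

    Name, encoding, the yes/no flags and the object number contain no spaces,
    but the type column can ("Type 1C", "CID TrueType"), so each row is split
    from both ends and the type is whatever sits in between.
    """
    rows = []
    for line in output.splitlines()[2:]:  # skip the header and dashed separator
        tokens = line.split()
        if len(tokens) < 8:
            continue  # blank or malformed line
        rows.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "emb": tokens[-5],
            "sub": tokens[-4],
            "uni": tokens[-3],
            "object": f"{tokens[-2]} {tokens[-1]}",
        })
    return rows


def list_fonts(pdf_path: str) -> List[Dict[str, str]]:
    """Run pdffonts on a file and parse its table (raises if pdffonts fails)."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)
```

For example, keeping only the rows of list_fonts("document.pdf") whose "emb" value is "no" gives the fonts a viewer will have to substitute.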

Identify fonts with online PDF font checker tools

Online PDF tools also offer automated font identification. After a PDF is uploaded, some checkers simply read the font resources (the same information pdffonts reports), while others render the pages and compare glyph shapes against a font database, sometimes reporting how much of the text each font covers. This gives an approximate overview of the main fonts in use without installing desktop software, but accuracy varies widely depending on document contents and layout complexity.

Finding Font Names and Styles in PDF Metadata

In addition to extraction tools, we can also directly inspect the font information within a PDF itself to identify the fonts used, as well as their type and encoding.

Description of font info stored in PDF metadata

Behind the text visible on a PDF page lies a layer of font definitions and mappings that tie character codes to glyphs in specific fonts and encodings. Every run of text in a PDF is drawn with a font named in the page's resources; if that font is not embedded, the viewer substitutes a locally installed font. The font dictionaries in the PDF store details on every font referenced by the file, including:

  • Font name - The actual named font used, like Times New Roman or Arial
  • Font type - Such as TrueType or Type 1 PostScript fonts
  • Encoding - The character encoding scheme mapping bytes to glyphs
  • Font descriptor - Flags and metrics (serif or sans, fixed pitch, italic angle, stem width) that a viewer uses to pick a substitute if the font is missing
  • Embedding status - Whether included/embedded or left as a reference

This metadata can be inspected to identify precisely what textual assets a PDF file relies on for presenting content. We'll look at how next.
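To show how embedding status follows from these definitions, here is a sketch of the check: a font counts as embedded when its font descriptor contains one of the font-program keys defined by the PDF specification (FontFile for Type 1, FontFile2 for TrueType, FontFile3 for CFF/OpenType), with composite (Type 0) fonts checked through their descendant font. The font dictionaries are modelled as plain Python dicts; with a real file they would come from a PDF library (pikepdf, for instance) reading each page's /Resources /Font entries, which is an assumption outside this sketch. "describe_font" is a hypothetical name.

```python
from typing import Any, Dict, Optional

# FontDescriptor keys that hold an embedded font program (PDF specification).
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def _descriptor(font: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the FontDescriptor, looking through Type 0 (composite) fonts."""
    if font.get("/Subtype") == "/Type0":
        descendants = font.get("/DescendantFonts") or []
        font = descendants[0] if descendants else {}
    return font.get("/FontDescriptor")


def describe_font(font: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise one font dictionary: name, type, encoding, embedded flag."""
    name = str(font.get("/BaseFont", "[none]")).lstrip("/")
    subtype = str(font.get("/Subtype", "")).lstrip("/")
    descriptor = _descriptor(font) or {}
    if subtype == "Type3":
        # Type 3 glyphs are drawn by content streams stored inside the PDF.
        embedded = True
    else:
        embedded = any(key in descriptor for key in FONT_FILE_KEYS)
    return {
        "name": name,
        "type": subtype,
        "encoding": font.get("/Encoding"),
        "embedded": embedded,
    }
```

A standard font such as Helvetica referenced without a descriptor comes out as not embedded, which is exactly the case where the viewer substitutes a font.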

Using pdffonts to view font names and types

The pdfinfo utility from the Poppler and Xpdf packages reports document-level information (title, producer, page count, PDF version) but does not list fonts. Its companion pdffonts, from the same packages, is the tool that prints PDF font details to the terminal: one row per font referenced by the PDF file, with its PostScript name, font type, encoding, whether the font is embedded, whether it is a subset, and whether it has a Unicode (ToUnicode) mapping.

For example, running "pdffonts document.pdf" on a file that uses Times New Roman could produce a row like this (spacing trimmed; object numbers vary):

name                      type        encoding    emb sub uni object ID
------------------------- ----------- ----------- --- --- --- ---------
FRMAEB+TimesNewRomanPSMT  TrueType    WinAnsi     yes yes yes     12  0

The "name" column gives the PostScript name that PDF viewers use internally for glyph lookup. TimesNewRomanPSMT is the PostScript name of Times New Roman; the six uppercase letters and plus sign in front of it, "FRMAEB+", are a subset tag, which the PDF specification uses to mark a font of which only the glyphs actually used are embedded. The remaining columns tell us it is a TrueType font using WinAnsi (Windows code page 1252) encoding, that it is embedded in the PDF ("emb" yes) but only as a subset ("sub" yes), and that it has a Unicode mapping for text extraction ("uni" yes). This level of detail helps pinpoint the exact font assets a PDF relies on for troubleshooting or identification.
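Subset tags follow a fixed pattern (exactly six uppercase letters followed by "+"), so they can be stripped mechanically to recover the underlying PostScript name, for instance to group several subsets of the same typeface. A small sketch; "split_subset_tag" is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# A subset font's name starts with a tag of exactly six uppercase letters
# followed by "+", e.g. "FRMAEB+TimesNewRomanPSMT".
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name: str) -> Tuple[Optional[str], str]:
    """Return (tag, base_name); tag is None when the name carries no subset tag."""
    match = SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name
```

Two entries such as ABCDEF+Arial-BoldMT and GHIJKL+Arial-BoldMT both reduce to Arial-BoldMT: the same face embedded as two separate subsets, which is common in PDFs assembled from several source files.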

Identifying Unknown Fonts Visually

In many cases a PDF uses unusual or custom fonts that are visually recognizable but whose names in the file are missing or uninformative, or the text is part of an image and carries no font information at all. In these cases visual font recognition techniques can help identify fonts where text-based lookup fails.

Techniques for visually matching fonts to determine name/version

The first step with an unknown font face is to visually scan letters or samples and compare to reference databases:

  • Match glyph sets - Examine letter and number glyphs in detail to match shapes to known typefaces.
  • Compare proportions - Measure metrics like x-height relative to cap height, which stay the same regardless of point size.
  • Identify variants - Recognize bold, italic, and condensed variants to narrow down matches.
  • Scan metadata - Check the embedded font data for copyright or trademark notices that name the foundry or family.

These characteristics can then be matched against font aggregators and online databases such as MyFonts.com, Fontspring, or Fontsquirrel to narrow down to a specific font family and variant.
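Proportions are easier to compare than absolute sizes because they do not depend on point size or zoom level. The sketch below assumes you have measured the x-height and cap height of the unknown text (for example in pixels on a zoomed screenshot, or from the XHeight and CapHeight entries of the font descriptor when present) and have ratios for a few candidate typefaces; the function names and the candidate values in the usage note are hypothetical placeholders, not real font metrics.

```python
from typing import Dict, List, Tuple


def xheight_ratio(x_height: float, cap_height: float) -> float:
    """x-height as a fraction of cap height; independent of point size."""
    if cap_height <= 0:
        raise ValueError("cap height must be positive")
    return x_height / cap_height


def rank_candidates(target: float, candidates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate typefaces by how far their ratio is from the target ratio."""
    distances = ((name, abs(ratio - target)) for name, ratio in candidates.items())
    return sorted(distances, key=lambda item: item[1])
```

For example, rank_candidates(xheight_ratio(52, 71), {"Candidate A": 0.70, "Candidate B": 0.75}) puts Candidate B first, since 52/71 is about 0.73. One ratio rarely settles a match on its own; it is a quick way to discard candidates before comparing glyph shapes.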

Resources like Fontsquirrel for matching unknown fonts

Resources like Fontsquirrel provide a database of over 200,000 fonts to visually browse, search and match unrecognized fonts. Their Font Matcherator tool also lets users upload font samples or images to assist in identifying font names and styles. Such services make short work of determining custom or obscure fonts embedded in existing PDF artwork where metadata alone fails to identify the fonts used.

Embedding Custom Fonts in PDF Exports

To maximize text reuse and accuracy in PDF exports, it is best practice to properly embed any non-standard fonts used rather than rely on fallback system fonts later. Here are guidelines on handling custom font usage when generating PDFs from source documents.

Benefits of font embedding vs. font subsetting

When adding custom fonts to a PDF export there are two choices - either embed the entire font, or subset the font. Font subsetting only includes specific glyphs used in the document text, reducing file size. However this loses the ability to edit or add text using glyphs not originally included. There are advantages to embedding full fonts vs subsetting:

  • Editability - Entire character maps available for text insertions/amendments.
  • Metadata - The font is listed under its plain PostScript name, without a subset tag, and its complete character set can be inspected.
  • Reusability - Fonts preserved independently of source documents.

The downside is larger file sizes. As a rule of thumb, subsetting is fine for final documents meant only to be read or printed, while fonts should be embedded in full when recipients are expected to edit the text.
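The trade-off can be expressed as a simple rule. Adobe Distiller's font settings, for example, subset embedded fonts when the percentage of the font's characters used in the document falls below a threshold. This sketch models that rule with sets of characters; real tools count glyphs rather than Unicode characters, and "percent_of_charset_used" and "should_subset" are hypothetical helpers.

```python
from typing import Iterable


def percent_of_charset_used(text: str, charset: Iterable[str]) -> float:
    """Percentage of the font's characters that actually appear in the text."""
    available = set(charset)
    if not available:
        return 0.0
    used = set(text) & available
    return 100.0 * len(used) / len(available)


def should_subset(text: str, charset: Iterable[str], threshold: float = 100.0) -> bool:
    """Subset when usage falls below the threshold (Distiller-style rule)."""
    return percent_of_charset_used(text, charset) < threshold
```

With the threshold at 100, a font is subset unless every one of its characters is used, which in practice means almost every embedded font is subset; lowering the threshold embeds more fonts in full.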

How to export PDFs from programs like Word while embedding fonts

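Most applications with PDF export functionality embed fonts by default, though the controls vary. Microsoft Word's built-in Save As PDF embeds the fonts it is permitted to embed, and its Options dialog includes "Bitmap text when fonts may not be embedded" for fonts whose license forbids embedding; the "Embed fonts in the file" setting under File > Options > Save applies to the Word document itself, not to the PDF. When exporting through Adobe's PDFMaker or Distiller, the Adobe PDF settings control embedding, including "Embed all fonts" and a percentage threshold below which embedded fonts are subset.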

Adobe Acrobat Pro can also check an existing PDF before distribution: its Preflight tool reports fonts that are not embedded and offers a fixup to embed missing fonts when they are available on the system.

Troubleshooting Font Problems in PDFs

Utilizing the techniques explored here allows diagnosing and fixing underlying font issues which may cause viewing or editing problems in PDF files.

Fixes for missing/unsupported fonts and font substitution

When a PDF references a font that is neither embedded in the file nor installed on the viewing system, the viewer falls back to a substitute font. This can override carefully chosen typefaces and branding. To avoid this:

  • Embed fonts - Add needed fonts directly to the PDF itself.
  • Update system fonts - Install missing fonts matching those referenced.
  • Preview substitution - In Acrobat, clearing Use Local Fonts (Edit > Preferences > Page Display) shows substitute fonts for anything not embedded, revealing what readers without the fonts will see.

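Embedding fonts is the most reliable fix, because the PDF then no longer depends on which fonts each reader has installed; installing fonts only helps on the machines where they are installed. For older documents that reference legacy typefaces without embedding them, keep those typefaces installed on the Windows or Mac systems that need to display or print the files.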
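A viewer substitutes a font only when it is neither embedded nor available locally, so the fonts at risk can be computed from pdffonts-style rows. A minimal sketch, assuming each row is a dict with "name" and "emb" keys (as produced by parsing pdffonts output) and that you have collected the PostScript names of your installed fonts separately; "fonts_needing_substitution" is a hypothetical name.

```python
from typing import Dict, Iterable, List


def fonts_needing_substitution(
    pdf_fonts: Iterable[Dict[str, str]], installed: Iterable[str]
) -> List[str]:
    """Names of non-embedded fonts that are also not installed locally."""
    installed_names = set(installed)
    missing: List[str] = []
    for font in pdf_fonts:
        if font["emb"] == "yes":
            continue  # embedded fonts travel with the file
        name = font["name"]
        if name not in installed_names and name not in missing:
            missing.append(name)  # keep first-seen order, no duplicates
    return missing
```

Running the check against the font list of each machine that must display the file shows whether embedding or installing fonts is the better fix.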

Methods to edit texts by adding/replacing fonts

To directly edit text content within an existing PDF, options to add replacement or missing fonts include:

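  • OCR-based workflows - Re-recognize the text of scanned or image-only pages so it can be edited, then apply a closely matching font.
  • Font selection in editing tools - Editors such as Acrobat's Edit PDF tool let you apply an installed font to amended text when the original font is only partially embedded or unavailable.
  • Advanced font resource copying - Copy font resources between PDFs with command-line utilities or PDF libraries.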

Best results come from using fonts which closely replicate the original typeface conventions used to author the PDF content, or OCR extraction followed by substitutions.
