Input and compilation/PDF/PDF Validation
Introduction
PDF stands for Portable Document Format. As such, it is a published standard.[1]
A file format specifies how data are organized for a type of file. This includes which kinds of data (such as fonts, video, sound, images…) are allowed and how those data are structured in relation to each other.
PDF is no exception. Since the format specification runs to well over five hundred pages, it is easy to make minor mistakes when writing PDF generators.
This is not specific to PDF, to ConTeXt or to Lua(Meta)TeX. As a general rule, we humans make mistakes and commit errors.
This page aims to present the habit of validating (or testing) PDF documents when they misbehave. Like any other habit, it gets better with time (and practice).
Of course, validating PDF documents investigates errors in the format itself and not in your ConTeXt (or Lua) source.
Functional or Perfect?
Depending on how a PDF document is validated (and on how many bytes it contains), it may be really hard to produce a document that contains no deviation from the standard.
Perfection is a goal, not a starting point. Even PDF documents generated by Adobe products may contain what are called “deviations from the PDF model”.
Needless to say, it would be stupid to use validation tools to blame any product (or worse, any person) for errors or mistakes.
It is helpful to check (or validate) PDF documents only when they don’t work as they are supposed to.
With ConTeXt, the validation information may be added to a message to the mailing list, but only as a clue to what may be wrong with the issue at stake.
Of course, validation is only a way to help solve issues, not to make ConTeXt completely perfect (in a perfect world, there would be no human being).
Before Sharing Any PDF Documents
It is not uncommon for PDF documents with format issues to also contain personal or other sensitive data.
Please think twice before sharing them in public forums (such as the mailing list or GitHub).
It may be hard to replicate the format issue in a minimal document, but you may end up regretting not having considered this in advance.
Tools
Use the Most Current Version
There are not many options for validating PDF documents (at least, not many that I have used and can explain).
The ones listed here are actively developed, so having their latest version is crucial to get accurate information.
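When reporting validation results, it may help to state which version of each tool produced them. The following is only a minimal shell sketch, assuming the tools are on the PATH; the helper name `report_tool_version` is hypothetical, while `pdfcpu version` and `qpdf --version` are the tools' own version commands.

```shell
# Hypothetical helper: print the installed version of a tool, or a note
# if the tool is not found on the PATH.
# Usage: report_tool_version NAME VERSION-ARGUMENT
report_tool_version() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" "$2"
    else
        printf '%s: not installed\n' "$1"
    fi
}

# Example calls (commented out so that the snippet only defines the helper):
# report_tool_version pdfcpu version
# report_tool_version qpdf --version
```

The printed version can then be compared by hand with the one shown on the release page of each tool.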
pdfcpu
pdfcpu (latest version at https://github.com/pdfcpu/pdfcpu/releases/latest) enables strict testing with the following command:
pdfcpu validate --mode=strict document.pdf
What pdfcpu calls relaxed validation may be achieved with:
pdfcpu validate --mode=relaxed document.pdf
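Both modes can be combined to see how serious a problem is: a document that fails strict validation but passes relaxed validation probably has only minor deviations. The following is a minimal sketch, not an official pdfcpu recipe. The function name `validate_pdf` and the `PDFCPU` variable (path to the executable) are hypothetical, and the sketch assumes that pdfcpu exits with a non-zero status when validation fails.

```shell
# Hypothetical helper: validate strictly first, and fall back to relaxed
# mode only if strict validation fails.
# Assumption: pdfcpu exits with a non-zero status when validation fails.
# PDFCPU may be set to the path of the executable (defaults to "pdfcpu").
validate_pdf() {
    if "${PDFCPU:-pdfcpu}" validate --mode=strict "$1"; then
        echo "strict: ok"
    elif "${PDFCPU:-pdfcpu}" validate --mode=relaxed "$1"; then
        echo "relaxed: ok (strict validation failed)"
    else
        echo "invalid: failed in both modes"
        return 1
    fi
}

# Example call (commented out so that the snippet only defines the helper):
# validate_pdf document.pdf
```

The messages from pdfcpu itself are printed as usual; the last line only summarizes which mode, if any, accepted the document.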
qpdf
qpdf (latest version at https://github.com/qpdf/qpdf/releases/latest) allows testing with:
qpdf --check document.pdf
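The exit status of qpdf tells errors apart from mere warnings, which may be useful in scripts. The following is a minimal sketch; the helper name `describe_qpdf_status` is hypothetical, and the meaning of each status is taken from the qpdf manual (0 for no errors or warnings, 2 for errors, 3 for warnings without errors).

```shell
# Hypothetical helper: translate the exit status of "qpdf --check"
# into a short message. According to the qpdf manual:
# 0 = no errors or warnings, 2 = errors, 3 = warnings but no errors.
describe_qpdf_status() {
    case "$1" in
        0) echo "no errors or warnings" ;;
        2) echo "errors found" ;;
        3) echo "warnings found, but no errors" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Example use (commented out so that the snippet only defines the helper):
# qpdf --check document.pdf; describe_qpdf_status "$?"
```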
The Arlington PDF Model
There is a formal specification of PDF features called “the Arlington PDF model” (40-minute presentation by Peter Wyatt).[2]
Be warned that this is the hardest way of all. It is also the most accurate way of validating.
It requires some previous knowledge of the PDF format itself (otherwise, all references may be unintelligible).
File Checker
The veraPDF consortium releases an open source file checker at https://software.verapdf.org/develop/arlington/.
It is written in Java and checks whether the given PDF document conforms to its format version (from 1.0 to 2.0, with extensions).
PDF/A
There is also a document validator from the veraPDF consortium specifically targeting PDF/A and PDF/UA formats.
Also written in Java, it is available at https://software.verapdf.org/develop/.
It checks conformance of files to PDF/A (from versions 1A to 4F), to PDF/UA version 1 and to WTPDF Accessibility and Reuse (well-tagged PDF for accessibility and reuse).
Notes
- [1] It was developed and made available on the internet by Adobe. Since version 1.7 (of the format itself, not of Acrobat), it has been published by the International Organization for Standardization (and not made available on the net). So the most current version publicly available is 1.7 (both as typeset by the ISO and as typeset by Adobe).
- [2] Just in case you wonder, the link points to a page with links to an alternative frontend to YouTube. If you are not comfortable with that, the first link to the video on that page points to YouTube itself.