Developing PDF Techniques for Accessibility

This page summarizes certain findings from the PDF Techniques Accessibility Summit conducted December 10-11, 2018 in Edinburgh, Scotland. It will be periodically updated with process changes and other details.

Questions?

Email the Liaison Working Group members at pdf-techniques-for-accessibility-lwg@googlegroups.com, or Duff Johnson at duff.johnson@pdfa.org.

Example files

Examples should not be real-world documents. Extremely specific use cases are preferred, in which the only content on the PDF page is that which is pertinent to the example itself. An example of text in two columns could, for instance, be used to demonstrate the right way to tag two-column content. To keep things simple, this would be the only content in the document.

Examples of examples

In general, it’s easy to know what is and what is not correctly tagged. For example, it’s incorrect to use two <P> (paragraph) tags enclosing different parts of a single paragraph. This fact, however, does not stop many implementations and users from getting this basic aspect of semantic structure wrong. Based on PDF/UA and WCAG 2.1, the Summit seeks to review and address not only the more difficult cases for tagging, but also common cases which nonetheless generate many errors. Examples of examples for input into the Summit could include:

  • Content that spans multiple pages
  • Page numbers
  • Footnotes
  • Use of ActualText
  • Lists within the <LBody> of a list item

Both conforming (accessible) and fail (inaccessible) examples are welcome. Specifically excluded as cases of examples for input are purely content-related cases such as:

  • Color-contrast choices
  • Quality of alternative text
  • Complexity of the content
  • Choice of fonts
  • Choice of layout

Creating Examples

  • Example PDFs must be fully atomic, including the least amount of content necessary to the example's purpose. However, ALL content necessary to the example's purpose must be included. For example, a blockquote only makes sense, semantically, when there's a paragraph to distinguish it from. Likewise, it would be incorrect to provide, e.g., a single <TD>, since the element only makes sense in the context of a <Table>.
  • Use fully generic language and images in all examples

Templates

There is no specific template for examples, but in addition to being extremely simple, as stated above, we do want example PDF files to be stylistically similar.

  • This DOCX file is a legitimate place to start. Note that this file is NOT a pass or a fail example! It's just a starting-point for your PDF examples, pass, fail or otherwise.
  • This InDesign example, kindly provided by Klaas Posselt, is a fine option for those starting from InDesign.

Fail examples

  • The basic requirement for FAIL examples is that the example is always and without exception a FAIL.
  • All FAIL examples must include the term "FAIL" in sans-serif font, all-CAPS, red boldface text at 18 points or larger in the top-left corner of the page. The FAIL text must be correctly tagged (i.e., it cannot be part of why the file fails).
  • Ensure FAIL examples are 100% conforming ISO 32000 files except for the exact fail condition
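
The label rules above can be sketched as a small check. This is only an illustration: the `LabelRun` class and its fields are hypothetical, and the numeric test for "red" is an assumption, since this page does not define an exact color or what counts as the top-left corner.

```python
from dataclasses import dataclass


@dataclass
class LabelRun:
    """Hypothetical summary of the FAIL label as found on the page."""
    text: str
    sans_serif: bool
    bold: bool
    size_pt: float
    rgb: tuple          # (red, green, blue), each from 0.0 to 1.0
    in_top_left: bool   # the caller decides what counts as the top-left corner
    tagged: bool        # the label itself is correctly tagged


def label_problems(run: LabelRun) -> list:
    """Return a list of rule violations; an empty list means the label is fine."""
    problems = []
    if run.text != "FAIL":
        problems.append('text must be exactly "FAIL" in all caps')
    if not run.sans_serif:
        problems.append("font must be sans-serif")
    if not run.bold:
        problems.append("text must be boldface")
    if run.size_pt < 18:
        problems.append("size must be 18 points or larger")
    red, green, blue = run.rgb
    # Assumption: "red" means a dominant red channel.
    if not (red >= 0.8 and green <= 0.2 and blue <= 0.2):
        problems.append("text must be red")
    if not run.in_top_left:
        problems.append("label must sit in the top-left corner of the page")
    if not run.tagged:
        problems.append("label must be correctly tagged")
    return problems
```

For instance, `label_problems(LabelRun("FAIL", True, True, 18.0, (1.0, 0.0, 0.0), True, True))` returns an empty list.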

Technical considerations

In particular, participants identified the following checklist of technical problems to be avoided in all examples created for this purpose:

  • Container titles not aligned with structure elements
  • Empty attribute tables on structure elements
  • Redundant lang attributes (on content containers in addition to document-level)
  • Unaligned content and logical ordering (unless it serves the purposes of the example)
  • T key present but empty
  • Extraneous elements such as AcroForms or OutputIntents
  • Fonts not embedded or subsetted
  • Fail cases with the PDF/UA flag
  • Pass cases without the PDF/UA flag
  • Unnecessary owner dictionaries
  • Uncompressed content streams
  • Uncompressed object streams
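
Part of this checklist lends itself to a mechanical pre-check. The sketch below covers a few of the items; it assumes the facts about a file have already been gathered by some other tool into a plain dictionary, and all of the key names are hypothetical.

```python
def checklist_problems(facts: dict) -> list:
    """Return the checklist items a file appears to violate.

    `facts` is a hypothetical summary of one example file, for instance:
    {"kind": "pass", "has_pdfua_flag": True, "fonts_embedded": True,
     "empty_t_keys": 0, "uncompressed_streams": 0}
    """
    problems = []
    # Fail cases must not carry the PDF/UA flag; pass cases must carry it.
    if facts["kind"] == "fail" and facts["has_pdfua_flag"]:
        problems.append("fail case carries the PDF/UA flag")
    if facts["kind"] == "pass" and not facts["has_pdfua_flag"]:
        problems.append("pass case lacks the PDF/UA flag")
    if not facts["fonts_embedded"]:
        problems.append("fonts not embedded or subsetted")
    if facts["empty_t_keys"] > 0:
        problems.append("T key present but empty")
    if facts["uncompressed_streams"] > 0:
        problems.append("uncompressed content or object streams")
    return problems
```

Items such as container titles or content ordering still need a manual look; the sketch only flags what can be reduced to a simple yes/no fact.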

TagChecker

To maximize the utility of Techniques examples to developers, it was determined that all examples should conform to best practices in terms of ISO 32000. Specifically, files should conform not only to ISO 32000's requirements, but also to its strong recommendations ("should" statements).

Roman Toda has prepared a plug-in for Adobe Acrobat (both Windows and Mac!) that should help with the manual process of checking and cleaning samples to ensure they are as canonically correct as possible.

TagChecker performs a few tasks (like removing empty ClassMap entries) identified in the summit, and more improvements are possible based on requests. The software is available from GitHub; check the readme, download the binary, and test for yourself: https://github.com/Normex/TagCheckerPI

Jira operations - best practice

The following best practices should be used to guide your approach to working with the Jira project for PDF Techniques development:

  • Ensure the issue title / description is tight (e.g. “Nested list”, “LBody containing paragraphs”)
  • Ensure that new examples do not duplicate existing examples
  • Ensure the metadata fields for each example are complete
  • If you work on a given example, assign it to yourself
  • When working on an example, add comments in Jira to indicate all changes you make to the file
  • If you stop working on a given example, remove the assignment while leaving a Jira comment to indicate the problems that remain
  • Do not move examples to Deliberation unless the checklist (see above) is cleared

Downloading test files in batch

To download attachments from multiple issues:
  1. Log in to Jira and find “View all issues and filters” to expose the issue search dialog.
  2. Enter a search query in Jira Query Language that describes your search request (see JQL documentation). An example query: project = "PDF/UA" and "Pass / Fail" = Pass and status in ("Initial Review", Deliberation)
  3. From the “Export” menu, select “Attachments”.
  4. Select a folder configuration
  5. Optionally, filter your attachments for just the PDFs.
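
Steps 2 and 5 can be sketched in a few lines. The helper names `build_jql` and `only_pdfs` are hypothetical; the query shape follows the example in step 2, except that every status value is quoted, which JQL also accepts.

```python
from pathlib import Path


def build_jql(project: str, pass_fail: str, statuses: list) -> str:
    """Build a query shaped like the example in step 2."""
    quoted = ", ".join(f'"{status}"' for status in statuses)
    return (
        f'project = "{project}" and "Pass / Fail" = {pass_fail} '
        f"and status in ({quoted})"
    )


def only_pdfs(paths) -> list:
    """Keep only paths with a .pdf extension (case-insensitive), in input order."""
    return [Path(p) for p in paths if Path(p).suffix.lower() == ".pdf"]
```

After exporting, something like `only_pdfs(Path("export").rglob("*"))` would list the PDFs in the export folder, where `export` stands in for whatever folder you selected in step 4.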

Workflow

Any member of the LWG should feel free to email the list with questions about this process!

Workflow updated February 2019.

The summit’s workflow depends on a stream of candidate example files input by summit attendees. The process of the summit is to review, refine and disposition these examples.  The workflow stages are defined as follows:

Step 1: Reported

Check to determine whether the input is useful in principle.

Workflow

  1. Screen for duplicates
  2. Validate the contents of the Summary / Description field
  3. Validate the Components field

Step 2: Normalization (If necessary)

Re-create PDF examples to make them suitable for tagging.

Workflow

  1. Check to ensure the content is neutral.
  2. Check semantics (e.g., a heading without any actual page content)

Step 3: Reviewed / Tag Ready

Develop and apply the tagging technique.

Workflow

  1. Tag the document

Step 4: Deliberation

LWG discussion regarding the tagging technique.

Workflow

  1. Accept or reject the technique
OR
  1. Postpone if further information/work is required

Step 5: Postponed

Provide further information or rework the tagging technique. The main purpose is to keep the Deliberation clean of previously-discussed cases.

Workflow

  1. Wait for additional feedback
  2. Return to Deliberation when ready
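
The five stages can be read as a small state machine. The sketch below is one possible reading, not the Jira configuration itself: the stage names come from the steps above, while "Accepted" and "Rejected" are assumed names for the two outcomes of Deliberation. It also encodes the best-practice rule that nothing moves to Deliberation until the checklist is cleared.

```python
# Allowed moves between stages. Normalization is optional, so Reported
# may go straight to Reviewed / Tag Ready.
ALLOWED_MOVES = {
    "Reported": {"Normalization", "Reviewed / Tag Ready"},
    "Normalization": {"Reviewed / Tag Ready"},
    "Reviewed / Tag Ready": {"Deliberation"},
    # "Accepted" and "Rejected" are assumed names for the outcomes.
    "Deliberation": {"Accepted", "Rejected", "Postponed"},
    "Postponed": {"Deliberation"},
}


def can_move(current: str, target: str, checklist_cleared: bool = True) -> bool:
    """Return True if an example may move from `current` to `target`."""
    if target == "Deliberation" and not checklist_cleared:
        return False
    return target in ALLOWED_MOVES.get(current, set())
```

For instance, `can_move("Postponed", "Deliberation")` is allowed, while `can_move("Reported", "Deliberation")` is not.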

Annex: Elements of an Example

Each example in the system includes the following fields. Ideally, every field would be completed for each example.

  • Summary: A very concise description of the example, such as: “Page numbers”
  • Component(s): One or more relevant subclauses from PDF/UA-1
  • Description: The purpose of the example. If it's clear from the summary you can leave this field empty.
  • Attachment: Attach the example file. Please also attach the source file, to make it easier to recreate the example if necessary!
  • Use cases: This field is optional, and may be filled as part of Deliberation
  • Concerning: Identifies all structure elements relevant to the example
  • Example type: Helps process examples from various origins
  • Reason: Used when dispositioning the example
  • AT support: AT (e.g., NVDA, JAWS) which support the example
  • Pass / Fail: Indicate whether the example is a passing or a failing file. This field also allows for "should" and "may" values to capture best-practice examples in addition to pure validity considerations.
  • PDF Technique: If applicable, indicate the relevant existing WCAG Technique for PDF
  • Matterhorn Protocol: Indicate the relevant Matterhorn Protocol checkpoint
  • WCAG 2.1 SC: Indicate the relevant WCAG 2.1 Success Criterion
  • Comment: Any additional information you want to provide
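
A completeness check over these fields might look like the sketch below. It assumes an example's metadata is available as a plain dictionary keyed by field name; which fields count as optional is an interpretation of the wording above (Description, Use cases and PDF Technique), not a rule enforced by Jira.

```python
FIELDS = [
    "Summary", "Component(s)", "Description", "Attachment", "Use cases",
    "Concerning", "Example type", "Reason", "AT support", "Pass / Fail",
    "PDF Technique", "Matterhorn Protocol", "WCAG 2.1 SC", "Comment",
]

# Assumption: fields described above as optional, conditional, or
# allowed to be left empty.
OPTIONAL = {"Description", "Use cases", "PDF Technique"}


def missing_fields(record: dict) -> list:
    """List the non-optional fields that are absent or blank, in field order."""
    missing = []
    for name in FIELDS:
        if name in OPTIONAL:
            continue
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Calling `missing_fields({"Summary": "Page numbers", "Pass / Fail": "Pass"})` lists every other non-optional field, starting with "Component(s)".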