Digging for information by extracting data from a PDF document

Nadine Schuppisser // December 8, 2016

Member News

Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information, such as images and metadata? It can be simple, but also tricky.

Metadata is among the easiest things to extract. Document metadata can usually be extracted as a short XMP stream. Even if the document contains an old-fashioned information dictionary, extracting the key/value pairs is not a big deal. The same goes for outlines (bookmarks) and navigation aids such as named destinations, links and the like.
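
To make the XMP case concrete, here is a minimal sketch in Python using only the standard library. It assumes the XMP packet is stored uncompressed in the file, which is common but not guaranteed; compressed or encrypted metadata streams and the information dictionary need a real PDF parser. The function names, the prefix table and the sample bytes are illustrative assumptions, not part of any particular product.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Namespace URIs mapped to the prefixes conventionally used in XMP.
PREFIXES = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://ns.adobe.com/xap/1.0/": "xmp",
    "http://ns.adobe.com/pdf/1.3/": "pdf",
}

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the body of the first uncompressed XMP packet, or None."""
    match = XPACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def short_name(tag: str) -> str:
    """Turn ElementTree's '{uri}local' form into 'prefix:local'."""
    if not tag.startswith("{"):
        return tag
    uri, _, local = tag[1:].partition("}")
    return f"{PREFIXES.get(uri, uri)}:{local}"


def read_xmp_fields(packet: bytes) -> Dict[str, str]:
    """Collect simple XMP properties as a flat key/value mapping."""
    fields: Dict[str, str] = {}
    root = ET.fromstring(packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple properties may be serialized as attributes.
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                fields[short_name(name)] = value
        # Others are child elements, possibly wrapping rdf:li items.
        for child in desc:
            items = [
                li.text.strip()
                for li in child.iter(f"{{{RDF_NS}}}li")
                if li.text and li.text.strip()
            ]
            text = ", ".join(items) if items else (child.text or "").strip()
            if text:
                fields[short_name(child.tag)] = text
    return fields


# Hypothetical fragment standing in for the bytes of a real PDF file.
SAMPLE_PDF_BYTES = (
    b"%PDF-1.7\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about=""'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    b' xmlns:pdf="http://ns.adobe.com/pdf/1.3/"'
    b' pdf:Producer="Example Producer">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sample title'
    b"</rdf:li></rdf:Alt></dc:title>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>\n"
    b'<?xpacket end="w"?>\nendstream\n'
)

packet = find_xmp_packet(SAMPLE_PDF_BYTES)
if packet is not None:
    for key, value in read_xmp_fields(packet).items():
        print(f"{key}: {value}")
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Scanning raw bytes is only a shortcut for the simple case; it does not follow the document catalog to the metadata stream the way a proper PDF library does.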

Read more on how to extract information from a PDF document in our PDF expert blog.