Extract Links from PDF in Java

This short how-to article explains how to extract links from a PDF in Java. It covers the IDE settings, a list of steps, and sample code for extracting hyperlinks from a PDF in Java. You will learn to fetch the link-type annotations on each page and cast each annotation's action to GoToURIAction to read its URI.

Steps to Extract URL from PDF in Java

  1. Set the IDE to use Aspose.PDF for Java to extract links
  2. Load the source PDF file, iterate through all the pages, and create an annotation selector for the page
  3. Extract all the annotations from the page and save them in the Selected collection
  4. Iterate through all the annotations, typecast each one to LinkAnnotation, and typecast its action to GoToURIAction
  5. Invoke the getURI() method to access the link and display it on the console

These steps describe how to extract all links from a PDF in Java. Load the source PDF file, access the target pages, and create an annotation selector for each page. Call the page's accept() method with that selector, fetch the list of selected link annotations, and read the URI of each one by typecasting its action to the GoToURIAction class.
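The steps above can be sketched roughly as follows. This is a minimal example, not a definitive implementation: it assumes the Aspose.PDF for Java library is on the classpath, the file name input.pdf and the class name ExtractPdfLinks are placeholders, and license setup is omitted.

```java
import com.aspose.pdf.Annotation;
import com.aspose.pdf.AnnotationSelector;
import com.aspose.pdf.Document;
import com.aspose.pdf.GoToURIAction;
import com.aspose.pdf.LinkAnnotation;
import com.aspose.pdf.Page;
import com.aspose.pdf.PdfAction;
import com.aspose.pdf.Rectangle;

import java.util.List;

public class ExtractPdfLinks {
    public static void main(String[] args) {
        // "input.pdf" is a placeholder path, not a file named in the article
        Document document = new Document("input.pdf");
        try {
            for (Page page : document.getPages()) {
                // A selector built from a LinkAnnotation picks up only link annotations
                AnnotationSelector selector =
                        new AnnotationSelector(new LinkAnnotation(page, Rectangle.getTrivial()));
                page.accept(selector);

                // After accept(), the matching annotations are in the Selected collection
                List<Annotation> selected = selector.getSelected();
                for (Annotation annotation : selected) {
                    if (!(annotation instanceof LinkAnnotation)) {
                        continue;
                    }
                    PdfAction action = ((LinkAnnotation) annotation).getAction();

                    // Links that jump within the document carry other action types
                    // (or none), so check the type before casting
                    if (action instanceof GoToURIAction) {
                        System.out.println(((GoToURIAction) action).getURI());
                    }
                }
            }
        } finally {
            // Release the resources held by the document
            document.close();
        }
    }
}
```

The instanceof check before the cast is a defensive choice: not every link annotation points to a web address, and an unchecked cast to GoToURIAction would throw a ClassCastException on such links.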

The approach described above works as a PDF link extractor in Java. While iterating through the pages of the PDF, you may skip or select a page by analyzing its contents through the Page class object. The getAction() method of the link annotation returns the action object, and when that action is a GoToURIAction, it contains the URI of the link.

In this article, we have learned the process of fetching hyperlinks from a PDF. To create hyperlinks in a PDF, refer to the article on how to create a hyperlink in a PDF using Java.
