Create Searchable PDFs

After months of work, it is now available: create searchable PDFs from scans and images for free with our Online OCR form and the OCR API.

This works with PDFs that contain images (such as the ones you get from scanning documents) and also with plain image files (JPEG/PNG/GIF). In both cases the input is converted to a searchable PDF that you can download.

Try it here and select the “Create visible text layer” option. This way you can easily see the text layer that gets added during the conversion.

A searchable PDF is a PDF file that includes text you can search with the standard “search” function of a PDF reader. The text can also be selected and copied from the PDF. PDF files created from Microsoft Word and similar documents are generally searchable by nature, because the source document contains text that is replicated in the PDF. A PDF created from a scanned document, by contrast, contains only images of the text, so an OCR process has to be applied to recognize the characters within the image.

Searching a searchable PDF

OCR API and Searchable PDF

Make searchable PDF creation part of your workflow with the API. You can create searchable PDFs (sometimes also called Sandwich PDFs) directly via the API. The PDF is returned as a download link in the API JSON response, in the form of “SearchablePDFURL”: “…”. The download link is valid for one hour; after that, the document is deleted from our OCR servers.
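As a rough illustration of handling that response, the Python sketch below reads the “SearchablePDFURL” field from a JSON body and notes when the link should be considered expired. The function name, the sample response body, and the choice to count the hour from the moment the response is received are assumptions made for this example; the post does not say exactly when the one-hour clock starts.

```python
import json
from datetime import datetime, timedelta, timezone

# Validity period stated in the post.
LINK_LIFETIME = timedelta(hours=1)


def pdf_link_from_response(raw_json, received_at):
    """Return (download_url, expires_at), or None if no PDF link is present.

    Assumption: the one-hour window is counted from `received_at`.
    """
    payload = json.loads(raw_json)
    url = payload.get("SearchablePDFURL")
    if not url:
        return None
    return url, received_at + LINK_LIFETIME


# Hypothetical response body, trimmed to the single field named in the post.
sample = '{"SearchablePDFURL": "https://example.com/result.pdf"}'
received = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
link = pdf_link_from_response(sample, received)
```

Since the file is deleted after the hour is up, it is probably safest to download it right away rather than store the link for later.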

The isCreateSearchablePdf = true switch triggers the generation of the searchable PDF. Creating a searchable PDF takes additional processing time, so you should only activate this feature if you need the OCR result in PDF format.
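A minimal sketch of such a request in Python, using only the standard library, might look like this. The endpoint and the isCreateSearchablePdf switch come from this post; the "apikey" and "url" form field names and the helper function names are assumptions for illustration, so please check them against the API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in this post.
API_ENDPOINT = "https://api.ocr.space/parse/image"


def build_form(api_key, image_url):
    """Build the form fields for a searchable-PDF request."""
    return {
        "apikey": api_key,  # assumption: key sent as a form field
        "url": image_url,   # assumption: input file referenced by URL
        "isCreateSearchablePdf": "true",  # switch described in the post
    }


def request_searchable_pdf(api_key, image_url, timeout=60):
    """POST the form and return the parsed JSON response as a dict."""
    data = urllib.parse.urlencode(build_form(api_key, image_url)).encode("ascii")
    request = urllib.request.Request(API_ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the form needs no network access; the key below is a placeholder.
form = build_form("YOUR_API_KEY", "https://example.com/scan.png")
```

Leaving the switch out of the form when you only need the recognized text avoids the extra processing time mentioned above.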

For more details please see https://ocr.space/ocrapi#searchablepdf

When used with the free OCR API tier, the generated PDF contains a watermark “Generated by OCR.space” in the lower right corner. With the PRO OCR API, no watermark is added to the PDF.

PRO OCR API: As always, the free OCR API gets the update first. We know that our PRO API customers value stability and reliability above new features, so the PRO OCR API endpoints will get this update a few weeks later, once we are confident that everything runs reliably. If you have a PRO API key, you can also connect to the free API endpoint at https://api.ocr.space/parse/image and start using the new features right away.