PHP Classes

File: examples/text-capture/README.md

Role: Documentation
Content type: text/markdown
Description: Documentation
Class: PHP PDF to Text
Extract text contents from PDF files
Author: Christian Vigh
Last change: Added more information for this example
Date: 6 years ago
Size: 1,182 bytes
 


This example shows you how to capture text areas and table lines/columns from a PDF document.

The directory includes the following files:

  • sample-report.pdf: the sample PDF file used in this example.
  • sample-report.doc: the original Microsoft Word document that was used to generate sample-report.pdf.
  • sample-report.xml: the Capture definitions file, in XML format, that specifies what is to be captured.
  • example.php: the PHP script that takes sample-report.pdf and sample-report.xml as input and extracts only the information you want.
  • sample-report.txt: the output of a previous run of the PdfToText class against sample-report.pdf with the PDFOPT_DEBUG_SHOW_COORDINATES option. It lists every block of text found in the input document, with its (x,y) coordinates and its width and height. You need this information to design a Capture definitions file.
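To illustrate why the coordinates listed in sample-report.txt matter when designing a capture area, here is a minimal, self-contained sketch of the underlying idea: keep only the text blocks whose bounding box falls inside a given rectangle. It does not use the PdfToText class or its Capture definitions format. The array layout, the function names `blockInsideArea` and `captureText`, the sample values, and the lower-left origin are all assumptions made for illustration.

```php
<?php
// Hypothetical data model for illustration only: this is NOT the PdfToText API,
// nor the actual layout of sample-report.txt or sample-report.xml.

/**
 * Returns true when the block's rectangle lies entirely inside the area.
 * Both arrays use the keys x, y (assumed lower-left corner), width, height.
 */
function blockInsideArea(array $block, array $area): bool
{
    return $block['x'] >= $area['x']
        && $block['y'] >= $area['y']
        && $block['x'] + $block['width']  <= $area['x'] + $area['width']
        && $block['y'] + $block['height'] <= $area['y'] + $area['height'];
}

/**
 * Collects the text of every block contained in the capture area,
 * preserving the order of the input list.
 */
function captureText(array $blocks, array $area): array
{
    $captured = [];
    foreach ($blocks as $block) {
        if (blockInsideArea($block, $area)) {
            $captured[] = $block['text'];
        }
    }
    return $captured;
}

// Made-up text blocks, standing in for what a coordinates dump might describe.
$blocks = [
    ['text' => 'Report title', 'x' => 70,  'y' => 760, 'width' => 200, 'height' => 14],
    ['text' => 'Cell A1',      'x' => 72,  'y' => 600, 'width' => 60,  'height' => 12],
    ['text' => 'Cell B1',      'x' => 200, 'y' => 600, 'width' => 60,  'height' => 12],
    ['text' => 'Footer',       'x' => 70,  'y' => 30,  'width' => 100, 'height' => 10],
];

// Made-up capture rectangle covering only the table row.
$tableArea = ['x' => 60, 'y' => 580, 'width' => 300, 'height' => 50];

$result = captureText($blocks, $tableArea);
print_r($result);
```

With these sample values, only the two table cells fall inside the rectangle; the title and the footer are excluded. In practice you would read the real coordinates from sample-report.txt and express the areas in sample-report.xml, as example.php does.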

This example may not be the best starting point: in the current version (1.6.0), all the columns in sample-report.pdf are interpreted as a single column. This issue will be fixed in a future release, probably 1.6.1.