Extract Tables From PDF As CSV and TSV Using Tabula


Tabula is a handy tool for extracting tables from a PDF file as a CSV or TSV (tab-separated values) file. It supports multipage PDF documents and lets you select multiple tables together to generate a single file. What sets it apart from many similar tools is that it doesn’t convert the whole PDF file or entire selected pages to CSV format. Instead, you select only the required area of a PDF table (the data in rows and columns).

It comes in handy when you need to extract a bank statement or any other data available in tabular form in a PDF file. Do note that it can extract tables only from text-based PDF files, not from scanned PDF files (for those, you can try OCR tools like i2OCR, Free Online OCR, and PDF OCR X).

Tabula: extract tables from PDF as CSV

The screenshot above shows extracted tabular data that I saved in CSV format and opened with Microsoft Excel. You can do the same with this useful tool.

Also check out: List of Best Free PDF Readers.

How To Extract Tables From PDF As CSV and TSV File?

Tabula comes as an executable jar file whose interface opens in your default web browser. Everything happens in the browser, but offline. Make sure Java is installed, as Tabula needs it to run. Download the zip file (around 36 MB) using the link available at the end of this review, extract it, and execute the jar file.
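If the jar does not start when you double-click it, a missing Java installation is a likely cause. The short Python sketch below checks whether a `java` executable is on your PATH and builds the usual `java -jar` command for an executable jar. The file name `tabula.jar` is an assumption for illustration; use whatever name the jar has in the zip you downloaded.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def java_available() -> bool:
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


def build_launch_command(jar_path: str) -> List[str]:
    """Build the standard command for running an executable jar."""
    return ["java", "-jar", str(Path(jar_path))]


if __name__ == "__main__":
    jar = "tabula.jar"  # assumed file name; adjust to the jar in your download
    if not java_available():
        print("Java was not found on PATH. Install Java first.")
    elif not Path(jar).is_file():
        print(f"Could not find {jar} in the current folder.")
    else:
        # Starts the jar; this call blocks until the program exits.
        subprocess.run(build_launch_command(jar), check=True)
```

Run the script from the folder where you extracted the zip file.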

Now add a PDF file from your PC and submit it to generate thumbnails of the PDF pages. Interestingly, Tabula comes with a feature to auto-detect tables, but it may slow down thumbnail generation.

run Tabula and upload a PDF

During my testing, auto-detection selected the table on the first page only, so it is better to select PDF tables manually.

As soon as the thumbnails are generated, you can click on any thumbnail and select the required area of a table. Tabula immediately extracts the selected table, which you can then download.

If you need to extract multiple tables at once, enable Multi-Select mode and select the tables one by one. After that, click the Download All Data button to generate the tabular data.

select PDF tables and extract

Tabular data can be generated in original format or in spreadsheet format. Select either method, and use the Download data button to download the tables as a comma-separated values (CSV) file or a TSV file, which you can open with MS Excel or any other spreadsheet software.

select extraction method and download data
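The only practical difference between the two download formats is the separator: CSV uses commas and TSV uses tabs. If you would rather process the extracted table in a script than in a spreadsheet, the sketch below shows one way to read either format with Python's standard `csv` module. The file name `statement.tsv` and the sample rows are made up for illustration and are not actual Tabula output.

```python
import csv
import io
from pathlib import Path


def delimiter_for(filename: str) -> str:
    """Pick a tab for .tsv files and a comma for everything else."""
    return "\t" if Path(filename).suffix.lower() == ".tsv" else ","


def read_table(text: str, delimiter: str) -> list:
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample data standing in for a downloaded TSV file.
sample_tsv = "Date\tDescription\tAmount\n2016-01-05\tGrocery store\t42.10\n"

rows = read_table(sample_tsv, delimiter_for("statement.tsv"))
for row in rows:
    print(row)
```

To read a real download, pass the file's contents instead of `sample_tsv`, for example `Path("your-file.csv").read_text()`.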

Conclusion:

Tabula cannot extract tabular data from scanned PDFs, but otherwise it is a good tool for getting tabular data out of your PDF documents. It is worth a try. The ability to select multiple tables and extract them as a single CSV or TSV file is especially interesting.

Get Tabula free.

Also look at these free tools to extract images from PDF.

Works With: Windows, Mac, and Linux
Free/Paid: Free