The ground-truthed datasets of PDF tables
License:
Unknown
Overview
Two ground-truthed datasets of natively-digital PDF documents containing tables.
On this page you will find two ground-truthed datasets of natively-digital PDF documents
containing tables. These documents have been collected systematically from the European
Union and US Government websites, and we therefore expect them to have public domain
status. Each PDF document is accompanied by three XML (or CSV) files containing its
ground truth in the following models:
- table regions (for evaluating table location)
- cell structures (for evaluating table structure recognition)
- functional representation (for evaluating table interpretation)
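
As an illustration of how the table-region ground truth might be used, the sketch below compares a predicted table region against a ground-truth region by intersection over union (IoU). It rests on assumptions not stated on this page: that a region can be reduced to an axis-aligned bounding box on a single page, and that IoU is an acceptable overlap measure. The `Box` type and the `area` and `iou` functions are hypothetical names, not part of the datasets or of any official evaluation tool.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box; the coordinate convention is an assumption."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    # A box with non-positive width or height has zero area.
    return max(0.0, box.x2 - box.x1) * max(0.0, box.y2 - box.y1)


def iou(a: Box, b: Box) -> float:
    # The intersection rectangle; it is degenerate when the boxes do not overlap.
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = area(inter)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Made-up coordinates, purely for illustration.
ground_truth = Box(0, 0, 10, 10)
predicted = Box(5, 0, 15, 10)
score = iou(ground_truth, predicted)
```

A predicted region could then be counted as a correct detection when its score exceeds a chosen threshold; the threshold itself is a design choice and is not prescribed here.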