Reports

I would like to provide a more generic solution for all types of table, pdf and pdf OCR formats. I thought that you need a "cell split" tool that puts the cells, with the cells there you already have the tables, Look at the example of “cell split”

It also has to add coordinates, that is, there are cells that can be completely empty, but you still have to take them into account within the table. Example https://miro.medium.com/v2/resize:fit:720/format:webp/1*reGHxSpu0h5MwMnZc-ptqw.png

With this I want to tell you that there are more than 1000 table formats in .pdf, making a minicode for each format is 💀 ⚰️ .

Tools “cell split OCR” list:

https://medium.com/@chloe_yl4616/table-extraction-and-text-recognition-from-images-via-eye-gaze-tracking-ae22a707263
https://nanonets.com/blog/table-extraction-deep-learning/
https://medium.com/analytics-vidhya/tablenet-deep-learning-model-for-end-to-end-table-detection-and-tabular-data-extraction-from-a26790097a50

79278111