79281229

Date: 2024-12-14 19:24:45
Score: 3
Natty:
Report link

To run OCR on an unsearchable PDF to make it searchable, you can try PdfOCRer, a python program in GitHub.

It uses Ghoshscript to convert pdf page to image and PaddleOCR as the OCR engine. PaddleOCR seems to have comparable performance as Tesseract according to discussion in this Stackoverflow thread.

Reasons:
  • Blacklisted phrase (1): Stackoverflow
  • Low length (0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Mark Front