Reports

79652389

Date: 2025-06-04 08:38:03

Score: 10 🚩

Natty: 6.5

We have the same problem.

The issue occurs when Tika processes PDFs that do not contain selectable text; they appear to be image-based scans or flattened documents.

When these files are parsed by Tika, the extracted content looks corrupted or unreadable. Even when manually copying and pasting from the original PDF, the resulting text appears as strange or triangular symbols.

Do you have any idea how we could solve this issue?

Reasons:

RegEx Blacklisted phrase (1.5): solve this issue?
RegEx Blacklisted phrase (2.5): Do you have any
No code block (0.5):
Me too answer (2.5): have the same problem
Ends in question mark (2):
Low reputation (1):

Posted by: Firas KETATA