We have the same problem.
The issue occurs when Tika processes PDFs that do not contain selectable text; they appear to be image-based scans or flattened documents.
When these files are parsed by Tika, the extracted content looks corrupted or unreadable. Even when manually copying and pasting from the original PDF, the resulting text appears as strange or triangular symbols.
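To make "corrupted or unreadable" a bit more concrete, here is a minimal sketch of how such output might be flagged automatically after extraction. It is only a heuristic and not part of Tika: the function name `looks_garbled` and the 0.3 threshold are assumptions made up for illustration, and the idea is simply to count characters that rarely appear in normal text (private-use code points, replacement characters, geometric symbols such as triangles).

```python
import unicodedata


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Heuristic check (hypothetical helper, not a Tika API).

    Returns True when at least `threshold` of the non-whitespace
    characters are of a kind that rarely occurs in ordinary prose.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # Nothing was extracted at all; treat this as a failed extraction.
        return True

    # Unicode general categories considered suspicious here:
    # Co = private use, Cn = unassigned, Cc = control,
    # So = "other symbol" (includes geometric shapes such as triangles).
    suspicious_categories = {"Co", "Cn", "Cc", "So"}

    suspicious = sum(
        1
        for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in suspicious_categories
    )
    return suspicious / len(visible) >= threshold


if __name__ == "__main__":
    print(looks_garbled("\u25b2\u25b3\u25b2\u25b3 \u25b2\u25b2"))
    print(looks_garbled("This is a normal sentence."))
```

Files flagged this way could then be set aside for a different treatment (for example OCR) instead of being indexed with unreadable content. The threshold would likely need tuning against real documents.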
Do you have any idea how we could solve this issue?