I'm assuming your files are not machine readable (scans or images, say). If they were, you could scrape the text from them directly.
Have you tried pytesseract? https://pypi.org/project/pytesseract/
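
In case it helps, here is a minimal sketch of what the pytesseract route could look like. It assumes your documents are image files (PNG/JPG), that the Tesseract binary is installed alongside the `pytesseract` and `Pillow` packages, and the helper name `ocr_image` is just something I made up for illustration. The `ocr` parameter is only there so you can swap in another engine or a stub.

```python
from pathlib import Path
from typing import Callable, Optional


def _tesseract_ocr(path: Path) -> str:
    """OCR one image with Tesseract (assumes pytesseract + Pillow are installed)."""
    # Imported lazily so this module still loads on machines without OCR set up.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def ocr_image(path: Path, ocr: Optional[Callable[[Path], str]] = None) -> str:
    """Return the recognised text for one file, with surrounding whitespace removed.

    `ocr` is a hypothetical hook: any function that maps a path to text.
    It defaults to the Tesseract-based helper above.
    """
    engine = ocr if ocr is not None else _tesseract_ocr
    return engine(path).strip()
```

You would call it as `ocr_image(Path("scan_001.png"))`, where the file name is a placeholder. OCR quality depends a lot on scan resolution, so it is worth checking a few outputs by hand first.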
Would it be an option to add a step where you first batch convert the documents to .md, and only then extract the data and load it into Excel?
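
To make the two-step idea concrete, here is a rough sketch under several assumptions of mine: `to_text` is whatever function turns one document into text (OCR or otherwise), the fields you want appear as `Key: value` lines, and a CSV file is an acceptable stand-in for the spreadsheet since Excel opens it directly. All function names here are hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def convert_to_markdown(src_dir: Path, md_dir: Path,
                        to_text: Callable[[Path], str]) -> List[Path]:
    """Step 1: write one .md file per source document and return the new paths."""
    md_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        out = md_dir / (src.stem + ".md")
        out.write_text(to_text(src), encoding="utf-8")
        written.append(out)
    return written


def extract_fields(md_path: Path) -> Dict[str, str]:
    """Step 2: collect 'Key: value' lines from one .md file (assumed layout)."""
    row = {"file": md_path.name}
    for line in md_path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            row[key.strip()] = value.strip()
    return row


def write_csv(rows: List[Dict[str, str]], csv_path: Path) -> None:
    """Step 3: write all rows to a CSV file; missing fields are left blank."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

The point of the intermediate .md files is that you can open them and see exactly what the conversion produced before anything reaches the spreadsheet, and you can rerun the extraction without repeating the slow conversion step.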