I use a combo of claude and openai. You can upload PDFs directly to Claude api and it will extract desired info, but it can't do structured outputs. So I tell Claude to give json about the contents of the pdf (the best it can), then i pass the response to openai to convert Claude's response into actual json. Takes two api calls instead of one, but the upside is no first converting PDFs to images (no quick and easy way to do this in Python or nodejs), no extracting only text or only image (Claude can interpret both), and it's fast enough to run in a serverless lambda without the need for lambda layers or special binaries, just first upload the file to s3, then convert to base64 before passing to Claude. Works wonders