I am facing a similar issue extracting text from PDFs with complex layouts. The PDFs are not large; each is about 2 to 3 pages.
Thanks to @Davide Fiocco, I was able to find a better solution for my project.
However, I have a few follow-up questions:
The reason I need to use cURL is that I must develop this project in pure JavaScript, without npm packages such as pdfjs-dist, canvas, or openai.
Currently, I am attempting to convert PDFs to images using the PDF.co API and then send the images to OpenAI endpoints using fetch. However, I would prefer a solution that doesn’t require conversion to images. Again, the PDF layout is quite complex.
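For context, the image-to-OpenAI step can be sketched roughly as below, using only the built-in fetch and no npm packages. This is a minimal sketch and not my exact code: the helper names `buildVisionRequest` and `extractTextFromImage`, the `gpt-4o` model name, and the prompt text are assumptions for illustration, and it assumes the image produced by the conversion step is reachable at a URL.

```javascript
// Hypothetical helper: builds the fetch arguments for one page image.
// The model name and default prompt are assumptions; adjust to your setup.
function buildVisionRequest(
  imageUrl,
  apiKey,
  prompt = "Extract all text from this page, preserving reading order."
) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o", // assumed vision-capable model
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              // imageUrl is assumed to be the URL of a converted page image
              { type: "image_url", image_url: { url: imageUrl } },
            ],
          },
        ],
      }),
    },
  };
}

// Hypothetical wrapper: sends one image and returns the model's text reply.
async function extractTextFromImage(imageUrl, apiKey) {
  const { url, options } = buildVisionRequest(imageUrl, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(
      `OpenAI request failed: ${response.status} ${await response.text()}`
    );
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Calling `extractTextFromImage(pageImageUrl, apiKey)` once per page and joining the results gives the text for a 2 to 3 page document. Keeping the request construction in its own function makes it easy to inspect the payload without making a network call.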