Yes, Genspark. It's very good at parsing HTML. For PDFs and images, though, it gets very expensive, so I prefer to extract the text with poppler and tesseract first and then feed that to the LLM.
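For anyone who wants to try the same thing, here is a rough sketch of that extract-first approach. It assumes poppler's `pdftotext` and the `tesseract` CLI are installed and on your PATH. The function names, the prompt wording, and the `max_chars` cap are all my own placeholders, not anything Genspark-specific, and the actual LLM call is left out since it depends on your provider.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf_path: str) -> list[str]:
    # "-layout" keeps the rough page layout; "-" sends the text to stdout.
    return ["pdftotext", "-layout", pdf_path, "-"]


def tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    # Using "stdout" as the output base makes tesseract print the OCR text.
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(path: str) -> str:
    # Route PDFs to poppler and everything else (png, jpg, tiff) to tesseract.
    is_pdf = Path(path).suffix.lower() == ".pdf"
    cmd = pdftotext_cmd(path) if is_pdf else tesseract_cmd(path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def build_prompt(extracted_text: str, max_chars: int = 8000) -> str:
    # Hypothetical prompt; max_chars is an arbitrary cap to keep token cost down.
    snippet = extracted_text.strip()[:max_chars]
    return "Extract the key fields from this document text:\n\n" + snippet


# Example (hypothetical file name), then send the prompt to your LLM of choice:
# prompt = build_prompt(extract_text("invoice.pdf"))
```

One caveat: `pdftotext` only returns the embedded text layer, so a scanned PDF will come back mostly empty. In that case you would likely rasterize the pages first (poppler's `pdftoppm` can do that) and run tesseract on the resulting images.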