79427186

Date: 2025-02-10 13:21:00
Score: 1

Training your own model for this would be very hard and probably goes beyond the scope of your internship. You would need a large amount of training data, because the results depend on many factors. One idea I had while reading your question, though, is to use a large LLM such as ChatGPT or Google Gemini and provide the full webpage you want to scrape in the prompt. These LLMs can return structured output (for example JSON), so you can describe the structure of the desired output and what each field should be filled with, based on the information on the webpage.

Here is a link from Google showing how to make Gemini return JSON: https://ai.google.dev/gemini-api/docs/structured-output?hl=de&lang=python
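
To make the idea a bit more concrete, here is a rough sketch of the prompt-building and response-checking side in Python. It is only an illustration under some assumptions: the field names in `FIELDS` are made up, and `call_llm` is a hypothetical function you would write yourself to send a prompt to whichever model you pick and return its text reply. I deliberately left the real API call out, since it depends on the provider and SDK version.

```python
import json
from typing import Callable

# Hypothetical target structure: field name -> what the model should put there.
FIELDS = {
    "title": "the product name as shown on the page",
    "price": "the price as a number, without a currency symbol",
    "in_stock": "true if the page says the item is available, otherwise false",
}


def build_prompt(page_html: str, fields: dict[str, str]) -> str:
    """Describe the desired JSON structure and append the full webpage."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in fields.items())
    return (
        "Extract the following fields from the webpage below.\n"
        "Reply with a single JSON object containing exactly these keys:\n"
        f"{field_lines}\n"
        "Use null for any field that is not present on the page.\n\n"
        "WEBPAGE:\n"
        f"{page_html}"
    )


def parse_response(raw: str, fields: dict[str, str]) -> dict:
    """Parse the model's reply and check that every expected key is present."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(fields) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Keep only the requested keys, in the order they were defined.
    return {name: data[name] for name in fields}


def scrape(page_html: str, call_llm: Callable[[str], str],
           fields: dict[str, str] = FIELDS) -> dict:
    """call_llm is an assumption: any function mapping a prompt to reply text."""
    return parse_response(call_llm(build_prompt(page_html, fields)), fields)
```

You would call it like `scrape(html, call_llm=my_gemini_wrapper)`, where `my_gemini_wrapper` is your own small function around the provider's SDK. Checking the reply yourself is worth it even when the API offers a JSON mode, because the model can still leave out a field or return something that does not match the page.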

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Carl Philip