If you're scraping a website via HTTP requests to later feed the content into an LLM, I highly recommend using the Jina Reader API.
It's completely free, and it outputs the scraped data as LLM-friendly text instead of messy HTML.
In your HTTP request, you just add "https://r.jina.ai/" before the URL you want to scrape. For example, to scrape https://example.com you would request https://r.jina.ai/https://example.com.
Example video where I test it by scraping a bookstore website...
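
If you'd rather do it from code, here's a rough Python sketch of the same prefix trick using only the standard library. The function names, the User-Agent string, and the timeout are my own placeholders, not anything Jina prescribes.

```python
import urllib.request

# The Reader endpoint that gets prepended to the page URL.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader URL by putting the prefix in front of the page URL."""
    return JINA_READER_PREFIX + target_url


def fetch_llm_text(target_url: str, timeout: float = 30.0) -> str:
    """Request the page through the Reader and return the response body as text.

    Hypothetical helper: the User-Agent value and default timeout are
    assumptions made for this example.
    """
    request = urllib.request.Request(
        reader_url(target_url),
        headers={"User-Agent": "reader-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_llm_text("https://books.toscrape.com/")` should give you the page as plain text that you can drop into a prompt. That site is just a stand-in bookstore sandbox I picked for the example, not necessarily the one from the video.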