Vicente Silva

I’m working on a small project to create a research agent that can:

1. Crawl an entire website (including all subpages under the same domain).
2. Extract and save all the data into a single text file.
3. Download every attachment available on the site (PDFs, docs, etc.).

Later, I’ll feed all this collected data into an LLM-powered notebook for deep analysis and insights.

The idea is to make information gathering automatic and efficient, so I can focus on using the data instead of spending hours collecting it manually. If anyone has experience building similar agents or optimizing crawlers, I’d love to hear your tips and feedback!
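To make the "same domain" and "attachments" requirements concrete, here is a minimal Python sketch of just the link-sorting step, not a full crawler. The function name `classify_links` and the extension list are my own assumptions for illustration; the hrefs would come from whatever scraper you end up using.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Assumed list of attachment extensions; adjust it for the target site.
ATTACHMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".zip")


def classify_links(page_url, hrefs):
    """Split raw hrefs found on page_url into same-domain pages and attachments."""
    site_host = urlparse(page_url).netloc.lower()
    pages, attachments = set(), set()
    for href in hrefs:
        # Resolve relative links and drop "#fragment" so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if parts.netloc.lower() != site_host:
            continue  # stay on the same domain
        if parts.path.lower().endswith(ATTACHMENT_EXTENSIONS):
            attachments.add(absolute)
        else:
            pages.add(absolute)
    return sorted(pages), sorted(attachments)
```

The pages list would go back into the crawl queue, and the attachments list would go to the download step. Note that this treats `www.example.com` and `example.com` as different hosts, which may or may not be what you want.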


Vicente Silva

2 likes • 18d

For the web crawling and scraping part, you can use the HTTP Request node configured for the Firecrawl API or, more directly, the dedicated Firecrawl node within n8n. This will return the raw text content of each page. Firecrawl is also great at automatically identifying and extracting links to files like PDFs and other documents.

Firecrawl itself doesn't download the PDFs and attachments. However, you can use an HTTP Request node to download the binary files using the URLs that Firecrawl provides.

For data processing and storage, you first need to process the scraped information. I would recommend splitting it into smaller, manageable chunks. You should also filter out irrelevant noise from the web pages to improve the quality of your data. You'll then need an Embeddings node to convert these text chunks into numerical vectors before storing them in a vector database of your choice.

For the LLM integration (the Q&A engine), you'll use a few key nodes. Start with an Embeddings node to convert the user's question into a vector. This vector is then sent to your Vector Database node to perform a similarity search, which retrieves the most relevant text chunks from the corpus you built. These retrieved chunks are then passed to an AI Chat Model node along with the original question. This gives the LLM the context it needs to generate a specific and accurate answer.

An excellent example of this is the "Open Deep Research" workflow template available on n8n.io, which automates deep research by using AI-driven search, web scraping, content evaluation, and iterative refinement.
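If it helps to see the chunking and similarity-search steps outside of n8n, here is a rough Python sketch of the idea. Everything in it is an assumption for illustration: the function names are made up, and `toy_embed` is only a bag-of-words stand-in for a real embedding model (in n8n the Embeddings and Vector Database nodes do this work for you).

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    question_vector = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(question_vector, toy_embed(c)), reverse=True)
    return ranked[:k]
```

The chunks returned by `top_k` are what you would pass to the chat model together with the original question. The overlap between chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.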

Vicente Silva

18d •

▶️ YouTube Videos

Turn Any Product Into a Viral Ad With AI & n8n

This workflow turns average product photos into high-quality ads.

Tiago Lemos

Aug 21 •

👋 Introduction

Hey everyone, welcome to the community! 🎉

We're all here to master n8n and AI automation together. No question is too basic and no win is too small to share!

Start by introducing yourself:

- How long have you been learning n8n/AI automation?
- What made you interested in n8n/AI automation?
- Biggest win so far? (Or biggest frustration if you're just starting out)

Looking forward to seeing what you build!


Vicente Silva

1 like • 18d

@Claudia Dell'Acqua Have you been able to fix your issue? If not, shoot me a message and I'll be glad to help!

Vicente Silva

@vicente-silva-7519

N8N & AI solutions

Joined Aug 9, 2025
