Today I looked at the issue from earlier again to really understand what was going on.
Someone in the community gave me a simple and very helpful tip: just extract the text from the PDF and convert it into a .md file. For now this is the easiest solution.
I tried it and it worked. I pulled the text out of the PDF, saved it as a markdown file, uploaded it and Pinecone finally accepted it. The text was processed without any problems.
I still don’t fully understand why the original PDF didn’t work. But I will figure it out sooner or later.
In the meantime I started thinking about how to automate this. Maybe I can use an OCR step to extract text from any PDF and then pass that text to Pinecone. Mistral has OCR. Do you know any other good options I should look at?
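Here is a minimal sketch of what the non-OCR part of that automation could look like. It assumes the PDF has an embedded text layer and that the pypdf library is installed. The function names are my own placeholders. A scanned PDF would come back empty here and would still need a real OCR step, and the upload to Pinecone is left out.

```python
import re
from pathlib import Path


def extract_pages(pdf_path: str) -> list[str]:
    """Return the embedded text of each page (no OCR is performed)."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so the formatting helper below can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_markdown(pages: list[str], title: str) -> str:
    """Join page texts into one markdown document with a heading per page."""
    parts = [f"# {title}"]
    for number, text in enumerate(pages, start=1):
        # Collapse runs of spaces/tabs and drop blank lines left by extraction.
        lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
        body = "\n".join(line for line in lines if line)
        if body:  # pages with no text are skipped
            parts.append(f"## Page {number}\n\n{body}")
    return "\n\n".join(parts) + "\n"


def pdf_to_markdown_file(pdf_path: str) -> Path:
    """Write <name>.md next to the PDF and return the new path."""
    source = Path(pdf_path)
    target = source.with_suffix(".md")
    markdown = pages_to_markdown(extract_pages(pdf_path), source.stem)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Calling `pdf_to_markdown_file("report.pdf")` (a hypothetical file name) would write `report.md` next to it, which could then be uploaded the same way as the manual version.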
I also want to test Supabase later to compare how it works with a simple RAG setup.
Now the next step is to send everything into n8n and see what happens.
Let’s keep going. On to my next fight.