I need help with a PPT → RAG pipeline issue
Hi everyone,
I’ve built an AI automation workflow that converts PDF, DOC, and PPT files into text and stores them in Pinecone.
PDF and DOC files work correctly and generate good embeddings.
PowerPoint files convert to text successfully, but the resulting vectors are not stored properly and the embeddings are inaccurate.
I suspect the cause is the structure of the extracted PPT text, the chunking, or the preprocessing.
My questions:
  • What is the best way to structure PPT text (slide-wise, bullet-wise, or section-wise) before embedding?
  • Are there recommended chunk sizes or metadata formats for PPT files?
  • Has anyone built or seen a working RAG workflow for PowerPoint documents?
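To make the first two questions concrete, here is a minimal sketch of what slide-wise chunking with metadata could look like. It assumes the slide text has already been extracted (title, bullets, speaker notes). The `Slide` class, the `slide_to_chunks` helper, the 800-character limit, and the `id` / `text` / `metadata` record layout are all hypothetical choices for illustration, not something taken from my workflow or from the Pinecone API.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical container for text already extracted from one slide."""
    number: int
    title: str
    bullets: list[str] = field(default_factory=list)
    notes: str = ""


def slide_to_chunks(slide: Slide, source: str, max_chars: int = 800) -> list[dict]:
    """Turn one slide into one or more chunks, repeating the title in each."""
    header = f"Slide {slide.number}: {slide.title}".strip()
    lines = [f"- {b.strip()}" for b in slide.bullets if b.strip()]
    if slide.notes.strip():
        lines.append(f"Notes: {slide.notes.strip()}")

    # Greedily pack lines into groups of roughly max_chars characters.
    # A single line longer than max_chars still becomes its own group.
    groups, current, size = [], [], len(header)
    for line in lines:
        if current and size + len(line) + 1 > max_chars:
            groups.append(current)
            current, size = [], len(header)
        current.append(line)
        size += len(line) + 1
    if current or not groups:
        groups.append(current)  # a title-only slide yields one header-only chunk

    return [
        {
            "id": f"{source}#slide-{slide.number}-part-{part}",
            "text": "\n".join([header, *group]),
            "metadata": {
                "source": source,
                "slide": slide.number,
                "title": slide.title,
                "part": part,
            },
        }
        for part, group in enumerate(groups)
    ]


def chunk_deck(slides: list[Slide], source: str, max_chars: int = 800) -> list[dict]:
    """Chunk every slide in a deck, keeping slide order."""
    return [c for s in slides for c in slide_to_chunks(s, source, max_chars)]
```

The idea is that every chunk carries its slide title, so a short bullet is never embedded without context, and the slide number stays in the metadata so a retrieved chunk can be traced back to the deck. Whether this is better than bullet-wise or section-wise chunking is exactly what I am unsure about.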
Any guidance, examples, or references would be greatly appreciated.
Thanks in advance for your support 🙏
Kishan Shukla
AI Automation Society
skool.com/ai-automation-society