Need help with multi-page PDF parsing in n8n

Hey guys, I’d really appreciate any help with an n8n build I’m working on right now. It’s for my first Upwork client, and I’ve been stuck on it for 2-3 days. I’ve been trying to build it as simply and efficiently as possible, but the process keeps throwing new challenges at me, and it’s getting pretty overwhelming.

Let me explain what needs to be done real quick:

The client is looking for a build that -> triggers whenever a new document is uploaded to OneDrive (a multi-page PDF checklist in German, up to 100 pages) -> extracts the relevant info, which in this case is the checklist items (along with the dropdown triangles) -> places this info into Excel sheets (one question per row, all in a single column, no matrix layout for the document).

*If you check out the attached PDF, the checklist items are the numbered ones in the leftmost column (everything else is irrelevant).
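
To make the extraction step more concrete, here is a rough sketch of the idea in Python. It is not what my flow currently does, and it rests on assumptions I haven’t verified against the real PDF: that the text has already been extracted page by page, that only the left-column text is passed in, and that every checklist item starts a line with a number such as "1.", "2.3" or "2.3.1". The names `ITEM_START` and `extract_items` are made up for illustration.

```python
import re

# Assumption: a checklist item starts a line with a number such as
# "1", "1.", "2.3" or "2.3.1", followed by whitespace and the item text.
ITEM_START = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def extract_items(pages):
    """Return one string per numbered checklist item, in document order.

    `pages` is a list of strings, one per PDF page (already extracted).
    Lines that do not start with an item number are treated as a
    continuation of the previous item, including across page breaks.
    """
    rows = []
    current = None
    for page_text in pages:
        for raw_line in page_text.splitlines():
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            match = ITEM_START.match(line)
            if match:
                # A new numbered item begins: store the previous one first.
                if current is not None:
                    rows.append(current)
                current = f"{match.group(1)} {match.group(2)}"
            elif current is not None:
                # Continuation of a wrapped item.
                current += " " + line
            # Text before the first numbered item is ignored.
    if current is not None:
        rows.append(current)
    return rows


if __name__ == "__main__":
    sample_pages = [
        "Checkliste\n1. Ist die Anlage\nsicher?\n2. Sind die Filter",
        "sauber?\n2.1 Wurde das Protokoll unterschrieben?",
    ]
    for row in extract_items(sample_pages):
        print(row)
```

Each returned string would then become one row in a single Excel column. A known weak spot: a wrapped line that happens to begin with a number (for example "10 Tage ...") would be mistaken for a new item, so the pattern would likely need tightening once it is tested on the real document.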

This is what I’ve come up with so far:

Loom: https://www.loom.com/share/8e545f2604344cbaa892d37906792953?sid=e2c913cb-2bac-41f7-b44c-55e09fc44b67

I attached: 1) The example PDF from which the questions need to be extracted 2) An image of the n8n flow 3) The JSON blueprint of the flow, so feel free to check it out if you have time.

I feel like somebody with experience parsing multi-page PDFs would have a good idea of how best to approach this.

Comment under the post or DM me if you have any tips for this.

Again, thank you if you’re reading this, and have a great day/night, everybody!
