Help Needed: Streaming Token-by-Token Response with Vertex AI streamQuery API
Hi all, I'm struggling to get token-by-token streaming from the Vertex AI `streamQuery` API (calling it with `?alt=sse`). Both my Python script (using `requests.post` with `stream=True`) and an equivalent curl command return the full response in one piece after a delay (~4.3 s) instead of streaming incrementally. I've also tried wiring it into a Next.js frontend, but the output still appears all at once.

Has anyone successfully implemented token-by-token streaming with this API? Working examples in Python, curl, or Next.js would be greatly appreciated!
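For reference, here's roughly what my Python attempt looks like. It's a minimal sketch: the project, location, engine ID, and token are placeholders, and the payload shape (`"input"`) is just what my setup sends, so treat those as assumptions rather than a canonical request:

```python
import requests

# Placeholders -- substitute your own values.
PROJECT = "my-project"
LOCATION = "us-central1"
ENGINE_ID = "1234567890"
TOKEN = "..."  # e.g. from `gcloud auth print-access-token`

# Assumed Reasoning Engine streamQuery endpoint with SSE enabled.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{LOCATION}/reasoningEngines/{ENGINE_ID}:streamQuery?alt=sse"
)

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# Payload shape is illustrative; adjust to match your engine's query schema.
payload = {"input": {"input": "Tell me a short story."}}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    # iter_lines should yield each SSE line as it arrives over the wire --
    # in my case everything shows up at once after ~4.3 s instead.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line, flush=True)
```

If the server really is flushing SSE events incrementally, I'd expect the `print` calls above to fire one event at a time rather than all together, which is what makes me suspect buffering somewhere in the chain.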