Cloud Bagtas • 9d (edited) • 💡 Help
Efficient Parallel LLM Execution in FastAPI?
In our pipeline, a FastAPI endpoint receives the request. Within a single request, several agents can run: two from the Orchestrator (planning and combining), four from the ExcelAgent, one from the Researcher (which handles RAG and web search), and one from the Image Generator. That means up to 8 LLM calls in total per request. Right now I'm using concurrent.futures to run the agents in parallel, and everything runs on Celery. Is this the best approach, or is there a more standard/efficient way to handle this concurrency?
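Since LLM calls are I/O-bound rather than CPU-bound, a common alternative to thread pools here is asyncio with an async client (e.g. `AsyncOpenAI` or `httpx`), fanning out the independent calls with `asyncio.gather`. Below is a minimal sketch of that pattern; the agent names and `call_llm` stub are hypothetical stand-ins for your real agents, and it assumes the Orchestrator's planning call must run before the fan-out and its combining call after:

```python
import asyncio

# Hypothetical stand-in for a real agent call; in practice each agent would
# await an async LLM client (e.g. AsyncOpenAI) instead of sleeping.
async def call_llm(agent: str, prompt: str) -> str:
    await asyncio.sleep(0.5)  # simulated network-bound LLM latency
    return f"{agent}: result for {prompt!r}"

async def handle_request(prompt: str) -> dict:
    # Planning call runs first because the fan-out depends on its output.
    plan = await call_llm("orchestrator/plan", prompt)

    # The six independent agent calls fan out concurrently on one event
    # loop; no threads are needed because the work is I/O-bound.
    tasks = [
        *(call_llm(f"excel/{i}", plan) for i in range(4)),
        call_llm("researcher", plan),
        call_llm("image-generator", plan),
    ]
    results = await asyncio.gather(*tasks)

    # Final combining call consumes the fan-out results (8 LLM calls total).
    combined = await call_llm("orchestrator/combine", " | ".join(results))
    return {"plan": plan, "results": results, "combined": combined}

if __name__ == "__main__":
    print(asyncio.run(handle_request("quarterly sales report")))
```

In a FastAPI endpoint you would just `await handle_request(...)`; if the work is handed to a synchronous Celery task instead, the task can drive the same coroutines with `asyncio.run(...)`, which should be fine in a standard prefork worker since no event loop is already running there.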