One query is never enough: why top RAG systems search three times

Source: DEV Community
LangChain has MultiQueryRetriever. LlamaIndex has SubQuestionQueryEngine. Every serious RAG framework decomposes user questions into multiple search queries before hitting the vector database. Why? Because a single embedding compresses your entire question into one point in vector space. And one point can only land in one neighborhood.

Take this question: "How do I fix a slow database connection in my Flask app?" Three concepts, three clusters in embedding space:

- Database connections: pooling, timeouts, driver configuration
- Flask-specific patterns: SQLAlchemy setup, app factory patterns, teardown handling
- Performance diagnostics: profiling, query logging, bottleneck identification

Embed the full question, and the resulting vector lands in the "Flask + database" neighborhood. The performance diagnostics cluster is invisible. You get back five results about Flask and database setup, zero about profiling or bottleneck identification. This is not about relationships between entities (th
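The decomposition pattern itself is small enough to sketch. Here is a minimal, self-contained illustration: split the question into one sub-query per concept, search once per sub-query, then merge and dedupe the hits. The `embed`, `similarity`, and `search` functions below are toy stand-ins (keyword sets and Jaccard overlap instead of real vectors and cosine similarity), and the corpus and sub-queries are invented for this example; in a real system you would call an embedding model and a vector store, or use a framework component like MultiQueryRetriever.

```python
def embed(text: str) -> set:
    """Toy 'embedding': a lowercase keyword set (stand-in for a vector)."""
    return {w.strip("?.,").lower() for w in text.split()}

def similarity(a: set, b: set) -> float:
    """Jaccard overlap: stand-in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Three documents, one per concept cluster from the example question.
CORPUS = [
    "Configure connection pooling and timeouts for database drivers",
    "SQLAlchemy setup with the Flask app factory pattern",
    "Profiling slow queries: query logging and bottleneck identification",
]

def search(query: str, k: int = 1) -> list[str]:
    """Return the top-k corpus documents by toy similarity."""
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

def multi_query_search(sub_queries: list[str]) -> list[str]:
    """Search once per sub-query; merge hits in order, dropping duplicates."""
    seen, merged = set(), []
    for sq in sub_queries:
        for doc in search(sq):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# A single embedded query lands in one neighborhood and misses the
# performance-diagnostics document entirely:
single = search("How do I fix a slow database connection in my Flask app?", k=1)

# Three sub-queries, one per concept, cover all three clusters:
subs = [
    "database connection pooling timeouts",
    "Flask SQLAlchemy app factory",
    "profiling query logging bottleneck",
]
hits = multi_query_search(subs)
```

With this toy corpus, the single-query search returns only the connection-pooling document, while the multi-query merge surfaces all three, including the profiling one that the single embedding never reaches. The dedupe step matters in practice because sub-queries often overlap and retrieve the same chunk twice.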