When you ask a model to pull a relevant paragraph, the slow part is often the vector lookup, not the inference itself.

An embedding is just a list of numbers that captures meaning; the same technique can turn words, whole documents or images into comparable points in space.
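To make "comparable points in space" concrete: the usual way to compare two embeddings is cosine similarity, the cosine of the angle between them. Below is a minimal sketch in plain Python. The three-dimensional vectors `cat`, `kitten` and `invoice` are made-up illustrative values, not output from any real model, which would typically emit hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings": similar meanings point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(round(cosine_similarity(cat, kitten), 3))
print(round(cosine_similarity(cat, invoice), 3))
```

The related pair scores close to 1 and the unrelated pair close to 0. Vector databases rank results by this kind of score, or by a dot product or Euclidean distance.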

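If you compare a query against a million of those points one by one, every query pays for a million comparisons. As the collection grows, latency grows with it until a single lookup takes seconds or more, which defeats the purpose of real‑time AI.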
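Here is a minimal sketch of what "one by one" means. It assumes unit-normalised vectors, so that a plain dot product equals cosine similarity. The corpus is random toy data, and `brute_force_top_k` is a hypothetical helper name. Every query touches every stored vector, so the work grows linearly with the size of the collection.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def brute_force_top_k(query, vectors, k):
    """Exact k-nearest-neighbour search: score *every* stored vector, keep the best k.

    Cost is O(n * d) per query, so doubling the collection doubles the work.
    Assumes unit-normalised vectors, so the dot product equals cosine similarity.
    """
    scores = ((dot(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scores)]


random.seed(0)
dim = 64
corpus = [normalise([random.gauss(0, 1) for _ in range(dim)]) for _ in range(10_000)]
query = corpus[42]  # querying with a stored vector: it should be its own nearest neighbour
print(brute_force_top_k(query, corpus, k=3))
```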

Vector stores avoid that by building indexes such as HNSW or IVF, which turn a brute‑force scan into a millisecond‑scale approximate nearest‑neighbor query that gives up a little recall in exchange for speed.
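The idea behind IVF (inverted file) fits in a few dozen lines. Vectors are bucketed by their nearest centroid, and at query time only the closest buckets are scanned. This toy version is an illustration under simplifying assumptions, not how Faiss or any other library implements it. In particular, it picks centroids by random sampling instead of training them with k-means. `ToyIVFIndex` is a hypothetical name, and `nlist` and `nprobe` are borrowed from Faiss's terminology.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


class ToyIVFIndex:
    """IVF sketch: bucket vectors by nearest centroid, scan only `nprobe` buckets per query.

    Simplification: centroids are a random sample of the data, not k-means output.
    Assumes unit-normalised vectors, so the highest dot product = the nearest centroid.
    """

    def __init__(self, vectors, nlist, seed=0):
        self.vectors = vectors
        rng = random.Random(seed)
        self.centroids = rng.sample(vectors, nlist)
        self.lists = [[] for _ in range(nlist)]  # one inverted list of ids per centroid
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v, n):
        scores = [(dot(v, c), j) for j, c in enumerate(self.centroids)]
        return [j for _, j in heapq.nlargest(n, scores)]

    def search(self, query, k, nprobe):
        """Return (top-k ids, number of vectors actually scanned)."""
        candidates = [i for j in self._nearest_centroids(query, nprobe) for i in self.lists[j]]
        scored = ((dot(query, self.vectors[i]), i) for i in candidates)
        return [i for _, i in heapq.nlargest(k, scored)], len(candidates)


random.seed(1)
corpus = [normalise([random.gauss(0, 1) for _ in range(32)]) for _ in range(2_000)]
index = ToyIVFIndex(corpus, nlist=40)
hits, scanned = index.search(corpus[7], k=5, nprobe=3)
print(hits[0], f"scanned {scanned} of {len(corpus)} vectors")
```

Raising `nprobe` scans more buckets, which improves recall at the cost of latency. With `nprobe` equal to `nlist`, the search degenerates into an exact scan.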

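Those indexes come with tuning knobs, and the choices matter. HNSW builds a layered proximity graph rather than trees. The number of links per node (M) and the candidate-list sizes used at build and query time (efConstruction, efSearch) trade recall against memory, indexing time and latency. Annoy, by contrast, builds a forest of random-projection trees: adding trees improves accuracy at the cost of a larger index and a longer build. In libraries like Faiss and Annoy, these parameters can change query performance dramatically, and with the right settings a typical query takes around 10 milliseconds.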
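Whichever library you use, a common way to pick these parameters is to measure. Run a sample of queries through both the approximate index and an exact scan, compare the results as recall@k, and weigh that against measured latency. Below is a minimal helper. The result lists are hypothetical stand-ins for real index output.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the approximate index also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries (one id list per query)."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


# Hypothetical ids for two queries: exact scan vs. an ANN index with some parameter setting.
exact = [[3, 8, 1, 9, 4], [7, 2, 5, 0, 6]]
approx = [[3, 8, 9, 1, 11], [7, 5, 2, 12, 13]]
print(mean_recall(approx, exact, k=5))
```

Repeating this for several settings (more trees, a larger efSearch, and so on) gives a recall-versus-latency curve. You can then pick the cheapest point that meets your accuracy target.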

By mid‑2023 the market offered roughly a dozen credible choices. They ranged from purpose‑built services like Pinecone, Weaviate, Milvus and Qdrant to vector search added to existing systems, such as Postgres (via the pgvector extension) and Elasticsearch.

I have seen projects choose their vector database primarily on scalability needs, for example 10 million vectors with 512 dimensions and a sub-second latency requirement. In those cases Pinecone and Qdrant were preferred because they handled that volume with low-latency queries, and Qdrant's filtering capabilities were particularly useful for certain use cases.
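Filtering here means restricting the similarity search to records whose metadata (Qdrant calls it a payload) matches some conditions. The sketch below shows only those semantics, using a plain scan over hypothetical `(id, vector, metadata)` tuples with exact-equality conditions. It is not Qdrant's API, and production engines combine the filter with their index instead of scanning every record.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query, records, k, where):
    """Return ids of the k most similar records whose metadata matches every condition.

    `records` is a list of (id, vector, metadata_dict) tuples; `where` maps
    metadata keys to required values (exact equality only in this sketch).
    """
    matching = (
        (dot(query, vec), rid)
        for rid, vec, meta in records
        if all(meta.get(key) == value for key, value in where.items())
    )
    return [rid for _, rid in heapq.nlargest(k, matching)]


records = [
    ("doc-1", [0.9, 0.1], {"lang": "en", "year": 2023}),
    ("doc-2", [0.8, 0.2], {"lang": "de", "year": 2023}),
    ("doc-3", [0.7, 0.3], {"lang": "en", "year": 2022}),
    ("doc-4", [0.1, 0.9], {"lang": "en", "year": 2023}),
]
print(filtered_search([1.0, 0.0], records, k=2, where={"lang": "en", "year": 2023}))
```

Note that doc-2 and doc-3 are closer to the query than doc-4 but are excluded by the filter. A database that filters only *after* an approximate top-k search can end up returning fewer than k results. That is why integrated filtering is a selling point.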

Each option makes different compromises: some excel at horizontal scalability, others bundle full‑text search, and the pricing models range from generous free tiers to enterprise contracts.

In my experience, cost can be a significant factor with large datasets. It ranges from a few hundred dollars per month for a small-scale setup to tens of thousands of dollars per month for a large-scale deployment. The choice should rest on a thorough evaluation of the trade-offs, including storage cost, query performance and maintenance effort. Tools like OpenSearch and Vespa offer a good balance between cost and performance.
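Storage is usually the first cost to estimate, and the arithmetic is simple: vectors × dimensions × bytes per component. Below is a sketch for the 10 million × 512-dimension example. The per-vector HNSW link overhead (2 × M neighbour ids of 4 bytes each, bottom graph layer only) is a rough assumption. Real indexes add upper graph layers, metadata and allocator overhead on top.

```python
def estimate_index_bytes(num_vectors, dims, bytes_per_component=4, hnsw_m=None):
    """Rough memory estimate for a vector index.

    Raw vectors: num_vectors * dims * bytes_per_component
    (4 bytes for float32, 1 byte for int8-quantised vectors).
    If hnsw_m is given, add an ASSUMED 2 * M neighbour links of 4 bytes per vector;
    upper graph layers, metadata and allocator overhead are ignored.
    """
    raw = num_vectors * dims * bytes_per_component
    links = num_vectors * 2 * hnsw_m * 4 if hnsw_m else 0
    return raw + links


GB = 10**9
print(estimate_index_bytes(10_000_000, 512) / GB)                         # float32 vectors only
print(estimate_index_bytes(10_000_000, 512, hnsw_m=16) / GB)              # plus assumed graph links
print(estimate_index_bytes(10_000_000, 512, bytes_per_component=1) / GB)  # int8 vectors
```

For float32 vectors, that comes to roughly 20.5 GB before any index structure. For an in-memory index, this figure is often what drives instance size and therefore the monthly bill. Quantisation cuts it sharply at some cost in accuracy.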

If you are just tinkering, an embedded engine may be enough; when you need sub‑second latency at billions of vectors you’ll gravitate toward a service built for that scale.

The choice also depends on the use case, such as image versus text search, and on the level of accuracy you need. Some databases ship specialized features: Weaviate, for example, offers image-search modules built on convolutional neural networks, while Milvus supports a wide range of data types and indexing algorithms.

The hard part isn’t storing the vectors; it’s stitching the embedding generator into your pipeline, keeping vectors fresh as data changes, and handling version bumps without breaking queries.
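One way to keep vectors fresh is to store two things next to each vector: a hash of the source text and the identifier of the model that embedded it. A periodic job can then compute which documents need re-embedding. Below is a minimal sketch. `EMBEDDING_MODEL_VERSION`, `StoredEmbedding` and `plan_reembedding` are hypothetical names, and the actual embedding and upsert calls are left out.

```python
import hashlib
from dataclasses import dataclass

EMBEDDING_MODEL_VERSION = "text-embedder-v2"  # hypothetical model identifier


@dataclass
class StoredEmbedding:
    content_hash: str   # hash of the text that was embedded
    model_version: str  # model that produced the vector
    vector: list[float]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembedding(documents: dict[str, str], stored: dict[str, StoredEmbedding],
                     model_version: str = EMBEDDING_MODEL_VERSION):
    """Return (ids to embed, ids to delete) for one sync pass.

    A document needs embedding if it is new, its text changed, or its stored vector
    came from a different model version (vectors from different models are generally
    not comparable). Stored vectors whose source document is gone should be deleted.
    """
    to_embed = sorted(
        doc_id for doc_id, text in documents.items()
        if doc_id not in stored
        or stored[doc_id].content_hash != content_hash(text)
        or stored[doc_id].model_version != model_version
    )
    to_delete = sorted(set(stored) - set(documents))
    return to_embed, to_delete


stored = {
    "a": StoredEmbedding(content_hash("hello"), EMBEDDING_MODEL_VERSION, [0.1, 0.2]),
    "b": StoredEmbedding(content_hash("old text"), EMBEDDING_MODEL_VERSION, [0.3, 0.4]),
    "c": StoredEmbedding(content_hash("same"), "text-embedder-v1", [0.5, 0.6]),
    "gone": StoredEmbedding(content_hash("deleted"), EMBEDDING_MODEL_VERSION, [0.7, 0.8]),
}
documents = {"a": "hello", "b": "new text", "c": "same", "d": "brand new"}
print(plan_reembedding(documents, stored))
```

After a model upgrade, this plan marks every document as stale, which prevents a version bump from silently mixing vectors from two models in one index. In practice, teams often build the new-version index alongside the old one and switch queries over once it is complete.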

In practice, then, the database choice is only one part of the equation. I have seen projects succeed because they focused on building a scalable, maintainable pipeline with automated testing and monitoring, rather than on picking a vector database. That kind of well‑engineered flow, one that keeps embeddings in sync with the models that consume them, matters far more to overall performance and reliability than which store sits underneath.