In Part 1 I simplified how a Vector DB works. Now, we’ll dive deeper.
How does a “chunk” of text get converted into numbers?
- First the words are converted into tokens.
- Next, the embedding model evaluates these tokens and generates positive or negative coordinate values for each of its internal dimensions. Note: it does NOT just look at words in isolation; it pays deep attention to the context and surroundings of each token.
- If the embedding model has 1024 dimensions, the whole chunk ends up as exactly 1 tuple of 1024 values. This is done by collapsing (pooling) the per-token tuples, commonly by averaging them, until a single tuple remains.
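To make the "collapsing" step concrete, here is a minimal sketch that assumes the model uses mean pooling (averaging each dimension across tokens). That is one common choice, not necessarily what any particular model does. The `mean_pool` helper and the token vectors are made up for illustration.

```python
def mean_pool(token_vectors):
    """Collapse per-token vectors into one chunk vector by averaging each dimension."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dims = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[d] for vec in token_vectors) / count for d in range(dims)]


# Hypothetical vectors for three tokens in a toy 3-dimensional space.
token_vectors = [
    [-1.0, -0.5, -1.0],
    [-0.8, -0.9, -1.0],
    [-0.9, -0.7, -1.0],
]

chunk_vector = mean_pool(token_vectors)
print(chunk_vector)  # one tuple with 3 values, no matter how many tokens went in
```

However many tokens the chunk contains, the result always has the same number of values as the model has dimensions.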
Let’s scale this down to understand it better. Assume that we use an embedding model with just 3 dimensions, and that these dimensions are:
- nature (-1) vs electronics (1)
- edible (-1) vs non-edible (1)
- farm (-1) vs retail (1)
For this fictitious example, let’s consider an input chunk like “I took a bite out of the crisp, juicy apple from the orchard.” This could lead to a 3-coordinate tuple like [-0.9, -0.7, -1.0], because the sentence is related to nature, talks about an edible item, and is related to farms.
Now, let’s consider another input chunk – “I bought a new Apple laptop with a crisp display.” This could lead to a different tuple like [1.0, 0.9, 1.0].
So, how does this actually help our users? Imagine someone types a prompt into your RAG application: “What are some good crunchy fruits I can pick and eat?” Here is where the final piece of the puzzle clicks into place: the user’s question is also converted into a tuple. The exact same embedding model processes the prompt. Because the question heavily features concepts like “fruits,” “eat,” and “pick,” its resulting coordinates will be strongly pulled toward nature, edibility, and agriculture. Let’s say the question’s tuple ends up at [-0.8, -0.9, -0.7].
Next, the Vector DB performs a Similarity Search. It doesn’t read the words to look for matches; it simply grabs a metaphorical tape measure and calculates the distance between the question’s coordinates and the coordinates of every chunk in the database.
- The Vector DB calculates that our first chunk (the orchard apple sentence) is very close to the question.
- It also calculates that the second chunk (the Apple laptop sentence) is on the complete opposite side of the vector space.
As a result, the database confidently retrieves the first chunk about the orchard and completely ignores the second chunk about the laptop. By converting both the documents and the search query into mathematical coordinates, we’ve successfully replaced primitive string-matching with true semantic (meaning) matching.
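The search above can be sketched in a few lines of Python using the three example tuples from this post. This sketch assumes cosine similarity as the "tape measure" (Euclidean distance is another common option) and scans every chunk by brute force, whereas production vector DBs typically use approximate indexes to avoid that. The function names `cosine_similarity` and `top_k` are illustrative, not any product's API.

```python
import math


def cosine_similarity(a, b):
    """Return a value near 1.0 for same direction, near -1.0 for opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=1):
    """Score every chunk against the query and return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# The two chunks and the question tuple from the example above.
chunks = [
    ("I took a bite out of the crisp, juicy apple from the orchard.", [-0.9, -0.7, -1.0]),
    ("I bought a new Apple laptop with a crisp display.", [1.0, 0.9, 1.0]),
]
question = [-0.8, -0.9, -0.7]

results = top_k(question, chunks, k=2)
for score, text in results:
    print(f"{score:+.2f}  {text}")
```

With these numbers the orchard sentence scores about +0.97 and the laptop sentence about -0.99, so asking for only the top result (`k=1`) returns the orchard chunk.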
Do you see the value of embeddings now? In most scenarios involving unstructured data, they are superior to naive string matching.
But this is not a silver bullet for all of our retrieval problems. Implementing RAG accurately in production raises several challenges:
- What is the right chunk size?
- Which is the most appropriate embedding model?
- How to handle visuals/images present in the documents?
- How to handle multi-column layout in documents?
- How many chunks to send to the LLM post retrieval?
- What if some chunks are relevant to the user’s query but are not getting retrieved?
- How to handle “updating” of documents in the vector DB?
- How to enforce RBAC on a common vector DB?
These are all good questions to ponder. There are several techniques to overcome these challenges and improve the overall efficacy of a RAG pipeline. But for now, I hope you have clearly understood how Vector DBs work. So the next time you use a RAG-bot at work to answer questions based on a corpus of information, you can reflect on how it actually works under the hood.
Keep Learning, Keep Excelling!


