And what leftover chicken can teach us about semantic search
Modern language models can generate incredible responses, but they don’t work in isolation. Behind the scenes, the best systems combine large models with fast, context-aware search powered by vector databases.
This article walks through what vector databases actually are, how they relate to how LLMs “think,” and why they’ve become critical to making AI systems reliable, explainable, and fast.
Let’s start with a simple example.
Use Case: A Recipe Assistant That Understands Intent
Say you’re building a cooking assistant. A user types:
“I’ve got leftover chicken and some spinach. What can I cook that’s quick?”
With a traditional keyword search, this is hard to match. Most recipes don’t contain the phrase “leftover chicken” or “quick + spinach.” They use adjacent ideas — “shredded poultry,” “greens,” or “weeknight meals.”
You need a way to search for meaning, not just words.
Before we continue, it’s important to understand vectors and vector space.
How Vector Space Actually Works
Or why “close” doesn’t always mean what you think
To understand how vector search works, it helps to start simple.
1D Space
In one dimension, imagine a number line. The number 5 is closer to 6 than it is to 20. That’s obvious — you can count the distance.
2D Space
Now move to two dimensions. Imagine each point is defined by (x, y) — like coordinates on a map. You can now say that (3, 4) is close to (4, 5), and far from (100, 200). You’re measuring distance across a plane.
This is still easy to visualise.
3D Space
Add a third axis — z — and you’ve got a cube. You can still picture this with your hands. Objects in a room. Locations on Earth.
Now imagine this pattern continues…
What About 768D?
When we talk about language model embeddings, we’re often talking about 768, 1024, or even 4096 dimensions. You can’t picture it anymore, but the math still works the same.
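To see that the math really doesn't change, here's a tiny sketch (assuming NumPy) applying the same distance function to 2D points and to 768-dimensional ones:

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two points, in any number of dimensions."""
    return float(np.linalg.norm(a - b))

# 2D: points on a map
print(euclidean_distance(np.array([3, 4]), np.array([4, 5])))  # ~1.41

# 768D: two random "embedding-sized" points -- same function, no changes
a = np.random.rand(768)
b = np.random.rand(768)
print(euclidean_distance(a, b))
```

The only thing that changes as dimensions grow is that you can no longer draw the picture.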
Each sentence, image, or input becomes a point in this space. When two points are close together, it means they express similar meaning — not in human terms like synonyms, but in the model’s internal representation of concepts.
The model might put “How’s the weather?” and “What’s it like outside?” very close to each other, even though the wording is different. That’s what lets the system reason about meaning, not just matching characters.
When you run a sentence like “I’ve got leftover chicken and spinach” through a model like OpenAI’s text-embedding-3-small, it returns a vector: a long list of numbers (1,536 of them, by default, for that model). These numbers represent the semantic meaning of the sentence in high-dimensional space.
You can think of each vector as a point in a massive conceptual landscape — where distance isn’t based on spelling or phrasing, but on meaning. The closer two points are, the more semantically similar they are.
This space is what allows AI to reason about unstructured data: not by comparing words directly, but by comparing what those words represent.
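To make that concrete, here's a rough sketch using the OpenAI Python SDK. The model name comes from the example above; the helper functions and the exact sentences are just for illustration:

```python
import numpy as np
from openai import OpenAI  # assumes the official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Turn a sentence into a point in high-dimensional space."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Close to 1.0 means very similar meaning; near 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("I've got leftover chicken and some spinach. What can I cook that's quick?")
recipe = embed("Spinach frittata with roast chicken chunks")
print(cosine_similarity(query, recipe))  # higher than for an unrelated sentence
```

No word in the query matches the recipe title exactly, yet the two points land near each other because the ideas overlap.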
What’s a Vector Database?
A vector database is built to store and search these high-dimensional points efficiently.
You feed in thousands (or millions) of items — product descriptions, images, support tickets, recipes — each one encoded as a vector. Then, when you get a new query, you encode it too, and the database quickly returns the most similar vectors.
It’s fast, scalable, and doesn’t rely on exact matches. Instead, it uses mathematical proximity — often called approximate nearest neighbor (ANN) search.
In our recipe example, this means your assistant can return:
- “Garlic chicken with wilted kale”
- “Wraps made from leftover grilled meat”
- “Spinach frittata with roast chicken chunks”
Even though none of these contain the exact words from the query, they’re close in meaning. That’s what makes the interaction feel natural — the assistant understands what the person meant, not just what they typed.
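Under the hood, that ranking step is just proximity math. Here's a brute-force sketch of the lookup, reusing the illustrative embed() helper from above; a real vector database does the same thing at scale with ANN indexes (HNSW, IVF and friends) instead of scanning every row:

```python
import numpy as np

# Assumes the embed() helper from the earlier sketch.
recipes = [
    "Garlic chicken with wilted kale",
    "Wraps made from leftover grilled meat",
    "Spinach frittata with roast chicken chunks",
    "Chocolate lava cake",
]
recipe_vectors = np.array([embed(r) for r in recipes])

def top_k(query: str, k: int = 3) -> list[tuple[str, float]]:
    """Rank stored items by cosine similarity to the query."""
    q = embed(query)
    # Normalise everything so a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = recipe_vectors / np.linalg.norm(recipe_vectors, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(scores)[::-1][:k]
    return [(recipes[i], float(scores[i])) for i in order]

for title, score in top_k("I've got leftover chicken and some spinach. What can I cook that's quick?"):
    print(f"{score:.2f}  {title}")
```

The chicken and spinach recipes should rank well above the lava cake, even though none of them repeat the user's words.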
Where LLMs Fit In
So far, we’ve just been retrieving results. But language models shine when you give them something to work with.
This is the retrieval-augmented generation (RAG) pattern:
- A user makes a request.
- The system encodes the request as a vector.
- A vector database retrieves the most relevant content.
- The LLM generates a response using only that context.
This makes the model more grounded, more accurate, and more efficient. You don’t need the model to “memorise” every possible recipe — you just give it the right ones when needed.
It also makes things more transparent. You can log what the model was looking at. You can audit the retrieved inputs. You can improve quality by fixing your retrieval layer, not retraining your model.
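Strung together, the whole loop is surprisingly small. The sketch below is a simplified illustration rather than a production pipeline: it reuses the embed() and top_k() helpers from earlier, and the chat model name is just a placeholder:

```python
# Assumes client, embed() and top_k() from the earlier sketches.

def answer(question: str) -> str:
    # 1. Retrieve: find the most relevant recipes via vector search.
    context = "\n".join(title for title, _ in top_k(question, k=3))

    # 2. Generate: ask the model to answer using only the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Answer using only the recipes provided. "
                        "If none fit, say so.\n\nRecipes:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("I've got leftover chicken and some spinach. What can I cook that's quick?"))
```

Because the retrieved context is an explicit variable, you can log it, inspect it, and improve it independently of the model.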
Why It Matters
Vector databases are more than just a backend tool. They’re a foundational part of how modern AI systems deliver usable, consistent results. Without them, large language models are just guessing based on a giant blob of training data.
When used well, vector search creates systems that are:
- Faster — You feed the model less, but more relevant, data.
- Safer — You control what the model sees, and when.
- More useful — You can map user intent to real knowledge, even if the phrasing is messy.
Whether you’re building a recipe assistant, a support chatbot, or a search engine for internal docs, this pattern keeps showing up.
And once you understand how it works, it’s hard to imagine building LLM-powered tools without it.
Where Alto Apto Fits In
This is the kind of work we do at Alto Apto.
We’ve helped teams move past surface-level tooling and into real, production-ready AI systems — the kind that combine language models, retrieval layers, vector infrastructure, and clear thinking around what actually needs to be built.
We don’t just plug in a model and hope for the best. We stay close to the problem, understand the edge cases, and design systems that hold up under pressure.
If you’re working on something in this space — or trying to get a clearer path forward — we’re happy to talk.
No pitch. No jargon. Just smart people solving hard problems properly.