Semantic search rests on one idea: turn text into numbers so that closeness in numbers means closeness in meaning. Those numbers are embeddings. Here’s the whole concept, end to end.
A vector per piece of text
An embedding model maps a string to a fixed-length list of numbers called a vector. The all-MiniLM-L6-v2 model, for example, produces 384 numbers per input, whatever the input's length. The model is trained so that texts with similar meaning land near each other in this 384-dimensional space, even when they share no words. “Feline companion” and “pet cat” end up close; “pet cat” and “tax law” end up far.
Distance = similarity
Once everything is a vector, “find relevant text” becomes “find nearby vectors.” The usual measure is cosine similarity: the cosine of the angle between two vectors. Same direction → cosine 1 (very similar); perpendicular → 0 (unrelated); opposite direction → -1. It measures orientation, not length, which is what you want: meaning is about direction in embedding space, not magnitude.
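To make the definition concrete, here is a minimal sketch of cosine similarity in plain Python. The function name cosine_similarity and the three toy vectors are invented for illustration; real embeddings have hundreds of dimensions and come from a model, not from hand-picked numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", hand-made for this example only.
pet_cat = [0.9, 0.1, 0.0]
feline_companion = [0.8, 0.2, 0.1]
tax_law = [0.0, 0.1, 0.9]

close = cosine_similarity(pet_cat, feline_companion)
far = cosine_similarity(pet_cat, tax_law)
```

With these made-up numbers, close comes out much larger than far, which mirrors the “pet cat” versus “tax law” intuition. Scaling either vector by a positive constant leaves the score unchanged, because only the angle matters.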
The L2 shortcut
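If you normalize every vector to length 1 (L2 normalization), cosine similarity reduces to a plain dot product: multiply matching components and sum them. The answer is the same, but the vector lengths no longer have to be recomputed for every comparison. That is why retrieval pipelines normalize up front, and why vector stores set up for cosine or inner-product search often expect unit-length vectors.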
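The equivalence can be checked directly. The sketch below, with the hypothetical helper names l2_normalize, dot, and cosine_full, computes cosine similarity the long way and then as a dot product of normalized vectors; the two results should agree up to floating-point rounding.

```python
import math


def l2_normalize(v):
    """Scale a vector so that its Euclidean (L2) length is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Multiply matching components and sum them."""
    return sum(x * y for x, y in zip(a, b))


def cosine_full(a, b):
    """Cosine similarity computed without normalizing first."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Arbitrary example vectors, chosen only to keep the arithmetic simple.
a = [3.0, 4.0]
b = [4.0, 3.0]

long_way = cosine_full(a, b)
shortcut = dot(l2_normalize(a), l2_normalize(b))
```

The saving comes from doing the normalization once per vector, at indexing time, instead of once per comparison at query time.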
Searching
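To answer a query, embed the query with the same model, then find the chunks whose vectors are closest. The model has to match because vectors from different models live in different spaces and cannot be meaningfully compared. The vector store does the lookup efficiently, typically with an index so it does not have to compare the query against every stored vector, and returns the chunks ranked by similarity. That ranked list, passages nearest in meaning to your question, is the search result.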
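As a rough illustration of what the store does, the sketch below ranks a handful of chunks by brute force. Everything here is an assumption for the example: the search function, the chunk ids, and the hand-made 3-dimensional vectors stand in for real model output, and a real vector store would use an index instead of scanning every entry.

```python
import math


def l2_normalize(v):
    """Scale a vector so that its Euclidean (L2) length is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def search(query_vec, chunk_vecs, top_k=3):
    """Return the top_k (chunk_id, score) pairs, most similar first."""
    q = l2_normalize(query_vec)
    scored = []
    for chunk_id, vec in chunk_vecs.items():
        # Dot product of unit vectors equals cosine similarity.
        score = sum(x * y for x, y in zip(q, l2_normalize(vec)))
        scored.append((chunk_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical chunk vectors; in practice these come from the embedding model.
chunks = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.7, 0.3, 0.1],
    "taxes": [0.0, 0.1, 0.9],
}

# Hypothetical query vector, produced by the same (imaginary) model.
query = [0.9, 0.1, 0.05]

results = search(query, chunks, top_k=2)
```

With these toy numbers the "cats" chunk ranks first and "dogs" second, while "taxes" falls outside the top two.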
Takeaway
Embeddings turn text into vectors where distance encodes meaning; cosine similarity measures that distance by angle; L2 normalization makes it a fast dot product. Get those three ideas and you understand how embedding-based semantic search and RAG systems actually find what they find.
