Embeddings get the attention, but chunking quietly decides what your retrieval system can find. Split text badly and even a great embedding model returns fragments. Three choices shape chunking: size, overlap, and where you cut.
Size
Chunk size is a trade-off. Too large, and each chunk covers several ideas, so its single vector is a blurry average: a query about one of those ideas matches it only weakly. Too small, and chunks lose the context that makes them meaningful, and you store far more of them. For prose, around 512 characters (a few sentences) is a common starting point: big enough to hold a complete thought, small enough to stay focused. Treat it as a default to tune, since the best size depends on your content and your embedding model.
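A quick count makes the storage side of the trade-off concrete. This is a minimal sketch assuming a naive fixed-width character splitter and a hypothetical 100,000-character document. `fixed_size_chunks` is an illustrative helper, not a library function, and the count says nothing about retrieval quality.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut `text` into consecutive, non-overlapping pieces of at most `size` characters."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [text[i:i + size] for i in range(0, len(text), size)]


document = "x" * 100_000  # stand-in for a hypothetical 100,000-character corpus

for size in (2048, 512, 128):
    print(size, len(fixed_size_chunks(document, size)))
# 2048 49
# 512 196
# 128 782
```

Quartering the chunk size roughly quadruples the number of vectors you embed and store.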
Overlap
Adjacent chunks should share a little text, say 10–15% of the chunk size. The reason is boundary blindness: a concept, named entity, or clause that falls exactly on a split would otherwise be fractured across two chunks and findable in neither. Overlap (e.g. 64 characters on a 512-character chunk, 12.5%) ensures that any span no longer than the overlap that straddles a cut appears whole in at least one chunk. Longer spans can still be split, which is one more reason to cut on natural boundaries.
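To isolate the effect of overlap, here is a minimal sketch using blind fixed-width windows where each window starts `size - overlap` characters after the previous one. The function name `overlapping_chunks`, the phrase, and the filler text are all made up for illustration; this is not a particular library's splitter.

```python
def overlapping_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-width windows of `size` chars; each starts `size - overlap` after the last."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 30-character phrase placed so it straddles the 512-character mark.
phrase = "Retrieval-Augmented Generation"
text = "a" * 500 + phrase + "b" * 470

print(any(phrase in c for c in overlapping_chunks(text, 512, 0)))   # False: split across the cut
print(any(phrase in c for c in overlapping_chunks(text, 512, 64)))  # True: whole in the second window
```

With 64 characters of overlap the second window covers characters 448 to 959, so the phrase at 500 to 529 lands inside it intact.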
Where you cut
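The worst splitter cuts every N characters blindly, slicing mid-sentence. A recursive splitter instead cuts on the highest-priority separator it finds and re-splits any piece that is still too large with the next one down: paragraph breaks first, then line breaks, then sentence ends, then spaces, and only as a last resort individual characters (the empty string at the end of the list):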
["\n\n", "\n", ". ", "! ", "? ", " ", ""]
Chunks then tend to land on natural boundaries, so each one is a coherent unit rather than a ragged fragment.
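The separator hierarchy above can be sketched as a small recursive function. This is a minimal illustration under stated assumptions, not the implementation of any particular library: `recursive_split` is a hypothetical name, chunk size is measured in characters, overlap is left out for brevity, and each separator stays attached to the text before it so no characters are lost.

```python
from typing import Sequence

# Separator hierarchy from the text: coarsest first, "" means "split into characters".
SEPARATORS = ("\n\n", "\n", ". ", "! ", "? ", " ", "")


def recursive_split(text: str, size: int = 512,
                    separators: Sequence[str] = SEPARATORS) -> list[str]:
    """Split `text` into chunks of at most `size` characters, cutting on the
    coarsest separator present and recursing only on pieces that are too big."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if len(text) <= size:
        return [text] if text else []

    # Choose the first separator that occurs in the text; fall back to characters.
    sep: str = ""
    finer: Sequence[str] = ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break

    if sep == "":
        pieces = list(text)  # last resort: one piece per character
    else:
        # Keep each separator attached to the piece before it, so no text is lost.
        parts = text.split(sep)
        pieces = [p + sep for p in parts[:-1]] + [parts[-1]]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > size:
            # Too big even on its own: flush what we have, then re-split it
            # with the finer separators further down the hierarchy.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, size, finer))
        elif len(current) + len(piece) <= size:
            current += piece  # greedily pack neighboring pieces into one chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph. It has two sentences.\n\nSecond paragraph is here."
print(recursive_split(sample, size=40))
# ['First paragraph. It has two sentences.\n\n', 'Second paragraph is here.']
```

Greedy packing keeps small neighboring pieces together, so chunks fill up to the size limit while still ending on a separator whenever one fits; callers may want to `strip()` the trailing whitespace before embedding.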
Takeaway
Chunking is three decisions: size (big enough for one complete thought, small enough to stay focused), overlap (~10–15%, so concepts on a boundary survive whole), and split point (a separator hierarchy, never blind offsets). Tune these before you blame the embedding model. Bad chunks sink good vectors.
