Vector Similarity Search Tutorial
Learn to implement semantic search using vector embeddings and HNSW indexing in this hands-on 25-minute tutorial.
Prerequisites
- Completed MATCH Basics Tutorial
- Python 3.9+ with pip installed
- Geode server running (geode serve)
- Basic understanding of embeddings (helpful but not required)
Tutorial Overview
Time: 25 minutes
Difficulty: Intermediate
Topics: Vector embeddings, HNSW indexing, similarity metrics, semantic search
By the end of this tutorial, you’ll be able to:
- Generate vector embeddings from text
- Store vectors in Geode
- Create HNSW indexes for fast similarity search
- Find semantically similar items
- Optimize vector search performance
What are Vector Embeddings?
Vector embeddings convert data (text, images, audio) into numerical arrays that capture semantic meaning. Similar items have vectors that are close in vector space.
Example:
"cat" → [0.2, 0.8, 0.1, ...]
"dog" → [0.3, 0.7, 0.2, ...] (close to "cat")
"car" → [0.9, 0.1, 0.8, ...] (far from "cat")
Step 1: Setup Environment
Install Dependencies
# Install Python client and embedding library
pip install geode-client sentence-transformers
# Or if using requirements file
cat > requirements.txt <<EOF
geode-client>=0.1.0
sentence-transformers>=2.2.0
numpy>=1.21.0
EOF
pip install -r requirements.txt
Import Libraries
import asyncio
from geode_client import Client
from sentence_transformers import SentenceTransformer
import numpy as np
# Create client (use an async connection for queries)
client = Client(host="localhost", port=3141)
# Load embedding model (downloads on first use, ~80MB)
model = SentenceTransformer('all-MiniLM-L6-v2')
All Python steps below assume you are inside an async context, for example:
async def main():
    async with client.connection() as conn:
        # Run the steps below with await conn.execute(...) / await conn.query(...)
        ...

asyncio.run(main())
Verify Setup
# Test embedding generation
text = "Hello, world!"
embedding = model.encode(text)
print(f"Text: {text}")
print(f"Embedding shape: {embedding.shape}")
print(f"Embedding (first 5 dims): {embedding[:5]}")
# Expected output:
# Text: Hello, world!
# Embedding shape: (384,)
# Embedding (first 5 dims): [0.123, -0.456, 0.789, ...]
Step 2: Create Sample Dataset
Create Movie Database
# Sample movies with descriptions
movies = [
{
"id": "mov_1",
"title": "The Matrix",
"description": "A computer hacker learns about the true nature of reality and his role in the war against its controllers.",
"genre": "Sci-Fi"
},
{
"id": "mov_2",
"title": "Inception",
"description": "A thief who steals corporate secrets through dream-sharing technology is given the task of planting an idea.",
"genre": "Sci-Fi"
},
{
"id": "mov_3",
"title": "The Shawshank Redemption",
"description": "Two imprisoned men bond over years, finding solace and eventual redemption through acts of common decency.",
"genre": "Drama"
},
{
"id": "mov_4",
"title": "The Dark Knight",
"description": "Batman faces the Joker, a criminal mastermind who wants to plunge Gotham City into anarchy.",
"genre": "Action"
},
{
"id": "mov_5",
"title": "Interstellar",
"description": "A team of explorers travel through a wormhole in space in an attempt to ensure humanity's survival.",
"genre": "Sci-Fi"
},
{
"id": "mov_6",
"title": "Forrest Gump",
"description": "The presidencies of Kennedy and Johnson unfold through the perspective of an Alabama man with an IQ of 75.",
"genre": "Drama"
}
]
# Generate embeddings
print("Generating embeddings...")
for movie in movies:
    # Create embedding from title + description
    text = f"{movie['title']}. {movie['description']}"
    movie['embedding'] = model.encode(text).tolist()
    print(f"✓ {movie['title']}")
Load into Geode
# Create graph
await conn.execute("CREATE GRAPH MovieSearch; USE MovieSearch;")
# Insert movies with embeddings
for movie in movies:
    await conn.execute("""
        CREATE (:Movie {
            id: $id,
            title: $title,
            description: $description,
            genre: $genre,
            embedding: $embedding
        })
    """, movie)
print(f"\n✓ Loaded {len(movies)} movies into Geode")
Step 3: Create HNSW Index
Understanding HNSW
HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search:
- Hierarchical: Multi-layer graph structure
- Navigable: Efficiently finds approximate neighbors
- Small World: Short paths between any two points
- Performance: O(log n) search time
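To see why a navigable graph makes search fast, here is a deliberately tiny, single-layer greedy search in plain Python. This is only a sketch of the search idea, not Geode's implementation; real HNSW adds multiple layers and a candidate list (ef_search) on top of this greedy walk:

```python
import math

# Toy dataset: 2-D points and a hand-built neighbor graph.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.1), 3: (3.0, 0.0), 4: (4.0, 0.2)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def greedy_search(query, start=0):
    """Hop to whichever neighbor is closer to the query; stop at a local minimum."""
    current = start
    while True:
        best = min(neighbors[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) < math.dist(points[current], query):
            current = best
        else:
            return current

print(greedy_search((3.9, 0.0)))  # walks 0 -> 1 -> 2 -> 3 -> 4, prints 4
```

Each hop discards most of the dataset, which is where the logarithmic search time comes from.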
Create Vector Index
-- Create HNSW index on embedding field
CREATE INDEX movie_embedding_idx ON Movie(embedding) USING vector;
In Python:
# Create vector index
await conn.execute("""
CREATE INDEX movie_embedding_idx ON Movie(embedding) USING vector
""")
print("✓ Created HNSW vector index")
Index Parameters (Optional)
# Advanced: Configure HNSW parameters
await conn.execute("""
CREATE INDEX movie_embedding_advanced_idx ON Movie(embedding) USING vector
WITH {
m: 16, -- Max connections per layer (default: 16)
ef_construction: 200, -- Size of dynamic candidate list (default: 200)
ef_search: 100 -- Search-time candidate list size (default: 100)
}
""")
Parameters:
- m: Higher = better recall, more memory (typical: 12-48)
- ef_construction: Higher = better index quality, slower build (typical: 100-400)
- ef_search: Higher = better accuracy, slower search (typical: 50-200)
Step 4: Similarity Search
Find Similar Movies
# Query: "space exploration adventure"
query_text = "space exploration adventure"
query_embedding = model.encode(query_text).tolist()
# Find similar movies using cosine distance
page, _ = await conn.query("""
MATCH (m:Movie)
WHERE vector_distance_cosine(m.embedding, $query_vec) < 0.5
RETURN m.title,
m.description,
vector_distance_cosine(m.embedding, $query_vec) AS distance
ORDER BY distance ASC
LIMIT 5
""", {'query_vec': query_embedding})
print(f"\nQuery: '{query_text}'")
print("\nMost similar movies:")
for i, row in enumerate(page.rows, 1):
    title = row["m.title"].raw_value
    distance = row["distance"].raw_value
    description = row["m.description"].raw_value
    print(f"{i}. {title} (distance: {distance:.3f})")
    print(f"   {description[:80]}...")
Expected output:
Query: 'space exploration adventure'
Most similar movies:
1. Interstellar (distance: 0.156)
A team of explorers travel through a wormhole in space in an attempt to ensure...
2. The Matrix (distance: 0.312)
A computer hacker learns about the true nature of reality and his role in the...
3. Inception (distance: 0.389)
A thief who steals corporate secrets through dream-sharing technology is given...
Understanding Distance Metrics
Cosine Distance
# Cosine distance: 1 - cosine_similarity
# Range: [0, 2], where 0 = identical, 2 = opposite
# Best for: Text embeddings (direction matters more than magnitude)
page, _ = await conn.query("""
MATCH (m:Movie)
RETURN m.title,
vector_distance_cosine(m.embedding, $query_vec) AS cosine_dist
ORDER BY cosine_dist ASC
LIMIT 5
""", {'query_vec': query_embedding})
L2 Distance (Euclidean)
# L2 distance: sqrt(sum((a-b)^2))
# Range: [0, ∞], where 0 = identical
# Best for: When magnitude matters (e.g., image embeddings)
page, _ = await conn.query("""
MATCH (m:Movie)
RETURN m.title,
vector_distance_l2(m.embedding, $query_vec) AS l2_dist
ORDER BY l2_dist ASC
LIMIT 5
""", {'query_vec': query_embedding})
Inner Product
# Inner product: sum(a * b)
# Range: [-∞, ∞], where higher = more similar
# Best for: When embeddings are normalized
page, _ = await conn.query("""
MATCH (m:Movie)
RETURN m.title,
vector_inner_product(m.embedding, $query_vec) AS similarity
ORDER BY similarity DESC
LIMIT 5
""", {'query_vec': query_embedding})
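The relationships between the three metrics are easy to verify in plain Python (illustrative implementations, not Geode's internals):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0.6, 0.8]  # unit length
b = [1.0, 0.0]  # unit length
c = [2.0, 0.0]  # same direction as b, twice the magnitude

# Cosine ignores magnitude: b and c are indistinguishable to it...
assert abs(cosine_distance(a, b) - cosine_distance(a, c)) < 1e-9
# ...but L2 does not: c is farther from a than b is.
assert l2_distance(a, c) > l2_distance(a, b)
# For unit vectors, inner product = cosine similarity = 1 - cosine distance.
assert abs(dot(a, b) - (1 - cosine_distance(a, b))) < 1e-9
```

This is why the choice of metric matters most when your vectors are not normalized; for unit-length embeddings all three produce the same ranking.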
Step 5: Hybrid Search
Combine Vector and Keyword Search
# Hybrid search: Vector similarity + keyword filter
query_text = "mind-bending reality"
query_embedding = model.encode(query_text).tolist()
page, _ = await conn.query("""
MATCH (m:Movie)
WHERE m.genre = 'Sci-Fi'
AND vector_distance_cosine(m.embedding, $query_vec) < 0.6
RETURN m.title,
m.genre,
vector_distance_cosine(m.embedding, $query_vec) AS distance
ORDER BY distance ASC
""", {'query_vec': query_embedding})
print(f"\nQuery: '{query_text}' (Genre: Sci-Fi)")
for row in page.rows:
    title = row["m.title"].raw_value
    distance = row["distance"].raw_value
    print(f"• {title} (distance: {distance:.3f})")
Weighted Combination
# Combine vector similarity with a rating score
# (assumes Movie nodes carry a numeric `rating` property on a 0-10 scale)
page, _ = await conn.query("""
MATCH (m:Movie)
WITH m,
vector_distance_cosine(m.embedding, $query_vec) AS vec_dist,
m.rating AS rating
RETURN m.title,
vec_dist,
rating,
(vec_dist * 0.7 + (1 - rating/10) * 0.3) AS combined_score
ORDER BY combined_score ASC
LIMIT 5
""", {'query_vec': query_embedding})
Step 6: Advanced Patterns
Batch Similarity Search
# Find similar items for multiple queries at once
queries = [
"space adventure",
"prison drama",
"superhero action"
]
for query_text in queries:
    query_emb = model.encode(query_text).tolist()
    page, _ = await conn.query("""
        MATCH (m:Movie)
        RETURN m.title,
               vector_distance_cosine(m.embedding, $query_vec) AS distance
        ORDER BY distance ASC
        LIMIT 3
    """, {'query_vec': query_emb})
    print(f"\nQuery: '{query_text}'")
    for row in page.rows:
        title = row["m.title"].raw_value
        distance = row["distance"].raw_value
        print(f"  • {title} ({distance:.3f})")
Find Items Similar to an Existing Item
# "More like this" - Find movies similar to The Matrix
page, _ = await conn.query("""
MATCH (ref:Movie {title: 'The Matrix'})
MATCH (similar:Movie)
WHERE similar <> ref
AND vector_distance_cosine(similar.embedding, ref.embedding) < 0.4
RETURN similar.title,
vector_distance_cosine(similar.embedding, ref.embedding) AS distance
ORDER BY distance ASC
LIMIT 5
""")
print("\nMovies similar to 'The Matrix':")
for row in page.rows:
    title = row["similar.title"].raw_value
    distance = row["distance"].raw_value
    print(f"• {title} (distance: {distance:.3f})")
Clustering Similar Items
# Use K-means clustering (external library)
from sklearn.cluster import KMeans
# Get all embeddings
page, _ = await conn.query("""
MATCH (m:Movie)
RETURN m.id, m.title, m.embedding
""")
embeddings = np.array([row["m.embedding"].raw_value for row in page.rows])
movie_ids = [row["m.id"].raw_value for row in page.rows]
titles = [row["m.title"].raw_value for row in page.rows]
# Cluster into 2 groups
kmeans = KMeans(n_clusters=2, random_state=42)
clusters = kmeans.fit_predict(embeddings)
# Store cluster assignments
for movie_id, cluster in zip(movie_ids, clusters):
    await conn.execute("""
        MATCH (m:Movie {id: $id})
        SET m.cluster = $cluster
    """, {'id': movie_id, 'cluster': int(cluster)})

# View clusters
for cluster_id in range(2):
    print(f"\nCluster {cluster_id}:")
    cluster_titles = [t for t, c in zip(titles, clusters) if c == cluster_id]
    for title in cluster_titles:
        print(f"  • {title}")
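If you prefer not to pull in scikit-learn, the core k-means loop is small enough to sketch in plain Python. This toy version works on scalars with k fixed at 2; in practice you would cluster the 384-dimensional embeddings:

```python
def two_means(values, iters=10):
    """Toy k-means (k=2) on scalars: assign each value to the nearest
    centroid, then move each centroid to its cluster's mean."""
    centroids = [min(values), max(values)]  # simple spread-out init for k=2
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            idx = min((0, 1), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Recompute means; keep the old centroid if a cluster went empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

print(two_means([1.0, 1.2, 0.9, 8.0, 8.3, 7.9]))  # two centroids near 1 and 8
```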
Step 7: Performance Optimization
Index Tuning
# Profile query performance
page, _ = await conn.query("""
PROFILE
MATCH (m:Movie)
WHERE vector_distance_cosine(m.embedding, $query_vec) < 0.5
RETURN m.title, vector_distance_cosine(m.embedding, $query_vec) AS dist
ORDER BY dist ASC
LIMIT 10
""", {'query_vec': query_embedding})
# Check if index is used
# Look for "IndexScan [movie_embedding_idx]" in profile output
Adjust HNSW Parameters
# Rebuild index with different parameters for better recall
await conn.execute("DROP INDEX movie_embedding_idx")
await conn.execute("""
CREATE INDEX movie_embedding_idx ON Movie(embedding) USING vector
WITH {
m: 32, -- More connections = better recall
ef_construction: 400, -- Higher quality index
ef_search: 200 -- More thorough search
}
""")
# Trade-off: Better accuracy, but slower search and more memory
Limit Search Space
# Pre-filter with fast index before vector search
page, _ = await conn.query("""
MATCH (m:Movie)
WHERE m.genre = 'Sci-Fi' -- Fast index lookup first
AND vector_distance_cosine(m.embedding, $query_vec) < 0.5
RETURN m.title
ORDER BY vector_distance_cosine(m.embedding, $query_vec) ASC
LIMIT 10
""", {'query_vec': query_embedding})
Step 8: Production Best Practices
Normalize Embeddings
# Normalize vectors for consistent comparisons
def normalize_embedding(embedding):
    """L2 normalization"""
    norm = np.linalg.norm(embedding)
    return (embedding / norm).tolist() if norm > 0 else embedding.tolist()
# Use normalized embeddings
text = "example text"
embedding = model.encode(text)
normalized_emb = normalize_embedding(embedding)
await conn.execute("""
CREATE (:Item {
text: $text,
embedding: $embedding
})
""", {'text': text, 'embedding': normalized_emb})
Handle Missing Embeddings
# Gracefully handle items without embeddings
page, _ = await conn.query("""
MATCH (m:Movie)
WHERE m.embedding IS NOT NULL
AND vector_distance_cosine(m.embedding, $query_vec) < 0.5
RETURN m.title
ORDER BY vector_distance_cosine(m.embedding, $query_vec) ASC
""", {'query_vec': query_embedding})
Batch Embedding Generation
# Generate embeddings in batches for efficiency
def batch_generate_embeddings(texts, batch_size=32):
    """Generate embeddings in batches"""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings)
    return embeddings
# Use for large datasets
texts = [f"{m['title']}. {m['description']}" for m in movies]
embeddings = batch_generate_embeddings(texts, batch_size=32)
for movie, embedding in zip(movies, embeddings):
    movie['embedding'] = embedding.tolist()
Complete Example: Product Recommendations
# E-commerce product recommendation system
# Sample products
products = [
{"id": "p1", "name": "Wireless Headphones", "desc": "Premium noise-cancelling wireless headphones with 30-hour battery", "price": 299},
{"id": "p2", "name": "Bluetooth Speaker", "desc": "Portable waterproof Bluetooth speaker with deep bass", "price": 89},
{"id": "p3", "name": "Smart Watch", "desc": "Fitness tracking smartwatch with heart rate monitor and GPS", "price": 399},
{"id": "p4", "name": "Running Shoes", "desc": "Lightweight running shoes with advanced cushioning technology", "price": 120},
{"id": "p5", "name": "Fitness Tracker", "desc": "Activity tracker with sleep monitoring and step counting", "price": 79},
]
# Generate and store embeddings
await conn.execute("CREATE GRAPH ProductRecommendations; USE ProductRecommendations;")
for product in products:
    text = f"{product['name']}. {product['desc']}"
    embedding = model.encode(text).tolist()
    await conn.execute("""
        CREATE (:Product {
            id: $id,
            name: $name,
            description: $desc,
            price: $price,
            embedding: $embedding
        })
    """, {**product, 'embedding': embedding})
# Create index
await conn.execute("CREATE INDEX product_emb_idx ON Product(embedding) USING vector")
# User query: "fitness gadget for running"
query = "fitness gadget for running"
query_emb = model.encode(query).tolist()
page, _ = await conn.query("""
MATCH (p:Product)
WHERE vector_distance_cosine(p.embedding, $query_vec) < 0.7
RETURN p.name,
p.price,
vector_distance_cosine(p.embedding, $query_vec) AS relevance
ORDER BY relevance ASC
LIMIT 3
""", {'query_vec': query_emb})
print(f"\nRecommendations for: '{query}'")
for i, row in enumerate(page.rows, 1):
    name = row["p.name"].raw_value
    price = row["p.price"].raw_value
    relevance = row["relevance"].raw_value
    print(f"{i}. {name} - ${price} (score: {1-relevance:.2f})")
# Expected output:
# Recommendations for: 'fitness gadget for running'
# 1. Fitness Tracker - $79 (score: 0.85)
# 2. Smart Watch - $399 (score: 0.78)
# 3. Running Shoes - $120 (score: 0.72)
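The score printed above is simply 1 - cosine_distance, folding the [0, 2] distance range into an intuitive "higher is better" number. This is a presentation choice in application code, not a Geode function:

```python
def relevance_score(cosine_distance):
    """Map cosine distance (0 = identical, 2 = opposite) to a score where
    1.0 = identical direction, 0.0 = orthogonal, -1.0 = opposite."""
    return 1 - cosine_distance

assert relevance_score(0.0) == 1.0
assert relevance_score(0.15) > relevance_score(0.28)  # smaller distance, higher score
```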
Troubleshooting
Index Not Being Used
Problem: Queries slow despite having vector index
Solutions:
# 1. Verify index exists
await conn.query("SHOW INDEXES ON Movie")
# 2. Check index is ready
# (Index builds asynchronously)
# 3. Use EXPLAIN to verify
await conn.query("""
EXPLAIN
MATCH (m:Movie)
WHERE vector_distance_cosine(m.embedding, $vec) < 0.5
RETURN m
""", {'vec': query_embedding})
# Look for "IndexScan [movie_embedding_idx]"
Out of Memory
Problem: Large embeddings cause memory issues
Solutions:
- Use smaller embedding models (384 dims vs 768 dims)
- Reduce the ef_construction and m parameters
- Index only frequently searched items
- Use dimensionality reduction (PCA)
Poor Recall
Problem: Missing relevant results
Solutions:
# Increase ef_search parameter
await conn.execute("""
CREATE INDEX better_recall_idx ON Movie(embedding) USING vector
WITH {ef_search: 300} -- Higher = better recall
""")
# Increase distance threshold
page, _ = await conn.query("""
MATCH (m:Movie)
WHERE vector_distance_cosine(m.embedding, $query_vec) < 0.8 -- More lenient
RETURN m.title
""", {'query_vec': query_embedding})
Next Steps
- Real-Time Analytics - Stream embeddings with CDC
- Graph Algorithms - Combine vector search with graph traversal
- Performance Tuning - Optimize large-scale vector search
- Data Types Reference - VectorF32 and VectorI32 types
Quick Reference
Distance Functions
-- Cosine distance (0-2, lower = more similar)
vector_distance_cosine(vec1, vec2)
-- L2 distance (Euclidean)
vector_distance_l2(vec1, vec2)
-- Inner product (higher = more similar)
vector_inner_product(vec1, vec2)
-- Manhattan distance (L1)
vector_distance_l1(vec1, vec2)
Index Commands
-- Create vector index
CREATE INDEX idx_name ON Label(property) USING vector;
-- Create with parameters
CREATE INDEX idx_name ON Label(property) USING vector
WITH {m: 16, ef_construction: 200, ef_search: 100};
-- Drop index
DROP INDEX idx_name;
-- Show indexes
SHOW INDEXES ON Label;
Python Helpers
# Normalize vector
def normalize(vec):
    return vec / np.linalg.norm(vec)

# Cosine similarity
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# L2 distance
def l2_dist(a, b):
    return np.linalg.norm(a - b)
Tutorial Complete! You now understand vector similarity search in Geode.