Index Types Reference
Geode provides six specialized index families optimized for different access patterns. This reference documents all index types, their configuration options, and performance characteristics.
Index Overview
| Index Type | Best For | Complexity | Storage Overhead |
|---|---|---|---|
| B-tree | Range queries, sorting | O(log N) | Low-Medium |
| Hash | Exact equality | O(1) | Low |
| Full-text | Text search with ranking | O(log N) | Medium-High |
| Spatial | Geographic queries | O(log N) | Medium |
| Vector (HNSW) | Similarity search | O(log N) | High |
| Composite | Multi-property queries | O(log N) | Medium |
B-tree Index
B-tree indexes are the default index type, providing balanced performance for range queries and equality lookups.
Syntax
CREATE INDEX [index_name] ON :Label(property) USING BTREE;
CREATE INDEX [index_name] ON :Label(property); -- BTREE is default
Configuration Options
CREATE INDEX person_age_idx ON :Person(age) USING BTREE
OPTIONS {
fill_factor: 0.9, -- Page fill factor (default: 0.9)
unique: false, -- Enforce uniqueness
nulls_distinct: true -- Treat NULLs as distinct values
};
| Option | Type | Default | Description |
|---|---|---|---|
fill_factor | Float | 0.9 | Page fill percentage (0.5-1.0) |
unique | Boolean | false | Enforce unique constraint |
nulls_distinct | Boolean | true | Whether NULL values are distinct |
Supported Operations
| Operation | Supported | Example |
|---|---|---|
Equality (=) | Yes | WHERE p.age = 30 |
Range (<, >, <=, >=) | Yes | WHERE p.age > 25 |
| BETWEEN | Yes | WHERE p.age BETWEEN 20 AND 40 |
| IS NULL | Yes | WHERE p.age IS NULL |
| STARTS WITH | Yes | WHERE p.name STARTS WITH 'Al' |
| ORDER BY | Yes | ORDER BY p.age |
| IN list | Yes | WHERE p.age IN [25, 30, 35] |
Performance Characteristics
Operation | Time Complexity | Notes
-------------------|-----------------|----------------------------------
Point lookup | O(log N) | Tree traversal
Range scan | O(log N + K) | K = result count
Insert | O(log N) | May trigger page split
Delete | O(log N) | May trigger page merge
Bulk load | O(N log N) | Sequential insert optimization
Examples
-- Range query with ordering
CREATE INDEX person_age_idx ON :Person(age) USING BTREE;
MATCH (p:Person)
WHERE p.age >= 25 AND p.age <= 65
RETURN p.name, p.age
ORDER BY p.age DESC;
-- Prefix search
CREATE INDEX person_name_idx ON :Person(name) USING BTREE;
MATCH (p:Person)
WHERE p.name STARTS WITH 'Alex'
RETURN p.name;
-- Unique constraint
CREATE INDEX person_email_unique ON :Person(email) USING BTREE
OPTIONS { unique: true };
Hash Index
Hash indexes provide O(1) lookup performance for exact equality queries.
Syntax
CREATE INDEX [index_name] ON :Label(property) USING HASH;
Configuration Options
CREATE INDEX user_id_idx ON :User(id) USING HASH
OPTIONS {
bucket_count: 1024, -- Initial hash buckets
load_factor: 0.75 -- Resize threshold
};
| Option | Type | Default | Description |
|---|---|---|---|
bucket_count | Integer | 1024 | Initial number of hash buckets |
load_factor | Float | 0.75 | Resize when load exceeds this |
Supported Operations
| Operation | Supported | Example |
|---|---|---|
Equality (=) | Yes | WHERE u.email = '[email protected]' |
| IN list | Yes | WHERE u.id IN [1, 2, 3] |
| IS NULL | Yes | WHERE u.phone IS NULL |
Range (<, >) | No | Use B-tree instead |
| ORDER BY | No | Use B-tree instead |
Performance Characteristics
Operation | Time Complexity | Notes
-------------------|-----------------|----------------------------------
Point lookup | O(1) average | O(N) worst case with collisions
Insert | O(1) average | May trigger rehash
Delete | O(1) average | Tombstone-based
Bulk load | O(N) | Linear time
Examples
-- High-cardinality equality lookup
CREATE INDEX user_email_idx ON :User(email) USING HASH;
MATCH (u:User)
WHERE u.email = 'alice@example.com'
RETURN u;
-- UUID lookups
CREATE INDEX session_id_idx ON :Session(id) USING HASH;
MATCH (s:Session)
WHERE s.id = '550e8400-e29b-41d4-a716-446655440000'
RETURN s;
When to Use Hash vs B-tree
| Scenario | Recommendation |
|---|---|
| Only equality queries | Hash (faster) |
| Need range queries | B-tree |
| Need sorting | B-tree |
| High-cardinality unique values | Hash |
| Low-cardinality (few distinct values) | B-tree |
Full-text Index
Full-text indexes enable text search with BM25 relevance ranking.
Syntax
CREATE INDEX [index_name] ON :Label(property) USING FULLTEXT;
Configuration Options
CREATE INDEX article_content_idx ON :Article(content) USING FULLTEXT
OPTIONS {
language: 'english', -- Stemming language
min_word_length: 3, -- Minimum word length
stop_words: 'default', -- Stop word list
k1: 1.2, -- BM25 term frequency saturation
b: 0.75 -- BM25 length normalization
};
| Option | Type | Default | Description |
|---|---|---|---|
language | String | ’english' | Stemming/stopwords language |
min_word_length | Integer | 2 | Minimum indexed word length |
stop_words | String | ‘default’ | Stop word list (‘default’, ’none’, or custom) |
k1 | Float | 1.2 | BM25 term frequency saturation |
b | Float | 0.75 | BM25 length normalization (0-1) |
Supported Operations
| Operation | Supported | Example |
|---|---|---|
fulltext_match() | Yes | WHERE fulltext_match(a.content, 'graph database') |
bm25_score() | Yes | RETURN bm25_score(a.content, 'query') AS score |
| Boolean operators | Yes | 'graph AND database', 'graph OR network' |
| Phrase search | Yes | '"graph database"' |
| Wildcard | Yes | 'graph*' |
| Fuzzy search | Yes | 'grph~' (edit distance 1) |
BM25 Scoring
The BM25 algorithm provides relevance ranking:
BM25(d, q) = SUM_i [ IDF(qi) * f(qi,d) * (k1+1) / (f(qi,d) + k1*(1-b+b*|d|/avgdl)) ]
Where:
- f(qi,d) = frequency of term qi in document d
- |d| = document length (word count)
- avgdl = average document length
- k1 = term frequency saturation (higher = more weight to frequency)
- b = length normalization (higher = penalize long documents more)
Examples
-- Basic full-text search
CREATE INDEX doc_content_idx ON :Document(content) USING FULLTEXT;
MATCH (d:Document)
WHERE fulltext_match(d.content, 'graph database performance')
RETURN d.title, bm25_score(d.content, 'graph database performance') AS relevance
ORDER BY relevance DESC
LIMIT 10;
-- Boolean operators
MATCH (d:Document)
WHERE fulltext_match(d.content, 'graph AND (database OR storage) NOT sql')
RETURN d.title;
-- Phrase search
MATCH (d:Document)
WHERE fulltext_match(d.content, '"property graph"')
RETURN d.title;
-- Fuzzy search (typo-tolerant)
MATCH (d:Document)
WHERE fulltext_match(d.content, 'databse~') -- Matches "database"
RETURN d.title;
Spatial Index
Spatial indexes (R-tree) enable efficient geographic queries.
Syntax
CREATE INDEX [index_name] ON :Label(property) USING SPATIAL;
Configuration Options
CREATE INDEX location_idx ON :Store(location) USING SPATIAL
OPTIONS {
srid: 4326, -- Spatial Reference ID (WGS84)
precision: 'high' -- Index precision
};
| Option | Type | Default | Description |
|---|---|---|---|
srid | Integer | 4326 | Spatial Reference System ID |
precision | String | ‘high’ | Index precision (’low’, ‘medium’, ‘high’) |
Supported Operations
| Operation | Supported | Example |
|---|---|---|
geo_distance() | Yes | WHERE geo_distance(s.location, point) < 5000 |
geo_within_box() | Yes | WHERE geo_within_box(s.location, sw, ne) |
geo_within_polygon() | Yes | WHERE geo_within_polygon(s.location, polygon) |
geo_contains() | Yes | WHERE geo_contains(region, point) |
geo_intersects() | Yes | WHERE geo_intersects(area1, area2) |
Performance Characteristics
Operation | Time Complexity | Notes
-------------------|-----------------|----------------------------------
Point-in-region | O(log N) | R-tree traversal
k-nearest | O(log N + K) | Priority queue search
Range query | O(log N + K) | Bounding box filter
Insert | O(log N) | May trigger split
Examples
-- Find stores within 5km
CREATE INDEX store_location_idx ON :Store(location) USING SPATIAL;
MATCH (s:Store)
WHERE geo_distance(s.location, latlon(40.7128, -74.0060)) < 5000
RETURN s.name, geo_distance(s.location, latlon(40.7128, -74.0060)) AS meters
ORDER BY meters
LIMIT 10;
-- Bounding box query
MATCH (s:Store)
WHERE geo_within_box(s.location,
latlon(40.70, -74.05), -- Southwest corner
latlon(40.75, -73.95)) -- Northeast corner
RETURN s.name;
-- Polygon containment
WITH polygon([
latlon(40.70, -74.05),
latlon(40.70, -73.95),
latlon(40.75, -73.95),
latlon(40.75, -74.05)
]) AS nyc_area
MATCH (s:Store)
WHERE geo_within_polygon(s.location, nyc_area)
RETURN s.name;
Vector Index (HNSW)
Vector indexes enable approximate nearest neighbor (ANN) search using HNSW (Hierarchical Navigable Small World) algorithm with SIMD acceleration.
Syntax
CREATE INDEX [index_name] ON :Label(property) USING VECTOR;
Configuration Options
CREATE INDEX embedding_idx ON :Document(embedding) USING VECTOR
OPTIONS {
dimensions: 512, -- Vector dimensions (required if not inferred)
metric: 'l2', -- Distance metric
m: 16, -- HNSW max connections per layer
ef_construction: 200, -- Build-time search width
ef_search: 100 -- Query-time search width
};
| Option | Type | Default | Description |
|---|---|---|---|
dimensions | Integer | (inferred) | Vector dimensions |
metric | String | ’l2’ | Distance metric: ’l2’, ‘cosine’, ‘dot’ |
m | Integer | 16 | Max connections per node per layer |
ef_construction | Integer | 200 | Construction search width (accuracy vs speed) |
ef_search | Integer | 100 | Query search width (accuracy vs speed) |
Distance Metrics
| Metric | Function | Best For |
|---|---|---|
l2 | vector_distance_l2() | Euclidean distance (general purpose) |
cosine | vector_distance_cosine() | Normalized vectors (text embeddings) |
dot | vector_distance_dot() | Inner product (maximum inner product search) |
Performance Characteristics
Operation | Time Complexity | Notes
-------------------|-----------------|----------------------------------
k-NN search | O(log N) | Approximate, SIMD-accelerated
Insert | O(log N) | Builds graph structure
Delete | O(log N) | Marks as deleted (lazy cleanup)
Build | O(N log N) | Initial index construction
HNSW Tuning Guide
| Parameter | Low Value | High Value |
|---|---|---|
m | Faster build, less memory | Better recall, more memory |
ef_construction | Faster build | Better index quality |
ef_search | Faster queries | Better recall |
Recommended settings by use case:
| Use Case | m | ef_construction | ef_search |
|---|---|---|---|
| Low latency | 8 | 100 | 50 |
| Balanced | 16 | 200 | 100 |
| High recall | 32 | 400 | 200 |
Examples
-- Create vector index
CREATE INDEX doc_embedding_idx ON :Document(embedding) USING VECTOR
OPTIONS { metric: 'cosine', m: 16, ef_search: 100 };
-- k-NN similarity search
MATCH (d:Document)
WITH d, vector_distance_cosine(d.embedding, $query_vector) AS distance
WHERE distance < 0.5
RETURN d.title, distance
ORDER BY distance
LIMIT 10;
-- Find similar documents to a given document
MATCH (source:Document {id: 123})
MATCH (d:Document)
WHERE d.id <> source.id
WITH d, vector_distance_cosine(d.embedding, source.embedding) AS similarity
RETURN d.title, similarity
ORDER BY similarity
LIMIT 5;
-- Hybrid search: vector + filter
MATCH (d:Document)
WHERE d.category = 'technical'
WITH d, vector_distance_l2(d.embedding, $query_vector) AS distance
RETURN d.title, distance
ORDER BY distance
LIMIT 10;
Composite Index
Composite indexes span multiple properties for complex query patterns.
Syntax
CREATE INDEX [index_name] ON :Label(prop1, prop2, ...) USING BTREE;
Configuration Options
Same as B-tree index.
Column Ordering
Important: Column order affects which queries can use the index.
-- Index on (lastname, firstname)
CREATE INDEX person_name_idx ON :Person(lastname, firstname);
-- Can use index:
WHERE p.lastname = 'Smith' -- Leading column
WHERE p.lastname = 'Smith' AND p.firstname = 'John' -- Both columns
-- Cannot use index efficiently:
WHERE p.firstname = 'John' -- Non-leading column only
Examples
-- Composite index for common query pattern
CREATE INDEX order_customer_date_idx ON :Order(customer_id, order_date);
-- Query using both columns (optimal)
MATCH (o:Order)
WHERE o.customer_id = 123 AND o.order_date >= date('2026-01-01')
RETURN o;
-- Query using leading column only (still uses index)
MATCH (o:Order)
WHERE o.customer_id = 123
RETURN o;
Specialized Indexes
Geode implements additional specialized indexes automatically for certain data types.
JSONB Inverted Index
Automatically created for JSONB properties to support path queries.
-- JSONB path queries use inverted index
MATCH (d:Document)
WHERE d.metadata->'author' = '"Alice"'
RETURN d;
MATCH (d:Document)
WHERE d.metadata->'tags'->0 = '"ml"'
RETURN d;
Interval Tree Index
Automatically used for temporal overlap queries.
-- Interval overlap queries
MATCH (e:Event)
WHERE e.duration OVERLAPS interval('[2026-01-01, 2026-01-31]')
RETURN e;
Patricia Trie Index
Automatically used for IP address prefix queries.
-- IP subnet containment
MATCH (s:Server)
WHERE subnet_contains('10.0.0.0/8'::subnet, s.ip)
RETURN s;
Index Management
Listing Indexes
-- Show all indexes
SHOW INDEXES;
-- Show indexes for a label
SHOW INDEXES ON :Person;
-- Show index details
SHOW INDEX person_email_idx;
Dropping Indexes
-- Drop by name
DROP INDEX person_email_idx;
-- Drop with IF EXISTS
DROP INDEX IF EXISTS person_email_idx;
Index Statistics
-- View index statistics
CALL db.index.stats('person_email_idx');
-- Update index statistics
CALL db.index.analyze('person_email_idx');
-- Rebuild index
CALL db.index.rebuild('person_email_idx');
Best Practices
1. Index Selection Guidelines
| Query Pattern | Recommended Index |
|---|---|
| Equality on unique values | Hash |
| Equality on non-unique values | B-tree or Hash |
| Range queries | B-tree |
| Text search | Full-text |
| Geographic queries | Spatial |
| Vector similarity | Vector (HNSW) |
| Multi-property queries | Composite B-tree |
2. Avoid Over-Indexing
- Each index adds write overhead (INSERT/UPDATE/DELETE)
- Each index consumes disk space
- Only index properties used in WHERE, ORDER BY, or JOIN conditions
3. Monitor Index Usage
-- Check if index is being used
EXPLAIN
MATCH (p:Person)
WHERE p.email = 'alice@example.com'
RETURN p;
-- Look for "uses index: <name>" in output
4. Consider Covering Indexes
Include all needed columns to avoid table lookups:
-- Composite index covers the query
CREATE INDEX person_name_email_idx ON :Person(name, email);
-- Query returns indexed columns only (no table lookup)
MATCH (p:Person)
WHERE p.name = 'Alice'
RETURN p.email;
Performance Comparison
| Index Type | Read | Write | Memory | Best Use Case |
|---|---|---|---|---|
| B-tree | Fast | Fast | Medium | General purpose |
| Hash | Fastest | Fast | Low | Unique lookups |
| Full-text | Fast | Slow | High | Text search |
| Spatial | Fast | Medium | Medium | Geographic |
| Vector | Fast | Slow | Very High | ML embeddings |
Related Documentation
- Indexing and Optimization - Query optimization guide
- Data Types Reference - Type-specific indexing
- Storage Engine Architecture - Index storage internals
- Performance Tuning - Optimization strategies
Last Updated: January 28, 2026 Geode Version: v0.1.3+ ISO GQL Conformance Profile: ISO/IEC 39075:2024 compliance (see conformance profile)