Index Strategy Guide
Indexes are the foundation of query performance in Geode. This guide helps you analyze your query patterns, choose appropriate index types, and maintain indexes effectively.
Index Fundamentals
What Indexes Do
Indexes create auxiliary data structures that allow Geode to find nodes quickly without scanning all nodes with a given label. Without an index, finding a user by email requires examining every User node:
Without index - O(n) scan:
MATCH (u:User)
WHERE u.email = "[email protected]"
RETURN u
// Scans all User nodes
With index - O(log n) lookup:
// After: CREATE INDEX user_email ON :User(email)
MATCH (u:User {email: "[email protected]"})
RETURN u
// Direct index lookup
Index Types in Geode
Geode supports several index types:
| Type | Use Case | Example |
|---|---|---|
| B-tree | Equality, range queries | CREATE INDEX ON :User(age) |
| Hash | Equality only (faster) | CREATE HASH INDEX ON :User(id) |
| Composite | Multi-property lookups | CREATE INDEX ON :User(country, city) |
| Full-text | Text search | CREATE FULLTEXT INDEX ON :Product(name, description) |
| Unique | Enforce uniqueness + index | CREATE CONSTRAINT ON :User(email) ASSERT UNIQUE |
Analyzing Query Patterns
Identify High-Frequency Queries
Before creating indexes, understand your query patterns:
- Log production queries to identify the most frequent
- Profile critical paths in your application
- Review slow query logs for optimization candidates
Example analysis:
Query Pattern Frequency Avg Time
-------------------------------------------- --------- --------
MATCH (u:User {email: ?}) 100k/day 45ms
MATCH (p:Product) WHERE p.price < ? 50k/day 120ms
MATCH (o:Order {status: ?}) 30k/day 80ms
MATCH (u:User)-[:PURCHASED]->(p:Product) 20k/day 15ms
Identify Index Candidates
From the analysis above, prioritize:
- High frequency + high latency: Product price queries
- High frequency: User email lookups
- Moderate frequency + moderate latency: Order status queries
Check Existing Index Usage
See which indexes exist and their usage:
// Show all indexes
SHOW INDEXES
// Output:
// | name | type | label | properties | state |
// |--------------|---------|---------|---------------|--------|
// | user_email | BTREE | User | [email] | ONLINE |
// | product_name | BTREE | Product | [name] | ONLINE |
Use EXPLAIN to verify index usage:
EXPLAIN
MATCH (u:User {email: "[email protected]"})
RETURN u
// Look for "NodeIndexSeek" instead of "NodeByLabelScan"
Choosing the Right Index Type
B-tree Indexes
The default and most versatile index type. Use for:
- Equality comparisons (
=) - Range queries (
<,>,<=,>=) - Prefix matching (
STARTS WITH) - Sorting (
ORDER BY)
// Create B-tree index
CREATE INDEX product_price ON :Product(price)
// Supports these queries:
MATCH (p:Product {price: 99.99}) RETURN p // Equality
MATCH (p:Product) WHERE p.price < 100 RETURN p // Range
MATCH (p:Product) WHERE p.price BETWEEN 50 AND 100 RETURN p // Range
Hash Indexes
Faster than B-tree for equality only. Use when:
- Only equality lookups needed
- No range queries on this property
- Maximum lookup performance required
// Create hash index
CREATE HASH INDEX user_id ON :User(id)
// Optimal for:
MATCH (u:User {id: "usr_12345"}) RETURN u
// NOT suitable for:
MATCH (u:User) WHERE u.id > "usr_10000" RETURN u // Won't use hash index
Composite Indexes
Index multiple properties together. Use when:
- Queries filter on multiple properties
- Specific property combinations are common
- Single-property indexes aren’t selective enough
// Create composite index
CREATE INDEX user_location ON :User(country, city)
// Optimal for:
MATCH (u:User {country: "USA", city: "NYC"}) RETURN u // Uses full index
MATCH (u:User {country: "USA"}) RETURN u // Uses index prefix
// NOT optimal for:
MATCH (u:User {city: "NYC"}) RETURN u // Can't use index (missing leading property)
Full-Text Indexes
For text search operations. Use when:
- Searching within text content
- Need CONTAINS or fuzzy matching
- Building search functionality
// Create full-text index
CREATE FULLTEXT INDEX product_search ON :Product(name, description)
// Query with full-text search
CALL db.index.fulltext.queryNodes("product_search", "wireless bluetooth")
YIELD node, score
RETURN node.name, score
ORDER BY score DESC
LIMIT 10
Unique Constraints
Enforce uniqueness while creating an implicit index:
// Create unique constraint (also creates index)
CREATE CONSTRAINT user_email_unique ON :User(email) ASSERT UNIQUE
// Automatically indexed and enforces uniqueness
MATCH (u:User {email: "[email protected]"}) RETURN u
Index Coverage and Selectivity
Understanding Selectivity
Selectivity measures how well an index narrows down results. High selectivity is better.
| Property | Unique Values | Total Nodes | Selectivity |
|---|---|---|---|
| 1,000,000 | 1,000,000 | 100% (excellent) | |
| country | 200 | 1,000,000 | 0.02% (poor) |
| status | 5 | 1,000,000 | 0.0005% (very poor) |
High selectivity (good index candidates):
- User IDs, emails, usernames
- Product SKUs, ISBNs
- Order numbers, transaction IDs
Low selectivity (poor index candidates alone):
- Boolean flags (active/inactive)
- Status fields (few values)
- Country codes
Improving Selectivity with Composite Indexes
Combine low-selectivity properties for better performance:
// Poor: Low selectivity on country alone
CREATE INDEX user_country ON :User(country)
// Better: Composite improves selectivity
CREATE INDEX user_country_city ON :User(country, city)
// Query benefits from higher combined selectivity
MATCH (u:User {country: "USA", city: "San Francisco"})
RETURN u
Index Coverage
An index “covers” a query when all needed data is in the index itself, avoiding node lookup:
// Covered query - index has all needed properties
// Index: CREATE INDEX ON :User(email, name)
MATCH (u:User {email: "[email protected]"})
RETURN u.name // Retrieved from index, no node access
// Non-covered query - requires node access
MATCH (u:User {email: "[email protected]"})
RETURN u.name, u.created_at // created_at not in index
Consider covering indexes for your most critical queries:
// Create covering index for common query
CREATE INDEX user_email_profile ON :User(email, name, avatar_url)
// This query is fully covered:
MATCH (u:User {email: $email})
RETURN u.name, u.avatar_url
Composite Index Ordering
Property Order Matters
In composite indexes, the order of properties affects which queries can use the index:
// Index: (country, city, zip)
CREATE INDEX user_geo ON :User(country, city, zip)
// Can use index (matches prefix):
MATCH (u:User {country: "USA"}) ... // Uses index
MATCH (u:User {country: "USA", city: "NYC"}) ... // Uses index
MATCH (u:User {country: "USA", city: "NYC", zip: "10001"}) ... // Uses index
// Cannot use index (skips prefix):
MATCH (u:User {city: "NYC"}) ... // Full scan
MATCH (u:User {zip: "10001"}) ... // Full scan
MATCH (u:User {country: "USA", zip: "10001"}) ... // Partial scan
Ordering Guidelines
- Most filtered first: Put the most commonly filtered property first
- High selectivity first: Put more selective properties earlier
- Equality before range: Equality predicates should precede range predicates
Example:
// Query pattern:
MATCH (o:Order)
WHERE o.status = "pending" AND o.created_at > $since
RETURN o
// Good index order (equality first, then range):
CREATE INDEX order_status_date ON :Order(status, created_at)
// Poor index order (range first):
CREATE INDEX order_date_status ON :Order(created_at, status) // Less efficient
Multiple Index Strategies
For different query patterns, you may need multiple indexes:
// Query 1: Filter by status, sort by date
MATCH (o:Order {status: "pending"})
RETURN o ORDER BY o.created_at DESC
// Query 2: Filter by customer, sort by date
MATCH (o:Order {customer_id: $id})
RETURN o ORDER BY o.created_at DESC
// Create indexes for both patterns:
CREATE INDEX order_status_date ON :Order(status, created_at)
CREATE INDEX order_customer_date ON :Order(customer_id, created_at)
Index Maintenance Overhead
Write Performance Impact
Every index adds overhead to write operations:
| Operation | Without Index | With 1 Index | With 3 Indexes |
|---|---|---|---|
| CREATE | 1x | 1.2x | 1.6x |
| SET | 1x | 1.1x (if indexed prop) | 1.3x |
| DELETE | 1x | 1.2x | 1.6x |
Memory Usage
Indexes consume memory. Monitor index memory:
// Check index sizes
CALL db.indexes() YIELD name, size
RETURN name, size
ORDER BY size DESC
Balancing Read vs Write Performance
Consider your workload characteristics:
Read-heavy workloads (90% reads, 10% writes):
- More indexes are acceptable
- Prioritize query performance
- Consider covering indexes
Write-heavy workloads (30% reads, 70% writes):
- Minimize indexes
- Only index critical query paths
- Avoid redundant indexes
Balanced workloads:
- Index selectively
- Monitor both read and write latencies
- Review index usage periodically
Monitoring Index Usage
Check Index Statistics
// View index usage statistics
CALL db.indexes() YIELD name, type, state, populationPercent, size, usageCount
RETURN name, type, state, populationPercent, size, usageCount
ORDER BY usageCount DESC
Identify Unused Indexes
Indexes with zero or low usage count may be candidates for removal:
// Find potentially unused indexes
CALL db.indexes() YIELD name, usageCount
WHERE usageCount < 100 // Threshold depends on your usage
RETURN name, usageCount
Profile Query Index Usage
PROFILE
MATCH (u:User {country: "USA", city: "NYC"})
RETURN u.name
// Check output for:
// - "NodeIndexSeek" = index used
// - "NodeByLabelScan" = index NOT used
When NOT to Index
Avoid Indexing
Low-selectivity properties:
// Poor: Boolean with ~50/50 distribution
CREATE INDEX user_active ON :User(active) // Only 2 values
// Usually better: Combine with other properties
CREATE INDEX user_active_created ON :User(active, created_at)
Rarely queried properties:
// Avoid: Internal metadata rarely searched
CREATE INDEX user_internal_notes ON :User(internal_notes)
Frequently updated properties:
// Avoid: Properties that change constantly
CREATE INDEX user_last_seen ON :User(last_seen_at) // Updates every request
// Better: Batch updates or use different strategy
Very large text properties:
// Avoid: Large text blobs
CREATE INDEX article_content ON :Article(content) // May be megabytes
// Better: Use full-text index for searchable content
CREATE FULLTEXT INDEX article_search ON :Article(title, summary)
Alternative Strategies
Denormalization instead of indexing:
// Instead of indexing deep traversals, denormalize
// Add commonly needed data directly to nodes
SET user.company_name = "Acme Corp" // Avoid JOIN/traverse
Materialized aggregates:
// Instead of counting on every query
// Maintain a count property
SET user.post_count = user.post_count + 1
Application-level caching:
# Cache frequently accessed, rarely changed data
@cache(ttl=3600)
def get_user_by_email(email):
return db.query("MATCH (u:User {email: $email}) RETURN u", email=email)
Index Creation Best Practices
Naming Conventions
Use descriptive, consistent names:
// Good: label_property pattern
CREATE INDEX user_email ON :User(email)
CREATE INDEX product_sku ON :Product(sku)
CREATE INDEX order_status_date ON :Order(status, created_at)
// Good: Include purpose for special indexes
CREATE INDEX user_search_name ON :User(normalized_name) // For search
CREATE INDEX user_unique_email ON :User(email) // Unique constraint
// Avoid: Cryptic names
CREATE INDEX idx1 ON :User(email)
CREATE INDEX ue ON :User(email)
Creating Indexes in Production
Create indexes asynchronously to avoid blocking:
// Create index (may block queries during build)
CREATE INDEX user_email ON :User(email)
// Monitor index build progress
SHOW INDEXES
WHERE name = "user_email"
// State will be "POPULATING" then "ONLINE"
For large databases, schedule index creation during low-traffic periods.
Testing Index Effectiveness
Before deploying new indexes:
- Test in staging with production-like data
- Profile before and after:
// Before index
PROFILE
MATCH (u:User {email: "[email protected]"})
RETURN u
// Note: NodeByLabelScan, rows examined, time
// Create index
CREATE INDEX user_email ON :User(email)
// After index
PROFILE
MATCH (u:User {email: "[email protected]"})
RETURN u
// Note: NodeIndexSeek, rows examined, time
- Monitor write performance for regression
Index Strategy Checklist
When designing your index strategy:
- Analyzed query patterns - Know your most frequent and critical queries
- Identified selective properties - Focus on high-cardinality fields
- Chosen appropriate types - B-tree, hash, composite, or full-text
- Ordered composite indexes correctly - Equality before range, selective first
- Considered write impact - Balance read performance with write overhead
- Planned for monitoring - Track index usage and query performance
- Documented decisions - Record why each index exists
- Scheduled reviews - Periodically audit index effectiveness
Common Index Patterns
User Lookup Pattern
// Primary lookup by ID (unique)
CREATE CONSTRAINT user_id_unique ON :User(id) ASSERT UNIQUE
// Login by email (unique)
CREATE CONSTRAINT user_email_unique ON :User(email) ASSERT UNIQUE
// Search by username (frequent lookups)
CREATE INDEX user_username ON :User(username)
// Filter by status and date
CREATE INDEX user_status_created ON :User(status, created_at)
E-commerce Pattern
// Product lookup
CREATE CONSTRAINT product_sku_unique ON :Product(sku) ASSERT UNIQUE
CREATE INDEX product_category_price ON :Product(category, price)
CREATE FULLTEXT INDEX product_search ON :Product(name, description)
// Order lookup
CREATE CONSTRAINT order_number_unique ON :Order(order_number) ASSERT UNIQUE
CREATE INDEX order_customer_date ON :Order(customer_id, created_at)
CREATE INDEX order_status_date ON :Order(status, created_at)
Social Network Pattern
// User lookup
CREATE CONSTRAINT user_handle_unique ON :User(handle) ASSERT UNIQUE
CREATE INDEX user_name ON :User(name)
// Post lookup
CREATE INDEX post_author_date ON :Post(author_id, created_at)
CREATE INDEX post_visibility_date ON :Post(visibility, created_at)
CREATE FULLTEXT INDEX post_content ON :Post(text)
Summary
Effective index strategy requires:
- Understanding your queries: Know what you search for and how often
- Choosing the right type: Match index type to query pattern
- Optimizing selectivity: Index high-cardinality properties
- Ordering composites correctly: Equality first, then range, most selective first
- Balancing trade-offs: Consider write overhead and memory usage
- Monitoring continuously: Track usage and remove unused indexes
Related Guides
- Query Performance Guide - Write efficient queries
- Query Profiling Guide - Analyze execution plans
- Graph Modeling Guide - Schema design for performance
Questions? Discuss indexing strategies in our forum .