Query Optimization
Master query optimization techniques to achieve maximum performance from Geode. This section covers indexing strategies, query analysis with EXPLAIN/PROFILE, materialized views, and performance tuning best practices.
Overview
Query performance is critical for production applications. Geode provides powerful optimization tools including a cost-based optimizer (CBO), six index types, EXPLAIN/PROFILE analysis, and materialized views. Understanding these tools and applying optimization techniques can improve query performance by orders of magnitude.
The query optimizer automatically selects optimal execution plans based on statistics, but understanding index selection, query patterns, and execution plans enables you to design schemas and write queries that maximize performance. Whether you’re optimizing a slow query or designing for performance from the start, this section provides the knowledge you need.
Key Optimization Strategies
Index Selection
Choose the right index type for your access patterns:
- B-tree: Range queries, sorting, ordered traversal
- Hash: Exact match lookups with O(1) average case
- HNSW: Vector similarity search (K-NN)
- R-tree: Geospatial queries (bounding box, radius)
- Full-text: Text search with BM25 ranking
- Patricia Trie: IP prefix matching (CIDR)
Query Analysis
Use EXPLAIN and PROFILE to understand query execution:
- EXPLAIN: Shows query plan without execution
- PROFILE: Executes query and shows timing for each step
- Statistics: Review cardinality estimates and index selection
Materialized Views
Cache expensive query results for fast repeated access:
- Immediate Refresh: Updates on data changes
- Deferred Refresh: Batch updates on schedule
- On-Demand Refresh: Manual refresh control
Query Patterns
Write queries that leverage indexes and minimize work:
- Start patterns with selective filters
- Use indexes for joins and lookups
- Avoid unnecessary traversals
- Leverage prepared statements
Topics in This Section
- EXPLAIN and PROFILE - Understand query execution plans, analyze performance bottlenecks, and interpret optimizer decisions
- Indexing and Optimization - Complete indexing guide covering all six index types, creation strategies, and optimization techniques
- Performance Tuning - Query performance tuning best practices including query rewriting, schema optimization, and system tuning
- Full-Text Search - Full-text search capabilities with BM25 ranking, tokenization, stemming, and search optimization
- Materialized Views - Materialized view patterns including creation, refresh strategies, and query rewriting
Query Performance Basics
Using EXPLAIN
Analyze query plans before execution:
-- Show query plan
EXPLAIN
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 30
RETURN p.name, friend.name;
Output shows:
- Execution plan steps
- Index selection
- Estimated cardinality
- Join strategies
Using PROFILE
Execute and measure query performance:
-- Execute with timing
PROFILE
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 30
RETURN p.name, friend.name;
Output includes:
- All EXPLAIN information
- Actual execution time per step
- Rows processed at each stage
- Cache hit ratios
See EXPLAIN and PROFILE for detailed usage.
Index Optimization
Creating Indexes
-- B-tree index for range queries
CREATE INDEX person_age_btree ON :Person(age);
-- Hash index for exact lookups
CREATE INDEX person_email_hash ON :Person(email) USING HASH;
-- HNSW index for vector search
CREATE INDEX product_embedding_hnsw
ON :Product(embedding)
USING HNSW(distance='cosine', m=16, ef_construction=200);
-- Full-text index
CREATE INDEX article_content_fulltext
ON :Article(content)
USING FULLTEXT;
Index Selection Guidelines
Use B-tree when:
- Range queries (age > 30, created_at BETWEEN …)
- Sorting (ORDER BY)
- Prefix matching (name STARTS WITH ‘A’)
Use Hash when:
- Exact equality (email = ‘[email protected] ’)
- High cardinality unique columns
- No range or sorting requirements
Use HNSW when:
- Vector similarity search
- Approximate K-NN queries
- Embedding-based retrieval
Use R-tree when:
- Geospatial queries
- Bounding box searches
- Radius searches
Use Full-text when:
- Text search queries
- Relevance ranking needed
- Natural language queries
Use Patricia Trie when:
- IP address queries
- CIDR prefix matching
- Longest Prefix Match (LPM)
See Indexing and Optimization for comprehensive guidance.
Query Tuning Techniques
1. Start with Selective Filters
-- Good: Start with selective filter
MATCH (p:Person {email: 'alice@example.com'})-[:KNOWS]->(friend)
RETURN friend.name;
-- Bad: Scan all people first
MATCH (p:Person)-[:KNOWS]->(friend:Person {email: 'alice@example.com'})
RETURN p.name;
2. Use Indexes for Joins
-- Ensure both sides of join have indexes
CREATE INDEX person_id ON :Person(id);
CREATE INDEX order_customer_id ON :Order(customer_id);
-- Query leverages both indexes
MATCH (p:Person {id: 123})<-[:PLACED_BY]-(o:Order)
RETURN o.order_id, o.total;
3. Limit Early
-- Good: LIMIT early reduces work
MATCH (p:Person)
WHERE p.age > 30
RETURN p.name
ORDER BY p.age DESC
LIMIT 10;
-- Better: Use WITH to limit before expensive operations
MATCH (p:Person)
WHERE p.age > 30
WITH p
ORDER BY p.age DESC
LIMIT 10
MATCH (p)-[:KNOWS]->(friend)
RETURN p.name, collect(friend.name) AS friends;
4. Avoid Cartesian Products
-- Bad: Cartesian product (all people × all orders)
MATCH (p:Person), (o:Order)
WHERE p.id = o.customer_id
RETURN p.name, o.order_id;
-- Good: Use relationship pattern
MATCH (p:Person)<-[:PLACED_BY]-(o:Order)
RETURN p.name, o.order_id;
5. Use Prepared Statements
-- Prepared statement (parsed once, executed many times)
PREPARE find_person AS
MATCH (p:Person {email: $email})
RETURN p.name, p.age;
-- Execute with parameters
EXECUTE find_person(email='alice@example.com');
See Performance Tuning for advanced techniques.
Materialized Views
Cache expensive query results:
-- Create materialized view
CREATE MATERIALIZED VIEW popular_products AS
MATCH (p:Product)<-[:PURCHASED]-(o:Order)
RETURN p.product_id, p.name, count(*) AS purchase_count
ORDER BY purchase_count DESC
LIMIT 100
REFRESH IMMEDIATE;
-- Query automatically uses materialized view
MATCH (p:Product)<-[:PURCHASED]-(o:Order)
RETURN p.product_id, p.name, count(*) AS purchase_count
ORDER BY purchase_count DESC
LIMIT 100;
Refresh strategies:
- IMMEDIATE: Updates on data changes (real-time)
- DEFERRED: Batch updates on schedule (periodic)
- ON DEMAND: Manual refresh (controlled)
See Materialized Views for details.
Full-Text Search
Optimize text search queries:
-- Create full-text index
CREATE INDEX article_fulltext ON :Article(title, content) USING FULLTEXT;
-- Text search with ranking
MATCH (a:Article)
WHERE a.content CONTAINS 'graph database'
RETURN a.title, score(a) AS relevance
ORDER BY relevance DESC
LIMIT 10;
Features:
- BM25 ranking algorithm
- Tokenization with stop words
- Stemming support
- Multi-field search
See Full-Text Search for comprehensive coverage.
Performance Monitoring
Track query performance over time:
-- Enable query timing
\timing on
-- View slow query log
SELECT query_text, execution_time
FROM system.slow_queries
WHERE execution_time > duration({seconds: 1})
ORDER BY execution_time DESC;
Monitor metrics:
- Query latency (p50, p95, p99)
- Query throughput (QPS)
- Error rates
- Cache hit ratios
See Observability for monitoring setup.
Best Practices
Schema Design
- Create indexes for frequent query patterns
- Use unique constraints to prevent duplicates
- Consider denormalization for performance-critical paths
- Model relationships for primary query patterns
Query Writing
- Start patterns with selective filters
- Use parameters instead of literals
- Avoid SELECT * patterns (return only needed properties)
- Use LIMIT to prevent runaway queries
Index Management
- Create indexes for high-cardinality columns
- Drop unused indexes (overhead on writes)
- Monitor index usage statistics
- Rebuild indexes periodically
Testing
- Profile queries before deploying to production
- Test with realistic data volumes
- Benchmark critical query paths
- Monitor performance regressions
Troubleshooting Slow Queries
Step 1: Identify Slow Queries
# Check slow query log
grep "slow query" /var/log/geode/geode.log
# Or query system tables
SELECT * FROM system.slow_queries ORDER BY execution_time DESC;
Step 2: Analyze with PROFILE
PROFILE <your slow query>;
Look for:
- Steps with high execution time
- Large row counts (potential full scans)
- Missing index usage
- Expensive joins
Step 3: Review Index Coverage
-- Check available indexes
SHOW INDEXES;
-- Check index usage
SELECT * FROM system.index_usage
WHERE table_name = 'Person';
Step 4: Optimize
Based on PROFILE analysis:
- Add missing indexes
- Rewrite query to use indexes
- Break complex query into steps
- Consider materialized views
See Troubleshooting Guide for more scenarios.
Learn More
- Architecture: Query Planning - How the query planner works
- GQL Guide - Query language reference
- Schema Design - Schema optimization patterns
- Performance Benchmarking - Benchmark methodology
Advanced Topics
- Graph Algorithms - Optimize algorithm execution
- Real-time Analytics - Streaming query optimization
- Vector Search Tutorial - Optimize vector queries
- Index Optimization Tutorial - Hands-on index tuning
Next Steps
- Learn EXPLAIN/PROFILE - Master query analysis
- Create Indexes - Optimize your schema
- Tune Queries - Apply optimization techniques
- Use Materialized Views - Cache expensive queries
- Monitor Performance - Track query metrics