Query Optimization

Master query optimization techniques to achieve maximum performance from Geode. This section covers indexing strategies, query analysis with EXPLAIN/PROFILE, materialized views, and performance tuning best practices.

Overview

Query performance is critical for production applications. Geode provides powerful optimization tools including a cost-based optimizer (CBO), six index types, EXPLAIN/PROFILE analysis, and materialized views. Understanding these tools and applying optimization techniques can improve query performance by orders of magnitude.

The query optimizer automatically selects optimal execution plans based on statistics, but understanding index selection, query patterns, and execution plans enables you to design schemas and write queries that maximize performance. Whether you’re optimizing a slow query or designing for performance from the start, this section provides the knowledge you need.

Key Optimization Strategies

Index Selection

Choose the right index type for your access patterns:

B-tree: Range queries, sorting, ordered traversal
Hash: Exact match lookups with O(1) average case
HNSW: Vector similarity search (K-NN)
R-tree: Geospatial queries (bounding box, radius)
Full-text: Text search with BM25 ranking
Patricia Trie: IP prefix matching (CIDR)

Query Analysis

Use EXPLAIN and PROFILE to understand query execution:

EXPLAIN: Shows query plan without execution
PROFILE: Executes query and shows timing for each step
Statistics: Review cardinality estimates and index selection

Materialized Views

Cache expensive query results for fast repeated access:

Immediate Refresh: Updates on data changes
Deferred Refresh: Batch updates on schedule
On-Demand Refresh: Manual refresh control

Query Patterns

Write queries that leverage indexes and minimize work:

Start patterns with selective filters
Use indexes for joins and lookups
Avoid unnecessary traversals
Leverage prepared statements

Topics in This Section

EXPLAIN and PROFILE - Understand query execution plans, analyze performance bottlenecks, and interpret optimizer decisions
Indexing and Optimization - Complete indexing guide covering all six index types, creation strategies, and optimization techniques
Performance Tuning - Query performance tuning best practices including query rewriting, schema optimization, and system tuning
Full-Text Search - Full-text search capabilities with BM25 ranking, tokenization, stemming, and search optimization
Materialized Views - Materialized view patterns including creation, refresh strategies, and query rewriting

Query Performance Basics

Using EXPLAIN

Analyze query plans before execution:

-- Show query plan
EXPLAIN
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 30
RETURN p.name, friend.name;

Output shows:

Execution plan steps
Index selection
Estimated cardinality
Join strategies

Using PROFILE

Execute and measure query performance:

-- Execute with timing
PROFILE
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 30
RETURN p.name, friend.name;

Output includes:

All EXPLAIN information
Actual execution time per step
Rows processed at each stage
Cache hit ratios

See EXPLAIN and PROFILE for detailed usage.

Index Optimization

Creating Indexes

-- B-tree index for range queries
CREATE INDEX person_age_btree ON :Person(age);

-- Hash index for exact lookups
CREATE INDEX person_email_hash ON :Person(email) USING HASH;

-- HNSW index for vector search
CREATE INDEX product_embedding_hnsw
ON :Product(embedding)
USING HNSW(distance='cosine', m=16, ef_construction=200);

-- Full-text index
CREATE INDEX article_content_fulltext
ON :Article(content)
USING FULLTEXT;

Index Selection Guidelines

Use B-tree when:

Range queries (age > 30, created_at BETWEEN …)
Sorting (ORDER BY)
Prefix matching (name STARTS WITH ‘A’)

Use Hash when:

Exact equality (email = ‘[email protected] ’)
High cardinality unique columns
No range or sorting requirements

Use HNSW when:

Vector similarity search
Approximate K-NN queries
Embedding-based retrieval

Use R-tree when:

Geospatial queries
Bounding box searches
Radius searches

Use Full-text when:

Text search queries
Relevance ranking needed
Natural language queries

Use Patricia Trie when:

IP address queries
CIDR prefix matching
Longest Prefix Match (LPM)

See Indexing and Optimization for comprehensive guidance.

Query Tuning Techniques

1. Start with Selective Filters

-- Good: Start with selective filter
MATCH (p:Person {email: 'alice@example.com'})-[:KNOWS]->(friend)
RETURN friend.name;

-- Bad: Scan all people first
MATCH (p:Person)-[:KNOWS]->(friend:Person {email: 'alice@example.com'})
RETURN p.name;

2. Use Indexes for Joins

-- Ensure both sides of join have indexes
CREATE INDEX person_id ON :Person(id);
CREATE INDEX order_customer_id ON :Order(customer_id);

-- Query leverages both indexes
MATCH (p:Person {id: 123})<-[:PLACED_BY]-(o:Order)
RETURN o.order_id, o.total;

3. Limit Early

-- Good: LIMIT early reduces work
MATCH (p:Person)
WHERE p.age > 30
RETURN p.name
ORDER BY p.age DESC
LIMIT 10;

-- Better: Use WITH to limit before expensive operations
MATCH (p:Person)
WHERE p.age > 30
WITH p
ORDER BY p.age DESC
LIMIT 10
MATCH (p)-[:KNOWS]->(friend)
RETURN p.name, collect(friend.name) AS friends;

4. Avoid Cartesian Products

-- Bad: Cartesian product (all people × all orders)
MATCH (p:Person), (o:Order)
WHERE p.id = o.customer_id
RETURN p.name, o.order_id;

-- Good: Use relationship pattern
MATCH (p:Person)<-[:PLACED_BY]-(o:Order)
RETURN p.name, o.order_id;

5. Use Prepared Statements

-- Prepared statement (parsed once, executed many times)
PREPARE find_person AS
MATCH (p:Person {email: $email})
RETURN p.name, p.age;

-- Execute with parameters
EXECUTE find_person(email='alice@example.com');

See Performance Tuning for advanced techniques.

Materialized Views

Cache expensive query results:

-- Create materialized view
CREATE MATERIALIZED VIEW popular_products AS
MATCH (p:Product)<-[:PURCHASED]-(o:Order)
RETURN p.product_id, p.name, count(*) AS purchase_count
ORDER BY purchase_count DESC
LIMIT 100
REFRESH IMMEDIATE;

-- Query automatically uses materialized view
MATCH (p:Product)<-[:PURCHASED]-(o:Order)
RETURN p.product_id, p.name, count(*) AS purchase_count
ORDER BY purchase_count DESC
LIMIT 100;

Refresh strategies:

IMMEDIATE: Updates on data changes (real-time)
DEFERRED: Batch updates on schedule (periodic)
ON DEMAND: Manual refresh (controlled)

See Materialized Views for details.

Full-Text Search

Optimize text search queries:

-- Create full-text index
CREATE INDEX article_fulltext ON :Article(title, content) USING FULLTEXT;

-- Text search with ranking
MATCH (a:Article)
WHERE a.content CONTAINS 'graph database'
RETURN a.title, score(a) AS relevance
ORDER BY relevance DESC
LIMIT 10;

Features:

BM25 ranking algorithm
Tokenization with stop words
Stemming support
Multi-field search

See Full-Text Search for comprehensive coverage.

Performance Monitoring

Track query performance over time:

-- Enable query timing
\timing on

-- View slow query log
SELECT query_text, execution_time
FROM system.slow_queries
WHERE execution_time > duration({seconds: 1})
ORDER BY execution_time DESC;

Monitor metrics:

Query latency (p50, p95, p99)
Query throughput (QPS)
Error rates
Cache hit ratios

See Observability for monitoring setup.

Best Practices

Schema Design

Create indexes for frequent query patterns
Use unique constraints to prevent duplicates
Consider denormalization for performance-critical paths
Model relationships for primary query patterns

Query Writing

Start patterns with selective filters
Use parameters instead of literals
Avoid SELECT * patterns (return only needed properties)
Use LIMIT to prevent runaway queries

Index Management

Create indexes for high-cardinality columns
Drop unused indexes (overhead on writes)
Monitor index usage statistics
Rebuild indexes periodically

Testing

Profile queries before deploying to production
Test with realistic data volumes
Benchmark critical query paths
Monitor performance regressions

Troubleshooting Slow Queries

Step 1: Identify Slow Queries

# Check slow query log
grep "slow query" /var/log/geode/geode.log

# Or query system tables
SELECT * FROM system.slow_queries ORDER BY execution_time DESC;

Step 2: Analyze with PROFILE

PROFILE <your slow query>;

Look for:

Steps with high execution time
Large row counts (potential full scans)
Missing index usage
Expensive joins

Step 3: Review Index Coverage

-- Check available indexes
SHOW INDEXES;

-- Check index usage
SELECT * FROM system.index_usage
WHERE table_name = 'Person';

Step 4: Optimize

Based on PROFILE analysis:

Add missing indexes
Rewrite query to use indexes
Break complex query into steps
Consider materialized views

See Troubleshooting Guide for more scenarios.

Learn More

Architecture: Query Planning - How the query planner works
GQL Guide - Query language reference
Schema Design - Schema optimization patterns
Performance Benchmarking - Benchmark methodology

Advanced Topics

Graph Algorithms - Optimize algorithm execution
Real-time Analytics - Streaming query optimization
Vector Search Tutorial - Optimize vector queries
Index Optimization Tutorial - Hands-on index tuning

Next Steps

Learn EXPLAIN/PROFILE - Master query analysis
Create Indexes - Optimize your schema
Tune Queries - Apply optimization techniques
Use Materialized Views - Cache expensive queries
Monitor Performance - Track query metrics