Query Profiling Guide
Understanding how Geode executes your queries is essential for optimization. This guide covers using EXPLAIN and PROFILE to analyze execution plans, identify bottlenecks, and systematically improve query performance.
EXPLAIN vs PROFILE
Geode provides two tools for query analysis:
| Tool | What It Does | When to Use |
|---|---|---|
| EXPLAIN | Shows the execution plan without running | Planning queries, understanding structure |
| PROFILE | Executes query and shows actual metrics | Measuring real performance, finding bottlenecks |
Using EXPLAIN
EXPLAIN shows the planned execution strategy:
EXPLAIN
MATCH (user:User {email: "[email protected]"})-[:POSTED]->(post:Post)
WHERE post.created_at > datetime() - duration('P7D')
RETURN post.title, post.created_at
ORDER BY post.created_at DESC
LIMIT 10
Output:
+---------------------------+
| Plan |
+---------------------------+
| Projection |
| | |
| +Sort |
| | |
| +Filter |
| | |
| +Expand(All) |
| | |
| +NodeIndexSeek |
+---------------------------+
Using PROFILE
PROFILE executes the query and collects metrics:
PROFILE
MATCH (user:User {email: "[email protected]"})-[:POSTED]->(post:Post)
WHERE post.created_at > datetime() - duration('P7D')
RETURN post.title, post.created_at
ORDER BY post.created_at DESC
LIMIT 10
Output:
+---------------------------+----------------+----------+------------+
| Operator | Rows | DB Hits | Time (ms) |
+---------------------------+----------------+----------+------------+
| ProduceResults | 10 | 0 | 0.1 |
| | | | | |
| +Top | 10 | 0 | 0.2 |
| | | | | |
| +Projection | 42 | 84 | 0.3 |
| | | | | |
| +Filter | 42 | 42 | 0.5 |
| | | | | |
| +Expand(All) | 156 | 157 | 1.2 |
| | | | | |
| +NodeIndexSeek | 1 | 2 | 0.1 |
+---------------------------+----------------+----------+------------+
Total database hits: 285
Total time: 2.4ms
Reading Execution Plans
Plan Structure
Execution plans read from bottom to top - data flows upward through operators:
ProduceResults <-- Final output (top)
|
Sort <-- Ordering
|
Filter <-- WHERE conditions
|
Expand <-- Relationship traversal
|
NodeIndexSeek <-- Starting point (bottom)
Common Operators
Data Access Operators
| Operator | Description | Performance |
|---|---|---|
| NodeIndexSeek | Uses index to find nodes | Fast - O(log n) |
| NodeByLabelScan | Scans all nodes with label | Slow - O(n) |
| NodeByIdSeek | Finds node by internal ID | Very fast - O(1) |
| AllNodesScan | Scans every node | Very slow - avoid |
Example - Good (index seek):
EXPLAIN
MATCH (u:User {email: "[email protected]"})
RETURN u
// Plan shows: NodeIndexSeek
Example - Bad (label scan):
EXPLAIN
MATCH (u:User)
WHERE u.signup_source = "organic"
RETURN u
// Plan shows: NodeByLabelScan + Filter
// Consider: CREATE INDEX user_signup ON :User(signup_source)
Traversal Operators
| Operator | Description | Performance |
|---|---|---|
| Expand(All) | Follow relationships | O(degree) |
| Expand(Into) | Check if relationship exists | O(degree) |
| VarLengthExpand | Variable-length paths | O(branching^depth) |
| ShortestPath | Find shortest path | Expensive - use with caution |
Example - Expand:
EXPLAIN
MATCH (u:User)-[:FOLLOWS]->(f:User)
RETURN f
// Plan shows: Expand(All) - follows FOLLOWS relationships
Example - VarLengthExpand:
EXPLAIN
MATCH (u:User)-[:FOLLOWS*1..3]->(f:User)
RETURN f
// Plan shows: VarLengthExpand - WARNING: can be expensive
Processing Operators
| Operator | Description | Performance |
|---|---|---|
| Filter | Apply WHERE conditions | O(n) on input rows |
| Projection | Select/transform columns | O(n) - lightweight |
| Sort | Order results | O(n log n) |
| Top | Sort + limit combined | More efficient than Sort + Limit |
| Limit | Restrict output rows | Can short-circuit |
| Skip | Skip initial rows | Must process skipped rows |
| Distinct | Remove duplicates | O(n) with hash |
Aggregation Operators
| Operator | Description | Performance |
|---|---|---|
| EagerAggregation | GROUP BY with aggregates | O(n) - must see all rows |
| OrderedAggregation | Aggregation on sorted input | More efficient |
| Count | Count rows | O(n) |
Join Operators
| Operator | Description | Performance |
|---|---|---|
| NodeHashJoin | Hash join on nodes | O(n + m) |
| ValueHashJoin | Hash join on values | O(n + m) |
| CartesianProduct | Cross product | O(n * m) - WARNING |
| Apply | Nested loop join | O(n * m) - can be expensive |
Key Metrics to Watch
When analyzing PROFILE output:
Rows: Number of rows passing through each operator
- Large row counts early = may need better filtering
- Rows increasing dramatically = possible Cartesian product
DB Hits: Number of storage accesses
- High DB hits = more I/O, slower query
- Compare DB hits to rows for efficiency
Time: Execution time per operator
- Identify slowest operators
- Focus optimization efforts there
Estimated Rows (EXPLAIN): Planner’s row estimates
- Compare to actual rows in PROFILE
- Large discrepancies indicate stale statistics
Identifying Bottlenecks
High Row Counts
Problem: Operators processing too many rows
+---------------------------+----------------+
| Operator | Rows |
+---------------------------+----------------+
| Filter | 100 |
| | | |
| +Expand(All) | 1,000,000 | <-- Bottleneck!
| | | |
| +NodeByLabelScan | 10,000 |
+---------------------------+----------------+
Solution: Add index or restructure query
// Before: Scanning all users, expanding all relationships
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE f.verified = true
RETURN f
// After: Start from verified users (with index)
MATCH (f:User {verified: true})<-[:FOLLOWS]-(u:User)
RETURN f
NodeByLabelScan
Problem: Full label scan instead of index
+---------------------------+
| Operator |
+---------------------------+
| Filter |
| | |
| +NodeByLabelScan | <-- No index used
+---------------------------+
Solution: Create appropriate index
// Identify the filtering property
EXPLAIN
MATCH (p:Product)
WHERE p.category = "electronics"
RETURN p
// Create index
CREATE INDEX product_category ON :Product(category)
// Verify index usage
EXPLAIN
MATCH (p:Product {category: "electronics"})
RETURN p
// Should show: NodeIndexSeek
CartesianProduct
Problem: Unconnected patterns creating cross product
+---------------------------+----------------+
| Operator | Rows |
+---------------------------+----------------+
| CartesianProduct | 10,000,000 | <-- Explosive!
| |\ | |
| | NodeByLabelScan | 10,000 |
| | | |
| +NodeByLabelScan | 1,000 |
+---------------------------+----------------+
Solution: Connect patterns or split queries
// Before: Cartesian product
MATCH (u:User), (p:Product)
WHERE u.country = p.origin_country
RETURN u, p
// After: Connected pattern
MATCH (u:User)-[:INTERESTED_IN]->(:Category)<-[:IN_CATEGORY]-(p:Product)
WHERE u.country = p.origin_country
RETURN u, p
Expensive VarLengthExpand
Problem: Unbounded or deep variable-length patterns
+---------------------------+----------------+----------+
| Operator | Rows | Time |
+---------------------------+----------------+----------+
| VarLengthExpand | 5,000,000 | 4500ms | <-- Very slow!
| | | | |
| +NodeIndexSeek | 1 | 1ms |
+---------------------------+----------------+----------+
Solution: Bound the depth, add intermediate filtering
// Before: Unbounded depth
MATCH path = (start:Person)-[:KNOWS*]->(end:Person)
RETURN path
// After: Bounded depth
MATCH path = (start:Person)-[:KNOWS*1..4]->(end:Person)
RETURN path
LIMIT 1000
// Even better: Filter at each hop
MATCH path = (start:Person {id: $id})-[:KNOWS*1..3]->(end:Person)
WHERE ALL(n IN nodes(path) WHERE n.active = true)
RETURN path
Late Filtering
Problem: Filters applied after expensive operations
+---------------------------+----------------+----------+
| Operator | Rows | Time |
+---------------------------+----------------+----------+
| Filter | 10 | 50ms |
| | | | |
| +Sort | 100,000 | 500ms | <-- Sorted everything
| | | | |
| +Expand | 100,000 | 200ms |
| | | | |
| +NodeByLabelScan | 10,000 | 100ms |
+---------------------------+----------------+----------+
Solution: Filter earlier
// Before: Filter after sort
MATCH (u:User)-[:POSTED]->(p:Post)
RETURN p
ORDER BY p.created_at
LIMIT 10
// Then filtering for specific user in application
// After: Filter first
MATCH (u:User {id: $user_id})-[:POSTED]->(p:Post)
RETURN p
ORDER BY p.created_at
LIMIT 10
Query Optimization Workflow
Step 1: Baseline Measurement
Profile the original query to establish baseline:
PROFILE
MATCH (user:User)-[:PURCHASED]->(product:Product)
WHERE product.category = "electronics"
AND product.price > 100
RETURN user.name, product.name, product.price
ORDER BY product.price DESC
LIMIT 20
Record metrics:
- Total time: 450ms
- Total DB hits: 125,000
- Rows at each stage
Step 2: Identify Bottlenecks
Analyze the profile output:
+---------------------------+----------------+----------+------------+
| Operator | Rows | DB Hits | Time (ms) |
+---------------------------+----------------+----------+------------+
| Top | 20 | 0 | 1 |
| | | | | |
| +Projection | 500 | 1000 | 5 |
| | | | | |
| +Filter | 500 | 500 | 10 |
| | | | | |
| +Expand | 50,000 | 50,001 | 200 | <-- High rows
| | | | | |
| +NodeByLabelScan | 10,000 | 73,500 | 230 | <-- No index!
+---------------------------+----------------+----------+------------+
Bottlenecks identified:
- NodeByLabelScan on User (no index)
- 50,000 rows from Expand before filtering
- Filter on Product properties applied late
Step 3: Apply Optimizations
Optimization 1: Start from indexed Product instead:
// Create index on Product category and price
CREATE INDEX product_category_price ON :Product(category, price)
Optimization 2: Restructure query:
PROFILE
MATCH (product:Product)
WHERE product.category = "electronics"
AND product.price > 100
WITH product
ORDER BY product.price DESC
LIMIT 20
MATCH (user:User)-[:PURCHASED]->(product)
RETURN user.name, product.name, product.price
Step 4: Measure Improvement
Profile the optimized query:
+---------------------------+----------------+----------+------------+
| Operator | Rows | DB Hits | Time (ms) |
+---------------------------+----------------+----------+------------+
| Projection | 20 | 40 | 1 |
| | | | | |
| +Expand | 20 | 40 | 2 |
| | | | | |
| +Top | 20 | 0 | 1 |
| | | | | |
| +NodeIndexSeek | 500 | 501 | 5 |
+---------------------------+----------------+----------+------------+
Results:
- Total time: 9ms (98% improvement)
- Total DB hits: 581 (99.5% reduction)
Step 5: Document and Monitor
Record the optimization:
## Query: User Purchases by Category
### Original
- Time: 450ms
- DB Hits: 125,000
- Issue: Full User scan, late Product filtering
### Optimized
- Time: 9ms
- DB Hits: 581
- Changes:
1. Created composite index on Product(category, price)
2. Restructured to start from Product with early LIMIT
3. Expand to User as final step
### Index Added
CREATE INDEX product_category_price ON :Product(category, price)
Before/After Comparisons
Example 1: Adding an Index
Before (no index):
PROFILE
MATCH (u:User)
WHERE u.email = "[email protected]"
RETURN u
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| Filter | 1 | 0 |
| | | | |
| +NodeByLabelScan | 1,000,000 | 2,000,000|
+---------------------------+----------------+----------+
Time: 850ms
After (with index):
-- CREATE INDEX user_email ON :User(email)
PROFILE
MATCH (u:User {email: "[email protected]"})
RETURN u
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| NodeIndexSeek | 1 | 2 |
+---------------------------+----------------+----------+
Time: 1ms
Improvement: 850x faster, 1,000,000x fewer DB hits
Example 2: Restructuring Pattern Order
Before (starting from less selective node):
PROFILE
MATCH (post:Post)-[:WRITTEN_BY]->(author:Author {verified: true})
WHERE post.published = true
RETURN post.title, author.name
LIMIT 10
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| Top | 10 | 0 |
| | | | |
| +Filter | 500 | 500 |
| | | | |
| +Expand | 10,000 | 10,001 |
| | | | |
| +NodeByLabelScan | 10,000 | 30,000 |
+---------------------------+----------------+----------+
Time: 120ms
After (starting from more selective indexed node):
-- CREATE INDEX author_verified ON :Author(verified)
PROFILE
MATCH (author:Author {verified: true})-[:WROTE]->(post:Post)
WHERE post.published = true
RETURN post.title, author.name
LIMIT 10
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| Top | 10 | 0 |
| | | | |
| +Filter | 50 | 50 |
| | | | |
| +Expand | 200 | 201 |
| | | | |
| +NodeIndexSeek | 100 | 101 |
+---------------------------+----------------+----------+
Time: 8ms
Improvement: 15x faster, 115x fewer DB hits
Example 3: Using WITH for Early Aggregation
Before (aggregating at the end):
PROFILE
MATCH (user:User)-[:FOLLOWS]->(followed:User)-[:POSTED]->(post:Post)
WHERE user.id = $user_id
RETURN followed.name, count(post) AS post_count
ORDER BY post_count DESC
LIMIT 10
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| Top | 10 | 0 |
| | | | |
| +EagerAggregation | 500 | 0 |
| | | | |
| +Expand | 50,000 | 50,001 | <-- Many posts
| | | | |
| +Expand | 500 | 501 |
| | | | |
| +NodeIndexSeek | 1 | 2 |
+---------------------------+----------------+----------+
Time: 180ms
After (limiting followed users first):
PROFILE
MATCH (user:User {id: $user_id})-[:FOLLOWS]->(followed:User)
WITH followed
ORDER BY followed.follower_count DESC
LIMIT 50
MATCH (followed)-[:POSTED]->(post:Post)
RETURN followed.name, count(post) AS post_count
ORDER BY post_count DESC
LIMIT 10
+---------------------------+----------------+----------+
| Operator | Rows | DB Hits |
+---------------------------+----------------+----------+
| Top | 10 | 0 |
| | | | |
| +EagerAggregation | 50 | 0 |
| | | | |
| +Expand | 5,000 | 5,001 | <-- Reduced posts
| | | | |
| +Top | 50 | 0 |
| | | | |
| +Expand | 500 | 501 |
| | | | |
| +NodeIndexSeek | 1 | 2 |
+---------------------------+----------------+----------+
Time: 25ms
Improvement: 7x faster, 10x fewer rows processed
Advanced Profiling Techniques
Comparing Plan Variants
Test multiple query formulations:
// Variant A: Standard pattern
PROFILE
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE a.name = "Alice"
RETURN DISTINCT c.name
// Variant B: Using WITH
PROFILE
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(b:Person)
WITH DISTINCT b
MATCH (b)-[:KNOWS]->(c:Person)
RETURN DISTINCT c.name
// Variant C: Using shortestPath (if applicable)
PROFILE
MATCH (a:Person {name: "Alice"}), (c:Person)
WHERE (a)-[:KNOWS*2..2]->(c)
RETURN DISTINCT c.name
Compare total time and DB hits to choose the best variant.
Profiling with Parameters
Always profile with realistic parameter values:
// Profile with edge case (very connected user)
:param user_id => "power_user_123"
PROFILE
MATCH (u:User {id: $user_id})-[:FOLLOWS]->(f:User)
RETURN count(f)
// Profile with typical case
:param user_id => "average_user_456"
PROFILE
MATCH (u:User {id: $user_id})-[:FOLLOWS]->(f:User)
RETURN count(f)
Analyzing Memory Usage
For complex queries, check memory consumption:
PROFILE
MATCH (a)-[r*1..5]->(b)
WHERE a.id = $id
RETURN count(DISTINCT b)
Watch for operators that accumulate state:
EagerAggregation: Holds all groups in memorySort: Holds all rows to sortDistinct: Maintains hash set of seen valuesCollect: Accumulates list in memory
Profiling Checklist
When optimizing a query:
- Profile the original - Establish baseline metrics
- Identify expensive operators - Focus on highest time/DB hits
- Check for label scans - NodeByLabelScan often indicates missing index
- Look for Cartesian products - CartesianProduct is usually a bug
- Verify index usage - Ensure NodeIndexSeek where expected
- Check row counts - Large intermediate counts indicate filtering issues
- Test with realistic data - Small test data may hide problems
- Document improvements - Record before/after metrics
Common Profiling Patterns
Pattern: Index Verification
// Check if index is used
EXPLAIN
MATCH (u:User {email: $email})
RETURN u
// If NodeByLabelScan shown, create index:
CREATE INDEX user_email ON :User(email)
// Verify index usage:
EXPLAIN
MATCH (u:User {email: $email})
RETURN u
// Should show NodeIndexSeek
Pattern: Finding Hot Spots
// Profile and sort by time
PROFILE
MATCH (complex query here)
RETURN results
// Look at the Time column - operators taking >50% of total time
// are optimization targets
Pattern: Comparing Index Types
// Test B-tree index
CREATE INDEX user_age_btree ON :User(age)
PROFILE
MATCH (u:User) WHERE u.age > 25 AND u.age < 35 RETURN count(u)
DROP INDEX user_age_btree
// B-tree is better for range queries
// Hash is better for equality-only queries
Summary
Effective query profiling follows these principles:
- Use EXPLAIN for planning: Understand query structure before execution
- Use PROFILE for measurement: Get real execution metrics
- Read bottom-to-top: Data flows upward through operators
- Focus on bottlenecks: Optimize the slowest, most expensive operators
- Compare before/after: Quantify improvements
- Test with real data: Small datasets hide performance issues
- Document findings: Record optimizations for future reference
Related Guides
- Query Performance Guide - Writing efficient queries
- Index Strategy Guide - Creating effective indexes
- Graph Modeling Guide - Schema design for performance
Questions? Discuss query profiling in our forum .