Query Performance Guide
Writing efficient GQL queries is essential for building responsive applications. This guide covers query execution fundamentals, optimization techniques, and common pitfalls to avoid.
Understanding Query Execution
How Geode Executes Queries
When you submit a GQL query, Geode processes it through several stages:
- Parsing: Query text is parsed into an abstract syntax tree (AST)
- Planning: The query planner creates an execution plan
- Optimization: The optimizer rewrites the plan for efficiency
- Execution: The engine executes the optimized plan
- Result Assembly: Results are collected and returned
// This simple query...
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.name = "Alice"
RETURN friend.name
// ...becomes an execution plan:
// 1. Index scan on Person(name) for "Alice"
// 2. Expand relationships [:KNOWS]
// 3. Filter by Person label on target
// 4. Project friend.name
Execution Plan Components
Understanding plan components helps you write better queries:
| Component | Description | Performance Impact |
|---|---|---|
| NodeByLabelScan | Scans all nodes with a label | O(n) - slow for large datasets |
| NodeIndexSeek | Uses index to find nodes | O(log n) - fast |
| Expand | Follows relationships | O(degree) - varies by connectivity |
| Filter | Applies WHERE conditions | O(n) - applied to current result set |
| Projection | Selects return columns | O(n) - minimal overhead |
Writing Efficient MATCH Patterns
Start with Selective Nodes
Always begin your MATCH pattern with the most selective (narrowest) node. This minimizes the initial working set.
Inefficient (scans all Posts first):
MATCH (post:Post)<-[:POSTED]-(user:User {username: "alice"})
RETURN post.title
Efficient (uses index on User.username):
MATCH (user:User {username: "alice"})-[:POSTED]->(post:Post)
RETURN post.title
Use Specific Relationship Types
Specify relationship types when possible to reduce exploration:
Inefficient (explores all relationship types):
MATCH (a:Person)-[r]->(b:Person)
WHERE type(r) = "KNOWS"
RETURN a, b
Efficient (only explores KNOWS relationships):
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a, b
Limit Relationship Depth
Variable-length patterns can be expensive. Always set bounds:
Dangerous (unbounded traversal):
MATCH (start:Person)-[:KNOWS*]->(end:Person)
RETURN end
Safe (bounded traversal):
MATCH (start:Person)-[:KNOWS*1..4]->(end:Person)
RETURN end
Choose Relationship Direction
Directional patterns are more efficient than undirected ones:
Less efficient (explores both directions):
MATCH (a:Person)-[:KNOWS]-(b:Person)
RETURN a, b
More efficient (specific direction):
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a, b
Avoiding Cartesian Products
A Cartesian product occurs when two unconnected patterns create all possible combinations, leading to explosive result sets.
Recognizing Cartesian Products
Cartesian product (users x products = millions of rows):
MATCH (u:User), (p:Product)
RETURN u.name, p.name
Connected pattern (only related pairs):
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u.name, p.name
Unintentional Cartesian Products
Watch for separated patterns in the same MATCH:
Cartesian product (a,b and c,d are unconnected):
MATCH (a:Person)-[:KNOWS]->(b:Person),
(c:Company)-[:LOCATED_IN]->(d:City)
RETURN a, b, c, d
Solution 1 - Connect the patterns:
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)-[:LOCATED_IN]->(d:City)
RETURN a, b, c, d
Solution 2 - Use separate queries:
// Query 1
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a, b
// Query 2
MATCH (c:Company)-[:LOCATED_IN]->(d:City)
RETURN c, d
Detecting Cartesian Products
Use EXPLAIN to identify Cartesian products:
EXPLAIN
MATCH (u:User), (p:Product)
RETURN u.name, p.name
Look for CartesianProduct in the execution plan - this is a warning sign.
Using WITH for Query Chaining
The WITH clause allows you to chain query parts, reducing intermediate result sets and enabling optimizations.
Filtering Early with WITH
Inefficient (processes all relationships before filtering):
MATCH (user:User)-[:POSTED]->(post:Post)-[:HAS_COMMENT]->(comment:Comment)
WHERE user.active = true
RETURN user.name, count(comment) AS comment_count
Efficient (filters users early):
MATCH (user:User)
WHERE user.active = true
WITH user
MATCH (user)-[:POSTED]->(post:Post)-[:HAS_COMMENT]->(comment:Comment)
RETURN user.name, count(comment) AS comment_count
Aggregating Before Continuing
Aggregate intermediate results to reduce the working set:
// Find users with more than 100 posts, then get their most recent comments
MATCH (user:User)-[:POSTED]->(post:Post)
WITH user, count(post) AS post_count
WHERE post_count > 100
MATCH (user)-[:COMMENTED]->(comment:Comment)
RETURN user.name, post_count, comment
ORDER BY comment.created_at DESC
LIMIT 10
Breaking Complex Queries
Split complex queries into manageable parts:
// Step 1: Find influential users
MATCH (user:User)-[:FOLLOWS]->(follower:User)
WITH user, count(follower) AS followers
WHERE followers > 1000
WITH user ORDER BY followers DESC LIMIT 100
// Step 2: Get their recent activity
MATCH (user)-[:POSTED]->(post:Post)
WHERE post.created_at > datetime() - duration('P30D')
WITH user, collect(post)[0..5] AS recent_posts
// Step 3: Return with statistics
RETURN user.name, size(recent_posts) AS recent_post_count, recent_posts
Aggregation Performance
Use COUNT Efficiently
COUNT operations have different performance characteristics:
// Fast: Counting with early termination
MATCH (p:Person)
RETURN count(p) AS total
// Slower: Counting after traversal
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN count(friend) AS total_friends
// Optimize: Aggregate per-person first
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name, count(friend) AS friend_count
Optimize SUM and AVG
Pre-filter data before aggregating:
Inefficient:
MATCH (o:Order)
WHERE o.status = "completed"
RETURN sum(o.total) AS revenue
More efficient with index:
// Ensure index exists: CREATE INDEX order_status ON :Order(status)
MATCH (o:Order {status: "completed"})
RETURN sum(o.total) AS revenue
Collect Wisely
The collect() function gathers values into a list. Limit what you collect:
Inefficient (collects everything):
MATCH (author:Author)-[:WROTE]->(book:Book)
RETURN author.name, collect(book) AS all_books
Efficient (limits collection):
MATCH (author:Author)-[:WROTE]->(book:Book)
WITH author, book
ORDER BY book.published DESC
WITH author, collect(book)[0..10] AS recent_books
RETURN author.name, recent_books
Limiting Results Early
Use LIMIT Strategically
Place LIMIT as early as possible in your query:
Inefficient (processes all results, then limits):
MATCH (p:Person)-[:KNOWS]->(friend:Person)-[:KNOWS]->(fof:Person)
RETURN DISTINCT fof.name
ORDER BY fof.name
LIMIT 10
Efficient (limits intermediate results):
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WITH friend LIMIT 100
MATCH (friend)-[:KNOWS]->(fof:Person)
RETURN DISTINCT fof.name
ORDER BY fof.name
LIMIT 10
Skip Efficiently
Combine SKIP and LIMIT for pagination:
// Page 1 (items 0-9)
MATCH (p:Product)
RETURN p.name, p.price
ORDER BY p.created_at DESC
SKIP 0 LIMIT 10
// Page 2 (items 10-19)
MATCH (p:Product)
RETURN p.name, p.price
ORDER BY p.created_at DESC
SKIP 10 LIMIT 10
Note: Deep pagination (large SKIP values) can be slow. Consider cursor-based pagination for large datasets:
// Cursor-based pagination using a timestamp
MATCH (p:Product)
WHERE p.created_at < $cursor_timestamp
RETURN p.name, p.price, p.created_at
ORDER BY p.created_at DESC
LIMIT 10
Parameterized Queries for Plan Caching
Why Use Parameters
Geode caches execution plans for parameterized queries. This provides:
- Faster parsing: Cached plans skip parsing and planning
- Better security: Prevents GQL injection
- Cleaner code: Separation of query and data
Parameter Syntax
Use $name syntax for parameters:
// Parameterized query
MATCH (user:User {username: $username})
WHERE user.age >= $min_age
RETURN user.name, user.email
Client usage (Python):
result, _ = await conn.query(
"""
MATCH (user:User {username: $username})
WHERE user.age >= $min_age
RETURN user.name, user.email
""",
{"username": "alice", "min_age": 18}
)
Client usage (Go):
rows, err := db.QueryContext(ctx, `
MATCH (user:User {username: ?})
WHERE user.age >= ?
RETURN user.name, user.email
`, "alice", 18)
Parameter Best Practices
Good: Parameterize variable values:
MATCH (p:Product)
WHERE p.category = $category AND p.price <= $max_price
RETURN p
Bad: Don’t parameterize structural elements:
// Labels and relationship types cannot be parameterized
// This does NOT work:
MATCH (n:$label)-[:$reltype]->(m) // INVALID
Batch Operations with Parameters
Parameterized queries work well for batch operations:
// Use UNWIND for batch inserts
UNWIND $users AS user_data
CREATE (u:User {
name: user_data.name,
email: user_data.email,
created_at: timestamp()
})
Client usage:
users = [
{"name": "Alice", "email": "[email protected]"},
{"name": "Bob", "email": "[email protected]"},
{"name": "Charlie", "email": "[email protected]"}
]
await conn.execute(
"UNWIND $users AS user_data CREATE (u:User {name: user_data.name, email: user_data.email})",
{"users": users}
)
Common Performance Pitfalls
Pitfall 1: Selecting All Properties
Problem: Returning entire nodes when you only need specific properties.
Slow:
MATCH (user:User)-[:POSTED]->(post:Post)
RETURN user, post // Returns all properties
Fast:
MATCH (user:User)-[:POSTED]->(post:Post)
RETURN user.name, post.title, post.created_at // Returns only needed properties
Pitfall 2: Not Using Indexes
Problem: Full label scans when indexes could be used.
// Slow: Full scan of User nodes
MATCH (u:User)
WHERE u.email = "[email protected]"
RETURN u
// Fast: Uses index (after creating it)
// CREATE INDEX user_email ON :User(email)
MATCH (u:User {email: "[email protected]"})
RETURN u
Pitfall 3: Unnecessary Ordering
Problem: Sorting large result sets unnecessarily.
// Slow: Sorts millions of rows
MATCH (p:Product)
RETURN p.name
ORDER BY p.name
// Fast: Only sort what you need
MATCH (p:Product)
RETURN p.name
ORDER BY p.name
LIMIT 100
Pitfall 4: Repeated Pattern Matching
Problem: Matching the same pattern multiple times.
Slow:
MATCH (user:User {id: $id})-[:POSTED]->(post:Post)
WITH user, count(post) AS post_count
MATCH (user)-[:COMMENTED]->(comment:Comment)
WITH user, post_count, count(comment) AS comment_count
MATCH (user)-[:LIKED]->(liked:Post)
RETURN user.name, post_count, comment_count, count(liked) AS like_count
Fast (using OPTIONAL MATCH):
MATCH (user:User {id: $id})
OPTIONAL MATCH (user)-[:POSTED]->(post:Post)
WITH user, count(post) AS post_count
OPTIONAL MATCH (user)-[:COMMENTED]->(comment:Comment)
WITH user, post_count, count(comment) AS comment_count
OPTIONAL MATCH (user)-[:LIKED]->(liked:Post)
RETURN user.name, post_count, comment_count, count(liked) AS like_count
Pitfall 5: Over-fetching in Traversals
Problem: Fetching all paths when you only need a few.
Slow:
MATCH path = (start:Person)-[:KNOWS*1..6]->(end:Person)
WHERE start.name = "Alice"
RETURN path
Fast:
MATCH path = (start:Person {name: "Alice"})-[:KNOWS*1..6]->(end:Person)
RETURN path
LIMIT 100
Pitfall 6: String Operations Without Indexes
Problem: Using CONTAINS or regex without full-text indexes.
Slow:
MATCH (p:Product)
WHERE p.description CONTAINS "wireless"
RETURN p
Fast (with full-text index):
// First, create a full-text index:
// CREATE FULLTEXT INDEX product_search ON :Product(name, description)
CALL db.index.fulltext.queryNodes("product_search", "wireless")
YIELD node
RETURN node
Performance Checklist
Before deploying queries to production, verify:
- Indexes exist for properties used in WHERE clauses and node lookups
- No Cartesian products - all MATCH patterns are connected
- Bounded traversals - variable-length patterns have max depth
- Early filtering - WHERE clauses appear as early as possible
- Limited results - LIMIT is applied before expensive operations
- Parameterized queries - variable values use parameters
- Selective projections - only necessary properties are returned
- Profiled execution - PROFILE shows acceptable metrics
Measuring Performance
Using EXPLAIN
Get the query plan without executing:
EXPLAIN
MATCH (user:User {username: "alice"})-[:POSTED]->(post:Post)
WHERE post.created_at > datetime() - duration('P7D')
RETURN post.title, post.created_at
ORDER BY post.created_at DESC
LIMIT 10
Using PROFILE
Execute and get actual metrics:
PROFILE
MATCH (user:User {username: "alice"})-[:POSTED]->(post:Post)
WHERE post.created_at > datetime() - duration('P7D')
RETURN post.title, post.created_at
ORDER BY post.created_at DESC
LIMIT 10
For detailed profiling techniques, see the Query Profiling Guide .
Summary
Optimizing GQL queries follows these principles:
- Start selective: Begin patterns with the most filtered nodes
- Be specific: Use exact labels and relationship types
- Bound traversals: Always limit variable-length patterns
- Filter early: Apply WHERE conditions as soon as possible
- Limit early: Reduce working sets before expensive operations
- Use indexes: Ensure indexes exist for common lookups
- Parameterize: Enable plan caching with parameters
- Profile: Measure before and after optimizations
Related Guides
- Index Strategy Guide - Choosing and creating indexes
- Query Profiling Guide - Analyzing execution plans
- Graph Modeling Guide - Schema design for performance
Questions? Discuss query optimization in our forum .