Tag: Query Optimization

Documentation tagged with query-optimization in the Geode graph database. This collection covers query performance analysis, optimization techniques, reading execution plans, and strategies for writing efficient GQL queries.

Overview

Query optimization is critical for building high-performance graph applications. Geode provides powerful tools for analyzing and optimizing query execution, including the EXPLAIN command for execution plan analysis, the PROFILE command for runtime performance metrics, and advanced index strategies for accelerating pattern matching and traversals.

Understanding query optimization helps you:

Analyze Query Performance: Use EXPLAIN to understand execution strategies
Optimize Index Usage: Design indexes that accelerate critical access patterns
Reduce Execution Time: Apply optimization techniques to slow queries
Scale to Production: Ensure consistent performance under load
Debug Performance Issues: Identify and resolve bottlenecks systematically

Core Concepts

Execution Plan Analysis

Geode’s query optimizer generates execution plans that determine how queries are executed:

Understanding EXPLAIN Output:

// Analyze a pattern match query
EXPLAIN
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.name = 'Alice'
RETURN f.name, f.created_at
ORDER BY f.created_at DESC
LIMIT 10

// Output shows:
// 1. Access method (index seek vs scan)
// 2. Join strategy (nested loop, hash join)
// 3. Filter application points
// 4. Sort operations
// 5. Estimated row counts

Key Plan Components:

Access Methods: Index seek, index scan, full scan
Join Strategies: Nested loop, hash join, merge join
Filter Placement: Early vs late predicate application
Sort Operations: In-memory vs external sort
Projection: Column selection and transformation

Index Strategy

Indexes dramatically accelerate query execution when used effectively:

Index Types:

// Property index for equality lookups
CREATE INDEX user_email_idx ON User(email);

// Composite index for multi-property filters
CREATE INDEX user_location_idx ON User(city, state);

// Composite index on transaction type and amount
CREATE INDEX transaction_type_idx ON Transaction(type, amount);

Index Selection Rules:

Equality First: Indexes work best for equality predicates
Prefix Matching: Composite indexes match left-to-right
Cardinality Matters: High-cardinality columns benefit most
Covering Indexes: Include all referenced properties when possible
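The left-to-right rule for composite indexes can be sketched in a few lines of Python. This is a simplified model, not Geode's planner: the function name usable_index_columns and the rule that a range predicate ends the usable prefix are assumptions borrowed from how ordered composite indexes commonly behave.

```python
def usable_index_columns(index_columns, equality, ranges=()):
    """Return the leading index columns a simple left-to-right matcher could use.

    Assumed model: equality predicates extend the usable prefix, one range
    predicate can be used and then ends it, and a column with no predicate
    stops matching immediately.
    """
    usable = []
    for column in index_columns:
        if column in equality:
            usable.append(column)
        elif column in ranges:
            usable.append(column)
            break  # columns after a range predicate are not used for seeking
        else:
            break  # gap in the prefix: later columns cannot be used
    return usable


# Column order of user_location_idx ON User(city, state)
user_location_idx = ["city", "state"]

both = usable_index_columns(user_location_idx, {"city", "state"})
city_only = usable_index_columns(user_location_idx, {"city"})
state_only = usable_index_columns(user_location_idx, {"state"})

print(both)        # both columns match
print(city_only)   # the leading column alone still matches
print(state_only)  # nothing matches: the leading column is missing
```

Under this model a filter on state alone cannot use user_location_idx, which is why the most frequently filtered property usually goes first in a composite index.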

Pattern Matching Efficiency

Graph pattern matching can be optimized through careful query design:

Optimization Techniques:

// BEFORE: Pattern is written from u, which has no filter
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE f.name STARTS WITH 'A'
RETURN u.name, f.name;

// AFTER: Same directed pattern, written from the filtered node f
MATCH (f:User)<-[:FOLLOWS]-(u:User)
WHERE f.name STARTS WITH 'A'
RETURN u.name, f.name;

// An index on name lets the planner seek f instead of scanning every User
CREATE INDEX user_name_prefix_idx ON User(name);

Pattern Ordering:

Filter Early: Apply selective predicates first
Index-Backed Patterns: Start with indexed properties
Relationship Direction: Consider relationship cardinality
Optional Matches: Use sparingly; they can inflate intermediate results and, on disconnected patterns, produce Cartesian products

Optimization Strategies

Query Rewriting

Restructure queries for better performance:

Predicate Pushdown:

// BEFORE: Purchases are counted for every user, including accounts the result does not need
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH u, COUNT(p) AS purchase_count
WHERE purchase_count > 10
RETURN u.name, purchase_count;

// AFTER: Filter on a node property before aggregating
// (assumes only active accounts are wanted; this narrows the result set)
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.account_status = 'active'
WITH u, COUNT(p) AS purchase_count
WHERE purchase_count > 10
RETURN u.name, purchase_count;

Join Elimination:

// BEFORE: Two MATCH clauses joined on u2, with the filter applied last
MATCH (u1:User)-[:FOLLOWS]->(u2:User)
MATCH (u2)-[:POSTS]->(p:Post)
WHERE u1.id = 'user123'
RETURN p;

// AFTER: Direct path
MATCH (u:User {id: 'user123'})-[:FOLLOWS]->()-[:POSTS]->(p:Post)
RETURN p;

Aggregation Optimization

Optimize aggregation queries for large datasets:

Incremental Aggregation:

// Use WITH to aggregate incrementally
MATCH (u:User)-[:PURCHASED]->(p:Product)-[:IN_CATEGORY]->(c:Category)
WHERE u.region = 'West'
WITH c, SUM(p.price) AS category_total
WHERE category_total > 1000
RETURN c.name, category_total
ORDER BY category_total DESC;

Pre-Aggregated Data:

// Maintain aggregates for fast access (refresh whenever purchases change, or the value goes stale)
MATCH (u:User)
SET u.total_purchases = size((u)-[:PURCHASED]->());

// Query uses pre-computed value
MATCH (u:User)
WHERE u.total_purchases > 100
RETURN u.name, u.total_purchases;

Index Coverage

Design indexes that cover entire queries:

Covering Index Example:

// Create covering index
CREATE INDEX user_profile_idx ON User(city, state, age, name);

// Query uses index without accessing node
MATCH (u:User)
WHERE u.city = 'Seattle' AND u.state = 'WA' AND u.age > 25
RETURN u.name;
// All data comes from index, no node lookup needed

Performance Tools

EXPLAIN Command

Analyze query execution plans before running queries:

Basic Usage:

// See execution plan without running query
EXPLAIN
MATCH (u:User)-[:FOLLOWS*1..3]->(friend:User)
WHERE u.name = 'Alice'
RETURN friend.name;

Plan Interpretation:

QueryPlan:
  - NodeIndexSeek: User(name = 'Alice') -> u
    Estimated rows: 1
    Index: user_name_idx

  - VariableLengthExpand: (u)-[:FOLLOWS*1..3]->(friend)
    Estimated rows: 150
    Strategy: BFS traversal
    Max depth: 3

  - Projection: friend.name
    Estimated rows: 150

PROFILE Command

Measure actual runtime performance:

Runtime Analysis:

// Execute query and collect performance metrics
PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.region = 'East' AND p.price > 100
RETURN u.name, p.name, p.price
LIMIT 100;

Profile Output:

Execution Profile:
  Total time: 45ms

  NodeIndexSeek (User.region = 'East'):
    Time: 2ms
    Rows: 15,234
    Index: user_region_idx

  Expand (PURCHASED):
    Time: 28ms
    Rows: 45,678

  Filter (p.price > 100):
    Time: 12ms
    Input rows: 45,678
    Output rows: 12,345

  Projection + Limit:
    Time: 3ms
    Rows: 100
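To find the operator worth optimizing first, the profile can be reduced to per-operator times. The Python sketch below parses the sample output shown above; the text layout is assumed to match that sample exactly, and the helper name operator_times is hypothetical rather than part of any Geode tooling.

```python
import re

PROFILE_TEXT = """\
Execution Profile:
  Total time: 45ms

  NodeIndexSeek (User.region = 'East'):
    Time: 2ms
    Rows: 15,234
    Index: user_region_idx

  Expand (PURCHASED):
    Time: 28ms
    Rows: 45,678

  Filter (p.price > 100):
    Time: 12ms
    Input rows: 45,678
    Output rows: 12,345

  Projection + Limit:
    Time: 3ms
    Rows: 100
"""


def operator_times(profile_text):
    """Map each operator heading to its reported time in milliseconds."""
    times = {}
    current = None
    for line in profile_text.splitlines():
        stripped = line.strip()
        time_match = re.fullmatch(r"Time: (\d+)ms", stripped)
        if time_match and current is not None:
            times[current] = int(time_match.group(1))
        elif (line.startswith("  ") and not line.startswith("    ")
              and stripped.endswith(":")):
            # Two-space indented lines ending in ":" are operator headings
            current = stripped[:-1]
    return times


times = operator_times(PROFILE_TEXT)
slowest = max(times, key=times.get)
share = times[slowest] / sum(times.values())
print(f"{slowest}: {times[slowest]}ms ({share:.0%} of operator time)")
```

In this sample the Expand step dominates, so reducing the rows that reach it (for example with a more selective starting predicate) is likely to help more than tuning the filter or projection.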

Query Metrics

Monitor query performance in production:

Key Metrics:

Execution Time: Total query runtime
Rows Examined: Total nodes/edges accessed
Rows Returned: Result set size
Index Hits: Index usage count
Cache Efficiency: Hot data access rate
Memory Usage: Query memory footprint

Best Practices

Index Design

Create indexes strategically:

High-Impact Indexes:

// 1. Unique identifiers
CREATE UNIQUE INDEX user_id_idx ON User(user_id);

// 2. Foreign key equivalents
CREATE INDEX order_user_idx ON Order(user_id);

// 3. Common filter columns
CREATE INDEX product_category_idx ON Product(category);

// 4. Sort columns
CREATE INDEX event_timestamp_idx ON Event(timestamp);

// 5. Composite for complex queries
CREATE INDEX user_search_idx ON User(status, region, created_at);

Index Maintenance:

Avoid Over-Indexing: Each index adds write overhead
Monitor Usage: Remove unused indexes
Update Statistics: Keep optimizer statistics current
Rebuild When Needed: Maintain index health
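One way to notice stale statistics is to compare the optimizer's estimated row counts with the rows actually produced. The Python sketch below assumes you have collected both numbers per operator by hand from EXPLAIN and PROFILE output; the function name, the factor of 10, and the actual row counts are all hypothetical illustration values.

```python
def misestimated_operators(operators, factor=10):
    """Return names of operators whose estimate is off by at least `factor`.

    `operators` is a list of (name, estimated_rows, actual_rows) tuples.
    """
    flagged = []
    for name, estimated, actual in operators:
        # Clamp to 1 so a zero count does not cause division by zero
        low, high = sorted((max(estimated, 1), max(actual, 1)))
        if high / low >= factor:
            flagged.append(name)
    return flagged


# Estimated rows follow the sample plan; actual rows are made up for illustration
operators = [
    ("NodeIndexSeek", 1, 1),
    ("VariableLengthExpand", 150, 4200),
    ("Projection", 150, 160),
]

flagged = misestimated_operators(operators)
print(flagged)
```

A large gap on one operator suggests the statistics behind that step are out of date and worth refreshing before adding or dropping indexes.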

Query Structure

Write efficient query patterns:

Efficiency Guidelines:

// ✓ Good: Filter early with indexes
MATCH (u:User {status: 'active'})-[:PURCHASED]->(p:Product)
WHERE p.category = 'Electronics'
RETURN u.name, COUNT(p);

// ✗ Potentially slower: the same filter written in WHERE. Some planners push it down automatically, so confirm with EXPLAIN
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.status = 'active' AND p.category = 'Electronics'
RETURN u.name, COUNT(p);

// ✓ Good: Limit early when possible
MATCH (u:User)-[:RECENT_LOGIN]->(l:Login)
WHERE u.region = 'West'
WITH u, l
ORDER BY l.timestamp DESC
LIMIT 10
RETURN u.name, l.timestamp;

// ✗ Potentially slower: ORDER BY and LIMIT after projection may sort the entire result set; compare both forms with PROFILE
MATCH (u:User)-[:RECENT_LOGIN]->(l:Login)
WHERE u.region = 'West'
RETURN u.name, l.timestamp
ORDER BY l.timestamp DESC
LIMIT 10;
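The reason ORDER BY combined with LIMIT can be cheap is that only the top N rows need to be tracked, not a fully sorted result. The Python sketch below illustrates that idea with the standard library; whether Geode applies a similar top-N strategy for a given query is an assumption to verify with PROFILE, and the login rows are generated test data.

```python
import heapq

# Generated rows with distinct timestamps (37 is coprime with 1000)
logins = [
    {"name": f"user{i}", "timestamp": (i * 37) % 1000}
    for i in range(1000)
]


def top_by_full_sort(rows, n):
    """Sort every row, then keep the first n."""
    return sorted(rows, key=lambda r: r["timestamp"], reverse=True)[:n]


def top_by_heap(rows, n):
    """Track only the n largest rows while scanning once."""
    return heapq.nlargest(n, rows, key=lambda r: r["timestamp"])


full = top_by_full_sort(logins, 10)
bounded = top_by_heap(logins, 10)
print(full == bounded)
```

Both functions return the same ten rows, but the heap version never holds more than ten candidates at a time, which is the saving a top-N operator provides.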

Traversal Optimization

Optimize graph traversal queries:

Traversal Strategies:

// Bounded depth prevents runaway queries
MATCH (u:User)-[:FOLLOWS*1..3]->(friend)
WHERE u.id = 'user123'
RETURN DISTINCT friend.name;

// Directed traversals when possible
MATCH (u:User)-[:FOLLOWS]->(friend)  // Better
// vs
MATCH (u:User)-[:FOLLOWS]-(friend)   // Checks both directions

// Prune paths early
MATCH path = (u:User)-[:FOLLOWS*1..5]->(influencer:User)
WHERE ALL(n IN nodes(path) WHERE n.account_status = 'active')
  AND influencer.follower_count > 10000
RETURN influencer.name;
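A rough calculation shows why the depth bound matters. The Python sketch below assumes every node has the same out-degree, which real graphs do not, so treat the result as an order-of-magnitude estimate; the function name max_paths is hypothetical.

```python
def max_paths(avg_out_degree, max_depth):
    """Upper bound on paths of length 1..max_depth from a single start node."""
    return sum(avg_out_degree ** depth for depth in range(1, max_depth + 1))


depth_3 = max_paths(10, 3)
depth_5 = max_paths(10, 5)
print(depth_3, depth_5)
```

With an average of 10 FOLLOWS edges per user, raising the bound from 3 to 5 increases the worst-case path count by roughly a factor of 100, which is why early pruning predicates pay off on deeper traversals.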

Troubleshooting

Slow Query Diagnosis

Systematic approach to finding bottlenecks:

Step 1: Get Baseline:

// Measure current performance
PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.region = 'West'
RETURN u.name, COUNT(p) AS purchases
ORDER BY purchases DESC
LIMIT 20;

Step 2: Analyze Plan:

// Check execution strategy
EXPLAIN
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.region = 'West'
RETURN u.name, COUNT(p) AS purchases
ORDER BY purchases DESC
LIMIT 20;

// Look for:
// - Full scans (missing indexes)
// - Large intermediate results
// - Cartesian products
// - Late filtering

Step 3: Add Indexes:

// Address index gaps
CREATE INDEX user_region_idx ON User(region);

// Re-measure
PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.region = 'West'
RETURN u.name, COUNT(p) AS purchases
ORDER BY purchases DESC
LIMIT 20;

Common Issues

Issue: Full Table Scans:

// Problem: No index on filter column
MATCH (p:Product)
WHERE p.sku = 'ABC123'
RETURN p;
// Shows: FullScan (Product) in EXPLAIN

// Solution: Add index
CREATE INDEX product_sku_idx ON Product(sku);

Issue: Cartesian Products:

// Problem: Disconnected patterns
MATCH (u:User), (p:Product)
WHERE u.region = 'West' AND p.category = 'Books'
RETURN u.name, p.name;
// Generates u_count * p_count rows

// Solution: Add relationship
MATCH (u:User)-[:INTERESTED_IN]->(c:Category {name: 'Books'})<-[:IN_CATEGORY]-(p:Product)
WHERE u.region = 'West'
RETURN u.name, p.name;
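The row count of a disconnected pattern grows as the product of both sides. A small Python sketch makes the arithmetic concrete; the counts of 200 users and 50 products are made-up illustration values.

```python
from itertools import product

# Hypothetical match counts for each side of the disconnected pattern
users_west = [f"u{i}" for i in range(200)]
books = [f"p{i}" for i in range(50)]

# Every user is paired with every product
pairs = list(product(users_west, books))
print(len(pairs))
```

Connecting the two patterns through a relationship, as in the solution above, limits the output to pairs that are actually related instead of every combination.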

Issue: Expensive Aggregations:

// Problem: Aggregating over every user, then discarding most groups
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH u, COUNT(p) AS purchase_count
WHERE purchase_count > 50
RETURN u.name, purchase_count;

// Solution: Filter rows before aggregating
// (assumes only premium accounts are needed; this narrows the result set)
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE u.account_type = 'premium'  // Reduces input set
WITH u, COUNT(p) AS purchase_count
WHERE purchase_count > 50
RETURN u.name, purchase_count;

Production Optimization

Monitoring

Track query performance continuously:

Key Queries to Monitor:

// Identify slow queries
SELECT query_text, avg_duration, execution_count
FROM system.query_stats
WHERE avg_duration > 1000  -- milliseconds
ORDER BY avg_duration DESC
LIMIT 20;

// Find missing indexes
SELECT table_name, column_name, scan_count
FROM system.column_stats
WHERE scan_count > 1000 AND index_exists = false
ORDER BY scan_count DESC;
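Average duration alone can hide a moderately slow query that runs very often. As a complementary view, the Python sketch below ranks queries by total time (avg_duration multiplied by execution_count). It assumes the rows have already been fetched into dictionaries with the column names used above; the sample rows and the function name rank_by_total_time are hypothetical.

```python
def rank_by_total_time(stats, min_avg_ms=0):
    """Order query stats by total time spent, highest first."""
    candidates = [s for s in stats if s["avg_duration"] >= min_avg_ms]
    return sorted(
        candidates,
        key=lambda s: s["avg_duration"] * s["execution_count"],
        reverse=True,
    )


# Made-up rows shaped like the query_stats columns selected above
stats = [
    {"query_text": "report", "avg_duration": 4000, "execution_count": 3},
    {"query_text": "feed", "avg_duration": 120, "execution_count": 5000},
    {"query_text": "lookup", "avg_duration": 2, "execution_count": 90000},
]

ranked = [s["query_text"] for s in rank_by_total_time(stats)]
print(ranked)
```

Here the feed query consumes the most total time even though its average is far below the 1000 ms threshold, so it may deserve attention before the rarely run report.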

Capacity Planning

Plan for growth:

Scaling Considerations:

Data Growth: Test queries at 2x, 5x, 10x current size
Concurrent Users: Validate performance under load
Index Memory: Ensure hot indexes fit in memory
Query Mix: Optimize for actual workload patterns
Caching: Leverage query result caching when appropriate

Query Tuning Workflow

Systematic optimization process:

Identify Slow Queries: Use monitoring to find problems
Analyze Execution Plans: Understand current strategy
Create Targeted Indexes: Address access pattern gaps
Rewrite Inefficient Queries: Apply optimization patterns
Measure Improvements: Validate changes with PROFILE
Deploy and Monitor: Track production performance
Iterate: Continuously refine based on real usage

Advanced Techniques

Materialized Views

Pre-compute expensive queries:

// Create materialized aggregation
CREATE GRAPH SCHEMA popular_products AS
MATCH (p:Product)<-[:PURCHASED]-(u:User)
WITH p, COUNT(u) AS purchase_count
WHERE purchase_count > 100
RETURN p.id AS product_id,
       p.name AS product_name,
       purchase_count;

// Query uses pre-computed results
MATCH (mp:popular_products)
WHERE mp.purchase_count > 500
RETURN mp.product_name, mp.purchase_count;

Query Hints

Guide optimizer when needed:

// Force index usage
MATCH (u:User)
USING INDEX user_email_idx
WHERE u.email = 'user@example.com'
RETURN u;

// Control join strategy
MATCH (u:User)-[:PURCHASED]->(p:Product)
USING JOIN hash_join
WHERE u.region = 'West'
RETURN u.name, p.name;

Batch Processing

Optimize bulk operations:

// Process in batches to control memory
MATCH (u:User)
WHERE u.status = 'inactive'
  AND (u.archived IS NULL OR u.archived = false)  // skip users already archived
WITH u
LIMIT 1000
SET u.archived = true
RETURN COUNT(u);
// Repeat until the returned count is 0
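The repeat step is normally driven from application code. The Python sketch below shows one possible loop; it does not use a real Geode client. run_batch stands for whatever function executes the batch query and returns the count it reports, and FakeGraph is an in-memory stand-in that exists only to make the example runnable.

```python
def archive_in_batches(run_batch, batch_size=1000, max_batches=10_000):
    """Call run_batch(batch_size) until it reports zero rows updated."""
    total = 0
    for _ in range(max_batches):
        updated = run_batch(batch_size)
        if updated == 0:
            return total
        total += updated
    # Guard against a query that never stops matching rows
    raise RuntimeError("batch limit reached before all rows were processed")


class FakeGraph:
    """In-memory stand-in for the database (hypothetical, for illustration)."""

    def __init__(self, inactive_users):
        self.pending = inactive_users
        self.calls = 0

    def run_batch(self, batch_size):
        self.calls += 1
        updated = min(batch_size, self.pending)
        self.pending -= updated
        return updated


graph = FakeGraph(inactive_users=2500)
archived = archive_in_batches(graph.run_batch)
print(archived, graph.calls)
```

The max_batches guard matters because a batch query that keeps matching rows it has already updated would otherwise loop forever.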

Integration with Geode

Query optimization integrates with other Geode features:

Transactions: Optimize within transaction boundaries
Concurrency: Design for concurrent query execution
Caching: Leverage Geode’s query cache effectively
Monitoring: Use built-in metrics for continuous improvement
Security: Ensure RLS policies don’t impede performance

Related Topics

EXPLAIN Command: Detailed execution plan analysis
PROFILE Command: Runtime performance profiling
Indexes: Index design and management
Query Performance: Overall performance considerations
Monitoring: Production query monitoring
Best Practices: General optimization guidelines
Troubleshooting: Debugging performance issues

Browse the tagged content below to discover comprehensive guides, tutorials, and best practices for query optimization in Geode. Master the tools and techniques needed to build high-performance graph applications that scale to production workloads.

Related Articles

Query Optimization

Query Profiling with EXPLAIN and PROFILE
