Query Execution Plans (EXPLAIN) | Tags

EXPLAIN and PROFILE are essential tools for understanding and optimizing query performance in Geode. They provide visibility into how the query planner executes your GQL queries, helping you identify bottlenecks and optimize performance.

What is EXPLAIN?

EXPLAIN shows the query execution plan without actually running the query. It reveals:

Execution Steps: What operations will be performed
Order of Operations: The sequence of execution
Index Usage: Which indexes will be used
Estimated Costs: Relative cost estimates for each operation
Optimization Decisions: How the query planner optimized your query

PROFILE goes further by executing the query and providing actual runtime statistics including row counts, execution times, and memory usage.

Basic Usage

EXPLAIN Syntax

// Show execution plan without running query
EXPLAIN
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.id = 123
RETURN f.name, f.email

// Output shows plan but doesn't execute query

PROFILE Syntax

// Show execution plan AND run query with actual statistics
PROFILE
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.id = 123
RETURN f.name, f.email

// Output includes actual row counts, timing, and memory usage

Understanding Execution Plans

Plan Structure

EXPLAIN
MATCH (u:User {email: 'alice@example.com'})-[:POSTED]->(p:Post)
WHERE p.published = true
RETURN p.title, p.created_at
ORDER BY p.created_at DESC
LIMIT 10

Example Output:

Limit (rows=10)
  Sort (key=p.created_at DESC)
    Filter (p.published = true)
      Expand (u)-[:POSTED]->(p:Post)
        NodeIndexSeek (u:User, email='[email protected]')
          [Index: User.email]

Reading the Plan (bottom to top):

NodeIndexSeek: Use index on User.email to find user
Expand: Traverse POSTED relationships to find posts
Filter: Keep only published posts
Sort: Order results by created_at descending
Limit: Take only first 10 results

Common Plan Operators

Index Operations

// NodeIndexSeek - Fast index lookup
EXPLAIN
MATCH (u:User {email: 'alice@example.com'})
RETURN u
// Plan: NodeIndexSeek (u:User, email='alice@example.com')

// NodeIndexScan - Scan entire index
EXPLAIN
MATCH (u:User)
WHERE u.verified = true
RETURN u
// Plan: NodeIndexScan (u:User, verified=true)

// NodeLabelScan - Scan all nodes with label
EXPLAIN
MATCH (u:User)
RETURN u
// Plan: NodeLabelScan (u:User)

// AllNodesScan - Scan all nodes (slowest)
EXPLAIN
MATCH (n)
RETURN n
// Plan: AllNodesScan

Relationship Traversal

// Expand - Traverse relationships
EXPLAIN
MATCH (u:User)-[:FOLLOWS]->(f:User)
RETURN f
// Plan: Expand (u)-[:FOLLOWS]->(f:User)

// VarLengthExpand - Variable-length path traversal
EXPLAIN
MATCH (u:User)-[:KNOWS*1..3]->(friend)
RETURN friend
// Plan: VarLengthExpand (u)-[:KNOWS*1..3]->(friend)

// OptionalExpand - Left outer join
EXPLAIN
MATCH (u:User)
OPTIONAL MATCH (u)-[:POSTED]->(p:Post)
RETURN u, p
// Plan: OptionalExpand (u)-[:POSTED]->(p:Post)

Filtering and Processing

// Filter - Apply WHERE conditions
EXPLAIN
MATCH (u:User)
WHERE u.age >= 18 AND u.verified = true
RETURN u
// Plan: Filter (u.age >= 18 AND u.verified = true)

// Sort - ORDER BY clause
EXPLAIN
MATCH (u:User)
RETURN u.name
ORDER BY u.created_at DESC
// Plan: Sort (u.created_at DESC)

// Limit - LIMIT clause
EXPLAIN
MATCH (u:User)
RETURN u
LIMIT 10
// Plan: Limit (rows=10)

// Skip - OFFSET clause
EXPLAIN
MATCH (u:User)
RETURN u
OFFSET 20 LIMIT 10
// Plan: Limit (rows=10)
//       Skip (rows=20)

Aggregation

// Aggregate - Grouping and aggregation
EXPLAIN
MATCH (u:User)-[:POSTED]->(p:Post)
RETURN u.name, COUNT(p) AS post_count
// Plan: Aggregate (key=[u.name], agg=[COUNT(p)])

// Distinct - Remove duplicates
EXPLAIN
MATCH (u:User)-[:POSTED]->(p:Post)-[:HAS_TAG]->(t:Tag)
RETURN DISTINCT t.name
// Plan: Distinct (t.name)

PROFILE Statistics

Runtime Metrics

PROFILE
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE u.verified = true
  AND p.published = true
RETURN u.name, COUNT(p) AS posts
ORDER BY posts DESC
LIMIT 10

Example PROFILE Output:

Limit (rows=10) [actual_rows=10, time=0.2ms]
  Sort (key=posts DESC) [actual_rows=1523, time=5.4ms, memory=256KB]
    Aggregate (key=[u.name], agg=[COUNT(p)]) [actual_rows=1523, time=12.3ms]
      Filter (p.published = true) [actual_rows=42891, time=8.7ms]
        Expand (u)-[:POSTED]->(p:Post) [actual_rows=45203, time=15.2ms]
          Filter (u.verified = true) [actual_rows=8234, time=3.1ms]
            NodeLabelScan (u:User) [actual_rows=10000, time=2.8ms]
              [scanned: 10000 nodes]

Key Metrics:

actual_rows: Number of rows produced by this step
time: Time spent in this operation
memory: Memory used (for operations like Sort, Aggregate)
scanned: Number of nodes/relationships examined

Identifying Performance Issues

High Row Count Discrepancy

// Problem: Scanning many rows, returning few
PROFILE
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE u.email = 'alice@example.com'
RETURN p

// Output shows:
// Expand [...] [actual_rows=100]
//   NodeLabelScan (u:User) [actual_rows=1000000]
//
// ISSUE: Scanning 1M users to find 1 user!
// SOLUTION: Add index on User.email

Expensive Sorts

// Problem: Sorting large result sets
PROFILE
MATCH (p:Post)
RETURN p.title
ORDER BY p.created_at DESC
LIMIT 10

// Output shows:
// Limit [...] [actual_rows=10]
//   Sort [...] [actual_rows=10000000, time=5000ms, memory=2GB]
//
// ISSUE: Sorting 10M rows to get 10 results
// SOLUTION: Use index on created_at for sorted scan

Cartesian Products

// Problem: Unintended Cartesian product
PROFILE
MATCH (u:User), (p:Post)
WHERE u.location = p.location
RETURN u, p

// Output shows:
// Filter [...] [actual_rows=5000]
//   CartesianProduct [actual_rows=100000000]
//     NodeLabelScan (u:User) [actual_rows=10000]
//     NodeLabelScan (p:Post) [actual_rows=10000]
//
// ISSUE: 10K * 10K = 100M intermediate rows!
// SOLUTION: Add relationship or index-backed join

Optimization Techniques

Index Creation

// Identify missing index
PROFILE
MATCH (u:User)
WHERE u.email = 'alice@example.com'
RETURN u

// Output shows NodeLabelScan - no index!
// Plan: NodeLabelScan (u:User) [actual_rows=1000000]
//       Filter (u.email = 'alice@example.com') [actual_rows=1]

// Solution: Create index
CREATE INDEX FOR (u:User) ON (u.email)

// Verify index usage
EXPLAIN
MATCH (u:User)
WHERE u.email = 'alice@example.com'
RETURN u
// Plan now shows: NodeIndexSeek (u:User, email='alice@example.com')

Filter Pushdown

// Before optimization
PROFILE
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE u.verified = true
RETURN p

// Better: Filter early
PROFILE
MATCH (u:User)
WHERE u.verified = true
MATCH (u)-[:POSTED]->(p:Post)
RETURN p

// Best: Use inline filter
PROFILE
MATCH (u:User {verified: true})-[:POSTED]->(p:Post)
RETURN p

Limit Early

// Inefficient: Sort then limit
PROFILE
MATCH (u:User)
RETURN u.name
ORDER BY u.created_at DESC
LIMIT 10

// Efficient: Use top-N optimization
// (Geode automatically optimizes this pattern)
PROFILE
MATCH (u:User)
RETURN u.name
ORDER BY u.created_at DESC
LIMIT 10

Index-Backed Ordering

// Create index for ordering
CREATE INDEX FOR (p:Post) ON (p.created_at)

// Before index
EXPLAIN
MATCH (p:Post)
RETURN p.title
ORDER BY p.created_at DESC
// Plan: Sort (p.created_at DESC)
//       NodeLabelScan (p:Post)

// After index
EXPLAIN
MATCH (p:Post)
RETURN p.title
ORDER BY p.created_at DESC
// Plan: NodeIndexScan (p:Post, order=created_at DESC)
// No separate Sort operation needed!

Advanced Analysis

Comparing Query Alternatives

// Option 1: Aggregate then filter
EXPLAIN
MATCH (u:User)-[:POSTED]->(p:Post)
WITH u, COUNT(p) AS post_count
WHERE post_count > 10
RETURN u.name, post_count

// Option 2: Filter then aggregate
EXPLAIN
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE p.published = true
WITH u, COUNT(p) AS post_count
WHERE post_count > 10
RETURN u.name, post_count

// Compare plans to see which is more efficient

Analyzing Variable-Length Paths

// Check path expansion cost
PROFILE
MATCH path = (a:User {id: 1})-[:KNOWS*1..3]->(b:User)
RETURN b.name, LENGTH(path) AS hops

// Output shows:
// VarLengthExpand [...] [actual_rows=15234, time=234ms]
//   NodeIndexSeek (a:User, id=1) [actual_rows=1]
//
// If too expensive, consider:
// - Reducing max hops
// - Adding relationship property filters
// - Using shortest path instead

Subquery Performance

// Analyze subquery efficiency
PROFILE
MATCH (u:User)
WHERE EXISTS {
  MATCH (u)-[:POSTED]->(p:Post)
  WHERE p.likes > 100
}
RETURN u.name

// Look for Semi-Apply or Anti-Apply operators
// Plan: SemiApply
//         Expand (u)-[:POSTED]->(p:Post)
//           Filter (p.likes > 100)
//       NodeLabelScan (u:User)

Real-World Optimization Examples

Example 1: Slow User Profile Query

// Original query (slow)
PROFILE
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE u.username = 'alice'
RETURN u.name, COUNT(p) AS posts

// Plan shows:
// Aggregate [...] [actual_rows=1]
//   Expand (u)-[:POSTED]->(p) [actual_rows=100]
//     Filter (u.username = 'alice') [actual_rows=1]
//       NodeLabelScan (u:User) [actual_rows=1000000, time=500ms]

// Problem: Scanning 1M users

// Solution: Add index
CREATE INDEX FOR (u:User) ON (u.username)

// Optimized query
PROFILE
MATCH (u:User {username: 'alice'})-[:POSTED]->(p:Post)
RETURN u.name, COUNT(p) AS posts

// New plan:
// Aggregate [...] [actual_rows=1]
//   Expand (u)-[:POSTED]->(p) [actual_rows=100]
//     NodeIndexSeek (u:User, username='alice') [actual_rows=1, time=0.1ms]

// Improvement: 5000x faster!

Example 2: Expensive Recommendation Query

// Original query (very slow)
PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(other:User)
MATCH (other)-[:PURCHASED]->(rec:Product)
WHERE u.id = 123
  AND NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(other) AS similar_users
ORDER BY similar_users DESC
LIMIT 10

// Plan shows Cartesian product and late filtering

// Optimized query
PROFILE
MATCH (u:User {id: 123})-[:PURCHASED]->(p:Product)
WITH u, COLLECT(p) AS purchased
MATCH (p IN purchased)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT rec IN purchased
  AND rec <> p
RETURN rec.name, COUNT(DISTINCT other) AS similar_users
ORDER BY similar_users DESC
LIMIT 10

// New plan shows early anchoring and reduced intermediate rows

Example 3: Inefficient Sorting

// Original query
PROFILE
MATCH (p:Post)
WHERE p.category = 'tech'
RETURN p.title, p.created_at
ORDER BY p.created_at DESC
LIMIT 20

// Plan shows:
// Limit [...] [actual_rows=20]
//   Sort [...] [actual_rows=500000, time=2000ms, memory=1GB]
//     Filter (p.category = 'tech') [actual_rows=500000]
//       NodeLabelScan (p:Post) [actual_rows=5000000]

// Solution: Composite index
CREATE INDEX FOR (p:Post) ON (p.category, p.created_at)

// Optimized plan:
// Limit [...] [actual_rows=20, time=0.5ms]
//   NodeIndexScan (p:Post, category='tech', order=created_at DESC)
//     [actual_rows=20]  // Only reads 20 rows!

Client Library Integration

Python

from geode_client import Client

client = Client(host="localhost", port=3141)

async with client.connection() as conn:
    # Get EXPLAIN output
    result, _ = await conn.query("""
        EXPLAIN
        MATCH (u:User)-[:FOLLOWS]->(f:User)
        WHERE u.id = $user_id
        RETURN f.name
    """, {"user_id": 123})

    # Plan is in result metadata
    plan = result.execution_plan
    print(f"Plan: {plan}")

    # Get PROFILE statistics
    result, _ = await conn.query("""
        PROFILE
        MATCH (u:User)-[:FOLLOWS]->(f:User)
        WHERE u.id = $user_id
        RETURN f.name
    """, {"user_id": 123})

    stats = result.profile_stats
    print(f"Total time: {stats['total_time_ms']}ms")
    print(f"Rows examined: {stats['rows_examined']}")

Go

import "database/sql"
import _ "geodedb.com/geode"

db, _ := sql.Open("geode", "quic://localhost:3141")

// Execute EXPLAIN
rows, _ := db.Query(`
    EXPLAIN
    MATCH (u:User)-[:FOLLOWS]->(f:User)
    WHERE u.id = $1
    RETURN f.name
`, 123)

// Read plan from result
for rows.Next() {
    var planStep string
    rows.Scan(&planStep)
    fmt.Println(planStep)
}

// Execute PROFILE
rows, _ = db.Query(`
    PROFILE
    MATCH (u:User)-[:FOLLOWS]->(f:User)
    WHERE u.id = $1
    RETURN f.name
`, 123)

// Profile results include stats

Rust

use geode_client::Client;

let client = Client::connect("localhost:3141").await?;

// Get EXPLAIN plan
let result = client.query(
    r#"
        EXPLAIN
        MATCH (u:User)-[:FOLLOWS]->(f:User)
        WHERE u.id = $user_id
        RETURN f.name
    "#,
    &[("user_id", &123)],
).await?;

println!("Execution plan: {}", result.explain_plan().unwrap());

// Get PROFILE statistics
let result = client.query(
    r#"
        PROFILE
        MATCH (u:User)-[:FOLLOWS]->(f:User)
        WHERE u.id = $user_id
        RETURN f.name
    "#,
    &[("user_id", &123)],
).await?;

let stats = result.profile_stats().unwrap();
println!("Total time: {}ms", stats.total_time_ms);
println!("Rows examined: {}", stats.rows_examined);

Best Practices

Always EXPLAIN First: Before optimizing, understand the current plan
Create Indexes Strategically: Add indexes for frequently filtered properties
Use PROFILE for Real Data: Estimated costs can differ from actual performance
Compare Alternatives: Use EXPLAIN to compare different query formulations
Monitor Production: Track query performance over time as data grows
Index Selectivity: Prioritize indexes on high-selectivity properties
Avoid Over-Indexing: Too many indexes slow down writes
Composite Indexes: Use for common multi-property filters and ordering

Common Plan Patterns

Good Plans

// Index seek with minimal rows
NodeIndexSeek (u:User, email='...') [actual_rows=1]
  Expand (u)-[:POSTED]->(p) [actual_rows=10]

// Early filtering
Filter (u.verified = true) [actual_rows=1000]
  NodeLabelScan (u:User) [actual_rows=1500]

// Index-backed ordering
NodeIndexScan (p:Post, order=created_at DESC) [actual_rows=10]

Problematic Plans

// Full table scan
NodeLabelScan (u:User) [actual_rows=10000000]
  Filter (u.email = '...') [actual_rows=1]

// Cartesian product
CartesianProduct [actual_rows=100000000]

// Large sort
Sort [...] [actual_rows=5000000, memory=10GB]

// Unbounded expansion
VarLengthExpand (...)-[:KNOWS*]->(...) [actual_rows=50000000]

Troubleshooting Guide

Query Taking Too Long

Run PROFILE to identify bottleneck
Look for high-cost operations (Sort, Filter with many rows)
Check if indexes are being used
Consider query restructuring

High Memory Usage

Look for Sort operators on large result sets
Check Aggregate operations
Consider pagination (LIMIT/OFFSET)
Use streaming in client libraries

Unexpected Results

Check actual_rows vs. expected
Verify filter conditions are applied
Look for Cartesian products
Ensure correct relationship directions

Query Language: GQL query syntax and patterns
Indexing: Creating and managing indexes
Performance: Query optimization strategies
MATCH Clause: Pattern matching fundamentals
Monitoring: Production query monitoring

What is EXPLAIN? Share link

Basic Usage Share link

EXPLAIN Syntax Share link

PROFILE Syntax Share link

Understanding Execution Plans Share link

Plan Structure Share link

Common Plan Operators Share link

Index Operations Share link

Relationship Traversal Share link

Filtering and Processing Share link

Aggregation Share link

PROFILE Statistics Share link

Runtime Metrics Share link

Identifying Performance Issues Share link

High Row Count Discrepancy Share link

Expensive Sorts Share link

Cartesian Products Share link

Optimization Techniques Share link

Index Creation Share link

Filter Pushdown Share link

Limit Early Share link

Index-Backed Ordering Share link

Advanced Analysis Share link

Comparing Query Alternatives Share link

Analyzing Variable-Length Paths Share link

Subquery Performance Share link

Real-World Optimization Examples Share link

Example 1: Slow User Profile Query Share link

Example 2: Expensive Recommendation Query Share link

Example 3: Inefficient Sorting Share link

Client Library Integration Share link

Python Share link

Go Share link

Rust Share link

Best Practices Share link

Common Plan Patterns Share link

Good Plans Share link

Problematic Plans Share link

Troubleshooting Guide Share link

Query Taking Too Long Share link

High Memory Usage Share link

Unexpected Results Share link

Related Topics Share link

Further Reading Share link

Related Articles

Querying, Indexing, and Query Optimization

Query Performance Tuning Guide

Query Optimization

Query Profiling Guide

EXPLAIN Command