Query Profiling with PROFILE Command

Query profiling is essential for understanding and optimizing database performance. Geode’s PROFILE command provides comprehensive execution metrics, timing information, and resource usage statistics to help you identify bottlenecks and optimize query performance.

Understanding PROFILE

What is Query Profiling?

Query profiling captures detailed runtime information about query execution:

  • Execution Time: Time spent in each query phase
  • Row Counts: Number of rows processed at each step
  • Index Usage: Which indexes were used and how effectively
  • Memory Usage: Memory allocated during execution
  • Cache Hits: Cache effectiveness and hit rates
  • Lock Statistics: Locking behavior and contention

PROFILE vs EXPLAIN

While both commands analyze queries, they serve different purposes:

EXPLAIN:

  • Shows the query execution plan
  • Runs without executing the query
  • Displays estimated costs and strategies
  • Used for planning and design

PROFILE:

  • Executes the query and collects actual metrics
  • Shows real execution times and row counts
  • Reveals actual resource usage
  • Used for performance tuning

Use EXPLAIN during development, PROFILE during optimization.

Basic PROFILE Usage

Simple Profiling

PROFILE
MATCH (u:User)
WHERE u.age > 25
RETURN u.name, u.email;

Output:

+-------------------------+----------+----------+----------+
| Operation               | Rows     | Time(ms) | DB Hits  |
+-------------------------+----------+----------+----------+
| ProduceResults          | 1,523    | 2.3      | 0        |
| Projection              | 1,523    | 1.8      | 3,046    |
| Filter                  | 1,523    | 4.5      | 5,000    |
| NodeLabelScan(User)     | 5,000    | 8.1      | 5,000    |
+-------------------------+----------+----------+----------+
| Total                   | -        | 16.7     | 13,046   |
+-------------------------+----------+----------+----------+

Understanding Profile Output

Each row represents an operation in the query execution:

  • Operation: The type of operation performed
  • Rows: Actual number of rows produced by this operation
  • Time(ms): Milliseconds spent in this operation
  • DB Hits: Number of database lookups performed

Complex Query Profiling

PROFILE
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.created > '2025-01-01'
  AND f.verified = true
WITH u, COUNT(f) AS followCount
WHERE followCount > 10
RETURN u.name, followCount
ORDER BY followCount DESC
LIMIT 20;

Output:

+-------------------------+----------+----------+----------+
| Operation               | Rows     | Time(ms) | DB Hits  |
+-------------------------+----------+----------+----------+
| Top(20)                 | 20       | 0.1      | 0        |
| Projection              | 45       | 0.2      | 90       |
| Filter(followCount>10)  | 45       | 0.3      | 0        |
| Aggregation             | 145      | 12.5     | 0        |
| Filter(verified)        | 892      | 3.2      | 892      |
| Expand(FOLLOWS)         | 2,345    | 18.7     | 2,345    |
| Filter(created)         | 1,234    | 4.1      | 1,234    |
| NodeLabelScan(User)     | 5,000    | 8.9      | 5,000    |
+-------------------------+----------+----------+----------+
| Total                   | -        | 48.0     | 9,561    |
+-------------------------+----------+----------+----------+

Profiling Patterns

Index Usage Analysis

Profile queries to verify index usage:

-- Without index
PROFILE
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;

-- Output shows full scan:
-- NodeLabelScan(User): 50,000 rows, 125ms

-- Create index
CREATE INDEX user_email_idx ON :User(email);

-- With index
PROFILE
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;

-- Output shows index usage:
-- NodeIndexSeek(User.email): 1 row, 0.8ms

The dramatic difference in rows processed and time confirms the index is working.

Join Performance

Profile relationship traversals:

PROFILE
MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)
WHERE c.name = 'Electronics'
RETURN p.name, p.price;

Analysis Points:

  • Check if Category filter happens before or after expansion
  • Verify index usage on Category.name
  • Compare row counts before and after filtering
  • Identify if relationship direction affects performance

Aggregation Performance

PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH p.category AS category,
     COUNT(*) AS purchases,
     SUM(p.price) AS revenue
RETURN category, purchases, revenue
ORDER BY revenue DESC;

Watch For:

  • Memory usage in aggregation step
  • Sorting overhead with large result sets
  • Whether grouping keys are indexed
  • Impact of multiple aggregation functions

Subquery Profiling

PROFILE
MATCH (u:User)
WHERE EXISTS {
  MATCH (u)-[:POSTED]->(post:Post)
  WHERE post.likes > 100
}
RETURN u.name;

Profile shows subquery execution for each user, helping identify expensive existence checks.

Performance Optimization Workflow

1. Baseline Profiling

Establish current performance:

PROFILE
MATCH (u:User)-[:FRIEND]->(f:User)
WHERE u.city = 'Seattle'
RETURN u.name, COUNT(f) AS friendCount;

Record baseline metrics:

  • Total execution time: 245ms
  • DB Hits: 125,000
  • Rows scanned: 50,000

2. Identify Bottlenecks

Look for:

  • High Time: Operations taking disproportionate time
  • High DB Hits: Excessive database lookups
  • Large Row Counts: Inefficient filtering order
  • Full Scans: Missing indexes

In our example, NodeLabelScan(User) processes 50,000 rows when only 500 are in Seattle.

3. Apply Optimizations

Create index on frequently filtered property:

CREATE INDEX user_city_idx ON :User(city);

4. Re-Profile

PROFILE
MATCH (u:User)-[:FRIEND]->(f:User)
WHERE u.city = 'Seattle'
RETURN u.name, COUNT(f) AS friendCount;

New metrics:

  • Total execution time: 18ms (13.6x faster)
  • DB Hits: 1,200 (104x fewer)
  • Rows scanned: 500 (100x fewer)

5. Iterate

Continue profiling and optimizing until performance meets requirements.

Advanced Profiling Techniques

Cache Analysis

Profile with cache statistics:

PROFILE
MATCH (u:User)
WHERE u.id = $userId
RETURN u;

-- Second execution shows cache hits:
-- NodeIndexSeek: 1 row, 0.1ms (cached)

Memory Profiling

Monitor memory-intensive operations:

PROFILE
MATCH (u:User)
WITH u
ORDER BY u.name
RETURN u;

Large sorts show memory allocation in profile output.

Concurrent Query Profiling

Profile queries under load:

# Run multiple instances
for i in {1..10}; do
  geode shell -e "PROFILE MATCH (u:User) RETURN COUNT(u);" &
done

Compare single-query vs concurrent execution times.

Pattern Comparison

Profile different approaches to the same query:

Approach 1: Filter then expand

PROFILE
MATCH (u:User)
WHERE u.active = true
MATCH (u)-[:FOLLOWS]->(f:User)
RETURN u.name, f.name;

Approach 2: Expand then filter

PROFILE
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.active = true
RETURN u.name, f.name;

Compare total execution times and choose the faster approach.

Profiling Best Practices

Development Workflow

  1. Profile Early: Don’t wait until production
  2. Profile Often: Check performance after each change
  3. Profile Realistic Data: Use production-scale test data
  4. Profile Edge Cases: Test with empty, small, and large datasets
  5. Profile Under Load: Test concurrent execution

Interpreting Results

Red Flags:

  • Any operation > 100ms without data transfer
  • DB Hits » Row count (inefficient access)
  • Large row count early in pipeline (filter late)
  • Multiple full table scans

Green Flags:

  • Index seeks instead of scans
  • Filtering early in pipeline
  • Minimal DB hits per row
  • Consistent performance across executions

Common Patterns

Pattern: Early Filtering

-- GOOD: Filter first
MATCH (u:User)
WHERE u.role = 'admin'  -- Reduces rows early
MATCH (u)-[:MANAGES]->(t:Team)
RETURN u.name, t.name;

-- BAD: Filter late
MATCH (u:User)-[:MANAGES]->(t:Team)
WHERE u.role = 'admin'  -- Expands all users first
RETURN u.name, t.name;

Pattern: Index Coverage

-- Ensure indexes cover WHERE and JOIN conditions
CREATE INDEX user_role_idx ON :User(role);
CREATE INDEX team_manager_idx ON ()-[:MANAGES]->();

Pattern: Selective Properties

-- GOOD: Return only needed properties
RETURN u.name, u.email;

-- BAD: Return entire node
RETURN u;  -- Transfers unnecessary data

Profiling in Production

Safe Production Profiling

-- Profile with read-only queries
PROFILE
MATCH (u:User)
WHERE u.lastLogin > '2025-01-01'
RETURN COUNT(u);

-- NEVER profile write-heavy queries in production
-- (they execute and modify data)

Profiling via Monitoring

Integrate profiling into monitoring:

# Python example
import geode_client

client = geode_client.open_database()

async with client.connection() as conn:
    result, _ = await conn.query("""
        PROFILE
        MATCH (u:User)-[:PURCHASED]->(p:Product)
        WHERE u.id = ?
        RETURN p.name
    """, user_id)

    # Log profile metrics
    metrics = result.profile_data
    logger.info(f"Query time: {metrics.total_time}ms")
    logger.info(f"DB hits: {metrics.db_hits}")

Continuous Profiling

Set up automated profiling:

#!/bin/bash
# profile-queries.sh

while read -r query; do
  echo "Profiling: $query"
  geode shell -e "PROFILE $query" >> profile-log.txt
  sleep 1
done < critical-queries.txt

Troubleshooting with PROFILE

Slow Queries

Symptom: Query takes > 1 second

Diagnosis:

PROFILE
MATCH (u:User)-[:FRIEND*1..3]->(f:User)
WHERE u.id = 'user123'
RETURN f.name;

Analysis:

  • Check if variable-length expansion is too deep
  • Verify starting point uses index
  • Consider limiting depth or adding filters

High Memory Usage

Symptom: Out of memory errors

Diagnosis:

PROFILE
MATCH (u:User)
WITH u
ORDER BY u.name  -- Materializes all users
RETURN u
LIMIT 100;

Solution:

-- Use index for ordering
CREATE INDEX user_name_idx ON :User(name);

PROFILE
MATCH (u:User)
RETURN u
ORDER BY u.name
LIMIT 100;  -- Now uses index scan, no materialization

Unexpected Results

Symptom: Wrong number of rows returned

Diagnosis:

PROFILE
MATCH (u:User)-[:FOLLOWS]->(f:User)
WHERE u.city = 'Seattle'
RETURN COUNT(DISTINCT f);

Profile shows row counts at each step, revealing where rows are added/removed unexpectedly.

Integration with Development Tools

IDE Integration

Many IDEs support profiling output:

-- Visual Studio Code with Geode extension
-- Hover over PROFILE results for visual execution plan

CI/CD Profiling

Integrate profiling into tests:

// Go example
func TestQueryPerformance(t *testing.T) {
    result := db.Profile(`
        MATCH (u:User)
        WHERE u.active = true
        RETURN COUNT(u)
    `)

    if result.TotalTime > 100 {
        t.Errorf("Query too slow: %dms", result.TotalTime)
    }
}

Profiling Dashboards

Visualize profiling data:

# Grafana dashboard example
import grafana_api

def send_profile_metrics(profile_data):
    grafana_api.send({
        'query_time': profile_data.total_time,
        'db_hits': profile_data.db_hits,
        'rows_processed': profile_data.total_rows
    })
  • EXPLAIN : Query execution plans without execution
  • Performance : General performance optimization
  • Indexing : Index strategies for better performance
  • Optimization : Query optimization techniques
  • Benchmarking : Performance measurement and comparison
  • Monitoring : Production monitoring and observability
  • Caching : Query result caching strategies

Further Reading

  • PROFILE Command Reference: /docs/gql-reference/profile/
  • Performance Tuning Guide: /docs/performance/tuning-guide/
  • Index Optimization: /docs/performance/index-optimization/
  • Query Best Practices: /docs/best-practices/query-optimization/
  • Monitoring Guide: /docs/operations/monitoring/

Conclusion

The PROFILE command is your most powerful tool for understanding and optimizing query performance in Geode. By systematically profiling queries, identifying bottlenecks, and applying targeted optimizations, you can achieve dramatic performance improvements.

Remember:

  • Profile early and often during development
  • Compare before and after optimization metrics
  • Focus on the biggest bottlenecks first
  • Use realistic data volumes for profiling
  • Integrate profiling into your CI/CD pipeline

Master query profiling to build high-performance graph applications with Geode.


Related Articles