Performance Profiling

Performance profiling is the systematic analysis of query execution to identify bottlenecks, optimize resource usage, and improve response times. In Geode, profiling capabilities provide detailed insights into how GQL queries execute, enabling data-driven optimization decisions for graph database workloads.

Understanding Performance Profiling

Performance profiling reveals the internal execution characteristics of database queries, exposing where time is spent and resources are consumed. For graph databases like Geode, profiling becomes particularly important due to the complex traversal patterns and relationship-heavy operations that can exhibit non-linear performance characteristics.

Why Profile Graph Queries

Graph queries differ fundamentally from relational queries. A graph traversal can touch a few nodes or millions, depending on data density and relationship patterns. Without profiling, developers operate blind to execution efficiency, potentially deploying queries that perform well on test data but degrade dramatically in production.

Profiling provides concrete metrics on execution time, rows processed, index utilization, and memory consumption. This empirical evidence guides optimization efforts toward the operations that actually impact performance, rather than premature optimization based on assumptions.

Profiling vs. Explaining

Geode provides two complementary analysis tools: EXPLAIN shows the query execution plan without running the query, while PROFILE executes the query and provides actual runtime metrics. EXPLAIN helps understand what the optimizer intends to do, while PROFILE reveals what actually happened during execution.

The PROFILE Command

Geode’s PROFILE command executes a query while collecting detailed performance metrics.

Basic Profiling

The simplest profiling workflow prefixes any GQL query with the PROFILE keyword:

PROFILE MATCH (n:Person)
WHERE n.age > 30
RETURN n.name, n.age;

This executes the query normally but returns additional metadata about execution characteristics, including:

Total execution time
Rows examined vs. rows returned
Index usage statistics
Memory allocation patterns
Join and traversal metrics

Interpreting Profile Output

Profile output includes multiple sections, each providing specific insights into query execution.

The execution summary shows overall timing:

Execution Time: 45.2ms
Rows Examined: 15,420
Rows Returned: 3,847
Index Scans: 1 (Person.age)
Full Scans: 0

This immediately reveals efficiency: the query examined 15,420 rows to return 3,847, so about 25% of the examined rows reached the result. The rows were read through the index on Person.age, and no full scan was needed.
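The ratio of returned to examined rows can be computed directly from the summary. The following is a minimal Python sketch, assuming the summary arrives as plain "Key: value" lines like the sample above; that layout is inferred from the example, and `parse_summary` and `selectivity` are hypothetical helper names rather than part of Geode.

```python
import re

# Summary text copied from the example above. The "Key: value" layout is
# assumed from that sample, not taken from a documented Geode format.
SUMMARY = """\
Execution Time: 45.2ms
Rows Examined: 15,420
Rows Returned: 3,847
Index Scans: 1 (Person.age)
Full Scans: 0
"""


def parse_summary(text: str) -> dict[str, float]:
    """Extract the leading number from each 'Key: value' line."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        match = re.match(r"\s*(\d[\d,]*(?:\.\d+)?)", value)
        if sep and match:
            # Drop thousands separators before converting.
            metrics[key.strip()] = float(match.group(1).replace(",", ""))
    return metrics


def selectivity(metrics: dict[str, float]) -> float:
    """Fraction of examined rows that reached the result set."""
    examined = metrics["Rows Examined"]
    return metrics["Rows Returned"] / examined if examined else 0.0


metrics = parse_summary(SUMMARY)
ratio = selectivity(metrics)
print(f"{ratio:.1%} of examined rows were returned")  # prints: 24.9% of examined rows were returned
```

A ratio close to 100% means almost every examined row was useful; a very low ratio points to work that a more selective index or an earlier filter could avoid.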

Detailed Execution Steps

More detailed profiling breaks down execution into individual operations:

1. Index Scan (Person.age > 30)
   Time: 12.3ms
   Rows: 15,420

2. Property Access (n.name, n.age)
   Time: 18.7ms
   Rows: 15,420

3. Result Materialization
   Time: 14.2ms
   Rows: 3,847

This granular breakdown identifies property access as the largest time consumer (18.7ms of 45.2ms, about 41%), which suggests reducing the number of projected properties or indexing them.
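Ranking the steps by their share of total time makes the bottleneck explicit. This is a small sketch in Python that uses the three timings from the breakdown above; `time_shares` is a hypothetical helper, and the numbers are typed in by hand rather than read from Geode.

```python
# (operation, elapsed ms) pairs copied from the breakdown above.
steps = [
    ("Index Scan", 12.3),
    ("Property Access", 18.7),
    ("Result Materialization", 14.2),
]


def time_shares(steps):
    """Return (name, ms, share of total) tuples sorted by descending time."""
    total = sum(ms for _, ms in steps)
    ranked = sorted(steps, key=lambda step: step[1], reverse=True)
    return [(name, ms, ms / total) for name, ms in ranked]


shares = time_shares(steps)
for name, ms, share in shares:
    print(f"{name:<24}{ms:>6.1f} ms {share:>7.1%}")
```

The first entry of `shares` is the step worth optimizing first; the remaining entries show how much could still be gained afterwards.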

Profiling Graph Traversals

Graph traversals present unique profiling challenges due to their recursive and relationship-oriented nature.

Simple Traversal Profiling

A basic friend-of-friend query demonstrates traversal profiling:

PROFILE MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend)-[:KNOWS]->(fof)
RETURN DISTINCT fof.name;

Profile output reveals traversal efficiency:

Execution Time: 156.8ms
Traversal Depth: 2
Starting Nodes: 1 (index lookup)
Intermediate Nodes: 47 (first hop)
Final Nodes: 892 (second hop)
Distinct Results: 623

The expanding traversal from 1 to 47 to 892 nodes shows typical graph “fan-out” behavior. The DISTINCT operation reduced 892 nodes to 623 unique results, indicating some overlap in friend networks.
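The fan-out per hop and the share of duplicates removed by DISTINCT follow from the counts in the profile output. A minimal Python sketch, with the counts copied by hand from the example above and `fan_out` as a hypothetical helper name:

```python
# Node counts per hop, taken from the profile output above.
frontier = [1, 47, 892]
distinct_results = 623


def fan_out(frontier):
    """Average number of nodes reached per node at each hop."""
    return [nxt / cur for cur, nxt in zip(frontier, frontier[1:])]


factors = fan_out(frontier)                        # expansion factor per hop
duplicates = 1 - distinct_results / frontier[-1]   # share removed by DISTINCT
print([round(f, 1) for f in factors], f"{duplicates:.0%} duplicates")
```

Here the first hop expands by a factor of 47 and the second by about 19, and roughly 30% of the second-hop nodes were reached more than once.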

Variable-Length Path Profiling

Variable-length paths can have unpredictable performance:

PROFILE MATCH path = (p:Person {name: 'Alice'})-[:KNOWS*1..3]->(connected)
RETURN connected.name, length(path) AS distance;

Profiling variable-length paths reveals how many paths were explored:

Execution Time: 2,341ms
Paths Explored: 45,672
Paths Matched: 1,124
Average Path Length: 1.8
Max Path Length: 3
Traversal Strategy: Breadth-First

The large gap between explored (45,672) and matched (1,124) paths, roughly 41 explored per match, means most of the traversal work was discarded. If this ratio is too high, constraining the search with additional filters or a tighter depth bound can improve performance.
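To see why the depth bound matters so much, it helps to estimate how the number of candidate paths grows with depth. The Python sketch below assumes a uniform branching factor, which real graphs rarely have; the value 35 is an invented illustration, and `paths_up_to` is a hypothetical helper.

```python
def paths_up_to(branching: float, max_depth: int) -> float:
    """Upper bound on paths of length 1..max_depth for a uniform branching factor."""
    return sum(branching ** depth for depth in range(1, max_depth + 1))


explored, matched = 45_672, 1_124   # from the profile output above
waste = explored / matched          # paths explored per path kept

# Hypothetical branching factor, chosen only to show the growth rate.
for depth in (1, 2, 3, 4):
    print(depth, int(paths_up_to(35, depth)))
```

Each extra level of depth multiplies the candidate count by roughly the branching factor, so lowering the upper bound from 3 to 2 usually saves far more work than any filter applied after the traversal.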

Traversal Strategy Analysis

Geode’s optimizer chooses traversal strategies based on query patterns. Profiling reveals which strategy was selected:

Traversal Strategy: Depth-First Search
Reasoning: Low branching factor detected
Nodes Visited: 234
Backtracking Events: 12

Understanding the chosen strategy helps evaluate whether the optimizer made optimal decisions for the actual data distribution.

Profiling Aggregations

Aggregation queries require profiling to understand grouping and computation costs.

Group By Performance

Profiling a grouped aggregation:

PROFILE MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, count(p) AS employees, avg(p.salary) AS avg_salary
ORDER BY employees DESC;

Profile output breaks down aggregation costs:

Execution Time: 234.5ms
Traversal Time: 89.2ms (Person -> Company)
Grouping Time: 78.3ms (12 groups)
Aggregation Time: 45.6ms (count, avg)
Sorting Time: 21.4ms
Rows Processed: 8,456
Groups Created: 12

The grouping operation consumed the most time after traversal. With only 12 groups from 8,456 rows, the grouping is highly selective, which is efficient.

Complex Aggregation Analysis

Nested aggregations or aggregations over traversals can be expensive:

PROFILE MATCH (p:Person)-[:KNOWS]->(friend)-[:LIKES]->(product:Product)
RETURN p.name,
       count(DISTINCT friend) AS friend_count,
       collect(DISTINCT product.name) AS liked_products;

Profile output reveals collection costs:

Execution Time: 1,567ms
Distinct Operations: 2 (friend, product)
Collections Built: 847 (one per person)
Collection Avg Size: 12.3 products
Memory Allocated: 23.4MB

The memory allocation metric becomes critical for large result sets, indicating potential memory pressure.

Index Usage Profiling

Indexes dramatically affect performance, making index usage profiling essential.

Index Scan Detection

Profiling reveals whether indexes were used:

PROFILE MATCH (p:Person)
WHERE p.email = 'alice@example.com'
RETURN p;

Ideal profile output shows index usage:

Execution Time: 2.3ms
Index Used: Person.email (unique)
Index Scan Time: 1.8ms
Rows Examined: 1
Rows Returned: 1

Without an index, the profile would show:

Execution Time: 145.7ms
Full Table Scan: Person
Rows Examined: 15,420
Rows Returned: 1

The difference (2.3ms vs. 145.7ms, roughly 63 times faster) demonstrates the value of the index.

Composite Index Analysis

Composite indexes require specific query patterns for optimal use:

PROFILE MATCH (p:Person)
WHERE p.city = 'San Francisco' AND p.age > 30
RETURN p;

Profile output indicates index utilization:

Index Used: Person(city, age) - Full
Reasoning: Both predicates match index
Rows Examined: 847
Rows Returned: 847

If only part of the composite index is used:

Index Used: Person(city, age) - Partial (city only)
Reasoning: age predicate not in index-compatible form
Rows Examined: 2,341
Rows Returned: 847

This indicates an optimization opportunity: restructure the query or the index so that both predicates can be served by the index.
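One way to picture full versus partial use is to model the composite index as a sorted list of (city, age) keys. The Python sketch below is an illustration only, not Geode's index implementation; the data, `full_use`, and `partial_use` are all made up for the example.

```python
import bisect

# A composite index modeled as sorted (city, age) keys; the data is invented.
ages = range(20, 60, 5)  # 20, 25, ..., 55
keys = sorted(
    (city, age)
    for city in ("Boston", "San Francisco", "Seattle")
    for age in ages
)


def full_use(keys, city, min_age):
    """Both key parts drive the seek: jump past (city, min_age), stop at the end of the city."""
    lo = bisect.bisect_right(keys, (city, min_age))
    hi = bisect.bisect_right(keys, (city, float("inf")))
    return keys[lo:hi], hi - lo          # matching keys, entries examined


def partial_use(keys, city, min_age):
    """Only the city part drives the seek; the age filter runs on every entry for the city."""
    lo = bisect.bisect_left(keys, (city,))
    hi = bisect.bisect_right(keys, (city, float("inf")))
    examined = keys[lo:hi]
    return [k for k in examined if k[1] > min_age], len(examined)


full_rows, full_examined = full_use(keys, "San Francisco", 30)
part_rows, part_examined = partial_use(keys, "San Francisco", 30)
print(full_examined, part_examined)
```

Both functions return the same rows, but the partial variant examines every entry for the city, which mirrors the gap between rows examined and rows returned in the second profile output.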

Profiling Join Operations

Graph joins, particularly between multiple patterns, require profiling to optimize join strategies.

Pattern Join Profiling

Multiple MATCH patterns create implicit joins:

PROFILE MATCH (p:Person)-[:WORKS_AT]->(c:Company)
MATCH (p)-[:LIVES_IN]->(city:City)
WHERE c.industry = 'Technology'
  AND city.name = 'San Francisco'
RETURN p.name;

Profile output shows join execution:

Execution Time: 456.8ms

Pattern 1: Person-WORKS_AT->Company
  Index: Company.industry
  Rows: 3,421 persons

Pattern 2: Person-LIVES_IN->City
  Index: City.name
  Rows: 15,234 persons

Join Strategy: Hash Join on Person.id
  Left Input: 3,421
  Right Input: 15,234
  Join Output: 892
  Join Time: 234.5ms

The join strategy and timing help evaluate whether query restructuring could improve performance.
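For readers unfamiliar with the strategy named in the output, a hash join builds a lookup table from one input and probes it with the other. The following Python sketch shows the general technique on toy rows; it is not Geode's implementation, and the row layout and `hash_join` name are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on `key`, building the table on the smaller input."""
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = defaultdict(list)
    for row in build:                       # build phase: one pass over the smaller side
        table[row[key]].append(row)
    joined = []
    for row in probe:                       # probe phase: one lookup per row
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Toy rows standing in for the two pattern results; the ids are made up.
works_in_tech = [{"person_id": i, "industry": "Technology"} for i in range(0, 10)]
lives_in_sf = [{"person_id": i, "city": "San Francisco"} for i in range(5, 40)]
result = hash_join(works_in_tech, lives_in_sf, "person_id")
print(len(result))
```

Building on the smaller input keeps the hash table small, which is why the relative sizes of the left and right inputs in the profile output are worth checking.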

Memory Profiling

Understanding memory consumption prevents out-of-memory errors and identifies optimization opportunities.

Materialization Costs

Some operations require materializing intermediate results:

PROFILE MATCH (p:Person)-[:KNOWS*2..4]->(connected)
RETURN DISTINCT connected.name
ORDER BY connected.name;

Memory profiling reveals:

Execution Time: 3,421ms
Peak Memory: 156.7MB
Memory Breakdown:
  - Path Exploration: 89.2MB
  - Distinct Set: 45.3MB
  - Sort Buffer: 22.2MB

Large memory consumption may indicate the need for query restructuring or result streaming.

Collection Memory Usage

COLLECT operations can consume significant memory:

PROFILE MATCH (p:Person)-[:PURCHASED]->(product:Product)
RETURN p.name, collect(product) AS purchases;

Memory profiling shows collection sizes:

Collections Created: 8,234
Average Collection Size: 15.6 items
Peak Memory: 67.8MB

If individual collections become very large, consider pagination or limiting collection size.

Comparative Profiling

Comparing alternative query formulations identifies the most efficient approach.

Query Variant Comparison

Two queries achieving the same result can have different performance:

Version 1:

PROFILE MATCH (p:Person)-[:KNOWS]->(friend)
WHERE friend.city = 'Boston'
RETURN p.name, collect(friend.name) AS boston_friends;

Version 2:

PROFILE MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE friend.city = 'Boston'
RETURN p.name, [f IN collect(friend) | f.name] AS boston_friends;

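Version 1 collects only the name property, while Version 2 adds a label constraint on friend and collects whole nodes before extracting their names. Comparing the two profile outputs shows how label filtering, collection strategy, and property access patterns each affect cost.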

Profiling Best Practices

Effective profiling follows systematic approaches.

Establish Baselines

Before optimization, profile current performance to establish baselines:

-- Profile current query
PROFILE MATCH (p:Person)-[:FRIEND*1..3]->(connected)
WHERE p.city = 'Seattle'
RETURN count(DISTINCT connected);

Record baseline metrics:

Execution time: 2,341ms
Rows examined: 145,623
Memory used: 78.2MB

Isolate Variables

When testing optimizations, change one variable at a time:

Profile original query
Add index and profile again
Compare metrics to isolate index impact
Try alternative formulation and profile
Compare all variants
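Once each variant has been profiled, the comparison can be reduced to a percent change per metric. The Python sketch below reuses the baseline metrics recorded earlier in this section; the two variant rows are invented placeholders that only demonstrate the calculation, and `percent_change` is a hypothetical helper.

```python
# Baseline values come from the metrics recorded above. The variant numbers
# are invented placeholders to show the comparison, not measured results.
baseline = {"time_ms": 2341.0, "rows_examined": 145_623, "memory_mb": 78.2}
variants = {
    "with index": {"time_ms": 1200.0, "rows_examined": 60_000, "memory_mb": 75.0},
    "depth 1..2": {"time_ms": 400.0, "rows_examined": 9_500, "memory_mb": 12.0},
}


def percent_change(baseline, candidate):
    """Relative change per metric; negative means the candidate is lower."""
    return {k: (candidate[k] - v) / v * 100 for k, v in baseline.items()}


report = {name: percent_change(baseline, m) for name, m in variants.items()}
for name, deltas in report.items():
    print(name, {k: f"{d:+.1f}%" for k, d in deltas.items()})
```

Keeping one row per change makes it clear which single variable produced which improvement.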

Profile Representative Data

Always profile against production-like data volumes and distributions. Test data that is too small or too uniform can give misleading profiling results.

Profile Under Load

Query performance can degrade under concurrent load. Profile in realistic multi-user scenarios:

# Run 10 concurrent profiling sessions
for i in {1..10}; do
  ./geode shell < profile_queries.gql &
done
wait  # block until every background session has finished
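Under load, a single timing is less informative than the latency distribution. The Python sketch below runs a query from several threads and reports median and 95th-percentile latency. `run_query` is a stand-in that only sleeps; in practice it would be replaced by a call into whichever Geode client is in use, and all names here are assumptions.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query: str) -> None:
    """Stand-in for a real client call; replace with your driver's execute method."""
    time.sleep(0.01)


def timed(query: str) -> float:
    """Run one query and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    run_query(query)
    return (time.perf_counter() - start) * 1000


def load_test(query: str, workers: int = 10, runs_per_worker: int = 5) -> dict:
    """Run the query from several threads at once and summarize latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, [query] * workers * runs_per_worker))
    cuts = statistics.quantiles(latencies, n=20)  # cuts[18] is the 95th percentile
    return {
        "count": len(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": cuts[18],
    }


summary = load_test("MATCH (n:Person) WHERE n.age > 30 RETURN n.name, n.age")
print(summary)
```

A p95 that grows much faster than the median as `workers` increases is a sign of contention rather than of an inherently slow query.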

Profiling Tools and Techniques

Beyond the PROFILE command, additional techniques provide profiling insights.

Server-Side Logging

Configure Geode to log slow queries automatically:

geode serve \
  --slow-query-log \
  --slow-query-threshold 1000  # Log queries >1s

This captures problematic queries in production without manual profiling.

Client-Side Profiling

Client libraries can measure round-trip time:

start := time.Now()
result, err := conn.Execute(ctx, query)
elapsed := time.Since(start)
log.Printf("Query took %v", elapsed)

This includes network latency and client processing, complementing server-side profiling.

Automated Profiling

Integrate profiling into CI/CD pipelines to detect performance regressions:

# GitLab CI example
test_performance:
  script:
    - ./run_profile_suite.sh
    - ./compare_to_baseline.sh
  artifacts:
    paths:
      - profile_results.json
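The comparison step in such a pipeline can be quite small. The following Python sketch shows one possible shape for it, assuming both the baseline and the current results are JSON objects that map a query name to its execution time in milliseconds. That file format, the query names, and `find_regressions` are hypothetical; the contents of the scripts named above are not shown in this article.

```python
import json


def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Return queries whose execution time grew by more than `tolerance`."""
    regressions = []
    for name, base_ms in baseline.items():
        now_ms = current.get(name)
        if now_ms is not None and now_ms > base_ms * (1 + tolerance):
            regressions.append(f"{name}: {base_ms:.1f}ms -> {now_ms:.1f}ms")
    return regressions


# Hypothetical file contents: query name -> execution time in ms.
baseline = json.loads('{"friends_of_friends": 156.8, "employees_by_company": 234.5}')
current = json.loads('{"friends_of_friends": 161.0, "employees_by_company": 310.2}')
failures = find_regressions(baseline, current)
print(failures)
```

In a CI job, a non-empty list would typically be turned into a non-zero exit status so that the pipeline fails. The tolerance absorbs normal run-to-run noise and should be tuned to how stable the test environment is.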

Profiling Common Patterns

Certain query patterns have predictable profiling characteristics.

Hub Node Queries

Queries involving high-degree nodes (hubs) often show performance issues:

PROFILE MATCH (hub:Person {name: 'Influencer'})<-[:FOLLOWS]-(follower)
RETURN count(follower);

Profile output reveals hub impact:

Execution Time: 5,234ms
Starting Node Degree: 1,245,678
Followers Traversed: 1,245,678

High-degree nodes require special handling, possibly through relationship sampling or pagination.
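When an approximate answer about a hub's neighbors is acceptable, sampling avoids touching every relationship downstream. The Python sketch below uses reservoir sampling (Algorithm R) on the client side over a streamed list of follower ids. It illustrates the general idea and is not a Geode feature; `follower_ids` is a stand-in for however the relationships are actually streamed.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: list[T] = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)             # fill the reservoir first
        else:
            slot = rng.randint(0, index)    # inclusive on both ends
            if slot < k:
                sample[slot] = item         # replace with probability k / (index + 1)
    return sample


def follower_ids(degree: int) -> Iterator[int]:
    """Stand-in for streaming a hub's relationships from the database."""
    return iter(range(degree))


sample = reservoir_sample(follower_ids(1_245_678), k=1_000, seed=42)
print(len(sample))
```

The sample uses constant memory regardless of the hub's degree. It suits questions such as "what do this hub's followers typically look like", but not exact counts, which still require a full traversal or a stored degree.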

Path Finding Queries

Shortest path queries have specific profiling needs:

PROFILE MATCH path = shortestPath((a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Bob'}))
RETURN length(path);

Profile output shows search efficiency:

Execution Time: 234ms
Algorithm: Bidirectional Dijkstra
Nodes Explored: 1,247
Path Length: 4

The nodes explored metric indicates search efficiency: a high count relative to the path length suggests that the endpoints are far apart or that the search had to fan out widely before the two sides met.
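To make the "nodes explored" metric concrete, the Python sketch below searches from both endpoints at once and counts every node it discovers. It uses bidirectional breadth-first search on an unweighted, undirected toy graph, which is a simpler relative of the bidirectional Dijkstra named in the output and not Geode's implementation. The graph and the function name are invented for illustration.

```python
from collections import deque


def bidirectional_bfs(graph, source, target):
    """Return (path length, nodes discovered) in an unweighted, undirected graph."""
    if source == target:
        return 0, 1
    dist_a, dist_b = {source: 0}, {target: 0}
    queue_a, queue_b = deque([source]), deque([target])
    while queue_a and queue_b:
        # Expand the smaller frontier to keep the two searches balanced.
        if len(queue_a) <= len(queue_b):
            queue, dist, other = queue_a, dist_a, dist_b
        else:
            queue, dist, other = queue_b, dist_b, dist_a
        best = None
        for _ in range(len(queue)):             # expand one full level
            node = queue.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor in other:           # the two searches met
                    length = dist[node] + 1 + other[neighbor]
                    best = length if best is None else min(best, length)
                elif neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        if best is not None:
            return best, len(dist_a) + len(dist_b)
    return None, len(dist_a) + len(dist_b)      # no path exists


# Invented toy graph: a chain from Alice to Bob plus a side branch.
edges = [("Alice", "Carol"), ("Carol", "Dan"), ("Dan", "Erin"), ("Erin", "Bob"),
         ("Alice", "Frank"), ("Frank", "Gina")]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

length, explored = bidirectional_bfs(graph, "Alice", "Bob")
print(length, explored)
```

Searching from both ends means each side only needs to reach about half the path length, which usually keeps the discovered set much smaller than a one-sided search when fan-out is high.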

Optimization Workflow

Profiling drives a systematic optimization workflow.

Step 1: Profile Current State

PROFILE [your query]

Document baseline metrics comprehensively.

Step 2: Identify Bottlenecks

Analyze profile output to find:

Longest-running operations
Highest row examination counts
Memory pressure points
Missing index usage

Step 3: Hypothesize Improvements

Based on bottlenecks, form specific hypotheses:

“Adding an index on Person.age would reduce scan time”
“Limiting traversal depth would reduce explored paths”
“Filtering earlier would reduce intermediate results”

Step 4: Test Hypotheses

Implement one change and profile again:

-- Create index
CREATE INDEX ON Person(age);

-- Profile again
PROFILE MATCH (n:Person) WHERE n.age > 30 RETURN n.name, n.age;

Step 5: Compare Results

Compare new profile to baseline to quantify improvement.

Step 6: Iterate

Continue identifying bottlenecks and testing improvements until performance meets requirements.

Advanced Profiling Techniques

Expert-level profiling goes beyond basic PROFILE commands.

Profiling Sub-Queries

Complex queries with sub-queries require profiling at multiple levels:

PROFILE MATCH (p:Person)
WHERE p.age > (
  MATCH (p2:Person)
  RETURN avg(p2.age)
)
RETURN p.name;

Profile output breaks down main query and sub-query execution separately.

Profiling WITH Clauses

WITH clauses create query pipelines; profile each stage:

PROFILE
MATCH (p:Person)-[:PURCHASED]->(prod:Product)
WITH p, count(prod) AS purchase_count
WHERE purchase_count > 10
MATCH (p)-[:LIVES_IN]->(city:City)
RETURN city.name, count(p) AS high_value_customers;

Profile output shows execution time and row counts at each WITH boundary.
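The value of per-stage row counts is easiest to see on a small example. The Python sketch below replays the three stages of the query above on invented in-memory data and records how many rows cross each boundary. It mimics the shape of the pipeline only; the data and stage labels are assumptions, not Geode output.

```python
from collections import Counter

# Toy data standing in for the graph; names and sizes are invented.
purchases = [(f"p{i}", f"prod{j}") for i in range(6) for j in range(i * 4)]
lives_in = {f"p{i}": "Seattle" if i % 2 else "Boston" for i in range(6)}

stage_rows = {}

# Stage 1: MATCH (p)-[:PURCHASED]->(prod) WITH p, count(prod) AS purchase_count
purchase_count = Counter(person for person, _ in purchases)
stage_rows["match purchases"] = len(purchases)
stage_rows["after WITH"] = len(purchase_count)

# Stage 2: WHERE purchase_count > 10
high_value = [p for p, n in purchase_count.items() if n > 10]
stage_rows["after WHERE"] = len(high_value)

# Stage 3: MATCH (p)-[:LIVES_IN]->(city) RETURN city.name, count(p)
by_city = Counter(lives_in[p] for p in high_value)
stage_rows["returned"] = len(by_city)

for stage, rows in stage_rows.items():
    print(f"{stage:<16}{rows:>4}")
```

The sharp drop after the first WITH shows where the pipeline sheds most of its rows. A stage whose row count stays high is the natural place to look for a filter that could be applied earlier.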

Related Topics

Performance profiling connects to several related areas:

Performance Tuning - Applying profiling insights to optimize configuration
Query Optimization - Restructuring queries based on profiling data
Index Management - Creating indexes informed by profiling analysis
Monitoring - Continuous profiling in production environments
Troubleshooting - Using profiling to diagnose performance issues

Resources

Additional profiling resources:

Geode PROFILE command documentation with complete syntax reference
Query optimization guides with profiling-driven examples
Performance benchmarking methodologies for graph databases
ISO/IEC 39075:2024, the GQL standard, as the reference for query syntax and semantics

Performance profiling transforms database optimization from guesswork into an evidence-based engineering discipline. By systematically measuring query execution characteristics, developers can make informed decisions that demonstrably improve application performance and user experience.
