Performance and Scalability

Learn how Geode achieves high performance through optimized storage, query planning, indexing, and distributed coordination.

Performance Pillars

1. Storage: Memory-Mapped I/O

From SERVER_FEATURES.md:

Memory-mapped I/O provides efficient storage access:

  • Direct memory access: Pages mapped directly into process address space
  • OS page cache: Leverage kernel page cache for hot data
  • Zero-copy reads: No buffer copying between kernel and user space
  • Write coalescing: Batch writes to reduce fsync() calls
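
Write coalescing matters most for bulk writes. A minimal sketch of batching, assuming standard GQL transaction keywords and Cypher-style CREATE syntax (neither appears elsewhere on this page):

-- Group related writes in one transaction so the storage layer can
-- coalesce them into fewer dirty pages and a single fsync() at commit
START TRANSACTION;
CREATE (:Person {name: "Ada"});
CREATE (:Person {name: "Bob"});
COMMIT;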

2. Query: Cost-Based Optimizer (CBO)

CBO selects optimal execution plans based on:

  • Statistics: Row counts, cardinality, value distributions
  • Index availability: Which indexes exist and how selective they are
  • Join cost estimation: Relationship fanout estimates

Example:

EXPLAIN
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.age > 25
RETURN p.name, friend.name;

Plan:

Project [p.name, friend.name]
  Expand [(p)-[:KNOWS]->(friend)]
    Filter [p.age > 25]
      IndexScan [Person.age] (uses: person_age_idx)

Without index:

Project [p.name, friend.name]
  Expand [(p)-[:KNOWS]->(friend)]
    Filter [p.age > 25]
      SeqScan [Person]  -- Slow!
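
The faster plan requires the B-tree index it references. It is the same index created in the tuning checklist at the end of this page:

CREATE INDEX person_age_idx ON Person(age) USING btree;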

3. Indexing: IndexOptimizer

From SERVER_FEATURES.md:

IndexOptimizer selects the best index for each predicate:

  • Predicate analysis: Identify filterable predicates
  • Index applicability: Match predicates to available indexes
  • Selectivity estimation: Estimate fraction of rows matched
  • Cost comparison: Compare index scan vs sequential scan
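
Because of the cost comparison, an existing index is not always used. A sketch of how selectivity can flip the plan, reusing the plan shapes shown above (the age thresholds are illustrative):

-- Selective predicate: few rows match, so the index scan wins
EXPLAIN MATCH (p:Person) WHERE p.age > 95 RETURN p.name;
-- Plan: IndexScan [Person.age] (uses: person_age_idx)

-- Non-selective predicate: most rows match, so a sequential scan is cheaper
EXPLAIN MATCH (p:Person) WHERE p.age > 5 RETURN p.name;
-- Plan: SeqScan [Person]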

Specialized index implementations:

Index Type          Implementation           Complexity
B-tree              Optimized B+ tree        O(log N)
Hash                Chained hash table       O(1)
Full-text           Inverted index + BM25    Varies by query
HNSW (vector)       SIMD-accelerated ANN     O(log N)
R-tree (spatial)    Hilbert curve ordering   O(log N)
Patricia trie       IP prefix matching       O(k), where k = prefix length
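
B-tree, hash, full-text, and vector indexes are created with the statements shown in the tuning checklist below. For the spatial and trie types, the statements here follow the same pattern, but the rtree and trie keywords and index names are assumptions, not confirmed syntax:

-- Spatial data: R-tree (hypothetical USING keyword)
CREATE INDEX store_location_idx ON Store(location) USING rtree;

-- IP prefixes: Patricia trie (hypothetical USING keyword)
CREATE INDEX server_ip_idx ON Server(ip) USING trie;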

4. HNSW SIMD Acceleration

From SERVER_FEATURES.md:

Vector similarity with SIMD:

  • AVX2/AVX-512 instructions: Vectorized distance calculations
  • Batch processing: Compute multiple distances in parallel
  • Multi-core scaling: Parallel query execution
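
Every vector similarity query exercises this path. A sketch using the embedding index from the tuning checklist; vector_distance() and the $query_vector parameter are illustrative names, not a confirmed API:

-- Top-10 nearest neighbors; distance evaluation runs on the SIMD path
MATCH (i:Item)
RETURN i.name
ORDER BY vector_distance(i.embedding, $query_vector)
LIMIT 10;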

5. BM25 Full-Text Optimization

From BM25_INDEX_OPTIMIZER_INTEGRATION.md:

BM25 ranking integrated with optimizer:

  • Index-aware planning: Use the full-text index when the query orders by bm25_score()
  • Early termination: Stop after top-K results
  • Parallel scoring: Multi-threaded relevance computation
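
Put together, a top-K relevance query looks like the sketch below. bm25_score() is the function named above; the CONTAINS predicate is an assumed full-text match syntax:

-- LIMIT enables early termination: scoring stops once the top 10 are certain
MATCH (d:Document)
WHERE d.content CONTAINS "graph database"
RETURN d.title, bm25_score() AS score
ORDER BY score DESC
LIMIT 10;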

Distributed Scaling Model

From DISTRIBUTED_QUERY_COORDINATION.md:

Architecture

┌─────────────────┐
│  Client Query   │
└────────┬────────┘
┌────────▼────────┐
│  Coordinator    │  -- Query planning and result merging
└────────┬────────┘
    ┌────┴────┬─────────┬─────────┐
    │         │         │         │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│Shard 1│ │Shard 2│ │Shard 3│ │Shard 4│  -- Data partitions
└───────┘ └───────┘ └───────┘ └───────┘

Distributed Query Execution

  1. Client sends GQL query to coordinator
  2. Coordinator parses query and determines which shards to query
  3. Parallel execution on all relevant shards
  4. Result merging with deterministic ordering
  5. Return merged result to client

Example:

-- Federated query
EXPLAIN FEDERATION
MATCH (p:Person {city: "NYC"})-[:KNOWS]->(friend)
RETURN p.name, friend.name
ORDER BY p.name;

Execution plan:

Merge [ORDER BY p.name]
  ├─ Query Shard 1: MATCH (p:Person {city: "NYC"})-[:KNOWS]->(f) RETURN p.name, f.name
  ├─ Query Shard 2: MATCH (p:Person {city: "NYC"})-[:KNOWS]->(f) RETURN p.name, f.name
  ├─ Query Shard 3: MATCH (p:Person {city: "NYC"})-[:KNOWS]->(f) RETURN p.name, f.name
  └─ Query Shard 4: MATCH (p:Person {city: "NYC"})-[:KNOWS]->(f) RETURN p.name, f.name

Optimization Techniques

Predicate pushdown:

-- Coordinator pushes the filter to shards (reducing network transfer)
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.city = "NYC"
RETURN p.name, friend.name;

-- Each shard executes:
MATCH (p:Person {city: "NYC"})-[:KNOWS]->(friend)
RETURN p.name, friend.name;
-- Not: MATCH (p:Person)-[:KNOWS]->(friend) then filter

Parallel execution:

  • All shard queries run concurrently
  • Barrier synchronization: Wait for all shards to complete
  • Streaming results: Return rows as they arrive (for large result sets)

Deterministic ordering:

  • Shards return results in local order
  • Coordinator merges with global ordering (e.g., ORDER BY p.name)
  • Consistent pagination across queries
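
Consistent pagination follows from the deterministic merge. A sketch, assuming Cypher-style SKIP for the page offset (only LIMIT appears elsewhere on this page):

-- Page 2 of a cross-shard result; stable because the merged order is global
MATCH (p:Person)
RETURN p.name
ORDER BY p.name
SKIP 100
LIMIT 100;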

Transport: QUIC+TLS

QUIC provides modern, efficient transport:

  • Multiplexing: Multiple queries over single connection (no head-of-line blocking)
  • 0-RTT resumption: Re-establish known connections without extra handshake round trips
  • Built-in encryption: TLS 1.3 integrated
  • UDP-based: Lower latency than TCP

Performance vs TCP:

  • Connection setup: 1 RTT for new connections, 0-RTT on resumption (vs 2-3 RTT for TCP+TLS)
  • Head-of-line blocking: None (vs significant for TCP)
  • Packet loss recovery: Independent streams (vs entire connection blocked)

No TCP fallback: Geode requires QUIC (ensure the server's UDP port is open).

Scaling Guidelines

Single-Node Sizing

Recommended configuration:

  • Memory: 64GB+ (for large datasets with in-memory cache)
  • CPU: 8+ cores (for parallel query execution)
  • Storage: NVMe SSD (for memory-mapped I/O)

Distributed Cluster

  • Shards: Up to 32 shards with automatic coordination
  • Vector search: HNSW with SIMD-optimized distance calculations

Shard count guidance:

  • Small datasets: Single node
  • Medium datasets: 2-4 shards
  • Large datasets: 8-16 shards

How to Benchmark

Query Benchmarking

# Use PROFILE to measure actual performance
./geode query "PROFILE MATCH (p:Person)-[:KNOWS]->(friend) RETURN p.name, friend.name"

# Extract metrics
# - Execution time
# - Rows scanned
# - Index usage
# - Cache hit ratio

Load Testing

# Generate realistic workload
# 1000 concurrent users, 10,000 queries each
./geode-loadtest \
  --users 1000 \
  --queries 10000 \
  --query-file workload.gql

# Monitor metrics
watch -n 1 'curl -s http://localhost:8080/metrics | grep geode_query_duration'

Scaling Tests

# Test distributed query performance
# Run same query on 1, 2, 4, 8 shards
for shards in 1 2 4 8; do
  echo "Testing with $shards shards"
  ./geode-distributed-test --shards $shards --query-file workload.gql
done

# Compare throughput scaling
# Expected: near-linear up to 4-8 shards

Performance Tuning Checklist

1. Index All Predicates

-- Before: Sequential scan
MATCH (p:Person) WHERE p.email = "alice@example.com" RETURN p;

-- After: Hash index
CREATE INDEX person_email_idx ON Person(email) USING hash;
MATCH (p:Person) WHERE p.email = "alice@example.com" RETURN p;
-- Uses index for fast lookup

2. Use Appropriate Index Type

-- Range queries: B-tree
CREATE INDEX person_age_idx ON Person(age) USING btree;

-- Equality: Hash
CREATE INDEX person_id_idx ON Person(id) USING hash;

-- Text search: Full-text
CREATE INDEX doc_content_idx ON Document(content) USING fulltext;

-- Vector similarity: HNSW
CREATE INDEX item_embedding_idx ON Item(embedding) USING vector;

3. Tune Page Cache

# geode.yaml
storage:
  page_cache_size: '4GB'  # Increase for larger datasets

Guidance:

  • Small datasets (<1GB): 1GB cache
  • Medium datasets (1-10GB): 2-4GB cache
  • Large datasets (>10GB): 8-16GB cache

4. Monitor Cache Hit Ratio

# Check cache efficiency
curl http://localhost:8080/metrics | grep geode_storage_cache_hit_ratio

# Target: >80% for hot queries
# If low: Increase page_cache_size

5. Limit Result Sets

-- Bad: Return all results
MATCH (p:Person) RETURN p;

-- Good: Paginate
MATCH (p:Person)
RETURN p
ORDER BY p.name
LIMIT 100;

Next Steps