Performance Tuning | Geode Database

Performance Tuning Guide

This guide covers query optimization, configuration tuning, index and hardware choices, monitoring, and best practices for getting the best performance out of Geode.

Performance Overview

Architecture

Query Processing:

Cost-based query optimizer with statistics
Six specialized index types for different workloads
SIMD-accelerated vector distance calculations

Storage:

Memory-mapped I/O for efficient storage access
WAL-based durability with configurable tuning
Page-level caching with LRU eviction

Scalability:

Connection pooling for concurrent clients
Distributed deployment with up to 32 shards

Query Optimization

Use Indexes Effectively

Create Appropriate Indexes:

-- B-tree for range queries
CREATE INDEX person_age_idx ON Person(age) USING btree;

-- Hash for exact matches
CREATE INDEX person_email_idx ON Person(email) USING hash;

-- Spatial for geographic queries
CREATE INDEX location_coords_idx ON Location(coordinates) USING spatial;

-- Vector for similarity search
CREATE INDEX document_embedding_idx ON Document(embedding) USING vector;

-- Full-text for text search
CREATE INDEX article_content_idx ON Article(content) USING fulltext;

Index Selection Guidelines:

Query Pattern	Index Type	Example
Exact match	Hash	`WHERE p.email = 'user@example.com'`
Range query	B-tree	`WHERE p.age > 30 AND p.age < 50`
Sorting	B-tree	`ORDER BY p.name`
Geographic	Spatial (R-tree)	`WHERE distanceKm(loc, point) < 10`
Vector similarity	Vector (HNSW)	`WHERE cosineSimilarity(v1, v2) > 0.8`
Text search	Full-text	`WHERE content CONTAINS 'graph database'`
IP prefix	Patricia Trie	`WHERE ip_contains(subnet, ip)`

Analyze Query Plans

Use EXPLAIN to understand query execution:

-- View query plan
EXPLAIN MATCH (p:Person)-[:KNOWS]->(f:Person)
WHERE p.age > 30
RETURN f.name
ORDER BY f.name
LIMIT 10;

-- Analyze actual execution
EXPLAIN ANALYZE MATCH (p:Person)-[:KNOWS]->(f:Person)
WHERE p.age > 30
RETURN f.name;

Key Metrics to Review:

Index usage (are indexes being used?)
Estimated vs actual row counts
Join methods
Sort operations
Filter selectivity
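
One way to make the "estimated vs actual row counts" check concrete is the q-error, the factor by which an estimate is off in either direction. The sketch below is illustrative only: the operator names, row counts, and the threshold of 10 are assumptions, not Geode's EXPLAIN ANALYZE output format.

```python
def q_error(estimated_rows: float, actual_rows: float) -> float:
    """Factor by which the estimate is off; 1.0 means a perfect estimate."""
    est = max(estimated_rows, 1.0)  # clamp to avoid division by zero
    act = max(actual_rows, 1.0)
    return max(est / act, act / est)

# Hypothetical (operator, estimated rows, actual rows) triples copied by hand
# from a plan; the names are made up for illustration.
operators = [
    ("IndexScan(person_age_idx)", 1200, 1150),
    ("Expand(KNOWS)", 6000, 480000),
    ("Sort", 6000, 480000),
]

# Assumed rule of thumb: an estimate off by more than 10x deserves a look.
suspicious = [name for name, est, act in operators if q_error(est, act) > 10]
print(suspicious)
```

Operators flagged this way are where stale statistics or a missing index most likely mislead the cost-based optimizer.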

Optimize Pattern Matching

Anchor Patterns Early:

-- GOOD: Anchor with indexed property
MATCH (p:Person {email: 'user@example.com'})-[:KNOWS]->(f)
RETURN f

-- BAD: Filter applied after the match (may scan every Person node unless the optimizer pushes the predicate down)
MATCH (p:Person)-[:KNOWS]->(f)
WHERE p.email = 'user@example.com'
RETURN f

Limit Path Expansion:

-- GOOD: Bounded expansion
MATCH (a)-[:KNOWS*1..3]->(b)
RETURN b

-- BAD: Unbounded expansion (can explode)
MATCH (a)-[:KNOWS*]->(b)
RETURN b
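
To see why unbounded expansion can explode, a rough estimate helps. The sketch below assumes every node has the same number of outgoing KNOWS edges and that no paths are deduplicated, so it is an idealized upper bound rather than a prediction for a real graph.

```python
def max_paths(avg_degree: int, max_depth: int) -> int:
    """Paths explored by a *1..max_depth expansion from a single start node,
    assuming a uniform out-degree (an idealization)."""
    return sum(avg_degree ** depth for depth in range(1, max_depth + 1))

# With 50 outgoing edges per node, each extra hop multiplies the work by about 50.
for depth in (1, 2, 3, 6):
    print(depth, max_paths(50, depth))
```

Under these assumptions three hops touch about 127 thousand paths, while six hops exceed 15 billion, which is why an explicit upper bound matters.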

Use LIMIT with ORDER BY:

-- Required for deterministic results
MATCH (n:Person)
RETURN n.name
ORDER BY n.name  -- ORDER BY required with LIMIT
LIMIT 100

Aggregation Optimization

Push Filters Before Aggregation:

-- GOOD: Filter first
MATCH (p:Person)
WHERE p.age > 30
RETURN p.city, count(*) AS population
GROUP BY p.city

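-- BAD: Filter after aggregation (every row is aggregated first; note this keeps cities whose oldest resident is over 30, which is a different result)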
MATCH (p:Person)
RETURN p.city, count(*) AS population
GROUP BY p.city
HAVING max(p.age) > 30

Use Covering Indexes:

-- Create composite index
CREATE INDEX person_city_age_idx ON Person(city, age);

-- Query can be answered from the index alone (no node data access)
MATCH (p:Person)
WHERE p.age > 30
RETURN p.city, count(*)
GROUP BY p.city

Configuration Tuning

Memory Settings

Page Cache Size:

# geode.yaml
storage:
  page_cache_size: '4GB'  # Increase for read-heavy workloads
  page_size: 8192         # 8KB pages (default)

Environment variable:

export GEODE_PAGE_CACHE_SIZE=4294967296  # 4GB in bytes

Recommendations:

Read-heavy: 50-75% of RAM
Write-heavy: 25-40% of RAM
Mixed workload: 40-50% of RAM
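
The percentages above can be turned into byte values for GEODE_PAGE_CACHE_SIZE with a few lines of arithmetic. This is a minimal sketch: the function name and workload keys are hypothetical, and sizes use binary units (1 GB = 1024^3 bytes), matching the 4294967296 value shown above.

```python
GIB = 1024 ** 3

# (low, high) fractions of total RAM, taken from the recommendations above.
PAGE_CACHE_FRACTION = {
    "read-heavy": (0.50, 0.75),
    "write-heavy": (0.25, 0.40),
    "mixed": (0.40, 0.50),
}

def page_cache_range_bytes(total_ram_bytes: int, workload: str) -> tuple:
    """Return the (low, high) recommended page cache size in bytes."""
    low, high = PAGE_CACHE_FRACTION[workload]
    return int(total_ram_bytes * low), int(total_ram_bytes * high)

low, high = page_cache_range_bytes(64 * GIB, "read-heavy")
print(f"export GEODE_PAGE_CACHE_SIZE={low}  # {low // GIB} GiB, up to {high // GIB} GiB")
```

Starting at the low end of the range and raising it while watching the cache hit ratio is a cautious way to apply the result.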

WAL Configuration

storage:
  wal_dir: '/fast-ssd/geode/wal'
  wal_buffer_size: '16MB'
  wal_sync_interval: '100ms'  # Balance durability vs performance

Tuning Tips:

Place WAL on separate fast SSD
Increase buffer size for write-heavy workloads
Adjust the sync interval: a shorter interval is more durable, a longer one gives higher write throughput
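
The durability trade-off can be estimated with simple arithmetic. The sketch below assumes that writes accepted since the last sync are the ones at risk in a crash; whether Geode acknowledges writes before the sync completes is not stated here, so treat the result as a worst-case illustration.

```python
def wal_exposure(writes_per_second: float, sync_interval_ms: float) -> dict:
    """Estimate sync frequency and the worst-case number of unsynced writes."""
    return {
        "syncs_per_second": 1000 / sync_interval_ms,
        "max_unsynced_writes": writes_per_second * sync_interval_ms / 1000,
    }

# 5,000 writes/s with the 100ms interval from the example configuration.
print(wal_exposure(5000, 100))
```

Halving the interval halves the exposure but doubles the number of syncs the WAL device has to absorb.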

Connection Pooling

server:
  max_connections: 10000
  connection_timeout: '30s'
  idle_timeout: '5m'

Client-Side Pooling (Go example):

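// db is assumed to be a *sql.DB from database/sql; the durations need the "time" package.
db.SetMaxOpenConns(100)                 // Cap on open connections (in use + idle)
db.SetMaxIdleConns(25)                  // Connections kept idle for reuse
db.SetConnMaxLifetime(5 * time.Minute)  // Recycle connections after 5 minutes
db.SetConnMaxIdleTime(1 * time.Minute)  // Close connections idle for over 1 minute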
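
Client pool sizes and the server's max_connections need to agree, since every client instance can open up to its own maximum. The check below is a sketch using the values from the examples above; the function name and the idea of reserving connections for administrative use are assumptions.

```python
def max_client_instances(server_max_connections: int,
                         max_open_conns_per_client: int,
                         reserved: int = 0) -> int:
    """How many client instances fit before the server limit is exhausted."""
    return (server_max_connections - reserved) // max_open_conns_per_client

# server.max_connections = 10000, SetMaxOpenConns(100) per client instance.
print(max_client_instances(10000, 100))
```

If the deployment runs more instances than this, either lower the per-client cap or raise max_connections.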

Query Optimizer Settings

optimizer:
  small_n_threshold: 100     # When to use brute-force
  trace: false               # Enable for debugging
  cost_model:
    seq_scan_cost: 1.0
    index_scan_cost: 0.1
    join_cost: 2.0

Enable optimizer tracing for diagnostics:

export GEODE_TRACE_OPTIMIZER=1

Index Optimization

HNSW Vector Index Tuning

-- Create with custom parameters
CREATE INDEX embedding_idx ON Document(embedding)
USING vector
WITH (
  M = 16,                    -- Max connections per node
  ef_construction = 200,     -- Search width during construction
  ef_search = 100           -- Search width at query time
);

Parameter Guidelines:

M: Higher = better recall, more memory (16-32 typical)
ef_construction: Higher = better index quality (200-400 typical)
ef_search: Higher = better recall, slower queries (50-200 typical)

Performance Trade-offs:

M=16, ef_search=50: Fast, ~85% recall
M=16, ef_search=100: Medium, ~92% recall
M=32, ef_search=200: Slow, ~97% recall
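
Recall figures like these depend on the data set, so it is worth measuring recall on a sample of your own vectors. The sketch below compares index results against a brute-force ground truth; the vectors and the `approx` list are made-up stand-ins for real data and real index output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to query."""
    ranked = sorted(vectors,
                    key=lambda vid: cosine_similarity(query, vectors[vid]),
                    reverse=True)
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
truth = exact_top_k([1.0, 0.0], vectors, k=2)
approx = ["a", "d"]  # stand-in for the ids a vector index query returned
print(recall_at_k(approx, truth))
```

Averaging this value over a few hundred sample queries gives a recall estimate to compare across ef_search settings.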

Spatial Index Optimization

-- R-tree with custom node capacity
CREATE INDEX location_coords_idx ON Location(coordinates)
USING spatial
WITH (
  M = 8,                     -- Max entries per node (default)
  min_fill = 0.4            -- Minimum fill factor
);

Best Practices:

Higher M = fewer levels, more entries per node
Place on SSD for random access patterns
Regularly rebuild after bulk inserts

Full-Text Index Configuration

CREATE INDEX article_content_idx ON Article(content)
USING fulltext
WITH (
  language = 'english',
  stemmer = 'porter',
  stop_words = ['the', 'a', 'an']
);

Hardware Recommendations

Storage

SSD vs HDD:

WAL: Always SSD (write-heavy, sequential)
Data files: SSD for <1TB, SATA SSD for 1-10TB
Backup storage: HDD acceptable for cold storage

RAID Configuration:

RAID 10: Best balance of performance and redundancy (recommended)
RAID 5/6: Acceptable for read-heavy workloads
RAID 0: Maximum performance, but no redundancy (one failed disk loses the array)

Memory

Minimum Requirements:

Development: 4GB
Small production: 16GB
Medium production: 64GB
Large production: 256GB+

Allocation:

OS: 2-4GB
Page cache: 50-70% of remaining
Query execution: 20-30% of remaining
Buffer pools: 10-20% of remaining
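
As a worked example of the allocation above, the sketch below splits a host's RAM after reserving memory for the OS. The default percentages (60/25/15) are one choice within the stated ranges, and the function name is hypothetical.

```python
def memory_plan_gb(total_gb: int, os_gb: int = 4,
                   page_cache_pct: int = 60,
                   query_pct: int = 25,
                   buffers_pct: int = 15) -> dict:
    """Split RAM into OS reserve plus percentage shares of the remainder."""
    if page_cache_pct + query_pct + buffers_pct != 100:
        raise ValueError("percentages must sum to 100")
    remaining = total_gb - os_gb
    return {
        "os": os_gb,
        "page_cache": remaining * page_cache_pct / 100,
        "query_execution": remaining * query_pct / 100,
        "buffer_pools": remaining * buffers_pct / 100,
    }

# A 64GB "medium production" host with 4GB reserved for the OS.
print(memory_plan_gb(64))
```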

CPU

Core Count:

4 cores: Minimum for development
8-16 cores: Small to medium production
32+ cores: Large production with high concurrency

Features:

AVX2/AVX-512: 2x+ speedup for vector operations
AES-NI: Hardware-accelerated encryption
NUMA: Configure for local memory access

Network

Bandwidth:

1 Gbps: Minimum
10 Gbps: Recommended for distributed setups
25/40 Gbps: High-throughput scenarios

Latency:

<1ms: Same datacenter
<10ms: Cross-region acceptable
>50ms: Consider a caching layer

Monitoring and Profiling

Metrics to Monitor

# Query performance
geode_query_duration_ms{percentile="50|95|99"}
geode_query_total{status="success|error"}

# Storage metrics
geode_storage_cache_hit_ratio
geode_storage_pages_total
geode_storage_wal_segments_total

# Index metrics
geode_index_lookups_total
geode_index_size_bytes
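
Metrics such as these can be checked from a script once scraped. The sketch below parses a simplified "name value" text payload; the sample payload, its values, and the 0.90 alert threshold are assumptions for illustration, and lines carrying a trailing timestamp are not handled.

```python
def parse_metrics(text: str) -> dict:
    """Parse 'series value' lines into a dict keyed by the full series name."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics

# Made-up sample payload; real values come from the metrics endpoint.
SAMPLE = """\
# HELP geode_storage_cache_hit_ratio Page cache hit ratio
geode_storage_cache_hit_ratio 0.87
geode_query_duration_ms{percentile="99"} 42.5
"""

metrics = parse_metrics(SAMPLE)
if metrics["geode_storage_cache_hit_ratio"] < 0.90:
    print("cache hit ratio below 90%: consider a larger page cache")
```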

Profiling Queries

-- Profile query execution
PROFILE MATCH (n:Person)-[:KNOWS*1..3]->(f)
WHERE n.age > 30
RETURN f.name
LIMIT 10;

Profile Output:

Execution time per operator
Rows processed per operator
Memory usage
Index hits/misses

Enable Telemetry

monitoring:
  enabled: true
  endpoint: 'http://localhost:9090/metrics'
  interval: '10s'

Environment variable:

export GEODE_TELEMETRY_PAGING=1

Best Practices

Query Design

Always use ORDER BY with LIMIT for deterministic results
Create indexes before bulk loads for optimal index structure
Use prepared statements for frequently executed queries
Batch operations in transactions for better throughput
Limit path expansion depth to prevent combinatorial explosion
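
Batching, as recommended above, can be sketched independently of any client library. In the example below, `write_batch` is a hypothetical callback that is expected to send one batch inside a single transaction; the batch size of 1000 is an arbitrary starting point.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of up to batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def load(rows, write_batch, batch_size=1000):
    """Send rows in batches and return how many rows were written."""
    total = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)  # hypothetical: one transaction per batch
        total += len(batch)
    return total

calls = []
print(load(range(2500), calls.append), [len(b) for b in calls])
```

Larger batches amortize commit overhead but hold locks longer, so the right size is a trade-off with the guidance on keeping transactions short.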

Data Modeling

Denormalize when necessary for read performance
Use appropriate property types (VectorF32 vs VectorI32)
Avoid wide nodes (>100 properties)
Index frequently queried properties
Consider materialized views for expensive aggregations

Transaction Management

Keep transactions short to reduce lock contention
Use appropriate isolation level (Snapshot for most cases)
Batch writes for bulk operations
Use savepoints for complex multi-step operations
Handle deadlocks with retry logic
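
Deadlock retry logic usually follows the same shape regardless of client. The sketch below retries with exponential backoff and jitter; `DeadlockError` is a hypothetical stand-in for whatever error the Geode client actually raises, and the attempt count and delays are assumptions.

```python
import random
import time

class DeadlockError(Exception):
    """Hypothetical stand-in for the client's deadlock error."""

def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run transaction(), retrying on DeadlockError with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Wait a random time up to base_delay * 2^(attempt - 1) seconds.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))

attempts = []

def flaky():
    """Simulated transaction that deadlocks twice, then commits."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError()
    return "committed"

print(run_with_retry(flaky, sleep=lambda seconds: None))
```

The transaction callable must be safe to re-run from the start, since each retry repeats all of its statements.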

Federation

Distribute data by access patterns (co-locate related data)
Monitor shard balance for even distribution
Use connection pooling for cross-shard queries
Test failover scenarios regularly
Set appropriate timeouts to prevent hanging queries

Troubleshooting Performance Issues

Slow Queries

Checklist:

✓ Indexes exist for filtered properties?
✓ Using EXPLAIN to verify index usage?
✓ Path expansion bounded?
✓ ORDER BY used with LIMIT?
✓ Appropriate isolation level?

High Memory Usage

Investigation:

# Check cache hit ratio
curl http://localhost:9090/metrics | grep cache_hit_ratio

# Review page cache size
grep page_cache_size /etc/geode/geode.yaml

Solutions:

Reduce page cache size
Add swap space (temporary)
Scale horizontally

Write Performance Issues

Causes:

WAL on slow storage
Too many indexes
Large transactions
Disk I/O saturation

Solutions:

Move WAL to fast SSD
Batch writes in transactions
Increase wal_buffer_size
Monitor disk I/O metrics

Next Steps

Configuration Reference - Complete config options
Observability Guide - Set up monitoring
Distributed Architecture - Scale horizontally
