Best Practices & Design Patterns
Building production-ready graph applications with Geode requires understanding not just the technology, but proven patterns and practices that ensure performance, maintainability, and scalability. This comprehensive guide covers schema design, query optimization, security, and operational best practices for Geode deployments.
Schema Design Best Practices
1. Model for Your Queries
Design your graph schema based on how you’ll query it, not just the domain structure.
Good Practice:
-- If you frequently query "products a user might like"
-- Model supports this pattern:
MATCH (u:User {id: $user_id})-[:LIKES]->(cat:Category)
MATCH (p:Product)-[:IN_CATEGORY]->(cat)
WHERE NOT EXISTS { MATCH (u)-[:PURCHASED]->(p) }
RETURN p.name, count(cat) AS relevance
ORDER BY relevance DESC;
Poor Practice:
-- Model that doesn't support common queries efficiently
-- Requires complex traversals or multiple queries
2. Choose Appropriate Granularity
When to Use Nodes:
- Entity referenced by multiple others
- Has its own relationships
- Queried independently
- Complex structure
When to Use Properties:
- Simple attribute values
- Unique to parent entity
- Not queried independently
- No relationships
-- GOOD: Shared address as node
INSERT (addr:Address {street: '123 Main St', city: 'Austin'});
INSERT (alice:Person)-[:LIVES_AT]->(addr);
INSERT (bob:Person)-[:LIVES_AT]->(addr);
-- GOOD: Unique email as property
INSERT (charlie:Person {email: 'charlie@example.com'});
-- BAD: Address as string property (can't be shared/queried)
INSERT (dave:Person {address: '456 Oak St, Dallas, TX'});
3. Use Meaningful Relationship Types
Choose specific, action-oriented relationship types:
-- GOOD: Clear, specific relationships
(person:Person)-[:WORKS_AT]->(company:Company)
(user:User)-[:PURCHASED]->(product:Product)
(employee:Employee)-[:REPORTS_TO]->(manager:Manager)
-- BAD: Generic, unclear relationships
(entity1)-[:RELATED_TO]->(entity2)
(node1)-[:CONNECTS]->(node2)
4. Avoid Super Nodes
Nodes with millions of relationships degrade performance.
Problem:
-- Country node with millions of users
INSERT (user)-[:LIVES_IN]->(usa:Country); -- Repeated millions of times
Solutions:
-- Solution 1: Use property + index
INSERT (user:User {country: 'USA'});
CREATE INDEX user_country ON User(country);
-- Solution 2: Add intermediate nodes
INSERT (user)-[:LIVES_IN]->(state:State);
INSERT (state)-[:IN_COUNTRY]->(country:Country);
5. Normalize Appropriately
Unlike relational databases, some denormalization improves performance:
-- Acceptable denormalization for performance
INSERT (order:Order {
id: 'o123',
customer_id: 'c456',
customer_name: 'Alice', -- Denormalized
customer_email: 'alice@example.com', -- Denormalized
total: 299.99
});
-- Also maintain relationship
INSERT (order)-[:PLACED_BY]->(customer:Customer {id: 'c456'});
Query Optimization
1. Use Parameterized Queries
Always use parameters for reusable, cacheable query plans:
-- GOOD: Parameterized
MATCH (p:Person {name: $person_name})
RETURN p.age, p.city;
-- BAD: Literal values (prevents plan caching)
MATCH (p:Person {name: 'Alice'})
RETURN p.age, p.city;
From client code:
# Python
result, _ = await client.query(
"MATCH (p:Person {name: $name}) RETURN p.age",
parameters={"name": user_input}
)
2. Profile Slow Queries
Use PROFILE to understand execution:
PROFILE
MATCH (p:Person)-[:KNOWS*2..4]->(friend)
WHERE p.city = 'San Francisco'
RETURN friend.name;
Analyze:
- Execution plan
- Operator costs
- Index usage
- Estimated vs actual rows
3. Create Strategic Indexes
Index properties used for lookups and filtering:
-- Single-column indexes
CREATE INDEX user_email ON User(email);
CREATE INDEX product_sku ON Product(sku);
-- Composite indexes for multi-property queries
CREATE INDEX product_category_price ON Product(category, price);
-- Full-text search
CREATE FULLTEXT INDEX product_search ON Product(name, description);
4. Limit Result Sets
Always use LIMIT for unbounded queries:
-- GOOD: Limited results
MATCH (p:Person)
RETURN p.name
ORDER BY p.created DESC
LIMIT 100;
-- BAD: Unbounded (could return millions)
MATCH (p:Person)
RETURN p.name;
5. Avoid Cartesian Products
Be careful with multiple MATCH clauses:
-- BAD: Creates cartesian product
MATCH (p:Person)
MATCH (c:Company)
RETURN p.name, c.name; -- Returns p_count * c_count rows
-- GOOD: Use relationships
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name;
Indexing Strategies
1. Index High-Selectivity Properties
Index properties that narrow down results significantly:
-- High selectivity (good for indexing)
CREATE INDEX user_email ON User(email); -- Unique or near-unique
CREATE INDEX product_sku ON Product(sku);
-- Low selectivity (poor candidate)
-- CREATE INDEX user_active ON User(active); -- Only true/false
2. Composite Indexes for Multi-Property Queries
-- Create composite index
CREATE INDEX order_customer_date ON Order(customer_id, order_date);
-- Efficiently supports this query
MATCH (o:Order)
WHERE o.customer_id = $cust_id
AND o.order_date >= DATE '2024-01-01'
RETURN o;
3. Monitor Index Usage
-- Check index statistics
SHOW INDEXES;
-- Profile to verify index usage
PROFILE
MATCH (u:User {email: $email})
RETURN u.name;
Transaction Management
1. Keep Transactions Short
Minimize transaction duration to reduce lock contention:
-- GOOD: Short transaction
BEGIN TRANSACTION;
INSERT (order:Order {id: 'o123', total: 99.99});
MATCH (user:User {id: 'u456'})
INSERT (user)-[:PLACED]->(order);
COMMIT;
-- BAD: Long-running transaction
BEGIN TRANSACTION;
-- Complex data processing...
-- Multiple network calls...
-- User input...
COMMIT; -- Holds locks too long
2. Use Savepoints for Partial Rollback
BEGIN TRANSACTION;
INSERT (user:User {name: 'Alice'});
SAVEPOINT sp1;
-- Risky operation
INSERT (payment:Payment {amount: 999.99});
-- Rollback just this operation if needed
ROLLBACK TO SAVEPOINT sp1;
-- Continue with transaction
INSERT (user:User {name: 'Bob'});
COMMIT;
3. Handle Transaction Errors
# Python example
async def create_order(user_id, items):
async with client.connection() as tx:
await tx.begin()
try:
await tx.execute(
"INSERT (o:Order {id: $id, total: $total})",
{"id": order_id, "total": total}
)
await tx.execute(
"MATCH (u:User {id: $uid}) INSERT (u)-[:PLACED]->(o:Order {id: $oid})",
{"uid": user_id, "oid": order_id}
)
await tx.commit()
except Exception as e:
await tx.rollback()
raise
Security Best Practices
1. Use Row-Level Security (RLS)
Define security policies in the database:
-- Tenant isolation
CREATE POLICY tenant_isolation ON ALL
USING (tenant_id = current_tenant());
-- User data access
CREATE POLICY user_data_access ON User
USING (id = current_user_id() OR current_user_role = 'admin');
-- Confidential relationships
CREATE POLICY salary_access ON EARNS_SALARY
USING (current_user_role IN ['hr', 'admin']);
2. Never Trust Client Input
Always use parameterized queries:
# GOOD: Parameterized
user_input = request.get_param('name')
result, _ = await client.query(
"MATCH (p:Person {name: $name}) RETURN p",
{"name": user_input}
)
# BAD: String concatenation (injection risk!)
# query = f"MATCH (p:Person {{name: '{user_input}'}}) RETURN p"
3. Implement Audit Logging
Log all data modifications:
-- Geode automatically logs all mutations
-- Review audit logs regularly
-- Example: check recent mutations
CALL system.audit_log(CURRENT_TIMESTAMP - DURATION 'P7D');
4. Use TLS for All Connections
Geode enforces TLS 1.3 by default - don’t disable it:
# Client connection (TLS required)
geode-client connect --host geode.example.com:3141 --tls-verify
Performance Tuning
1. Batch Operations
Use UNWIND for bulk inserts:
-- Efficient batch insert
UNWIND $users AS user_data
INSERT (u:User {
id: user_data.id,
name: user_data.name,
email: user_data.email
});
From application:
users = [
{"id": "u1", "name": "Alice", "email": "[email protected]"},
{"id": "u2", "name": "Bob", "email": "[email protected]"},
# ... thousands more
]
await client.query(
"UNWIND $users AS u INSERT (user:User {id: u.id, name: u.name})",
{"users": users}
)
2. Use Connection Pooling
Configure appropriate pool sizes:
# Python
client = Client(
"localhost:3141",
pool_size=20, # Adjust based on workload
max_overflow=10
)
// Go
db, err := sql.Open("geode", "quic://localhost:3141?pool_size=20")
3. Monitor Query Performance
Track slow queries:
import logging
# Log slow queries
async def execute_with_timing(query, params):
start = time.time()
result, _ = await client.query(query, params)
duration = time.time() - start
if duration > 1.0: # Log queries over 1 second
logging.warning(f"Slow query ({duration:.2f}s): {query}")
return result
4. Cache Frequently Accessed Data
from functools import lru_cache
import asyncio
# Cache user lookups
user_cache = {}
async def get_user(user_id):
if user_id in user_cache:
return user_cache[user_id]
result, _ = await client.query(
"MATCH (u:User {id: $id}) RETURN u",
{"id": user_id}
)
user_cache[user_id] = result
return result
Application Integration
1. Use Prepared Statements
# Prepare once, execute many times
prepared = await client.prepare(
"MATCH (p:Person {id: $id}) RETURN p.name, p.email"
)
# Execute with different parameters
for user_id in user_ids:
result, _ = await prepared.execute({"id": user_id})
2. Handle Concurrent Access
import asyncio
# Process queries concurrently
async def fetch_user_data(user_ids):
tasks = [
client.query("MATCH (u:User {id: $id}) RETURN u", {"id": uid})
for uid in user_ids
]
results = await asyncio.gather(*tasks)
return results
3. Implement Retry Logic
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=1, max=10)
)
async def query_with_retry(query, params):
return await client.query(query, params)
Production Deployment
1. Resource Planning
Minimum Resources:
- CPU: 4 cores
- RAM: 8GB
- Storage: SSD recommended
Scale Based on:
- Number of nodes/relationships
- Query complexity
- Concurrent connections
- Write throughput
2. Backup Strategy
# Daily backups
0 2 * * * /usr/local/bin/geode backup \
--output /backups/geode-$(date +\%Y\%m\%d).tar.gz
# Retention: keep 30 days
find /backups -name "geode-*.tar.gz" -mtime +30 -delete
3. Monitoring
Monitor key metrics:
- Query latency (p50, p95, p99)
- Throughput (queries/sec)
- Connection pool usage
- Transaction rate
- Storage growth
- Cache hit rates
# Example: Prometheus metrics
from prometheus_client import Counter, Histogram
query_counter = Counter('geode_queries_total', 'Total queries')
query_duration = Histogram('geode_query_duration_seconds', 'Query duration')
@query_duration.time()
async def monitored_query(query, params):
query_counter.inc()
return await client.query(query, params)
4. Log Management
# logging.conf
version: 1
handlers:
file:
class: logging.handlers.RotatingFileHandler
filename: /var/log/geode/application.log
maxBytes: 104857600 # 100MB
backupCount: 10
loggers:
geode:
level: INFO
handlers: [file]
Testing Best Practices
1. Unit Test Queries
import pytest
@pytest.mark.asyncio
async def test_user_creation():
# Setup
await client.query("DELETE (u:User {email: '[email protected]'})")
# Execute
await client.query(
"INSERT (u:User {email: $email, name: $name})",
{"email": "[email protected]", "name": "Test User"}
)
# Verify
result, _ = await client.query(
"MATCH (u:User {email: $email}) RETURN u.name",
{"email": "[email protected]"}
)
assert result.rows[0]['u.name'] == "Test User"
# Cleanup
await client.query("DELETE (u:User {email: '[email protected]'})")
2. Integration Tests
@pytest.mark.integration
async def test_order_workflow():
# Create test data
await tx.query("INSERT (u:User {id: 'test_user'})")
await tx.query("INSERT (p:Product {id: 'test_prod', price: 99.99})")
# Test workflow
async with client.connection() as tx:
await tx.begin()
await tx.execute(
"INSERT (o:Order {id: 'test_order', total: 99.99})"
)
await tx.execute(
"MATCH (u:User {id: 'test_user'}), (o:Order {id: 'test_order'}) "
"INSERT (u)-[:PLACED]->(o)"
)
await tx.commit()
# Verify
result, _ = await tx.query(
"MATCH (u:User {id: 'test_user'})-[:PLACED]->(o:Order) "
"RETURN o.total"
)
assert result.rows[0]['o.total'] == 99.99
Common Antipatterns to Avoid
1. Don’t Use Properties as Arrays of IDs
-- BAD
INSERT (user:User {friend_ids: ['u2', 'u3', 'u4']});
-- GOOD
INSERT (u1:User)-[:FRIENDS_WITH]->(u2:User);
INSERT (u1)-[:FRIENDS_WITH]->(u3:User);
2. Don’t Create Redundant Relationships
-- BAD: Redundant relationship
INSERT (user)-[:LIVES_IN]->(city);
INSERT (city)-[:IN_STATE]->(state);
INSERT (user)-[:LIVES_IN_STATE]->(state); -- Derivable!
-- GOOD: Derive when needed
MATCH (user)-[:LIVES_IN]->(city)-[:IN_STATE]->(state)
RETURN state.name;
3. Don’t Over-Index
-- BAD: Too many indexes
CREATE INDEX user_id ON User(id);
CREATE INDEX user_name ON User(name);
CREATE INDEX user_email ON User(email);
CREATE INDEX user_created ON User(created);
CREATE INDEX user_updated ON User(updated);
-- ... etc
-- GOOD: Strategic indexes
CREATE INDEX user_id ON User(id); -- Primary lookup
CREATE INDEX user_email ON User(email); -- Login
Conclusion
Following these best practices ensures your Geode deployments are performant, secure, and maintainable. Focus on:
- Schema Design: Model for queries, use appropriate granularity
- Query Optimization: Profile, index strategically, use parameters
- Security: RLS policies, parameterized queries, audit logging
- Performance: Batch operations, connection pooling, caching
- Production: Monitoring, backups, resource planning
Explore the documentation below for detailed guides on specific topics, advanced patterns, and troubleshooting techniques.