Advanced Features in Geode
Geode is designed for enterprise-scale graph workloads and offers features that go beyond basic graph operations. This guide covers distributed transactions, row-level security, graph algorithms, temporal queries, custom aggregations, and performance optimization.
Enterprise-Grade Transaction Management
Distributed ACID Transactions
Geode implements full ACID compliance with support for distributed transactions across multiple graph partitions. Unlike some graph databases that sacrifice consistency for scale, Geode maintains strict transactional guarantees even in distributed deployments.
-- Begin a distributed transaction
BEGIN TRANSACTION;
-- A single INSERT keeps p and c bound when the relationship is created;
-- the two nodes may live on different graph partitions
INSERT (p:Person {id: 'user123', name: 'Alice'}),
       (c:Company {id: 'corp456', name: 'TechCo'}),
       (p)-[:WORKS_AT {since: DATE '2024-01-15'}]->(c);
-- Atomic commit ensures all-or-nothing semantics
COMMIT;
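The all-or-nothing behavior across partitions can be illustrated with a toy two-phase commit coordinator. This is a sketch only: the source does not say which commit protocol Geode uses, and the `Partition` class and `run_transaction` function are hypothetical names invented for the illustration.

```python
# Illustrative only: a toy two-phase commit coordinator.
# The commit protocol Geode actually uses is not stated in this guide.

class Partition:
    """A hypothetical graph partition that stages writes before applying them."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = []   # writes that have been made durable
        self.staged = []      # writes awaiting the commit decision

    def prepare(self, writes):
        """Phase 1: stage the writes and vote yes (True) or no (False)."""
        if self.fail_on_prepare:
            return False
        self.staged = list(writes)
        return True

    def commit(self):
        """Phase 2 when every vote was yes: apply the staged writes."""
        self.committed.extend(self.staged)
        self.staged = []

    def abort(self):
        """Phase 2 when any vote was no: discard the staged writes."""
        self.staged = []


def run_transaction(writes_by_partition):
    """Commit on every partition or on none. Returns True if committed.

    `writes_by_partition` is a list of (partition, writes) pairs.
    """
    votes = [partition.prepare(writes) for partition, writes in writes_by_partition]
    if all(votes):
        for partition, _ in writes_by_partition:
            partition.commit()
        return True
    for partition, _ in writes_by_partition:
        partition.abort()
    return False
```

If any partition votes no during the prepare phase, nothing is applied anywhere, which is the property the COMMIT in the example relies on.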
Savepoint Support
Savepoints allow part of a transaction to be rolled back without aborting the whole transaction:
BEGIN TRANSACTION;
-- Create initial data
INSERT (u:User {id: 'u1', name: 'Bob'});
-- Create savepoint
SAVEPOINT sp1;
-- Risky operations
INSERT (u:User {id: 'u2', email: 'invalid'}); -- May fail validation
-- Roll back to the savepoint if needed
ROLLBACK TO SAVEPOINT sp1;
-- Continue with valid operations
INSERT (u:User {id: 'u3', name: 'Charlie'});
COMMIT;
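The savepoint flow above can be modeled as a write log with named positions. The sketch below is a conceptual model in Python, not Geode's implementation; the `Transaction` class is a hypothetical name used only for illustration.

```python
class Transaction:
    """Toy transaction: a write log plus named savepoints (positions in the log)."""

    def __init__(self):
        self.log = []
        self.savepoints = {}

    def insert(self, record):
        self.log.append(record)

    def savepoint(self, name):
        # Remember how long the log was when the savepoint was taken.
        self.savepoints[name] = len(self.log)

    def rollback_to(self, name):
        # Discard writes made after the savepoint; the transaction stays open.
        position = self.savepoints[name]
        del self.log[position:]
        # Drop savepoints that point past the truncated log.
        self.savepoints = {n: p for n, p in self.savepoints.items() if p <= position}

    def commit(self):
        return list(self.log)


# Same sequence as the example: u1, savepoint, u2, rollback, u3, commit.
tx = Transaction()
tx.insert({"id": "u1", "name": "Bob"})
tx.savepoint("sp1")
tx.insert({"id": "u2", "email": "invalid"})
tx.rollback_to("sp1")
tx.insert({"id": "u3", "name": "Charlie"})
committed = tx.commit()
```

Only `u1` and `u3` reach the committed state; the write made between the savepoint and the rollback is discarded.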
Transaction Isolation Levels
Geode supports multiple isolation levels to balance consistency and performance:
- READ UNCOMMITTED: Maximum performance, minimal isolation
- READ COMMITTED: Default level, prevents dirty reads
- REPEATABLE READ: Prevents non-repeatable reads
- SERIALIZABLE: Full isolation, prevents phantom reads
-- Set isolation level for current session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
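To choose a level, it helps to list which read anomalies each one still permits. The table below follows the SQL standard definitions of the four levels; whether Geode prevents more than the standard minimum at a given level is not stated here, so treat this as a planning aid rather than a description of the engine. The helper name `weakest_level_preventing` is made up for the sketch.

```python
LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE")

# Anomalies each level still permits under the SQL standard definitions.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(required):
    """Return the least strict level that prevents every anomaly in `required`."""
    for level in LEVELS:
        if not (set(required) & PERMITTED[level]):
            return level
    return "SERIALIZABLE"
```

For example, a report that must not see phantom rows needs SERIALIZABLE, while a workload that only needs to avoid dirty reads can stay at the default READ COMMITTED.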
Row-Level Security (RLS)
Geode’s row-level security system provides fine-grained access control at the node and relationship level, enabling multi-tenant applications and secure data segregation.
Defining Security Policies
-- Create policy restricting access to user's own data
CREATE POLICY user_isolation ON Person
USING (current_user_id = id);
-- Create policy for role-based access
CREATE POLICY manager_access ON Employee
USING (current_user_role IN ['manager', 'admin'] OR id = current_user_id);
-- Relationship-level security
CREATE POLICY confidential_relationships ON FOLLOWS
USING (visibility = 'public' OR source_id = current_user_id);
Policy Application
RLS policies are automatically enforced in all queries without requiring application-level filtering:
-- Automatically filtered by RLS policies
MATCH (p:Person)
RETURN p.name, p.email;
-- Only returns nodes/relationships permitted by active policies
MATCH (e:Employee)-[:REPORTS_TO]->(m:Manager)
RETURN e.name, m.name;
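Conceptually, automatic enforcement means every read passes through the policy predicate for its label before results are returned. The Python sketch below mirrors the `user_isolation` and `manager_access` policies defined earlier; the `Session` type, the `POLICIES` table, and the rule that a label without a policy is unfiltered are all assumptions made for illustration, not Geode internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str
    role: str


# Hypothetical policy table: label -> predicate(node, session).
POLICIES = {
    "Person": lambda node, s: node["id"] == s.user_id,
    "Employee": lambda node, s: s.role in ("manager", "admin") or node["id"] == s.user_id,
}


def visible(nodes, label, session):
    """Return the nodes of `label` that the session's policy admits."""
    policy = POLICIES.get(label)
    if policy is None:
        return list(nodes)  # assumption: no policy means no filtering
    return [node for node in nodes if policy(node, session)]


employees = [{"id": "e1"}, {"id": "e2"}]
own_rows = visible(employees, "Employee", Session("e1", "engineer"))
all_rows = visible(employees, "Employee", Session("m9", "manager"))
```

The same query text yields different rows per session, which is why no application-level WHERE clause is needed.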
Graph Algorithms
Geode provides built-in graph algorithms for common analytical tasks, implemented efficiently in the database engine.
Path Analysis
-- Shortest path between two nodes
MATCH path = SHORTEST (a:Person {name: 'Alice'})-[:KNOWS*]->(b:Person {name: 'Bob'})
RETURN path;
-- All shortest paths
MATCH paths = ALL SHORTEST (a)-[:KNOWS*]->(b)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN paths;
-- K-shortest paths
MATCH paths = K SHORTEST 5 (a:Airport {code: 'SFO'})-[:FLIGHT*]->(b:Airport {code: 'JFK'})
RETURN paths, reduce(cost = 0, r IN relationships(paths) | cost + r.price) AS total_cost;
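For unweighted relationships, a shortest path is the one with the fewest hops, and breadth-first search is the textbook way to find it. The sketch below shows that idea on an in-memory adjacency map; it is not a description of how Geode evaluates SHORTEST, and the sample `knows` graph is invented.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one fewest-hops path from start to goal, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Directed KNOWS edges: Alice reaches Bob in two hops via Carol, three via Dave.
knows = {"Alice": ["Carol", "Dave"], "Carol": ["Bob"], "Dave": ["Erin"], "Erin": ["Bob"]}
path = shortest_path(knows, "Alice", "Bob")
```

Weighted variants, such as the cheapest flight itinerary, need a cost-aware search like Dijkstra's algorithm instead of plain breadth-first search.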
Centrality Measures
-- Degree centrality (count of connections)
MATCH (p:Person)
OPTIONAL MATCH (p)-[r:KNOWS]-()
RETURN p.name, count(r) AS degree
ORDER BY degree DESC
LIMIT 10;
-- PageRank-style importance (recursive influence)
MATCH (p:Person)
CALL graph.pagerank(p, 'FOLLOWS') AS pr
RETURN p.name, pr.score
ORDER BY pr.score DESC;
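To make "recursive influence" concrete, here is a minimal power-iteration PageRank in Python. It is a generic textbook formulation; the damping factor, iteration count, and handling of nodes without outgoing edges are assumptions of this sketch, and nothing here is known about the internals of `graph.pagerank`.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives a base share from random jumps.
        updated = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # A node with no outgoing edges spreads its score over all nodes.
            targets = out_links[source] or nodes
            share = damping * score[source] / len(targets)
            for target in targets:
                updated[target] += share
        score = updated
    return score


follows = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
scores = pagerank(follows)
```

A node scores highly when it is followed by many nodes or by a few nodes that themselves score highly, which is what separates this measure from plain degree counting.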
Community Detection
-- Label propagation for community detection
MATCH (p:Person)
CALL graph.label_propagation(p, 'KNOWS', 10) AS community
RETURN community.id, collect(p.name) AS members;
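Label propagation starts with every node in its own community and repeatedly lets each node adopt the most common label among its neighbors. The Python sketch below is a deterministic variant (nodes visited in sorted order, ties broken by the smallest label) so that results are reproducible; production implementations usually randomize both. Reading the third argument `10` in the call above as an iteration cap is an assumption.

```python
from collections import Counter


def label_propagation(adjacency, max_iterations=10):
    """Assign community labels by repeated neighbor-majority voting."""
    labels = {node: node for node in adjacency}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            top = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent.
            best = min(label for label, count in counts.items() if count == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two disconnected triangles should end up as two communities.
knows = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
communities = label_propagation(knows)
```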
Temporal Query Features
Geode supports temporal data types and operations for time-based graph analytics.
Temporal Data Types
- DATE: Calendar dates (YYYY-MM-DD)
- TIME: Time of day with time zone
- TIMESTAMP: Point in time with nanosecond precision
- DURATION: Time intervals
- PERIOD: Time ranges
-- Insert temporal data
INSERT (e:Event {
  name: 'Product Launch',
  scheduled: TIMESTAMP '2024-06-15 09:00:00-07:00',
  duration: DURATION 'PT2H30M'
});
-- Temporal queries
MATCH (e:Event)
WHERE e.scheduled >= CURRENT_TIMESTAMP
AND e.scheduled <= CURRENT_TIMESTAMP + DURATION 'P7D'
RETURN e.name, e.scheduled;
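The literals in this example use ISO 8601 notation: `PT2H30M` is 2 hours 30 minutes and `P7D` is 7 days. The Python sketch below reproduces the same arithmetic with the standard library to show what the window filter computes. The fixed `now` value and the `within_next` helper are stand-ins chosen for the demonstration.

```python
from datetime import datetime, timedelta, timezone

# TIMESTAMP '2024-06-15 09:00:00-07:00' and DURATION 'PT2H30M' from the example.
scheduled = datetime(2024, 6, 15, 9, 0, tzinfo=timezone(timedelta(hours=-7)))
duration = timedelta(hours=2, minutes=30)
ends = scheduled + duration


def within_next(event_time, now, window):
    """True if now <= event_time <= now + window (the 7-day filter above)."""
    return now <= event_time <= now + window


# A fixed "current" time keeps the demonstration reproducible.
now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
upcoming = within_next(scheduled, now, timedelta(days=7))
```

Because both timestamps carry an offset, the comparison is made on the absolute instant, not on the wall-clock digits.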
Time-Series Analysis
-- Aggregate by time windows
MATCH (s:Sale)
WHERE s.timestamp >= DATE '2024-01-01'
RETURN
  date_trunc('month', s.timestamp) AS month,
  count(*) AS sales_count,
  sum(s.amount) AS total_revenue
GROUP BY month
ORDER BY month;
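Truncating a timestamp to the month and grouping on the result is equivalent to the bucketing below. This is a plain Python rendering of the aggregation for readers who want to check results by hand; the `monthly_totals` name and the sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(sales):
    """Group (timestamp, amount) pairs by the first instant of their month."""
    buckets = defaultdict(lambda: [0, 0.0])   # month -> [count, revenue]
    for timestamp, amount in sales:
        month = timestamp.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        buckets[month][0] += 1
        buckets[month][1] += amount
    # Sorted by month, matching ORDER BY month.
    return [(month, count, revenue) for month, (count, revenue) in sorted(buckets.items())]


sales = [
    (datetime(2024, 1, 5, 10, 0), 10.0),
    (datetime(2024, 1, 20), 5.0),
    (datetime(2024, 2, 1), 7.5),
]
totals = monthly_totals(sales)
```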
Custom Aggregations
Geode extends standard aggregations (COUNT, SUM, AVG) with advanced analytical functions.
Statistical Aggregations
-- Statistical analysis
MATCH (p:Product)-[:SOLD]->(s:Sale)
RETURN
  p.name,
  avg(s.price) AS mean_price,
  stddev(s.price) AS price_stddev,
  percentile(s.price, 0.50) AS median_price,
  percentile(s.price, 0.95) AS p95_price;
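Percentile and standard deviation each have more than one common definition, so it is worth checking a small sample by hand. The Python sketch below uses linear interpolation between ranks and the sample (n - 1) standard deviation; whether Geode's `percentile` and `stddev` use these exact definitions is an assumption to verify against your own data.

```python
import statistics


def percentile(values, q):
    """q-th quantile (0 <= q <= 1) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


prices = [10.0, 20.0, 30.0, 40.0, 50.0]
summary = {
    "mean_price": statistics.mean(prices),
    "price_stddev": statistics.stdev(prices),   # sample standard deviation
    "median_price": percentile(prices, 0.50),
    "p95_price": percentile(prices, 0.95),
}
```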
Custom Aggregate Functions
-- Collect unique values into lists
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u.name, collect(DISTINCT p.category) AS categories;
-- String aggregation
MATCH (t:Team)-[:HAS_MEMBER]->(m:Member)
RETURN t.name, string_agg(m.name, ', ') AS members;
Performance Optimization
Query Profiling and Optimization
-- Profile query execution
PROFILE
MATCH (p:Person)-[:KNOWS*2..3]->(friend)
WHERE p.city = 'San Francisco'
RETURN friend.name, count(*) AS connections
ORDER BY connections DESC
LIMIT 20;
The PROFILE output shows:
- Execution plan with operator costs
- Index usage and scan types
- Memory consumption
- Execution time per operator
Index Strategies
-- Create composite index for complex queries
CREATE INDEX person_city_age ON Person(city, age);
-- Full-text search index
CREATE FULLTEXT INDEX product_search ON Product(name, description);
-- Use full-text search
MATCH (p:Product)
WHERE fulltext_search(p, 'wireless headphones')
RETURN p.name, p.price;
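A composite index on (city, age) is useful because entries are ordered by city first and age second, so an equality test on city plus a range test on age becomes one contiguous scan. The Python sketch below imitates that with a sorted list and binary search. It illustrates the ordering idea only and says nothing about Geode's actual index structures; the sample rows are invented.

```python
import bisect

people = [
    ("Austin", 22, "Ana"),
    ("Austin", 31, "Ben"),
    ("Boston", 40, "Cy"),
    ("Austin", 27, "Dee"),
]

# Index entries sorted by (city, age); keys and names stay aligned by position.
entries = sorted(people)
keys = [(city, age) for city, age, _ in entries]
names = [name for _, _, name in entries]


def names_in_city_older_than(city, min_age):
    """Range scan: seek past (city, min_age), then read until the city changes."""
    start = bisect.bisect_right(keys, (city, min_age))
    result = []
    for position in range(start, len(keys)):
        if keys[position][0] != city:
            break
        result.append(names[position])
    return result


austin_over_25 = names_in_city_older_than("Austin", 25)
```

The same index cannot serve a filter on age alone as efficiently, because entries for one age are scattered across every city.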
Query Hints
-- Force index usage
MATCH (p:Person)
USING INDEX person_city_age
WHERE p.city = 'Austin' AND p.age > 25
RETURN p.name;
-- Disable index for full scan
MATCH (p:Person)
WITHOUT INDEX
RETURN count(*);
Parallel Query Execution
Geode automatically parallelizes query execution across available CPU cores for large analytical queries.
-- Parallel aggregation over large graph
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH p, count(u) AS buyers
WHERE buyers > 100
RETURN p.category, sum(buyers) AS total_buyers
GROUP BY p.category;
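Parallel aggregation generally follows a split, partial-aggregate, merge shape. The Python sketch below shows that shape with a thread pool. It illustrates the data flow rather than a speedup (CPython threads do not run this work on multiple cores), and it is not a description of Geode's executor. The function names are invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_buyers(chunk):
    """Partial aggregate: purchases per product within one chunk of (user, product) edges."""
    return Counter(product for _, product in chunk)


def parallel_buyer_counts(purchases, workers=4):
    """Split the edge list, aggregate each chunk concurrently, then merge."""
    size = max(1, -(-len(purchases) // workers))   # ceiling division
    chunks = [purchases[i:i + size] for i in range(0, len(purchases), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_buyers, chunks):
            total += partial   # merge step: counts are additive
    return total


purchases = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2"), ("u4", "p1")]
buyers = parallel_buyer_counts(purchases, workers=2)
```

Counts and sums merge by simple addition, which is what makes them easy to parallelize. Aggregates such as exact medians need more coordination.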
Batch Operations
Efficient bulk loading and updates:
-- Bulk insert with UNWIND
UNWIND $users AS user_data
INSERT (u:User {
  id: user_data.id,
  name: user_data.name,
  email: user_data.email
});
-- Batch update
MATCH (p:Product)
WHERE p.price IS NOT NULL
SET p.price = p.price * 1.05 -- 5% price increase
RETURN count(*) AS updated;
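On the client side, the `$users` parameter is typically filled with a bounded chunk of rows per request instead of the entire data set at once. The Python sketch below prepares such chunks. The batch size is arbitrary and the `batches` helper is a hypothetical name; sending each parameter set to Geode is left out because no client API is described here.

```python
def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


users = [
    {"id": f"u{i}", "name": f"User {i}", "email": f"u{i}@example.com"}
    for i in range(5)
]

# One parameter map per request; each would be bound to $users in the UNWIND query.
parameter_sets = [{"users": chunk} for chunk in batches(users, 2)]
```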
Advanced Graph Patterns
Variable-Length Paths with Filters
-- Find paths with constraints on intermediate nodes
MATCH path = (start:Person {name: 'Alice'})
-[:KNOWS*2..5 (r, n | n.active = true)]->(end:Person)
WHERE end.city = 'Seattle'
RETURN path, length(path) AS hops;
Multi-Pattern Matching
-- Complex pattern with multiple conditions
MATCH (author:Person)-[:WROTE]->(paper:Paper),
      (paper)-[:CITES]->(cited:Paper),
      (cited)<-[:WROTE]-(coauthor:Person)
WHERE author <> coauthor
  AND paper.year >= 2020
  AND cited.year < paper.year
RETURN author.name, coauthor.name, count(cited) AS citations
ORDER BY citations DESC;
Integration with External Systems
Foreign Data Wrappers
Connect to external data sources:
-- Query external PostgreSQL table
MATCH (p:Person)
CALL foreign.query('postgres', 'SELECT * FROM orders WHERE user_id = $1', p.id) AS orders
RETURN p.name, orders.order_id, orders.total;
Export and Materialized Views
-- Create materialized view for frequent queries
CREATE MATERIALIZED VIEW popular_products AS
MATCH (p:Product)<-[:PURCHASED]-(u:User)
WHERE p.created >= CURRENT_DATE - DURATION 'P30D'
RETURN p.id, p.name, count(u) AS purchases
ORDER BY purchases DESC
LIMIT 100;
-- Refresh materialized view
REFRESH MATERIALIZED VIEW popular_products;
Comparison with Other Graph Databases
vs. Neo4j
- Standards Compliance: Geode implements the ISO/IEC 39075:2024 GQL standard; Neo4j uses Cypher, its own query language (also published as openCypher)
- RLS: Native row-level security in Geode; requires application-level or plugin in Neo4j
- Transactions: Full distributed ACID in Geode; Neo4j limited to single instance for full ACID
- Performance: Geode’s QUIC transport typically faster than Neo4j’s Bolt protocol
vs. Amazon Neptune
- Query Language: Geode uses standard GQL; Neptune supports Gremlin, openCypher, and SPARQL
- Deployment: Geode supports on-premise and cloud; Neptune is AWS-only
- Cost: Geode open-source with no vendor lock-in; Neptune proprietary with instance-hour pricing
vs. TigerGraph
- Language: GQL vs. GSQL
- ACID: Geode full ACID; TigerGraph eventual consistency in distributed mode
- Integration: Both support REST APIs; Geode also native QUIC clients
Best Practices
1. Use Appropriate Indexes
Create indexes for frequently filtered properties:
CREATE INDEX user_email ON User(email);
CREATE INDEX product_category_price ON Product(category, price);
2. Leverage Query Parameters
Always use parameterized queries to enable query plan caching:
-- Good: parameterized
MATCH (p:Person {id: $user_id})
RETURN p;
-- Avoid: literal values prevent plan caching
MATCH (p:Person {id: 'user123'})
RETURN p;
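The effect on plan caching is easiest to see with a cache keyed by query text. The Python sketch below assumes such a cache, which is common in database engines. How Geode keys its plan cache is not described here, and the `PlanCache` class is invented for the illustration.

```python
class PlanCache:
    """Toy plan cache keyed by the exact query text."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def plan_for(self, query_text):
        if query_text not in self.plans:
            self.compilations += 1   # stands in for parsing and planning
            self.plans[query_text] = f"plan#{self.compilations}"
        return self.plans[query_text]


cache = PlanCache()

# Parameterized: the text is identical for every user, so one plan is reused.
for user_id in ("user1", "user2", "user3"):
    cache.plan_for("MATCH (p:Person {id: $user_id}) RETURN p")
parameterized_compilations = cache.compilations

# Literal: each value produces a new text and therefore a new plan.
for user_id in ("user1", "user2", "user3"):
    cache.plan_for(f"MATCH (p:Person {{id: '{user_id}'}}) RETURN p")
literal_compilations = cache.compilations - parameterized_compilations
```

Three lookups cost one compilation in the parameterized case and three in the literal case. Keeping values out of the query text also avoids injection through string concatenation.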
3. Limit Result Sets
Use LIMIT and pagination for large result sets:
MATCH (p:Person)
RETURN p.name
ORDER BY p.created DESC
OFFSET $page_offset
LIMIT 100;
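The value bound to `$page_offset` is derived from the page number and page size. A minimal Python helper, assuming 1-based page numbers and the page size of 100 used in the query (the `page_offset` function name is invented):

```python
def page_offset(page, page_size=100):
    """Rows to skip for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size


first_page = page_offset(1)
third_page = page_offset(3)
```

Offset pagination rereads and discards the skipped rows on every request, so deep pages get slower. For very large result sets, filtering on the last seen sort key is a common alternative.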
4. Monitor Query Performance
Regularly profile slow queries:
PROFILE
MATCH (p:Person)-[:KNOWS*3..5]->(friend)
RETURN friend.name;
5. Implement RLS Policies
Use row-level security instead of application-level filtering:
-- Centralized security in database
CREATE POLICY tenant_isolation ON ALL
USING (tenant_id = current_tenant());
Getting Started with Advanced Features
1. Enable Advanced Features
Some features require configuration:
# In geode.conf
enable_rls = true
enable_parallel_query = true
max_query_parallelism = 8
2. Import Sample Data
-- Load sample graph
CALL graph.load_sample('social_network');
3. Experiment with Features
-- Try shortest path
MATCH path = SHORTEST (a:Person)-[:KNOWS*]-(b:Person)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN path;
-- Test RLS
CREATE POLICY test_policy ON Person USING (id = current_user_id);
4. Profile and Optimize
PROFILE
MATCH (p:Product)<-[:PURCHASED]-(u:User)
WHERE p.category = 'Electronics'
RETURN p.name, count(u) AS buyers
ORDER BY buyers DESC;
Conclusion
Geode’s advanced features enable sophisticated graph analytics, enterprise-grade security, and high-performance query execution. By mastering distributed transactions, row-level security, graph algorithms, temporal queries, and optimization techniques, you can build production-ready graph applications that scale to billions of nodes and relationships.