Query Language and Execution in Geode
Querying is the primary interface for interacting with graph databases. Geode implements the ISO/IEC 39075:2024 Graph Query Language (GQL) standard, providing a powerful, declarative language for pattern matching, traversals, aggregations, and data manipulation across graph structures.
Introduction to Graph Querying
Graph queries differ fundamentally from SQL queries:
SQL (Relational):
- Tables with rows and columns
- Joins explicitly specified
- Fixed schema required
- Path queries complex (recursive CTEs)
GQL (Graph):
- Nodes and relationships as first-class citizens
- Relationships traversed naturally with pattern matching
- Flexible schema (property graph model)
- Path queries intuitive with variable-length patterns
Example comparison:
-- SQL: Find friends-of-friends (complex)
SELECT DISTINCT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
WHERE u1.name = 'Alice' AND u3.id != u1.id;
-- GQL: Same query (intuitive)
MATCH (u:User {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof:User)
WHERE fof <> u
RETURN DISTINCT fof.name;
GQL Query Structure
Basic Pattern Matching
MATCH finds patterns in the graph:
-- Match nodes with label
MATCH (u:User)
RETURN u.name;
-- Match relationships
MATCH (u:User)-[:FRIEND]->(friend:User)
RETURN u.name, friend.name;
-- Match with properties
MATCH (p:Product {category: 'Electronics', available: true})
RETURN p.name, p.price;
-- Match with WHERE clause
MATCH (u:User)
WHERE u.age >= 18 AND u.country = 'USA'
RETURN u.name, u.email;
Variable-Length Paths
Query paths of varying lengths:
-- Friends up to 3 hops away
MATCH (u:User {id: $user_id})-[:FRIEND*1..3]->(connection:User)
RETURN DISTINCT connection.name, length(path) AS hops;
-- All paths (use cautiously!)
MATCH path = (start:Node)-[:CONNECTED*]->(end:Node)
WHERE start.id = $start_id AND end.id = $end_id
RETURN path;
-- Shortest path
MATCH path = shortestPath((start:Node)-[:EDGE*]-(end:Node))
WHERE start.id = $start AND end.id = $end
RETURN path, length(path) AS distance;
Data Manipulation
CREATE, UPDATE, DELETE operations:
-- Create nodes
CREATE (u:User {
id: $id,
name: $name,
email: $email,
created_at: timestamp()
});
-- Create relationships
MATCH (u:User {id: $user_id}), (p:Product {id: $product_id})
CREATE (u)-[:PURCHASED {
timestamp: timestamp(),
price: $price,
quantity: $quantity
}]->(p);
-- Update properties
MATCH (u:User {id: $user_id})
SET u.last_login = timestamp(),
u.login_count = u.login_count + 1;
-- Delete (node and relationships)
MATCH (u:User {id: $user_id})
DETACH DELETE u; -- Deletes node and all its relationships
Aggregations
Group and aggregate data:
-- Count by category
MATCH (p:Product)
RETURN p.category, count(p) AS product_count
ORDER BY product_count DESC;
-- Average, sum, min, max
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u.name,
count(p) AS purchases,
avg(p.price) AS avg_price,
sum(p.price) AS total_spent,
min(p.price) AS cheapest,
max(p.price) AS most_expensive;
-- Collect into lists
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u.name, collect(p.name) AS purchased_products;
Query Execution Process
Understanding how Geode executes queries helps write efficient queries:
1. Parsing
Query text parsed into Abstract Syntax Tree (AST):
MATCH (u:User {country: 'USA'})-[:PURCHASED]->(p:Product)
WHERE p.price > 100
RETURN u.name, p.name;
Parsed structure:
- MATCH clause: Pattern with node filters
- WHERE clause: Additional predicates
- RETURN clause: Projection
2. Planning
Optimizer generates execution plan:
EXPLAIN
MATCH (u:User {country: 'USA'})-[:PURCHASED]->(p:Product)
WHERE p.price > 100
RETURN u.name, p.name;
-- Plan:
Project(u.name, p.name)
Filter(p.price > 100)
Expand((User)-[:PURCHASED]->(Product))
IndexSeek(User.country = 'USA')
3. Execution
Plan executed step-by-step:
- IndexSeek: Find Users in USA (fast)
- Expand: Follow PURCHASED relationships
- Filter: Keep only products with price > 100
- Project: Return requested properties
4. Result Streaming
Results streamed to client as they’re produced (not buffered entirely).
Common Query Patterns
Pattern 1: Recommendation Engine
Collaborative filtering using graph traversals:
-- Find products bought by similar users
MATCH (me:User {id: $my_id})-[:PURCHASED]->(p:Product)
<-[:PURCHASED]-(similar:User)-[:PURCHASED]->(recommendation:Product)
WHERE NOT (me)-[:PURCHASED]->(recommendation)
WITH recommendation, count(similar) AS similar_users
WHERE similar_users >= 3
RETURN recommendation.name,
recommendation.price,
similar_users
ORDER BY similar_users DESC
LIMIT 10;
Pattern 2: Social Network Analysis
Find influential users (high centrality):
-- Count followers (in-degree centrality)
MATCH (u:User)<-[:FOLLOWS]-(follower)
WITH u, count(follower) AS follower_count
WHERE follower_count > 100
RETURN u.name, follower_count
ORDER BY follower_count DESC
LIMIT 50;
-- Find mutual connections
MATCH (me:User {id: $my_id})-[:FRIEND]-(mutual)-[:FRIEND]-(them:User {id: $their_id})
RETURN mutual.name, mutual.email;
Pattern 3: Fraud Detection
Detect suspicious patterns:
-- Find accounts sharing devices but claiming to be different people
MATCH (u1:User)-[:LOGGED_IN_FROM]->(device:Device)<-[:LOGGED_IN_FROM]-(u2:User)
WHERE u1.id < u2.id
AND u1.email <> u2.email
AND u1.address = u2.address
RETURN u1.name, u2.name, device.fingerprint,
count(*) AS shared_logins
ORDER BY shared_logins DESC;
-- Circular transfers (possible money laundering)
MATCH path = (a1:Account)-[:TRANSFER*4..6]->(a1)
WHERE reduce(total = 0, r IN relationships(path) | total + r.amount) > 10000
RETURN path, reduce(total = 0, r IN relationships(path) | total + r.amount) AS total_amount;
Pattern 4: Knowledge Graph Queries
Semantic queries over structured knowledge:
-- Find entities related through multiple relationship types
MATCH (entity:Entity {name: 'Artificial Intelligence'})
-[r:RELATED_TO|PART_OF|INFLUENCES*1..2]-(related:Entity)
RETURN related.name, collect(type(r)) AS relationship_types, count(r) AS connection_strength
ORDER BY connection_strength DESC
LIMIT 20;
-- Find experts on a topic
MATCH (topic:Topic {name: $topic_name})<-[:WROTE_ABOUT]-(article:Article)
<-[:AUTHORED]-(author:Person)
WITH author, count(article) AS article_count
WHERE article_count >= 5
RETURN author.name, author.affiliation, article_count
ORDER BY article_count DESC;
Query Optimization Tips
Tip 1: Use Indexes
Create indexes on frequently filtered properties:
-- Without index: slow (full scan)
MATCH (u:User {email: 'alice@example.com'})
RETURN u;
-- Create index
CREATE INDEX user_email ON User(email);
-- With index: fast (logarithmic lookup)
MATCH (u:User {email: 'alice@example.com'})
RETURN u;
Tip 2: Filter Early
Push filters as close to data source as possible:
-- BAD: Filter after expensive traversal
MATCH (u:User)-[:FRIEND*1..3]->(connection:User)
WHERE u.country = 'USA'
RETURN connection.name;
-- GOOD: Filter before traversal
MATCH (u:User {country: 'USA'})-[:FRIEND*1..3]->(connection:User)
RETURN connection.name;
Tip 3: Limit Intermediate Results
Use LIMIT to bound explosion of intermediate results:
-- Potentially huge intermediate result
MATCH (u:User)-[:FRIEND]->(friend)
WITH friend
MATCH (friend)-[:LIKES]->(item)
RETURN item.name, count(friend) AS likes;
-- Bounded intermediate result
MATCH (u:User)-[:FRIEND]->(friend)
WITH friend
LIMIT 1000 -- Limit before expensive second traversal
MATCH (friend)-[:LIKES]->(item)
RETURN item.name, count(friend) AS likes;
Tip 4: Use Parameters
Parameterized queries enable plan caching:
-- Plan cached and reused
MATCH (u:User {id: $user_id})
RETURN u;
-- Different plan for each query (slower)
MATCH (u:User {id: 123})
RETURN u;
Client Libraries
Python Client
from geode_client import Client
import asyncio
async def query_example():
client = Client(host="localhost", port=3141)
async with client.connection() as conn:
# Simple query
result, _ = await conn.query("""
MATCH (u:User)
WHERE u.country = $country
RETURN u.name, u.email
ORDER BY u.name
LIMIT 10
""", {"country": "USA"})
# Iterate results
for row in result.rows:
print(f"{row['name'].as_string}: {row['email'].as_string}")
# Transaction
await conn.begin()
try:
await conn.execute(
"""
CREATE (u:User {id: $id, name: $name})
""",
{"id": 123, "name": "Alice"},
)
await conn.execute(
"""
MATCH (u:User {id: $id})
SET u.verified = true
""",
{"id": 123},
)
await conn.commit()
except Exception:
await conn.rollback()
raise
asyncio.run(query_example())
Go Client
import (
"database/sql"
_ "geodedb.com/geode"
)
func queryExample() {
db, err := sql.Open("geode", "quic://localhost:3141")
if err != nil {
log.Fatal(err)
}
defer db.Close()
// Parameterized query
rows, err := db.Query(`
MATCH (u:User)
WHERE u.age >= $1
RETURN u.name, u.age
ORDER BY u.age DESC
`, 18)
if err != nil {
log.Fatal(err)
}
defer rows.Close()
for rows.Next() {
var name string
var age int
if err := rows.Scan(&name, &age); err != nil {
log.Fatal(err)
}
fmt.Printf("%s: %d\n", name, age)
}
}
Performance Considerations
Query latency depends on:
- Index utilization (indexed vs. full scan)
- Result cardinality (10 rows vs. 1M rows)
- Graph complexity (1 hop vs. 6 hops)
- Concurrent load (single query vs. high-concurrency workloads)
Typical performance (mid-range server):
- Indexed point lookup: <1ms (typically 100-500us)
- Simple 1-2 hop traversal: 1-5ms
- Complex 3-4 hop pattern: 10-100ms
- Analytical aggregation: 100-500ms for ~1M rows
Use PROFILE to measure:
PROFILE
MATCH (u:User {id: $id})-[:FRIEND*1..2]->(connection)
RETURN connection.name;
-- Shows:
-- - Time per operation
-- - Rows processed
-- - Index usage
-- - Cache hits
Related Topics
- GQL : GQL standard and syntax
- Query Optimization : Deep dive into optimization
- Pattern Matching : Pattern matching techniques
- Performance : Performance tuning
- Indexing : Index strategies
- Explain : Query plan analysis
Further Reading
- GQL Language Guide:
/docs/gql/language-guide/ - Query Patterns:
/docs/gql/query-patterns/ - Optimization Guide:
/docs/performance/query-optimization/ - Client Libraries:
/docs/client-libraries/overview/ - API Reference:
/docs/api-reference/gql-syntax/
Advanced Query Optimization Techniques
Query Plan Caching
Geode caches compiled query plans for repeated queries:
from geode_client import Client
import asyncio
async def demonstrate_plan_caching():
client = Client(host="localhost", port=3141)
async with client.connection() as conn:
# Prepared statement - plan cached
stmt = await conn.prepare("""
MATCH (u:User {id: $user_id})-[:PURCHASED]->(p:Product)
WHERE p.price > $min_price
RETURN p.name, p.price
ORDER BY p.price DESC
""")
# Execute multiple times - uses cached plan
for user_id in range(1, 1000):
result, _ = await stmt.execute(
user_id=user_id,
min_price=100.0
)
products = [row for row in result.rows]
await process_products(products)
async def process_products(products):
pass # Process product list
asyncio.run(demonstrate_plan_caching())
Adaptive Query Execution
Geode’s optimizer adapts based on actual data distribution:
-- Initial execution: Optimizer estimates cardinality
MATCH (u:User)-[:FRIEND]->(f:User)
WHERE u.country = $country
RETURN count(f) AS friend_count;
-- After several executions with different countries:
-- Optimizer learns actual selectivity and adjusts plan
--
-- High selectivity country (few users):
-- Plan: IndexSeek -> Expand -> Count
--
-- Low selectivity country (many users):
-- Plan: IndexSeek -> Parallel Expand -> Parallel Count
Join Order Optimization
Understanding join reordering for complex queries:
-- Original query
MATCH (u:User)-[:PURCHASED]->(p:Product)<-[:MANUFACTURED]-(m:Manufacturer)
WHERE u.country = 'USA'
AND m.certified = true
AND p.category = 'Electronics'
RETURN u.name, p.name, m.name;
-- Optimizer reorders to minimize intermediate results:
-- 1. Filter manufacturers (certified = true) - smallest set
-- 2. Expand to products in Electronics - medium set
-- 3. Expand to users in USA - largest set
--
-- Execution plan:
-- Project(u.name, p.name, m.name)
-- HashJoin(p)
-- HashJoin(p)
-- IndexSeek(Manufacturer.certified = true)
-- Expand(MANUFACTURED)
-- IndexSeek(Product.category = 'Electronics')
-- Expand(PURCHASED)
-- IndexSeek(User.country = 'USA')
Materialized Views
Create pre-computed results for expensive queries:
-- Create materialized view
CREATE MATERIALIZED VIEW popular_products AS
MATCH (p:Product)<-[:PURCHASED]-(u:User)
WITH p, count(u) AS purchase_count
WHERE purchase_count > 100
RETURN p.id, p.name, p.category, purchase_count
ORDER BY purchase_count DESC;
-- Query uses materialized view (fast)
MATCH (p:PopularProduct)
WHERE p.category = 'Electronics'
RETURN p.name, p.purchase_count
LIMIT 10;
-- Refresh materialized view
REFRESH MATERIALIZED VIEW popular_products;
Query Complexity Analysis
Computational Complexity
Understanding query complexity helps predict performance:
-- O(1) - Constant time: Direct node lookup by ID
MATCH (u:User)
WHERE id(u) = $id
RETURN u;
-- O(log n) - Logarithmic: Indexed property lookup
MATCH (u:User {email: $email}) -- Assumes index on email
RETURN u;
-- O(n) - Linear: Full table scan
MATCH (u:User)
WHERE u.age > 18
RETURN count(u);
-- O(n * m) - Quadratic: Cartesian product
MATCH (u:User), (p:Product) -- No relationship filter!
RETURN u.name, p.name; -- Danger: n * m rows
-- O(n^k) - Exponential: Variable-length paths
MATCH (u:User)-[:FRIEND*1..5]->(f:User) -- Can explode
RETURN DISTINCT f.name;
-- O(n log n) - Linearithmic: Sort operation
MATCH (u:User)
RETURN u.name
ORDER BY u.created_at
Query Cost Estimation
# Estimate query cost before execution
from geode_client import Client
async def estimate_query_cost(query, params):
client = Client(host="localhost", port=3141)
async with client.connection() as conn:
# Get query plan
plan = await client.explain(query, params)
# Analyze estimated costs
total_rows = plan.estimated_rows
total_cost = plan.estimated_cost
print(f"Estimated rows: {total_rows}")
print(f"Estimated cost: {total_cost}")
print(f"Plan operators: {len(plan.operators)}")
if total_cost > 100000:
print("WARNING: Expensive query detected")
print("Consider adding indexes or rewriting")
return plan
Real-Time Query Monitoring
Query Performance Tracking
from prometheus_client import Histogram, Counter
import time
# Metrics
query_latency = Histogram(
'geode_query_latency_seconds',
'Query execution latency',
['query_type', 'cache_status'],
buckets=[0.001, 0.01, 0.1, 1.0, 10.0]
)
query_total = Counter(
'geode_queries_total',
'Total number of queries',
['status']
)
async def monitored_query(client, query, params):
start = time.time()
cache_hit = False
try:
result, _ = await client.query(query, params)
query_total.labels(status='success').inc()
# Check if result came from cache
if hasattr(result, 'from_cache'):
cache_hit = result.from_cache
rows = [row for row in result.rows]
latency = time.time() - start
query_latency.labels(
query_type='read' if 'MATCH' in query else 'write',
cache_status='hit' if cache_hit else 'miss'
).observe(latency)
return rows
except Exception as e:
query_total.labels(status='error').inc()
raise
Slow Query Logging
import logging
from functools import wraps
logger = logging.getLogger('geode.slow_queries')
def log_slow_queries(threshold_seconds=1.0):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start = time.time()
result = await func(*args, **kwargs)
duration = time.time() - start
if duration > threshold_seconds:
logger.warning(
f"Slow query detected: {duration:.2f}s\n"
f"Query: {kwargs.get('query', 'unknown')}\n"
f"Params: {kwargs.get('params', {})}"
)
return result
return wrapper
return decorator
@log_slow_queries(threshold_seconds=2.0)
async def execute_query(client, query, params):
result, _ = await client.query(query, params)
return result
Query Pattern Catalog
Pattern: Pagination
-- Offset-based pagination (simple but slow for large offsets)
MATCH (p:Product)
WHERE p.available = true
RETURN p.id, p.name, p.price
ORDER BY p.created_at DESC
SKIP $offset
LIMIT $page_size;
-- Cursor-based pagination (fast and scalable)
MATCH (p:Product)
WHERE p.available = true
AND p.created_at < $cursor_timestamp
RETURN p.id, p.name, p.price, p.created_at
ORDER BY p.created_at DESC
LIMIT $page_size;
-- Keyset pagination (best for large datasets)
MATCH (p:Product)
WHERE p.available = true
AND (p.created_at < $cursor_date
OR (p.created_at = $cursor_date AND p.id > $cursor_id))
RETURN p.id, p.name, p.price, p.created_at
ORDER BY p.created_at DESC, p.id
LIMIT $page_size;
Pattern: Deduplication
-- Remove duplicates in results
MATCH (u:User)-[:FRIEND*1..2]->(connection:User)
WHERE u.id = $user_id
RETURN DISTINCT connection.id, connection.name;
-- Deduplicate using aggregation
MATCH (u:User)-[:LIKES]->(item:Item)
WITH item, collect(DISTINCT u.id) AS likers
RETURN item.name, size(likers) AS unique_likes;
Pattern: Top-K with Aggregation
-- Find top 10 users by purchase amount
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH u, sum(p.price) AS total_spent
ORDER BY total_spent DESC
LIMIT 10
RETURN u.name, total_spent;
-- Top products per category
MATCH (p:Product)
WITH p.category AS category, p
ORDER BY p.sales DESC
WITH category, collect(p)[0..10] AS top_products
RETURN category, [p IN top_products | {name: p.name, sales: p.sales}];
Pattern: Time-Based Windows
-- Aggregate data into time buckets
MATCH (event:Event)
WHERE event.timestamp >= datetime('2025-01-01T00:00:00Z')
WITH duration.truncate(event.timestamp, 'hour') AS hour,
event.type AS event_type
RETURN hour,
event_type,
count(*) AS event_count
ORDER BY hour, event_type;
-- Rolling window aggregation
MATCH (metric:Metric)
WHERE metric.timestamp >= datetime() - duration('P1D')
WITH metric
ORDER BY metric.timestamp
WITH metric,
avg(metric.value) OVER (
ORDER BY metric.timestamp
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
) AS moving_average
RETURN metric.timestamp, metric.value, moving_average;
Troubleshooting Query Issues
Issue: Cartesian Product
Symptom: Query returns millions of rows unexpectedly
-- PROBLEM: Accidental Cartesian product
MATCH (u:User), (p:Product)
WHERE u.country = 'USA' -- Filters users but creates cross product
RETURN u.name, p.name; -- Returns every user × every product
-- SOLUTION: Add relationship constraint
MATCH (u:User)-[:INTERESTED_IN]->(p:Product)
WHERE u.country = 'USA'
RETURN u.name, p.name;
Issue: Unbounded Variable-Length Paths
Symptom: Query never completes
-- PROBLEM: Unbounded path traversal
MATCH (u:User {id: $id})-[:FRIEND*]->(f:User)
RETURN f.name; -- May traverse entire graph!
-- SOLUTION: Add upper bound
MATCH (u:User {id: $id})-[:FRIEND*1..3]->(f:User)
RETURN DISTINCT f.name
LIMIT 100; -- Also add result limit
Issue: Missing Index
Symptom: Full table scan on large dataset
-- Check current indexes
CALL dbms.indexes.list() YIELD name, labels, properties
RETURN name, labels, properties;
-- Identify missing index from slow query
PROFILE
MATCH (u:User)
WHERE u.email = $email
RETURN u;
-- Shows: TableScan instead of IndexSeek
-- Create missing index
CREATE INDEX user_email_idx FOR (u:User) ON (u.email);
-- Verify index usage
EXPLAIN
MATCH (u:User)
WHERE u.email = $email
RETURN u;
-- Now shows: IndexSeek(user_email_idx)
Issue: Lock Contention
Symptom: Transaction aborts with serialization errors
# Solution: Implement retry with exponential backoff
import asyncio
import random
from geode_client import QueryError
async def execute_with_retry(client, query, params, max_attempts=5):
for attempt in range(max_attempts):
async with client.connection() as conn:
await conn.begin()
try:
result, _ = await conn.query(query, params)
await conn.commit()
return [row for row in result.rows]
except QueryError as exc:
await conn.rollback()
if "SERIALIZATION" not in str(exc):
raise
if attempt == max_attempts - 1:
raise
# Exponential backoff with jitter
delay = (2 ** attempt) * 0.1 * random.uniform(0.5, 1.5)
await asyncio.sleep(delay)