The Graph Query Language (GQL) category provides comprehensive documentation on writing queries in Geode using the ISO/IEC 39075:2024 standard. From basic pattern matching to advanced analytical queries, this collection covers everything you need to effectively query graph data.
Introduction to GQL
Graph Query Language (GQL) is the ISO-standardized language for querying graph databases, analogous to SQL for relational databases. Ratified as ISO/IEC 39075:2024, GQL represents decades of graph database research unified into a single, coherent specification. Geode follows the ISO/IEC 39075:2024 compliance to keep queries portable and semantics predictable.
GQL adopts a declarative approach: you describe what patterns to find, not how to find them. The query engine determines optimal execution strategies using cost-based optimization, indexes, and graph-specific algorithms. This separation of concerns lets you focus on business logic while Geode handles performance.
Core Concepts
Nodes and Relationships
Graphs consist of nodes (entities) and relationships (connections). Nodes have labels that categorize them and properties that store attributes. Relationships have types and can also contain properties.
-- Node: (variable:Label {property: value})
(person:Person {name: 'Alice', age: 30})
-- Relationship: -[variable:TYPE {property: value}]->
-[friendship:KNOWS {since: 2020}]->
-- Undirected relationship
-[connection:CONNECTED]-
Pattern Matching
Pattern matching is the foundation of GQL queries. Use MATCH to describe patterns in the graph, then RETURN to specify what to retrieve:
-- Find Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
RETURN friend.name, friend.age
Patterns describe graph structure using ASCII art notation:
()- A node(n)- A node bound to variablen(n:Label)- A node with labelLabel(n:Label {prop: value})- A node with label and properties-[r:TYPE]->- A directed relationship-[r:TYPE]-- An undirected relationship-[:TYPE*1..3]->- Variable-length path (1 to 3 hops)
Variables and Binding
Variables bind to matched elements, allowing you to reference them in other clauses:
MATCH (u:User {id: $userId})-[:FOLLOWS]->(followed:User)
-[:POSTED]->(post:Post)
WHERE post.created > current_timestamp() - INTERVAL '7' DAY
RETURN followed.username, post.content, post.created
ORDER BY post.created DESC
Variables scope within a query. Use WITH to pass variables between query parts:
MATCH (u:User)
WITH u, count{(u)-[:POSTED]->()} AS post_count
WHERE post_count > 10
RETURN u.username, post_count
Basic Query Operations
Reading Data: MATCH
MATCH finds patterns in the graph:
-- Single node
MATCH (n:Person)
RETURN n.name
-- Node with specific properties
MATCH (n:Person {country: 'USA', age: 25})
RETURN n.name
-- Node with relationship
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name, b.name
-- Multi-hop relationship
MATCH (a:Person)-[:KNOWS]->(:Person)-[:KNOWS]->(c:Person)
WHERE a.name = 'Alice'
RETURN c.name
Creating Data: CREATE
CREATE adds new nodes and relationships:
-- Create single node
CREATE (n:Person {name: 'Alice', age: 30})
-- Create multiple nodes
CREATE (a:Person {name: 'Alice'}),
(b:Person {name: 'Bob'})
-- Create nodes with relationship
CREATE (a:Person {name: 'Alice'})
-[:KNOWS {since: 2020}]->
(b:Person {name: 'Bob'})
Updating Data: SET and REMOVE
SET modifies properties:
-- Set single property
MATCH (n:Person {name: 'Alice'})
SET n.age = 31
-- Set multiple properties
MATCH (n:Person {name: 'Alice'})
SET n.age = 31, n.city = 'New York'
-- Add label
MATCH (n:Person {name: 'Alice'})
SET n:VIP
-- Copy properties
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
SET b = a
REMOVE deletes properties or labels:
-- Remove property
MATCH (n:Person {name: 'Alice'})
REMOVE n.age
-- Remove label
MATCH (n:Person {name: 'Alice'})
REMOVE n:VIP
Deleting Data: DELETE
DELETE removes nodes and relationships:
-- Delete relationship
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.name = 'Alice' AND b.name = 'Bob'
DELETE r
-- Delete node and its relationships
MATCH (n:Person {name: 'Alice'})
DETACH DELETE n
-- Delete all matching nodes
MATCH (n:TempData)
DELETE n
DETACH DELETE automatically removes all relationships connected to deleted nodes.
Filtering with WHERE
WHERE filters matched patterns using predicates:
-- Comparison operators
MATCH (n:Person)
WHERE n.age > 25 AND n.age < 40
RETURN n.name
-- String matching
MATCH (n:Person)
WHERE n.name STARTS WITH 'A'
OR n.email CONTAINS '@example.com'
RETURN n.name
-- NULL checks
MATCH (n:Person)
WHERE n.middle_name IS NULL
RETURN n.name
-- List membership
MATCH (n:Person)
WHERE n.country IN ['USA', 'Canada', 'Mexico']
RETURN n.name
-- Pattern existence
MATCH (u:User)
WHERE EXISTS {
MATCH (u)-[:PURCHASED]->(:Product {category: 'Electronics'})
}
RETURN u.name
Aggregations and Grouping
GQL provides statistical aggregation functions:
-- Count
MATCH (n:Person)
RETURN count(n) AS total_people
-- Sum, average, min, max
MATCH (u:User)-[t:TRANSACTION]->()
RETURN u.name,
sum(t.amount) AS total_spent,
avg(t.amount) AS avg_transaction,
min(t.amount) AS smallest,
max(t.amount) AS largest
-- Group by
MATCH (p:Product)
RETURN p.category, count(p) AS products_per_category
GROUP BY p.category
ORDER BY products_per_category DESC
-- Collect into list
MATCH (u:User {name: 'Alice'})-[:KNOWS]->(friend:Person)
RETURN u.name, collect(friend.name) AS friend_names
Advanced Pattern Matching
Variable-Length Paths
Match paths with varying length:
-- 1 to 3 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(friend:Person)
RETURN friend.name
-- Exactly 2 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2]->(friend:Person)
RETURN friend.name
-- Any length (use cautiously!)
MATCH path = (a:Person {name: 'Alice'})-[:KNOWS*]->(b:Person {name: 'Bob'})
RETURN path
Shortest Paths
Find minimal-length paths:
-- Shortest path
MATCH path = SHORTEST (a:Person {name: 'Alice'})
-[:KNOWS*]->
(b:Person {name: 'Bob'})
RETURN path, length(path) AS hops
-- All shortest paths
MATCH paths = ALL SHORTEST (a:Person {name: 'Alice'})
-[:KNOWS*]->
(b:Person {name: 'Bob'})
RETURN paths
Optional Patterns
Match patterns that may not exist:
-- OPTIONAL MATCH returns NULL if pattern doesn't exist
MATCH (u:User)
OPTIONAL MATCH (u)-[:PURCHASED]->(p:Product)
RETURN u.name, p.name
-- Multiple optional patterns
MATCH (u:User {name: 'Alice'})
OPTIONAL MATCH (u)-[:LIKES]->(post:Post)
OPTIONAL MATCH (u)-[:FOLLOWS]->(other:User)
RETURN u.name, post.title, other.username
Subqueries
Execute nested queries:
-- Correlated subquery
MATCH (u:User)
WHERE EXISTS {
MATCH (u)-[:PURCHASED]->(p:Product)
WHERE p.price > 100
}
RETURN u.name
-- COUNT subquery
MATCH (u:User)
WITH u, count{(u)-[:POSTED]->()} AS post_count
WHERE post_count > 10
RETURN u.username, post_count
Ordering and Limiting Results
Control result ordering and size:
-- Order by single property
MATCH (p:Product)
RETURN p.name, p.price
ORDER BY p.price DESC
-- Order by multiple properties
MATCH (p:Person)
RETURN p.lastName, p.firstName
ORDER BY p.lastName, p.firstName
-- Limit results
MATCH (p:Product)
RETURN p.name, p.price
ORDER BY p.price DESC
LIMIT 10
-- Skip and limit (pagination)
MATCH (p:Product)
RETURN p.name, p.price
ORDER BY p.price DESC
SKIP 20
LIMIT 10
Working with Data Types
GQL supports rich data types:
-- Strings
CREATE (n:Node {text: 'Hello, World!'})
-- Numbers (integers and floats)
CREATE (n:Node {count: 42, ratio: 3.14159})
-- Booleans
CREATE (n:Node {active: true, deleted: false})
-- Dates and times
CREATE (n:Node {
created: DATE '2024-01-15',
updated: TIMESTAMP '2024-01-15T10:30:00Z',
duration: DURATION 'P1Y2M3DT4H5M6S'
})
-- Lists
CREATE (n:Node {tags: ['important', 'urgent', 'review']})
-- Maps
CREATE (n:Node {metadata: {author: 'Alice', version: 2}})
-- NULL values
CREATE (n:Node {optional_field: NULL})
Parameterized Queries
Use parameters for security and performance:
import geode_client
client = geode_client.open_database("localhost:3141")
async with client.connection() as client:
result, _ = await client.query("""
MATCH (u:User {email: $email})
SET u.last_login = $timestamp
RETURN u
""", {
"email": "[email protected]",
"timestamp": datetime.now(timezone.utc)
})
Parameters prevent SQL injection, enable query plan caching, and improve readability.
Transaction Control
Manage multi-statement transactions:
async with client.connection() as tx:
await tx.begin()
# All operations in one transaction
await tx.execute("""
MATCH (a:Account {id: $from})
SET a.balance = a.balance - $amount
""", {"from": 123, "amount": 100})
await tx.execute("""
MATCH (a:Account {id: $to})
SET a.balance = a.balance + $amount
""", {"to": 456, "amount": 100})
# Commit atomically
await tx.commit()
Best Practices
- Use parameters: Never concatenate user input into queries
- Filter early: Place selective predicates in
MATCHpatterns when possible - Limit results: Always use
LIMITfor unbounded queries - Index wisely: Create indexes on properties used in
WHEREandMATCH - Avoid Cartesian products: Ensure patterns are connected
- Use OPTIONAL carefully: Optional matches can complicate query plans
- Profile queries: Use
EXPLAINandPROFILEto understand execution - Document complex queries: Explain business logic in comments
Common Patterns
Recommendation Query
-- Collaborative filtering
MATCH (me:User {id: $userId})-[:LIKES]->(item)<-[:LIKES]-(similar:User)
-[:LIKES]->(recommendation)
WHERE NOT EXISTS {MATCH (me)-[:LIKES]->(recommendation)}
RETURN recommendation.title, COUNT(similar) AS score
ORDER BY score DESC
LIMIT 10
Fraud Detection
-- Find suspicious transaction patterns
MATCH (a1:Account)-[t1:TRANSFER]->(a2:Account)-[t2:TRANSFER]->(a3:Account)
WHERE a3.id = a1.id
AND t1.timestamp > current_timestamp() - INTERVAL '24' HOUR
RETURN a1.id, a2.id, SUM(t1.amount) AS circular_flow
Social Network Analysis
-- Find friends of friends who aren't yet friends
MATCH (me:User {id: $userId})-[:FRIEND]->(friend)-[:FRIEND]->(suggestion)
WHERE NOT EXISTS {MATCH (me)-[:FRIEND]->(suggestion)}
AND suggestion.id <> $userId
RETURN suggestion.name, COUNT(friend) AS mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10
Related Topics
- GQL Syntax Reference - Detailed syntax documentation
- Pattern Matching - Advanced pattern techniques
- Query Optimization - Performance tuning
- Aggregation Functions - Statistical operations
- ISO GQL Standard - Official specification
- MATCH Clause - Pattern matching details
- Operators - Comparison and logical operators
Advanced GQL Features
Window Functions
Window functions perform calculations across sets of rows related to the current row:
-- Rank products within categories
MATCH (p:Product)
RETURN p.category,
p.name,
p.price,
RANK() OVER (PARTITION BY p.category ORDER BY p.price DESC) AS price_rank,
AVG(p.price) OVER (PARTITION BY p.category) AS category_avg_price
ORDER BY p.category, price_rank
Supported Window Functions:
ROW_NUMBER(): Sequential number for each rowRANK(): Rank with gaps for tiesDENSE_RANK(): Rank without gapsPERCENT_RANK(): Relative rank (0 to 1)NTILE(n): Divide rows into n bucketsLAG(expr, offset): Access previous rowLEAD(expr, offset): Access next rowFIRST_VALUE(expr): First value in windowLAST_VALUE(expr): Last value in window
Common Table Expressions (CTEs)
CTEs create named temporary result sets for complex queries:
-- Find users and their influence score
WITH user_metrics AS (
MATCH (u:User)
RETURN u,
COUNT{(u)<-[:FOLLOWS]-()} AS follower_count,
COUNT{(u)-[:POSTED]->()} AS post_count
),
engagement AS (
MATCH (u:User)-[:POSTED]->(p:Post)
RETURN u,
AVG(p.likes + p.comments) AS avg_engagement
)
MATCH (u:User)
WITH u,
user_metrics.follower_count AS followers,
user_metrics.post_count AS posts,
engagement.avg_engagement AS engagement
WHERE user_metrics.u = u AND engagement.u = u
RETURN u.username,
followers,
posts,
engagement,
(followers * 0.4 + posts * 0.2 + engagement * 0.4) AS influence_score
ORDER BY influence_score DESC
LIMIT 100
Recursive Queries
Find all nodes reachable through recursive traversal:
-- Find all descendants in organizational hierarchy
MATCH (root:Employee {name: 'CEO'})
WITH root
MATCH path = (root)-[:MANAGES*0..]->(subordinate:Employee)
RETURN subordinate.name,
subordinate.title,
LENGTH(path) AS reporting_levels
ORDER BY reporting_levels, subordinate.name
Depth Control:
-- Limit recursion depth
MATCH (start:Node {id: 1})-[*1..5]->(reachable)
RETURN DISTINCT reachable
-- Find all reachable (use cautiously)
MATCH (start:Node {id: 1})-[*]->(reachable)
RETURN DISTINCT reachable
Graph Pattern Expressions
Combine multiple patterns with Boolean logic:
-- Find users matching complex criteria
MATCH (u:User)
WHERE (
// Pattern 1: Active premium users
(u.subscription = 'premium' AND u.last_login > current_date() - INTERVAL '7' DAY)
OR
// Pattern 2: High engagement free users
(u.subscription = 'free' AND COUNT{(u)-[:POSTED]->()) > 50}
)
AND NOT EXISTS {
// Exclude suspended users
MATCH (u)-[:SUSPENDED]->(s:Suspension)
WHERE s.active = true
}
RETURN u.username, u.subscription, u.last_login
List Comprehensions
Process lists with filtering and transformation:
-- Extract and process relationship properties
MATCH (u:User {id: $user_id})-[follows:FOLLOWS]->(friend:User)
RETURN u.name,
[f IN COLLECT(friend) WHERE f.active = true | f.name] AS active_friends,
[f IN COLLECT(follows) | f.since] AS friendship_dates,
REDUCE(total = 0, f IN COLLECT(friend) | total + f.follower_count) AS total_network_reach
List Functions:
COLLECT(expr): Aggregate to listHEAD(list): First elementTAIL(list): All but firstLAST(list): Last elementSIZE(list): List lengthREVERSE(list): Reverse orderREDUCE(...): Fold/reduce operation
Map Projections
Create maps from node properties:
-- Return structured data
MATCH (u:User {id: $user_id})
RETURN {
user: {
id: u.id,
name: u.name,
email: u.email
},
stats: {
follower_count: COUNT{(u)<-[:FOLLOWS]-()},
following_count: COUNT{(u)-[:FOLLOWS]->()},
post_count: COUNT{(u)-[:POSTED]->()}
},
recent_activity: [
(u)-[:POSTED]->(p:Post)
WHERE p.created > current_date() - INTERVAL '7' DAY |
{title: p.title, created: p.created, likes: p.likes}
]
} AS user_profile
Query Optimization Techniques
Index Hints
Guide the query planner to use specific indexes:
-- Force index usage
MATCH (u:User)
USING INDEX u:User(email)
WHERE u.email = $email
RETURN u
-- Scan hint for small result sets
MATCH (u:User)
USING SCAN u:User
WHERE u.role = 'admin'
RETURN u
Join Hints
Control join strategy:
-- Force hash join
MATCH (a:User)-[:FRIEND]-(b:User)
OPTION (join=hash)
WHERE a.city = 'NYC' AND b.city = 'SF'
RETURN a.name, b.name
-- Force nested loop
MATCH (a:User)-[:FRIEND]-(b:User)
OPTION (join=nested_loop)
WHERE a.id = $user_id
RETURN b.name
Query Rewriting
Before optimization:
-- Inefficient: Cartesian product
MATCH (u:User), (p:Product)
WHERE u.city = p.warehouse_city
RETURN u.name, p.name
After optimization:
-- Efficient: Explicit relationship or property join
MATCH (u:User)
MATCH (p:Product {warehouse_city: u.city})
RETURN u.name, p.name
Projection Pushdown
Select only needed columns early:
-- Inefficient: Materializes all properties
MATCH (u:User)-[:POSTED]->(p:Post)
WITH u, p
WHERE p.likes > 100
RETURN u.name, p.title
-- Efficient: Project early
MATCH (u:User)-[:POSTED]->(p:Post)
WHERE p.likes > 100
RETURN u.name, p.title
Functions and Operators
String Functions
-- String manipulation
MATCH (u:User)
WHERE toLower(u.name) CONTAINS 'alice'
RETURN u.name,
toUpper(u.email) AS email_upper,
substring(u.email, 0, indexOf(u.email, '@')) AS email_prefix,
replace(u.phone, '-', '') AS phone_clean,
trim(u.address) AS address_trimmed,
left(u.postal_code, 5) AS zip5,
right(u.postal_code, 4) AS zip4_extension
Mathematical Functions
-- Math operations
MATCH (product:Product)
RETURN product.name,
product.price,
round(product.price * 1.08, 2) AS price_with_tax,
floor(product.price) AS price_floor,
ceil(product.price) AS price_ceil,
abs(product.profit_margin) AS abs_margin,
sqrt(product.variance) AS std_deviation,
power(product.growth_rate, 12) AS annual_growth
Temporal Functions
-- Date and time operations
MATCH (event:Event)
WHERE event.date >= date('2026-01-01')
AND event.date < date('2026-02-01')
RETURN event.name,
event.date,
duration_between(event.start_time, event.end_time) AS event_duration,
date_add(event.date, duration('P7D')) AS week_later,
extract(month FROM event.date) AS event_month,
extract(year FROM event.date) AS event_year,
format_temporal(event.date, 'YYYY-MM-DD') AS formatted_date
Aggregation Functions
-- Statistical aggregations
MATCH (product:Product)-[:IN_CATEGORY]->(category:Category)
RETURN category.name,
COUNT(product) AS product_count,
MIN(product.price) AS min_price,
MAX(product.price) AS max_price,
AVG(product.price) AS avg_price,
SUM(product.inventory) AS total_inventory,
STDDEV(product.price) AS price_stddev,
VARIANCE(product.price) AS price_variance,
PERCENTILE_CONT(product.price, 0.5) AS median_price,
PERCENTILE_CONT(product.price, 0.95) AS p95_price
Graph Functions
-- Graph-specific functions
MATCH path = (a:Person {name: 'Alice'})-[:KNOWS*1..5]-(b:Person {name: 'Bob'})
RETURN path,
LENGTH(path) AS path_length,
nodes(path) AS all_nodes,
relationships(path) AS all_relationships,
[n IN nodes(path) | n.name] AS node_names,
[r IN relationships(path) | type(r)] AS relationship_types,
SIZE(nodes(path)) AS node_count
Error Handling in Queries
COALESCE for Null Handling
-- Provide defaults for null values
MATCH (u:User)
RETURN u.name,
COALESCE(u.middle_name, '') AS middle_name,
COALESCE(u.phone, 'No phone') AS contact,
COALESCE(u.age, 0) AS age
CASE Expressions
-- Conditional logic
MATCH (product:Product)
RETURN product.name,
product.inventory,
CASE
WHEN product.inventory = 0 THEN 'Out of Stock'
WHEN product.inventory < 10 THEN 'Low Stock'
WHEN product.inventory < 50 THEN 'In Stock'
ELSE 'Well Stocked'
END AS stock_status,
CASE product.category
WHEN 'Electronics' THEN product.price * 0.9
WHEN 'Clothing' THEN product.price * 0.8
ELSE product.price * 0.95
END AS sale_price
TRY-CATCH Pattern
# Application-level error handling
async def safe_query_execution(client, query, params):
"""Execute query with comprehensive error handling."""
try:
result, _ = await client.query(query, params)
return result
except geode_client.ConstraintViolationError as e:
logger.error(f"Constraint violation: {e}")
# Handle uniqueness or other constraint issues
return None
except geode_client.QueryTimeoutError as e:
logger.warning(f"Query timeout: {e}")
# Potentially retry with increased timeout
return None
except geode_client.SyntaxError as e:
logger.error(f"Invalid query syntax: {e}")
# Fix query and retry
return None
except Exception as e:
logger.exception(f"Unexpected error: {e}")
raise
Query Performance Patterns
Batch Operations
-- Batch inserts for efficiency
UNWIND $users AS user_data
CREATE (u:User)
SET u = user_data
-- Batch updates
UNWIND $updates AS update
MATCH (u:User {id: update.id})
SET u.last_login = update.timestamp
Pagination Strategies
Offset-based pagination (simple but slower for large offsets):
MATCH (p:Product)
RETURN p
ORDER BY p.created DESC
SKIP $offset
LIMIT $page_size
Cursor-based pagination (efficient for large datasets):
-- First page
MATCH (p:Product)
WHERE p.id > 0
RETURN p
ORDER BY p.id
LIMIT $page_size
-- Subsequent pages
MATCH (p:Product)
WHERE p.id > $last_seen_id
RETURN p
ORDER BY p.id
LIMIT $page_size
Materialized Aggregates
-- Pre-compute expensive aggregates
MATCH (user:User)
WITH user,
COUNT{(user)-[:POSTED]->()} AS post_count,
COUNT{(user)<-[:FOLLOWS]-()} AS follower_count
SET user.cached_post_count = post_count,
user.cached_follower_count = follower_count,
user.cache_updated = current_timestamp()
-- Use cached values in queries
MATCH (u:User)
WHERE u.cached_follower_count > 1000
RETURN u.name, u.cached_follower_count
Security Best Practices
Parameterized Queries
Never concatenate user input:
# WRONG - SQL injection vulnerable
user_input = request.args.get('email')
query = f"MATCH (u:User {{email: '{user_input}'}}) RETURN u"
# CORRECT - Parameterized
query = "MATCH (u:User {email: $email}) RETURN u"
result, _ = await client.query(query, {"email": user_input})
Input Validation
def validate_query_params(params: dict) -> dict:
"""Validate and sanitize query parameters."""
validated = {}
if 'email' in params:
email = params['email']
if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
raise ValueError("Invalid email format")
validated['email'] = email
if 'age' in params:
age = int(params['age'])
if age < 0 or age > 150:
raise ValueError("Invalid age")
validated['age'] = age
return validated
Query Debugging Tools
EXPLAIN Plans
-- View query execution plan
EXPLAIN
MATCH (u:User)-[:FRIEND]-(f:User)
WHERE u.city = 'NYC'
RETURN f.name
-- Example output:
-- NodeIndexSeek :User(city) (estimated rows: 1000)
-- Expand (u)-[:FRIEND]-(f)
-- Project f.name
PROFILE for Actual Execution
-- Profile actual execution with real data
PROFILE
MATCH (u:User)-[:FRIEND]-(f:User)
WHERE u.city = 'NYC'
RETURN f.name
-- Shows actual row counts, execution time, memory usage
Further Reading
- GQL Tutorial - Step-by-step learning path
- API Reference - Complete function reference
- Examples - Real-world query examples
- GQL Specification - ISO standard details
- Query Optimization Guide - Performance tuning
- Advanced Patterns - Complex query techniques
- Function Reference - Complete function catalog
- Best Practices - Query writing guidelines