Graph algorithms are specialized computational methods designed to analyze and extract insights from graph-structured data. Geode provides both built-in GQL capabilities and extensibility for implementing advanced graph algorithms on your connected data.
What are Graph Algorithms?
Graph algorithms solve problems inherent to network structures by leveraging the relationships between entities. Unlike traditional table-based computations, graph algorithms excel at:
- Pathfinding: Discovering routes and connections between nodes
- Centrality Analysis: Identifying influential or important nodes
- Community Detection: Finding clusters of related entities
- Similarity Computation: Measuring how closely nodes are related
- Network Flow: Optimizing resource distribution through networks
These algorithms power applications ranging from social network analysis to logistics optimization, fraud detection, and recommendation systems.
Algorithm Categories
Pathfinding Algorithms
Shortest Path
Find the shortest path between two nodes:
// Simple shortest path
MATCH path = SHORTEST (a:Person {name: 'Alice'})-[:KNOWS*]->(b:Person {name: 'Bob'})
RETURN path
// Weighted shortest path with relationship properties
MATCH path = SHORTEST (a:City {name: 'NYC'})-[r:ROAD*]-(b:City {name: 'SF'})
WHERE ALL(rel IN r WHERE rel.distance IS NOT NULL)
RETURN path, REDUCE(total = 0, rel IN r | total + rel.distance) AS total_distance
All Paths
Enumerate all paths between nodes:
// Find all paths up to length 5
MATCH path = (start:Node {id: 1})-[*1..5]->(end:Node {id: 100})
RETURN path, LENGTH(path) AS path_length
ORDER BY path_length
A* Search
Approximate A*-style pathfinding by ranking candidate paths with a cost-plus-heuristic score (a true A* search expands nodes incrementally rather than scoring complete paths):
// Weighted path with heuristic
MATCH path = (start:Location)-[r:CONNECTED*]-(goal:Location)
WHERE start.name = 'Start' AND goal.name = 'Goal'
WITH path,
REDUCE(cost = 0, rel IN r | cost + rel.weight) AS total_cost,
LENGTH(path) AS hops
ORDER BY total_cost + hops * 10 // A* heuristic: cost + estimated remaining
LIMIT 1
RETURN path, total_cost
Centrality Algorithms
Degree Centrality
Count incoming and outgoing relationships:
// Out-degree (how many accounts each user follows)
MATCH (u:User)-[r:FOLLOWS]->()
RETURN u.name, COUNT(r) AS following
ORDER BY following DESC
LIMIT 10
// In-degree (popularity: how many followers each user has)
MATCH (u:User)<-[r:FOLLOWS]-()
RETURN u.name, COUNT(r) AS followers
ORDER BY followers DESC
LIMIT 10
Betweenness Centrality
Identify nodes that frequently lie on shortest paths:
// Nodes that bridge communities
MATCH path = SHORTEST (a:Person)-[:KNOWS*]-(b:Person)
WHERE a.id < b.id // Avoid duplicates
WITH NODES(path) AS nodes
UNWIND nodes AS node
WITH node, COUNT(*) AS times_on_shortest_path
WHERE times_on_shortest_path > 1
RETURN node.name, times_on_shortest_path
ORDER BY times_on_shortest_path DESC
LIMIT 10
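The query above approximates betweenness by counting how often nodes appear on sampled shortest paths. For exact scores on a graph small enough to export, Brandes' algorithm can be run client-side. This is a minimal sketch over a plain adjacency dict; exporting node ids and edges from Geode into that dict first is assumed:

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Exact betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: [neighbors]}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, tracking shortest-path counts and predecessors
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected path is counted from both endpoints
    return {v: c / 2 for v, c in bc.items()}
```

On the path graph a-b-c this yields 1.0 for the middle node and 0.0 for the endpoints, matching the intuition that b bridges every a-to-c path.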
PageRank
Iterative algorithm for ranking node importance:
# Python implementation using Geode client
from geode_client import Client

async def pagerank(client, damping=0.85, iterations=20):
    # Initialize PageRank scores
    await client.query("""
        MATCH (n:Page)
        SET n.pagerank = 1.0
    """)
    for i in range(iterations):
        # Compute each source's out-degree first, then distribute its rank
        result, _ = await client.query("""
            MATCH (source:Page)-[:LINKS]->()
            WITH source, COUNT(*) AS out_degree
            MATCH (source)-[:LINKS]->(target:Page)
            WITH target, SUM(source.pagerank / out_degree) AS rank_sum
            SET target.new_pagerank = $damping * rank_sum + (1 - $damping)
            RETURN AVG(ABS(target.new_pagerank - target.pagerank)) AS delta
        """, {"damping": damping})
        # Commit new scores
        await client.query("MATCH (n:Page) SET n.pagerank = n.new_pagerank")
        # Stop early once scores have converged
        if result.rows[0]['delta'] < 0.0001:
            break
    return await client.query("""
        MATCH (n:Page)
        RETURN n.url, n.pagerank
        ORDER BY n.pagerank DESC
        LIMIT 20
    """)
Community Detection
Label Propagation
Detect communities by propagating labels:
// Initialize labels
MATCH (n:User)
SET n.community = n.id
// Propagate (repeat multiple times)
MATCH (n:User)-[:KNOWS]-(neighbor:User)
WITH n, neighbor.community AS neighbor_label, COUNT(*) AS frequency
ORDER BY n, frequency DESC
WITH n, COLLECT(neighbor_label)[0] AS most_common
SET n.community = most_common
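The propagation step must be repeated until labels stop changing. As a reference for what that loop computes, here is a pure-Python version over an adjacency dict; it breaks ties deterministically by smallest label so results are reproducible (production implementations usually break ties randomly to avoid label flooding):

```python
def label_propagation(adj, max_iterations=20):
    """Label propagation on {node: [neighbors]}: each node repeatedly
    adopts its neighbors' most common label until no label changes.
    Ties go to the smallest label."""
    labels = {v: v for v in adj}
    for _ in range(max_iterations):
        changed = False
        for v in adj:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = {}
            for w in adj[v]:
                counts[labels[w]] = counts.get(labels[w], 0) + 1
            # Most frequent neighbor label, smallest label on ties
            best = min(counts, key=lambda l: (-counts[l], l))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels
```

On two disconnected triangles, each triangle converges to a single shared label, and the two triangles keep distinct labels.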
Connected Components
Find disconnected subgraphs:
// Identify weakly connected components by labeling each node with the
// smallest node id reachable from it (note: unbounded [*] is expensive
// on large graphs; prefer a client-side union-find there)
MATCH (n:Node)
OPTIONAL MATCH (n)-[*]-(m:Node)
WITH n, COALESCE(MIN(m.id), n.id) AS min_reachable
WITH n, CASE WHEN n.id < min_reachable THEN n.id ELSE min_reachable END AS component_id
RETURN component_id, COUNT(*) AS size
ORDER BY size DESC
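Because unbounded `[*]` traversals can explode on large graphs, a common alternative is to stream the edge list out of the database and run union-find client-side. A sketch, assuming node ids and edge pairs have already been fetched:

```python
def connected_components(edges, nodes):
    """Weakly connected components via union-find, given an edge list
    [(a, b), ...] and the full node list (to include isolated nodes)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the root, halving the path as we go
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the component representative
            parent[max(ra, rb)] = min(ra, rb)

    for a, b in edges:
        union(a, b)

    components = {}
    for v in nodes:
        components.setdefault(find(v), []).append(v)
    return components
```

This runs in near-linear time in the number of edges, independent of path lengths inside each component.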
Triangle Counting
Count triangles for clustering coefficient:
// Find triangles in social network
MATCH (a:User)-[:KNOWS]->(b:User)-[:KNOWS]->(c:User)-[:KNOWS]->(a)
WHERE a.id < b.id AND b.id < c.id // Avoid duplicates
RETURN COUNT(*) AS triangle_count
// Clustering coefficient per node
MATCH (u:User)-[:KNOWS]-(neighbor:User)
WITH u, COLLECT(DISTINCT neighbor) AS neighbors, COUNT(DISTINCT neighbor) AS degree
WHERE degree > 1
UNWIND neighbors AS n1
UNWIND neighbors AS n2
WITH u, n1, n2, degree
WHERE n1.id < n2.id
MATCH (n1)-[r:KNOWS]-(n2)
WITH u, COUNT(r) AS actual_triangles, degree * (degree - 1) / 2 AS possible_triangles
RETURN u.name, actual_triangles * 1.0 / possible_triangles AS clustering_coefficient
ORDER BY clustering_coefficient DESC
Similarity Algorithms
Jaccard Similarity
Compute similarity based on common neighbors:
// Find similar users by common interests
MATCH (u1:User {id: 123})-[:LIKES]->(item:Item)<-[:LIKES]-(u2:User)
WHERE u1.id <> u2.id
WITH u1, u2, COUNT(DISTINCT item) AS common_interests
MATCH (u1)-[:LIKES]->(item1:Item)
WITH u1, u2, common_interests, COUNT(DISTINCT item1) AS u1_interests
MATCH (u2)-[:LIKES]->(item2:Item)
WITH u1, u2, common_interests, u1_interests, COUNT(DISTINCT item2) AS u2_interests
WITH u1, u2,
common_interests * 1.0 / (u1_interests + u2_interests - common_interests) AS jaccard
WHERE jaccard > 0.3
RETURN u2.name, jaccard
ORDER BY jaccard DESC
LIMIT 10
Cosine Similarity
Measure similarity using property vectors:
# Python implementation with embeddings
async def cosine_similarity(client, user_id, limit=10):
    result, _ = await client.query("""
        MATCH (u1:User {id: $user_id})
        MATCH (u2:User) WHERE u2.id <> $user_id
        WITH u1, u2,
             REDUCE(dot = 0.0, i IN RANGE(0, SIZE(u1.embedding) - 1) |
                    dot + u1.embedding[i] * u2.embedding[i]) AS dot_product,
             SQRT(REDUCE(norm1 = 0.0, val IN u1.embedding | norm1 + val * val)) AS norm1,
             SQRT(REDUCE(norm2 = 0.0, val IN u2.embedding | norm2 + val * val)) AS norm2
        WITH u2, dot_product / (norm1 * norm2) AS similarity
        WHERE similarity > 0.5
        RETURN u2.name, similarity
        ORDER BY similarity DESC
        LIMIT $limit
    """, {"user_id": user_id, "limit": limit})
    return result
Advanced Algorithm Patterns
Iterative Algorithms
Many graph algorithms require multiple passes:
// Go implementation: iterative belief propagation
func beliefPropagation(db *sql.DB, iterations int) error {
    for i := 0; i < iterations; i++ {
        // Compute new beliefs from neighbors' current beliefs
        _, err := db.Exec(`
            MATCH (n:Node)-[r:EDGE]-(m:Node)
            WITH n, AVG(m.belief) AS avg_belief
            SET n.new_belief = 0.5 * n.prior + 0.5 * avg_belief
        `)
        if err != nil {
            return err
        }
        // Commit the new beliefs for the next pass
        _, err = db.Exec(`MATCH (n:Node) SET n.belief = n.new_belief`)
        if err != nil {
            return err
        }
    }
    return nil
}
Recursive Algorithms
Depth-first search and traversals:
// Recursive category hierarchy traversal
MATCH path = (root:Category {name: 'Electronics'})-[:SUBCATEGORY*]->(leaf:Category)
WHERE NOT EXISTS { MATCH (leaf)-[:SUBCATEGORY]->() }
RETURN leaf.name, LENGTH(path) AS depth
ORDER BY depth DESC
Parallel Algorithm Execution
Execute independent computations in parallel:
// Rust: parallel centrality computation
use tokio::task::JoinSet;

async fn parallel_centrality(client: &GeodeClient, node_ids: Vec<i64>) -> Result<Vec<f64>> {
    let mut tasks = JoinSet::new();
    for node_id in node_ids {
        let client = client.clone();
        tasks.spawn(async move {
            let result = client.query(&format!(
                "MATCH (n {{id: {}}})-[r]->() RETURN COUNT(r) AS degree",
                node_id
            )).await?;
            Ok::<_, Error>(result[0]["degree"].as_f64().unwrap_or(0.0))
        });
    }
    let mut centralities = Vec::new();
    while let Some(result) = tasks.join_next().await {
        centralities.push(result??);
    }
    Ok(centralities)
}
Performance Optimization
Algorithm Complexity
Be aware of computational complexity:
- Shortest Path (unweighted): O(V + E) with BFS
- Shortest Path (weighted): O(E + V log V) with Dijkstra
- PageRank: O(iterations × E)
- Triangle Counting: O(E^1.5) worst case
- Label Propagation: O(iterations × E)
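The bounds above can be turned into quick back-of-envelope estimates, for example to sanity-check whether an algorithm fits a latency budget before running it. A small sketch (the function name and keys are illustrative, not a Geode API):

```python
import math

def estimated_ops(algorithm, v, e, iterations=20):
    """Rough operation counts from the complexity table: v nodes,
    e edges, and an iteration count for iterative algorithms."""
    formulas = {
        "bfs_shortest_path": v + e,                      # O(V + E)
        "dijkstra": e + v * math.log2(max(v, 2)),        # O(E + V log V)
        "pagerank": iterations * e,                      # O(iterations x E)
        "triangle_counting": e ** 1.5,                   # O(E^1.5) worst case
        "label_propagation": iterations * e,             # O(iterations x E)
    }
    return formulas[algorithm]
```

For a graph with 1,000 nodes and 5,000 edges, 20 PageRank iterations touch on the order of 100,000 edge visits, while a single BFS shortest path touches about 6,000.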
Optimization Techniques
- Limit Traversal Depth: Use path length limits to avoid expensive deep traversals
- Filter Early: Apply WHERE clauses before expensive operations
- Use Indexes: Index properties used in algorithm filters
- Batch Processing: Process large graphs in chunks
- Materialize Results: Store computed metrics as properties for reuse
// Optimize by limiting depth and filtering early
MATCH path = (start:User {id: $id})-[:KNOWS*1..4]-(end:User)
WHERE end.active = true // Filter early
WITH path, LENGTH(path) AS depth
WHERE depth <= 3 // Limit depth
RETURN end.name, depth
ORDER BY depth
LIMIT 100
Algorithm Libraries
Custom Algorithm Implementation
Extend Geode with custom algorithms:
class GraphAlgorithms:
    def __init__(self, client):
        self.client = client

    async def detect_cycles(self, start_node_id):
        """Detect cycles using depth-first search"""
        visited = set()
        stack = set()
        cycles = []

        async def dfs(node_id, path):
            if node_id in stack:
                cycles.append(path + [node_id])
                return
            if node_id in visited:
                return
            visited.add(node_id)
            stack.add(node_id)
            result = await self.client.query("""
                MATCH (n {id: $id})-[:EDGE]->(m)
                RETURN m.id AS next_id
            """, {"id": node_id})
            for row in result.rows:
                await dfs(row['next_id'], path + [node_id])
            stack.remove(node_id)

        await dfs(start_node_id, [])
        return cycles
Real-World Applications
Social Network Analysis
// Identify influencers
MATCH (u:User)
OPTIONAL MATCH (u)<-[:FOLLOWS]-(follower:User)
WITH u, COUNT(follower) AS followers
OPTIONAL MATCH (u)-[:POSTED]->(p:Post)<-[:LIKED]-(liker:User)
WITH u, followers, COUNT(DISTINCT liker) AS total_likes
RETURN u.name, followers, total_likes, (followers + total_likes * 0.5) AS influence_score
ORDER BY influence_score DESC
LIMIT 20
Fraud Detection
// Detect suspicious transaction rings
MATCH path = (a:Account)-[:TRANSFERRED*3..6]->(a)
WHERE ALL(r IN relationships(path) WHERE r.amount > 10000)
WITH path, REDUCE(total = 0, r IN relationships(path) | total + r.amount) AS ring_total
WHERE ring_total > 100000
RETURN path, ring_total
ORDER BY ring_total DESC
Recommendation Systems
// Collaborative filtering recommendations
MATCH (u:User {id: $user_id})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar:User)
WHERE similar.id <> $user_id
WITH similar, COUNT(p) AS common_purchases
ORDER BY common_purchases DESC
LIMIT 10
MATCH (similar)-[:PURCHASED]->(recommendation:Product)
WHERE NOT EXISTS {
MATCH (u:User {id: $user_id})-[:PURCHASED]->(recommendation)
}
WITH recommendation, COUNT(similar) AS recommender_count
RETURN recommendation.name, recommender_count
ORDER BY recommender_count DESC
LIMIT 5
Best Practices
- Understand Complexity: Profile algorithms with realistic data before production deployment
- Set Limits: Always bound path lengths and iteration counts
- Use Sampling: For large graphs, sample subgraphs for approximate results
- Cache Results: Materialize frequently computed metrics as node properties
- Monitor Performance: Track algorithm execution time and resource usage
- Parallelize When Possible: Leverage concurrent execution for independent computations
- Validate Results: Test algorithms on known datasets with expected outcomes
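For the sampling practice: when node ids are paged out of the database and the total count is not known up front, reservoir sampling produces a uniform sample in a single pass. A minimal sketch:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Uniform sample of k items from a stream of unknown length.
    Each item ends up in the sample with probability k / n."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            # Replace a reservoir slot with decreasing probability
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample
```

The sampled ids can then be fed into an expensive per-node metric (betweenness, reach) to approximate the full-graph distribution.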
Common Pitfalls
- Unbounded Traversals: Not limiting path depth can cause exponential explosion
- Dense Node Performance: Algorithms on hub nodes (millions of relationships) can be slow
- Ignoring Direction: Forgetting to consider relationship direction in directed graphs
- Memory Constraints: Large intermediate results can exceed available memory
- Algorithm Mismatch: Using wrong algorithm for the problem (e.g., shortest path when all paths needed)
Related Topics
- Query Optimization - Tune GQL queries for algorithm performance
- Indexing - Create indexes to accelerate algorithm execution
- Performance - Optimize database configuration for graph algorithms
Further Reading
- Graph Algorithms Tutorial - Complete graph algorithms tutorial
- GQL Reference - Complete language documentation
- Client Libraries - Implement algorithms in Python, Go, Rust, Zig
Advanced Pathfinding Algorithms
K-Shortest Paths
Find multiple alternative paths:
async def k_shortest_paths(client, start_id, end_id, k=5):
    """Find k shortest paths between nodes using Yen's algorithm"""
    paths = []
    # Find first shortest path
    result, _ = await client.query("""
        MATCH path = SHORTEST (start {id: $start})-[*]-(end {id: $end})
        RETURN path
    """, {"start": start_id, "end": end_id})
    if not result:
        return paths
    paths.append(result.rows[0]['path'])
    # Find k-1 additional paths
    for _ in range(k - 1):
        candidates = []
        # For each spur node in previous path
        for i in range(len(paths[-1].nodes) - 1):
            # Remove edges from previous paths
            # Find detour through remaining graph
            # Add to candidates
            pass  # Implementation details
        if not candidates:
            break
        # Add best candidate to paths
        paths.append(min(candidates, key=lambda p: path_cost(p)))
    return paths
Bidirectional Search
Faster pathfinding for long paths:
// Search from both ends and meet in the middle (half the depth per side)
MATCH forward = (start {id: $start})-[*1..5]->(mid)
MATCH backward = (mid)-[*1..5]->(end {id: $end})
WHERE start.id <> end.id
RETURN forward, backward,
       length(forward) + length(backward) AS total_length
ORDER BY total_length
LIMIT 1
Constrained Pathfinding
Paths that satisfy conditions:
// Path through specific node types
MATCH path = (start:Airport {code: 'SFO'})-[:FLIGHT*]-(end:Airport {code: 'JFK'})
WHERE ALL(node IN nodes(path) WHERE node:Airport OR node:Hub)
AND NONE(rel IN relationships(path) WHERE rel.canceled = true)
RETURN path,
REDUCE(cost = 0, rel IN relationships(path) | cost + rel.price) AS total_cost
ORDER BY total_cost
LIMIT 1
// Time-constrained paths
MATCH path = (start:Station)-[:ROUTE*]->(end:Station)
WHERE start.name = 'Downtown' AND end.name = 'Airport'
AND REDUCE(time = 0, rel IN relationships(path) | time + rel.travel_minutes) <= 60
RETURN path
ORDER BY length(path)
LIMIT 1
Production Centrality Analysis
Eigenvector Centrality
Measure influence based on connection quality:
import numpy as np

async def eigenvector_centrality(client, iterations=100, tolerance=1e-6):
    """Compute eigenvector centrality using power iteration"""
    # Get node ids in a stable order
    nodes, _ = await client.query("MATCH (n) RETURN n.id AS id ORDER BY id")
    node_ids = [row['id'] for row in nodes]
    n = len(node_ids)
    id_to_index = {node_id: i for i, node_id in enumerate(node_ids)}
    # Build adjacency matrix
    adj = np.zeros((n, n))
    edges, _ = await client.query("""
        MATCH (a)-[:CONNECTED]->(b)
        RETURN a.id AS source, b.id AS target
    """)
    for edge in edges:
        i = id_to_index[edge['source']]
        j = id_to_index[edge['target']]
        adj[i][j] = 1
    # Power iteration
    x = np.ones(n) / n
    for _ in range(iterations):
        x_new = adj @ x
        x_new = x_new / np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tolerance:
            break
        x = x_new
    # Store results back
    for node_id, score in zip(node_ids, x):
        await client.execute("""
            MATCH (n {id: $id})
            SET n.eigenvector_centrality = $score
        """, {"id": node_id, "score": float(score)})
    return dict(zip(node_ids, x))
Closeness Centrality
Measure average distance to all other nodes:
// Average shortest path length from each node
MATCH (source:Person)
MATCH (target:Person)
WHERE source.id <> target.id
MATCH path = SHORTEST (source)-[*]-(target)
WITH source,
     SUM(LENGTH(path)) AS sum_distances,
     COUNT(target) AS reachable_count
// Closeness = number of reachable nodes / total distance to them
WITH source, reachable_count * 1.0 / sum_distances AS closeness
RETURN source.name, closeness
ORDER BY closeness DESC
LIMIT 10
Harmonic Centrality
Robust variant of closeness centrality:
// Sum of reciprocal distances (unreachable pairs contribute nothing)
MATCH (source:Person)
OPTIONAL MATCH path = SHORTEST (source)-[*]-(target:Person)
WHERE source.id <> target.id
WITH source, SUM(1.0 / LENGTH(path)) AS harmonic_centrality
RETURN source.name, harmonic_centrality
ORDER BY harmonic_centrality DESC
Community Detection Algorithms
Louvain Modularity
Hierarchical community detection:
async def louvain_algorithm(client, resolution=1.0):
    """Louvain community detection (sketch: compute_modularity,
    test_community_move, build_community_graph and
    get_community_assignments are assumed helper functions)"""
    # Phase 1: Assign each node to its own community
    await client.execute("""
        MATCH (n)
        SET n.community = n.id
    """)
    improved = True
    iteration = 0
    while improved and iteration < 100:
        improved = False
        iteration += 1
        # For each node, try moving it to a neighbor's community
        nodes, _ = await client.query("MATCH (n) RETURN n.id AS id")
        for node in nodes:
            node_id = node['id']
            # Get current modularity
            current_modularity = await compute_modularity(client)
            # Try each neighbor's community
            neighbors, _ = await client.query("""
                MATCH (n {id: $id})-[:CONNECTED]-(neighbor)
                RETURN DISTINCT neighbor.community AS community
            """, {"id": node_id})
            best_community = None
            best_modularity = current_modularity
            for neighbor in neighbors:
                # Evaluate moving the node into this community
                test_modularity = await test_community_move(
                    client, node_id, neighbor['community']
                )
                if test_modularity > best_modularity:
                    best_modularity = test_modularity
                    best_community = neighbor['community']
            # Move to the best community if an improvement was found
            if best_community is not None:
                await client.execute("""
                    MATCH (n {id: $id})
                    SET n.community = $community
                """, {"id": node_id, "community": best_community})
                improved = True
    # Phase 2: Build super-graph of communities once no move improves modularity
    await build_community_graph(client)
    return await get_community_assignments(client)
Girvan-Newman Algorithm
Edge betweenness-based community detection:
// Iteratively remove the edge with the highest betweenness
MATCH path = ALL SHORTEST (a:Node)-[*]-(b:Node)
WHERE a.id < b.id
UNWIND relationships(path) AS edge
WITH edge, COUNT(*) AS betweenness
ORDER BY betweenness DESC
LIMIT 1
DELETE edge
// Repeat until the desired number of communities is reached
Strongly Connected Components
Find maximal strongly connected subgraphs:
async def tarjan_scc(client):
    """Tarjan's algorithm for strongly connected components"""
    # Get all nodes
    nodes, _ = await client.query("MATCH (n) RETURN n.id AS id")
    node_list = [n['id'] for n in nodes]
    index = 0
    stack = []
    indices = {}
    lowlinks = {}
    on_stack = set()
    sccs = []

    async def strongconnect(node_id):
        nonlocal index
        indices[node_id] = index
        lowlinks[node_id] = index
        index += 1
        stack.append(node_id)
        on_stack.add(node_id)
        # Get successors
        successors, _ = await client.query("""
            MATCH (n {id: $id})-[:EDGE]->(successor)
            RETURN successor.id AS id
        """, {"id": node_id})
        for succ in successors:
            succ_id = succ['id']
            if succ_id not in indices:
                await strongconnect(succ_id)
                lowlinks[node_id] = min(lowlinks[node_id], lowlinks[succ_id])
            elif succ_id in on_stack:
                lowlinks[node_id] = min(lowlinks[node_id], indices[succ_id])
        # Root of an SCC: pop the whole component off the stack
        if lowlinks[node_id] == indices[node_id]:
            scc = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                scc.append(w)
                if w == node_id:
                    break
            sccs.append(scc)

    for node_id in node_list:
        if node_id not in indices:
            await strongconnect(node_id)
    return sccs
Advanced Similarity Algorithms
SimRank
Iterative similarity based on neighborhood:
async def simrank(client, decay=0.8, max_iterations=10):
    """SimRank similarity (sketch: get_in_neighbors is an assumed helper
    returning the ids of a node's in-neighbors)"""
    nodes, _ = await client.query("MATCH (n) RETURN n.id AS id")
    node_list = [n['id'] for n in nodes]
    # Initialize similarity matrix
    sim = {(a, b): 1.0 if a == b else 0.0
           for a in node_list for b in node_list}
    for iteration in range(max_iterations):
        sim_new = {}
        for a in node_list:
            for b in node_list:
                if a == b:
                    sim_new[(a, b)] = 1.0
                    continue
                # Get in-neighbors
                a_in = await get_in_neighbors(client, a)
                b_in = await get_in_neighbors(client, b)
                if not a_in or not b_in:
                    sim_new[(a, b)] = 0.0
                    continue
                # Average similarity over all in-neighbor pairs
                total = sum(sim[(i, j)] for i in a_in for j in b_in)
                sim_new[(a, b)] = (decay / (len(a_in) * len(b_in))) * total
        sim = sim_new
    # Store only significant similarities
    for (a, b), score in sim.items():
        if a < b and score > 0.1:
            await client.execute("""
                MATCH (a {id: $a}), (b {id: $b})
                MERGE (a)-[s:SIMILAR_TO]->(b)
                SET s.simrank_score = $score
            """, {"a": a, "b": b, "score": score})
    return sim
Adamic-Adar Index
Link prediction scoring:
// Predict future links
MATCH (a:User {id: $user_id})
MATCH (a)-[:KNOWS]-(mutual)-[:KNOWS]-(b:User)
WHERE a.id <> b.id
  AND NOT EXISTS { MATCH (a)-[:KNOWS]-(b) }
WITH b, COUNT(DISTINCT mutual) AS mutual_friends,
     SUM(1.0 / LOG(COUNT{ MATCH (mutual)-[:KNOWS]-() })) AS aa_score
RETURN b.name, mutual_friends, aa_score
ORDER BY aa_score DESC
LIMIT 10
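The same score is easy to verify client-side. Given an adjacency dict, the Adamic-Adar index for a candidate pair is the sum of 1/log(degree) over mutual neighbors, so low-degree mutual friends count for more than hubs (degree-1 neighbors are skipped because log 1 = 0):

```python
import math

def adamic_adar(adj, a, b):
    """Adamic-Adar link-prediction score for the candidate pair (a, b)
    over a graph given as {node: [neighbors]}."""
    mutual = set(adj[a]) & set(adj[b])
    return sum(1.0 / math.log(len(adj[z]))
               for z in mutual if len(adj[z]) > 1)
```

This is handy for spot-checking a few pairs from the query above on an exported subgraph.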
Machine Learning Integration
Graph Embedding
Convert graph structure to vectors:
from node2vec import Node2Vec
import networkx as nx

async def generate_embeddings(client, dimensions=128):
    """Generate node embeddings using node2vec"""
    # Export graph to NetworkX
    G = nx.Graph()
    nodes, _ = await client.query("MATCH (n) RETURN n.id AS id")
    for node in nodes:
        G.add_node(node['id'])
    edges, _ = await client.query("""
        MATCH (a)-[:CONNECTED]-(b)
        RETURN a.id AS source, b.id AS target
    """)
    for edge in edges:
        G.add_edge(edge['source'], edge['target'])
    # Generate embeddings
    node2vec = Node2Vec(G, dimensions=dimensions, walk_length=30, num_walks=200)
    model = node2vec.fit(window=10, min_count=1, batch_words=4)
    # Store embeddings back (node2vec keys vectors by the node's string form)
    for node_id in G.nodes():
        embedding = model.wv[str(node_id)].tolist()
        await client.execute("""
            MATCH (n {id: $id})
            SET n.embedding = $embedding
        """, {"id": node_id, "embedding": embedding})
    return model
Graph Neural Networks
Classification using GNN:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

async def train_gnn_classifier(client, num_classes, hidden_channels=64):
    """Train GNN for node classification (load_graph_data is an assumed
    helper that builds a torch_geometric Data object from the database)"""
    # Load graph data
    data = await load_graph_data(client)

    class GCN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = GCNConv(data.num_features, hidden_channels)
            self.conv2 = GCNConv(hidden_channels, num_classes)

        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index)
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
            x = self.conv2(x, edge_index)
            return x

    model = GCN()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()
    # Training loop
    model.train()
    for epoch in range(200):
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss = criterion(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()
    # Store predictions
    model.eval()
    pred = model(data.x, data.edge_index).argmax(dim=1)
    for node_id, prediction in zip(data.node_ids, pred):
        await client.execute("""
            MATCH (n {id: $id})
            SET n.predicted_class = $class
        """, {"id": node_id, "class": int(prediction)})
    return model
Algorithm Performance Optimization
Parallel Execution
Distribute computation across cores:
import asyncio

async def parallel_centrality_computation(client, node_ids):
    """Compute degree centrality for many nodes concurrently"""
    semaphore = asyncio.Semaphore(50)  # Limit concurrent queries

    async def compute_node_centrality(node_id):
        async with semaphore:
            result, _ = await client.query("""
                MATCH (n {id: $id})-[r]-()
                RETURN COUNT(r) AS degree
            """, {"id": node_id})
            return node_id, result.rows[0]['degree']

    # Run all nodes concurrently, bounded by the semaphore
    results = await asyncio.gather(
        *[compute_node_centrality(nid) for nid in node_ids]
    )
    return dict(results)
Approximate Algorithms
Trade accuracy for speed:
// Approximate influence by sampling nodes and counting their reach
MATCH (n:Node)
WITH n, rand() AS random
WHERE random < 0.1 // Sample 10% of nodes
MATCH (n)-[:CONNECTED*1..5]-(connected)
WITH n, COUNT(DISTINCT connected) AS approx_reach
RETURN n.id, approx_reach
ORDER BY approx_reach DESC
LIMIT 100
Incremental Computation
Update algorithms without full recomputation:
async def incremental_pagerank(client, new_edge):
    """Update PageRank incrementally after an edge addition
    (update_pagerank_for_node is an assumed helper)"""
    # Only recompute nodes near the new edge's source
    affected, _ = await client.query("""
        MATCH path = (source {id: $source})-[:CONNECTED*0..2]-(affected)
        RETURN DISTINCT affected.id AS id
    """, {"source": new_edge['source']})
    # Recompute only the affected subgraph
    for node in affected:
        await update_pagerank_for_node(client, node['id'])
Monitoring and Debugging Algorithms
Algorithm Profiling
Track execution performance:
import time
import resource

def get_memory_usage():
    """Peak resident set size in MB (Unix; ru_maxrss is in KB on Linux)"""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

async def profile_algorithm(algorithm_func, *args, **kwargs):
    """Profile algorithm execution"""
    start_time = time.time()
    start_memory = get_memory_usage()
    result = await algorithm_func(*args, **kwargs)
    end_time = time.time()
    end_memory = get_memory_usage()
    print(f"Execution time: {end_time - start_time:.2f}s")
    print(f"Memory used: {end_memory - start_memory:.2f}MB")
    print(f"Results: {len(result.rows)} items")
    return result
Visualization
Visualize algorithm results:
import matplotlib.pyplot as plt
import networkx as nx

async def visualize_communities(client):
    """Visualize community detection results"""
    G = nx.Graph()
    # Load graph with community labels
    edges, _ = await client.query("""
        MATCH (a)-[:CONNECTED]-(b)
        RETURN a.id AS source, b.id AS target,
               a.community AS comm_a, b.community AS comm_b
    """)
    for edge in edges:
        G.add_edge(edge['source'], edge['target'])
    # Color by community
    communities, _ = await client.query("""
        MATCH (n)
        RETURN n.id AS id, n.community AS community
    """)
    community_map = {n['id']: n['community'] for n in communities}
    colors = [community_map[node] for node in G.nodes()]
    # Draw
    pos = nx.spring_layout(G)
    nx.draw(G, pos, node_color=colors, with_labels=True, cmap=plt.cm.rainbow)
    plt.savefig('communities.png')
Graph algorithms unlock the analytical power of connected data. Start with simple pathfinding and centrality computations, then progress to advanced community detection and machine learning integration as your graph expertise grows. Geode’s GQL-native implementation provides excellent performance for production workloads while maintaining code clarity and maintainability.