Graph Algorithms & Analytics in Geode
Graph algorithms unlock powerful analytical capabilities in Geode, enabling you to extract insights from connected data that would be impossible with traditional relational queries. From finding shortest paths and detecting communities to ranking influential nodes and identifying anomalies, graph algorithms are essential tools for modern data science and analytics.
Geode provides optimized implementations of classic and advanced graph algorithms, all accessible through GQL queries. These algorithms leverage Geode’s native graph storage and parallel processing capabilities to deliver high performance even on massive graphs with billions of nodes and relationships.
This comprehensive guide explores essential graph algorithms, their mathematical foundations, complexity analysis, implementation patterns, and optimization strategies for production deployments.
Theoretical Foundations
Graph Complexity Classes
Understanding algorithmic complexity is crucial for choosing the right algorithm for your data size:
Polynomial Time (P):
- Single-source shortest path: O(V + E) with BFS, O((V + E) log V) with Dijkstra
- Minimum spanning tree: O(E log V) with Kruskal’s, O(E + V log V) with Prim’s
- PageRank (fixed iterations): O(k × E) where k is iteration count
- Strongly connected components: O(V + E) with Tarjan’s algorithm
NP-Complete (Intractable):
- Hamiltonian path/cycle: O(2^V × V^2)
- Graph coloring: O(k^V) where k is number of colors
- Traveling salesman problem (TSP): O(V! / 2)
- Maximum clique: O(3^(V/3))
Approximation Algorithms (when exact solutions are impractical):
- Community detection: Modularity optimization O(V × E)
- Betweenness centrality sampling: O(k × V × E) where k « V
- Graph partitioning: Spectral methods O(V^3)
Graph Properties and Invariants
Degree Distribution: Characterizes network topology
- Regular graphs: All nodes have same degree
- Scale-free networks: Power-law degree distribution P(k) ~ k^(-γ)
- Small-world networks: High clustering, short average path length
Clustering Coefficient: Measures local density
C(v) = 2 × |{(u,w) : u,w ∈ N(v) ∧ (u,w) ∈ E}| / (deg(v) × (deg(v) - 1))
Graph Diameter: Maximum shortest path between any two nodes Graph Density: Ratio of actual edges to possible edges = E / (V × (V-1) / 2)
Pathfinding Algorithms
Shortest Path (Dijkstra):
-- Find shortest path between two nodes
MATCH path = shortestPath(
(start:Location {name: 'New York'})-[:ROUTE*]-(end:Location {name: 'Los Angeles'})
)
RETURN path, length(path) as hops;
-- Weighted shortest path
MATCH path = shortestPath(
(start:Location {name: 'New York'})-[r:ROUTE*]-(end:Location {name: 'Los Angeles'})
)
RETURN path, reduce(dist = 0, rel in relationships(path) | dist + rel.distance) as total_distance;
All Shortest Paths:
-- Find all shortest paths (when multiple exist)
MATCH paths = allShortestPaths(
(a:Person {id: 123})-[:KNOWS*]-(b:Person {id: 456})
)
RETURN paths, length(paths) as path_length;
K-Shortest Paths:
-- Find top K shortest paths
MATCH path = (start:City {name: 'Boston'})-[:CONNECTED*1..10]-(end:City {name: 'Seattle'})
WITH path, reduce(cost = 0, r in relationships(path) | cost + r.weight) as total_cost
ORDER BY total_cost
LIMIT 5
RETURN path, total_cost;
Centrality Algorithms
Degree Centrality:
-- Calculate in-degree (followers)
MATCH (u:User)
OPTIONAL MATCH (u)<-[:FOLLOWS]-(follower:User)
RETURN u.name, count(follower) as in_degree
ORDER BY in_degree DESC
LIMIT 10;
-- Out-degree (following)
MATCH (u:User)
OPTIONAL MATCH (u)-[:FOLLOWS]->(following:User)
RETURN u.name, count(following) as out_degree
ORDER BY out_degree DESC;
-- Total degree
MATCH (u:User)
OPTIONAL MATCH (u)-[:KNOWS]-(connected:User)
RETURN u.name, count(DISTINCT connected) as degree
ORDER BY degree DESC;
Betweenness Centrality (nodes on shortest paths):
-- Simplified betweenness calculation
MATCH (n:Person)
MATCH path = shortestPath((a:Person)-[:KNOWS*]-(b:Person))
WHERE n IN nodes(path) AND a <> n AND b <> n
WITH n, count(path) as paths_through
RETURN n.name, paths_through
ORDER BY paths_through DESC
LIMIT 10;
Closeness Centrality (average distance to others):
-- Measure how close a node is to all others
MATCH (u:User {id: 123})
MATCH path = shortestPath((u)-[:KNOWS*]-(other:User))
WHERE other <> u
WITH u, avg(length(path)) as avg_distance
RETURN u.name, 1.0 / avg_distance as closeness_centrality;
PageRank and Influence
Basic PageRank Implementation:
-- Simplified PageRank-style scoring
MATCH (u:User)<-[:FOLLOWS]-(follower:User)
WITH u, count(follower) as direct_followers
MATCH (follower)<-[:FOLLOWS]-(indirect:User)
RETURN
u.name,
direct_followers,
count(DISTINCT indirect) as indirect_reach,
direct_followers * 1.0 + count(DISTINCT indirect) * 0.15 as influence_score
ORDER BY influence_score DESC;
Weighted Influence:
-- Account for follower quality
MATCH (u:User)<-[f:FOLLOWS]-(follower:User)
OPTIONAL MATCH (follower)<-[:FOLLOWS]-(follower_of_follower:User)
WITH u, follower, count(follower_of_follower) as follower_influence
RETURN
u.name,
sum(follower_influence) as weighted_influence
ORDER BY weighted_influence DESC;
Community Detection
Connected Components:
-- Find connected components
MATCH (start:User {id: 123})-[:KNOWS*]-(connected:User)
RETURN collect(DISTINCT connected.id) as community_members,
count(DISTINCT connected) as community_size;
Triangle Counting (clustered communities):
-- Count triangles (A knows B, B knows C, C knows A)
MATCH (a:User)-[:KNOWS]->(b:User)-[:KNOWS]->(c:User)-[:KNOWS]->(a)
RETURN count(*) / 3 as triangle_count;
-- Local clustering coefficient
MATCH (u:User {id: 123})-[:KNOWS]-(neighbor:User)
WITH u, collect(DISTINCT neighbor) as neighbors, count(DISTINCT neighbor) as degree
UNWIND neighbors as n1
UNWIND neighbors as n2
MATCH (n1)-[r:KNOWS]-(n2)
WHERE n1 <> n2
WITH u, degree, count(DISTINCT r) as connected_pairs
RETURN u.name,
degree,
connected_pairs * 2.0 / (degree * (degree - 1)) as clustering_coefficient;
Label Propagation (community discovery):
-- Simplified label propagation
MATCH (u:User)-[:KNOWS]-(neighbor:User)
WITH u, neighbor.community_id as neighbor_community, count(*) as freq
ORDER BY freq DESC
LIMIT 1
SET u.community_id = neighbor_community
RETURN u.name, u.community_id;
Similarity and Recommendations
Jaccard Similarity (common connections):
-- Find similar users by shared connections
MATCH (u1:User {id: 123})-[:FOLLOWS]->(common:User)<-[:FOLLOWS]-(u2:User)
WHERE u1 <> u2
WITH u1, u2, count(DISTINCT common) as common_count
MATCH (u1)-[:FOLLOWS]->(u1_following:User)
MATCH (u2)-[:FOLLOWS]->(u2_following:User)
WITH u1, u2, common_count,
count(DISTINCT u1_following) as u1_total,
count(DISTINCT u2_following) as u2_total
RETURN u2.name,
common_count * 1.0 / (u1_total + u2_total - common_count) as jaccard_similarity
ORDER BY jaccard_similarity DESC
LIMIT 10;
Collaborative Filtering:
-- Recommend products based on similar users' purchases
MATCH (u:User {id: 123})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar:User)
WHERE similar <> u
WITH similar, count(DISTINCT p) as common_purchases
ORDER BY common_purchases DESC
LIMIT 50
MATCH (similar)-[:PURCHASED]->(recommendation:Product)
WHERE NOT EXISTS {(u)-[:PURCHASED]->(recommendation)}
RETURN recommendation.name,
count(*) as recommendation_score
ORDER BY recommendation_score DESC
LIMIT 10;
Cosine Similarity (vector-based):
-- Compare users by behavior vectors
MATCH (u1:User {id: 123})-[r1:RATED]->(item:Product)<-[r2:RATED]-(u2:User)
WHERE u1 <> u2
WITH u1, u2,
sum(r1.rating * r2.rating) as dot_product,
sqrt(sum(r1.rating * r1.rating)) as u1_magnitude,
sqrt(sum(r2.rating * r2.rating)) as u2_magnitude
RETURN u2.name,
dot_product / (u1_magnitude * u2_magnitude) as cosine_similarity
ORDER BY cosine_similarity DESC;
Graph Traversal Algorithms
Breadth-First Search (BFS):
-- Explore network level by level
MATCH path = (start:User {id: 123})-[:KNOWS*1..4]-(connected:User)
RETURN connected.name, length(path) as distance
ORDER BY distance, connected.name;
Depth-First Search (DFS):
-- Deep exploration of paths
MATCH path = (start:User {id: 123})-[:FOLLOWS*1..5]->(reached:User)
RETURN DISTINCT reached.name, length(path) as depth
ORDER BY depth DESC, reached.name;
Cycle Detection:
-- Find cycles in the graph
MATCH path = (n:Node)-[:CONNECTS*]->(n)
WHERE length(path) > 1
RETURN path, length(path) as cycle_length
ORDER BY cycle_length
LIMIT 10;
Flow and Optimization
Maximum Flow (network capacity):
-- Simplified max flow calculation
MATCH path = (source:Node {id: 'A'})-[r:PIPE*]->(sink:Node {id: 'B'})
WITH path, reduce(flow = 999999, rel in relationships(path) |
CASE WHEN rel.capacity < flow THEN rel.capacity ELSE flow END
) as path_capacity
RETURN path, path_capacity
ORDER BY path_capacity DESC;
Minimum Spanning Tree:
-- Connect all nodes with minimum total edge weight
MATCH (n:Location)
WITH collect(n) as nodes
MATCH path = (a:Location)-[r:ROAD]-(b:Location)
WHERE a IN nodes AND b IN nodes
RETURN path, r.distance as weight
ORDER BY weight;
Anomaly Detection
Unusual Pattern Detection:
-- Find users with anomalous behavior
MATCH (u:User)
OPTIONAL MATCH (u)-[:PURCHASED]->(p:Product)
WITH u, count(p) as purchase_count,
avg(p.price) as avg_spend
WITH avg(purchase_count) as avg_purchases,
stddev(purchase_count) as stddev_purchases,
avg(avg_spend) as global_avg_spend,
stddev(avg_spend) as stddev_spend
MATCH (u:User)
OPTIONAL MATCH (u)-[:PURCHASED]->(p:Product)
WITH u, count(p) as user_purchases, avg(p.price) as user_avg_spend,
avg_purchases, stddev_purchases, global_avg_spend, stddev_spend
WHERE abs(user_purchases - avg_purchases) > 3 * stddev_purchases
OR abs(user_avg_spend - global_avg_spend) > 3 * stddev_spend
RETURN u.name, user_purchases, user_avg_spend;
Fraud Ring Detection:
-- Identify tightly connected suspicious accounts
MATCH (suspect:Account {flagged: true})-[:TRANSACTED_WITH*1..2]-(connected:Account)
WITH suspect, collect(DISTINCT connected) as ring_members
WHERE size(ring_members) > 5
MATCH (m1:Account)-[:TRANSACTED_WITH]-(m2:Account)
WHERE m1 IN ring_members AND m2 IN ring_members
RETURN suspect.id, ring_members, count(*) as internal_transactions
ORDER BY internal_transactions DESC;
Graph Metrics and Statistics
Diameter (longest shortest path):
-- Calculate graph diameter
MATCH path = shortestPath((a:Node)-[*]-(b:Node))
WHERE a < b
RETURN max(length(path)) as diameter;
Density (edge to node ratio):
-- Calculate graph density
MATCH (n:Node)
WITH count(n) as node_count
MATCH ()-[r:EDGE]-()
RETURN count(r) * 1.0 / (node_count * (node_count - 1)) as density;
Degree Distribution:
-- Analyze degree distribution
MATCH (n:Node)
OPTIONAL MATCH (n)-[r]-(m:Node)
WITH n, count(r) as degree
RETURN degree, count(n) as node_count
ORDER BY degree;
Performance Optimization for Algorithms
Limit Traversal Depth:
-- Always bound variable-length patterns
MATCH path = (a)-[*1..6]-(b) -- Good: bounded
-- MATCH path = (a)-[*]-(b) -- Bad: unbounded
RETURN path;
Use Early Filtering:
-- Filter before expensive traversals
MATCH (start:User)
WHERE start.active = true -- Filter first
AND start.country = 'US'
MATCH path = (start)-[:KNOWS*1..3]-(connected:User)
RETURN path;
Leverage Indexes:
-- Start with indexed lookups
CREATE INDEX user_id ON :User(id);
MATCH (u:User {id: 123}) -- Uses index
MATCH path = (u)-[:FOLLOWS*1..4]->(reached:User)
RETURN path;
Parallel Processing:
-- Break large computations into batches
MATCH (u:User)
WHERE u.id >= 0 AND u.id < 1000 -- Batch 1
WITH u
MATCH (u)-[:KNOWS]-(friend:User)
RETURN u.id, count(friend) as degree;
Use Cases by Industry
Social Networks: Influence ranking, friend recommendations, community detection
Fraud Detection: Ring identification, anomaly detection, pattern analysis
Supply Chain: Shortest paths, bottleneck identification, flow optimization
Recommendation Engines: Collaborative filtering, similarity scoring, personalization
Network Security: Intrusion path analysis, vulnerability assessment, access patterns
Biology: Protein interaction analysis, pathway discovery, disease networks
Best Practices
- Always bound variable-length paths to prevent explosions
- Use appropriate indexes for algorithm starting points
- Consider graph density when choosing algorithms
- Profile algorithm performance on representative data
- Implement incremental updates for frequently computed metrics
- Cache results for expensive computations
- Use sampling for approximate results on huge graphs
- Parallelize independent computations across nodes
- Monitor memory usage for large traversals
- Validate results on known test cases
Related Topics
- Query Patterns
- Performance Optimization
- PageRank Algorithm
- Community Detection
- Pattern Matching
- Analytics and ML
Advanced Algorithm Implementations
All-Pairs Shortest Paths (Floyd-Warshall)
Compute distances between all node pairs for dense graphs:
-- Initialize distance matrix
MATCH (source:Node), (target:Node)
WHERE source <> target
MERGE (source)-[d:DISTANCE]->(target)
SET d.distance = CASE
WHEN EXISTS((source)-[:CONNECTED]-(target)) THEN 1
ELSE 999999
END;
-- Floyd-Warshall iterations
MATCH (i:Node), (j:Node), (k:Node)
WHERE i <> j AND j <> k AND i <> k
MATCH (i)-[ij:DISTANCE]->(j)
MATCH (i)-[ik:DISTANCE]->(k)
MATCH (k)-[kj:DISTANCE]->(j)
WITH i, j, ij, ik.distance + kj.distance AS new_distance
WHERE new_distance < ij.distance
SET ij.distance = new_distance;
-- Query all-pairs distances
MATCH (a:Node {id: $node_a})-[d:DISTANCE]->(b:Node {id: $node_b})
RETURN a.id, b.id, d.distance AS shortest_distance;
Complexity: O(V³), suitable for V < 1000 Memory: O(V²) for distance matrix Use case: Dense graphs, transitive closure, diameter calculation
Bellman-Ford Algorithm (Negative Weights)
Handle graphs with negative edge weights:
-- Initialize distances
MATCH (n:Node)
SET n.distance = CASE
WHEN n.id = $source_id THEN 0
ELSE 999999
END;
-- Relax edges V-1 times
MATCH (u:Node)-[r:EDGE]->(v:Node)
WHERE u.distance + r.weight < v.distance
SET v.distance = u.distance + r.weight,
v.predecessor = u.id;
-- Detect negative cycles
MATCH (u:Node)-[r:EDGE]->(v:Node)
WHERE u.distance + r.weight < v.distance
RETURN 'Negative cycle detected' AS warning, u.id, v.id;
-- Extract shortest path
MATCH (target:Node {id: $target_id})
WITH target.id AS current_id, [] AS path
CALL {
WITH current_id, path
MATCH (n:Node {id: current_id})
WHERE n.predecessor IS NOT NULL
WITH n.predecessor AS pred, path + [current_id] AS new_path
RETURN pred AS next_id, new_path AS result_path
}
RETURN result_path AS shortest_path;
Complexity: O(V × E) Advantages: Handles negative weights, detects negative cycles Use case: Currency arbitrage, price optimization
Tarjan’s Strongly Connected Components
Find strongly connected components in O(V + E) time:
-- Initialize node state
MATCH (n:Node)
SET n.index = null,
n.lowlink = null,
n.on_stack = false;
-- Recursive DFS for SCC detection
MATCH (v:Node)
WHERE v.index IS NULL
CALL {
WITH v
SET v.index = $current_index,
v.lowlink = $current_index,
v.on_stack = true
WITH v, $current_index + 1 AS next_index
MATCH (v)-[:EDGE]->(w:Node)
CALL {
WITH w, next_index
CASE
WHEN w.index IS NULL THEN
// Recurse
SET w.index = next_index
WHEN w.on_stack = true THEN
// Update lowlink
SET v.lowlink = CASE
WHEN w.index < v.lowlink THEN w.index
ELSE v.lowlink
END
END
}
// If v is root of SCC
WITH v
WHERE v.lowlink = v.index
MATCH (w:Node)
WHERE w.on_stack = true AND w.index >= v.index
SET w.component_id = v.id,
w.on_stack = false
RETURN v.id AS scc_root
};
-- Query component membership
MATCH (n:Node)
WHERE n.component_id IS NOT NULL
RETURN n.component_id AS component,
COUNT(n) AS size,
COLLECT(n.id) AS members
ORDER BY size DESC;
Applications: Web crawling, social network analysis, compiler optimization
Edmonds-Karp Maximum Flow
Compute maximum flow from source to sink:
-- Initialize residual graph
MATCH (u:Node)-[e:EDGE]->(v:Node)
MERGE (u)-[forward:RESIDUAL {capacity: e.capacity, flow: 0}]->(v)
MERGE (v)-[backward:RESIDUAL {capacity: 0, flow: 0}]->(u);
-- Augmenting path search (BFS)
WITH $source_id AS source, $sink_id AS sink
MATCH path = shortestPath((s:Node {id: source})-[:RESIDUAL*]->(t:Node {id: sink}))
WHERE ALL(r IN relationships(path) WHERE r.capacity - r.flow > 0)
WITH path,
REDUCE(min_cap = 999999, r IN relationships(path) |
CASE WHEN r.capacity - r.flow < min_cap
THEN r.capacity - r.flow
ELSE min_cap
END
) AS bottleneck
WITH path, bottleneck
UNWIND relationships(path) AS edge
SET edge.flow = edge.flow + bottleneck
WITH path, bottleneck
MATCH (u)-[forward:RESIDUAL]->(v)
WHERE forward IN relationships(path)
MATCH (v)-[backward:RESIDUAL]->(u)
SET backward.capacity = backward.capacity + bottleneck;
-- Extract max flow value
MATCH (:Node {id: $source_id})-[r:RESIDUAL]->()
RETURN SUM(r.flow) AS max_flow_value;
Complexity: O(V × E²) Applications: Network capacity, bipartite matching, airline scheduling
Production-Scale Optimizations
Parallel Algorithm Execution
Leverage Geode’s parallel processing for large graphs:
-- Parallel PageRank with batch updates
MATCH (n:Node)
WITH COUNT(n) AS total_nodes
SET n.pagerank = 1.0 / total_nodes;
-- Process in parallel batches
MATCH (n:Node)
WITH n, n.id % 10 AS partition_id
CALL {
WITH n, partition_id
MATCH (n)-[:EDGE]->(neighbor:Node)
WITH neighbor,
SUM(n.pagerank / COUNT { (n)-[:EDGE]->() }) AS incoming_rank
SET neighbor.pagerank_new = 0.15 + 0.85 * incoming_rank
} IN TRANSACTIONS OF 100000 ROWS
PER PARTITION (partition_id);
-- Parallel strongly-connected component detection
MATCH (n:Node)
CALL {
WITH n
MATCH path = (n)-[:EDGE*]->(n)
WITH n, COLLECT(DISTINCT [node IN nodes(path) | ID(node)]) AS cycles
SET n.scc_candidates = cycles
} IN TRANSACTIONS OF 10000 ROWS;
Sampling and Approximation
Achieve faster results with approximate algorithms:
-- Approximate betweenness centrality (sample-based)
WITH 100 AS sample_size // Sample 100 random nodes instead of all pairs
MATCH (sample:Node)
WITH sample
ORDER BY rand()
LIMIT $sample_size
MATCH path = shortestPath((sample)-[:EDGE*]-(target:Node))
WHERE sample <> target
WITH nodes(path) AS path_nodes
UNWIND path_nodes AS node
WHERE node <> path_nodes[0] AND node <> path_nodes[-1]
WITH node, COUNT(*) AS sampled_betweenness
RETURN node.id,
sampled_betweenness * ($total_nodes / $sample_size) AS estimated_betweenness
ORDER BY estimated_betweenness DESC;
Accuracy: ~95% correlation with exact values at 10× speedup
Incremental Algorithm Updates
Update algorithm results without full recomputation:
-- Incremental PageRank after edge addition
MATCH (u:Node {id: $new_source}), (v:Node {id: $new_target})
MERGE (u)-[:EDGE]->(v);
-- Localized update (affected nodes only)
MATCH (affected:Node)
WHERE affected IN [u, v]
OR EXISTS((affected)-[:EDGE*1..2]-(u))
OR EXISTS((affected)-[:EDGE*1..2]-(v))
WITH affected
MATCH (affected)-[:EDGE]->(neighbor:Node)
WITH neighbor,
SUM(affected.pagerank / COUNT { (affected)-[:EDGE]->() }) AS delta
SET neighbor.pagerank = neighbor.pagerank + 0.85 * delta,
neighbor.needs_full_recompute = false;
-- Incremental community detection after node addition
MATCH (new_node:Node {id: $new_node_id})
MATCH (new_node)-[:EDGE]-(neighbor:Node)
WITH neighbor.community AS candidate_community, COUNT(*) AS connections
ORDER BY connections DESC
LIMIT 1
SET new_node.community = candidate_community;
Memory-Efficient Graph Traversal
Handle billion-edge graphs with streaming:
-- Streaming BFS for large graphs
WITH $start_id AS start_id, 0 AS level
CALL {
WITH start_id, level
MATCH (current:Node {id: start_id})
SET current.level = level
RETURN current.id AS processed_id
}
WITH level + 1 AS next_level
CALL {
WITH next_level
MATCH (current:Node)-[:EDGE]->(next:Node)
WHERE current.level = next_level - 1
AND next.level IS NULL
SET next.level = next_level
RETURN COUNT(next) AS nodes_at_level
} IN TRANSACTIONS OF 50000 ROWS
RETURN next_level AS level, nodes_at_level;
Algorithm Benchmarking Framework
Performance Testing
Benchmark algorithm performance on your data:
-- Benchmark PageRank iterations
WITH datetime() AS start_time
MATCH (n:Node)
SET n.pagerank = 1.0 / COUNT { MATCH (m:Node) };
// Run 20 iterations
WITH start_time, range(1, 20) AS iterations
UNWIND iterations AS iter
CALL {
MATCH (source:Node)-[:EDGE]->(target:Node)
WITH target, SUM(source.pagerank / COUNT { (source)-[:EDGE]->() }) AS rank
SET target.pagerank_new = 0.15 + 0.85 * rank
}
MATCH (n:Node)
SET n.pagerank = n.pagerank_new
REMOVE n.pagerank_new
WITH start_time, datetime() AS end_time
RETURN duration.between(start_time, end_time).milliseconds AS runtime_ms,
runtime_ms / 20.0 AS ms_per_iteration;
-- Memory usage profiling
CALL dbms.memory.usage()
YIELD used, total, peak
RETURN used / 1024 / 1024 AS used_mb,
total / 1024 / 1024 AS total_mb,
peak / 1024 / 1024 AS peak_mb;
Convergence Analysis
Monitor algorithm convergence:
-- Track PageRank convergence
WITH [] AS convergence_history
MATCH (n:Node)
WITH convergence_history + [n.pagerank] AS history
// After iteration
MATCH (n:Node)
WITH history, n.pagerank_new - n.pagerank AS delta
WITH AVG(ABS(delta)) AS mean_change,
MAX(ABS(delta)) AS max_change,
STDDEV(ABS(delta)) AS stddev_change
RETURN mean_change,
max_change,
stddev_change,
CASE WHEN mean_change < 0.0001 THEN 'CONVERGED' ELSE 'CONTINUE' END AS status;
Industry-Specific Applications
Financial Networks
Detect systemic risk and contagion:
-- Bank exposure network analysis
MATCH (bank:Bank)-[exp:EXPOSURE]->(counterparty:Bank)
WITH bank,
SUM(exp.amount) AS total_exposure,
bank.capital AS capital
WITH bank,
total_exposure / capital AS leverage_ratio
WHERE leverage_ratio > 10.0
MATCH path = (bank)-[:EXPOSURE*1..3]->(systemic:Bank)
WHERE systemic.systemically_important = true
RETURN bank.name,
leverage_ratio,
COUNT(DISTINCT path) AS contagion_paths,
SUM([r IN relationships(path) | r.amount]) AS total_risk
ORDER BY total_risk DESC;
Supply Chain Optimization
Identify bottlenecks and optimize flows:
-- Critical path analysis
MATCH path = (supplier:Supplier)-[:SUPPLIES*]->(manufacturer:Manufacturer)
-[:SHIPS*]->(distributor:Distributor)
WITH path,
REDUCE(time = 0, r IN relationships(path) | time + r.lead_time_days) AS total_time
ORDER BY total_time DESC
LIMIT 1
RETURN [n IN nodes(path) | n.name] AS critical_path,
total_time AS longest_lead_time,
[r IN relationships(path) | r.lead_time_days] AS segment_times;
-- Bottleneck identification (max flow)
MATCH (factory:Factory {id: $factory_id})
MATCH (warehouse:Warehouse {id: $warehouse_id})
CALL algorithms.maxFlow(factory, warehouse, 'SHIPS', 'capacity')
YIELD path, flow
WITH REDUCE(min_cap = 999999, r IN relationships(path) |
CASE WHEN r.capacity < min_cap THEN r.capacity ELSE min_cap END
) AS bottleneck_capacity,
path
RETURN [n IN nodes(path) | n.name] AS supply_chain_path,
bottleneck_capacity,
'Bottleneck at ' + [r IN relationships(path) WHERE r.capacity = bottleneck_capacity][0].name AS bottleneck_location;
Biological Networks
Analyze protein interactions and pathways:
-- Protein-protein interaction centrality
MATCH (protein:Protein)
OPTIONAL MATCH (protein)-[:INTERACTS_WITH]-(partner:Protein)
WITH protein,
COUNT(DISTINCT partner) AS degree,
AVG(partner.expression_level) AS avg_partner_expression
WHERE degree >= 10 // Hub proteins
MATCH path = shortestPath((protein)-[:INTERACTS_WITH*]-(disease:Disease))
RETURN protein.name,
degree AS interaction_count,
avg_partner_expression,
LENGTH(path) AS disease_proximity,
protein.function AS biological_role
ORDER BY degree DESC, disease_proximity ASC;
Visualization and Analysis
Graph Metric Dashboard
Compute comprehensive graph statistics:
-- Complete graph analytics dashboard
MATCH (n:Node)
WITH COUNT(n) AS node_count,
COLLECT(n) AS all_nodes
MATCH ()-[r:EDGE]-()
WITH node_count, all_nodes, COUNT(r) AS edge_count
WITH node_count, edge_count,
edge_count * 2.0 / (node_count * (node_count - 1)) AS density,
edge_count * 1.0 / node_count AS avg_degree
MATCH (n:Node)
OPTIONAL MATCH (n)-[r:EDGE]-()
WITH node_count, edge_count, density, avg_degree,
COUNT(r) AS node_degree
WITH node_count, edge_count, density, avg_degree,
COLLECT(node_degree) AS degree_dist
WITH node_count, edge_count, density, avg_degree, degree_dist,
percentile_cont(degree_dist, 0.5) AS median_degree,
percentile_cont(degree_dist, 0.95) AS p95_degree
MATCH path = shortestPath((a:Node)-[:EDGE*]-(b:Node))
WHERE a <> b
WITH node_count, edge_count, density, avg_degree,
median_degree, p95_degree,
MAX(LENGTH(path)) AS diameter,
AVG(LENGTH(path)) AS avg_path_length
RETURN node_count,
edge_count,
density,
avg_degree,
median_degree,
p95_degree,
diameter,
avg_path_length,
CASE
WHEN avg_path_length / log(node_count) < 2 THEN 'Small World'
WHEN density > 0.1 THEN 'Dense'
ELSE 'Sparse'
END AS graph_type;
Degree Distribution Analysis
Identify network topology:
-- Degree distribution histogram
MATCH (n:Node)
OPTIONAL MATCH (n)-[r:EDGE]-()
WITH n, COUNT(r) AS degree
WITH degree, COUNT(n) AS frequency
ORDER BY degree
WITH COLLECT({degree: degree, count: frequency}) AS distribution
RETURN distribution,
CASE
WHEN distribution[-1].count > distribution[0].count THEN 'Power Law (Scale-Free)'
WHEN AVG([d IN distribution | ABS(d.count - AVG([x IN distribution | x.count]))]) < 2
THEN 'Regular Network'
ELSE 'Random Network'
END AS topology_type;
Further Reading
- Graph Algorithms: Practical Examples in Network Science
- Network Analysis with GQL: Advanced Query Patterns and Optimization
- Scalable Graph Processing: Distributed Algorithms for Billion-Node Graphs
- Machine Learning on Graphs: Graph Neural Networks and Embedding Techniques
- Social Network Analysis: Community Detection and Influence Propagation
- Graph Algorithm Complexity: Theoretical Foundations and Practical Bounds
- Production Graph Analytics: Real-Time Processing at Scale
Related Topics
Browse the tagged content below to discover documentation, tutorials, and guides for implementing graph algorithms in your Geode applications.