Graph Algorithms & Analytics in Geode

Graph algorithms unlock powerful analytical capabilities in Geode, enabling you to extract insights from connected data that would be impossible with traditional relational queries. From finding shortest paths and detecting communities to ranking influential nodes and identifying anomalies, graph algorithms are essential tools for modern data science and analytics.

Geode provides optimized implementations of classic and advanced graph algorithms, all accessible through GQL queries. These algorithms leverage Geode’s native graph storage and parallel processing capabilities to deliver high performance even on massive graphs with billions of nodes and relationships.

This comprehensive guide explores essential graph algorithms, their mathematical foundations, complexity analysis, implementation patterns, and optimization strategies for production deployments.

Theoretical Foundations

Graph Complexity Classes

Understanding algorithmic complexity is crucial for choosing the right algorithm for your data size:

Polynomial Time (P):

  • Single-source shortest path: O(V + E) with BFS, O((V + E) log V) with Dijkstra
  • Minimum spanning tree: O(E log V) with Kruskal’s, O(E + V log V) with Prim’s
  • PageRank (fixed iterations): O(k × E) where k is iteration count
  • Strongly connected components: O(V + E) with Tarjan’s algorithm

NP-Complete (Intractable):

  • Hamiltonian path/cycle: O(2^V × V^2)
  • Graph coloring: O(k^V) where k is number of colors
  • Traveling salesman problem (TSP): O(V! / 2)
  • Maximum clique: O(3^(V/3))

Approximation Algorithms (when exact solutions are impractical):

  • Community detection: Modularity optimization O(V × E)
  • Betweenness centrality sampling: O(k × V × E) where k « V
  • Graph partitioning: Spectral methods O(V^3)

Graph Properties and Invariants

Degree Distribution: Characterizes network topology

  • Regular graphs: All nodes have same degree
  • Scale-free networks: Power-law degree distribution P(k) ~ k^(-γ)
  • Small-world networks: High clustering, short average path length

Clustering Coefficient: Measures local density

C(v) = 2 × |{(u,w) : u,w ∈ N(v) ∧ (u,w) ∈ E}| / (deg(v) × (deg(v) - 1))

Graph Diameter: Maximum shortest path between any two nodes Graph Density: Ratio of actual edges to possible edges = E / (V × (V-1) / 2)

Pathfinding Algorithms

Shortest Path (Dijkstra):

-- Find shortest path between two nodes
MATCH path = shortestPath(
  (start:Location {name: 'New York'})-[:ROUTE*]-(end:Location {name: 'Los Angeles'})
)
RETURN path, length(path) as hops;

-- Weighted shortest path
MATCH path = shortestPath(
  (start:Location {name: 'New York'})-[r:ROUTE*]-(end:Location {name: 'Los Angeles'})
)
RETURN path, reduce(dist = 0, rel in relationships(path) | dist + rel.distance) as total_distance;

All Shortest Paths:

-- Find all shortest paths (when multiple exist)
MATCH paths = allShortestPaths(
  (a:Person {id: 123})-[:KNOWS*]-(b:Person {id: 456})
)
RETURN paths, length(paths) as path_length;

K-Shortest Paths:

-- Find top K shortest paths
MATCH path = (start:City {name: 'Boston'})-[:CONNECTED*1..10]-(end:City {name: 'Seattle'})
WITH path, reduce(cost = 0, r in relationships(path) | cost + r.weight) as total_cost
ORDER BY total_cost
LIMIT 5
RETURN path, total_cost;

Centrality Algorithms

Degree Centrality:

-- Calculate in-degree (followers)
MATCH (u:User)
OPTIONAL MATCH (u)<-[:FOLLOWS]-(follower:User)
RETURN u.name, count(follower) as in_degree
ORDER BY in_degree DESC
LIMIT 10;

-- Out-degree (following)
MATCH (u:User)
OPTIONAL MATCH (u)-[:FOLLOWS]->(following:User)
RETURN u.name, count(following) as out_degree
ORDER BY out_degree DESC;

-- Total degree
MATCH (u:User)
OPTIONAL MATCH (u)-[:KNOWS]-(connected:User)
RETURN u.name, count(DISTINCT connected) as degree
ORDER BY degree DESC;

Betweenness Centrality (nodes on shortest paths):

-- Simplified betweenness calculation
MATCH (n:Person)
MATCH path = shortestPath((a:Person)-[:KNOWS*]-(b:Person))
WHERE n IN nodes(path) AND a <> n AND b <> n
WITH n, count(path) as paths_through
RETURN n.name, paths_through
ORDER BY paths_through DESC
LIMIT 10;

Closeness Centrality (average distance to others):

-- Measure how close a node is to all others
MATCH (u:User {id: 123})
MATCH path = shortestPath((u)-[:KNOWS*]-(other:User))
WHERE other <> u
WITH u, avg(length(path)) as avg_distance
RETURN u.name, 1.0 / avg_distance as closeness_centrality;

PageRank and Influence

Basic PageRank Implementation:

-- Simplified PageRank-style scoring
MATCH (u:User)<-[:FOLLOWS]-(follower:User)
WITH u, count(follower) as direct_followers
MATCH (follower)<-[:FOLLOWS]-(indirect:User)
RETURN
  u.name,
  direct_followers,
  count(DISTINCT indirect) as indirect_reach,
  direct_followers * 1.0 + count(DISTINCT indirect) * 0.15 as influence_score
ORDER BY influence_score DESC;

Weighted Influence:

-- Account for follower quality
MATCH (u:User)<-[f:FOLLOWS]-(follower:User)
OPTIONAL MATCH (follower)<-[:FOLLOWS]-(follower_of_follower:User)
WITH u, follower, count(follower_of_follower) as follower_influence
RETURN
  u.name,
  sum(follower_influence) as weighted_influence
ORDER BY weighted_influence DESC;

Community Detection

Connected Components:

-- Find connected components
MATCH (start:User {id: 123})-[:KNOWS*]-(connected:User)
RETURN collect(DISTINCT connected.id) as community_members,
       count(DISTINCT connected) as community_size;

Triangle Counting (clustered communities):

-- Count triangles (A knows B, B knows C, C knows A)
MATCH (a:User)-[:KNOWS]->(b:User)-[:KNOWS]->(c:User)-[:KNOWS]->(a)
RETURN count(*) / 3 as triangle_count;

-- Local clustering coefficient
MATCH (u:User {id: 123})-[:KNOWS]-(neighbor:User)
WITH u, collect(DISTINCT neighbor) as neighbors, count(DISTINCT neighbor) as degree
UNWIND neighbors as n1
UNWIND neighbors as n2
MATCH (n1)-[r:KNOWS]-(n2)
WHERE n1 <> n2
WITH u, degree, count(DISTINCT r) as connected_pairs
RETURN u.name,
       degree,
       connected_pairs * 2.0 / (degree * (degree - 1)) as clustering_coefficient;

Label Propagation (community discovery):

-- Simplified label propagation
MATCH (u:User)-[:KNOWS]-(neighbor:User)
WITH u, neighbor.community_id as neighbor_community, count(*) as freq
ORDER BY freq DESC
LIMIT 1
SET u.community_id = neighbor_community
RETURN u.name, u.community_id;

Similarity and Recommendations

Jaccard Similarity (common connections):

-- Find similar users by shared connections
MATCH (u1:User {id: 123})-[:FOLLOWS]->(common:User)<-[:FOLLOWS]-(u2:User)
WHERE u1 <> u2
WITH u1, u2, count(DISTINCT common) as common_count
MATCH (u1)-[:FOLLOWS]->(u1_following:User)
MATCH (u2)-[:FOLLOWS]->(u2_following:User)
WITH u1, u2, common_count,
     count(DISTINCT u1_following) as u1_total,
     count(DISTINCT u2_following) as u2_total
RETURN u2.name,
       common_count * 1.0 / (u1_total + u2_total - common_count) as jaccard_similarity
ORDER BY jaccard_similarity DESC
LIMIT 10;

Collaborative Filtering:

-- Recommend products based on similar users' purchases
MATCH (u:User {id: 123})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar:User)
WHERE similar <> u
WITH similar, count(DISTINCT p) as common_purchases
ORDER BY common_purchases DESC
LIMIT 50
MATCH (similar)-[:PURCHASED]->(recommendation:Product)
WHERE NOT EXISTS {(u)-[:PURCHASED]->(recommendation)}
RETURN recommendation.name,
       count(*) as recommendation_score
ORDER BY recommendation_score DESC
LIMIT 10;

Cosine Similarity (vector-based):

-- Compare users by behavior vectors
MATCH (u1:User {id: 123})-[r1:RATED]->(item:Product)<-[r2:RATED]-(u2:User)
WHERE u1 <> u2
WITH u1, u2,
     sum(r1.rating * r2.rating) as dot_product,
     sqrt(sum(r1.rating * r1.rating)) as u1_magnitude,
     sqrt(sum(r2.rating * r2.rating)) as u2_magnitude
RETURN u2.name,
       dot_product / (u1_magnitude * u2_magnitude) as cosine_similarity
ORDER BY cosine_similarity DESC;

Graph Traversal Algorithms

Breadth-First Search (BFS):

-- Explore network level by level
MATCH path = (start:User {id: 123})-[:KNOWS*1..4]-(connected:User)
RETURN connected.name, length(path) as distance
ORDER BY distance, connected.name;

Depth-First Search (DFS):

-- Deep exploration of paths
MATCH path = (start:User {id: 123})-[:FOLLOWS*1..5]->(reached:User)
RETURN DISTINCT reached.name, length(path) as depth
ORDER BY depth DESC, reached.name;

Cycle Detection:

-- Find cycles in the graph
MATCH path = (n:Node)-[:CONNECTS*]->(n)
WHERE length(path) > 1
RETURN path, length(path) as cycle_length
ORDER BY cycle_length
LIMIT 10;

Flow and Optimization

Maximum Flow (network capacity):

-- Simplified max flow calculation
MATCH path = (source:Node {id: 'A'})-[r:PIPE*]->(sink:Node {id: 'B'})
WITH path, reduce(flow = 999999, rel in relationships(path) |
  CASE WHEN rel.capacity < flow THEN rel.capacity ELSE flow END
) as path_capacity
RETURN path, path_capacity
ORDER BY path_capacity DESC;

Minimum Spanning Tree:

-- Connect all nodes with minimum total edge weight
MATCH (n:Location)
WITH collect(n) as nodes
MATCH path = (a:Location)-[r:ROAD]-(b:Location)
WHERE a IN nodes AND b IN nodes
RETURN path, r.distance as weight
ORDER BY weight;

Anomaly Detection

Unusual Pattern Detection:

-- Find users with anomalous behavior
MATCH (u:User)
OPTIONAL MATCH (u)-[:PURCHASED]->(p:Product)
WITH u, count(p) as purchase_count,
     avg(p.price) as avg_spend
WITH avg(purchase_count) as avg_purchases,
     stddev(purchase_count) as stddev_purchases,
     avg(avg_spend) as global_avg_spend,
     stddev(avg_spend) as stddev_spend
MATCH (u:User)
OPTIONAL MATCH (u)-[:PURCHASED]->(p:Product)
WITH u, count(p) as user_purchases, avg(p.price) as user_avg_spend,
     avg_purchases, stddev_purchases, global_avg_spend, stddev_spend
WHERE abs(user_purchases - avg_purchases) > 3 * stddev_purchases
   OR abs(user_avg_spend - global_avg_spend) > 3 * stddev_spend
RETURN u.name, user_purchases, user_avg_spend;

Fraud Ring Detection:

-- Identify tightly connected suspicious accounts
MATCH (suspect:Account {flagged: true})-[:TRANSACTED_WITH*1..2]-(connected:Account)
WITH suspect, collect(DISTINCT connected) as ring_members
WHERE size(ring_members) > 5
MATCH (m1:Account)-[:TRANSACTED_WITH]-(m2:Account)
WHERE m1 IN ring_members AND m2 IN ring_members
RETURN suspect.id, ring_members, count(*) as internal_transactions
ORDER BY internal_transactions DESC;

Graph Metrics and Statistics

Diameter (longest shortest path):

-- Calculate graph diameter
MATCH path = shortestPath((a:Node)-[*]-(b:Node))
WHERE a < b
RETURN max(length(path)) as diameter;

Density (edge to node ratio):

-- Calculate graph density
MATCH (n:Node)
WITH count(n) as node_count
MATCH ()-[r:EDGE]-()
RETURN count(r) * 1.0 / (node_count * (node_count - 1)) as density;

Degree Distribution:

-- Analyze degree distribution
MATCH (n:Node)
OPTIONAL MATCH (n)-[r]-(m:Node)
WITH n, count(r) as degree
RETURN degree, count(n) as node_count
ORDER BY degree;

Performance Optimization for Algorithms

Limit Traversal Depth:

-- Always bound variable-length patterns
MATCH path = (a)-[*1..6]-(b)  -- Good: bounded
-- MATCH path = (a)-[*]-(b)   -- Bad: unbounded
RETURN path;

Use Early Filtering:

-- Filter before expensive traversals
MATCH (start:User)
WHERE start.active = true  -- Filter first
  AND start.country = 'US'
MATCH path = (start)-[:KNOWS*1..3]-(connected:User)
RETURN path;

Leverage Indexes:

-- Start with indexed lookups
CREATE INDEX user_id ON :User(id);

MATCH (u:User {id: 123})  -- Uses index
MATCH path = (u)-[:FOLLOWS*1..4]->(reached:User)
RETURN path;

Parallel Processing:

-- Break large computations into batches
MATCH (u:User)
WHERE u.id >= 0 AND u.id < 1000  -- Batch 1
WITH u
MATCH (u)-[:KNOWS]-(friend:User)
RETURN u.id, count(friend) as degree;

Use Cases by Industry

Social Networks: Influence ranking, friend recommendations, community detection

Fraud Detection: Ring identification, anomaly detection, pattern analysis

Supply Chain: Shortest paths, bottleneck identification, flow optimization

Recommendation Engines: Collaborative filtering, similarity scoring, personalization

Network Security: Intrusion path analysis, vulnerability assessment, access patterns

Biology: Protein interaction analysis, pathway discovery, disease networks

Best Practices

  1. Always bound variable-length paths to prevent explosions
  2. Use appropriate indexes for algorithm starting points
  3. Consider graph density when choosing algorithms
  4. Profile algorithm performance on representative data
  5. Implement incremental updates for frequently computed metrics
  6. Cache results for expensive computations
  7. Use sampling for approximate results on huge graphs
  8. Parallelize independent computations across nodes
  9. Monitor memory usage for large traversals
  10. Validate results on known test cases

Advanced Algorithm Implementations

All-Pairs Shortest Paths (Floyd-Warshall)

Compute distances between all node pairs for dense graphs:

-- Initialize distance matrix
MATCH (source:Node), (target:Node)
WHERE source <> target
MERGE (source)-[d:DISTANCE]->(target)
SET d.distance = CASE
    WHEN EXISTS((source)-[:CONNECTED]-(target)) THEN 1
    ELSE 999999
END;

-- Floyd-Warshall iterations
MATCH (i:Node), (j:Node), (k:Node)
WHERE i <> j AND j <> k AND i <> k
MATCH (i)-[ij:DISTANCE]->(j)
MATCH (i)-[ik:DISTANCE]->(k)
MATCH (k)-[kj:DISTANCE]->(j)
WITH i, j, ij, ik.distance + kj.distance AS new_distance
WHERE new_distance < ij.distance
SET ij.distance = new_distance;

-- Query all-pairs distances
MATCH (a:Node {id: $node_a})-[d:DISTANCE]->(b:Node {id: $node_b})
RETURN a.id, b.id, d.distance AS shortest_distance;

Complexity: O(V³), suitable for V < 1000 Memory: O(V²) for distance matrix Use case: Dense graphs, transitive closure, diameter calculation

Bellman-Ford Algorithm (Negative Weights)

Handle graphs with negative edge weights:

-- Initialize distances
MATCH (n:Node)
SET n.distance = CASE
    WHEN n.id = $source_id THEN 0
    ELSE 999999
END;

-- Relax edges V-1 times
MATCH (u:Node)-[r:EDGE]->(v:Node)
WHERE u.distance + r.weight < v.distance
SET v.distance = u.distance + r.weight,
    v.predecessor = u.id;

-- Detect negative cycles
MATCH (u:Node)-[r:EDGE]->(v:Node)
WHERE u.distance + r.weight < v.distance
RETURN 'Negative cycle detected' AS warning, u.id, v.id;

-- Extract shortest path
MATCH (target:Node {id: $target_id})
WITH target.id AS current_id, [] AS path
CALL {
    WITH current_id, path
    MATCH (n:Node {id: current_id})
    WHERE n.predecessor IS NOT NULL
    WITH n.predecessor AS pred, path + [current_id] AS new_path
    RETURN pred AS next_id, new_path AS result_path
}
RETURN result_path AS shortest_path;

Complexity: O(V × E) Advantages: Handles negative weights, detects negative cycles Use case: Currency arbitrage, price optimization

Tarjan’s Strongly Connected Components

Find strongly connected components in O(V + E) time:

-- Initialize node state
MATCH (n:Node)
SET n.index = null,
    n.lowlink = null,
    n.on_stack = false;

-- Recursive DFS for SCC detection
MATCH (v:Node)
WHERE v.index IS NULL
CALL {
    WITH v
    SET v.index = $current_index,
        v.lowlink = $current_index,
        v.on_stack = true
    WITH v, $current_index + 1 AS next_index

    MATCH (v)-[:EDGE]->(w:Node)
    CALL {
        WITH w, next_index
        CASE
            WHEN w.index IS NULL THEN
                // Recurse
                SET w.index = next_index
            WHEN w.on_stack = true THEN
                // Update lowlink
                SET v.lowlink = CASE
                    WHEN w.index < v.lowlink THEN w.index
                    ELSE v.lowlink
                END
        END
    }

    // If v is root of SCC
    WITH v
    WHERE v.lowlink = v.index
    MATCH (w:Node)
    WHERE w.on_stack = true AND w.index >= v.index
    SET w.component_id = v.id,
        w.on_stack = false
    RETURN v.id AS scc_root
};

-- Query component membership
MATCH (n:Node)
WHERE n.component_id IS NOT NULL
RETURN n.component_id AS component,
       COUNT(n) AS size,
       COLLECT(n.id) AS members
ORDER BY size DESC;

Applications: Web crawling, social network analysis, compiler optimization

Edmonds-Karp Maximum Flow

Compute maximum flow from source to sink:

-- Initialize residual graph
MATCH (u:Node)-[e:EDGE]->(v:Node)
MERGE (u)-[forward:RESIDUAL {capacity: e.capacity, flow: 0}]->(v)
MERGE (v)-[backward:RESIDUAL {capacity: 0, flow: 0}]->(u);

-- Augmenting path search (BFS)
WITH $source_id AS source, $sink_id AS sink
MATCH path = shortestPath((s:Node {id: source})-[:RESIDUAL*]->(t:Node {id: sink}))
WHERE ALL(r IN relationships(path) WHERE r.capacity - r.flow > 0)
WITH path,
     REDUCE(min_cap = 999999, r IN relationships(path) |
         CASE WHEN r.capacity - r.flow < min_cap
              THEN r.capacity - r.flow
              ELSE min_cap
         END
     ) AS bottleneck
WITH path, bottleneck
UNWIND relationships(path) AS edge
SET edge.flow = edge.flow + bottleneck
WITH path, bottleneck
MATCH (u)-[forward:RESIDUAL]->(v)
WHERE forward IN relationships(path)
MATCH (v)-[backward:RESIDUAL]->(u)
SET backward.capacity = backward.capacity + bottleneck;

-- Extract max flow value
MATCH (:Node {id: $source_id})-[r:RESIDUAL]->()
RETURN SUM(r.flow) AS max_flow_value;

Complexity: O(V × E²) Applications: Network capacity, bipartite matching, airline scheduling

Production-Scale Optimizations

Parallel Algorithm Execution

Leverage Geode’s parallel processing for large graphs:

-- Parallel PageRank with batch updates
MATCH (n:Node)
WITH COUNT(n) AS total_nodes
SET n.pagerank = 1.0 / total_nodes;

-- Process in parallel batches
MATCH (n:Node)
WITH n, n.id % 10 AS partition_id
CALL {
    WITH n, partition_id
    MATCH (n)-[:EDGE]->(neighbor:Node)
    WITH neighbor,
         SUM(n.pagerank / COUNT { (n)-[:EDGE]->() }) AS incoming_rank
    SET neighbor.pagerank_new = 0.15 + 0.85 * incoming_rank
} IN TRANSACTIONS OF 100000 ROWS
PER PARTITION (partition_id);

-- Parallel strongly-connected component detection
MATCH (n:Node)
CALL {
    WITH n
    MATCH path = (n)-[:EDGE*]->(n)
    WITH n, COLLECT(DISTINCT [node IN nodes(path) | ID(node)]) AS cycles
    SET n.scc_candidates = cycles
} IN TRANSACTIONS OF 10000 ROWS;

Sampling and Approximation

Achieve faster results with approximate algorithms:

-- Approximate betweenness centrality (sample-based)
WITH 100 AS sample_size  // Sample 100 random nodes instead of all pairs
MATCH (sample:Node)
WITH sample
ORDER BY rand()
LIMIT $sample_size

MATCH path = shortestPath((sample)-[:EDGE*]-(target:Node))
WHERE sample <> target
WITH nodes(path) AS path_nodes
UNWIND path_nodes AS node
WHERE node <> path_nodes[0] AND node <> path_nodes[-1]
WITH node, COUNT(*) AS sampled_betweenness
RETURN node.id,
       sampled_betweenness * ($total_nodes / $sample_size) AS estimated_betweenness
ORDER BY estimated_betweenness DESC;

Accuracy: ~95% correlation with exact values at 10× speedup

Incremental Algorithm Updates

Update algorithm results without full recomputation:

-- Incremental PageRank after edge addition
MATCH (u:Node {id: $new_source}), (v:Node {id: $new_target})
MERGE (u)-[:EDGE]->(v);

-- Localized update (affected nodes only)
MATCH (affected:Node)
WHERE affected IN [u, v]
   OR EXISTS((affected)-[:EDGE*1..2]-(u))
   OR EXISTS((affected)-[:EDGE*1..2]-(v))
WITH affected
MATCH (affected)-[:EDGE]->(neighbor:Node)
WITH neighbor,
     SUM(affected.pagerank / COUNT { (affected)-[:EDGE]->() }) AS delta
SET neighbor.pagerank = neighbor.pagerank + 0.85 * delta,
    neighbor.needs_full_recompute = false;

-- Incremental community detection after node addition
MATCH (new_node:Node {id: $new_node_id})
MATCH (new_node)-[:EDGE]-(neighbor:Node)
WITH neighbor.community AS candidate_community, COUNT(*) AS connections
ORDER BY connections DESC
LIMIT 1
SET new_node.community = candidate_community;

Memory-Efficient Graph Traversal

Handle billion-edge graphs with streaming:

-- Streaming BFS for large graphs
WITH $start_id AS start_id, 0 AS level
CALL {
    WITH start_id, level
    MATCH (current:Node {id: start_id})
    SET current.level = level
    RETURN current.id AS processed_id
}

WITH level + 1 AS next_level
CALL {
    WITH next_level
    MATCH (current:Node)-[:EDGE]->(next:Node)
    WHERE current.level = next_level - 1
      AND next.level IS NULL
    SET next.level = next_level
    RETURN COUNT(next) AS nodes_at_level
} IN TRANSACTIONS OF 50000 ROWS

RETURN next_level AS level, nodes_at_level;

Algorithm Benchmarking Framework

Performance Testing

Benchmark algorithm performance on your data:

-- Benchmark PageRank iterations
WITH datetime() AS start_time
MATCH (n:Node)
SET n.pagerank = 1.0 / COUNT { MATCH (m:Node) };

// Run 20 iterations
WITH start_time, range(1, 20) AS iterations
UNWIND iterations AS iter
CALL {
    MATCH (source:Node)-[:EDGE]->(target:Node)
    WITH target, SUM(source.pagerank / COUNT { (source)-[:EDGE]->() }) AS rank
    SET target.pagerank_new = 0.15 + 0.85 * rank
}
MATCH (n:Node)
SET n.pagerank = n.pagerank_new
REMOVE n.pagerank_new

WITH start_time, datetime() AS end_time
RETURN duration.between(start_time, end_time).milliseconds AS runtime_ms,
       runtime_ms / 20.0 AS ms_per_iteration;

-- Memory usage profiling
CALL dbms.memory.usage()
YIELD used, total, peak
RETURN used / 1024 / 1024 AS used_mb,
       total / 1024 / 1024 AS total_mb,
       peak / 1024 / 1024 AS peak_mb;

Convergence Analysis

Monitor algorithm convergence:

-- Track PageRank convergence
WITH [] AS convergence_history
MATCH (n:Node)
WITH convergence_history + [n.pagerank] AS history

// After iteration
MATCH (n:Node)
WITH history, n.pagerank_new - n.pagerank AS delta
WITH AVG(ABS(delta)) AS mean_change,
     MAX(ABS(delta)) AS max_change,
     STDDEV(ABS(delta)) AS stddev_change
RETURN mean_change,
       max_change,
       stddev_change,
       CASE WHEN mean_change < 0.0001 THEN 'CONVERGED' ELSE 'CONTINUE' END AS status;

Industry-Specific Applications

Financial Networks

Detect systemic risk and contagion:

-- Bank exposure network analysis
MATCH (bank:Bank)-[exp:EXPOSURE]->(counterparty:Bank)
WITH bank,
     SUM(exp.amount) AS total_exposure,
     bank.capital AS capital
WITH bank,
     total_exposure / capital AS leverage_ratio
WHERE leverage_ratio > 10.0
MATCH path = (bank)-[:EXPOSURE*1..3]->(systemic:Bank)
WHERE systemic.systemically_important = true
RETURN bank.name,
       leverage_ratio,
       COUNT(DISTINCT path) AS contagion_paths,
       SUM([r IN relationships(path) | r.amount]) AS total_risk
ORDER BY total_risk DESC;

Supply Chain Optimization

Identify bottlenecks and optimize flows:

-- Critical path analysis
MATCH path = (supplier:Supplier)-[:SUPPLIES*]->(manufacturer:Manufacturer)
              -[:SHIPS*]->(distributor:Distributor)
WITH path,
     REDUCE(time = 0, r IN relationships(path) | time + r.lead_time_days) AS total_time
ORDER BY total_time DESC
LIMIT 1
RETURN [n IN nodes(path) | n.name] AS critical_path,
       total_time AS longest_lead_time,
       [r IN relationships(path) | r.lead_time_days] AS segment_times;

-- Bottleneck identification (max flow)
MATCH (factory:Factory {id: $factory_id})
MATCH (warehouse:Warehouse {id: $warehouse_id})
CALL algorithms.maxFlow(factory, warehouse, 'SHIPS', 'capacity')
YIELD path, flow
WITH REDUCE(min_cap = 999999, r IN relationships(path) |
         CASE WHEN r.capacity < min_cap THEN r.capacity ELSE min_cap END
     ) AS bottleneck_capacity,
     path
RETURN [n IN nodes(path) | n.name] AS supply_chain_path,
       bottleneck_capacity,
       'Bottleneck at ' + [r IN relationships(path) WHERE r.capacity = bottleneck_capacity][0].name AS bottleneck_location;

Biological Networks

Analyze protein interactions and pathways:

-- Protein-protein interaction centrality
MATCH (protein:Protein)
OPTIONAL MATCH (protein)-[:INTERACTS_WITH]-(partner:Protein)
WITH protein,
     COUNT(DISTINCT partner) AS degree,
     AVG(partner.expression_level) AS avg_partner_expression
WHERE degree >= 10  // Hub proteins
MATCH path = shortestPath((protein)-[:INTERACTS_WITH*]-(disease:Disease))
RETURN protein.name,
       degree AS interaction_count,
       avg_partner_expression,
       LENGTH(path) AS disease_proximity,
       protein.function AS biological_role
ORDER BY degree DESC, disease_proximity ASC;

Visualization and Analysis

Graph Metric Dashboard

Compute comprehensive graph statistics:

-- Complete graph analytics dashboard
MATCH (n:Node)
WITH COUNT(n) AS node_count,
     COLLECT(n) AS all_nodes

MATCH ()-[r:EDGE]-()
WITH node_count, all_nodes, COUNT(r) AS edge_count

WITH node_count, edge_count,
     edge_count * 2.0 / (node_count * (node_count - 1)) AS density,
     edge_count * 1.0 / node_count AS avg_degree

MATCH (n:Node)
OPTIONAL MATCH (n)-[r:EDGE]-()
WITH node_count, edge_count, density, avg_degree,
     COUNT(r) AS node_degree
WITH node_count, edge_count, density, avg_degree,
     COLLECT(node_degree) AS degree_dist

WITH node_count, edge_count, density, avg_degree, degree_dist,
     percentile_cont(degree_dist, 0.5) AS median_degree,
     percentile_cont(degree_dist, 0.95) AS p95_degree

MATCH path = shortestPath((a:Node)-[:EDGE*]-(b:Node))
WHERE a <> b
WITH node_count, edge_count, density, avg_degree,
     median_degree, p95_degree,
     MAX(LENGTH(path)) AS diameter,
     AVG(LENGTH(path)) AS avg_path_length

RETURN node_count,
       edge_count,
       density,
       avg_degree,
       median_degree,
       p95_degree,
       diameter,
       avg_path_length,
       CASE
           WHEN avg_path_length / log(node_count) < 2 THEN 'Small World'
           WHEN density > 0.1 THEN 'Dense'
           ELSE 'Sparse'
       END AS graph_type;

Degree Distribution Analysis

Identify network topology:

-- Degree distribution histogram
MATCH (n:Node)
OPTIONAL MATCH (n)-[r:EDGE]-()
WITH n, COUNT(r) AS degree
WITH degree, COUNT(n) AS frequency
ORDER BY degree
WITH COLLECT({degree: degree, count: frequency}) AS distribution

RETURN distribution,
       CASE
           WHEN distribution[-1].count > distribution[0].count THEN 'Power Law (Scale-Free)'
           WHEN AVG([d IN distribution | ABS(d.count - AVG([x IN distribution | x.count]))]) < 2
                THEN 'Regular Network'
           ELSE 'Random Network'
       END AS topology_type;

Further Reading

  • Graph Algorithms: Practical Examples in Network Science
  • Network Analysis with GQL: Advanced Query Patterns and Optimization
  • Scalable Graph Processing: Distributed Algorithms for Billion-Node Graphs
  • Machine Learning on Graphs: Graph Neural Networks and Embedding Techniques
  • Social Network Analysis: Community Detection and Influence Propagation
  • Graph Algorithm Complexity: Theoretical Foundations and Practical Bounds
  • Production Graph Analytics: Real-Time Processing at Scale

Browse the tagged content below to discover documentation, tutorials, and guides for implementing graph algorithms in your Geode applications.


Related Articles