Graph Analytics | Tags | Geode Database

Graph Analytics in Geode

Graph analytics transforms connected data into actionable insights by revealing patterns, relationships, and structures invisible in traditional databases. Geode’s implementation of the ISO/IEC 39075:2024 GQL standard provides powerful analytical capabilities for path analysis, centrality measurement, community detection, and business intelligence.

Why Graph Analytics?

Traditional relational databases excel at structured queries but struggle with relationship-heavy analytics. Graph databases like Geode naturally model and query connected data, enabling:

Relationship Discovery: Find hidden connections across multiple hops
Pattern Recognition: Identify recurring structures in complex networks
Influence Analysis: Measure node importance and information flow
Community Detection: Discover clusters and groupings
Anomaly Detection: Spot unusual patterns in connected data

Core Analytical Capabilities

Path Analysis

Path queries reveal how nodes connect through relationship chains, essential for network analysis, recommendation systems, and logistics optimization.

Shortest Path Queries

Find the most efficient route between two nodes:

-- Single shortest path
MATCH path = SHORTEST (a:Person {name: 'Alice'})-[:KNOWS*]->(b:Person {name: 'Bob'})
RETURN path, length(path) AS hops;

-- All shortest paths (when multiple exist)
MATCH paths = ALL SHORTEST (a:City {name: 'San Francisco'})
  -[:FLIGHT*]->
  (b:City {name: 'New York'})
RETURN paths,
  [r IN relationships(paths) | r.airline] AS airlines,
  reduce(cost = 0, r IN relationships(paths) | cost + r.price) AS total_cost;

K-Shortest Paths

Find multiple alternative paths:

-- Top 3 shortest paths with total distance
MATCH paths = K SHORTEST 3
  (start:Location {id: 'warehouse_1'})
  -[:ROUTE*]->
  (end:Location {id: 'customer_42'})
RETURN paths,
  reduce(dist = 0, r IN relationships(paths) | dist + r.distance) AS total_distance
ORDER BY total_distance;

Path Filtering

Apply constraints on intermediate nodes or relationships:

-- Find paths through active accounts only
MATCH path = (sender:Account {id: $from})
  -[:TRANSFER*2..5 (r, n | n.status = 'active' AND r.amount > 1000)]->
  (receiver:Account {id: $to})
WHERE all(r IN relationships(path) WHERE r.timestamp >= DATE '2024-01-01')
RETURN path;

Centrality Measures

Centrality algorithms identify the most important nodes in a graph based on different criteria.

Degree Centrality

Count of direct connections:

-- Find most connected users
MATCH (u:User)
OPTIONAL MATCH (u)-[r:FOLLOWS]-()
RETURN u.id, u.name, count(r) AS connections
ORDER BY connections DESC
LIMIT 20;

-- In-degree vs out-degree
MATCH (u:User)
OPTIONAL MATCH (u)<-[:FOLLOWS]-(followers)
OPTIONAL MATCH (u)-[:FOLLOWS]->(following)
RETURN
  u.name,
  count(DISTINCT followers) AS follower_count,
  count(DISTINCT following) AS following_count,
  count(DISTINCT followers) * 1.0 / nullif(count(DISTINCT following), 0) AS follower_ratio
ORDER BY follower_count DESC;

Betweenness Centrality

Measures how often a node appears on shortest paths between other nodes (bridge/bottleneck detection):

-- Identify key intermediaries in network
MATCH (p:Person)
CALL graph.betweenness_centrality(p, 'KNOWS') AS bc
RETURN p.name, bc.score
ORDER BY bc.score DESC
LIMIT 10;

PageRank and Influence

Recursive importance based on connections from important nodes:

-- Calculate PageRank scores
MATCH (page:WebPage)
CALL graph.pagerank(page, 'LINKS_TO', {
  iterations: 20,
  damping: 0.85
}) AS pr
RETURN page.url, pr.score
ORDER BY pr.score DESC
LIMIT 50;

-- Social influence ranking
MATCH (u:User)
CALL graph.pagerank(u, 'FOLLOWS') AS influence
WHERE influence.score > 0.01
RETURN u.name, influence.score, u.followers_count
ORDER BY influence.score DESC;

Community Detection

Identify clusters and groupings in your graph.

Label Propagation

Fast algorithm for community detection:

-- Detect communities in social network
MATCH (u:User)
CALL graph.label_propagation(u, 'FOLLOWS', 10) AS community
RETURN community.id, collect(u.name) AS members, count(u) AS size
ORDER BY size DESC;

Connected Components

Find disconnected subgraphs:

-- Identify isolated clusters
MATCH (n:Node)
CALL graph.connected_components(n, 'CONNECTS') AS component
RETURN component.id, count(n) AS size, collect(n.id) AS nodes
HAVING size > 1
ORDER BY size DESC;

Modularity-Based Clustering

-- High-quality community detection
MATCH (u:User)
CALL graph.louvain(u, 'INTERACTS', {
  levels: 3,
  resolution: 1.0
}) AS cluster
RETURN
  cluster.level,
  cluster.community_id,
  collect(u.name) AS members,
  count(u) AS size
ORDER BY cluster.level, size DESC;

Pattern Matching and Discovery

GQL’s powerful pattern matching enables complex analytical queries.

Triangle Counting

Measure clustering coefficient:

-- Find triangles in social network
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)-[:KNOWS]->(a)
WHERE a.id < b.id AND b.id < c.id  -- Avoid duplicates
RETURN count(*) AS triangle_count;

-- Triangles involving specific user
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b)-[:KNOWS]->(c)-[:KNOWS]->(a)
RETURN DISTINCT b.name, c.name;

Cycle Detection

Find circular dependencies or loops:

-- Detect circular dependencies in task graph
MATCH path = (t:Task)-[:DEPENDS_ON*2..10]->(t)
RETURN path, length(path) AS cycle_length
ORDER BY cycle_length;

Motif Discovery

Find specific subgraph patterns:

-- Find "star" pattern: one person followed by many
MATCH (influencer:Person)<-[:FOLLOWS]-(followers:Person)
WITH influencer, collect(followers) AS follower_list
WHERE size(follower_list) > 100
RETURN influencer.name, size(follower_list) AS follower_count
ORDER BY follower_count DESC;

-- Find "bridge" pattern: users connecting different communities
MATCH (a:Person)-[:KNOWS]->(bridge:Person)-[:KNOWS]->(b:Person)
WHERE a.community_id <> b.community_id
  AND a.community_id <> bridge.community_id
  AND b.community_id <> bridge.community_id
RETURN bridge.name, count(DISTINCT a.community_id) AS communities_bridged
ORDER BY communities_bridged DESC;

Aggregations and Statistics

Basic Aggregations

-- Sales analytics
MATCH (p:Product)<-[:PURCHASED]-(u:User)
RETURN
  p.category,
  count(u) AS total_purchases,
  count(DISTINCT u) AS unique_customers,
  avg(p.price) AS avg_price,
  sum(p.price) AS total_revenue
GROUP BY p.category
ORDER BY total_revenue DESC;

Statistical Functions

-- Price distribution analysis
MATCH (p:Product)
WHERE p.category = 'Electronics'
RETURN
  p.subcategory,
  count(*) AS product_count,
  avg(p.price) AS mean_price,
  stddev(p.price) AS price_stddev,
  percentile(p.price, 0.25) AS q1_price,
  percentile(p.price, 0.50) AS median_price,
  percentile(p.price, 0.75) AS q3_price,
  percentile(p.price, 0.95) AS p95_price,
  min(p.price) AS min_price,
  max(p.price) AS max_price
GROUP BY p.subcategory;

Window Functions

-- Rank products within categories
MATCH (p:Product)
RETURN
  p.category,
  p.name,
  p.sales,
  rank() OVER (PARTITION BY p.category ORDER BY p.sales DESC) AS rank_in_category,
  p.sales - avg(p.sales) OVER (PARTITION BY p.category) AS deviation_from_avg
ORDER BY p.category, rank_in_category;

Time-Series Analytics

Geode’s temporal support enables time-based graph analysis.

Temporal Aggregations

-- Sales trends over time
MATCH (s:Sale)
WHERE s.timestamp >= DATE '2024-01-01'
RETURN
  date_trunc('month', s.timestamp) AS month,
  count(*) AS sale_count,
  sum(s.amount) AS revenue,
  avg(s.amount) AS avg_order_value
GROUP BY month
ORDER BY month;

Temporal Pattern Analysis

-- User activity patterns by day of week
MATCH (u:User)-[:LOGGED_IN]->(session:Session)
RETURN
  extract(dow from session.timestamp) AS day_of_week,
  extract(hour from session.timestamp) AS hour,
  count(*) AS session_count
GROUP BY day_of_week, hour
ORDER BY day_of_week, hour;

Change Detection

-- Identify sudden spikes in activity
WITH
  DATE '2024-01-01' AS start_date,
  DATE '2024-12-31' AS end_date
MATCH (e:Event)
WHERE e.timestamp >= start_date AND e.timestamp <= end_date
WITH
  date_trunc('day', e.timestamp) AS day,
  count(*) AS daily_count
WITH
  day,
  daily_count,
  avg(daily_count) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
WHERE daily_count > moving_avg * 2  -- Spike: 2x the moving average
RETURN day, daily_count, moving_avg
ORDER BY day;

Real-Time Analytics

Streaming Aggregations

-- Recent activity dashboard (last 15 minutes)
MATCH (u:User)-[:ACTION]->(a:Action)
WHERE a.timestamp >= CURRENT_TIMESTAMP - DURATION 'PT15M'
RETURN
  a.type,
  count(*) AS action_count,
  count(DISTINCT u) AS unique_users,
  max(a.timestamp) AS last_occurrence
GROUP BY a.type
ORDER BY action_count DESC;

Live Monitoring

-- Active sessions right now
MATCH (u:User)-[:HAS_SESSION]->(s:Session)
WHERE s.started <= CURRENT_TIMESTAMP
  AND (s.ended IS NULL OR s.ended >= CURRENT_TIMESTAMP)
RETURN count(DISTINCT u) AS active_users, count(s) AS active_sessions;

Business Intelligence Use Cases

Customer Analytics

Customer Segmentation

-- RFM analysis (Recency, Frequency, Monetary)
MATCH (c:Customer)-[:MADE]->(o:Order)
WITH c,
  max(o.date) AS last_order_date,
  count(o) AS order_count,
  sum(o.total) AS total_spent
RETURN
  c.id,
  c.name,
  CURRENT_DATE - last_order_date AS days_since_last_order,
  order_count,
  total_spent,
  CASE
    WHEN days_since_last_order <= 30 AND total_spent > 1000 THEN 'VIP'
    WHEN days_since_last_order <= 90 AND order_count > 5 THEN 'Loyal'
    WHEN days_since_last_order > 180 THEN 'At Risk'
    ELSE 'Regular'
  END AS segment
ORDER BY total_spent DESC;

Customer Lifetime Value

-- Calculate CLV projection
MATCH (c:Customer)-[:MADE]->(o:Order)
WITH c,
  count(o) AS order_count,
  sum(o.total) AS total_spent,
  avg(o.total) AS avg_order_value,
  min(o.date) AS first_order,
  max(o.date) AS last_order
WITH c, order_count, total_spent, avg_order_value,
  (last_order - first_order) AS customer_age_days,
  (order_count * 1.0 / nullif((last_order - first_order), 0)) AS orders_per_day
RETURN
  c.id,
  total_spent AS historical_value,
  avg_order_value * orders_per_day * 365 AS projected_annual_value,
  total_spent + (avg_order_value * orders_per_day * 365 * 3) AS projected_3yr_ltv
ORDER BY projected_3yr_ltv DESC
LIMIT 100;

Recommendation Analytics

Collaborative Filtering

-- Find similar users based on purchase behavior
MATCH (u1:User {id: $user_id})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)
WHERE u1 <> u2
WITH u2, count(DISTINCT p) AS common_products
ORDER BY common_products DESC
LIMIT 10
MATCH (u2)-[:PURCHASED]->(recommended:Product)
WHERE NOT EXISTS {
  MATCH (u1)-[:PURCHASED]->(recommended)
}
RETURN
  recommended.name,
  count(u2) AS recommended_by,
  avg(recommended.rating) AS avg_rating
GROUP BY recommended.name
ORDER BY recommended_by DESC, avg_rating DESC
LIMIT 20;

Content-Based Recommendations

-- Recommend similar products
MATCH (p:Product {id: $product_id})
MATCH (similar:Product)
WHERE similar <> p
  AND similar.category = p.category
  AND abs(similar.price - p.price) < 50
OPTIONAL MATCH (similar)<-[:PURCHASED]-(u:User)-[:PURCHASED]->(p)
RETURN
  similar.name,
  similar.price,
  count(u) AS also_bought_count,
  similar.rating
ORDER BY also_bought_count DESC, similar.rating DESC
LIMIT 10;

Fraud Detection

-- Detect suspicious transaction patterns
MATCH (a1:Account)-[t1:TRANSFER]->(a2:Account)-[t2:TRANSFER]->(a3:Account)
WHERE t1.timestamp >= CURRENT_TIMESTAMP - DURATION 'PT1H'
  AND t2.timestamp > t1.timestamp
  AND t2.timestamp <= t1.timestamp + DURATION 'PT5M'
  AND t1.amount > 10000
  AND t2.amount >= t1.amount * 0.9
RETURN
  a1.id AS source,
  a2.id AS intermediary,
  a3.id AS destination,
  t1.amount AS initial_amount,
  t2.amount AS transfer_amount,
  t2.timestamp - t1.timestamp AS time_diff
ORDER BY initial_amount DESC;

Supply Chain Analytics

-- Analyze supply chain dependencies
MATCH path = (supplier:Company {type: 'supplier'})
  -[:SUPPLIES*]->
  (manufacturer:Company {type: 'manufacturer'})
  -[:SHIPS_TO]->
  (distributor:Company {type: 'distributor'})
RETURN
  supplier.name,
  manufacturer.name,
  distributor.name,
  length(path) AS supply_chain_length,
  reduce(time = 0, r IN relationships(path) | time + r.lead_time_days) AS total_lead_time
ORDER BY total_lead_time DESC;

Performance Optimization for Analytics

Index Strategy

-- Create indexes for analytical queries
CREATE INDEX user_timestamp ON User(last_active);
CREATE INDEX product_category_price ON Product(category, price);
CREATE INDEX order_customer_date ON Order(customer_id, order_date);

Query Profiling

-- Profile analytical query
PROFILE
MATCH (u:User)-[:PURCHASED]->(p:Product)
WHERE p.category = 'Electronics'
  AND u.country = 'USA'
RETURN
  p.subcategory,
  count(u) AS customers,
  sum(p.price) AS revenue
GROUP BY p.subcategory;

Materialized Views

-- Pre-compute expensive analytics
CREATE MATERIALIZED VIEW daily_sales_summary AS
  MATCH (o:Order)
  WHERE o.date >= CURRENT_DATE - DURATION 'P90D'
  RETURN
    date_trunc('day', o.date) AS day,
    o.region,
    count(*) AS order_count,
    sum(o.total) AS revenue,
    avg(o.total) AS avg_order_value
  GROUP BY day, o.region;

-- Use materialized view
MATCH (s:daily_sales_summary)
WHERE s.region = 'West' AND s.day >= DATE '2024-01-01'
RETURN s.day, s.revenue
ORDER BY s.day;

Comparison with Other Analytics Platforms

vs. Traditional OLAP

Relationship Analysis: Geode excels at multi-hop queries; OLAP struggles with joins
Real-Time: Geode supports real-time analytics; OLAP typically batch-oriented
Schema: Graph schema more flexible than star/snowflake schemas

vs. Spark GraphX

Query Language: GQL (declarative) vs. Scala/Python (procedural)
Latency: Geode optimized for low-latency queries; Spark for batch processing
Integration: Geode native graph DB; Spark requires data loading

vs. Neo4j

Standards: Geode GQL standard; Neo4j proprietary Cypher
Performance: Similar for most analytics; Geode faster for distributed workloads
Built-in Algorithms: Both support graph algorithms; different implementations

Best Practices

Index Frequently Filtered Properties: Speed up WHERE clauses
Use LIMIT: Prevent excessive memory use in analytical queries
Profile Before Optimizing: Identify actual bottlenecks
Materialize Expensive Queries: Cache common analytics
Batch Where Possible: Process historical analytics in batches
Monitor Resource Usage: Graph analytics can be memory-intensive

Getting Started

-- Import sample dataset
CALL graph.load_sample('analytics_demo');

-- Run basic analytics
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN p.category, count(u) AS customers
ORDER BY customers DESC;

-- Try path analysis
MATCH path = SHORTEST (a:User)-[:KNOWS*]-(b:User)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN path;

Explore the documentation below for detailed guides on specific analytical techniques, optimization strategies, and industry-specific use cases.

Graph Analytics in Geode Share link

Why Graph Analytics? Share link

Core Analytical Capabilities Share link

Path Analysis Share link

Shortest Path Queries Share link

K-Shortest Paths Share link

Path Filtering Share link

Centrality Measures Share link

Degree Centrality Share link

Betweenness Centrality Share link

PageRank and Influence Share link

Community Detection Share link

Label Propagation Share link

Connected Components Share link

Modularity-Based Clustering Share link

Pattern Matching and Discovery Share link

Triangle Counting Share link

Cycle Detection Share link

Motif Discovery Share link

Aggregations and Statistics Share link

Basic Aggregations Share link

Statistical Functions Share link

Window Functions Share link

Time-Series Analytics Share link

Temporal Aggregations Share link

Temporal Pattern Analysis Share link

Change Detection Share link

Real-Time Analytics Share link

Streaming Aggregations Share link

Live Monitoring Share link

Business Intelligence Use Cases Share link

Customer Analytics Share link

Customer Segmentation Share link

Customer Lifetime Value Share link

Recommendation Analytics Share link

Collaborative Filtering Share link

Content-Based Recommendations Share link

Fraud Detection Share link

Supply Chain Analytics Share link

Performance Optimization for Analytics Share link

Index Strategy Share link

Query Profiling Share link

Materialized Views Share link

Comparison with Other Analytics Platforms Share link

vs. Traditional OLAP Share link

vs. Spark GraphX Share link

vs. Neo4j Share link

Best Practices Share link

Getting Started Share link

Related Articles

Graph Algorithms and Analytics

Aggregation Functions

Data Export Guide

Analytics

Materialized Views System