Overview

The Geode Materialized Views System provides comprehensive view management with intelligent refresh strategies, dependency tracking, and query optimization. This system improves query performance for complex analytical queries by pre-computing and storing query results with enterprise-grade reliability.

What are Materialized Views?

Materialized Views are pre-computed query results stored as physical data structures that can be queried like regular tables. Unlike regular views (which re-execute the query each time), materialized views cache results for instant access.

Key Benefits:

  • Query Acceleration: Faster access to pre-computed results
  • Resource Efficiency: Compute once, query many times
  • Automatic Refresh: Keep data current with configurable strategies
  • Dependency Tracking: Automatic updates when source data changes
  • Query Rewriting: Optimizer automatically uses views when beneficial

Architecture

System Components

MaterializedViewManager
├── ViewCatalog          - Metadata and storage management
├── DependencyTracker    - Table-view dependency detection
├── RefreshScheduler     - Priority-based refresh scheduling
├── QueryRewriter        - Automatic query optimization using views
└── ViewMetrics         - Performance monitoring and statistics

Data Flow

1. CREATE MATERIALIZED VIEW → Parse query, extract dependencies
2. ViewCatalog → Store metadata (source query, refresh strategy)
3. Initial Refresh → Execute source query, store results
4. DependencyTracker → Monitor source table changes
5. RefreshScheduler → Trigger refresh based on strategy
6. QueryRewriter → Match incoming queries to compatible views
7. Result → Return cached data (faster than re-computing)

Refresh Strategies

Manual Refresh

Use Case: Full control over refresh timing, analytical reports

-- Create view with manual refresh
CREATE MATERIALIZED VIEW quarterly_analytics AS
MATCH (s:Sale)-[:CONTAINS]->(p:Product)
WHERE s.quarter = 'Q4-2025'
RETURN p.category, sum(s.amount) AS total_sales, count(s) AS sale_count
WITH REFRESH STRATEGY manual

Characteristics:

  • Refresh only on explicit command
  • Lowest resource overhead
  • Best for infrequently changing data
  • Suitable for monthly/quarterly reports

Refresh Command:

-- Manually refresh view
REFRESH MATERIALIZED VIEW quarterly_analytics

Auto-Immediate Refresh

Use Case: Real-time dashboards, critical metrics

-- Create view with immediate refresh
CREATE MATERIALIZED VIEW real_time_metrics AS
MATCH (o:Order)
WHERE o.created_date > datetime().subtract(PT1H)
RETURN o.status, count(o) AS order_count, sum(o.total) AS revenue
WITH REFRESH STRATEGY auto_immediate

Characteristics:

  • Refreshes immediately when source data changes
  • Always returns current data
  • Higher resource usage
  • Best for mission-critical dashboards

Behavior:

1. Transaction commits to Order table
2. DependencyTracker detects change
3. RefreshScheduler queues immediate refresh
4. View refreshes within seconds
5. Next query sees updated data

Auto-Scheduled Refresh

Use Case: Regular reports, hourly/daily aggregations

-- Create view with hourly refresh
CREATE MATERIALIZED VIEW user_activity_summary AS
MATCH (u:User)-[a:ACTION]->(t:Target)
WHERE a.timestamp > datetime().subtract(P30D)
RETURN u.department, t.type, count(a) AS action_count, avg(a.duration) AS avg_duration
WITH REFRESH STRATEGY auto_scheduled INTERVAL 3600  -- Every hour

Characteristics:

  • Predictable refresh schedule
  • Balanced resource usage
  • Data freshness within refresh interval
  • Best for periodic analytics

Configuration:

# Default intervals
default_refresh_intervals:
  high_priority: 900    # 15 minutes
  normal: 3600          # 1 hour
  low_priority: 86400   # 24 hours

Incremental Refresh

Use Case: Large views with small updates

-- Create view with incremental refresh
CREATE MATERIALIZED VIEW order_summary AS
MATCH (o:Order)
RETURN o.date, o.region, count(o) AS order_count, sum(o.amount) AS revenue
WITH REFRESH STRATEGY incremental

Characteristics:

  • Only processes changed rows
  • Dramatically faster for large datasets
  • Requires change tracking
  • Best for append-only or small update patterns

Behavior:

Initial Refresh: Process all 1M rows → 60 seconds
Incremental Refresh: Process 1K changed rows → 0.5 seconds (120x faster)

Streaming Refresh

Use Case: Continuous data feeds, real-time analytics

-- Create streaming view
CREATE MATERIALIZED VIEW live_sensor_data AS
MATCH (s:Sensor)-[:REPORTS]->(m:Measurement)
WHERE m.timestamp > datetime().subtract(PT5M)
RETURN s.id, avg(m.value) AS avg_value, max(m.value) AS max_value
WITH REFRESH STRATEGY streaming

Characteristics:

  • Continuous incremental updates
  • Sub-second data freshness
  • Integrates with event streams
  • Best for IoT, monitoring, real-time dashboards

Creating Materialized Views

Basic Syntax

CREATE MATERIALIZED VIEW view_name AS
<GQL query>
[WITH REFRESH STRATEGY <strategy> [INTERVAL <seconds>]]

Simple Aggregation

-- Product sales summary
CREATE MATERIALIZED VIEW product_sales AS
MATCH (p:Product)<-[:CONTAINS]-(s:Sale)
RETURN p.category,
       count(s) AS sale_count,
       sum(s.amount) AS total_revenue,
       avg(s.amount) AS avg_sale
WITH REFRESH STRATEGY auto_scheduled INTERVAL 3600

Benefit: Returns pre-computed results instead of scanning source data.

Complex Join

-- Customer behavior analysis
CREATE MATERIALIZED VIEW customer_insights AS
MATCH (c:Customer)-[:PLACED]->(o:Order)-[:CONTAINS]->(p:Product)
WHERE o.created_date > datetime().subtract(P90D)
RETURN c.segment,
       c.region,
       count(DISTINCT o) AS order_count,
       count(DISTINCT p) AS unique_products,
       sum(o.total) AS lifetime_value,
       avg(o.total) AS avg_order_value
WITH REFRESH STRATEGY auto_scheduled INTERVAL 7200

Benefits:

  • Pre-computed joins eliminate join cost
  • Aggregations computed once
  • Perfect for dashboard queries

Time-Series Data

-- Daily metrics rollup
CREATE MATERIALIZED VIEW daily_metrics AS
MATCH (m:Metric)
WHERE m.timestamp > datetime().subtract(P365D)
RETURN date(m.timestamp) AS metric_date,
       m.type,
       count(m) AS count,
       avg(m.value) AS avg_value,
       percentile(m.value, 0.95) AS p95_value,
       percentile(m.value, 0.99) AS p99_value
WITH REFRESH STRATEGY auto_scheduled INTERVAL 86400  -- Daily

Use Case: Historical trend analysis, SLA reporting

Querying Materialized Views

Direct Query

-- Query the materialized view directly
MATCH (v:product_sales)
WHERE v.category = 'Electronics'
RETURN v.category, v.total_revenue
ORDER BY v.total_revenue DESC

Characteristics:

  • Instant results from cached data
  • No re-computation
  • Limited to pre-computed columns

Automatic Query Rewriting

Geode’s optimizer automatically rewrites queries to use compatible materialized views:

Original Query:

MATCH (p:Product)<-[:CONTAINS]-(s:Sale)
WHERE p.category = 'Electronics'
RETURN p.category, sum(s.amount) AS revenue

Optimizer Rewrite:

-- Automatically uses product_sales view
MATCH (v:product_sales)
WHERE v.category = 'Electronics'
RETURN v.category, v.total_revenue AS revenue

Performance Impact:

Original: 1.8 seconds
Rewritten: 0.015 seconds (120x speedup)

View Compatibility

Compatible Query (uses view):

-- Matches view structure
MATCH (p:Product)<-[:CONTAINS]-(s:Sale)
RETURN p.category, count(s)
--  Uses product_sales view

Incompatible Query (doesn’t use view):

-- Different aggregation
MATCH (p:Product)<-[:CONTAINS]-(s:Sale)
RETURN p.brand, count(s)  -- Grouped by brand, not category
--  Cannot use product_sales view

Dependency Tracking

Automatic Dependency Detection

Geode automatically extracts table and relationship dependencies:

CREATE MATERIALIZED VIEW user_summary AS
MATCH (u:User)-[:POSTED]->(p:Post)-[:HAS_TAG]->(t:Tag)
RETURN u.name, count(p) AS post_count, collect(DISTINCT t.name) AS tags

Detected Dependencies:

  • Tables: User, Post, Tag
  • Relationships: POSTED, HAS_TAG
  • Change to any triggers refresh consideration

Cascading Refreshes

Dependency Chain:

-- Base view
CREATE MATERIALIZED VIEW daily_sales AS
MATCH (s:Sale)
RETURN date(s.created_date) AS sale_date, sum(s.amount) AS revenue

-- Dependent view
CREATE MATERIALIZED VIEW monthly_sales AS
MATCH (d:daily_sales)
RETURN year_month(d.sale_date) AS month, sum(d.revenue) AS monthly_revenue

Refresh Behavior:

1. Sale table updated
2. daily_sales refresh triggered
3. DependencyTracker detects monthly_sales depends on daily_sales
4. monthly_sales refresh queued after daily_sales completes
5. Cascade preserves consistency

Change Detection

Change Log:

pub const ChangeLogEntry = struct {
    table_name: []const u8,
    change_type: ChangeType,  // INSERT, UPDATE, DELETE
    timestamp: i64,
    row_count: u32,
    processed: bool,
};

Incremental Refresh Logic:

1. Track changes since last refresh
2. Apply only delta changes to view
3. Update view statistics
4. Mark changes as processed

Performance Monitoring

View Statistics

-- Get view performance metrics
CALL db.view.stats('product_sales')

Output:

{
  "total_refreshes": 240,
  "successful_refreshes": 238,
  "failed_refreshes": 2,
  "average_refresh_time_ms": 1250,
  "total_query_hits": 15420,
  "cache_hit_ratio": 0.98,
  "last_refresh": "2026-01-24T10:30:00Z",
  "next_refresh": "2026-01-24T11:30:00Z",
  "storage_mb": 45.2
}

System Statistics

-- Get all views overview
CALL db.view.list()

Output:

{
  "total_views": 12,
  "active_refreshes": 2,
  "total_storage_gb": 2.4,
  "average_cache_hit_ratio": 0.95,
  "refresh_success_rate": 0.992
}

Performance Metrics

Collected Metrics:

  • total_refreshes: Lifetime refresh count
  • successful_refreshes: Successful refresh operations
  • failed_refreshes: Failed refresh attempts
  • average_refresh_time_ms: Mean refresh duration
  • total_query_hits: View usage count
  • last_query_hit_timestamp: Most recent query
  • cache_hit_ratio: Queries served from view vs re-executed
  • storage_efficiency: Compression ratio vs raw data

Configuration

View Manager Settings

# config/views.yaml
materialized_views:
  max_views: 1000
  max_view_size_mb: 1024
  default_refresh_interval_seconds: 3600
  enable_auto_refresh: true
  enable_incremental_refresh: true
  max_concurrent_refreshes: 4
  refresh_timeout_seconds: 1800

Options Explained:

  • max_views: Maximum concurrent materialized views
  • max_view_size_mb: Per-view storage limit
  • default_refresh_interval_seconds: Default auto-scheduled interval
  • enable_auto_refresh: Global auto-refresh toggle
  • enable_incremental_refresh: Enable incremental refresh capability
  • max_concurrent_refreshes: Parallel refresh limit
  • refresh_timeout_seconds: Refresh operation timeout

Refresh Priority

pub const RefreshPriority = enum {
    immediate,  // <1 minute
    high,       // <15 minutes
    normal,     // <1 hour
    low,        // <24 hours
    background, // Best effort
};

Priority Queue:

Immediate → High → Normal → Low → Background

Scheduling:
- Immediate: Pre-empts normal operations
- High: Priority over normal queries
- Normal: Standard priority
- Low: Off-peak hours preferred
- Background: Idle time only

Advanced Features

Partitioned Views

-- Create partitioned view by date
CREATE MATERIALIZED VIEW sales_by_date PARTITION BY date AS
MATCH (s:Sale)
RETURN date(s.created_date) AS sale_date,
       s.region,
       sum(s.amount) AS revenue
WITH REFRESH STRATEGY incremental

Benefits:

  • Refresh only affected partitions
  • Parallel partition processing
  • Efficient data pruning
  • Better compression

View Chaining

-- Base view
CREATE MATERIALIZED VIEW hourly_metrics AS ...

-- Derived view (uses base view)
CREATE MATERIALIZED VIEW daily_summary AS
MATCH (h:hourly_metrics)
RETURN date(h.timestamp) AS metric_date,
       avg(h.value) AS avg_daily_value
WITH REFRESH STRATEGY auto_scheduled INTERVAL 86400

Benefits:

  • Compose complex analytics pipelines
  • Reuse intermediate results
  • Optimize storage (base view can be pruned)

Custom Refresh Logic

// Custom refresh trigger
pub fn refreshOnThreshold(
    view_name: []const u8,
    threshold_rows: u32,
) !void {
    const change_count = try getChangeCount(view_name);
    if (change_count >= threshold_rows) {
        try refreshView(view_name);
    }
}

Use Cases

Real-Time Dashboards

-- Active users dashboard
CREATE MATERIALIZED VIEW active_users_now AS
MATCH (u:User)-[:SESSION]->(s:Session)
WHERE s.last_seen > datetime().subtract(PT15M)
RETURN count(DISTINCT u) AS active_users,
       count(s) AS total_sessions,
       avg(s.duration_seconds) AS avg_session_duration
WITH REFRESH STRATEGY auto_immediate

Dashboard Integration:

// Frontend refreshes every 30 seconds
setInterval(async () => {
  const metrics = await query('MATCH (v:active_users_now) RETURN v');
  updateDashboard(metrics);
}, 30000);

Analytical Reporting

-- Monthly sales report
CREATE MATERIALIZED VIEW monthly_sales_report AS
MATCH (s:Sale)-[:SOLD_BY]->(rep:SalesRep)
WHERE s.created_date >= startOfMonth()
RETURN rep.name,
       rep.region,
       count(s) AS deals_closed,
       sum(s.amount) AS total_revenue,
       avg(s.amount) AS avg_deal_size,
       percentile(s.amount, 0.5) AS median_deal
WITH REFRESH STRATEGY auto_scheduled INTERVAL 3600

Customer Segmentation

-- Customer lifetime value segments
CREATE MATERIALIZED VIEW customer_segments AS
MATCH (c:Customer)-[:PLACED]->(o:Order)
WITH c, sum(o.total) AS ltv, count(o) AS order_count
RETURN CASE
  WHEN ltv > 10000 THEN 'VIP'
  WHEN ltv > 5000 THEN 'High Value'
  WHEN ltv > 1000 THEN 'Medium Value'
  ELSE 'New Customer'
END AS segment,
count(c) AS customer_count,
avg(ltv) AS avg_ltv,
avg(order_count) AS avg_orders
WITH REFRESH STRATEGY auto_scheduled INTERVAL 86400

Troubleshooting

Refresh Failures

Issue: View refresh fails with timeout

Diagnosis:

-- Check view status
CALL db.view.status('slow_view')

Solution:

# Increase refresh timeout
materialized_views:
  refresh_timeout_seconds: 3600  # 1 hour

# Or optimize source query
# Add indexes on frequently joined columns
# Reduce aggregation complexity
# Consider partitioning

Issue: Incremental refresh not working

Diagnosis:

-- Verify change tracking
CALL db.view.changes('view_name')

Solution:

-- Rebuild view with change tracking
DROP MATERIALIZED VIEW old_view
CREATE MATERIALIZED VIEW new_view AS ...
WITH REFRESH STRATEGY incremental ENABLE_CHANGE_TRACKING true

Performance Issues

Issue: View queries slower than expected

Analysis:

PROFILE MATCH (v:product_sales)
WHERE v.category = 'Electronics'
RETURN v

Solutions:

  1. Add indexes on view columns:

    CREATE INDEX view_category_idx ON product_sales (category)
    
  2. Partition large views:

    CREATE MATERIALIZED VIEW partitioned_sales PARTITION BY region AS ...
    
  3. Prune old data:

    ALTER MATERIALIZED VIEW sales_view SET RETENTION P90D
    

Best Practices

View Design

  1. Appropriate Granularity:

    --  Good: Aggregate to useful level
    CREATE MATERIALIZED VIEW daily_sales AS ...  -- Daily granularity
    
    --  Bad: Too granular
    CREATE MATERIALIZED VIEW second_by_second AS ...  -- Too detailed
    
  2. Include Filtering Columns:

    --  Good: Include common filter columns
    RETURN sale_date, region, category, revenue
    
    --  Bad: Missing important dimensions
    RETURN revenue  -- Can't filter by date/region
    
  3. Balance Freshness vs Cost:

    -- Dashboard: Immediate refresh
    WITH REFRESH STRATEGY auto_immediate
    
    -- Reports: Scheduled refresh
    WITH REFRESH STRATEGY auto_scheduled INTERVAL 3600
    
    -- Archives: Manual refresh
    WITH REFRESH STRATEGY manual
    

Refresh Strategy Selection

Use CaseStrategyIntervalNotes
Live dashboardsauto_immediateN/AAlways current
Hourly reportsauto_scheduled3600Predictable
Daily analyticsauto_scheduled86400Overnight refresh
Large aggregationsincremental3600Process deltas only
Archive reportsmanualN/AOn-demand
Real-time feedsstreamingN/AContinuous updates

Monitoring

Automated Checks:

# Cron job: Check view health
*/15 * * * * geode query "CALL db.view.health_check()" | mail -s "View Status" [email protected]

# Alert on refresh failures
geode query "CALL db.view.failed_refreshes()" | \
  jq 'select(.failed_count > 0)' | \
  mail -s "ALERT: View refresh failures" [email protected]

References

Documentation

  • Implementation: src/query/enhanced_materialized_views.zig
  • Tests: tests/test_materialized_views_enhanced.zig
  • Integration: tests/test_materialized_views_enhanced_integration.zig
  • Source Docs: docs/MATERIALIZED_VIEWS.md

Next Steps

For New Users:

For Advanced Users:

For Administrators:


Document Version: 1.0 Last Updated: January 24, 2026 Status: Production Ready Test Coverage: 23 tests (15 unit + 8 integration) Concurrency: Tested with 50+ concurrent views