Tag: Indexes

Documentation tagged with indexes in the Geode graph database. This comprehensive collection covers index design, implementation strategies, performance optimization, maintenance procedures, and best practices for leveraging indexes to accelerate graph queries and ensure optimal database performance.

Overview

Indexes are the cornerstone of database performance, dramatically reducing query execution time by providing fast access paths to data. In Geode, indexes accelerate node lookups, property searches, relationship traversals, and complex pattern matching operations. Understanding index design and management is essential for building high-performance graph applications.

Key benefits of indexes:

Fast Lookups: O(log n) access instead of O(n) scans
Efficient Filtering: Quick predicate evaluation on indexed properties
Relationship Traversals: Accelerated edge navigation
Unique Constraints: Enforce data integrity at database level
Query Optimization: Enable better execution plans
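
The gap between an O(n) scan and an O(log n) lookup can be illustrated outside Geode with a sorted list and a binary search. This is a minimal Python sketch, not Geode's index implementation: the names `scan_lookup` and `index_lookup` are hypothetical, the data is synthetic, and the sketch counts comparisons rather than measuring time.

```python
def scan_lookup(values, target):
    """Linear scan: return (position, comparisons made)."""
    comparisons = 0
    for pos, value in enumerate(values):
        comparisons += 1
        if value == target:
            return pos, comparisons
    return -1, comparisons


def index_lookup(sorted_values, target):
    """Binary search over a sorted list: return (position, probes made)."""
    lo, hi = 0, len(sorted_values)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return (lo if found else -1), probes


# Synthetic, zero-padded keys so that string order matches numeric order.
emails = sorted(f"user{i:06d}@example.com" for i in range(100_000))
target = emails[-1]  # worst case for the linear scan

scan_pos, scan_steps = scan_lookup(emails, target)
idx_pos, idx_steps = index_lookup(emails, target)
print(scan_steps, idx_steps)
```

Both functions find the same position, but the scan touches every entry while the binary search needs at most 17 probes for 100,000 entries.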

Index Types

Property Indexes

Index individual node or edge properties:

Basic Property Index:

// Create index on single property
CREATE INDEX user_email_idx ON User(email);

// Accelerates queries like:
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;

// Performance: Significantly faster than full scans (workload dependent)

Benefits:

Equality lookups (property = value)
Range queries (property > value, property BETWEEN x AND y)
Ordering (ORDER BY indexed_property)
Existence checks (WHERE property IS NOT NULL)

Unique Indexes

Enforce uniqueness while providing fast access:

Unique Constraint:

// Create unique index
CREATE UNIQUE INDEX user_id_idx ON User(user_id);

// Enforces constraint:
// - No two User nodes can have same user_id
// - NULL values allowed (treated as distinct)
// - Provides index performance benefits

// Accelerates lookups:
MATCH (u:User {user_id: '12345'})
RETURN u;
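
The constraint semantics listed above (duplicates rejected, NULL values treated as distinct) can be sketched with a plain dictionary. This is an illustrative Python model only; the class `UniqueIndex` and its methods are hypothetical and say nothing about how Geode stores unique indexes.

```python
class UniqueIndex:
    """Toy unique index: maps a property value to a single node id."""

    def __init__(self):
        self._entries = {}

    def insert(self, value, node_id):
        # NULL (None) values are not indexed, so any number of them may coexist.
        if value is None:
            return
        if value in self._entries:
            raise ValueError(f"duplicate value {value!r}")
        self._entries[value] = node_id

    def lookup(self, value):
        """Return the node id for value, or None if it is not indexed."""
        return self._entries.get(value)


index = UniqueIndex()
index.insert("12345", node_id=1)
index.insert(None, node_id=2)  # allowed
index.insert(None, node_id=3)  # also allowed: NULLs are distinct

try:
    index.insert("12345", node_id=4)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```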

Use Cases:

Primary keys (user IDs, email addresses)
External identifiers (SSN, account numbers)
Natural keys (ISBN, SKU)
Usernames and handles

Composite Indexes

Index multiple properties together:

Multi-Column Index:

// Create composite index
CREATE INDEX user_location_idx ON User(city, state, country);

// Optimizes queries using indexed prefix:
// ✓ Efficient: Uses index fully
MATCH (u:User)
WHERE u.city = 'Seattle' AND u.state = 'WA'
RETURN u;

// ✓ Efficient: Uses index on city
MATCH (u:User)
WHERE u.city = 'Seattle'
RETURN u;

// ✗ Inefficient: Can't use index (no prefix match)
MATCH (u:User)
WHERE u.state = 'WA'
RETURN u;

Index Prefix Rule:

Composite indexes match queries left-to-right:

// Index: (a, b, c)

// Can optimize:
// - WHERE a = x
// - WHERE a = x AND b = y
// - WHERE a = x AND b = y AND c = z

// Cannot optimize:
// - WHERE b = y
// - WHERE c = z
// - WHERE b = y AND c = z
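
The prefix rule can be expressed as a small function that walks the index columns left to right and stops at the first one the query does not filter on. This is a hedged Python sketch of the rule as stated above; `usable_prefix` is a hypothetical helper, not part of Geode, and it only considers equality filters.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that an equality filter can use."""
    usable = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first gap ends the usable prefix
        usable.append(column)
    return usable


index_columns = ("a", "b", "c")
print(usable_prefix(index_columns, {"a", "b"}))  # leading columns a and b
print(usable_prefix(index_columns, {"b", "c"}))  # empty: no leading column
```

A filter on `a` and `c` still uses the index, but only for the `a` portion, because `b` breaks the prefix.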

Covering Indexes

Include every column a query needs in the index:

Index-Only Scans:

// Create covering index
CREATE INDEX user_profile_idx ON User(email, name, created_at);

// Query satisfied entirely from index:
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u.name, u.created_at;

// No node access needed - all data in index
// Performance: typically faster than a regular index lookup because node fetches are skipped (workload dependent)

Design Strategy:

// Identify frequent query pattern
MATCH (u:User)
WHERE u.status = 'active' AND u.region = 'West'
RETURN u.name, u.email, u.created_at;

// Create covering index:
// - Filter columns first (status, region)
// - Return columns last (name, email, created_at)
CREATE INDEX user_active_region_idx
ON User(status, region, name, email, created_at);
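
Whether a query can be answered from an index alone comes down to a set check: every filtered and returned column must be present in the index. The following Python sketch assumes that simplified rule; `is_index_only` is a hypothetical helper and ignores the column-order requirements of composite indexes.

```python
def is_index_only(index_columns, filter_columns, return_columns):
    """True if the index holds every column the query filters on or returns."""
    needed = set(filter_columns) | set(return_columns)
    return needed <= set(index_columns)


covering = ("status", "region", "name", "email", "created_at")

# All filter and return columns are in the index: no node access needed.
print(is_index_only(covering, ["status", "region"], ["name", "email", "created_at"]))

# Returning a column outside the index forces a node fetch.
print(is_index_only(covering, ["status", "region"], ["name", "phone"]))
```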

Full-Text Indexes

Search text content efficiently:

Text Search Index:

// Create full-text index
CREATE FULLTEXT INDEX post_content_idx ON Post(title, body);

// Enables text search:
MATCH (p:Post)
WHERE p.title CONTAINS 'graph database'
   OR p.body CONTAINS 'graph database'
RETURN p
ORDER BY p.relevance DESC;

// Supports:
// - Keyword search
// - Phrase matching
// - Relevance ranking
// - Stemming and tokenization

Spatial Indexes

Optimize geographic queries:

Geo Index:

// Create spatial index for location data
CREATE SPATIAL INDEX location_idx ON Place(location);

// Accelerates proximity queries:
MATCH (p:Place)
WHERE distance(p.location, point({latitude: 47.6, longitude: -122.3})) < 10000
RETURN p.name, p.location
ORDER BY distance(p.location, point({latitude: 47.6, longitude: -122.3}));

// Supports:
// - Distance calculations
// - Bounding box queries
// - K-nearest neighbors

Index Design

Design Principles

Cardinality Considerations:

// High cardinality - excellent for indexes
// - email (millions of unique values)
// - user_id (all unique)
// - order_number (all unique)
CREATE UNIQUE INDEX user_email_idx ON User(email);

// Medium cardinality - good for indexes
// - city (thousands of values)
// - zip_code (tens of thousands)
// - product_category (hundreds)
CREATE INDEX user_city_idx ON User(city);

// Low cardinality - poor index candidates
// - gender (2-3 values)
// - boolean flags (2 values)
// - status (3-5 values)
// Skip indexing or use in composite index

Selectivity:

Indexes work best when they filter out most rows:

// Highly selective - great index
// Returns <0.1% of rows
MATCH (u:User)
WHERE u.email = 'specific@example.com'
RETURN u;

// Moderately selective - good index
// Returns ~5% of rows
MATCH (u:User)
WHERE u.city = 'Seattle'
RETURN u;

// Low selectivity - poor index
// Returns ~50% of rows
MATCH (u:User)
WHERE u.account_type = 'free'  -- 50% of users
RETURN u;
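
Selectivity in the sense used above is the fraction of rows a filter returns, which can be estimated from a sample of property values. This is a minimal Python sketch with made-up sample data; the helper name `selectivity` is hypothetical.

```python
from collections import Counter


def selectivity(values, predicate_value):
    """Fraction of rows whose value equals predicate_value (lower is better)."""
    if not values:
        return 0.0
    return Counter(values)[predicate_value] / len(values)


# Synthetic samples of 100 rows each.
cities = ["Seattle"] * 5 + ["Elsewhere"] * 95
account_types = ["free"] * 50 + ["premium"] * 50

city_selectivity = selectivity(cities, "Seattle")          # moderately selective
account_selectivity = selectivity(account_types, "free")   # poorly selective
```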

Index Selection Strategy

Step 1: Analyze Query Patterns:

-- Identify frequent queries
SELECT
  query_text,
  execution_count,
  avg_duration_ms,
  table_scans
FROM system.query_stats
WHERE table_scans > 0
ORDER BY execution_count DESC
LIMIT 50;

Step 2: Identify Filter Columns:

-- Extract WHERE clause columns
-- Query: MATCH (u:User) WHERE u.region = 'West' AND u.status = 'active'
-- Candidates: region, status

-- Query: MATCH (p:Product) WHERE p.category = 'Books' AND p.price > 20
-- Candidates: category, price

Step 3: Prioritize by Impact:

# Calculate index impact score
# selectivity = fraction of rows the filter returns (lower is better),
# so weight the time an index could save by (1 - selectivity)
impact = execution_count * avg_duration * (1 - selectivity)

# Example:
# Query 1: 10,000 executions/day, 500ms avg, 0.01 selectivity
# Impact: 10,000 * 500 * (1 - 0.01) = 4,950,000

# Query 2: 1,000 executions/day, 100ms avg, 0.1 selectivity
# Impact: 1,000 * 100 * (1 - 0.1) = 90,000

# Prioritize Query 1 index

Step 4: Create Strategic Indexes:

-- High-impact indexes first
CREATE INDEX user_region_status_idx ON User(region, status);

-- Measure improvement
EXPLAIN
MATCH (u:User)
WHERE u.region = 'West' AND u.status = 'active'
RETURN u;
-- Should show: Index seek instead of full scan

Index Management

Creating Indexes

Syntax:

-- Basic index
CREATE INDEX index_name ON NodeType(property);

-- Unique index
CREATE UNIQUE INDEX index_name ON NodeType(property);

-- Composite index
CREATE INDEX index_name ON NodeType(prop1, prop2, prop3);

-- Conditional index
CREATE INDEX index_name ON NodeType(property)
WHERE condition;

-- Include additional columns (covering index)
CREATE INDEX index_name ON NodeType(filter_prop)
INCLUDE (return_prop1, return_prop2);

Online Index Creation:

-- Create index without blocking writes
CREATE INDEX CONCURRENTLY user_email_idx ON User(email);

-- Progress tracking
SELECT
  index_name,
  build_progress,
  estimated_completion
FROM system.index_builds
WHERE status = 'in_progress';

Monitoring Indexes

Index Usage Statistics:

-- View index usage
SELECT
  index_name,
  table_name,
  index_scans,
  rows_read,
  last_scan_time,
  index_size_mb
FROM system.index_stats
ORDER BY index_scans DESC;

Unused Indexes:

-- Find indexes that are never used
SELECT
  index_name,
  table_name,
  index_size_mb,
  created_at
FROM system.index_stats
WHERE index_scans = 0
  AND created_at < NOW() - INTERVAL '30 days'
ORDER BY index_size_mb DESC;

Index Efficiency:

-- Identify inefficient indexes
SELECT
  index_name,
  table_name,
  index_scans,
  rows_read / NULLIF(index_scans, 0) AS avg_rows_per_scan,
  index_size_mb
FROM system.index_stats
WHERE index_scans > 0
  AND rows_read / NULLIF(index_scans, 0) > 1000  -- Low selectivity
ORDER BY avg_rows_per_scan DESC;
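
The `NULLIF(index_scans, 0)` guard in the query above avoids a division by zero for indexes that were never scanned. The same logic can be sketched in Python; the function names and the default threshold of 1000 rows mirror the query but are otherwise hypothetical.

```python
def avg_rows_per_scan(rows_read, index_scans):
    """Mirror rows_read / NULLIF(index_scans, 0): None when there were no scans."""
    if index_scans == 0:
        return None
    return rows_read / index_scans


def is_low_selectivity(rows_read, index_scans, threshold=1000):
    """Flag an index whose scans read more than `threshold` rows on average."""
    avg = avg_rows_per_scan(rows_read, index_scans)
    return avg is not None and avg > threshold
```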

Index Maintenance

Rebuild Fragmented Indexes:

-- Check index health
SELECT
  index_name,
  fragmentation_percent,
  page_count,
  last_rebuild_time
FROM system.index_health
WHERE fragmentation_percent > 30;

-- Rebuild index
REINDEX INDEX user_email_idx;

-- Rebuild all indexes on table
REINDEX TABLE User;

Update Statistics:

-- Refresh optimizer statistics
ANALYZE User;

-- Full database analysis
ANALYZE;

-- Check statistics freshness
SELECT
  table_name,
  last_analyze_time,
  row_count,
  mod_count
FROM system.table_stats
WHERE last_analyze_time < NOW() - INTERVAL '7 days';

Dropping Indexes

Remove Unused Indexes:

-- Drop single index
DROP INDEX user_old_idx;

-- Drop if exists (safe)
DROP INDEX IF EXISTS user_old_idx;

-- Drop multiple indexes
DROP INDEX user_idx1, user_idx2, user_idx3;

Impact Analysis Before Dropping:

-- Check what queries use an index
SELECT DISTINCT query_text
FROM system.query_plans
WHERE index_name = 'user_region_idx';

Performance Optimization

Index Selection by Query Optimizer

How Optimizer Chooses Indexes:

-- Query with multiple index candidates
MATCH (u:User)
WHERE u.region = 'West' AND u.email = 'user@example.com'
RETURN u;

-- Available indexes:
-- 1. user_region_idx on (region)
-- 2. user_email_idx on (email)

-- Optimizer chooses user_email_idx because:
-- - Higher selectivity (unique email vs common region)
-- - Faster lookup (1 row vs many rows)

Force Index Usage:

-- Force specific index with hint
MATCH (u:User)
USING INDEX user_region_idx
WHERE u.region = 'West'
RETURN u;

Index vs. Table Scan Decision

Cost-Based Optimization:

-- When index is used:
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;
-- Cost: Index seek (3 I/Os) + 1 row fetch = 4 I/Os

-- When table scan is used:
MATCH (u:User)
WHERE u.account_type = 'premium'
RETURN u;
-- Assuming 1,000,000 User nodes stored in 10,000 pages, 50% of rows match
-- Cost: Index seek (3 I/Os) + 500,000 row fetches = 500,003 I/Os
-- vs Table scan: 10,000 page reads = 10,000 I/Os
-- Optimizer chooses table scan (cheaper)
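
The cost comparison above can be reproduced with a toy cost model: an index access costs a fixed seek plus one fetch per matching row, while a scan reads every page once. This Python sketch uses the figures from the example (3 I/Os per seek, 10,000 pages); the function `choose_access_path` is hypothetical, and a real optimizer uses a far richer model.

```python
def choose_access_path(matching_rows, table_pages, seek_ios=3):
    """Compare index and scan cost in I/Os.

    Returns (choice, index_cost, scan_cost).
    """
    index_cost = seek_ios + matching_rows  # one row fetch per matching row
    scan_cost = table_pages                # every page is read once
    choice = "index" if index_cost < scan_cost else "scan"
    return choice, index_cost, scan_cost


# Unique email lookup: 1 matching row.
print(choose_access_path(matching_rows=1, table_pages=10_000))

# account_type filter: 500,000 matching rows.
print(choose_access_path(matching_rows=500_000, table_pages=10_000))
```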

Composite Index Optimization

Column Order Matters:

-- Query pattern: Filter by status, then region, then created_at
MATCH (u:User)
WHERE u.status = 'active'
  AND u.region = 'West'
  AND u.created_at > '2025-01-01'
RETURN u;

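-- Index order options:
-- 1. Equality columns before the range column, following the query's filter order
CREATE INDEX user_status_region_created_idx ON User(status, region, created_at);

-- 2. Same rule, but lead with the more selective equality column
--    (assuming region filters out more rows than status)
CREATE INDEX user_region_status_created_idx ON User(region, status, created_at);
-- Equality (region, status) then range (created_at)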
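
One way to apply the ordering guidance mechanically is to sort equality columns by selectivity and append range columns last. The Python sketch below assumes that heuristic; `order_index_columns` is a hypothetical helper and the selectivity figures are invented for illustration.

```python
def order_index_columns(predicates, selectivity):
    """Order columns for a composite index.

    predicates:  {column: "eq" or "range"}
    selectivity: {column: fraction of rows matched, lower is more selective}
    """
    equality = sorted(
        (c for c, kind in predicates.items() if kind == "eq"),
        key=lambda c: selectivity[c],
    )
    ranges = sorted(
        (c for c, kind in predicates.items() if kind == "range"),
        key=lambda c: selectivity[c],
    )
    return equality + ranges  # equality columns first, range columns last


predicates = {"status": "eq", "region": "eq", "created_at": "range"}
assumed_selectivity = {"status": 0.5, "region": 0.1, "created_at": 0.2}

column_order = order_index_columns(predicates, assumed_selectivity)
```

With these assumed figures the result is region, status, created_at: the range column stays last even though it is more selective than status.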

Best Practices

Index Design Guidelines

DO:

Index primary keys and foreign keys
Index columns frequently used in WHERE clauses
Index columns used in JOIN conditions
Index columns in ORDER BY clauses
Use composite indexes for multi-column filters
Include covering columns for hot queries
Monitor index usage and remove unused indexes

DON’T:

Over-index (each index slows writes)
Index low-cardinality columns alone
Create redundant indexes
Index columns rarely queried
Forget to update statistics
Ignore index fragmentation

Write Performance Considerations

Index Overhead:

-- Each write operation updates all indexes

-- Node with 5 indexes:
CREATE (u:User {
  user_id: '12345',
  email: 'user@example.com',
  name: 'Alice',
  city: 'Seattle',
  created_at: timestamp()
});

// Updates:
// 1. Primary table storage
// 2. user_id_idx
// 3. email_idx
// 4. name_idx
// 5. city_idx
// 6. created_at_idx
// Total: 6 write operations

Batch Operations:

-- Disable indexes during bulk load
ALTER INDEX user_email_idx DISABLE;

-- Load data
LOAD CSV FROM 'users.csv' AS row
CREATE (u:User {user_id: row.id, email: row.email, ...});

-- Rebuild indexes
ALTER INDEX user_email_idx REBUILD;

Index Monitoring Workflow

Weekly Review:

-- 1. Check unused indexes
SELECT index_name, index_size_mb
FROM system.index_stats
WHERE index_scans = 0 AND created_at < NOW() - INTERVAL '30 days';

-- 2. Identify missing indexes
SELECT query_text, execution_count, avg_duration_ms
FROM system.query_stats
WHERE table_scans > 0 AND execution_count > 100
ORDER BY execution_count * avg_duration_ms DESC
LIMIT 20;

-- 3. Review index health
SELECT index_name, fragmentation_percent
FROM system.index_health
WHERE fragmentation_percent > 30;

-- 4. Update statistics
ANALYZE;

Advanced Techniques

Partial Indexes

Index a subset of rows:

-- Index only active users
CREATE INDEX active_users_idx ON User(email)
WHERE status = 'active';

-- Much smaller than full index
-- Faster for queries on active users only
MATCH (u:User)
WHERE u.status = 'active' AND u.email = 'user@example.com'
RETURN u;
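
The size advantage of a partial index comes from skipping rows that fail the index predicate. A minimal Python model, with a hypothetical `build_partial_index` helper and made-up nodes, might look like this:

```python
def build_partial_index(nodes, key, predicate):
    """Index only the nodes that satisfy predicate, keyed by node[key]."""
    return {node[key]: node for node in nodes if predicate(node)}


users = [
    {"email": "a@example.com", "status": "active"},
    {"email": "b@example.com", "status": "inactive"},
    {"email": "c@example.com", "status": "active"},
    {"email": "d@example.com", "status": "deleted"},
]

active_users_idx = build_partial_index(
    users, "email", lambda node: node["status"] == "active"
)
```

Only two of the four nodes are indexed, so a lookup for an inactive user's email misses this index and has to be served another way.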

Expression Indexes

Index computed values:

-- Index lowercase email for case-insensitive search
CREATE INDEX user_email_lower_idx ON User(LOWER(email));

-- Enables fast case-insensitive lookup
MATCH (u:User)
WHERE LOWER(u.email) = LOWER('User@Example.com')
RETURN u;
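
An expression index stores the computed value as the key, so the same expression must be applied to the search term at lookup time. The following Python sketch models that idea with a dictionary; both function names are hypothetical and the data is synthetic.

```python
def build_lower_email_index(nodes):
    """Key each node by the lowercased email, the indexed expression."""
    index = {}
    for node in nodes:
        index.setdefault(node["email"].lower(), []).append(node)
    return index


def lookup_case_insensitive(index, email):
    """Apply the same expression to the search term before probing."""
    return index.get(email.lower(), [])


users = [
    {"name": "Alice", "email": "User@Example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
lower_idx = build_lower_email_index(users)
matches = lookup_case_insensitive(lower_idx, "USER@example.COM")
```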

Index-Organized Tables

Store table data in index order:

-- Create node type clustered by timestamp
CREATE NODE TYPE Event (
  event_id UUID PRIMARY KEY,
  timestamp TIMESTAMP,
  event_type STRING,
  data JSON
) CLUSTERED BY (timestamp);

-- Range queries on timestamp are extremely fast
MATCH (e:Event)
WHERE e.timestamp > '2025-01-01' AND e.timestamp < '2025-01-31'
RETURN e
ORDER BY e.timestamp;
-- No sort needed, data already in order

Troubleshooting

Index Not Being Used

Diagnosis:

-- Check execution plan
EXPLAIN
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;

-- If the plan shows a table scan instead of an index seek, check the common causes below.

Common Causes:

Function on indexed column:

-- ✗ Plain index on email not used (unless an expression index on LOWER(email) exists)
MATCH (u:User)
WHERE LOWER(u.email) = 'user@example.com'
RETURN u;

-- ✓ Index used
MATCH (u:User)
WHERE u.email = 'user@example.com'
RETURN u;

Type mismatch:

-- ✗ Index not used (string vs integer)
MATCH (u:User)
WHERE u.user_id = 12345  -- user_id is string
RETURN u;

-- ✓ Index used
MATCH (u:User)
WHERE u.user_id = '12345'
RETURN u;
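
The type mismatch case has a direct analogue in any keyed lookup: a key stored as a string is not found when probed with an integer. This short Python sketch illustrates the effect with a dictionary standing in for the index; it makes no claim about how Geode compares or coerces types.

```python
# The index stores user_id values as strings.
user_id_idx = {"12345": {"name": "Alice"}}

by_integer = user_id_idx.get(12345)    # wrong type: no entry found
by_string = user_id_idx.get("12345")   # matching type: entry found
```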

Stale statistics:

-- Update optimizer statistics
ANALYZE User;

Slow Index Scans

Issue: An index exists but queries are still slow.

Solutions:

-- 1. Check index selectivity
SELECT
  index_name,
  rows_read / NULLIF(index_scans, 0) AS avg_rows_per_scan
FROM system.index_stats
WHERE index_name = 'problematic_idx';
-- High avg_rows_per_scan indicates low selectivity

-- 2. Consider composite index for better selectivity
CREATE INDEX better_idx ON User(region, status, created_at);

-- 3. Use covering index to avoid row fetches
CREATE INDEX covering_idx ON User(region)
INCLUDE (name, email, created_at);

Integration with Geode

Indexes integrate seamlessly with Geode features:

Query Optimizer: Automatic index selection for optimal plans
EXPLAIN Command: Shows index usage in execution plans
Transactions: Indexes updated atomically within transactions
Concurrency: Online index creation without blocking writes
Monitoring: Built-in index usage and health metrics
Security: RLS policies work efficiently with indexes

Related Topics

Query Optimization: Using indexes for better performance
Query Performance: Index impact on throughput and latency
EXPLAIN Command: Analyzing index usage in query plans
Schema Design: Designing schemas with indexing in mind
Best Practices: Index design and management best practices
Monitoring: Tracking index usage and performance

Browse the tagged content below to discover comprehensive guides, tutorials, and best practices for indexes in Geode. Learn how to design, create, and manage indexes that dramatically improve query performance and scale your graph database applications to production workloads.


Related Articles

Constraints and Indexes Guide

Query Performance Tuning Guide

Index Types Reference
