The Graph Modeling category covers techniques, patterns, and best practices for designing effective graph database schemas. Learn how to translate your domain into nodes and relationships, optimize for query patterns, and build scalable, maintainable graph data models.
Introduction to Graph Modeling
Graph modeling is the practice of designing how your application’s data will be represented as nodes, relationships, and properties in a graph database. Unlike relational database design with its normalization rules and table schemas, graph modeling focuses on identifying entities and their connections, emphasizing how data relates rather than how it’s stored.
Effective graph modeling makes queries natural and performant, aligns the database structure with your domain model, and enables the database to grow gracefully as requirements evolve. The flexibility of graph databases allows iteration and refinement, but good initial modeling saves significant effort later.
Fundamental Modeling Principles
Nodes Represent Entities
Nodes are the primary entities in your domain:
-- People in a social network
CREATE (alice:Person {
id: 'person_001',
name: 'Alice Anderson',
email: 'alice@example.com',
birthdate: date('1990-05-15'),
city: 'New York'
});
-- Products in an e-commerce system
CREATE (laptop:Product {
id: 'prod_123',
name: '15" Pro Laptop',
sku: 'LAPTOP-PRO-15',
price: 1299.99,
category: 'Electronics'
});
-- Locations in a logistics system
CREATE (warehouse:Location {
id: 'loc_456',
name: 'Central Warehouse',
address: '123 Industrial Blvd',
lat: 40.7128,
lon: -74.0060
});
Guideline: Create a node for anything you might want to query independently or that has properties distinct from its relationships.
Relationships Express Connections
Relationships show how entities connect:
-- Social connection
MATCH (alice:Person {id: 'person_001'}), (bob:Person {id: 'person_002'})
CREATE (alice)-[:FRIENDS_WITH {since: date('2020-01-15'), strength: 0.85}]->(bob);
-- Purchase transaction
MATCH (customer:Person {id: 'person_001'}), (product:Product {id: 'prod_123'})
CREATE (customer)-[:PURCHASED {
date: datetime('2024-03-15T10:30:00'),
price: 1299.99,
quantity: 1,
order_id: 'order_789'
}]->(product);
-- Geographic relationship
MATCH (product:Product {id: 'prod_123'}), (warehouse:Location {id: 'loc_456'})
CREATE (product)-[:STORED_AT {quantity: 50, last_restocked: date('2024-03-01')}]->(warehouse);
Guideline: Use relationships to express verbs and meaningful connections. Name relationships as actions (PURCHASED, KNOWS, MANAGES) or states (BELONGS_TO, LOCATED_IN).
Properties Store Attributes
Properties describe nodes and relationships:
-- Node properties
CREATE (alice:Person {
name: 'Alice', -- Identity
email: 'alice@ex.com', -- Contact
age: 30, -- Demographic
role: 'Engineer', -- Classification
salary: 95000, -- Quantitative
active: true, -- Boolean state
skills: ['Python', 'Go', 'SQL'], -- Lists
preferences: {theme: 'dark', language: 'en'} -- Maps
});
-- Relationship properties (alice and project are assumed to be bound by a preceding MATCH)
CREATE (alice)-[:WORKED_ON {
start_date: date('2023-01-01'),
end_date: date('2023-12-31'),
role: 'Lead Developer',
hours: 1600,
satisfaction: 4.5
}]->(project);
Guideline: Store properties directly on the entity they describe. Avoid deeply nested structures; flatten when possible.
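As a sketch of the flattening advice, the nested preferences map from the node example could instead be stored as two scalar properties. The names pref_theme and pref_language are illustrative assumptions, not an established convention.

```gql
-- Nested form, typically harder to filter on:
--   preferences: {theme: 'dark', language: 'en'}
-- Flattened form (pref_theme and pref_language are hypothetical names):
CREATE (alice:Person {
  name: 'Alice',
  pref_theme: 'dark',
  pref_language: 'en'
});
-- Each preference can now be matched directly
MATCH (p:Person {pref_theme: 'dark'})
RETURN p.name;
```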
Modeling Patterns
Pattern 1: One-to-Many Relationships
Example: Blog posts and comments
-- Create author and posts
CREATE (author:User {name: 'Alice', email: 'alice@blog.com'});
CREATE (post1:Post {
id: 'post_001',
title: 'Graph Databases Explained',
content: '...',
published: datetime('2024-03-01T09:00:00'),
views: 1250
});
CREATE (post2:Post {
id: 'post_002',
title: 'Modeling Best Practices',
content: '...',
published: datetime('2024-03-15T14:30:00'),
views: 890
});
-- Connect author to posts
MATCH (a:User {name: 'Alice'}), (p1:Post {id: 'post_001'})
CREATE (a)-[:AUTHORED]->(p1);
MATCH (a:User {name: 'Alice'}), (p2:Post {id: 'post_002'})
CREATE (a)-[:AUTHORED]->(p2);
-- Query all posts by author
MATCH (author:User {name: 'Alice'})-[:AUTHORED]->(post:Post)
RETURN post.title, post.published, post.views
ORDER BY post.published DESC;
Pattern 2: Many-to-Many Relationships
Example: Students and courses
-- Create students
CREATE (alice:Student {id: 'student_001', name: 'Alice'});
CREATE (bob:Student {id: 'student_002', name: 'Bob'});
-- Create courses
CREATE (db:Course {id: 'course_101', name: 'Databases', credits: 3});
CREATE (algo:Course {id: 'course_102', name: 'Algorithms', credits: 4});
-- Enrollment relationships with properties
MATCH (alice:Student {id: 'student_001'}), (db:Course {id: 'course_101'})
CREATE (alice)-[:ENROLLED_IN {
semester: 'Fall 2024',
grade: 'A',
enrollment_date: date('2024-08-20')
}]->(db);
MATCH (alice:Student {id: 'student_001'}), (algo:Course {id: 'course_102'})
CREATE (alice)-[:ENROLLED_IN {
semester: 'Fall 2024',
grade: 'B+',
enrollment_date: date('2024-08-20')
}]->(algo);
MATCH (bob:Student {id: 'student_002'}), (algo:Course {id: 'course_102'})
CREATE (bob)-[:ENROLLED_IN {
semester: 'Fall 2024',
grade: 'A-',
enrollment_date: date('2024-08-21')
}]->(algo);
-- Query students in a course
MATCH (student:Student)-[enrollment:ENROLLED_IN]->(course:Course {id: 'course_102'})
RETURN student.name, enrollment.grade
ORDER BY student.name; -- Letter grades are strings and sort lexicographically (e.g. 'A' before 'A+'), so order by name instead
Pattern 3: Hierarchical Structures
Example: Organizational chart
-- Create employees
CREATE (ceo:Employee {name: 'Alice', title: 'CEO'});
CREATE (cto:Employee {name: 'Bob', title: 'CTO'});
CREATE (vp_eng:Employee {name: 'Carol', title: 'VP Engineering'});
CREATE (eng1:Employee {name: 'Dave', title: 'Senior Engineer'});
CREATE (eng2:Employee {name: 'Eve', title: 'Engineer'});
-- Create reporting structure
MATCH (cto:Employee {name: 'Bob'}), (ceo:Employee {name: 'Alice'})
CREATE (cto)-[:REPORTS_TO]->(ceo);
MATCH (vp:Employee {name: 'Carol'}), (cto:Employee {name: 'Bob'})
CREATE (vp)-[:REPORTS_TO]->(cto);
MATCH (eng1:Employee {name: 'Dave'}), (vp:Employee {name: 'Carol'})
CREATE (eng1)-[:REPORTS_TO]->(vp);
MATCH (eng2:Employee {name: 'Eve'}), (eng1:Employee {name: 'Dave'})
CREATE (eng2)-[:REPORTS_TO]->(eng1);
-- Query entire management chain, nearest manager first
MATCH path = (employee:Employee {name: 'Eve'})-[:REPORTS_TO*]->(manager:Employee)
RETURN manager.name, manager.title
ORDER BY length(path);
-- Find all direct and indirect reports
MATCH (manager:Employee {name: 'Bob'})<-[:REPORTS_TO*]-(report:Employee)
RETURN count(report) as total_reports;
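Unbounded traversals can become expensive in deep hierarchies. A minimal sketch of a depth-limited variant follows, assuming the engine accepts Cypher-style `*min..max` bounds alongside the unbounded `*` used above.

```gql
-- Managers at most two levels above Eve
MATCH (employee:Employee {name: 'Eve'})-[:REPORTS_TO*1..2]->(manager:Employee)
RETURN manager.name, manager.title;
```

With the sample reporting structure, this would match Dave and Carol but stop before Bob and Alice.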
Pattern 4: Time-Based Modeling
Example: Historical relationships
-- Model job history with time-bound relationships
CREATE (alice:Person {name: 'Alice'});
CREATE (acme:Company {name: 'Acme Corp'});
CREATE (techco:Company {name: 'TechCo Inc'});
-- Past employment
MATCH (alice:Person {name: 'Alice'}), (acme:Company {name: 'Acme Corp'})
CREATE (alice)-[:WORKED_AT {
start_date: date('2018-01-15'),
end_date: date('2021-06-30'),
title: 'Software Engineer',
current: false
}]->(acme);
-- Current employment
MATCH (alice:Person {name: 'Alice'}), (techco:Company {name: 'TechCo Inc'})
CREATE (alice)-[:WORKED_AT {
start_date: date('2021-07-01'),
end_date: null,
title: 'Senior Engineer',
current: true
}]->(techco);
-- Query current employer
MATCH (person:Person {name: 'Alice'})-[job:WORKED_AT]->(company:Company)
WHERE job.current = true
RETURN company.name, job.title;
-- Query complete job history
MATCH (person:Person {name: 'Alice'})-[job:WORKED_AT]->(company:Company)
RETURN company.name, job.title, job.start_date, job.end_date
ORDER BY job.start_date DESC;
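Time-bound relationships also support point-in-time questions. The sketch below assumes that date values compare with `<=` and `>=` and that an open-ended job has no end_date, which is tested with `IS NULL`.

```gql
-- Where did Alice work on 2020-01-01?
MATCH (person:Person {name: 'Alice'})-[job:WORKED_AT]->(company:Company)
WHERE job.start_date <= date('2020-01-01')
  AND (job.end_date IS NULL OR job.end_date >= date('2020-01-01'))
RETURN company.name, job.title;
```

Against the sample data, only the Acme Corp relationship satisfies both conditions.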
Pattern 5: Intermediate Nodes for Rich Relationships
Example: Complex order system
-- Create entities
CREATE (customer:Customer {id: 'cust_001', name: 'Alice'});
CREATE (product1:Product {id: 'prod_101', name: 'Laptop', price: 1299.99});
CREATE (product2:Product {id: 'prod_102', name: 'Mouse', price: 29.99});
-- Create order as intermediate node
CREATE (order:Order {
id: 'order_001',
date: datetime('2024-03-15T10:30:00'),
status: 'shipped',
total: 1329.98
});
-- Connect customer to order
MATCH (c:Customer {id: 'cust_001'}), (o:Order {id: 'order_001'})
CREATE (c)-[:PLACED]->(o);
-- Connect order to products with line item details
MATCH (o:Order {id: 'order_001'}), (p1:Product {id: 'prod_101'})
CREATE (o)-[:CONTAINS {quantity: 1, unit_price: 1299.99, subtotal: 1299.99}]->(p1);
MATCH (o:Order {id: 'order_001'}), (p2:Product {id: 'prod_102'})
CREATE (o)-[:CONTAINS {quantity: 1, unit_price: 29.99, subtotal: 29.99}]->(p2);
-- Query customer's order details
MATCH (customer:Customer {id: 'cust_001'})-[:PLACED]->(order:Order)-[item:CONTAINS]->(product:Product)
RETURN order.id, order.date, product.name, item.quantity, item.subtotal;
Pattern 6: Metadata Nodes
Example: Tagging system
-- Create content
CREATE (post1:Post {id: 'post_001', title: 'Graph Modeling', content: '...'});
CREATE (post2:Post {id: 'post_002', title: 'GQL Syntax', content: '...'});
-- Create tags as nodes (not properties)
CREATE (db_tag:Tag {name: 'databases'});
CREATE (graph_tag:Tag {name: 'graphs'});
CREATE (gql_tag:Tag {name: 'gql'});
-- Tag posts
MATCH (post1:Post {id: 'post_001'}), (graph:Tag {name: 'graphs'})
CREATE (post1)-[:TAGGED_WITH]->(graph);
MATCH (post1:Post {id: 'post_001'}), (db:Tag {name: 'databases'})
CREATE (post1)-[:TAGGED_WITH]->(db);
MATCH (post2:Post {id: 'post_002'}), (gql:Tag {name: 'gql'})
CREATE (post2)-[:TAGGED_WITH]->(gql);
-- Find all posts with a tag
MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag {name: 'graphs'})
RETURN post.title;
-- Find related posts (shared tags)
MATCH (post:Post {id: 'post_001'})-[:TAGGED_WITH]->(tag:Tag)<-[:TAGGED_WITH]-(related:Post)
WHERE post.id <> related.id
RETURN related.title, count(tag) as shared_tags
ORDER BY shared_tags DESC;
Modeling Anti-Patterns to Avoid
Anti-Pattern 1: Properties as Nodes
Bad:
-- Don't create nodes for simple properties
CREATE (alice:Person {id: 'person_001', name: 'Alice'});
CREATE (email:Email {value: 'alice@example.com'});
CREATE (age:Age {value: 30});
CREATE (alice)-[:HAS_EMAIL]->(email);
CREATE (alice)-[:HAS_AGE]->(age);
Good:
-- Store properties directly on the node
CREATE (alice:Person {
id: 'person_001',
name: 'Alice',
email: 'alice@example.com',
age: 30
});
Exception: Create nodes for properties when:
- Multiple entities share the same property value (e.g., addresses, phone numbers)
- You need to query or aggregate by that property independently
- The property has its own properties or relationships
-- Good: Shared address as node
CREATE (home:Address {
street: '123 Main St',
city: 'New York',
zip: '10001',
country: 'USA'
});
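MATCH (home:Address {street: '123 Main St', zip: '10001'})
CREATE (alice:Person {name: 'Alice'})-[:LIVES_AT]->(home);
MATCH (home:Address {street: '123 Main St', zip: '10001'})
CREATE (bob:Person {name: 'Bob'})-[:LIVES_AT]->(home);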
-- Now can query: Who lives at this address?
MATCH (person:Person)-[:LIVES_AT]->(addr:Address {street: '123 Main St'})
RETURN person.name;
Anti-Pattern 2: Dense Nodes
Bad:
-- Avoid single nodes with millions of relationships
CREATE (popular:User {name: 'Celebrity', followers: 10000000});
-- 10 million relationships
-- ... millions of (follower)-[:FOLLOWS]->(popular) ...
-- Queries become slow
MATCH (user:User {name: 'Celebrity'})<-[:FOLLOWS]-(follower)
RETURN count(follower); -- Very slow!
Good:
-- Use property for high-cardinality counts
CREATE (popular:User {name: 'Celebrity', follower_count: 10000000});
-- Or use intermediate aggregation nodes
CREATE (popular:User {name: 'Celebrity'})-[:HAS_FOLLOWER_GROUP]->(followers_2024_03:FollowerGroup {
month: '2024-03',
count: 50000
});
-- Individual followers connect to groups ('Fan' is a placeholder follower)
MATCH (follower:User {name: 'Fan'}),
(popular:User {name: 'Celebrity'})-[:HAS_FOLLOWER_GROUP]->(grp:FollowerGroup {month: '2024-03'})
CREATE (follower)-[:MEMBER_OF]->(grp);
Anti-Pattern 3: Relationship Types as Properties
Bad:
-- Don't use generic relationship type with type property
CREATE (alice)-[:RELATED_TO {type: 'friend'}]->(bob);
CREATE (alice)-[:RELATED_TO {type: 'coworker'}]->(carol);
CREATE (alice)-[:RELATED_TO {type: 'family'}]->(dave);
-- Can't efficiently query by relationship type
MATCH (alice:Person {name: 'Alice'})-[r:RELATED_TO]->(other)
WHERE r.type = 'friend' -- Must scan all relationships
RETURN other.name;
Good:
-- Use specific relationship types
CREATE (alice)-[:FRIEND]->(bob);
CREATE (alice)-[:COWORKER]->(carol);
CREATE (alice)-[:FAMILY]->(dave);
-- Efficient query by relationship type
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name;
Anti-Pattern 4: Disconnected Nodes
Bad:
-- Avoid nodes with no relationships (unless temporary)
CREATE (orphan:Product {id: 'prod_999', name: 'Orphan Product'});
-- No relationships to customers, orders, categories, etc.
-- Hard to discover in queries
MATCH (p:Product) WHERE NOT (p)--() RETURN p; -- Anti-pattern query
Good:
-- Always connect nodes meaningfully
CREATE (product:Product {id: 'prod_999', name: 'New Product'});
CREATE (category:Category {name: 'Electronics'});
CREATE (vendor:Vendor {name: 'Acme Corp'});
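MATCH (product:Product {id: 'prod_999'}), (category:Category {name: 'Electronics'})
CREATE (product)-[:IN_CATEGORY]->(category);
MATCH (vendor:Vendor {name: 'Acme Corp'}), (product:Product {id: 'prod_999'})
CREATE (vendor)-[:SUPPLIES]->(product);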
Query-Driven Modeling
Model for Your Queries
Design your graph to make common queries efficient:
Example: Social network “news feed” query
Requirement: Show posts from friends, ordered by time, with like counts.
Model:
-- Users
CREATE (alice:User {id: 'user_001', name: 'Alice'});
CREATE (bob:User {id: 'user_002', name: 'Bob'});
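-- Friendship
MATCH (alice:User {id: 'user_001'}), (bob:User {id: 'user_002'})
CREATE (alice)-[:FRIENDS_WITH]->(bob);
-- Posts
CREATE (post:Post {
id: 'post_001',
content: 'Loving graph databases!',
timestamp: datetime('2024-03-15T10:30:00'),
like_count: 0 -- Denormalize for performance
});
MATCH (bob:User {id: 'user_002'}), (post:Post {id: 'post_001'})
CREATE (bob)-[:POSTED]->(post);
-- Likes (still track individually for other queries)
MATCH (alice:User {id: 'user_001'}), (post:Post {id: 'post_001'})
CREATE (alice)-[:LIKED {timestamp: datetime('2024-03-15T10:35:00')}]->(post);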
-- Increment like count
MATCH (p:Post {id: 'post_001'})
SET p.like_count = p.like_count + 1;
-- Efficient news feed query
MATCH (me:User {id: 'user_001'})-[:FRIENDS_WITH]->(friend)-[:POSTED]->(post:Post)
RETURN friend.name, post.content, post.timestamp, post.like_count
ORDER BY post.timestamp DESC
LIMIT 20;
Optimize for Write vs. Read
Write-optimized (normalized):
-- Separate like tracking
CREATE (user)-[:LIKED]->(post);
-- Query count dynamically
MATCH (post:Post {id: 'post_001'})<-[:LIKED]-(user)
RETURN count(user) as like_count;
Read-optimized (denormalized):
-- Store count as property
CREATE (post:Post {id: 'post_001', like_count: 0});
-- Update count on each like
MATCH (post:Post {id: 'post_001'})
SET post.like_count = post.like_count + 1;
-- Fast read
MATCH (post:Post {id: 'post_001'})
RETURN post.like_count;
Hybrid approach:
-- Both individual relationships AND denormalized count
CREATE (user)-[:LIKED]->(post);
MATCH (post:Post {id: 'post_001'})
SET post.like_count = post.like_count + 1;
-- Fast reads from property, audit trail from relationships
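The hybrid approach only pays off if the counter and the relationships stay in agreement. One way to sketch this is to record the like and bump the counter in a single statement, then run a periodic drift check. Whether the two clauses commit atomically depends on the engine's transaction semantics, which is an assumption here.

```gql
-- Record the like and increment the counter in one statement
MATCH (user:User {id: 'user_001'}), (post:Post {id: 'post_001'})
CREATE (user)-[:LIKED]->(post)
SET post.like_count = post.like_count + 1;

-- Drift check: compare the stored counter with the actual relationship count
MATCH (post:Post)<-[:LIKED]-(user:User)
RETURN post.id, post.like_count, count(user) AS actual_likes;
```

The drift check only lists posts that have at least one like. Posts whose counter is non-zero but have no LIKED relationships would need a separate query.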
Schema Evolution
Adding New Node Types
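-- Phase 1: Basic social network
CREATE (alice:User {name: 'Alice'});
CREATE (bob:User {name: 'Bob'});
MATCH (alice:User {name: 'Alice'}), (bob:User {name: 'Bob'})
CREATE (alice)-[:FRIENDS_WITH]->(bob);
-- Phase 2: Add posts (no migration needed!)
-- The POSTED relationship identifies the author, so no author_id property is required
MATCH (alice:User {name: 'Alice'})
CREATE (alice)-[:POSTED]->(post:Post {content: 'Hello'});
-- Phase 3: Add groups
CREATE (group:Group {name: 'Graph DB Enthusiasts'});
MATCH (alice:User {name: 'Alice'}), (group:Group {name: 'Graph DB Enthusiasts'})
CREATE (alice)-[:MEMBER_OF]->(group);
MATCH (bob:User {name: 'Bob'}), (group:Group {name: 'Graph DB Enthusiasts'})
CREATE (bob)-[:MEMBER_OF]->(group);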
-- Existing data unaffected
Adding Properties
-- Add properties to existing nodes
MATCH (alice:User {name: 'Alice'})
SET alice.email = 'alice@example.com',
alice.joined_date = date('2024-01-15');
-- Add properties to new nodes only
CREATE (carol:User {
name: 'Carol',
email: 'carol@example.com',
joined_date: date('2024-03-20'),
verification_status: 'verified' -- New property
});
-- Query handles optional properties gracefully
MATCH (user:User)
RETURN user.name, user.email, user.verification_status;
-- null for users without verification_status
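If downstream code should not have to handle the missing value, older nodes can be backfilled. This is a minimal sketch. The default 'unverified' is a hypothetical value chosen for illustration.

```gql
-- Backfill a default for users created before the property existed
MATCH (user:User)
WHERE user.verification_status IS NULL
SET user.verification_status = 'unverified';
```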
Refactoring Relationships
-- Old model: Generic "RELATED_TO" with type property
MATCH (a:Person)-[r:RELATED_TO]->(b:Person)
WHERE r.type = 'friend'
RETURN a.name, b.name;
-- Refactor to specific types
MATCH (a:Person)-[old:RELATED_TO]->(b:Person)
WHERE old.type = 'friend'
CREATE (a)-[:FRIEND {since: old.since}]->(b)
DELETE old;
-- Now use specific relationship types
MATCH (a:Person)-[:FRIEND]->(b:Person)
RETURN b.name;
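After a refactor like this, it is worth confirming that nothing was left behind. The following check reuses the same pattern as the migration and should report zero once every 'friend' relationship has been converted.

```gql
-- Count old-style relationships that still need migrating
MATCH (a:Person)-[old:RELATED_TO]->(b:Person)
WHERE old.type = 'friend'
RETURN count(old) AS remaining;
```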
Performance Considerations
Indexing Strategy
-- Create indexes on frequently queried properties
CREATE INDEX user_email ON :User(email);
CREATE INDEX product_sku ON :Product(sku);
CREATE INDEX post_timestamp ON :Post(timestamp);
-- Unique constraints (also create indexes)
CREATE CONSTRAINT unique_user_id ON :User(id);
CREATE CONSTRAINT unique_email ON :User(email);
-- Composite indexes for multi-property queries
CREATE INDEX user_city_age ON :User(city, age);
-- Verify index usage with EXPLAIN
EXPLAIN MATCH (u:User {email: 'alice@example.com'}) RETURN u;
Relationship Direction
-- Model relationships in the direction of primary traversal
-- Bad: Querying against relationship direction
CREATE (product)-[:PURCHASED_BY]->(customer);
MATCH (customer:Customer {id: 'cust_001'})<-[:PURCHASED_BY]-(product) -- Inefficient
RETURN product;
-- Good: Query with relationship direction
CREATE (customer)-[:PURCHASED]->(product);
MATCH (customer:Customer {id: 'cust_001'})-[:PURCHASED]->(product)
RETURN product;
Limiting Relationship Counts
-- Avoid nodes with unbounded relationship growth
-- Use time-based partitioning or aggregation
-- Time-partitioned events
CREATE (user)-[:LOGGED_IN_2024_03]->(session);
CREATE (user)-[:LOGGED_IN_2024_04]->(session);
-- Periodic aggregation
CREATE (user)-[:MONTHLY_ACTIVITY {
month: '2024-03',
login_count: 42,
avg_session_duration: 1800
}]->(stats:ActivityStats);
Related Topics
- Query Language: Optimizing GQL queries for your data model
- Performance: Indexing strategies and query optimization
- Best Practices: Production-ready modeling techniques
- Examples: Real-world modeling scenarios
- Architecture: Understanding Geode’s storage and indexing
Further Reading
- Property Graph Model: Theoretical foundations
- Domain-Driven Design: Aligning models with business domains
- Normalization vs Denormalization: Trade-offs in graph modeling
- Graph Algorithms: Modeling for algorithmic analysis
- Migration Strategies: Evolving schemas in production
Effective graph modeling turns a complex domain into a graph structure that is intuitive to query and performs well. By applying the core patterns, avoiding the anti-patterns, and designing around your query patterns, you can build graph models that scale with your application and keep queries fast and expressive.