The Graph Modeling category covers techniques, patterns, and best practices for designing effective graph database schemas. Learn how to translate your domain into nodes and relationships, optimize for query patterns, and build scalable, maintainable graph data models.

Introduction to Graph Modeling

Graph modeling is the practice of designing how your application’s data will be represented as nodes, relationships, and properties in a graph database. Unlike relational database design with its normalization rules and table schemas, graph modeling focuses on identifying entities and their connections, emphasizing how data relates rather than how it’s stored.

Effective graph modeling makes queries natural and performant, aligns the database structure with your domain model, and enables the database to grow gracefully as requirements evolve. The flexibility of graph databases allows iteration and refinement, but good initial modeling saves significant effort later.

Fundamental Modeling Principles

Nodes Represent Entities

Nodes are the primary entities in your domain:

-- People in a social network
CREATE (alice:Person {
    id: 'person_001',
    name: 'Alice Anderson',
    email: 'alice@example.com',
    birthdate: date('1990-05-15'),
    city: 'New York'
});

-- Products in an e-commerce system
CREATE (laptop:Product {
    id: 'prod_123',
    name: '15" Pro Laptop',
    sku: 'LAPTOP-PRO-15',
    price: 1299.99,
    category: 'Electronics'
});

-- Locations in a logistics system
CREATE (warehouse:Location {
    id: 'loc_456',
    name: 'Central Warehouse',
    address: '123 Industrial Blvd',
    lat: 40.7128,
    lon: -74.0060
});

Guideline: Create a node for anything you might want to query independently or that has properties distinct from its relationships.

Relationships Express Connections

Relationships show how entities connect:

-- Social connection
MATCH (alice:Person {id: 'person_001'}), (bob:Person {id: 'person_002'})
CREATE (alice)-[:FRIENDS_WITH {since: date('2020-01-15'), strength: 0.85}]->(bob);

-- Purchase transaction
MATCH (customer:Person {id: 'person_001'}), (product:Product {id: 'prod_123'})
CREATE (customer)-[:PURCHASED {
    date: datetime('2024-03-15T10:30:00'),
    price: 1299.99,
    quantity: 1,
    order_id: 'order_789'
}]->(product);

-- Geographic relationship
MATCH (product:Product {id: 'prod_123'}), (warehouse:Location {id: 'loc_456'})
CREATE (product)-[:STORED_AT {quantity: 50, last_restocked: date('2024-03-01')}]->(warehouse);

Guideline: Use relationships to express verbs and meaningful connections. Name relationships as actions (PURCHASED, KNOWS, MANAGES) or states (BELONGS_TO, LOCATED_IN).

Properties Store Attributes

Properties describe nodes and relationships:

-- Node properties
CREATE (alice:Person {
    name: 'Alice',           -- Identity
    email: 'alice@ex.com',   -- Contact
    age: 30,                 -- Demographic
    role: 'Engineer',        -- Classification
    salary: 95000,           -- Quantitative
    active: true,            -- Boolean state
    skills: ['Python', 'Go', 'SQL'],  -- Lists
    preferences: {theme: 'dark', language: 'en'}  -- Maps
});

-- Relationship properties
-- (assumes alice and project are already bound, e.g. by a preceding MATCH)
CREATE (alice)-[:WORKED_ON {
    start_date: date('2023-01-01'),
    end_date: date('2023-12-31'),
    role: 'Lead Developer',
    hours: 1600,
    satisfaction: 4.5
}]->(project);

Guideline: Store properties directly on the entity they describe. Avoid deeply nested structures; flatten when possible.
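
The flattening step can also happen in application code before properties are written. The following Python sketch shows one way to do it; the helper name flatten_properties and the underscore separator are illustrative assumptions, not part of any Geode API.

```python
def flatten_properties(props, prefix="", sep="_"):
    """Flatten nested dicts into a single-level dict of property names.

    Lists and scalar values are kept as-is; only nested maps are expanded.
    """
    flat = {}
    for key, value in props.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent key as a prefix.
            flat.update(flatten_properties(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical input mirroring the Person example above.
person = {
    "name": "Alice",
    "skills": ["Python", "Go", "SQL"],
    "preferences": {"theme": "dark", "language": "en"},
}
flat_person = flatten_properties(person)
# flat_person now has the keys: name, skills,
# preferences_theme, preferences_language
```

Flattened names such as preferences_theme can then be indexed and filtered like any other property, which is harder to do with a value buried inside a map.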

Modeling Patterns

Pattern 1: One-to-Many Relationships

Example: Blog posts and comments

-- Create author and posts
CREATE (author:User {name: 'Alice', email: 'alice@blog.com'});
CREATE (post1:Post {
    id: 'post_001',
    title: 'Graph Databases Explained',
    content: '...',
    published: datetime('2024-03-01T09:00:00'),
    views: 1250
});
CREATE (post2:Post {
    id: 'post_002',
    title: 'Modeling Best Practices',
    content: '...',
    published: datetime('2024-03-15T14:30:00'),
    views: 890
});

-- Connect author to posts
MATCH (a:User {name: 'Alice'}), (p1:Post {id: 'post_001'})
CREATE (a)-[:AUTHORED]->(p1);

MATCH (a:User {name: 'Alice'}), (p2:Post {id: 'post_002'})
CREATE (a)-[:AUTHORED]->(p2);

-- Query all posts by author
MATCH (author:User {name: 'Alice'})-[:AUTHORED]->(post:Post)
RETURN post.title, post.published, post.views
ORDER BY post.published DESC;

Pattern 2: Many-to-Many Relationships

Example: Students and courses

-- Create students
CREATE (alice:Student {id: 'student_001', name: 'Alice'});
CREATE (bob:Student {id: 'student_002', name: 'Bob'});

-- Create courses
CREATE (db:Course {id: 'course_101', name: 'Databases', credits: 3});
CREATE (algo:Course {id: 'course_102', name: 'Algorithms', credits: 4});

-- Enrollment relationships with properties
MATCH (alice:Student {id: 'student_001'}), (db:Course {id: 'course_101'})
CREATE (alice)-[:ENROLLED_IN {
    semester: 'Fall 2024',
    grade: 'A',
    enrollment_date: date('2024-08-20')
}]->(db);

MATCH (alice:Student {id: 'student_001'}), (algo:Course {id: 'course_102'})
CREATE (alice)-[:ENROLLED_IN {
    semester: 'Fall 2024',
    grade: 'B+',
    enrollment_date: date('2024-08-20')
}]->(algo);

MATCH (bob:Student {id: 'student_002'}), (algo:Course {id: 'course_102'})
CREATE (bob)-[:ENROLLED_IN {
    semester: 'Fall 2024',
    grade: 'A-',
    enrollment_date: date('2024-08-21')
}]->(algo);

-- Query students in a course
-- (letter grades are strings, so sorting on them is lexical: 'B+' would sort after 'A-')
MATCH (student:Student)-[enrollment:ENROLLED_IN]->(course:Course {id: 'course_102'})
RETURN student.name, enrollment.grade
ORDER BY student.name;

Pattern 3: Hierarchical Structures

Example: Organizational chart

-- Create employees
CREATE (ceo:Employee {name: 'Alice', title: 'CEO'});
CREATE (cto:Employee {name: 'Bob', title: 'CTO'});
CREATE (vp_eng:Employee {name: 'Carol', title: 'VP Engineering'});
CREATE (eng1:Employee {name: 'Dave', title: 'Senior Engineer'});
CREATE (eng2:Employee {name: 'Eve', title: 'Engineer'});

-- Create reporting structure
MATCH (cto:Employee {name: 'Bob'}), (ceo:Employee {name: 'Alice'})
CREATE (cto)-[:REPORTS_TO]->(ceo);

MATCH (vp:Employee {name: 'Carol'}), (cto:Employee {name: 'Bob'})
CREATE (vp)-[:REPORTS_TO]->(cto);

MATCH (eng1:Employee {name: 'Dave'}), (vp:Employee {name: 'Carol'})
CREATE (eng1)-[:REPORTS_TO]->(vp);

MATCH (eng2:Employee {name: 'Eve'}), (eng1:Employee {name: 'Dave'})
CREATE (eng2)-[:REPORTS_TO]->(eng1);

-- Query entire management chain, nearest manager first
MATCH path = (employee:Employee {name: 'Eve'})-[:REPORTS_TO*]->(manager:Employee)
RETURN manager.name, manager.title
ORDER BY length(path);

-- Find all direct and indirect reports
MATCH (manager:Employee {name: 'Bob'})<-[:REPORTS_TO*]-(report:Employee)
RETURN count(report) as total_reports;
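
To make the variable-length traversal concrete, here is a small in-memory Python sketch of what the two queries compute over the same org chart. The dictionary layout and the function names management_chain and all_reports are assumptions made for illustration; the database performs the equivalent walk internally.

```python
# Each key reports to its value, mirroring the REPORTS_TO relationships above.
reports_to = {"Bob": "Alice", "Carol": "Bob", "Dave": "Carol", "Eve": "Dave"}


def management_chain(employee, reports_to):
    """Return the managers above an employee, nearest first."""
    chain = []
    seen = {employee}
    current = reports_to.get(employee)
    while current is not None:
        if current in seen:
            # A hierarchy should be acyclic; fail loudly if it is not.
            raise ValueError(f"cycle in reporting structure at {current!r}")
        chain.append(current)
        seen.add(current)
        current = reports_to.get(current)
    return chain


def all_reports(manager, reports_to):
    """Return every direct or indirect report of a manager, sorted by name."""
    return sorted(
        person
        for person in reports_to
        if manager in management_chain(person, reports_to)
    )


eve_chain = management_chain("Eve", reports_to)
bob_reports = all_reports("Bob", reports_to)
```

The cycle check matters in practice: a single mistaken REPORTS_TO relationship can turn a tree into a loop, and an unbounded traversal over a loop never finishes.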

Pattern 4: Time-Based Modeling

Example: Historical relationships

-- Model job history with time-bound relationships
CREATE (alice:Person {name: 'Alice'});
CREATE (acme:Company {name: 'Acme Corp'});
CREATE (techco:Company {name: 'TechCo Inc'});

-- Past employment
MATCH (alice:Person {name: 'Alice'}), (acme:Company {name: 'Acme Corp'})
CREATE (alice)-[:WORKED_AT {
    start_date: date('2018-01-15'),
    end_date: date('2021-06-30'),
    title: 'Software Engineer',
    current: false
}]->(acme);

-- Current employment
MATCH (alice:Person {name: 'Alice'}), (techco:Company {name: 'TechCo Inc'})
CREATE (alice)-[:WORKED_AT {
    start_date: date('2021-07-01'),
    end_date: null,
    title: 'Senior Engineer',
    current: true
}]->(techco);

-- Query current employer
MATCH (person:Person {name: 'Alice'})-[job:WORKED_AT]->(company:Company)
WHERE job.current = true
RETURN company.name, job.title;

-- Query complete job history
MATCH (person:Person {name: 'Alice'})-[job:WORKED_AT]->(company:Company)
RETURN company.name, job.title, job.start_date, job.end_date
ORDER BY job.start_date DESC;

Pattern 5: Intermediate Nodes for Rich Relationships

Example: Complex order system

-- Create entities
CREATE (customer:Customer {id: 'cust_001', name: 'Alice'});
CREATE (product1:Product {id: 'prod_101', name: 'Laptop', price: 1299.99});
CREATE (product2:Product {id: 'prod_102', name: 'Mouse', price: 29.99});

-- Create order as intermediate node
CREATE (order:Order {
    id: 'order_001',
    date: datetime('2024-03-15T10:30:00'),
    status: 'shipped',
    total: 1329.98
});

-- Connect customer to order
MATCH (c:Customer {id: 'cust_001'}), (o:Order {id: 'order_001'})
CREATE (c)-[:PLACED]->(o);

-- Connect order to products with line item details
MATCH (o:Order {id: 'order_001'}), (p1:Product {id: 'prod_101'})
CREATE (o)-[:CONTAINS {quantity: 1, unit_price: 1299.99, subtotal: 1299.99}]->(p1);

MATCH (o:Order {id: 'order_001'}), (p2:Product {id: 'prod_102'})
CREATE (o)-[:CONTAINS {quantity: 1, unit_price: 29.99, subtotal: 29.99}]->(p2);

-- Query customer's order details
MATCH (customer:Customer {id: 'cust_001'})-[:PLACED]->(order:Order)-[item:CONTAINS]->(product:Product)
RETURN order.id, order.date, product.name, item.quantity, item.subtotal;

Pattern 6: Metadata Nodes

Example: Tagging system

-- Create content
CREATE (post1:Post {id: 'post_001', title: 'Graph Modeling', content: '...'});
CREATE (post2:Post {id: 'post_002', title: 'GQL Syntax', content: '...'});

-- Create tags as nodes (not properties)
CREATE (db_tag:Tag {name: 'databases'});
CREATE (graph_tag:Tag {name: 'graphs'});
CREATE (gql_tag:Tag {name: 'gql'});

-- Tag posts
MATCH (post1:Post {id: 'post_001'}), (graph:Tag {name: 'graphs'})
CREATE (post1)-[:TAGGED_WITH]->(graph);

MATCH (post1:Post {id: 'post_001'}), (db:Tag {name: 'databases'})
CREATE (post1)-[:TAGGED_WITH]->(db);

MATCH (post2:Post {id: 'post_002'}), (gql:Tag {name: 'gql'})
CREATE (post2)-[:TAGGED_WITH]->(gql);

-- Find all posts with a tag
MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag {name: 'graphs'})
RETURN post.title;

-- Find related posts (shared tags)
MATCH (post:Post {id: 'post_001'})-[:TAGGED_WITH]->(tag:Tag)<-[:TAGGED_WITH]-(related:Post)
WHERE post.id <> related.id
RETURN related.title, count(tag) as shared_tags
ORDER BY shared_tags DESC;

Modeling Anti-Patterns to Avoid

Anti-Pattern 1: Properties as Nodes

Bad:

-- Don't create nodes for simple properties
CREATE (alice:Person {id: 'person_001', name: 'Alice'});
CREATE (email:Email {value: 'alice@example.com'});
CREATE (age:Age {value: 30});
CREATE (alice)-[:HAS_EMAIL]->(email);
CREATE (alice)-[:HAS_AGE]->(age);

Good:

-- Store properties directly on the node
CREATE (alice:Person {
    id: 'person_001',
    name: 'Alice',
    email: 'alice@example.com',
    age: 30
});

Exception: Create nodes for properties when:

  • Multiple entities share the same property value (e.g., addresses, phone numbers)
  • You need to query or aggregate by that property independently
  • The property has its own properties or relationships

-- Good: Shared address as node
CREATE (home:Address {
    street: '123 Main St',
    city: 'New York',
    zip: '10001',
    country: 'USA'
});

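-- Re-match the address so both people link to the same node
MATCH (home:Address {street: '123 Main St', zip: '10001'})
CREATE (alice:Person {name: 'Alice'})-[:LIVES_AT]->(home);

MATCH (home:Address {street: '123 Main St', zip: '10001'})
CREATE (bob:Person {name: 'Bob'})-[:LIVES_AT]->(home);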

-- Now can query: Who lives at this address?
MATCH (person:Person)-[:LIVES_AT]->(addr:Address {street: '123 Main St'})
RETURN person.name;

Anti-Pattern 2: Dense Nodes

Bad:

-- Avoid single nodes with millions of relationships
CREATE (popular:User {name: 'Celebrity', followers: 10000000});

-- 10 million relationships
-- ... millions of (follower)-[:FOLLOWS]->(popular) ...

-- Queries become slow
MATCH (user:User {name: 'Celebrity'})<-[:FOLLOWS]-(follower)
RETURN count(follower);  -- Very slow!

Good:

-- Use property for high-cardinality counts
CREATE (popular:User {name: 'Celebrity', follower_count: 10000000});

-- Or use intermediate aggregation nodes
CREATE (popular:User {name: 'Celebrity'});
CREATE (followers_2024_03:FollowerGroup {
    month: '2024-03',
    count: 50000
});
CREATE (popular)-[:HAS_FOLLOWER_GROUP]->(followers_2024_03);

-- Individual followers connect to groups
CREATE (follower)-[:MEMBER_OF]->(followers_2024_03);
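
The grouping idea can be sketched outside the database as well. The Python below buckets follow events by month, so that no single bucket has to hold every follower. The function names and the (follower_id, ISO date) input format are assumptions made for illustration.

```python
from collections import defaultdict


def group_followers_by_month(follows):
    """Bucket (follower_id, 'YYYY-MM-DD') pairs by their 'YYYY-MM' month."""
    groups = defaultdict(list)
    for follower_id, followed_on in follows:
        month = followed_on[:7]  # 'YYYY-MM' prefix of an ISO date string
        groups[month].append(follower_id)
    return dict(groups)


def total_followers(groups):
    """Sum the per-group counts instead of counting one huge relationship set."""
    return sum(len(members) for members in groups.values())


# Hypothetical follow events.
follows = [
    ("user_101", "2024-03-02"),
    ("user_102", "2024-03-19"),
    ("user_103", "2024-04-01"),
]
follower_groups = group_followers_by_month(follows)
follower_total = total_followers(follower_groups)
```

Each bucket corresponds to one FollowerGroup node. The celebrity node then carries one relationship per month rather than one per follower, and a total comes from summing the stored group counts.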

Anti-Pattern 3: Relationship Types as Properties

Bad:

-- Don't use generic relationship type with type property
CREATE (alice)-[:RELATED_TO {type: 'friend'}]->(bob);
CREATE (alice)-[:RELATED_TO {type: 'coworker'}]->(carol);
CREATE (alice)-[:RELATED_TO {type: 'family'}]->(dave);

-- Can't efficiently query by kind of relationship
MATCH (alice:Person {name: 'Alice'})-[r:RELATED_TO]->(other)
WHERE r.type = 'friend'  -- Must check the type property on every RELATED_TO relationship
RETURN other.name;

Good:

-- Use specific relationship types
CREATE (alice)-[:FRIEND]->(bob);
CREATE (alice)-[:COWORKER]->(carol);
CREATE (alice)-[:FAMILY]->(dave);

-- Efficient query by relationship type
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name;

Anti-Pattern 4: Disconnected Nodes

Bad:

-- Avoid nodes with no relationships (unless temporary)
CREATE (orphan:Product {id: 'prod_999', name: 'Orphan Product'});
-- No relationships to customers, orders, categories, etc.

-- Hard to discover in queries
MATCH (p:Product) WHERE NOT (p)--() RETURN p;  -- Anti-pattern query

Good:

-- Always connect nodes meaningfully
CREATE (product:Product {id: 'prod_999', name: 'New Product'});
CREATE (category:Category {name: 'Electronics'});
CREATE (vendor:Vendor {name: 'Acme Corp'});

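MATCH (product:Product {id: 'prod_999'}), (category:Category {name: 'Electronics'})
CREATE (product)-[:IN_CATEGORY]->(category);

MATCH (vendor:Vendor {name: 'Acme Corp'}), (product:Product {id: 'prod_999'})
CREATE (vendor)-[:SUPPLIES]->(product);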

Query-Driven Modeling

Model for Your Queries

Design your graph to make common queries efficient:

Example: Social network “news feed” query

Requirement: Show posts from friends, ordered by time, with like counts.

Model:

-- Users
CREATE (alice:User {id: 'user_001', name: 'Alice'});
CREATE (bob:User {id: 'user_002', name: 'Bob'});

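-- Friendship
MATCH (alice:User {id: 'user_001'}), (bob:User {id: 'user_002'})
CREATE (alice)-[:FRIENDS_WITH]->(bob);

-- Posts
CREATE (post:Post {
    id: 'post_001',
    content: 'Loving graph databases!',
    timestamp: datetime('2024-03-15T10:30:00'),
    like_count: 0  -- Denormalize for performance
});

MATCH (bob:User {id: 'user_002'}), (post:Post {id: 'post_001'})
CREATE (bob)-[:POSTED]->(post);

-- Likes (still track individually for other queries)
MATCH (alice:User {id: 'user_001'}), (post:Post {id: 'post_001'})
CREATE (alice)-[:LIKED {timestamp: datetime('2024-03-15T10:35:00')}]->(post);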

-- Increment like count
MATCH (p:Post {id: 'post_001'})
SET p.like_count = p.like_count + 1;

-- Efficient news feed query
MATCH (me:User {id: 'user_001'})-[:FRIENDS_WITH]->(friend)-[:POSTED]->(post:Post)
RETURN friend.name, post.content, post.timestamp, post.like_count
ORDER BY post.timestamp DESC
LIMIT 20;

Optimize for Write vs. Read

Write-optimized (normalized):

-- Separate like tracking
CREATE (user)-[:LIKED]->(post);

-- Query count dynamically
MATCH (post:Post {id: 'post_001'})<-[:LIKED]-(user)
RETURN count(user) as like_count;

Read-optimized (denormalized):

-- Store count as property
CREATE (post:Post {id: 'post_001', like_count: 0});

-- Update count on each like
MATCH (post:Post {id: 'post_001'})
SET post.like_count = post.like_count + 1;

-- Fast read
MATCH (post:Post {id: 'post_001'})
RETURN post.like_count;

Hybrid approach:

-- Both individual relationships AND denormalized count, updated in one statement
MATCH (user:User {id: 'user_001'}), (post:Post {id: 'post_001'})
CREATE (user)-[:LIKED]->(post)
SET post.like_count = post.like_count + 1;

-- Fast reads from property, audit trail from relationships
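
The main risk of the hybrid approach is drift between the stored count and the relationships themselves, for example when the same like is recorded twice. The Python sketch below models that invariant in memory. The Post class and its method names are hypothetical, and a real implementation would need the relationship write and the counter update to happen in the same transaction.

```python
class Post:
    """In-memory stand-in for a Post node with a denormalized like counter."""

    def __init__(self, post_id):
        self.post_id = post_id
        self.liked_by = set()  # stands in for individual LIKED relationships
        self.like_count = 0    # denormalized counter read by the news feed

    def like(self, user_id):
        """Record a like once; repeated likes by the same user are ignored."""
        if user_id in self.liked_by:
            return False
        self.liked_by.add(user_id)
        self.like_count += 1
        return True

    def unlike(self, user_id):
        """Remove a like and keep the counter in step."""
        if user_id not in self.liked_by:
            return False
        self.liked_by.remove(user_id)
        self.like_count -= 1
        return True

    def is_consistent(self):
        """The counter must always equal the number of like relationships."""
        return self.like_count == len(self.liked_by)


post = Post("post_001")
post.like("user_001")
post.like("user_001")  # duplicate, ignored
post.like("user_002")
```

A periodic job that recomputes the count from the relationships and compares it with the stored property is a cheap way to detect drift when this invariant cannot be enforced at write time.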

Schema Evolution

Adding New Node Types

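-- Phase 1: Basic social network
CREATE (alice:User {name: 'Alice'});
CREATE (bob:User {name: 'Bob'});
MATCH (alice:User {name: 'Alice'}), (bob:User {name: 'Bob'})
CREATE (alice)-[:FRIENDS_WITH]->(bob);

-- Phase 2: Add posts (no migration needed!)
-- Authorship is carried by the POSTED relationship, so no author_id property is needed
MATCH (alice:User {name: 'Alice'})
CREATE (alice)-[:POSTED]->(post:Post {content: 'Hello'});

-- Phase 3: Add groups
CREATE (group:Group {name: 'Graph DB Enthusiasts'});
MATCH (alice:User {name: 'Alice'}), (group:Group {name: 'Graph DB Enthusiasts'})
CREATE (alice)-[:MEMBER_OF]->(group);
MATCH (bob:User {name: 'Bob'}), (group:Group {name: 'Graph DB Enthusiasts'})
CREATE (bob)-[:MEMBER_OF]->(group);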

-- Existing data unaffected

Adding Properties

-- Add properties to existing nodes
MATCH (alice:User {name: 'Alice'})
SET alice.email = 'alice@example.com',
    alice.joined_date = date('2024-01-15');

-- Add properties to new nodes only
CREATE (carol:User {
    name: 'Carol',
    email: 'carol@example.com',
    joined_date: date('2024-03-20'),
    verification_status: 'verified'  -- New property
});

-- Query handles optional properties gracefully
MATCH (user:User)
RETURN user.name, user.email, user.verification_status;
-- null for users without verification_status

Refactoring Relationships

-- Old model: Generic "RELATED_TO" with type property
MATCH (a:Person)-[r:RELATED_TO]->(b:Person)
WHERE r.type = 'friend'
RETURN a.name, b.name;

-- Refactor to specific types
MATCH (a:Person)-[old:RELATED_TO]->(b:Person)
WHERE old.type = 'friend'
CREATE (a)-[:FRIEND {since: old.since}]->(b)
DELETE old;

-- Now use specific relationship types
MATCH (a:Person)-[:FRIEND]->(b:Person)
RETURN b.name;
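
When a refactoring covers several relationship kinds at once, it can help to express the mapping as data and check it on a sample before running the migration. The Python sketch below does this on an in-memory edge list. The tuple layout, TYPE_MAP, and refactor_edges are assumptions made for illustration, not a Geode migration API.

```python
# Each edge is (source, relationship_type, target, properties).
edges = [
    ("alice", "RELATED_TO", "bob", {"type": "friend", "since": "2020-01-15"}),
    ("alice", "RELATED_TO", "carol", {"type": "coworker"}),
    ("alice", "RELATED_TO", "dave", {"type": "family"}),
    ("alice", "RELATED_TO", "erin", {"type": "neighbor"}),  # no mapping yet
]

# Hypothetical mapping from the old type property to the new relationship types.
TYPE_MAP = {"friend": "FRIEND", "coworker": "COWORKER", "family": "FAMILY"}


def refactor_edges(edges, type_map):
    """Replace generic RELATED_TO edges with specific types where a mapping exists."""
    migrated = []
    for source, rel_type, target, props in edges:
        kind = props.get("type")
        if rel_type == "RELATED_TO" and kind in type_map:
            # Carry over every property except the now-redundant type.
            remaining = {k: v for k, v in props.items() if k != "type"}
            migrated.append((source, type_map[kind], target, remaining))
        else:
            # Unmapped edges are left untouched rather than dropped.
            migrated.append((source, rel_type, target, props))
    return migrated


migrated_edges = refactor_edges(edges, TYPE_MAP)
```

Leaving unmapped edges in place means the migration can be rerun after the mapping is extended, and nothing is lost if an unexpected type value turns up in production data.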

Performance Considerations

Indexing Strategy

-- Create indexes on frequently queried properties
CREATE INDEX user_email ON :User(email);
CREATE INDEX product_sku ON :Product(sku);
CREATE INDEX post_timestamp ON :Post(timestamp);

-- Unique constraints (also create indexes)
CREATE CONSTRAINT unique_user_id ON :User(id);
CREATE CONSTRAINT unique_email ON :User(email);

-- Composite indexes for multi-property queries
CREATE INDEX user_city_age ON :User(city, age);

-- Verify index usage with EXPLAIN
EXPLAIN MATCH (u:User {email: 'alice@example.com'}) RETURN u;

Relationship Direction

-- Model relationships in the direction of primary traversal
-- Bad: Querying against relationship direction
CREATE (product)-[:PURCHASED_BY]->(customer);
MATCH (customer:Customer {id: 'cust_001'})<-[:PURCHASED_BY]-(product)  -- Traverses against the stored direction
RETURN product;

-- Good: Query with relationship direction
CREATE (customer)-[:PURCHASED]->(product);
MATCH (customer:Customer {id: 'cust_001'})-[:PURCHASED]->(product)
RETURN product;

Limiting Relationship Counts

-- Avoid nodes with unbounded relationship growth
-- Use time-based partitioning or aggregation

-- Time-partitioned events
CREATE (user)-[:LOGGED_IN_2024_03]->(session);
CREATE (user)-[:LOGGED_IN_2024_04]->(session);

-- Periodic aggregation
CREATE (user)-[:MONTHLY_ACTIVITY {
    month: '2024-03',
    login_count: 42,
    avg_session_duration: 1800
}]->(stats:ActivityStats);

Related Topics

  • Query Language: Optimizing GQL queries for your data model
  • Performance: Indexing strategies and query optimization
  • Best Practices: Production-ready modeling techniques
  • Examples: Real-world modeling scenarios
  • Architecture: Understanding Geode’s storage and indexing

Further Reading

  • Property Graph Model: Theoretical foundations
  • Domain-Driven Design: Aligning models with business domains
  • Normalization vs Denormalization: Trade-offs in graph modeling
  • Graph Algorithms: Modeling for algorithmic analysis
  • Migration Strategies: Evolving schemas in production

Effective graph modeling transforms complex domains into intuitive, performant graph structures. By understanding core patterns, avoiding anti-patterns, and optimizing for your query patterns, you can build graph models that scale with your application and delight your users with fast, expressive queries.

