Graph Schema Design Guide | Geode Database

Overview

Designing an effective graph schema is critical for query performance, maintainability, and accurately representing your domain. This guide covers best practices for modeling data as property graphs in Geode.

Key Principles:

Model the domain, not the data - Think in relationships, not tables
Optimize for queries, not storage - Denormalization is often beneficial
Use labels for types, properties for attributes - Clear separation of concerns
Make relationships meaningful - Every edge should represent a real connection

Property Graph Fundamentals

Core Concepts

A property graph consists of:

Nodes (vertices) - Entities in your domain
- Have one or more labels (types/categories)
- Contain properties (key-value pairs)
- Uniquely identified by internal ID
Relationships (edges) - Connections between nodes
- Have exactly one type (name)
- Are directed (have start and end nodes)
- Contain properties (key-value pairs)
- Uniquely identified by internal ID
Properties - Key-value attributes
- Keys are strings
- Values can be any of 50+ data types (strings, numbers, dates, vectors, etc.)
- Stored on nodes or relationships

Example Graph

-- Nodes with labels and properties
(alice:Person {name: "Alice", age: 30, email: "alice@example.com"})
(bob:Person {name: "Bob", age: 25})
(acme:Company {name: "Acme Corp", founded: 2010})

-- Relationships with types and properties
(alice)-[:KNOWS {since: 2020, closeness: 0.8}]->(bob)
(alice)-[:WORKS_AT {role: "Engineer", since: 2022}]->(acme)
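
To make the structure above concrete, here is a minimal in-memory sketch of the same example graph in Python. It is not Geode's storage model or client API; the `Node` and `Relationship` classes and their fields are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # stand-in for the database's internal ID generator


@dataclass
class Node:
    labels: set                      # one or more labels
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Relationship:
    type: str                        # exactly one type
    start: Node                      # directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))


alice = Node({"Person"}, {"name": "Alice", "age": 30})
bob = Node({"Person"}, {"name": "Bob", "age": 25})
acme = Node({"Company"}, {"name": "Acme Corp", "founded": 2010})

rels = [
    Relationship("KNOWS", alice, bob, {"since": 2020, "closeness": 0.8}),
    Relationship("WORKS_AT", alice, acme, {"role": "Engineer", "since": 2022}),
]

# Rough equivalent of: MATCH (alice)-[:KNOWS]->(x) RETURN x.name
known = [r.end.properties["name"] for r in rels
         if r.type == "KNOWS" and r.start is alice]
```

Note that labels are a set on the node, while the relationship type is a single string: that asymmetry is the same one described in Core Concepts.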

Node Label Design

Labeling Strategy

Labels represent entity types - Use them to categorize nodes.

✅ Good Labeling Patterns

1. Use Clear, Singular Nouns

-- GOOD
Person, Company, Product, Order, Review

-- BAD
People, Companies, person_table, usr

2. Use Multi-Label for Subtypes

-- GOOD: Person with specific roles
(alice:Person:Employee {name: "Alice"})
(bob:Person:Customer {name: "Bob"})
(charlie:Person:Employee:Manager {name: "Charlie"})

-- Query specific subtypes
MATCH (e:Employee) RETURN e          -- All employees
MATCH (m:Manager) RETURN m           -- All managers
MATCH (p:Person:!Employee) RETURN p  -- People who aren't employees
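
Multi-label matching boils down to set membership tests on each node's labels. The following Python sketch mirrors the three queries above on the same three people; `match_labels` is a hypothetical helper, not a Geode function.

```python
nodes = {
    "alice": {"Person", "Employee"},
    "bob": {"Person", "Customer"},
    "charlie": {"Person", "Employee", "Manager"},
}


def match_labels(nodes, required=(), excluded=()):
    """Return names whose label set has all `required` and none of `excluded`."""
    required, excluded = set(required), set(excluded)
    return sorted(
        name for name, labels in nodes.items()
        if required <= labels and not (excluded & labels)
    )


employees = match_labels(nodes, required={"Employee"})
managers = match_labels(nodes, required={"Manager"})
non_employees = match_labels(nodes, required={"Person"}, excluded={"Employee"})
```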

3. Use Meaningful Hierarchies

-- GOOD: Product categories
(laptop:Product:Electronics:Computer {brand: "Dell"})
(shirt:Product:Clothing:Apparel {size: "M"})

-- Query at different levels
MATCH (e:Electronics) RETURN e       -- All electronics
MATCH (c:Computer) RETURN c          -- Just computers

❌ Bad Labeling Antipatterns

1. Encoding Values in Labels (Dynamic Labels)

-- BAD: Don't create labels from data
(alice:Person:Age30:Female {name: "Alice"})

-- GOOD: Use properties
(alice:Person {name: "Alice", age: 30, gender: "female"})

2. Too Many Labels

-- BAD: Label explosion
(alice:Person:Employee:Engineer:Senior:FullStack:JavaScript:Remote)

-- GOOD: Use selective labels + properties
(alice:Person:Employee {
  name: "Alice",
  level: "Senior",
  role: "Engineer",
  stack: "FullStack",
  skills: ["JavaScript", "Python"],
  location: "Remote"
})

3. Generic Labels

-- BAD: Too generic, hard to query
(thing1:Entity {type: "person", name: "Alice"})
(thing2:Entity {type: "company", name: "Acme"})

-- GOOD: Specific labels
(alice:Person {name: "Alice"})
(acme:Company {name: "Acme"})

Label Naming Conventions

Convention	Example	Use Case
PascalCase	`Person`, `OrderItem`	Recommended - clear, standard
UPPER_SNAKE	`PURCHASE_ORDER`	Legacy systems integration
Namespacing	`app:User`, `sys:Config`	Multi-tenant or modular systems

Recommendation: Use PascalCase for consistency with GQL standards.

-- Recommended naming
CREATE (p:Person {name: "Alice"})
CREATE (o:PurchaseOrder {id: 12345})
CREATE (li:LineItem {quantity: 5})

Relationship Type Design

Relationship Naming

Relationship types describe the connection - Use verbs or verb phrases.

✅ Good Relationship Patterns

1. Use Active Verbs (Past or Present Tense)

-- GOOD: Clear, descriptive
(alice:Person)-[:KNOWS]->(bob:Person)
(alice:Person)-[:WORKS_AT]->(acme:Company)
(order:Order)-[:CONTAINS]->(item:Product)
(user:User)-[:PURCHASED]->(product:Product)
(employee:Employee)-[:REPORTS_TO]->(manager:Employee)

2. Direction Matters

-- GOOD: Relationship direction is meaningful
(parent:Person)-[:PARENT_OF]->(child:Person)

-- Query respects direction
MATCH (p:Person)-[:PARENT_OF]->(c:Person)
RETURN p.name AS parent, c.name AS child

-- Reverse direction
MATCH (c:Person)<-[:PARENT_OF]-(p:Person)
RETURN p.name AS parent, c.name AS child

3. Use Specific Types

-- GOOD: Specific relationship types
(alice:Person)-[:MANAGES]->(team:Team)
(alice:Person)-[:WORKS_ON]->(project:Project)
(alice:Person)-[:MENTORS]->(bob:Person)

-- BAD: Generic types
(alice:Person)-[:RELATED_TO {type: "manages"}]->(team:Team)

❌ Bad Relationship Antipatterns

1. Encoding Values in Type Names

-- BAD: Dynamic relationship types
(alice)-[:KNOWS_SINCE_2020]->(bob)
(alice)-[:FRIEND_CLOSENESS_0.8]->(charlie)

-- GOOD: Use properties
(alice)-[:KNOWS {since: 2020}]->(bob)
(alice)-[:FRIEND {closeness: 0.8}]->(charlie)

2. Bidirectional Relationships

-- BAD: Creating duplicate relationships
(alice)-[:KNOWS]->(bob)
(bob)-[:KNOWS]->(alice)

-- GOOD: Use single direction + queries
(alice)-[:KNOWS]->(bob)

-- Query both directions
MATCH (a:Person)-[:KNOWS]-(b:Person)  -- Ignores direction
WHERE a.name = "Alice"
RETURN b.name
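
The idea behind the direction-agnostic match can be sketched in a few lines of Python: each edge is stored once, and the lookup checks both ends. The edge list and the `neighbors` helper are made up for illustration.

```python
# Each KNOWS edge is stored once, as (start, end).
knows = [("Alice", "Bob"), ("Charlie", "Alice"), ("Bob", "Dana")]


def neighbors(edges, name):
    """Names connected to `name` by an edge in either direction."""
    outgoing = {end for start, end in edges if start == name}
    incoming = {start for start, end in edges if end == name}
    return sorted(outgoing | incoming)


alice_contacts = neighbors(knows, "Alice")
```

Storing the reverse edge as well would double the storage and force every write to keep two edges in sync, with no gain in what can be queried.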

3. Missing Relationship Types

-- BAD: Properties on nodes instead of relationships
(alice:Person {friendsWith: ["Bob", "Charlie"]})

-- GOOD: Explicit relationships
(alice:Person)-[:KNOWS]->(bob:Person)
(alice:Person)-[:KNOWS]->(charlie:Person)

Relationship Properties

Properties on relationships capture connection metadata.

-- Social network relationships
(alice)-[:KNOWS {
  since: date('2020-01-15'),
  strength: 0.8,
  context: "university",
  interactions: 147
}]->(bob)

-- Employment relationships
(alice)-[:WORKS_AT {
  role: "Senior Engineer",
  department: "Engineering",
  startDate: date('2022-03-01'),
  salary: 150000,
  employmentType: "full-time"
}]->(acme)

-- Transaction relationships
(order)-[:CONTAINS {
  quantity: 3,
  unitPrice: 29.99,
  discount: 0.1,
  addedAt: timestamp()
}]->(product)

Property Design Patterns

Property Selection

Properties store attributes of nodes and relationships.

✅ When to Use Properties

Attributes of the entity itself

(person:Person {
  name: "Alice",
  age: 30,
  email: "alice@example.com",
  bio: "Software engineer..."
})

Frequently filtered/sorted values

-- Create indexes for these
(product:Product {
  price: 29.99,        -- Frequently filtered
  category: "Books",   -- Frequently filtered
  rating: 4.5,         -- Frequently sorted
  stock: 100
})

-- Efficient queries
CREATE INDEX product_price ON Product(price);
MATCH (p:Product) WHERE p.price < 50 RETURN p;

Temporal data

(event:Event {
  title: "Launch Party",
  startTime: timestamp('2026-03-15T19:00:00Z'),
  endTime: timestamp('2026-03-15T23:00:00Z'),
  timezone: 'America/New_York'
})

❌ When NOT to Use Properties

1. Relationships Disguised as Properties

-- BAD: Lists of related IDs
(person:Person {
  name: "Alice",
  friendIds: [10, 20, 30, 40]  -- Don't do this
})

-- GOOD: Explicit relationships
(alice:Person)-[:KNOWS]->(bob:Person)
(alice:Person)-[:KNOWS]->(charlie:Person)

2. Computed/Aggregated Data (Usually)

-- BAD: Redundant computed values
(person:Person {
  name: "Alice",
  friendCount: 15,      -- This can be queried
  avgFriendAge: 28.5    -- This can be computed
})

-- GOOD: Compute on demand
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend)
RETURN p.name, count(friend) AS friendCount, avg(friend.age) AS avgFriendAge

-- EXCEPTION: Precompute for performance (materialized views/caching)
-- If counting friends is very slow, denormalize
(person:Person {name: "Alice", cachedFriendCount: 15})
-- But update on every friend add/remove
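
The cost of the cached-count exception is that every write path must maintain it. A minimal Python sketch of that bookkeeping, assuming friendships are only changed through these two methods (the `FriendGraph` class is hypothetical):

```python
class FriendGraph:
    def __init__(self):
        self.friends = {}        # name -> set of friend names
        self.cached_count = {}   # name -> denormalized friend count

    def add_friend(self, person, friend):
        known = self.friends.setdefault(person, set())
        if friend not in known:              # duplicates must not bump the count
            known.add(friend)
            self.cached_count[person] = self.cached_count.get(person, 0) + 1

    def remove_friend(self, person, friend):
        known = self.friends.get(person, set())
        if friend in known:                  # only decrement for real removals
            known.remove(friend)
            self.cached_count[person] -= 1


g = FriendGraph()
g.add_friend("Alice", "Bob")
g.add_friend("Alice", "Charlie")
g.add_friend("Alice", "Bob")         # duplicate, count unchanged
g.remove_friend("Alice", "Charlie")
```

If any code path adds or removes a relationship without going through this logic, the cached value silently drifts from the true count, which is why computing on demand is the default recommendation.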

Property Types

Geode supports 50+ data types. Choose appropriate types:

Data Type	Use Case	Example
`String`, `Text`	Names, descriptions	`name: "Alice"`
`Int`, `BigInt`	IDs, counts	`id: 12345, age: 30`
`Real`, `Double`	Prices, ratings	`price: 29.99, rating: 4.5`
`Date`, `Timestamp`	Temporal data	`created: date('2026-01-23')`
`Boolean`	Flags	`active: true, verified: false`
`VectorF32`	ML embeddings	`embedding: vectorf32([0.1, 0.2, ...])`
`Json`, `Jsonb`	Semi-structured data	`metadata: jsonb('{"key": "value"}')`
`List`	Multiple values	`tags: ["tech", "database"]`
`GeoPoint`	Geographic locations	`location: geopoint(37.7749, -122.4194)`

Type Selection Tips:

Use specific types for better validation and performance
Avoid generic Text when Varchar(255) suffices
Use Jsonb for flexible schemas (but index key properties separately)
Use vectors for similarity search (embeddings, recommendations)

Domain Examples

Example 1: Social Network

Entities: Person, Post, Comment
Relationships: KNOWS, POSTED, COMMENTED_ON, WROTE, ON, LIKED

-- Schema design
(alice:Person {
  id: 1,
  name: "Alice",
  email: "alice@example.com",
  joinedAt: timestamp(),
  bio: "Software engineer"
})

(post:Post {
  id: 101,
  title: "Graph Databases",
  content: "Graph databases are...",
  createdAt: timestamp(),
  tags: ["tech", "databases"]
})

(comment:Comment {
  id: 501,
  content: "Great post!",
  createdAt: timestamp()
})

-- Relationships
(alice)-[:KNOWS {since: date('2020-01-01'), closeness: 0.8}]->(bob)
(alice)-[:POSTED {createdAt: timestamp()}]->(post)
(bob)-[:COMMENTED_ON {createdAt: timestamp()}]->(post)
(bob)-[:WROTE]->(comment)
(comment)-[:ON]->(post)
(alice)-[:LIKED {likedAt: timestamp()}]->(post)

Key Queries:

-- Find Alice's friends
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
RETURN friend.name

-- Find posts by Alice's friends
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend)-[:POSTED]->(post:Post)
RETURN post.title, friend.name
ORDER BY post.createdAt DESC

-- Find popular posts (most likes)
MATCH (p:Post)<-[:LIKED]-(u:Person)
RETURN p.title, count(u) AS likes
ORDER BY likes DESC
LIMIT 10

Example 2: E-Commerce

Entities: Customer, Product, Order, LineItem, Category
Relationships: PURCHASED, CONTAINS, IN_CATEGORY, REVIEWED

-- Schema design
(customer:Customer {
  id: 1,
  name: "Alice",
  email: "alice@example.com",
  memberSince: date('2020-01-01')
})

(product:Product {
  id: 101,
  name: "Laptop",
  price: 999.99,
  stock: 50,
  sku: "LAP-001"
})

(category:Category {
  id: 10,
  name: "Electronics",
  slug: "electronics"
})

(order:Order {
  id: 5001,
  orderDate: timestamp(),
  status: "shipped",
  total: 1049.98
})

(lineItem:LineItem {
  quantity: 1,
  unitPrice: 999.99,
  subtotal: 999.99
})

-- Relationships
(customer)-[:PURCHASED {date: timestamp()}]->(order)
(order)-[:CONTAINS]->(lineItem)
(lineItem)-[:OF_PRODUCT]->(product)
(product)-[:IN_CATEGORY]->(category)
(customer)-[:REVIEWED {rating: 5, date: timestamp()}]->(product)

Key Queries:

-- Find customer's order history
MATCH (c:Customer {email: "alice@example.com"})-[:PURCHASED]->(o:Order)
      -[:CONTAINS]->(li:LineItem)-[:OF_PRODUCT]->(p:Product)
RETURN o.orderDate, o.total, collect(p.name) AS products
ORDER BY o.orderDate DESC

-- Product recommendations (customers who bought this also bought...)
MATCH (p1:Product {id: 101})<-[:OF_PRODUCT]-(:LineItem)
      <-[:CONTAINS]-(o:Order)<-[:PURCHASED]-(c:Customer)
      -[:PURCHASED]->(o2:Order)-[:CONTAINS]->(:LineItem)
      -[:OF_PRODUCT]->(p2:Product)
WHERE p1 <> p2
RETURN p2.name, count(*) AS frequency
ORDER BY frequency DESC
LIMIT 5
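
The recommendation query is a co-occurrence count. The Python sketch below reproduces its logic on made-up data. It assumes the common rule that a single pattern cannot traverse the same edge twice, so only the customer's other orders contribute; whether Geode applies that rule here is an assumption, and `also_bought` is a hypothetical helper.

```python
from collections import Counter

# customer -> list of orders; each order is a list of product names (made-up data)
orders_by_customer = {
    "alice": [["Laptop"], ["Mouse", "Monitor"]],
    "bob": [["Laptop", "Keyboard"], ["Mouse"]],
    "carol": [["Keyboard"]],
}


def also_bought(orders_by_customer, product, limit=5):
    """Count products in the other orders of customers who bought `product`."""
    freq = Counter()
    for orders in orders_by_customer.values():
        for i, order in enumerate(orders):
            if product not in order:
                continue
            # Walk the same customer's other orders (distinct PURCHASED edges).
            for j, other_order in enumerate(orders):
                if i == j:
                    continue
                for other in other_order:
                    if other != product:
                        freq[other] += 1
    return freq.most_common(limit)


recs = also_bought(orders_by_customer, "Laptop")
```

Here `recs` ranks Mouse first because two different Laptop buyers purchased it in another order. Bob's Keyboard is not counted, since it shares an order with the Laptop.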

Example 3: Knowledge Graph

Entities: Concept, Document, Author, Citation
Relationships: RELATED_TO, AUTHORED, CITES, HAS_TOPIC

-- Schema design
(concept:Concept {
  id: 1,
  name: "Graph Databases",
  definition: "A database that uses graph structures...",
  embedding: vectorf32([0.1, 0.2, 0.3, ...])  -- For similarity search
})

(doc:Document {
  id: 101,
  title: "Introduction to Graph Theory",
  abstract: "This paper introduces...",
  publishedDate: date('2025-06-15'),
  embedding: vectorf32([...])  -- For semantic search
})

(author:Author {
  id: 201,
  name: "Dr. Alice Smith",
  institution: "MIT",
  hIndex: 45
})

-- Relationships
(concept1:Concept)-[:RELATED_TO {strength: 0.8}]->(concept2:Concept)
(author)-[:AUTHORED {role: "primary"}]->(doc)
(doc)-[:HAS_TOPIC]->(concept)
(doc1)-[:CITES {context: "methodology"}]->(doc2)

Key Queries:

-- Find related concepts (via vector similarity)
MATCH (c:Concept {name: "Graph Databases"})
MATCH (related:Concept)
WHERE vector_cosine(c.embedding, related.embedding) > 0.7
  AND c <> related
RETURN related.name, vector_cosine(c.embedding, related.embedding) AS similarity
ORDER BY similarity DESC
LIMIT 10
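
For readers unfamiliar with the metric, the Python sketch below shows what a cosine-similarity filter like the one above computes. It assumes `vector_cosine` returns a similarity (higher means closer), which matches how the query uses it; the three-dimensional embeddings are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


embeddings = {  # made-up 3-d vectors; real embeddings have hundreds of dimensions
    "Graph Databases": [1.0, 0.0, 1.0],
    "Property Graphs": [1.0, 0.1, 0.9],
    "Cooking": [0.0, 1.0, 0.0],
}


def related(name, threshold=0.7, limit=10):
    base = embeddings[name]
    scored = [(other, cosine(base, vec))
              for other, vec in embeddings.items() if other != name]
    hits = [(other, score) for other, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:limit]


result = related("Graph Databases")
```

Like the query, this compares the source concept against every other concept. On large graphs that full comparison is the part a vector index is meant to avoid.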

-- Find influential papers (high citation count)
MATCH (doc:Document)<-[:CITES]-(citing:Document)
RETURN doc.title, count(citing) AS citations
ORDER BY citations DESC
LIMIT 20

-- Find collaboration networks
MATCH (a1:Author)-[:AUTHORED]->(doc:Document)<-[:AUTHORED]-(a2:Author)
WHERE a1.id < a2.id  -- Report each pair once, not as both (a1, a2) and (a2, a1)
RETURN a1.name, a2.name, count(doc) AS collaborations
ORDER BY collaborations DESC

Example 4: Fraud Detection

Entities: Account, Transaction, Device, IP Address
Relationships: OWNS, INITIATED, USED, FROM_IP

-- Schema design
(account:Account {
  id: 1,
  accountNumber: "1234567890",
  status: "active",
  riskScore: 0.3
})

(txn:Transaction {
  id: 5001,
  amount: 150.00,
  timestamp: timestamp(),
  status: "completed",
  flagged: false
})

(device:Device {
  id: "device-abc123",
  type: "mobile",
  os: "iOS",
  fingerprint: "abcd1234..."
})

(ip:IPAddress {
  address: "192.168.1.100",
  country: "US",
  riskLevel: "low"
})

-- Relationships
(account)-[:INITIATED {timestamp: timestamp()}]->(txn)
(txn)-[:USED]->(device)
(txn)-[:FROM_IP]->(ip)
(account)-[:OWNS]->(device)

Fraud Detection Queries:

-- Find accounts using same device (account takeover)
MATCH (a1:Account)-[:OWNS]->(d:Device)<-[:OWNS]-(a2:Account)
WHERE a1.id < a2.id  -- Report each pair once, not as both (a1, a2) and (a2, a1)
RETURN a1.accountNumber, a2.accountNumber, d.fingerprint

-- Find transactions from high-risk IPs
MATCH (txn:Transaction)-[:FROM_IP]->(ip:IPAddress)
WHERE ip.riskLevel = "high"
RETURN txn.id, txn.amount, ip.address, ip.country

-- Detect unusual transaction patterns
MATCH (account:Account)-[:INITIATED]->(txn:Transaction)
WITH account, avg(txn.amount) AS avgAmount, stddev(txn.amount) AS stdDev
MATCH (account)-[:INITIATED]->(recent:Transaction)
WHERE recent.timestamp > timestamp() - duration('PT24H')
  AND abs(recent.amount - avgAmount) > 3 * stdDev
RETURN account.accountNumber, recent.id, recent.amount, avgAmount, stdDev
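
The last query applies a three-standard-deviation rule per account. A minimal Python sketch of that rule on made-up amounts follows. It uses the sample standard deviation, and whether Geode's `stddev` is the sample or population form is an assumption. Unlike the query, it keeps the recent transactions out of the baseline.

```python
from statistics import mean, stdev


def unusual(history, recent, k=3.0):
    """Return recent amounts more than k standard deviations from the mean."""
    avg = mean(history)
    sd = stdev(history)  # sample standard deviation; needs at least 2 values
    return [amount for amount in recent if abs(amount - avg) > k * sd]


history = [100.0, 110.0, 90.0, 105.0, 95.0]   # mean 100, sample stddev about 7.9
flagged = unusual(history, recent=[102.0, 500.0])
```

When the baseline includes the outlier itself, as in the query, a single very large transaction inflates both the mean and the standard deviation, which can hide it on accounts with few transactions.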

Schema Evolution

Adding New Labels/Properties

-- Add new label to existing nodes
MATCH (e:Employee) WHERE e.role = "Manager"
SET e:Manager

-- Add new property with default
MATCH (p:Product) WHERE p.stock IS NULL
SET p.stock = 0

-- Add computed property (people with no outgoing KNOWS are not matched and get no value)
MATCH (p:Person)-[:KNOWS]->(friend)
WITH p, count(friend) AS friendCount
SET p.cachedFriendCount = friendCount

Migrating Relationships

-- Split generic relationship into specific types
MATCH (a:Account)-[r:RELATED_TO {type: "owns"}]->(b:Account)
CREATE (a)-[:OWNS {since: r.createdAt}]->(b)
DELETE r

-- Merge duplicate relationships, keeping the earliest `since`
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WITH a, b, collect(r) AS rels, min(r.since) AS earliest
WHERE size(rels) > 1
CREATE (a)-[:KNOWS {since: earliest}]->(b)
FOREACH (rel IN rels | DELETE rel)
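
The intended result of the merge (one surviving edge per ordered node pair, carrying the earliest `since`) can be checked outside the database before running a migration. A minimal Python sketch with made-up data; the `(start, end, since)` tuple layout is an assumption for illustration.

```python
def merge_duplicates(edges):
    """Keep one edge per (start, end) pair, with the earliest `since`."""
    earliest = {}
    for start, end, since in edges:
        key = (start, end)
        if key not in earliest or since < earliest[key]:
            earliest[key] = since
    return [(start, end, since) for (start, end), since in earliest.items()]


edges = [
    ("Alice", "Bob", 2021),
    ("Alice", "Bob", 2019),   # duplicate of the edge above, older
    ("Bob", "Carol", 2020),
]
merged = merge_duplicates(edges)
```

Direction is part of the key here, so an edge from Bob to Alice would not be merged with one from Alice to Bob.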

Handling Breaking Changes

-- Create new property, migrate, then deprecate old
-- Step 1: Add new property
MATCH (p:Product)
SET p.priceInCents = toInt(p.price * 100)

-- Step 2: Update application to use priceInCents

-- Step 3: Remove old property (after migration)
MATCH (p:Product)
REMOVE p.price
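
One pitfall in Step 1 is worth checking before migrating real data: a float multiplied by 100 is not always a whole number, and an integer conversion that truncates will then lose a cent. Whether Geode's `toInt` truncates or rounds is an assumption here. The Python sketch below only demonstrates the floating-point effect and one way to avoid it.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents_truncating(price: float) -> int:
    return int(price * 100)  # drops the fractional part


def to_cents_rounded(price: float) -> int:
    # Go through the float's shortest decimal representation, then round exactly.
    cents = (Decimal(str(price)) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


price = 4.35
truncated = to_cents_truncating(price)  # 434, because 4.35 * 100 is 434.99999999999994
rounded = to_cents_rounded(price)       # 435
```

A sanity query comparing the old and new properties on a sample of products before Step 3 would catch this kind of off-by-one-cent error.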

Common Antipatterns

❌ Antipattern 1: Treating Graph Like Relational

Bad:

-- Treating nodes as relational tables with foreign keys
(order:Order {id: 1, customerId: 42, productId: 101})

Good:

-- Use explicit relationships
(customer:Customer {id: 42})-[:PURCHASED]->(order:Order {id: 1})
(order)-[:CONTAINS]->(:LineItem)-[:OF_PRODUCT]->(product:Product {id: 101})

❌ Antipattern 2: Dense Nodes (Supernodes)

Problem: A node with millions of relationships (a supernode) slows any traversal that passes through it, because the engine must examine all of its relationships.

Bad:

-- Every user connected to same "System" node
(user1:User)-[:REGISTERED_IN]->(system:System)
(user2:User)-[:REGISTERED_IN]->(system:System)
-- ... millions of users

Good:

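-- Avoid the hub node: store registration data on each user
-- (filter or bucket by registeredAt when you need a time/category partition)
(user1:User {registeredAt: timestamp()})
(user2:User {registeredAt: timestamp()})

-- If the system identity matters, store it as a property
(user:User {systemId: "main", registeredAt: timestamp()})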

❌ Antipattern 3: Over-Indexing

Bad:

-- Creating indexes on every property
CREATE INDEX person_name ON Person(name);
CREATE INDEX person_age ON Person(age);
CREATE INDEX person_email ON Person(email);
CREATE INDEX person_city ON Person(city);
CREATE INDEX person_zip ON Person(zip);
CREATE INDEX person_created ON Person(createdAt);
-- Too many indexes slow down writes

Good:

-- Index only frequently queried/filtered properties
CREATE INDEX person_email ON Person(email);  -- Unique lookups
CREATE INDEX person_name ON Person(name);    -- Search by name

-- Skip indexes on rarely queried properties

❌ Antipattern 4: Ignoring Cardinality

Problem: Not considering relationship cardinality.

Bad:

-- Many-to-many without intermediate node
(student:Student)-[:ENROLLED_IN {grade: "A"}]->(course:Course)
-- What if student takes course multiple times?

Good:

-- Use intermediate node for many-to-many
(student:Student)-[:HAS]->(enrollment:Enrollment {
  grade: "A",
  semester: "Fall 2025",
  year: 2025
})-[:IN]->(course:Course)
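
The reason for the intermediate node is that each enrollment is its own fact with its own attributes. A small Python sketch of the same shape, with made-up records (the `Enrollment` class is hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    student: str
    course: str
    semester: str
    grade: str


enrollments = [
    Enrollment("Alice", "CS101", "Fall 2024", "C"),
    Enrollment("Alice", "CS101", "Fall 2025", "A"),   # retake of the same course
    Enrollment("Bob", "CS101", "Fall 2025", "B"),
]


def attempts(student, course):
    """All (semester, grade) pairs for one student in one course."""
    return [(e.semester, e.grade) for e in enrollments
            if e.student == student and e.course == course]


alice_cs101 = attempts("Alice", "CS101")
```

With a single `ENROLLED_IN` edge holding one `grade` property, Alice's second attempt would have to overwrite the first or be squeezed into a list property.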

Performance Considerations

Indexing Strategy

Index properties used in:

Exact matches: WHERE p.email = 'alice@example.com'
Range queries: WHERE p.age > 30
Sorting: ORDER BY p.name

-- Create appropriate index types
CREATE INDEX person_email ON Person(email);           -- B-tree for exact/range
CREATE FULLTEXT INDEX person_bio ON Person(bio);     -- Full-text search
CREATE VECTOR INDEX doc_embed ON Document(embedding); -- Vector similarity
CREATE SPATIAL INDEX loc_coords ON Location(coords);  -- Geo queries

Query Optimization

Principle: Start with specific, indexed lookups, then expand.

-- GOOD: Start with indexed lookup
MATCH (alice:Person {email: "alice@example.com"})  -- Index seek
MATCH (alice)-[:KNOWS]->(friend)
RETURN friend.name

-- BAD: Full scan first
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.email = "alice@example.com"  -- Scans all persons first
RETURN friend.name

Next Steps

Data Types Reference - Complete type system
Query Performance - Optimize queries with EXPLAIN/PROFILE
Indexing Guide - Index types and strategies
GQL Guide - Query language syntax
Migration Guide - Migrate from other graph databases
