Graph Database Fundamentals
Graph databases represent a fundamental shift in how we model and query connected data. Unlike traditional relational databases that use tables and joins, graph databases store data as nodes (entities) and relationships (connections), making them naturally suited for highly connected datasets where relationships are as important as the data itself.
What is a Graph Database?
A graph database is a database management system that uses graph structures (nodes, relationships, and properties) to represent and store data. Every data item is stored either as a node or as an edge, and each edge records a relationship between two nodes.
Core Components
- Nodes: Represent entities (people, products, locations, etc.)
- Relationships: Connect nodes and represent how entities relate
- Properties: Key-value pairs attached to nodes and relationships
- Labels: Categorize nodes into groups (optional, implementation-specific)
-- Node example: Person with properties
(p:Person {
id: 'user123',
name: 'Alice Johnson',
email: 'alice@example.com',
age: 32
})
-- Relationship example: connecting two nodes
(alice:Person)-[r:KNOWS {since: DATE '2020-05-15', context: 'college'}]->(bob:Person)
Graph Theory Background
Graph databases are based on mathematical graph theory, a field that dates back to Leonhard Euler's 1736 analysis of the Seven Bridges of Königsberg.
Mathematical Definition
A graph G is defined as G = (V, E) where:
- V is a set of vertices (nodes)
- E is a set of edges (relationships)
In property graphs (like Geode), we extend this to:
- Each vertex has a label and properties: v = (label, {k₁: x₁, k₂: x₂, …})
- Each edge has a type and properties: e = (type, {k₁: x₁, k₂: x₂, …})
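The definition above can be sketched in Python as a few small record types. This is an illustrative in-memory model only, not Geode's storage format; the names `Node`, `Relationship`, and `PropertyGraph` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A vertex: a label plus a key-value property map."""
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed edge: source id, target id, a type, and a property map."""
    source: str
    target: str
    type: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """G = (V, E): V is keyed by node id, E is a list of relationships."""
    nodes: dict[str, Node] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_node(self, node_id: str, label: str, **properties: Any) -> None:
        self.nodes[node_id] = Node(label, properties)

    def add_relationship(self, source: str, target: str, rel_type: str, **properties: Any) -> None:
        # Both endpoints must already be in V, otherwise E would reference unknown vertices.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before they can be connected")
        self.relationships.append(Relationship(source, target, rel_type, properties))


# Hypothetical sample data mirroring the Person/KNOWS example.
graph = PropertyGraph()
graph.add_node("alice", "Person", name="Alice Johnson", age=32)
graph.add_node("bob", "Person", name="Bob")
graph.add_relationship("alice", "bob", "KNOWS", since="2020-05-15", context="college")
```

Keeping properties in plain dictionaries reflects the schema-optional nature of the model: two nodes with the same label may carry different keys.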
Graph Types
Directed vs. Undirected
- Directed: Relationships have direction (A→B is different from B→A)
- Undirected: Relationships have no direction (an edge between A and B can be followed either way)
Most graph databases, including Geode, use directed graphs where relationships have explicit direction.
-- Directed relationship
(alice:Person)-[:FOLLOWS]->(bob:Person)
-- Alice follows Bob, but Bob doesn't necessarily follow Alice
-- To query bidirectionally, omit the arrowhead
MATCH (a:Person)-[:FOLLOWS]-(b:Person)
WHERE a.name = 'Alice'
RETURN b.name;
-- Finds relationships in both directions
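The difference between a directed match and a direction-agnostic match can be sketched in a few lines of Python. The edge set and the function names `followed_by` and `connected_to` are hypothetical and exist only for this illustration.

```python
# Directed FOLLOWS edges stored as (source, target) pairs (hypothetical sample data).
follows = {("alice", "bob"), ("charlie", "alice")}


def followed_by(person: str) -> set[str]:
    """Directed match: only edges that start at `person`."""
    return {dst for src, dst in follows if src == person}


def connected_to(person: str) -> set[str]:
    """Direction-agnostic match: edges that start or end at `person`."""
    outgoing = {dst for src, dst in follows if src == person}
    incoming = {src for src, dst in follows if dst == person}
    return outgoing | incoming
```

Here `followed_by("alice")` sees only Bob, while `connected_to("alice")` also picks up Charlie, who follows Alice. The stored edges stay directed; only the query ignores direction.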
Weighted vs. Unweighted
- Weighted: Relationships have numeric weights (distances, costs, strengths)
- Unweighted: All relationships are equal
-- Weighted graph: social network with interaction strength
(user1)-[:INTERACTS {weight: 0.85, interactions: 47}]->(user2)
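To see why weights matter, consider route finding: the path with the fewest hops is not always the cheapest one. The sketch below uses Dijkstra's algorithm over a small hypothetical road map; the function name `cheapest_cost` and the data are assumptions for illustration and say nothing about how Geode evaluates weighted paths.

```python
import heapq


def cheapest_cost(graph: dict[str, dict[str, float]], start: str, goal: str) -> float:
    """Dijkstra's algorithm: lowest total edge weight from start to goal."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf")  # goal is unreachable


# Hypothetical road distances in kilometres.
roads = {
    "a": {"b": 4.0, "c": 1.0},
    "c": {"b": 2.0},
    "b": {"d": 5.0},
}
```

In this data the two-hop route a, b, d costs 9 km, while the three-hop route a, c, b, d costs 8 km, so `cheapest_cost(roads, "a", "d")` prefers the longer path in hops.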
Cyclic vs. Acyclic
- Cyclic: Contains cycles (paths that loop back)
- Acyclic: No cycles (directed acyclic graphs - DAGs)
-- DAG example: task dependencies (no circular dependencies allowed)
(task1:Task)-[:DEPENDS_ON]->(task2:Task)-[:DEPENDS_ON]->(task3:Task)
-- Cyclic example: social network (mutual follows create cycles)
(alice)-[:FOLLOWS]->(bob)-[:FOLLOWS]->(charlie)-[:FOLLOWS]->(alice)
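Whether a graph is acyclic has practical consequences: a dependency graph can only be executed in order if it has no cycles. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) is shown here; the helper name `execution_order` and the sample data are assumptions for this example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def execution_order(dependencies: dict[str, set[str]]) -> Optional[list[str]]:
    """Return an order in which every item follows its dependencies, or None if a cycle exists."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# DAG: task1 depends on task2, which depends on task3.
tasks = {"task1": {"task2"}, "task2": {"task3"}}

# Cyclic: alice -> bob -> charlie -> alice.
follows = {"alice": {"bob"}, "bob": {"charlie"}, "charlie": {"alice"}}
```

For the task graph this yields `task3`, `task2`, `task1`; for the follow graph it returns `None`, because no valid ordering exists once a cycle is present.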
Property Graph Model
Geode implements the property graph model, the most popular graph database model (used by Neo4j, Neptune, TigerGraph, and others).
Key Characteristics
- Nodes and Relationships are First-Class Citizens: Both can have properties
- Multi-Relational: Multiple relationship types between same nodes
- Schema-Optional: Can be schema-free, schema-enforced, or hybrid
- Type System: Rich data types (strings, numbers, dates, lists, maps)
Example: Social Network
-- Create users
INSERT (alice:Person {name: 'Alice', city: 'San Francisco', joined: DATE '2024-01-15'});
INSERT (bob:Person {name: 'Bob', city: 'New York', joined: DATE '2024-02-20'});
INSERT (charlie:Person {name: 'Charlie', city: 'Austin', joined: DATE '2024-03-10'});
-- Create relationships
INSERT (alice)-[:FOLLOWS {since: CURRENT_DATE}]->(bob);
INSERT (bob)-[:FOLLOWS {since: CURRENT_DATE}]->(charlie);
INSERT (charlie)-[:FOLLOWS {since: CURRENT_DATE}]->(alice);
-- Create content
INSERT (post:Post {title: 'Graph Databases 101', created: CURRENT_TIMESTAMP});
INSERT (alice)-[:AUTHORED]->(post);
INSERT (bob)-[:LIKED {timestamp: CURRENT_TIMESTAMP}]->(post);
-- Query the graph
MATCH (author:Person)-[:AUTHORED]->(post:Post)<-[:LIKED]-(liker:Person)
RETURN author.name, post.title, collect(liker.name) AS likers;
When to Use Graph Databases
Ideal Use Cases
1. Social Networks
- Friend connections, followers, recommendations
- Natural graph structure with nodes (users) and relationships (connections)
-- Find friends of friends who like similar content
MATCH (me:User {id: $user_id})-[:FRIEND]->(friend)-[:FRIEND]->(fof:User)
WHERE NOT EXISTS { MATCH (me)-[:FRIEND]->(fof) }
AND fof <> me
MATCH (me)-[:LIKES]->(interest:Interest)<-[:LIKES]-(fof)
RETURN fof.name, count(interest) AS common_interests
ORDER BY common_interests DESC
LIMIT 10;
2. Recommendation Engines
- Collaborative filtering, content recommendations, personalization
-- Recommend products based on similar users
MATCH (u1:User {id: $user_id})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)
MATCH (u2)-[:PURCHASED]->(rec:Product)
WHERE NOT EXISTS { MATCH (u1)-[:PURCHASED]->(rec) }
RETURN rec.name, count(u2) AS recommendation_strength
ORDER BY recommendation_strength DESC
LIMIT 20;
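The logic of the recommendation query can be sketched in plain Python to make the two steps explicit: find users who share a purchase, then count what else they bought. The purchase data and the function name `recommend` are hypothetical. Note one simplification: this sketch counts each similar user once per product, whereas the query above counts one row per shared purchase.

```python
from collections import Counter

# Hypothetical purchase history: user id -> set of product names.
purchases = {
    "u1": {"laptop", "mouse"},
    "u2": {"laptop", "keyboard", "monitor"},
    "u3": {"mouse", "keyboard"},
    "u4": {"desk"},
}


def recommend(user: str, limit: int = 20) -> list[tuple[str, int]]:
    """Rank products bought by similar users that `user` does not own yet."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user or not (items & owned):
            continue  # skip the user themselves and users with no shared purchase
        for item in items - owned:
            scores[item] += 1
    return scores.most_common(limit)
```

For `u1`, the keyboard scores 2 (bought by both `u2` and `u3`) and the monitor scores 1; the desk never appears because `u4` shares no purchase with `u1`.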
3. Fraud Detection
- Pattern matching for suspicious behavior, money laundering detection
-- Detect circular transaction patterns (potential money laundering)
MATCH path = (a:Account)-[:TRANSFER*3..5]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 10000)
AND all(r IN relationships(path) WHERE r.timestamp >= CURRENT_DATE - DURATION 'P7D')
RETURN path, reduce(total = 0, r IN relationships(path) | total + r.amount) AS total_amount
ORDER BY total_amount DESC;
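The circular-transfer pattern amounts to a bounded cycle search. The sketch below is a simplified Python version: it keeps only transfers above a threshold and enumerates simple cycles of 3 to 5 hops from one account. It omits the time-window filter, and a real engine may treat repeated nodes differently. The name `find_cycles` and the data are assumptions for illustration.

```python
def find_cycles(transfers, start, min_hops=3, max_hops=5, min_amount=10_000):
    """List simple cycles start -> ... -> start that use only large transfers."""
    # Keep only transfers above the threshold, grouped by sending account.
    outgoing = {}
    for src, dst, amount in transfers:
        if amount > min_amount:
            outgoing.setdefault(src, []).append(dst)

    cycles = []

    def walk(node, path):
        hops = len(path)  # number of edges used if one more step is taken from `node`
        for nxt in outgoing.get(node, []):
            if nxt == start:
                if hops >= min_hops:
                    cycles.append(path + [start])
            elif nxt not in path and hops < max_hops:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return cycles


# Hypothetical transfers: (sender, receiver, amount).
transfers = [
    ("a", "b", 20_000), ("b", "c", 15_000), ("c", "a", 12_000),
    ("a", "d", 50_000), ("d", "a", 50_000),
    ("c", "e", 500), ("e", "a", 90_000),
]
```

Only `a, b, c, a` is reported: the `a, d, a` loop is too short, and the route through `e` is dropped because the 500 transfer falls below the threshold.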
4. Knowledge Graphs
- Semantic networks, ontologies, Wikipedia-style interconnected knowledge
-- Explore knowledge graph connections
MATCH (entity:Entity {name: 'Albert Einstein'})
-[r1:RELATED_TO]->(related)
-[r2:RELATED_TO]->(further)
RETURN entity.name, type(r1), related.name, type(r2), further.name
LIMIT 100;
5. Network and Infrastructure Management
- IT systems, telecommunications, power grids, logistics networks
-- Find impact of server failure
MATCH (failed:Server {id: 'srv-42'})<-[:DEPENDS_ON*]-(affected:Service)
RETURN affected.name, affected.criticality, affected.users_affected
ORDER BY affected.criticality DESC;
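Impact analysis is a reverse traversal: start at the failed component and follow dependency edges backwards until nothing new is reached. A minimal Python sketch, assuming a hypothetical `depends_on` mapping and helper name `affected_by`:

```python
from collections import deque


def affected_by(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Everything that depends on `failed`, directly or transitively."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, ()):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


# Hypothetical infrastructure: service -> components it depends on.
depends_on = {
    "api": {"srv-42"},
    "web": {"api"},
    "batch": {"srv-7"},
    "reports": {"web", "batch"},
}
```

A failure of `srv-42` reaches `api`, `web`, and `reports`, but not `batch`, which depends only on `srv-7`.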
6. Access Control and Authorization
- Role hierarchies, permission inheritance, complex authorization rules
-- Check if user has permission (direct or inherited)
MATCH path = (u:User {id: $user_id})-[:HAS_ROLE*]->(r:Role)-[:HAS_PERMISSION]->(p:Permission)
WHERE p.action = $action AND p.resource = $resource
RETURN count(path) > 0 AS has_permission;
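Permission inheritance follows the same idea: walk up the role hierarchy until a role grants the requested permission. The sketch below separates user-to-role and role-to-parent edges for readability, which is an assumption of this example rather than something the query above requires; all names and data are hypothetical.

```python
def has_permission(user, action, resource, user_roles, parent_roles, role_permissions):
    """True if any role reachable from the user grants (action, resource)."""
    pending = list(user_roles.get(user, ()))
    visited = set()
    while pending:
        role = pending.pop()
        if role in visited:
            continue  # guards against cycles in the role hierarchy
        visited.add(role)
        if (action, resource) in role_permissions.get(role, ()):
            return True
        pending.extend(parent_roles.get(role, ()))
    return False


# Hypothetical sample data: the editor role inherits from viewer.
user_roles = {"u1": ["editor"], "u2": ["viewer"]}
parent_roles = {"editor": ["viewer"]}
role_permissions = {"viewer": {("read", "doc")}, "editor": {("write", "doc")}}
```

User `u1` can read documents through the inherited `viewer` role, while `u2` cannot write because nothing above `viewer` grants that permission.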
When NOT to Use Graph Databases
1. Simple Tabular Data
If your data has few relationships and mostly consists of independent records, relational databases are simpler and more efficient.
2. High-Volume Transactional Systems
For pure OLTP with simple queries (e.g., account balance lookups), traditional RDBMS may be faster.
3. Large Binary Objects
Graph databases are not optimized for storing large files, videos, or images. Use object storage (S3) and store references in the graph.
4. Pure Aggregations Without Relationships
For time-series data or analytics without relationship traversal, specialized databases (InfluxDB, ClickHouse) may be better.
Graph Database Architecture
Storage Strategies
1. Native Graph Storage
Stores graph structure using adjacency lists or similar graph-native formats (Geode, Neo4j).
Advantages:
- Optimized for traversals
- Index-free adjacency (no joins needed)
- High performance for graph queries
2. Non-Native Storage
Uses relational or document storage underneath with graph abstraction layer.
Advantages:
- Leverages existing database infrastructure
- Familiar backup/recovery tools
Disadvantages:
- Slower traversals (requires joins)
- Limited scalability for deep queries
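The difference between the two strategies can be illustrated with a toy Python model. It is a deliberately simplified sketch (real engines use on-disk structures and indexes), and the names `edge_table`, `adjacency`, and `two_hops` are assumptions for this example.

```python
# Non-native style: relationships live in one edge table that must be searched.
edge_table = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]


def neighbors_via_table(node):
    # Without an index this scans every row, much like an unindexed join.
    return [dst for src, dst in edge_table if src == node]


# Native style: each node holds direct references to its neighbors.
adjacency = {}
for src, dst in edge_table:
    adjacency.setdefault(src, []).append(dst)
    adjacency.setdefault(dst, [])


def neighbors_via_adjacency(node):
    # One lookup for the node, then cost proportional to its own degree only.
    return adjacency[node]


def two_hops(node, neighbors):
    """Two-hop traversal using whichever neighbor function is supplied."""
    return sorted({far for near in neighbors(node) for far in neighbors(near)})
```

Both functions return the same answers, but each hop through `neighbors_via_table` touches the whole edge table, while `neighbors_via_adjacency` touches only the edges of the current node. That gap widens with every additional hop.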
Query Processing
Graph Traversal Algorithms
- Breadth-First Search (BFS): Explore neighbors level by level
- Depth-First Search (DFS): Explore deeply before backtracking
- Bidirectional Search: Search from both endpoints simultaneously
-- BFS implicitly used for shortest path
MATCH path = SHORTEST (a:Person)-[:KNOWS*]-(b:Person)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN path;
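For reference, breadth-first shortest path on an unweighted graph can be sketched as follows. This shows the textbook algorithm, not Geode's internal implementation; the function name `shortest_path` and the sample `knows` pairs are hypothetical.

```python
from collections import deque


def shortest_path(knows, start, goal):
    """Return the node list of a fewest-hops path, or None if none exists."""
    # Treat KNOWS as undirected, matching the direction-agnostic pattern.
    neighbors = {}
    for a, b in knows:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk parent links back to the start
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical sample data: two routes from Alice to Bob.
knows = [("Alice", "Carol"), ("Carol", "Bob"), ("Alice", "Dan"), ("Dan", "Eve"), ("Eve", "Bob")]
```

Because BFS expands nodes level by level, the first time it reaches the goal is guaranteed to be along a path with the fewest hops, here Alice, Carol, Bob.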
Index Usage
Graph databases use indexes for:
- Finding starting nodes (e.g., by property value)
- Filtering during traversal
- Accelerating aggregations
-- Index scan to find starting node
CREATE INDEX person_name ON Person(name);
-- Query uses index to find 'Alice', then traverses
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2..3]->(friend)
RETURN friend.name;
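The division of labour between index and traversal can be sketched in Python: the index is used once to locate the starting node, and everything after that follows adjacency. The dictionaries and the function name `friends_within` are assumptions for illustration.

```python
# Hypothetical sample data: node id -> properties, and outgoing KNOWS edges.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"},
         4: {"name": "Dave"}, 5: {"name": "Eve"}}
knows = {1: [2], 2: [3], 3: [4], 4: [5]}

# The "index": property value -> node ids, built once so lookups avoid scanning all nodes.
name_index = {}
for node_id, props in nodes.items():
    name_index.setdefault(props["name"], []).append(node_id)


def friends_within(name, min_hops=2, max_hops=3):
    """Names reachable from `name` in min_hops..max_hops outgoing KNOWS steps."""
    found = set()
    frontier = set(name_index.get(name, []))  # index lookup for the starting nodes
    for hop in range(1, max_hops + 1):
        # Traversal step: follow adjacency only, no index involved.
        frontier = {nxt for node in frontier for nxt in knows.get(node, [])}
        if hop >= min_hops:
            found |= frontier
    return sorted(nodes[n]["name"] for n in found)
```

Starting from Alice, the 2 to 3 hop range reaches Carol and Dave; Bob is too close and Eve is too far.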
Graph Query Languages
GQL (Graph Query Language)
ISO/IEC 39075:2024 standard implemented by Geode.
Advantages:
- International standard (like SQL for relational)
- Vendor-neutral
- Modern design (learned from SQL, Cypher, SPARQL)
- Pattern matching syntax
- Strong typing
-- GQL syntax
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.age > 30
RETURN a.name, b.name, r.since;
Cypher
Neo4j's declarative query language, also published as the openCypher specification (influenced GQL).
-- Cypher syntax (very similar to GQL)
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.age > 30
RETURN a.name, b.name, r.since
Gremlin
Apache TinkerPop’s imperative traversal language.
// Gremlin syntax (procedural)
g.V().hasLabel('Person')
.has('age', gt(30))
.outE('KNOWS')
.inV()
.path()
SPARQL
W3C standard for RDF/semantic graphs.
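# SPARQL syntax (for RDF triples)
# The default ":" prefix below is a placeholder namespace, not a real vocabulary.
# Plain triples cannot carry edge properties, so "since" is not selected here.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://example.org/>
SELECT ?aName ?bName
WHERE {
  ?a rdf:type :Person .
  ?a :knows ?b .
  ?a :age ?age .
  ?a :name ?aName .
  ?b :name ?bName .
  FILTER (?age > 30)
}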
Comparison with Other Database Types
vs. Relational Databases (SQL)
| Aspect | Graph DB | Relational DB |
|---|---|---|
| Model | Nodes and relationships | Tables and foreign keys |
| Relationships | First-class citizens | Represented by joins |
| Schema | Flexible (optional) | Rigid (required) |
| Joins | Index-free adjacency | Expensive JOIN operations |
| Use Case | Connected data queries | Structured tabular data |
| Performance | Fast multi-hop traversals | Slow for deep joins |
Example: Find friends of friends
-- Graph DB: natural and fast
MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE fof <> me
RETURN fof.name;
-- Relational: requires self-joins
SELECT fof.name
FROM persons me
JOIN friendships f1 ON me.id = f1.person1_id
JOIN persons friend ON f1.person2_id = friend.id
JOIN friendships f2 ON friend.id = f2.person1_id
JOIN persons fof ON f2.person2_id = fof.id
WHERE me.name = 'Alice' AND fof.id <> me.id;
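The relational version can be tried end to end with Python's built-in `sqlite3` module. The table layout follows the column names used in the query above, and the three sample rows are hypothetical; the name is selected from the `fof` alias, since `friendships` rows carry only ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendships (person1_id INTEGER, person2_id INTEGER);
    INSERT INTO persons VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    -- Alice -> Bob, Bob -> Carol, Bob -> Alice
    INSERT INTO friendships VALUES (1, 2), (2, 3), (2, 1);
""")

rows = conn.execute("""
    SELECT fof.name
    FROM persons me
    JOIN friendships f1 ON me.id = f1.person1_id
    JOIN persons friend ON f1.person2_id = friend.id
    JOIN friendships f2 ON friend.id = f2.person1_id
    JOIN persons fof ON f2.person2_id = fof.id
    WHERE me.name = ? AND fof.id <> me.id
""", ("Alice",)).fetchall()

friends_of_friends = [name for (name,) in rows]
```

Each extra hop adds two more joins (one on `friendships`, one on `persons`), which is why deep traversals become unwieldy in SQL while the graph pattern simply grows by one more relationship.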
vs. Document Databases (MongoDB, etc.)
| Aspect | Graph DB | Document DB |
|---|---|---|
| Relationships | Explicit and navigable | Embedded or referenced |
| Queries | Pattern matching | Key-value or document lookup |
| Consistency | ACID (Geode) | Varies by product and configuration |
| Use Case | Relationship-heavy | Document-centric |
Document DB Limitation: Relationships require application-level joins or denormalization.
vs. Key-Value Stores (Redis, DynamoDB)
| Aspect | Graph DB | Key-Value Store |
|---|---|---|
| Data Model | Complex graph | Simple key-value pairs |
| Query Power | Rich pattern matching | Get by key only |
| Relationships | First-class | Not supported |
| Performance | Fast traversals | Fastest single-key lookup |
vs. Other Graph Databases
Geode vs. Neo4j
- Standards: Geode uses GQL (ISO standard); Neo4j uses Cypher (originated at Neo4j, published as openCypher)
- Architecture: Both native graph storage
- ACID: Both full ACID
- Licensing: Geode open-source Apache 2.0; Neo4j dual-licensed (GPL/commercial)
Geode vs. Amazon Neptune
- Deployment: Geode on-premise or any cloud; Neptune AWS-only
- Query Language: Geode GQL; Neptune Gremlin, openCypher, and SPARQL
- Cost: Geode open-source; Neptune pay-per-instance
Geode vs. TigerGraph
- Language: GQL vs. GSQL
- Performance: Both high-performance; different optimization strategies
- ACID: Geode full ACID; TigerGraph also documents ACID transactions, so verify its consistency guarantees for your distributed configuration
ACID Properties in Graph Databases
Geode implements full ACID compliance:
- Atomicity: Transactions are all-or-nothing
- Consistency: Data integrity constraints are enforced
- Isolation: Concurrent transactions don’t interfere
- Durability: Committed data persists across failures
-- ACID transaction example
BEGIN TRANSACTION;
INSERT (u:User {id: 'u123', balance: 1000});
INSERT (u2:User {id: 'u456', balance: 500});
-- Transfer money atomically
MATCH (sender:User {id: 'u123'}), (receiver:User {id: 'u456'})
SET sender.balance = sender.balance - 100
SET receiver.balance = receiver.balance + 100;
COMMIT;
-- Either both updates happen, or neither does
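Atomicity is easiest to appreciate when a transfer fails halfway. The sketch below uses Python's built-in `sqlite3` as a stand-in (it is not Geode and not a graph database), because the all-or-nothing behaviour is the same idea. The table, the `CHECK` constraint, and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO users VALUES ('u123', 1000), ('u456', 500)")
conn.commit()


def transfer(sender, receiver, amount):
    """Move `amount` between accounts; returns False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            # Credit first, so a failing debit leaves a half-finished state to undo.
            conn.execute("UPDATE users SET balance = balance + ? WHERE id = ?", (amount, receiver))
            conn.execute("UPDATE users SET balance = balance - ? WHERE id = ?", (amount, sender))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM users ORDER BY id"))
```

A transfer of 100 succeeds and moves both balances. A transfer of 5000 credits the receiver, then fails the `CHECK` on the sender, and the rollback undoes the credit, so neither balance changes.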
Getting Started with Graph Databases
1. Model Your Domain
Identify entities (nodes) and relationships:
- Entities: What are the “things” in your system?
- Relationships: How do these things connect?
- Properties: What attributes describe entities and relationships?
2. Install Geode
# Build and start Geode (run from a checkout of the Geode source tree)
cd geode
make build
geode serve --listen 0.0.0.0:3141
3. Create Your First Graph
-- Create two nodes and a relationship (defining a schema first is optional in Geode)
INSERT (alice:Person {name: 'Alice', age: 30});
INSERT (bob:Person {name: 'Bob', age: 28});
INSERT (alice)-[:KNOWS {since: DATE '2020-01-01'}]->(bob);
4. Query Your Graph
-- Find Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name, friend.age;
5. Explore and Iterate
Try different queries, add more data, experiment with patterns:
-- More complex pattern
MATCH (p:Person)-[:KNOWS]->(friend)-[:KNOWS]->(fof)
WHERE p.name = 'Alice' AND fof <> p
RETURN fof.name, count(*) AS connections
ORDER BY connections DESC;
Best Practices
- Model Relationships Explicitly: Don’t hide relationships in properties
- Use Meaningful Relationship Types: KNOWS is better than RELATED_TO
- Index Frequently Queried Properties: Speed up node lookups
- Avoid Dense Nodes: Nodes with millions of relationships can slow queries
- Use Parameterized Queries: Enable query plan caching
- Monitor Query Performance: Profile slow queries with PROFILE command
Conclusion
Graph databases represent a paradigm shift from traditional relational databases, offering natural modeling and efficient querying of connected data. Geode’s implementation of the ISO/IEC 39075:2024 GQL standard provides enterprise-grade graph capabilities with ACID guarantees, making it suitable for production workloads ranging from social networks to fraud detection to knowledge graphs.
Explore the documentation below to dive deeper into specific aspects of graph database theory, architecture, and practical implementation with Geode.