Data integrity and consistency are fundamental to reliable database systems. Geode provides comprehensive mechanisms to ensure your graph data remains accurate, valid, and consistent through ACID transactions, schema constraints, validation rules, and automated integrity checks.
Data Integrity Fundamentals
Data integrity ensures that data is:
- Accurate: Data correctly represents real-world entities
- Valid: Data conforms to defined rules and constraints
- Consistent: Data is coherent across the database
- Complete: Required data is present
- Unique: Duplicate data is prevented where necessary
Geode enforces data integrity through multiple layers:
- Transaction guarantees: ACID properties ensure consistent state
- Schema constraints: Rules enforced at the database level
- Validation rules: Custom business logic validation
- Referential integrity: Relationship consistency enforcement
- Checksums: Cryptographic hashes that detect corruption of stored data
ACID Guarantees
Geode provides full ACID (Atomicity, Consistency, Isolation, Durability) compliance:
Atomicity
All operations in a transaction complete successfully or none do:
-- Transfer funds between accounts
BEGIN;
-- Debit and credit in one statement, so the credit cannot run without a matching debit
MATCH (src:Account {id: $from_id}), (dst:Account {id: $to_id})
WHERE src.balance >= $amount
SET src.balance = src.balance - $amount,
dst.balance = dst.balance + $amount;
-- If the statement raises an error, the entire transaction rolls back
COMMIT;
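To make the all-or-nothing behavior concrete, here is a minimal in-memory Python sketch of the transfer above. It does not talk to Geode: the `transfer` function and the `accounts` dictionary are hypothetical stand-ins that imitate a transaction by editing a copy of the state and handing it back only when every step succeeds.

```python
def transfer(accounts: dict, from_id: str, to_id: str, amount: int) -> dict:
    """Return new balances with the transfer applied, or raise and change nothing."""
    staged = dict(accounts)  # work on a copy, like uncommitted transaction state
    if from_id not in staged or to_id not in staged:
        raise KeyError("unknown account")
    if staged[from_id] < amount:
        raise ValueError("insufficient funds")
    staged[from_id] -= amount
    staged[to_id] += amount
    return staged  # "commit": the caller swaps the old state for the new one


accounts = {"alice": 100, "bob": 20}
accounts = transfer(accounts, "alice", "bob", 30)
print(accounts)  # {'alice': 70, 'bob': 50}

try:
    accounts = transfer(accounts, "alice", "bob", 500)
except ValueError:
    pass  # the failed transfer leaves both balances untouched
print(accounts)  # {'alice': 70, 'bob': 50}
```

The point of the sketch is that the debit never becomes visible without the credit, which is the property the transaction gives you on the server side.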
Consistency
Every committed transaction moves the database from one valid state to another:
-- Constraint ensures valid state
CREATE CONSTRAINT positive_balance
ON (a:Account)
ASSERT a.balance >= 0;
-- Transaction that violates constraint fails
BEGIN;
MATCH (a:Account {id: $account_id})
SET a.balance = -100; -- Fails: violates positive_balance constraint
COMMIT;
Isolation
Concurrent transactions don’t interfere with each other:
-- Transaction 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
MATCH (p:Product {sku: $sku})
WHERE p.inventory > 0
SET p.inventory = p.inventory - 1;
COMMIT;
-- Transaction 2 (concurrent)
-- Waits for Transaction 1 to complete
-- Ensures no double-booking of inventory
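The double-booking problem can be illustrated without a database. The sketch below is an analogy only: a `threading.Lock` stands in for the isolation that a serializable transaction provides, and the `Inventory` class is hypothetical. Geode's actual concurrency control is not described in this document and is likely finer-grained than a single lock.

```python
import threading


class Inventory:
    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()

    def reserve(self) -> bool:
        # The check and the decrement run as one isolated step.
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                return True
            return False


inventory = Inventory(stock=10)
results = []
results_lock = threading.Lock()


def buyer() -> None:
    ok = inventory.reserve()
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=buyer) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), inventory.stock)  # 10 0
```

Fifty buyers compete for ten units, and exactly ten reservations succeed because no two buyers can interleave between the check and the update.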
Durability
Committed changes survive system failures:
# Enable write-ahead logging for durability
geode serve --wal-enabled=true \
--wal-sync-mode=fsync \
--wal-flush-interval=100ms
# After crash, recover from WAL
geode recover --wal-directory=/var/lib/geode/wal
Schema Constraints
Define and enforce data quality rules at the schema level:
Uniqueness Constraints
-- Ensure email addresses are unique
CREATE CONSTRAINT unique_email
ON (u:User)
ASSERT u.email IS UNIQUE;
-- Ensure compound uniqueness
CREATE CONSTRAINT unique_person_name
ON (p:Person)
ASSERT (p.first_name, p.last_name, p.birth_date) IS UNIQUE;
-- Attempt to insert duplicate fails
CREATE (:User {email: 'alice@example.com'});
CREATE (:User {email: 'alice@example.com'}); -- Error: Constraint violation
Existence Constraints
-- Ensure required properties exist
CREATE CONSTRAINT required_user_fields
ON (u:User)
ASSERT u.email IS NOT NULL
AND u.created_at IS NOT NULL
AND u.status IS NOT NULL;
-- Insertion without required field fails
CREATE (:User {name: 'Bob'}); -- Error: Missing required property 'email'
Property Type Constraints
-- Ensure property values match an expected format or range
CREATE CONSTRAINT email_format
ON (u:User)
ASSERT u.email MATCHES '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';
CREATE CONSTRAINT age_range
ON (p:Person)
ASSERT p.age >= 0 AND p.age <= 150;
CREATE CONSTRAINT positive_price
ON (p:Product)
ASSERT p.price > 0;
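The same rules can be mirrored in application code so that bad input is rejected before it reaches the database. This is a minimal sketch, assuming properties arrive as a plain dictionary; the `property_errors` helper is hypothetical, and missing values are deliberately skipped here because presence is the job of an existence constraint.

```python
import re

# Same pattern as the email_format constraint above.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def property_errors(label: str, props: dict) -> list:
    """Return the names of the value constraints that props would violate."""
    errors = []
    if label == "User":
        email = props.get("email")
        if email is not None and not (isinstance(email, str) and EMAIL_RE.match(email)):
            errors.append("email_format")
    if label == "Person":
        age = props.get("age")
        if age is not None and not (0 <= age <= 150):
            errors.append("age_range")
    if label == "Product":
        price = props.get("price")
        if price is not None and not price > 0:
            errors.append("positive_price")
    return errors


print(property_errors("User", {"email": "alice@example.com"}))  # []
print(property_errors("User", {"email": "not-an-email"}))       # ['email_format']
print(property_errors("Person", {"age": 151}))                  # ['age_range']
print(property_errors("Product", {"price": 0}))                 # ['positive_price']
```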
Relationship Constraints
-- Ensure relationship cardinality: each person reports to at most one manager
CREATE CONSTRAINT one_manager
ON (person:Person)
ASSERT count {(person)-[:REPORTS_TO]->()} <= 1;
-- Ensure relationship endpoints
CREATE CONSTRAINT valid_employment
ON ()-[r:EMPLOYED_BY]->()
ASSERT start_node(r):Person AND end_node(r):Company;
-- Ensure bidirectional relationships
CREATE CONSTRAINT symmetric_friendship
ON ()-[r:FRIENDS_WITH]->()
ASSERT EXISTS {
MATCH (a)-[r:FRIENDS_WITH]->(b)
MATCH (b)-[:FRIENDS_WITH]->(a)
};
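To show what the cardinality and symmetry rules above actually check, here is a small Python sketch over exported edge lists. The function names and the `(source, target)` tuple layout are assumptions made for illustration, not part of Geode.

```python
from collections import Counter


def managers_over_limit(reports_to: list) -> list:
    """People with more than one outgoing REPORTS_TO edge."""
    counts = Counter(person for person, _manager in reports_to)
    return sorted(p for p, n in counts.items() if n > 1)


def one_way_friendships(friends_with: list) -> list:
    """FRIENDS_WITH edges that have no matching reverse edge."""
    edges = set(friends_with)
    return sorted((a, b) for a, b in edges if (b, a) not in edges)


reports_to = [("ann", "cy"), ("bob", "cy"), ("bob", "dee")]
friends_with = [("ann", "bob"), ("bob", "ann"), ("ann", "cy")]

print(managers_over_limit(reports_to))    # ['bob']
print(one_way_friendships(friends_with))  # [('ann', 'cy')]
```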
Referential Integrity
Ensure relationships reference valid entities:
Foreign Key Constraints
-- Ensure order references valid customer
CREATE CONSTRAINT valid_customer_reference
ON (o:Order)
ASSERT EXISTS {
MATCH (o)-[:ORDERED_BY]->(c:Customer)
};
-- Prevent orphaned orders
-- Deletion of customer fails if orders exist
MATCH (c:Customer {id: $customer_id})
DELETE c; -- Error if orders reference this customer
Cascade Operations
-- Delete customer and all related orders
-- DETACH removes the connecting relationships along with the nodes
MATCH (c:Customer {id: $customer_id})
OPTIONAL MATCH (c)<-[:ORDERED_BY]-(o:Order)
DETACH DELETE c, o;
-- Changing a key property needs no cascade:
-- relationships are attached to the node itself, not to its id value
MATCH (c:Customer {id: $old_id})
SET c.id = $new_id
WITH c
MATCH (c)<-[:ORDERED_BY]-(o:Order)
RETURN count(o) AS orders_still_linked;
Preventing Dangling References
-- Before delete, check for references
MATCH (p:Product {id: $product_id})
WHERE NOT EXISTS {
MATCH (p)<-[:CONTAINS]-(order:Order)
WHERE order.status IN ['pending', 'processing']
}
DELETE p;
-- If a pending or processing order still contains the product,
-- the MATCH returns no rows and nothing is deleted
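The referential checks in this section boil down to two questions: does every order point at a customer that exists, and is a product still referenced by an active order? A minimal Python sketch over exported records follows; the record layout (`customer_id`, `product_ids`, `status`) and both helper names are hypothetical.

```python
ACTIVE_STATUSES = {"pending", "processing"}


def orders_missing_customer(orders: list, customer_ids: set) -> list:
    """Ids of orders whose customer_id does not point at a known customer."""
    return [o["id"] for o in orders if o.get("customer_id") not in customer_ids]


def can_delete_product(product_id: str, orders: list) -> bool:
    """True when no pending or processing order still contains the product."""
    return not any(
        product_id in o["product_ids"] and o["status"] in ACTIVE_STATUSES
        for o in orders
    )


customers = {"c1", "c2"}
orders = [
    {"id": "o1", "customer_id": "c1", "status": "pending", "product_ids": ["p1"]},
    {"id": "o2", "customer_id": "c9", "status": "delivered", "product_ids": ["p2"]},
]

print(orders_missing_customer(orders, customers))  # ['o2']
print(can_delete_product("p1", orders))            # False
print(can_delete_product("p2", orders))            # True
```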
Validation Rules
Implement custom validation logic:
Pre-Insert Validation
-- Validate before insertion
CREATE FUNCTION validate_user(props MAP)
RETURNS BOOLEAN
AS $$
RETURN props.email IS NOT NULL
AND props.email MATCHES '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
AND props.age >= 18
AND props.agreed_to_terms = true
$$;
-- Use in constraint
CREATE CONSTRAINT valid_user
ON (u:User)
ASSERT validate_user(properties(u));
Cross-Field Validation
-- Ensure date ranges are valid
CREATE CONSTRAINT valid_date_range
ON (e:Event)
ASSERT e.end_date >= e.start_date;
-- Ensure conditional fields
CREATE CONSTRAINT conditional_fields
ON (o:Order)
ASSERT (o.status = 'shipped' AND o.tracking_number IS NOT NULL)
OR (o.status != 'shipped');
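Both cross-field rules above are implications, which is easier to see in ordinary code. The sketch below restates them in Python; the function names are hypothetical, and the second one uses the equivalent form "not shipped, or has a tracking number".

```python
from datetime import date
from typing import Optional


def event_is_valid(start_date: date, end_date: date) -> bool:
    """An event may not end before it starts."""
    return end_date >= start_date


def order_is_valid(status: str, tracking_number: Optional[str]) -> bool:
    """A shipped order must carry a tracking number; other statuses are unconstrained."""
    return status != "shipped" or tracking_number is not None


print(event_is_valid(date(2024, 5, 1), date(2024, 5, 3)))  # True
print(event_is_valid(date(2024, 5, 3), date(2024, 5, 1)))  # False
print(order_is_valid("shipped", None))                     # False
print(order_is_valid("pending", None))                     # True
```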
Business Rule Validation
-- Ensure inventory is sufficient for order
CREATE CONSTRAINT sufficient_inventory
ON ()-[r:CONTAINS]->()
ASSERT EXISTS {
MATCH (order)-[r]->(product:Product)
WHERE product.inventory >= r.quantity
};
Data Consistency Checks
Consistency Verification
# Run consistency check
geode check-consistency --full \
--report=/var/log/geode/consistency-report.json
# Example output:
# Checking node integrity... OK (1,234,567 nodes)
# Checking relationship integrity... OK (3,456,789 relationships)
# Checking constraint violations... FOUND 3 violations
# Checking orphaned relationships... OK
# Checking index consistency... OK
Automated Consistency Checks
# Enable periodic consistency checks
geode serve --consistency-check-enabled=true \
--consistency-check-interval=daily \
--consistency-check-time=03:00
# Alert on inconsistencies
geode serve --consistency-alert=true \
--alert-channel=email,pagerduty
Manual Verification Queries
-- Find nodes missing required properties
MATCH (u:User)
WHERE u.email IS NULL
OR u.created_at IS NULL
RETURN count(u) AS invalid_users;
-- Find orphaned relationships (ORDERED_BY edges that do not connect an Order to a Customer)
MATCH ()-[r:ORDERED_BY]->()
WHERE NOT EXISTS {
MATCH (o:Order)-[r]->(c:Customer)
}
RETURN count(r) AS orphaned_relationships;
-- Find duplicate data: group by email instead of comparing every pair of users
MATCH (u:User)
WITH u.email AS email, count(*) AS occurrences
WHERE occurrences > 1
RETURN email, occurrences;
-- Verify relationship integrity: WORKS_AT must run from a Person to a Company
MATCH (p)-[r:WORKS_AT]->(c)
WHERE NOT p:Person OR NOT c:Company
RETURN count(r) AS invalid_employment_relationships;
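When the data has already been exported, the same checks can run offline. This is a minimal sketch, assuming each user is a dictionary of properties; `duplicate_emails` and `missing_required` are hypothetical helper names.

```python
from collections import Counter


def duplicate_emails(users: list) -> dict:
    """Map each email that appears more than once to its number of occurrences."""
    counts = Counter(u["email"] for u in users if u.get("email") is not None)
    return {email: n for email, n in counts.items() if n > 1}


def missing_required(users: list, required=("email", "created_at")) -> list:
    """Users that lack at least one required property."""
    return [u for u in users if any(u.get(key) is None for key in required)]


users = [
    {"email": "alice@example.com", "created_at": "2024-01-01"},
    {"email": "alice@example.com", "created_at": "2024-02-01"},
    {"email": "bob@example.com"},
]

print(duplicate_emails(users))       # {'alice@example.com': 2}
print(len(missing_required(users)))  # 1
```

Grouping by value is linear in the number of users, whereas comparing every pair grows quadratically.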
Checksums and Verification
Data Checksums
# Enable data checksums
geode serve --data-checksums=enabled \
--checksum-algorithm=sha256 \
--verify-checksums-on-read=true
# Verify data integrity
geode verify-checksums --path=/var/lib/geode/data \
--report=checksum-report.json
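Conceptually, checksum verification stores a hash next to each piece of data and recomputes it on read. The sketch below shows the idea with Python's standard `hashlib`; how Geode lays out or stores its checksums is not covered here, so the `checksum` and `verify` helpers are illustrative only.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected: str) -> bool:
    """True when the payload still matches the checksum recorded at write time."""
    return checksum(payload) == expected


record = b'{"id": 42, "email": "alice@example.com"}'
stored = checksum(record)                 # recorded when the data is written
corrupted = record.replace(b"42", b"43")  # simulate a flipped value on disk

print(verify(record, stored))     # True
print(verify(corrupted, stored))  # False
```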
Write Verification
# Enable write verification
geode serve --verify-writes=true \
--write-verification-level=full
# Reads data back after write to confirm
Transaction Integrity
Savepoints
Create savepoints for partial rollback:
BEGIN;
-- First operation
CREATE (p:Person {name: 'Alice', email: 'alice@example.com'});
-- Create savepoint
SAVEPOINT after_person_creation;
-- Second operation: re-match Alice, because variables do not carry over between statements
MATCH (p:Person {email: 'alice@example.com'})
CREATE (p)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'});
-- Error occurs, rollback to savepoint
ROLLBACK TO SAVEPOINT after_person_creation;
-- Person still exists, company and relationship removed
COMMIT;
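The effect of a savepoint can be modeled as a named snapshot of the transaction's working state. The following Python sketch is a toy model under that assumption; the `Transaction` class is hypothetical and says nothing about how Geode implements savepoints internally.

```python
class Transaction:
    def __init__(self) -> None:
        self._working = []     # changes made so far in this transaction
        self._savepoints = {}  # savepoint name -> snapshot of the working state

    def create(self, item: str) -> None:
        self._working.append(item)

    def savepoint(self, name: str) -> None:
        self._savepoints[name] = list(self._working)

    def rollback_to(self, name: str) -> None:
        # Discard everything done after the savepoint, keep what came before.
        self._working = list(self._savepoints[name])

    def commit(self) -> list:
        return list(self._working)


txn = Transaction()
txn.create("Person:Alice")
txn.savepoint("after_person_creation")
txn.create("Company:Acme Corp")
txn.create("WORKS_AT:Alice->Acme Corp")
txn.rollback_to("after_person_creation")

committed = txn.commit()
print(committed)  # ['Person:Alice']
```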
Transaction Validation
-- Validate entire transaction before commit
BEGIN;
-- Create both nodes and the relationship in one statement so p and c stay in scope
CREATE (p:Person {name: 'Bob'})-[:WORKS_AT]->(c:Company {name: 'TechCo'});
-- Validate constraints before commit
CALL validate_transaction();
-- Only commits if all constraints satisfied
COMMIT;
Error Handling
Constraint Violation Handling
import asyncio

from geode_client import Client, ConstraintViolationError

client = Client('geode.example.com')

async def create_user():
    try:
        await client.run("""
            CREATE (:User {email: 'alice@example.com'})
        """)
    except ConstraintViolationError as e:
        # Handle duplicate email gracefully
        print(f"Constraint violation: {e.constraint_name}")
        print(f"Details: {e.message}")

asyncio.run(create_user())
Transaction Conflict Resolution
import asyncio

# Assumed to be exported by geode_client alongside ConstraintViolationError
from geode_client import ConflictError

async def decrement_inventory(client, sku, max_retries=3):
    retry_count = 0
    while retry_count < max_retries:
        try:
            async with client.connection() as txn:
                await txn.begin()
                # Read current value
                result = await txn.run("""
                    MATCH (p:Product {sku: $sku})
                    RETURN p.inventory, p.version
                """, sku=sku)
                inventory, version = result.rows[0]
                # Update with optimistic locking: the WHERE clause matches
                # only if no other writer has changed the version since the read
                await txn.run("""
                    MATCH (p:Product {sku: $sku})
                    WHERE p.version = $version
                    SET p.inventory = $new_inventory,
                        p.version = $version + 1
                """, sku=sku, version=version, new_inventory=inventory - 1)
                await txn.commit()
                break
        except ConflictError:
            retry_count += 1
            if retry_count >= max_retries:
                raise
            await asyncio.sleep(0.1 * (2 ** retry_count))  # Exponential backoff
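The retry loop above can be factored into a reusable helper. The sketch below is self-contained so it runs without a server: `ConflictError` is redefined locally as a stand-in for the client's exception, and `with_retries` and `flaky_update` are hypothetical names. It keeps the same backoff formula, doubling the delay after each failed attempt.

```python
import asyncio


class ConflictError(Exception):
    """Local stand-in for the client's conflict exception."""


async def with_retries(operation, max_retries=3, base_delay=0.1):
    """Await operation(), retrying on ConflictError with exponential backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return await operation()
        except ConflictError:
            if attempt == max_retries:
                raise  # out of attempts: surface the conflict to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))


calls = 0


async def flaky_update():
    """Fails twice with a conflict, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConflictError("version changed")
    return "committed"


result = asyncio.run(with_retries(flaky_update, base_delay=0.001))
print(result, calls)  # committed 3
```

In production code it is common to add random jitter to the delay so that competing clients do not retry in lockstep.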
Best Practices
- Define Constraints Early: Establish constraints during schema design, not after data is loaded
- Use ACID Transactions: Wrap related operations in transactions for consistency
- Validate Input: Validate data at application layer before database insertion
- Regular Consistency Checks: Run automated consistency checks periodically
- Monitor Constraint Violations: Alert on constraint violations to detect data quality issues
- Use Appropriate Isolation: Choose isolation level based on consistency requirements
- Implement Retry Logic: Handle transient errors with exponential backoff
- Test Constraints: Thoroughly test constraints before production deployment
- Document Data Rules: Maintain clear documentation of all data integrity rules
- Back Up Before Changes: Always back up before making schema or constraint changes
Monitoring Data Integrity
Integrity Metrics
# Monitor integrity metrics
geode stats --component=integrity \
--metrics=constraint_violations,checksum_errors,transaction_rollbacks
# Example output:
# Metric | Value | Trend
# --------------------------|-------|-------
# Constraint Violations | 12 | ↓
# Checksum Errors | 0 | →
# Transaction Rollbacks | 145 | ↑
# Orphaned Relationships | 3 | ↓
# Duplicate Nodes | 0 | →
Audit Integrity Events
# Enable integrity event logging
geode serve --audit-integrity-events=true \
--log-constraint-violations=true \
--log-data-corruption=true
Recovery from Integrity Issues
Constraint Violation Cleanup
-- Find and fix constraint violations
MATCH (u:User)
WHERE u.email IS NULL
SET u.email = 'unknown-' + toString(id(u)) + '@example.com';
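-- Remove duplicates, keeping the node with the lowest id for each email
-- (a plain DELETE fails while u2 still has relationships; move or remove them first)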
MATCH (u1:User), (u2:User)
WHERE u1.email = u2.email
AND id(u1) < id(u2)
DELETE u2;
Orphaned Data Cleanup
-- Remove orphaned relationships
MATCH ()-[r:ORDERED_BY]->()
WHERE NOT EXISTS {
MATCH (o:Order)-[r]->(c:Customer)
}
DELETE r;
Data Repair
# Repair corrupted data
geode repair --checksum-errors \
--orphaned-relationships \
--constraint-violations \
--dry-run
# Apply repairs
geode repair --apply
Related Topics
- Transactions - ACID transaction management
- Isolation - Transaction isolation levels
- Compliance - Regulatory compliance and data integrity
- Data Governance - Data quality and governance
- Audit Logging - Tracking data modifications
- Schema Design - Schema design and constraints
- Error Codes - Error code reference for integrity errors
- Backup and Recovery - Data protection and recovery