Geode's data governance features cover managing data quality, tracking lineage, enforcing policies, and maintaining compliance across your graph database. This guide covers the tools and practices for implementing data governance in enterprise environments.
Data Governance Overview
Data governance ensures that data is:
- Accurate: Data quality meets business requirements
- Secure: Access is controlled and audited
- Compliant: Regulatory requirements are met
- Discoverable: Users can find and understand data
- Traceable: Data lineage is documented
- Consistent: Standards are enforced across the organization
Geode provides built-in features to support all aspects of data governance.
Data Quality Management
Schema Constraints
Define data quality rules using schema constraints:
-- Ensure email addresses are valid
CREATE CONSTRAINT valid_email
ON (u:User)
ASSERT u.email MATCHES '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';
-- Ensure phone numbers follow E.164 format
CREATE CONSTRAINT valid_phone
ON (c:Contact)
ASSERT c.phone MATCHES '^\+[1-9]\d{1,14}$';
-- Ensure dates are in the future for events
CREATE CONSTRAINT future_event
ON (e:Event)
ASSERT e.event_date > current_date();
-- Ensure numeric ranges
CREATE CONSTRAINT valid_age
ON (p:Person)
ASSERT p.age >= 0 AND p.age <= 150;
-- Ensure required fields are present
CREATE CONSTRAINT required_fields
ON (p:Product)
ASSERT p.name IS NOT NULL
AND p.sku IS NOT NULL
AND p.price IS NOT NULL;
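The same rules can also be checked on the client before a write is attempted, which gives callers a readable list of failures instead of a rejected transaction. The following is a minimal sketch, not part of Geode: the function name `violations` and the flat record dictionary are hypothetical, and the patterns are copied from the `valid_email`, `valid_phone`, and `valid_age` constraints above. Absent values are skipped here; whether the database constraints treat NULL the same way is an assumption to verify.

```python
import re

# Patterns copied from the valid_email and valid_phone constraints.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")


def violations(record: dict) -> list[str]:
    """Return the names of the fields in `record` that break a rule.

    Hypothetical helper: only values that are present are checked.
    """
    failed = []
    email = record.get("email")
    if email is not None and not EMAIL_RE.fullmatch(email):
        failed.append("email")
    phone = record.get("phone")
    if phone is not None and not E164_RE.fullmatch(phone):
        failed.append("phone")
    age = record.get("age")
    if age is not None and not (0 <= age <= 150):
        failed.append("age")
    return failed
```

Calling `violations({"email": "not-an-email", "age": 30})` returns `["email"]`, so the caller can report exactly which field needs correcting.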
Data Quality Checks
Implement automated data quality checks:
-- Find records with missing required data
MATCH (p:Person)
WHERE p.email IS NULL
OR p.name IS NULL
OR p.created_at IS NULL
RETURN count(p) AS incomplete_records;
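-- Find duplicate records
-- Note: this counts matching pairs per email, so n nodes sharing an email yield n*(n-1)/2 pairs
MATCH (p1:Person), (p2:Person)
WHERE p1.email = p2.email
AND id(p1) < id(p2)
RETURN p1.email, count(*) AS duplicate_pairs;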
-- Find BELONGS_TO relationships that do not connect an Entity to a Group
MATCH ()-[r:BELONGS_TO]->()
WHERE NOT EXISTS {
MATCH (n)-[r]->(m)
WHERE n:Entity AND m:Group
}
RETURN count(r) AS invalid_relationships;
-- Validate referential integrity: find orders with no linked customer
MATCH (o:Order)
WHERE NOT EXISTS {
MATCH (o)-[:ORDERED_BY]->(:Customer)
}
RETURN count(o) AS orders_without_customers;
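The duplicate check above compares nodes pairwise, which grows quadratically with the number of `Person` nodes. If the records are already exported, grouping by email in a single pass gives the same answer in linear time. This is a sketch under assumptions: `duplicate_emails` is a hypothetical helper, and each record is assumed to be a dictionary with `id` and `email` keys.

```python
from collections import defaultdict


def duplicate_emails(people: list[dict]) -> dict[str, list]:
    """Map each email that occurs more than once to the ids sharing it."""
    groups = defaultdict(list)
    for person in people:
        email = person.get("email")
        if email is not None:  # records without an email cannot be duplicates
            groups[email].append(person["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}


people = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "a@example.com"},
    {"id": 3, "email": "b@example.com"},
    {"id": 4, "email": None},
    {"id": 5, "email": "a@example.com"},
]
result = duplicate_emails(people)
```

Here `result` holds one entry for `a@example.com` listing ids 1, 2, and 5, whereas a pairwise comparison would report three pairs for the same three nodes.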
Data Quality Metrics
Track data quality over time:
-- Create data quality metrics
CREATE (:DataQualityMetric {
name: 'email_completeness',
timestamp: current_timestamp(),
total_records: count {MATCH (p:Person) RETURN p},
complete_records: count {MATCH (p:Person) WHERE p.email IS NOT NULL RETURN p},
completeness_pct: 100.0 * count {MATCH (p:Person) WHERE p.email IS NOT NULL RETURN p} /
count {MATCH (p:Person) RETURN p}
});
-- Query quality trends
MATCH (m:DataQualityMetric)
WHERE m.name = 'email_completeness'
AND m.timestamp > current_timestamp() - duration('P30D')
RETURN m.timestamp, m.completeness_pct
ORDER BY m.timestamp;
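The completeness percentage stored above is complete records divided by total records, times 100. One edge case worth handling explicitly is an empty dataset, where the division is undefined. A minimal sketch of the calculation, with the hypothetical helper `completeness_pct` returning `None` in that case (how Geode itself handles a zero denominator is not covered here):

```python
from typing import Optional


def completeness_pct(records: list[dict], field: str) -> Optional[float]:
    """Percentage of records where `field` is present and not None."""
    total = len(records)
    if total == 0:
        return None  # no records: completeness is undefined, not 0 or 100
    complete = sum(1 for record in records if record.get(field) is not None)
    return 100.0 * complete / total


sample = [{"email": "a@example.com"}, {"email": None}, {}, {"email": "b@example.com"}]
pct = completeness_pct(sample, "email")
```

For `sample`, two of the four records carry an email, so `pct` is `50.0`.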
Data Lineage Tracking
Track how data flows through your systems and transformations.
Lineage Model
Model data lineage in the graph:
-- Source systems
CREATE (:DataSource {
id: 'crm_system',
name: 'Customer CRM',
type: 'database',
connection: 'postgresql://crm.example.com'
});
-- Data transformations
CREATE (:DataTransformation {
id: 'etl_customer_enrichment',
name: 'Customer Data Enrichment',
type: 'ETL',
script: 'customer_enrichment.py',
version: '2.1.0',
last_run: current_timestamp()
});
-- Data assets
CREATE (:DataAsset {
id: 'customer_360',
name: 'Customer 360 View',
type: 'graph',
schema: 'Person, Company, Product'
});
-- Create lineage relationships
MATCH (source:DataSource {id: 'crm_system'})
MATCH (transform:DataTransformation {id: 'etl_customer_enrichment'})
MATCH (asset:DataAsset {id: 'customer_360'})
CREATE (source)-[:FEEDS]->(transform)
CREATE (transform)-[:PRODUCES]->(asset);
Automatic Lineage Tracking
Enable automatic lineage tracking:
# Enable lineage tracking for all queries
geode serve --lineage-tracking=enabled \
--lineage-detail=full \
--lineage-storage=graph
Query lineage information:
-- Find all data sources for a specific asset
-- (the lineage model uses FEEDS into transformations and PRODUCES into assets, so follow both)
MATCH (source:DataSource)-[:FEEDS|PRODUCES*]->(asset:DataAsset {id: 'customer_360'})
RETURN source.name, source.type;
-- Trace downstream impact of a data source
MATCH (source:DataSource {id: 'crm_system'})-[:FEEDS|PRODUCES*]->(downstream)
RETURN downstream.name, labels(downstream);
-- Find all transformations applied to data
MATCH path = (source)-[:FEEDS|PRODUCES*]->(asset:DataAsset {id: 'customer_360'})
WHERE ANY(n IN nodes(path) WHERE 'DataTransformation' IN labels(n))
RETURN [n IN nodes(path) WHERE 'DataTransformation' IN labels(n) | n.name] AS transformations;
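The lineage model links sources to transformations with `FEEDS` and transformations to assets with `PRODUCES`, so an upstream trace has to follow both relationship types. The traversal itself is a breadth-first walk against the edge direction. The sketch below shows that walk over an in-memory edge list; the `EDGES` list and the `upstream` function are hypothetical stand-ins for data read from the graph, not a Geode API.

```python
from collections import deque

# Hypothetical copy of the lineage edges: (from_id, relationship, to_id).
EDGES = [
    ("crm_system", "FEEDS", "etl_customer_enrichment"),
    ("etl_customer_enrichment", "PRODUCES", "customer_360"),
]


def upstream(asset_id: str, edges: list[tuple[str, str, str]]) -> list[str]:
    """Return every node id that reaches `asset_id` via FEEDS or PRODUCES."""
    parents: dict[str, list[str]] = {}
    for src, rel, dst in edges:
        if rel in ("FEEDS", "PRODUCES"):
            parents.setdefault(dst, []).append(src)

    found: list[str] = []
    queue = deque([asset_id])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in found:  # guards against cycles and repeats
                found.append(parent)
                queue.append(parent)
    return found


lineage = upstream("customer_360", EDGES)
```

For the edges above, `lineage` lists the transformation first and the CRM source second, which is nearest-first order.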
Impact Analysis
Analyze the impact of changes:
-- Find all assets affected by changing a source
-- (follow both FEEDS and PRODUCES so the path reaches assets behind a transformation)
MATCH (source:DataSource {id: 'crm_system'})-[:FEEDS|PRODUCES*]->(affected)
RETURN DISTINCT labels(affected), affected.name
ORDER BY labels(affected);
-- Find all users affected by data source change
MATCH (source:DataSource {id: 'crm_system'})-[:FEEDS|PRODUCES*]->(asset:DataAsset)
MATCH (user:User)-[:USES]->(asset)
RETURN DISTINCT user.name, user.email;
Metadata Management
Data Catalog
Build a searchable data catalog:
-- Create catalog entries
CREATE (:CatalogEntry {
id: 'customers_table',
name: 'Customer Data',
type: 'dataset',
description: 'Primary customer information including contact details and preferences',
owner: 'data-team@example.com',
created_at: current_timestamp(),
updated_at: current_timestamp(),
tags: ['pii', 'customer', 'core'],
classification: 'confidential',
retention_period: duration('P7Y')
});
-- Add schema information
CREATE (:SchemaField {
name: 'email',
type: 'string',
description: 'Customer email address',
required: true,
pii: true,
example: 'customer@example.com'
});
-- Link catalog entries to schema
MATCH (catalog:CatalogEntry {id: 'customers_table'})
MATCH (field:SchemaField {name: 'email'})
CREATE (catalog)-[:HAS_FIELD]->(field);
Searchable Metadata
Search the data catalog:
-- Search by tag
MATCH (entry:CatalogEntry)
WHERE 'pii' IN entry.tags
RETURN entry.name, entry.description;
-- Search by classification
MATCH (entry:CatalogEntry)
WHERE entry.classification = 'confidential'
RETURN entry.name, entry.owner;
-- Substring search on name and description (CONTAINS matching may be case-sensitive)
MATCH (entry:CatalogEntry)
WHERE entry.name CONTAINS 'customer'
OR entry.description CONTAINS 'customer'
RETURN entry.name, entry.description;
Business Glossary
Define business terms and link to technical assets:
-- Create business terms
CREATE (:BusinessTerm {
id: 'customer_lifetime_value',
name: 'Customer Lifetime Value',
abbreviation: 'CLV',
definition: 'Predicted net profit attributed to the entire future relationship with a customer',
owner: 'finance-team@example.com',
approved_by: 'CFO',
approved_at: current_timestamp()
});
-- Link terms to data assets
-- (assumes a SchemaField named 'lifetime_value' has already been created)
MATCH (term:BusinessTerm {id: 'customer_lifetime_value'})
MATCH (field:SchemaField {name: 'lifetime_value'})
CREATE (term)-[:DEFINED_BY]->(field);
Access Policies
Policy-Based Access Control
Define and enforce access policies:
-- Create data access policy
CREATE (:DataAccessPolicy {
id: 'pii_access_policy',
name: 'PII Access Control',
description: 'Restrict access to personally identifiable information',
effective_date: current_timestamp(),
created_by: 'security-team@example.com'
});
-- Define policy rules
CREATE (:PolicyRule {
policy_id: 'pii_access_policy',
rule_type: 'row_level_security',
condition: 'user.has_role("pii_viewer") OR data.owner = current_user()',
action: 'allow'
});
-- Apply policies to data
MATCH (policy:DataAccessPolicy {id: 'pii_access_policy'})
MATCH (entry:CatalogEntry)
WHERE 'pii' IN entry.tags
CREATE (entry)-[:GOVERNED_BY]->(policy);
Data Classification
Classify data by sensitivity:
-- Define classification levels
CREATE (:ClassificationLevel {name: 'public', level: 1});
CREATE (:ClassificationLevel {name: 'internal', level: 2});
CREATE (:ClassificationLevel {name: 'confidential', level: 3});
CREATE (:ClassificationLevel {name: 'restricted', level: 4});
-- Classify data assets
MATCH (entry:CatalogEntry {id: 'customers_table'})
MATCH (level:ClassificationLevel {name: 'confidential'})
CREATE (entry)-[:CLASSIFIED_AS]->(level);
-- Enforce classification-based access
CREATE POLICY classification_access
ON CatalogEntry
FOR SELECT
USING {
MATCH (entry)-[:CLASSIFIED_AS]->(level:ClassificationLevel)
MATCH (user:User {id: current_user()})
WHERE user.clearance_level >= level.level
RETURN true
};
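The policy above reduces to a numeric comparison: a user may read an entry when their clearance level is at least the level of the entry's classification. A small sketch of that rule, useful for unit-testing clearance assignments outside the database. The `LEVELS` mapping mirrors the four `ClassificationLevel` nodes; `can_read` and `visible_entries` are hypothetical names, not Geode functions.

```python
# Mirrors the ClassificationLevel nodes defined above.
LEVELS = {"public": 1, "internal": 2, "confidential": 3, "restricted": 4}


def can_read(clearance_level: int, classification: str) -> bool:
    """True when the clearance meets or exceeds the classification level."""
    return clearance_level >= LEVELS[classification]


def visible_entries(clearance_level: int, entries: list[dict]) -> list[str]:
    """Names of the catalog entries a user with this clearance may read."""
    return [
        entry["name"]
        for entry in entries
        if can_read(clearance_level, entry["classification"])
    ]


catalog = [
    {"name": "Customer Data", "classification": "confidential"},
    {"name": "Press Releases", "classification": "public"},
]
visible = visible_entries(2, catalog)
```

A user with clearance 2 (internal) sees only the public entry in this example, because the customer data is classified at level 3.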
Data Retention and Lifecycle
Retention Policies
Define and enforce data retention:
-- Create retention policy
CREATE (:RetentionPolicy {
id: 'gdpr_customer_retention',
name: 'GDPR Customer Data Retention',
retention_period: duration('P7Y'),
deletion_method: 'secure_delete',
legal_basis: 'GDPR Article 5(1)(e)',
approved_by: 'legal-team@example.com'
});
-- Apply to data
MATCH (policy:RetentionPolicy {id: 'gdpr_customer_retention'})
MATCH (entry:CatalogEntry)
WHERE 'customer' IN entry.tags
CREATE (entry)-[:GOVERNED_BY]->(policy);
-- Find data eligible for deletion
MATCH (data)-[:GOVERNED_BY]->(policy:RetentionPolicy)
WHERE data.created_at + policy.retention_period < current_timestamp()
RETURN data.id, data.name, data.created_at;
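The eligibility test above is date arithmetic: a record qualifies once its creation time plus the retention period lies in the past. The sketch below reproduces that check for a period expressed in whole years, as in `P7Y`. The helper names are hypothetical, and mapping 29 February to 28 February in non-leap years is an assumption; Geode's own handling of that case may differ.

```python
from datetime import datetime, timezone


def add_years(moment: datetime, years: int) -> datetime:
    """Shift `moment` forward by whole calendar years."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return moment.replace(year=moment.year + years, day=28)


def eligible_for_deletion(created_at: datetime, retention_years: int,
                          now: datetime) -> bool:
    """True when created_at + retention period is earlier than `now`."""
    return add_years(created_at, retention_years) < now


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
expired = eligible_for_deletion(datetime(2018, 5, 31, tzinfo=timezone.utc), 7, now)
retained = eligible_for_deletion(datetime(2018, 6, 2, tzinfo=timezone.utc), 7, now)
```

With a seven-year period and `now` set to 1 June 2025, the record created on 31 May 2018 is eligible and the one created on 2 June 2018 is not.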
Automated Lifecycle Management
# Enable automated data lifecycle management
geode serve --lifecycle-management=enabled \
--lifecycle-check-interval=daily \
--lifecycle-enforcement=true
# Run manual lifecycle check
geode lifecycle-check --policy=gdpr_customer_retention \
--dry-run=true \
--output=lifecycle-report.json
Data Stewardship
Assign Data Stewards
-- Create stewardship assignments
CREATE (:DataSteward {
id: 'alice@example.com',
name: 'Alice Johnson',
title: 'Senior Data Steward',
department: 'Data Governance',
responsibilities: ['Customer Data', 'Product Data']
});
-- Assign stewards to data
MATCH (steward:DataSteward {id: 'alice@example.com'})
MATCH (entry:CatalogEntry)
WHERE 'customer' IN entry.tags
CREATE (steward)-[:RESPONSIBLE_FOR]->(entry);
-- Find steward for specific data
MATCH (steward:DataSteward)-[:RESPONSIBLE_FOR]->(entry:CatalogEntry {id: 'customers_table'})
RETURN steward.name, steward.id;
Compliance Reporting
Generate Compliance Reports
# Generate GDPR compliance report
geode governance-report --framework=gdpr \
--include=data-inventory,lineage,access-log \
--start-date=2025-01-01 \
--end-date=2025-12-31 \
--output=gdpr-compliance-2025.pdf
# Generate data quality report
geode governance-report --type=data-quality \
--metrics=completeness,accuracy,consistency \
--output=data-quality-report.json
# Generate access report
geode governance-report --type=access-audit \
--user=<user-email> \
--include-lineage=true \
--output=access-audit.json
Compliance Dashboards
-- Data quality dashboard metrics
MATCH (m:DataQualityMetric)
WHERE m.timestamp > current_timestamp() - duration('P1D')
RETURN m.name, avg(m.completeness_pct) AS avg_completeness
ORDER BY m.name;
-- Policy compliance metrics
-- OPTIONAL MATCH keeps policies with no violations; DISTINCT avoids counting each violation once per governed asset
MATCH (entry:CatalogEntry)-[:GOVERNED_BY]->(policy)
OPTIONAL MATCH (violation:PolicyViolation)-[:VIOLATED]->(policy)
RETURN policy.name,
count(DISTINCT entry) AS governed_assets,
count(DISTINCT violation) AS violations,
100.0 * (1 - count(DISTINCT violation)::float / count(DISTINCT entry)) AS compliance_pct;
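The compliance percentage in the query above is 100 × (1 − violations / governed assets). Two inputs make that formula misbehave: zero governed assets, and more violations than assets, which produces a negative percentage. The sketch below shows one way a dashboard could guard against both. The function `compliance_pct` is hypothetical, and clamping at zero is a presentation choice introduced here, not something the query does.

```python
from typing import Optional


def compliance_pct(governed_assets: int, violations: int) -> Optional[float]:
    """Compliance as a percentage, or None when nothing is governed."""
    if governed_assets == 0:
        return None  # no governed assets: the ratio is undefined
    raw = 100.0 * (1 - violations / governed_assets)
    return max(0.0, raw)  # assumption: report 0 instead of a negative value


healthy = compliance_pct(4, 1)
overloaded = compliance_pct(2, 5)
```

Four governed assets with one violation give `75.0`, while two assets with five violations are reported as `0.0` instead of `-150.0`.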
Data Discovery
Self-Service Discovery
Enable users to discover data:
-- Search catalog by keyword
MATCH (entry:CatalogEntry)
WHERE entry.name CONTAINS $keyword
OR entry.description CONTAINS $keyword
OR ANY(tag IN entry.tags WHERE tag CONTAINS $keyword)
RETURN entry.name, entry.description, entry.tags, entry.owner;
-- Browse by classification
MATCH (entry:CatalogEntry)-[:CLASSIFIED_AS]->(level:ClassificationLevel)
WHERE level.name = $classification
RETURN entry.name, entry.description;
-- Find related datasets
MATCH (entry:CatalogEntry {id: $dataset_id})-[:RELATED_TO*1..2]-(related:CatalogEntry)
RETURN DISTINCT related.name, related.description;
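The keyword search above checks three places: the entry name, the description, and each tag. The sketch below applies the same three checks to catalog entries already loaded into memory, for example behind a discovery UI. It lowercases both sides so that `customer` also matches `Customer Data`; that case-insensitivity is an addition made here and may not match how `CONTAINS` behaves in the database. The function name `search_catalog` is hypothetical.

```python
def search_catalog(entries: list[dict], keyword: str) -> list[str]:
    """Names of entries whose name, description, or tags contain `keyword`."""
    needle = keyword.lower()
    matches = []
    for entry in entries:
        in_name = needle in entry.get("name", "").lower()
        in_description = needle in entry.get("description", "").lower()
        in_tags = any(needle in tag.lower() for tag in entry.get("tags", []))
        if in_name or in_description or in_tags:
            matches.append(entry["name"])
    return matches


entries = [
    {"name": "Customer Data", "description": "Primary customer information",
     "tags": ["pii", "customer", "core"]},
    {"name": "Product Feed", "description": "Daily product export",
     "tags": ["product"]},
]
hits = search_catalog(entries, "customer")
```

Searching for `customer` returns only the first entry, and searching for `pii` finds the same entry through its tags.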
Best Practices
- Establish Clear Ownership: Assign data stewards for all critical data assets
- Document Everything: Maintain comprehensive metadata for all data
- Automate Quality Checks: Run automated data quality checks regularly
- Track Lineage: Enable automatic lineage tracking for all data flows
- Classify Appropriately: Classify all data by sensitivity level
- Enforce Policies: Use automated policy enforcement, not just documentation
- Regular Audits: Conduct periodic governance audits
- User Education: Train users on governance policies and tools
- Measure Effectiveness: Track governance metrics and KPIs
- Continuous Improvement: Regularly review and update governance policies
Related Topics
- Compliance - Regulatory compliance frameworks
- Audit Logging - Comprehensive audit trails
- Row-Level Security - Fine-grained access control
- Data Integrity - Data consistency and validation
- Encryption - Data protection with encryption
- Configuration - Governance configuration settings
- Schema - Schema design and constraints
- Authorization - Permission management