Migration Guide
This guide covers migrating data to Geode from other graph databases, relational databases, and between Geode environments.
Overview
Geode supports multiple migration scenarios:
| Source | Method | Complexity | Downtime |
|---|---|---|---|
| Neo4j | Export/Import | Medium | Minutes to hours |
| JanusGraph | Gremlin export | Medium | Minutes to hours |
| PostgreSQL | ETL pipeline | Medium | Varies |
| MongoDB | ETL pipeline | Medium | Varies |
| CSV/JSON | Bulk import | Low | Minutes |
| Another Geode | Replication/Backup | Low | Zero to minutes |
Migration from Neo4j
Compatibility Overview
Neo4j uses Cypher; Geode uses ISO GQL. Most basic Cypher is compatible:
| Cypher Feature | GQL Support | Notes |
|---|---|---|
| MATCH patterns | Full | Identical syntax |
| CREATE nodes | Full | Identical syntax |
| CREATE relationships | Full | Identical syntax |
| WHERE clauses | Full | Identical syntax |
| RETURN | Full | Identical syntax |
| MERGE | Partial | Use INSERT…ON CONFLICT |
| APOC procedures | None | Use built-in functions |
| FOREACH | Partial | Use UNWIND |
Export from Neo4j
Method 1: CSV Export
// Export nodes to CSV
CALL apoc.export.csv.query(
"MATCH (n) RETURN n.id, labels(n)[0] AS label, properties(n) AS props",
"nodes.csv",
{}
);
// Export relationships to CSV
CALL apoc.export.csv.query(
"MATCH (a)-[r]->(b) RETURN id(a) AS source, id(b) AS target, type(r) AS type, properties(r) AS props",
"relationships.csv",
{}
);
Method 2: JSON Export
// Export entire database to JSON
CALL apoc.export.json.all("export.json", {useTypes: true});
Method 3: Cypher Export
// Generate Cypher statements
CALL apoc.export.cypher.all("export.cypher", {
format: "cypher-shell",
useOptimizations: {type: "UNWIND_BATCH", unwindBatchSize: 1000}
});
Import to Geode
From CSV:
#!/bin/bash
# import-neo4j-csv.sh
# Import nodes
geode import csv \
--file nodes.csv \
--format auto \
--batch-size 1000 \
--mapping '{"id": "id", "label": "_label", "props": "_properties"}'
# Import relationships
geode import csv \
--file relationships.csv \
--format auto \
--batch-size 1000 \
--relationship-mode true \
--mapping '{"source": "_source", "target": "_target", "type": "_type", "props": "_properties"}'
From JSON:
geode import json \
--file export.json \
--format neo4j \
--batch-size 1000
From Cypher (with conversion):
# Convert Cypher to GQL (if needed)
geode migrate cypher-to-gql \
--input export.cypher \
--output import.gql
# Execute import
geode execute-file --file import.gql --batch-size 100
Query Migration
Convert common Cypher patterns to GQL:
// Neo4j: MERGE (create if not exists)
MERGE (p:Person {email: '[email protected]'})
ON CREATE SET p.name = 'Alice', p.created = datetime()
ON MATCH SET p.updated = datetime()
// Geode GQL equivalent
INSERT (:Person {email: '[email protected]', name: 'Alice', created: datetime()})
ON CONFLICT (email) DO UPDATE SET updated = datetime()
// Neo4j: APOC functions
CALL apoc.create.uuid() YIELD uuid
// Geode GQL equivalent
SELECT gen_random_uuid() AS uuid
// Neo4j: FOREACH
MATCH (p:Person)
FOREACH (i IN range(1, 10) | CREATE (p)-[:HAS_ITEM]->(:Item {num: i}))
// Geode GQL equivalent
MATCH (p:Person)
UNWIND range(1, 10) AS i
CREATE (p)-[:HAS_ITEM]->(:Item {num: i})
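For large exports, the MERGE rewrite above is mechanical in the simple single-node case, so it can be scripted. A minimal sketch (the `merge_to_insert` helper and its regex are illustrative, not part of any Geode tooling) that rewrites bare single-property MERGE statements and leaves everything else for manual review:

```python
import re

def merge_to_insert(stmt: str) -> str:
    """Rewrite a simple single-node MERGE into Geode's INSERT ... ON CONFLICT
    form. Handles only the one-line pattern MERGE (v:Label {key: value});
    anything more complex (ON CREATE/ON MATCH, paths) needs manual review."""
    m = re.match(r"MERGE \((\w+):(\w+) \{(\w+): (.+)\}\)\s*$", stmt.strip())
    if not m:
        return stmt  # leave unrecognized statements untouched
    _, label, key, value = m.groups()
    return (f"INSERT (:{label} {{{key}: {value}}}) "
            f"ON CONFLICT ({key}) DO NOTHING")
```

A bare MERGE with no ON CREATE/ON MATCH clause maps to DO NOTHING; statements that carry SET clauses still need hand conversion to DO UPDATE SET as shown above.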
Migration from JanusGraph
Export from JanusGraph
// Export vertices from the Gremlin console and save the result as JSON
vertices = g.V().project('id', 'label', 'properties').
    by(T.id).by(T.label).by(valueMap()).toList()
new File('vertices.json').text = groovy.json.JsonOutput.toJson(vertices)
// Export edges together with their endpoint ids
edges = g.E().project('id', 'label', 'source', 'target', 'properties').
    by(T.id).by(T.label).by(outV().id()).by(inV().id()).by(valueMap()).toList()
new File('edges.json').text = groovy.json.JsonOutput.toJson(edges)
Convert Gremlin to GQL
| Gremlin | GQL |
|---|---|
| g.addV('Person').property('name', 'Alice') | CREATE (:Person {name: 'Alice'}) |
| g.V().hasLabel('Person') | MATCH (p:Person) RETURN p |
| g.V().has('name', 'Alice') | MATCH (p {name: 'Alice'}) RETURN p |
| g.V().outE('KNOWS').inV() | MATCH (a)-[:KNOWS]->(b) RETURN b |
| g.V().both('KNOWS') | MATCH (a)-[:KNOWS]-(b) RETURN b |
| g.V().out('KNOWS').out('KNOWS') | MATCH (a)-[:KNOWS*2]->(b) RETURN b |
Import Script
#!/bin/bash
# import-janusgraph.sh
# Convert Gremlin JSON export to GQL
python3 << 'EOF'
import json

# Load JanusGraph export
with open('vertices.json') as f:
    vertices = json.load(f)
with open('edges.json') as f:
    edges = json.load(f)

# Generate GQL statements
with open('import.gql', 'w') as out:
    # Create nodes. Geode assigns new internal ids, so the original
    # JanusGraph id is preserved as a _src_id property; matching edges
    # by id(a)/id(b) would fail because those ids do not carry over.
    for v in vertices:
        props = ', '.join(f"{k}: {json.dumps(v['properties'][k][0])}"
                          for k in v['properties'] if v['properties'][k])
        props = f"_src_id: {json.dumps(v['id'])}" + (f", {props}" if props else "")
        out.write(f"CREATE (:{v['label']} {{{props}}});\n")
    # Create relationships, matching endpoints by the preserved source id
    for e in edges:
        props = ', '.join(f"{k}: {json.dumps(val)}"
                          for k, val in e.get('properties', {}).items())
        out.write(f"MATCH (a {{_src_id: {json.dumps(e['source'])}}}), "
                  f"(b {{_src_id: {json.dumps(e['target'])}}}) "
                  f"CREATE (a)-[:{e['label']} {{{props}}}]->(b);\n")
EOF
# Execute import
geode execute-file --file import.gql
Migration from Relational Databases
PostgreSQL to Geode
Schema Mapping:
PostgreSQL Table -> Geode Node Label
PostgreSQL Row -> Geode Node
PostgreSQL Column -> Geode Property
PostgreSQL Foreign Key -> Geode Relationship
PostgreSQL Join Table -> Geode Relationship with properties
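The join-table mapping above can be sketched as a small ETL step. Assuming illustrative column names (user_id, friend_id) and the Person/KNOWS model used in the import commands below, one join-table row becomes one relationship statement:

```python
# Hypothetical join-table rows, as exported from PostgreSQL
friendships = [
    {"user_id": 1, "friend_id": 2, "created_at": "2024-01-15"},
    {"user_id": 1, "friend_id": 3, "created_at": "2024-02-01"},
]

def row_to_gql(row):
    """Turn one join-table row into a GQL statement that matches both
    endpoints by the id carried over from PostgreSQL."""
    return (f"MATCH (a:Person {{id: {row['user_id']}}}), "
            f"(b:Person {{id: {row['friend_id']}}}) "
            f"CREATE (a)-[:KNOWS {{created_at: '{row['created_at']}'}}]->(b);")

statements = [row_to_gql(r) for r in friendships]
```

Extra columns on the join table (here created_at) become relationship properties, which is exactly what the table-to-graph mapping prescribes.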
Export from PostgreSQL:
-- Export users
COPY (
SELECT id, name, email, created_at
FROM users
) TO '/tmp/users.csv' WITH CSV HEADER;
-- Export relationships (friendship)
COPY (
SELECT user_id AS source, friend_id AS target, created_at
FROM friendships
) TO '/tmp/friendships.csv' WITH CSV HEADER;
Import to Geode:
# Import users as Person nodes
geode import csv \
--file users.csv \
--label Person \
--id-column id \
--batch-size 1000
# Import friendships as KNOWS relationships
geode import csv \
--file friendships.csv \
--relationship KNOWS \
--source-label Person \
--source-id source \
--target-label Person \
--target-id target \
--batch-size 1000
MySQL to Geode
# Export from MySQL (keep the header row so the importer can map columns by name)
mysql -u user -p database --batch -e "
SELECT id, name, email FROM users
" > users.tsv
# Import to Geode
geode import csv \
--file users.tsv \
--delimiter '\t' \
--label User \
--id-column id
MongoDB to Geode
# Export from MongoDB
mongoexport --db mydb --collection users --out users.json --jsonArray
# Import to Geode
geode import json \
--file users.json \
--label User \
--id-field _id
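One wrinkle: mongoexport writes extended JSON, so ObjectId values arrive as {"$oid": "..."} wrappers rather than plain strings. A minimal pre-processing sketch (assuming the importer expects a scalar _id) that flattens them:

```python
import json

def flatten_oid(doc):
    """Replace extended-JSON {"$oid": ...} wrappers with plain strings."""
    return {k: v["$oid"] if isinstance(v, dict) and "$oid" in v else v
            for k, v in doc.items()}

raw = '[{"_id": {"$oid": "64b0c0ffee64b0c0ffee64b0"}, "name": "Alice"}]'
cleaned = [flatten_oid(d) for d in json.loads(raw)]
```

Other extended-JSON wrappers ($date, $numberLong) would need the same treatment if they appear in the collection.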
Bulk Import
CSV Import
# Basic CSV import
geode import csv \
--file data.csv \
--label Person \
--batch-size 1000
# Advanced options
geode import csv \
--file data.csv \
--label Person \
--id-column uuid \
--skip-header true \
--delimiter ',' \
--quote '"' \
--escape '\\' \
--null-value 'NULL' \
--batch-size 5000 \
--parallel 4 \
--on-error skip \
--log-errors /var/log/geode/import-errors.log
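The quoting and NULL flags above only help if the file actually follows those conventions. A sketch of producing a compliant CSV from Python (the rows are illustrative; the NULL sentinel and quote character match the flags shown):

```python
import csv

# Hypothetical rows; None models a SQL NULL
rows = [
    {"uuid": "a1b2", "name": 'Bob "Bobby" Smith', "nickname": None},
    {"uuid": "c3d4", "name": "Alice", "nickname": "Ali"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["uuid", "name", "nickname"],
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for row in rows:
        # Emit the literal sentinel that --null-value 'NULL' expects
        writer.writerow({k: ("NULL" if v is None else v) for k, v in row.items()})
```

Python's csv module doubles embedded quote characters by default, which matches the `--quote '"'` convention above.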
JSON Import
# JSON array import
geode import json \
--file data.json \
--label Person \
--batch-size 1000
# JSONL (JSON Lines) import
geode import json \
--file data.jsonl \
--format jsonl \
--label Person \
--batch-size 1000
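JSONL suits streaming imports because each record stands alone on its own line. A minimal conversion sketch from array form (file names are illustrative; this loader still parses the whole array, so it fits one-time conversion rather than truly huge inputs):

```python
import json

def json_array_to_jsonl(src, dst):
    """Rewrite a JSON array file as JSON Lines: one compact object per line."""
    with open(src) as f:
        records = json.load(f)
    with open(dst, "w") as out:
        for rec in records:
            out.write(json.dumps(rec, separators=(",", ":")) + "\n")
    return len(records)

# Hypothetical sample data for demonstration
with open("data.json", "w") as f:
    json.dump([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], f)

count = json_array_to_jsonl("data.json", "data.jsonl")
```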
GQL Script Import
# Execute GQL file
geode execute-file \
--file import.gql \
--batch-size 100 \
--continue-on-error true
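When generating an import.gql file for execute-file, wrapping statements in explicit transactions keeps each batch atomic instead of relying on one giant commit. A sketch of that batching step (the 1000-statement chunk size is an assumption, not a Geode requirement):

```python
def batch_transactions(statements, batch_size=1000):
    """Wrap GQL statements in BEGIN/COMMIT blocks of at most batch_size each."""
    for i in range(0, len(statements), batch_size):
        chunk = statements[i:i + batch_size]
        yield "BEGIN TRANSACTION;\n" + "\n".join(chunk) + "\nCOMMIT;"

stmts = [f"CREATE (:Person {{id: {n}}});" for n in range(2500)]
batches = list(batch_transactions(stmts))
```

If one batch fails, only that batch rolls back, which pairs well with --continue-on-error.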
Environment Migration
Development to Production
#!/bin/bash
# migrate-dev-to-prod.sh
DEV_HOST="geode-dev.internal"
PROD_HOST="geode-prod.internal"
BACKUP_BUCKET="s3://geode-backups"
echo "=== Migrating Dev to Prod ==="
# 1. Create backup from dev
echo "Creating backup from dev..."
geode backup \
--host "$DEV_HOST" \
--dest "$BACKUP_BUCKET/dev-export-$(date +%Y%m%d)" \
--mode full
# 2. Optionally sanitize data
echo "Sanitizing data..."
# Remove test data, mask PII, etc.
# 3. Restore to prod (in maintenance window)
echo "Restoring to prod..."
geode restore \
--host "$PROD_HOST" \
--source "$BACKUP_BUCKET/dev-export-$(date +%Y%m%d)" \
--backup-id latest
echo "=== Migration Complete ==="
Cross-Region Migration
#!/bin/bash
# cross-region-migration.sh
SOURCE_REGION="us-east-1"
TARGET_REGION="eu-west-1"
SOURCE_HOST="geode.us-east.example.com"
TARGET_HOST="geode.eu-west.example.com"
echo "=== Cross-Region Migration ==="
# 1. Set up replication
echo "Setting up replication..."
geode admin replicate \
--source "$SOURCE_HOST" \
--target "$TARGET_HOST" \
--mode async
# 2. Wait for sync
echo "Waiting for initial sync..."
while true; do
LAG=$(geode admin replication-lag --target "$TARGET_HOST")
echo "Replication lag: $LAG bytes"
[ "$LAG" -lt 1000 ] && break
sleep 30
done
# 3. Stop writes to source
echo "Stopping writes to source..."
geode admin read-only --host "$SOURCE_HOST"
# 4. Final sync
echo "Final synchronization..."
sleep 60
# 5. Promote target
echo "Promoting target..."
geode admin promote --host "$TARGET_HOST"
# 6. Update DNS
echo "Update DNS to point to $TARGET_HOST"
echo "=== Migration Complete ==="
Data Validation
Pre-Migration Validation
-- Count nodes by label
MATCH (n)
RETURN labels(n)[0] AS label, count(n) AS count
ORDER BY label;
-- Count relationships by type
MATCH ()-[r]->()
RETURN type(r) AS type, count(r) AS count
ORDER BY type;
-- Check for endpoints missing the id property used for matching during import
MATCH (a)-[r]->(b)
WHERE a.id IS NULL OR b.id IS NULL
RETURN count(r) AS orphaned;
-- Find duplicates
MATCH (p:Person)
WITH p.email AS email, count(*) AS count
WHERE count > 1
RETURN email, count;
Post-Migration Validation
#!/bin/bash
# validate-migration.sh
SOURCE_HOST="$1"
TARGET_HOST="$2"
echo "=== Migration Validation ==="
# Compare node counts
echo "Comparing node counts..."
SOURCE_NODES=$(geode query "MATCH (n) RETURN count(n)" --host "$SOURCE_HOST" --format json | jq -r '.rows[0]["count(n)"]')
TARGET_NODES=$(geode query "MATCH (n) RETURN count(n)" --host "$TARGET_HOST" --format json | jq -r '.rows[0]["count(n)"]')
echo "Source nodes: $SOURCE_NODES"
echo "Target nodes: $TARGET_NODES"
if [ "$SOURCE_NODES" != "$TARGET_NODES" ]; then
echo "WARNING: Node count mismatch!"
fi
# Compare relationship counts
echo "Comparing relationship counts..."
SOURCE_RELS=$(geode query "MATCH ()-[r]->() RETURN count(r)" --host "$SOURCE_HOST" --format json | jq -r '.rows[0]["count(r)"]')
TARGET_RELS=$(geode query "MATCH ()-[r]->() RETURN count(r)" --host "$TARGET_HOST" --format json | jq -r '.rows[0]["count(r)"]')
echo "Source relationships: $SOURCE_RELS"
echo "Target relationships: $TARGET_RELS"
if [ "$SOURCE_RELS" != "$TARGET_RELS" ]; then
echo "WARNING: Relationship count mismatch!"
fi
# Spot check specific records
echo "Spot checking records..."
geode query "MATCH (p:Person {email: '[email protected]'}) RETURN p" --host "$TARGET_HOST"
echo "=== Validation Complete ==="
Performance Optimization
Bulk Loading Best Practices
- Disable constraints during import:
-- Temporarily disable constraints
ALTER DATABASE DISABLE CONSTRAINTS;
-- Import data
-- ...
-- Re-enable constraints
ALTER DATABASE ENABLE CONSTRAINTS;
- Use transactions for batching:
BEGIN TRANSACTION;
CREATE (:Person {id: 1, name: 'Alice'});
CREATE (:Person {id: 2, name: 'Bob'});
-- ... up to 1000 per transaction
COMMIT;
- Create indexes after import:
-- Import data first
-- Then create indexes
CREATE INDEX person_email ON Person(email);
CREATE INDEX person_name ON Person(name);
- Use parallel imports:
# Split the file and import chunks in parallel
# (caveat: split leaves the CSV header only in the first chunk)
split -l 100000 large-file.csv chunk-
ls chunk-* | parallel -j 4 'geode import csv --file {}'
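One caveat with plain split: the CSV header survives only in the first chunk, so later chunks would have their first data row misread as a header, or their columns mapped wrong. A sketch of a header-preserving splitter (function and file names are illustrative):

```python
import csv
import itertools

def split_csv(path, rows_per_chunk, prefix="chunk-"):
    """Split a CSV into numbered chunk files, repeating the header in each
    so every chunk can be imported independently."""
    written = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for n in itertools.count():
            rows = list(itertools.islice(reader, rows_per_chunk))
            if not rows:
                break
            name = f"{prefix}{n:03d}.csv"
            with open(name, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(rows)
            written.append(name)
    return written

# Hypothetical sample file for demonstration
with open("large-file.csv", "w", newline="") as f:
    csv.writer(f).writerows([["id", "name"]] + [[i, f"user{i}"] for i in range(5)])

chunks = split_csv("large-file.csv", 2)
```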
Memory Optimization
# For large imports, increase memory
export GEODE_MAX_MEMORY=16GB
export GEODE_IMPORT_BUFFER_SIZE=1GB
# Use streaming import for very large files
geode import csv \
--file huge-file.csv \
--streaming true \
--batch-size 1000
Troubleshooting
Import Fails with OOM
# Reduce batch size
geode import csv --batch-size 100
# Use streaming mode
geode import csv --streaming true
# Increase memory
export GEODE_MAX_MEMORY=32GB
Duplicate Key Errors
# Skip duplicates
geode import csv --on-duplicate skip
# Update duplicates
geode import csv --on-duplicate update
# Fail on duplicates (default)
geode import csv --on-duplicate fail
Character Encoding Issues
# Specify encoding
geode import csv --encoding utf-8
# Convert file first
iconv -f ISO-8859-1 -t UTF-8 input.csv > input-utf8.csv
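Where iconv is unavailable, the same conversion is straightforward in Python. A sketch (file names mirror the iconv example; the source encoding is an assumption you should verify before converting):

```python
def reencode(src, dst, src_encoding="latin-1"):
    """Re-encode a text file to UTF-8; equivalent to the iconv call above."""
    with open(src, encoding=src_encoding) as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)

# Hypothetical sample: a Latin-1 (ISO-8859-1) encoded file
with open("input.csv", "wb") as f:
    f.write("id,name\n1,Café\n".encode("latin-1"))

reencode("input.csv", "input-utf8.csv")
```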
Slow Import Performance
# Check index status (indexes slow down writes)
geode admin index-status
# Temporarily drop indexes
geode query "DROP INDEX person_email"
# Import data
geode import csv --file data.csv
# Recreate indexes
geode query "CREATE INDEX person_email ON Person(email)"
Related Documentation
- Backup Procedures - Backup-based migration
- Upgrade Procedures - Version migration
- Disaster Recovery - DR migration
- Schema Design - Schema planning
- Migration Guide (Detailed) - Extended migration guide