Backup and Recovery in Geode

Data is the lifeblood of modern applications. Robust backup and recovery capabilities protect against hardware failures, software bugs, human errors, and disasters. Geode provides comprehensive backup and recovery features including continuous archiving, point-in-time recovery, and automated disaster recovery procedures.

This guide covers backup strategies, recovery procedures, and best practices for ensuring data resilience in Geode deployments.

Understanding Backup and Recovery

Recovery Objectives

RPO (Recovery Point Objective): Maximum acceptable data loss

  • RPO = 0: No data loss (synchronous replication)
  • RPO = 1 minute: Lose at most 1 minute of data
  • RPO = 24 hours: Daily backups acceptable for non-critical data

RTO (Recovery Time Objective): Maximum acceptable downtime

  • RTO = seconds: Requires hot standby with automatic failover
  • RTO = minutes: Requires warm standby or fast restore
  • RTO = hours: Cold backup restoration acceptable
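The backup schedule bounds the achievable RPO. As a simplified model (ignoring synchronous replication), the worst-case data loss is set by the most frequent capture mechanism in play — a sketch:

```python
def effective_rpo_seconds(*capture_intervals: int) -> int:
    """Worst-case data loss under a simple model: bounded by the
    most frequent capture mechanism (full backup, incremental
    backup, or WAL archiving interval)."""
    return min(capture_intervals)

# Weekly fulls + daily incrementals + 60-second WAL shipping:
# the WAL upload interval dominates, giving a near-zero RPO.
assert effective_rpo_seconds(7 * 86400, 86400, 60) == 60

# Without WAL archiving, daily incrementals leave up to a day at risk.
assert effective_rpo_seconds(7 * 86400, 86400) == 86400
```

This is why the sections below pair scheduled backups with continuous WAL archiving rather than relying on either alone.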

Backup Types

Full Backup: Complete copy of all data

  • Largest storage requirement
  • Fastest restore (single operation)
  • Slowest to create

Incremental Backup: Changes since last backup

  • Smallest storage requirement
  • Requires full + all incrementals to restore
  • Fastest to create

Differential Backup: Changes since last full backup

  • Medium storage requirement
  • Requires full + latest differential to restore
  • Medium creation speed

Continuous Archiving (WAL): Stream write-ahead log

  • Enables point-in-time recovery
  • Minimal additional storage overhead
  • Near-zero RPO
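The restore requirements above can be made concrete. This sketch, over a hypothetical backup catalog, computes which backups each strategy needs at restore time:

```python
from datetime import date

# Hypothetical catalog: (name, type, date taken)
backups = [
    ("full-0125", "full", date(2026, 1, 25)),
    ("incr-0126", "incremental", date(2026, 1, 26)),
    ("diff-0127", "differential", date(2026, 1, 27)),
    ("incr-0128", "incremental", date(2026, 1, 28)),
]

def restore_set(backups, strategy):
    """Which backups a restore needs, per the rules above."""
    full = max((b for b in backups if b[1] == "full"), key=lambda b: b[2])
    if strategy == "incremental":
        # Full plus every incremental taken after it, in order.
        incrs = sorted((b for b in backups
                        if b[1] == "incremental" and b[2] > full[2]),
                       key=lambda b: b[2])
        return [full[0]] + [b[0] for b in incrs]
    if strategy == "differential":
        # Full plus only the latest differential.
        diff = max((b for b in backups if b[1] == "differential"),
                   key=lambda b: b[2])
        return [full[0], diff[0]]

assert restore_set(backups, "incremental") == ["full-0125", "incr-0126", "incr-0128"]
assert restore_set(backups, "differential") == ["full-0125", "diff-0127"]
```

Losing any link in an incremental chain breaks everything after it, which is the trade-off behind its smaller storage footprint.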

Backup Configuration

Full Database Backup

Create a complete backup of the database:

# Create full backup
./geode backup create \
  --output /backups/geode/full-$(date +%Y%m%d) \
  --type full \
  --compress gzip \
  --parallel 4

# Backup with encryption
./geode backup create \
  --output /backups/geode/full-$(date +%Y%m%d) \
  --type full \
  --encrypt \
  --encryption-key-file /etc/geode/backup.key

Configuration:

# geode.toml - Backup configuration
[backup]
enabled = true
directory = "/var/lib/geode/backups"

[backup.full]
# Schedule full backups
schedule = "0 2 * * 0"  # Weekly at 2 AM Sunday
retention_days = 30
compression = "gzip"
parallel_workers = 4

[backup.incremental]
# Schedule incremental backups
schedule = "0 2 * * 1-6"  # Daily at 2 AM Mon-Sat
retention_days = 7
base_backup = "latest_full"

Incremental Backups

Capture only changes since the last backup:

# Create incremental backup
./geode backup create \
  --output /backups/geode/incr-$(date +%Y%m%d-%H%M) \
  --type incremental \
  --base-backup /backups/geode/full-20260125

# List backup chain
./geode backup list --chain /backups/geode/full-20260125

Output:

Backup Chain for full-20260125:
  1. full-20260125      (12.4 GB, 2026-01-25 02:00:00)
  2. incr-20260126-0200 (234 MB, 2026-01-26 02:00:00)
  3. incr-20260127-0200 (189 MB, 2026-01-27 02:00:00)
  4. incr-20260128-0200 (312 MB, 2026-01-28 02:00:00)

Total size: 13.1 GB
Restore point: 2026-01-28 02:00:00
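As a sanity check, the chain listing can be parsed to recompute the total. This sketch assumes sizes always appear as `(<num> GB,` or `(<num> MB,` in the listing format shown above:

```python
import re

def chain_total_gb(listing: str) -> float:
    """Sum the per-backup sizes from a `backup list --chain` listing."""
    total = 0.0
    for num, unit in re.findall(r"\(([\d.]+) (GB|MB),", listing):
        total += float(num) if unit == "GB" else float(num) / 1024
    return round(total, 1)

listing = """\
  1. full-20260125      (12.4 GB, 2026-01-25 02:00:00)
  2. incr-20260126-0200 (234 MB, 2026-01-26 02:00:00)
  3. incr-20260127-0200 (189 MB, 2026-01-27 02:00:00)
  4. incr-20260128-0200 (312 MB, 2026-01-28 02:00:00)
"""
assert chain_total_gb(listing) == 13.1  # matches the reported total
```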

WAL Archiving for Point-in-Time Recovery

Enable continuous WAL archiving for near-zero RPO:

# geode.toml - WAL archiving
[wal]
enabled = true
directory = "/var/lib/geode/wal"
max_size_mb = 1024
sync_mode = "fsync"

[wal.archiving]
enabled = true
destination = "/backups/geode/wal"
compression = "lz4"

# Ship WAL to remote storage
[wal.archiving.remote]
enabled = true
type = "s3"
bucket = "geode-wal-archive"
prefix = "production/wal"
upload_interval_seconds = 60
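Conceptually, the archiver copies each completed WAL segment into the archive destination, compressed. A minimal stand-in sketch — using stdlib gzip in place of the lz4 configured above, and a hypothetical write-then-rename convention so readers never observe a partial archive:

```python
import gzip
import shutil
from pathlib import Path

def archive_wal_segment(segment: Path, dest_dir: Path) -> Path:
    """Copy a completed WAL segment into the archive, compressed."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    out = dest_dir / (segment.name + ".gz")
    tmp = dest_dir / (segment.name + ".gz.part")
    with segment.open("rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(out)  # rename last: the archive never exposes a partial file
    return out
```

Whatever the real implementation does, the atomic-publish property matters: a restore that picks up a truncated WAL segment would fail mid-replay.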

Monitor WAL archiving:

-- Check WAL archiving status
SELECT
    current_wal_file,
    last_archived_file,
    archive_lag_bytes,
    archive_lag_seconds,
    failed_archive_count
FROM system.wal_archiving_status;

Cloud Storage Integration

Amazon S3:

[backup.storage]
type = "s3"
bucket = "mycompany-geode-backups"
prefix = "production"
region = "us-east-1"
storage_class = "STANDARD_IA"

[backup.storage.credentials]
# Use IAM role or explicit credentials
use_instance_role = true
# Or:
# access_key_id = "..."
# secret_access_key = "..."

Google Cloud Storage:

[backup.storage]
type = "gcs"
bucket = "mycompany-geode-backups"
prefix = "production"
credentials_file = "/etc/geode/gcs-credentials.json"

Azure Blob Storage:

[backup.storage]
type = "azure"
container = "geode-backups"
prefix = "production"
account = "mycompanybackups"
# Uses managed identity or connection string

Recovery Procedures

Full Restore

Restore from a full backup:

# Stop Geode server
systemctl stop geode

# Clear existing data (DESTRUCTIVE -- verify backup integrity first)
rm -rf /var/lib/geode/data/*

# Restore from backup
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --data-dir /var/lib/geode/data \
  --parallel 4

# Start Geode server
systemctl start geode

# Verify restoration
./geode shell -c "MATCH (n) RETURN count(n) as node_count"

Incremental Restore

Restore full backup plus incrementals:

# Restore full backup first
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --data-dir /var/lib/geode/data

# Apply incremental backups in order
./geode backup restore \
  --input /backups/geode/incr-20260126-0200 \
  --data-dir /var/lib/geode/data \
  --incremental

./geode backup restore \
  --input /backups/geode/incr-20260127-0200 \
  --data-dir /var/lib/geode/data \
  --incremental

# Or restore entire chain automatically
./geode backup restore \
  --input /backups/geode/incr-20260127-0200 \
  --data-dir /var/lib/geode/data \
  --restore-chain
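When scripting the manual chain restore, ordering is the easy part to get wrong. This sketch only builds the ordered command lines (it does not run them), relying on the sortable incr-YYYYMMDD-HHMM naming shown earlier:

```python
def restore_chain_commands(full_backup, incrementals, data_dir):
    """Ordered `geode backup restore` invocations for a chain:
    the full backup first, then each incremental chronologically.
    Incremental names embed a timestamp, so lexicographic sort
    is chronological sort."""
    cmds = [["./geode", "backup", "restore",
             "--input", full_backup, "--data-dir", data_dir]]
    for incr in sorted(incrementals):
        cmds.append(["./geode", "backup", "restore",
                     "--input", incr, "--data-dir", data_dir,
                     "--incremental"])
    return cmds

cmds = restore_chain_commands(
    "/backups/geode/full-20260125",
    ["/backups/geode/incr-20260127-0200",   # deliberately out of order
     "/backups/geode/incr-20260126-0200"],
    "/var/lib/geode/data",
)
```

The `--restore-chain` flag shown above does this resolution for you; the sketch is useful mainly when driving restores from orchestration tooling.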

Point-in-Time Recovery (PITR)

Recover to a specific moment in time:

# Restore to specific timestamp
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --wal-archive /backups/geode/wal \
  --data-dir /var/lib/geode/data \
  --target-time "2026-01-27 14:30:00 UTC"

# Restore to specific transaction
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --wal-archive /backups/geode/wal \
  --data-dir /var/lib/geode/data \
  --target-xid 12847293

# Restore to named recovery point
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --wal-archive /backups/geode/wal \
  --data-dir /var/lib/geode/data \
  --target-name "pre-migration-checkpoint"

Create named recovery points:

-- Create a named checkpoint for easy PITR targeting
CREATE CHECKPOINT 'pre-migration-checkpoint';

-- List available checkpoints
SELECT name, timestamp, wal_position
FROM system.checkpoints
ORDER BY timestamp DESC;

Selective Recovery

Restore specific graphs or data:

# Restore single graph
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --data-dir /var/lib/geode/data \
  --include-graphs "social_network,analytics"

# Exclude specific graphs
./geode backup restore \
  --input /backups/geode/full-20260125 \
  --data-dir /var/lib/geode/data \
  --exclude-graphs "temp_data,staging"

Disaster Recovery

DR Architecture

Design for disaster resilience:

Primary Site (us-east-1)          DR Site (us-west-2)
┌─────────────────────┐           ┌─────────────────────┐
│  Geode Cluster      │           │  Geode Standby      │
│  (3 nodes)          │──WAL──────│  (3 nodes)          │
│                     │  Stream   │  (read-only)        │
└─────────────────────┘           └─────────────────────┘
         │                                  │
         ▼                                  ▼
┌─────────────────────┐           ┌─────────────────────┐
│  S3 Backup Bucket   │───Repl────│  S3 Backup Bucket   │
│  (us-east-1)        │           │  (us-west-2)        │
└─────────────────────┘           └─────────────────────┘

Configuration for DR standby:

# geode.toml - DR standby site
[server]
mode = "standby"
read_only = true

[replication.streaming]
enabled = true
primary_host = "primary.geode.internal"
primary_port = 7687

[replication.wal]
# Receive WAL from primary
receive_directory = "/var/lib/geode/wal_receive"
apply_delay_seconds = 0  # 0 = real-time; set a delay to avoid immediately replaying mistakes

[failover]
# Manual failover only for DR
auto_promote = false
promotion_command = "/etc/geode/promote-to-primary.sh"

DR Failover Procedure

Planned Failover (maintenance):

# On primary site
# 1. Stop accepting writes
./geode admin set-read-only true

# 2. Wait for replication to catch up
./geode admin wait-for-sync --target dr-site

# 3. Create final checkpoint
./geode admin checkpoint --name "failover-$(date +%Y%m%d)"

# On DR site
# 4. Promote standby to primary
./geode admin promote --accept-writes

# 5. Update DNS/load balancer
# 6. Verify applications reconnect

Unplanned Failover (disaster):

# On DR site
# 1. Check last received WAL position
./geode admin wal-status

# 2. Decide on data loss acceptance
# 3. Promote to primary
./geode admin promote --force --accept-data-loss

# 4. Update DNS/load balancer
# 5. Notify stakeholders of potential data loss
# 6. Begin incident response

DR Testing

Regular DR drills ensure readiness:

#!/usr/bin/env python3
"""Automated DR test script"""

import subprocess
import time
from datetime import datetime

def run_dr_test():
    """Execute DR failover test"""
    print(f"Starting DR test at {datetime.now()}")

    # 1. Record current state
    primary_count = get_node_count("primary.geode.internal")
    print(f"Primary node count: {primary_count}")

    # 2. Create test data
    create_test_nodes("primary.geode.internal", 1000)

    # 3. Wait for replication
    time.sleep(30)
    dr_count = get_node_count("dr.geode.internal")
    print(f"DR node count: {dr_count}")

    # 4. Simulate primary failure
    print("Simulating primary failure...")
    subprocess.run(["ssh", "primary", "systemctl", "stop", "geode"])

    # 5. Promote DR
    print("Promoting DR site...")
    subprocess.run(["ssh", "dr", "./geode", "admin", "promote"])

    # 6. Verify DR is operational
    time.sleep(10)
    dr_count_after = get_node_count("dr.geode.internal")
    print(f"DR node count after promotion: {dr_count_after}")

    # 7. Run test queries
    verify_queries("dr.geode.internal")

    # 8. Restore primary (cleanup)
    print("Restoring primary...")
    subprocess.run(["ssh", "primary", "systemctl", "start", "geode"])
    subprocess.run(["ssh", "dr", "./geode", "admin", "demote"])

    print(f"DR test completed at {datetime.now()}")

def get_node_count(host):
    result = subprocess.run(
        ["./geode", "shell", "--host", host, "-c",
         "MATCH (n) RETURN count(n) as cnt"],
        capture_output=True, text=True
    )
    # Parse count from output
    return int(result.stdout.strip().split()[-1])

def create_test_nodes(host, count):
    subprocess.run([
        "./geode", "shell", "--host", host, "-c",
        f"UNWIND range(1, {count}) AS i CREATE (:DRTest {{id: i, ts: datetime()}})"
    ])

def verify_queries(host):
    queries = [
        "MATCH (n:DRTest) RETURN count(n)",
        "MATCH (n) RETURN labels(n), count(*)",  # aggregation groups implicitly
    ]
    for query in queries:
        subprocess.run(["./geode", "shell", "--host", host, "-c", query])

if __name__ == "__main__":
    run_dr_test()

Backup Monitoring

Backup Status Metrics

# Prometheus metrics
curl http://localhost:3141/metrics | grep backup

# Example metrics
geode_backup_last_full_timestamp 1706140800
geode_backup_last_full_duration_seconds 847
geode_backup_last_full_size_bytes 13421772800
geode_backup_last_incremental_timestamp 1706227200
geode_backup_last_incremental_size_bytes 245366784
geode_backup_wal_archive_lag_bytes 0
geode_backup_wal_archive_lag_seconds 12
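A monitoring script can derive backup age directly from the Prometheus text format. A sketch — the metric name comes from the output above, and the 8-day threshold mirrors a weekly schedule:

```python
import time

def last_full_backup_age(metrics_text, now=None):
    """Age in seconds of the last full backup, parsed from
    Prometheus text-format output."""
    for line in metrics_text.splitlines():
        if line.startswith("geode_backup_last_full_timestamp "):
            ts = float(line.split()[1])
            return (now if now is not None else time.time()) - ts
    raise ValueError("geode_backup_last_full_timestamp not found")

metrics = "geode_backup_last_full_timestamp 1706140800\n"
age = last_full_backup_age(metrics, now=1706140800 + 3600)
assert age == 3600            # one hour old
assert age < 8 * 86400        # inside the 8-day alarm window for weekly fulls
```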

Backup Health Checks

-- Check backup status
SELECT
    backup_type,
    last_backup_time,
    last_backup_size_mb,
    last_backup_duration_seconds,
    backup_count_24h,
    oldest_backup_time
FROM system.backup_status;

-- Check WAL archiving health
SELECT
    current_wal_file,
    last_archived_file,
    archive_lag_bytes,
    archive_lag_seconds,
    archive_failures_24h
FROM system.wal_archiving_status;

Alerting Rules

# Prometheus alerting rules for backups
groups:
  - name: geode_backup_alerts
    rules:
      - alert: BackupOverdue
        expr: time() - geode_backup_last_full_timestamp > 86400 * 8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Full backup overdue"
          description: "Last full backup was {{ $value | humanizeDuration }} ago"

      - alert: BackupFailed
        expr: geode_backup_last_status != 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Backup failed"

      - alert: WALArchiveLagHigh
        expr: geode_backup_wal_archive_lag_seconds > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "WAL archive lag is high"
          description: "WAL archiving is {{ $value }}s behind"

      - alert: WALArchiveFailed
        expr: increase(geode_backup_wal_archive_failures_total[1h]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "WAL archiving failed"

      - alert: BackupStorageLow
        expr: geode_backup_storage_free_bytes / geode_backup_storage_total_bytes < 0.1
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup storage below 10%"

Backup Validation

Automated Backup Testing

Regularly verify backup integrity:

# Verify backup integrity
./geode backup verify \
  --input /backups/geode/full-20260125 \
  --check-checksums \
  --check-consistency

# Test restore to temporary location
./geode backup test-restore \
  --input /backups/geode/full-20260125 \
  --temp-dir /tmp/geode-restore-test \
  --run-queries "MATCH (n) RETURN count(n)"

Automated validation script:

#!/bin/bash
# /etc/geode/scripts/validate-backup.sh

BACKUP_DIR="/backups/geode"
TEST_DIR="/tmp/geode-backup-test"
ALERT_EMAIL="[email protected]"

# Find latest full backup
LATEST_FULL=$(ls -td "$BACKUP_DIR"/full-* | head -1)

echo "Validating backup: $LATEST_FULL"

# Verify checksums
if ! ./geode backup verify --input "$LATEST_FULL" --check-checksums; then
    echo "CRITICAL: Backup checksum verification failed!" | mail -s "Backup Alert" $ALERT_EMAIL
    exit 1
fi

# Test restore
rm -rf "$TEST_DIR"
mkdir -p "$TEST_DIR"

if ! ./geode backup restore --input "$LATEST_FULL" --data-dir "$TEST_DIR" 2>/dev/null; then
    echo "CRITICAL: Backup restore test failed!" | mail -s "Backup Alert" $ALERT_EMAIL
    exit 1
fi

# Start temporary instance and verify
./geode serve --data-dir "$TEST_DIR" --listen 127.0.0.1:13141 &
TEMP_PID=$!
sleep 10

# Run verification queries
NODE_COUNT=$(./geode shell --host 127.0.0.1:13141 -c "MATCH (n) RETURN count(n)" 2>/dev/null)

# Cleanup
kill "$TEMP_PID" 2>/dev/null
rm -rf "$TEST_DIR"

echo "Backup validation successful. Node count: $NODE_COUNT"

Retention Management

Configure backup retention policies:

[backup.retention]
# Keep full backups for 30 days
full_retention_days = 30
full_min_count = 4  # Keep at least 4 full backups

# Keep incremental backups for 7 days
incremental_retention_days = 7

# Keep WAL archives for 14 days
wal_retention_days = 14

# Automatic cleanup
auto_cleanup = true
cleanup_schedule = "0 4 * * *"  # 4 AM daily

Manual cleanup from the command line:

# Manual cleanup
./geode backup cleanup \
  --older-than 30d \
  --keep-min 4 \
  --dry-run  # Preview what will be deleted

# Actually delete
./geode backup cleanup \
  --older-than 30d \
  --keep-min 4
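The interaction between --older-than and --keep-min can be sketched in a few lines: age-based expiry never deletes past the minimum count. Names and defaults here simply mirror the policy above:

```python
from datetime import date, timedelta

def backups_to_delete(backup_dates, retention_days=30, keep_min=4, today=None):
    """Backups eligible for cleanup: older than the retention window,
    excluding the newest keep_min backups (keep-min wins over age)."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    newest_first = sorted(backup_dates, reverse=True)
    protected = set(newest_first[:keep_min])
    return [d for d in newest_first if d < cutoff and d not in protected]

# Six weekly fulls; with a 30-day window on 2026-02-28, only the two
# oldest fall outside both the window and the keep-min protection.
weekly_fulls = [date(2026, 1, 4), date(2026, 1, 11), date(2026, 1, 18),
                date(2026, 1, 25), date(2026, 2, 1), date(2026, 2, 8)]
doomed = backups_to_delete(weekly_fulls, today=date(2026, 2, 28))
assert doomed == [date(2026, 1, 11), date(2026, 1, 4)]
```

This ordering of the two rules is why a stalled backup schedule does not silently drain the backup directory: with no new backups arriving, keep-min holds the last four indefinitely.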

Best Practices

Backup Strategy

  1. Follow 3-2-1 rule: 3 copies, 2 different media, 1 offsite
  2. Automate backups: Never rely on manual processes
  3. Encrypt sensitive data: Protect backups at rest
  4. Test restores regularly: Untested backups may not work
  5. Document procedures: Clear runbooks for emergencies

Recovery Planning

  1. Define RTO/RPO: Based on business requirements
  2. Choose appropriate strategy: Balance cost vs. recovery speed
  3. Plan for different scenarios: Hardware failure, data corruption, disaster
  4. Train team members: Everyone should know recovery procedures
  5. Conduct regular drills: Practice makes perfect

Monitoring and Alerting

  1. Monitor backup success: Alert on failures immediately
  2. Track backup duration: Detect performance degradation
  3. Monitor storage capacity: Plan ahead for growth
  4. Check WAL archiving: Ensure continuous protection
  5. Validate backups: Automated integrity checks

Security

  1. Encrypt backups: Use strong encryption (AES-256)
  2. Secure backup storage: Restrict access to backup locations
  3. Audit backup access: Log who accesses backups
  4. Rotate encryption keys: Regular key rotation policy
  5. Test decryption: Verify you can decrypt backups

Further Reading

  • Backup and Recovery Best Practices Guide
  • Disaster Recovery Planning Handbook
  • Point-in-Time Recovery Tutorial
  • Backup Encryption Guide
  • DR Testing Procedures
  • Compliance and Backup Retention

Related Articles