Overview

Geode provides comprehensive telemetry capabilities for production observability, including streaming paging events, Prometheus metrics integration, QUIC transport metrics, and customizable monitoring dashboards.

Version: v0.1.1+ (includes QUIC+TLS metrics)

What You’ll Learn

  • How to enable and configure streaming telemetry
  • Prometheus metrics integration and scraping
  • Grafana dashboard setup and customization
  • QUIC transport metrics monitoring
  • Custom metric instrumentation
  • Alerting and incident response patterns

Prerequisites

  • Geode v0.1.1+ installed
  • Basic understanding of metrics and monitoring
  • Prometheus and Grafana (optional, for integration sections)

Streaming Telemetry

Paging Events

Geode can emit optional paging telemetry as JSON Lines on stderr. This feature is disabled by default to avoid noisy logs.

Status: Optional, single-node, development/ops aid

Environment Variables

Core Toggles:

# Enable pagination telemetry
export GEODE_TELEMETRY_PAGING=1

# Client-side: Capture server stderr for e2e tests
export GEODE_CAPTURE_SERVER_STDERR=1

CI/Testing Variables:

# Enable telemetry smoke test
export GEODE_CI_TELEMETRY_SMOKE=1

# Strict mode: Fail if telemetry missing
export GEODE_CI_TELEMETRY_STRICT=1

# Safety: Avoid vendor teardown crashes in short-lived tests
export GEODE_QUIC_PASSIVE_TEARDOWN=1

Event Shape

Each page emission produces a single JSON line on stderr:

{
  "ts": "1758732000",
  "level": "INFO",
  "component": "server",
  "type": "TELEMETRY",
  "event": "PULL_PAGE",
  "page_size": "1000",
  "rows_emitted": "1000",
  "final": "false",
  "page": { "index": 0, "size": 1000 },
  "ordered": true,
  "order_keys": ["timestamp"],
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}

Field Descriptions:

  • type: Always TELEMETRY for these events
  • event: Always PULL_PAGE for paging emissions
  • page_size: Requested page size (stringified)
  • rows_emitted: Actual rows in this page (stringified)
  • final: true if last page, else false
  • ordered: Whether result set is ordered
  • order_keys: Keys used for ordering
  • request_id: Unique identifier for the request

Emission Location: src/server/main.zig (guarded by state.telemetry_paging)
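Downstream tooling can consume these lines directly. As an illustration (not part of Geode itself), a small Python sketch that tallies PULL_PAGE events from a captured stderr file:

```python
import json
import sys

def summarize(path):
    """Tally PULL_PAGE telemetry events from a stderr capture file."""
    pages = 0
    rows = 0
    finals = 0
    with open(path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON log lines
            if event.get("type") == "TELEMETRY" and event.get("event") == "PULL_PAGE":
                pages += 1
                rows += int(event["rows_emitted"])  # values are stringified
                if event.get("final") == "true":
                    finals += 1
    return {"pages": pages, "rows": rows, "completed_requests": finals}

if __name__ == "__main__":
    print(summarize(sys.argv[1] if len(sys.argv) > 1 else "server.log"))
```

Non-telemetry log lines are skipped, so the script can be pointed at a mixed stderr capture.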

Usage Examples

Server with Telemetry:

# Start server with paging telemetry enabled
GEODE_TELEMETRY_PAGING=1 \
  geode serve \
  --listen 127.0.0.1:7567 \
  --log-json \
  --result-format json \
  2>server.log &

PID=$!

# Run queries
geode query "RETURN 1 AS x ORDER BY x LIMIT 1" --format json

# Check telemetry
kill $PID
grep '"event":"PULL_PAGE"' server.log

Client with Server Stderr Capture:

GEODE_CAPTURE_SERVER_STDERR=1 \
  geode query "RETURN 1 AS x ORDER BY x LIMIT 1" --format json

Helper Script:

# Quick local validation
make build && bash scripts/telemetry-smoke.sh 1

The script:

  • Builds zig-out/bin/geode
  • Starts server with GEODE_TELEMETRY_PAGING=1
  • Runs ordered+limited query
  • Prints telemetry lines containing "event":"PULL_PAGE"
  • Cleans up server process

Testing Telemetry

CANARY Integration:

  • Test: TestCANARY_REQ_GQL_090_TelemetryPagingPull
  • Requirement: REQ-GQL-090 - Streaming Telemetry, Paging Events
  • Status: TESTED

Smoke Tests:

# Basic smoke test (non-fatal)
GEODE_TELEMETRY_PAGING=1 \
GEODE_CI_TELEMETRY_SMOKE=1 \
  zig test test_telemetry_smoke.zig

# Strict mode (fails if telemetry missing)
make quic-smoke-strict

# Convenience (non-fatal)
make quic-smoke

QUIC Transport Metrics

New in v0.1.1: QUIC+TLS transport telemetry replaces TCP metrics.

QUIC Metrics

Handshake Metrics:

geode_quic_handshake_duration_seconds{quantile="0.5"} 0.015
geode_quic_handshake_duration_seconds{quantile="0.95"} 0.045
geode_quic_handshake_duration_seconds{quantile="0.99"} 0.120
geode_quic_handshake_success_total{} 1234
geode_quic_handshake_failure_total{} 5

Stream Multiplexing:

geode_quic_active_streams{} 42
geode_quic_stream_create_total{} 5678
geode_quic_stream_close_total{} 5636
geode_quic_max_concurrent_streams{} 100

Loss Recovery:

geode_quic_packet_loss_rate{} 0.001
geode_quic_rtt_milliseconds{quantile="0.5"} 12.5
geode_quic_rtt_milliseconds{quantile="0.95"} 45.0
geode_quic_retransmit_total{} 23
geode_quic_congestion_events_total{} 3

Data Transfer:

geode_quic_bytes_sent_total{} 12345678
geode_quic_bytes_received_total{} 98765432
geode_quic_throughput_mbps{} 125.5

TLS Metrics

geode_tls_version{version="1.3"} 1
geode_tls_cipher_suite{suite="TLS_AES_256_GCM_SHA384"} 1
geode_tls_handshake_duration_seconds{quantile="0.95"} 0.035
geode_tls_session_reuse_total{} 456
geode_tls_cert_verification_errors_total{} 0
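These counters are plain Prometheus text-format samples, so derived values such as the handshake failure rate can be computed without a full Prometheus stack. A hypothetical Python sketch (the simple parser below handles only the `name{labels} value` form shown above):

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' exposition lines, summing by metric name."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals

sample = """\
geode_quic_handshake_success_total{} 1234
geode_quic_handshake_failure_total{} 5
"""
m = parse_metrics(sample)
failures = m["geode_quic_handshake_failure_total"]
total = failures + m["geode_quic_handshake_success_total"]
print(f"handshake failure rate: {failures / total:.2%}")  # roughly 0.4%
```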

Prometheus Integration

Server Configuration

Enable Prometheus metrics endpoint:

# geode.yaml
monitoring:
  prometheus:
    enable: true
    port: 9090
    path: /metrics
    interval: 15s

Command-Line:

geode serve \
  --prometheus-enable \
  --prometheus-port 9090 \
  --prometheus-path /metrics

Environment Variables:

export GEODE_PROMETHEUS_ENABLE=1
export GEODE_PROMETHEUS_PORT=9090
export GEODE_PROMETHEUS_PATH=/metrics
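Once the endpoint is enabled, a one-off scrape confirms it is serving Geode metrics. A small Python sketch (the URL and port mirror the configuration above; `filter_geode` is an illustrative helper, not a Geode API):

```python
from urllib.request import urlopen

def filter_geode(text):
    """Keep only geode_* sample lines from a metrics payload."""
    return [l for l in text.splitlines() if l.startswith("geode_")]

def fetch_geode_metrics(url="http://localhost:9090/metrics", timeout=5):
    """Scrape the endpoint once and return Geode's metric lines."""
    with urlopen(url, timeout=timeout) as resp:
        return filter_geode(resp.read().decode())

# Offline demonstration on a captured payload:
sample = '# HELP geode_queries_total ...\ngeode_queries_total{status="success"} 12345\n'
print(filter_geode(sample))  # ['geode_queries_total{status="success"} 12345']
```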

Prometheus Configuration

Add Geode as a scrape target in prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'geode'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'geode-primary'
          environment: 'production'

    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/metrics'

  - job_name: 'geode-federation'
    static_configs:
      - targets:
        - 'geode-shard-1:9090'
        - 'geode-shard-2:9090'
        - 'geode-shard-3:9090'
    scrape_interval: 30s

Available Metrics

Query Metrics:

geode_queries_total{status="success"} 12345
geode_queries_total{status="error"} 23
geode_query_duration_seconds{quantile="0.5"} 0.015
geode_query_duration_seconds{quantile="0.95"} 0.250
geode_query_duration_seconds{quantile="0.99"} 1.500
geode_slow_queries_total{threshold="1s"} 45

Connection Metrics:

geode_active_connections{} 42
geode_connection_create_total{} 1234
geode_connection_close_total{} 1192
geode_connection_errors_total{} 5
geode_connection_pool_size{type="max"} 5000
geode_connection_pool_size{type="current"} 42
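As a sanity check, the cumulative counters should reconcile with the gauge: creates minus closes should track the number of active connections (1234 - 1192 = 42 in the sample above). A sketch of that check, assuming error teardowns are counted in the close total:

```python
def implied_active(create_total, close_total):
    """Active connections implied by the cumulative counters."""
    return create_total - close_total

# Sample values from the metrics above
implied = implied_active(1234, 1192)
reported = 42  # geode_active_connections
print(implied == reported)  # True -> counters and gauge agree; sustained drift hints at a leak
```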

Transaction Metrics:

geode_transactions_active{} 8
geode_transactions_total{status="committed"} 5678
geode_transactions_total{status="rolled_back"} 12
geode_transaction_duration_seconds{quantile="0.95"} 2.5
geode_transaction_conflicts_total{} 3
geode_deadlocks_detected_total{} 1

Storage Metrics:

geode_storage_size_bytes{type="data"} 10737418240
geode_storage_size_bytes{type="wal"} 536870912
geode_storage_size_bytes{type="index"} 2147483648
geode_wal_sync_duration_seconds{quantile="0.95"} 0.005
geode_checkpoint_duration_seconds{quantile="0.95"} 5.0
geode_page_faults_total{} 1234

Memory Metrics:

geode_memory_used_bytes{} 4294967296
geode_memory_limit_bytes{} 17179869184
geode_memory_usage_percent{} 25.0
geode_buffer_pool_size_bytes{} 8589934592
geode_buffer_pool_hit_rate{} 0.98

Index Metrics:

geode_index_operations_total{type="insert"} 12345
geode_index_operations_total{type="delete"} 234
geode_index_operations_total{type="search"} 56789
geode_index_size_bytes{name="hnsw_embeddings"} 1073741824
geode_hnsw_search_duration_seconds{quantile="0.95"} 0.012

Graph Metrics:

geode_node_count{label="Person"} 1000000
geode_node_count{label="Product"} 50000
geode_relationship_count{type="KNOWS"} 5000000
geode_relationship_count{type="PURCHASED"} 250000
geode_graph_size_nodes_total{} 1050000
geode_graph_size_relationships_total{} 5250000

Session Metrics:

geode_active_sessions{} 42
geode_session_create_total{} 1234
geode_session_timeout_total{} 5
geode_avg_session_duration_seconds{} 180.5
geode_session_parameters_total{} 156

Grafana Dashboards

Installation

  1. Install Grafana:
# Docker
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  grafana/grafana

# Or use package manager
sudo apt-get install -y grafana
sudo systemctl start grafana-server

  2. Access Grafana: http://localhost:3000 (default: admin/admin)

  3. Add Prometheus Data Source:

    • Configuration → Data Sources → Add data source
    • Select Prometheus
    • URL: http://localhost:9091 (your Prometheus server address, not the Geode metrics endpoint)
    • Save & Test

Pre-Built Dashboards

Geode includes pre-built Grafana dashboards:

Location: monitoring/grafana/dashboards/

Available Dashboards:

  1. geode-overview.json - System overview
  2. geode-queries.json - Query performance
  3. geode-transactions.json - Transaction monitoring
  4. geode-storage.json - Storage and I/O
  5. geode-quic.json - QUIC transport metrics (v0.1.1+)

Import Dashboard:

  1. Dashboards → Import
  2. Upload JSON file or paste JSON
  3. Select Prometheus data source
  4. Import

Custom Dashboard Examples

System Overview Dashboard

Key Panels:

{
  "dashboard": {
    "title": "Geode System Overview",
    "panels": [
      {
        "title": "Query Rate",
        "type": "graph",
        "targets": [{
          "expr": "rate(geode_queries_total[5m])",
          "legendFormat": "{{status}}"
        }]
      },
      {
        "title": "Active Connections",
        "type": "stat",
        "targets": [{
          "expr": "geode_active_connections"
        }]
      },
      {
        "title": "Memory Usage",
        "type": "gauge",
        "targets": [{
          "expr": "(geode_memory_used_bytes / geode_memory_limit_bytes) * 100"
        }]
      },
      {
        "title": "Query Duration (p95)",
        "type": "graph",
        "targets": [{
          "expr": "geode_query_duration_seconds{quantile=\"0.95\"}"
        }]
      }
    ]
  }
}

QUIC Transport Dashboard (v0.1.1+)

{
  "panels": [
    {
      "title": "QUIC Handshake Duration",
      "targets": [{
        "expr": "geode_quic_handshake_duration_seconds"
      }]
    },
    {
      "title": "Active QUIC Streams",
      "targets": [{
        "expr": "geode_quic_active_streams"
      }]
    },
    {
      "title": "Packet Loss Rate",
      "targets": [{
        "expr": "geode_quic_packet_loss_rate"
      }]
    },
    {
      "title": "RTT Distribution",
      "targets": [{
        "expr": "geode_quic_rtt_milliseconds"
      }]
    }
  ]
}

Dashboard Variables

Create dynamic dashboards with variables:

{
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "query": "label_values(geode_queries_total, instance)",
        "multi": true
      },
      {
        "name": "label",
        "type": "query",
        "query": "label_values(geode_node_count, label)"
      }
    ]
  }
}

Use in queries:

geode_node_count{instance=~"$instance", label="$label"}

Custom Metrics

Application-Level Metrics

Instrument your application code:

Go Example:

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    userLoginCounter = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "app_user_logins_total",
            Help: "Total number of user logins",
        },
        []string{"status"},
    )

    queryLatency = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "app_query_duration_seconds",
            Help: "Query execution duration",
            Buckets: prometheus.DefBuckets,
        },
        []string{"query_type"},
    )
)

func handleLogin(username, password string) error {
    start := time.Now()
    defer func() {
        duration := time.Since(start).Seconds()
        queryLatency.WithLabelValues("user_auth").Observe(duration)
    }()

    err := authenticateUser(username, password)
    if err != nil {
        userLoginCounter.WithLabelValues("failure").Inc()
        return err
    }

    userLoginCounter.WithLabelValues("success").Inc()
    return nil
}

Python Example:

from prometheus_client import Counter, Histogram

user_login_counter = Counter(
    'app_user_logins_total',
    'Total number of user logins',
    ['status']
)

query_latency = Histogram(
    'app_query_duration_seconds',
    'Query execution duration',
    ['query_type']
)

async def handle_login(username: str, password: str) -> bool:
    with query_latency.labels('user_auth').time():
        try:
            authenticated = await authenticate_user(username, password)
            if authenticated:
                user_login_counter.labels('success').inc()
                return True
            else:
                user_login_counter.labels('failure').inc()
                return False
        except Exception:
            user_login_counter.labels('error').inc()
            raise

Business Metrics

Track domain-specific metrics:

# Fraud detection rate
sum(rate(fraud_detected_total[5m])) / sum(rate(transactions_total[5m]))

# Recommendation click-through rate
sum(rate(recommendation_clicks_total[5m])) / sum(rate(recommendations_shown_total[5m]))

# Knowledge graph coverage
geode_node_count{label="Entity"} / geode_target_entities

Alerting

Prometheus Alerting Rules

Create alerts.yml:

groups:
  - name: geode
    interval: 30s
    rules:
      - alert: GeodeDown
        expr: up{job="geode"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Geode instance is down"
          description: "Geode instance {{ $labels.instance }} has been down for more than 5 minutes."

      - alert: HighQueryLatency
        expr: geode_query_duration_seconds{quantile="0.95"} > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High query latency detected"
          description: "P95 query latency is {{ $value }}s on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (geode_memory_used_bytes / geode_memory_limit_bytes) * 100 > 90
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}"

      - alert: HighConnectionCount
        expr: geode_active_connections > (geode_connection_pool_size{type="max"} * 0.9)
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High connection count"
          description: "{{ $value }} active connections on {{ $labels.instance }}"

      - alert: TransactionDeadlock
        expr: rate(geode_deadlocks_detected_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Transaction deadlocks detected"
          description: "{{ $value }} deadlocks/sec on {{ $labels.instance }}"

      - alert: SlowQueries
        expr: rate(geode_slow_queries_total[5m]) > 10
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "High rate of slow queries"
          description: "{{ $value }} slow queries/sec on {{ $labels.instance }}"

Alertmanager Configuration

Configure alert routing in alertmanager.yml:

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: '[email protected]'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

  routes:
    - match:
        severity: critical
      receiver: critical
      repeat_interval: 5m

    - match:
        severity: warning
      receiver: warnings

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:5001/webhook'
        send_resolved: true

  - name: 'critical'
    email_configs:
      - to: '[email protected]'
        subject: '🚨 Critical Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'Geode Critical Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: 'warnings'
    email_configs:
      - to: '[email protected]'

Advanced Monitoring Patterns

SLI/SLO Monitoring

Track Service Level Indicators and Objectives:

# Availability SLI (target: 99.9%)
(sum(rate(geode_queries_total{status="success"}[30d])) /
 sum(rate(geode_queries_total[30d]))) * 100

# Latency SLI (target: p95 < 100ms)
geode_query_duration_seconds{quantile="0.95"} < 0.1

# Error budget consumed: observed error ratio over 30d
# (a 99.9% availability target allows an error ratio of 0.001)
1 - ((sum(rate(geode_queries_total{status="success"}[30d])) /
      sum(rate(geode_queries_total[30d]))))
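To make the budget concrete: a 99.9% target over 30 days allows about 43.2 minutes of unavailability. A small worked example (illustrative helpers, not a Geode API):

```python
def error_budget_minutes(target, window_days=30):
    """Total allowed downtime/error minutes for an availability target."""
    return (1 - target) * window_days * 24 * 60

def budget_remaining(target, observed):
    """Fraction of the error budget still unspent (negative = overspent)."""
    return ((1 - target) - (1 - observed)) / (1 - target)

print(round(error_budget_minutes(0.999), 1))      # 43.2 minutes per 30 days
print(round(budget_remaining(0.999, 0.9995), 3))  # 0.5 -> half the budget left
```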

Grafana SLO Dashboard:

{
  "panels": [
    {
      "title": "Availability (30d rolling)",
      "targets": [{
        "expr": "(sum(rate(geode_queries_total{status=\"success\"}[30d])) / sum(rate(geode_queries_total[30d]))) * 100"
      }],
      "thresholds": [
        { "value": 99.9, "color": "green" },
        { "value": 99.0, "color": "yellow" },
        { "value": 0, "color": "red" }
      ]
    }
  ]
}

Capacity Planning

Monitor resource trends:

# Predict storage exhaustion (linear regression)
predict_linear(geode_storage_size_bytes{type="data"}[7d], 30 * 24 * 3600)

# Connection pool utilization trend
avg_over_time(geode_active_connections[7d]) / geode_connection_pool_size{type="max"}

# Memory growth rate
deriv(geode_memory_used_bytes[1h])
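PromQL's predict_linear() is an ordinary least-squares extrapolation, which is easy to sanity-check offline. A sketch reproducing it on synthetic (timestamp, bytes) samples:

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares fit over (timestamp, value) pairs, projected
    horizon_seconds past the last sample -- mirrors PromQL predict_linear()."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
             / sum((t - mean_t) ** 2 for t, _ in samples))
    intercept = mean_v - slope * mean_t
    return slope * (samples[-1][0] + horizon_seconds) + intercept

DAY = 24 * 3600
# Synthetic week of data: 10 GiB growing by 1 GiB/day
samples = [(d * DAY, (10 + d) * 2**30) for d in range(8)]
projected_gib = predict_linear(samples, 30 * DAY) / 2**30
print(round(projected_gib, 1))  # 47.0 -> 17 GiB today + 30 more days of growth
```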

Anomaly Detection

Use recording rules for anomaly detection:

groups:
  - name: anomaly_detection
    interval: 1m
    rules:
      - record: query_latency_baseline
        expr: avg_over_time(geode_query_duration_seconds{quantile="0.95"}[7d])

      - alert: LatencyAnomaly
        expr: geode_query_duration_seconds{quantile="0.95"} > query_latency_baseline * 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Query latency anomaly detected"

Troubleshooting

Missing Metrics

Symptom: Prometheus shows no targets or metrics unavailable

Solutions:

  1. Verify server is running with metrics enabled:

    curl http://localhost:9090/metrics
    
  2. Check Prometheus configuration:

    promtool check config prometheus.yml
    
  3. Verify firewall allows port 9090

  4. Check Prometheus logs:

    journalctl -u prometheus -f
    

High Cardinality Metrics

Symptom: Prometheus consuming excessive memory

Solutions:

  1. Limit label cardinality:

    # ❌ Bad: Unique label per user
    geode_user_queries_total{user_id="12345"}
    
    # ✅ Good: Aggregated labels
    geode_user_queries_total{user_type="premium"}
    
  2. Use recording rules to pre-aggregate:

    - record: query_rate_by_type
      expr: sum(rate(geode_queries_total[5m])) by (type)
    
  3. Adjust Prometheus retention:

    prometheus --storage.tsdb.retention.time=15d
    

Telemetry Performance Impact

Symptom: Telemetry causing performance degradation

Solutions:

  1. Disable paging telemetry in production:

    unset GEODE_TELEMETRY_PAGING
    
  2. Reduce scrape frequency by raising the Prometheus scrape interval:

    scrape_interval: 60s  # Instead of 15s
    
  3. Use sampling for high-frequency events
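Sampling is not a built-in Geode feature as covered here; it is typically applied where events are produced. A hypothetical application-level sketch that keeps roughly 1 in N events and records the rate so counts can be scaled back up downstream:

```python
import random

class SampledEmitter:
    """Emit roughly 1 in `rate` events, tagging each with the sample rate."""

    def __init__(self, rate, emit, rng=None):
        self.rate = rate
        self.emit = emit
        self.rng = rng or random.Random()

    def record(self, event):
        # Keep each event with probability 1/rate
        if self.rng.random() < 1.0 / self.rate:
            self.emit(dict(event, sample_rate=self.rate))

captured = []
emitter = SampledEmitter(10, captured.append, rng=random.Random(0))
for i in range(10_000):
    emitter.record({"event": "PULL_PAGE", "page": i})
print(len(captured))  # roughly 1000 of the 10,000 events survive
```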


Version: v0.1.1+ (QUIC+TLS metrics)
Telemetry: Optional streaming events (stderr JSON Lines)
Prometheus: Full integration with 50+ metrics
Status: Production-ready