Prometheus is the industry-standard monitoring solution for cloud-native applications, and Geode provides first-class Prometheus integration. Geode exposes comprehensive metrics covering queries, transactions, connections, storage, memory, and system health through a Prometheus-compatible /metrics endpoint.

By integrating Geode with Prometheus, you gain real-time visibility into database performance, can alert on critical conditions, and can build dashboards for operational insight. Combined with Grafana for visualization, Prometheus and Geode form a powerful observability stack for production graph database deployments.

This guide covers Prometheus integration patterns, essential metrics, alerting rules, and visualization strategies for monitoring Geode effectively.

Prometheus Architecture with Geode

Prometheus uses a pull-based model where it periodically scrapes metrics from instrumented applications. Geode exposes metrics at http://geode-host:8080/metrics in Prometheus exposition format:

# HELP geode_queries_total Total number of queries executed
# TYPE geode_queries_total counter
geode_queries_total{status="success"} 125847
geode_queries_total{status="error"} 342

# HELP geode_query_duration_seconds Query execution duration
# TYPE geode_query_duration_seconds histogram
geode_query_duration_seconds_bucket{le="0.01"} 45234
geode_query_duration_seconds_bucket{le="0.05"} 89432
geode_query_duration_seconds_bucket{le="0.1"} 112847
geode_query_duration_seconds_sum 12847.3
geode_query_duration_seconds_count 125847
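The exposition format is line-oriented and easy to consume programmatically. As an illustration, here is a stdlib-only Python sketch that parses a payload like the one above (a deliberately simplified parser that ignores label escaping and optional timestamps; in practice a Prometheus client library handles the full format):

```python
import re

def parse_exposition(text):
    """Parse a simplified Prometheus exposition payload into a dict keyed by
    (metric_name, frozenset of label pairs). Skips # HELP / # TYPE comments;
    ignores escaping and optional timestamps for brevity."""
    samples = {}
    line_re = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = line_re.match(line)
        if match:
            name, labels_str, value = match.groups()
            labels = frozenset(re.findall(r'(\w+)="([^"]*)"', labels_str or ''))
            samples[(name, labels)] = float(value)
    return samples

metrics_text = """
# TYPE geode_query_duration_seconds histogram
geode_query_duration_seconds_bucket{le="0.01"} 45234
geode_query_duration_seconds_sum 12847.3
geode_query_duration_seconds_count 125847
"""
samples = parse_exposition(metrics_text)
mean = (samples[("geode_query_duration_seconds_sum", frozenset())]
        / samples[("geode_query_duration_seconds_count", frozenset())])
print(f"mean query duration: {mean:.4f}s")  # roughly 0.1021s
```

The `_sum / _count` division shown here is the same calculation PromQL performs when you divide those two series to get an average duration.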

Configuring Prometheus Scraping

Basic Prometheus Configuration:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'geode-production'

scrape_configs:
  - job_name: 'geode'
    static_configs:
      - targets: ['localhost:8080']
        labels:
          instance: 'geode-primary'
          environment: 'production'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: '/metrics'

Multi-Instance Configuration:

scrape_configs:
  - job_name: 'geode-cluster'
    static_configs:
      - targets:
        - 'geode-node-1:8080'
        - 'geode-node-2:8080'
        - 'geode-node-3:8080'
        labels:
          cluster: 'geode-prod'
          datacenter: 'us-east-1'

Kubernetes Service Discovery:

scrape_configs:
  - job_name: 'geode-k8s'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - geode
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: geode
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: instance
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace

Essential Geode Metrics

Query Performance Metrics:

# Query rate by status
rate(geode_queries_total[5m])

# Query latency percentiles (p50, p95, p99)
histogram_quantile(0.50, rate(geode_query_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(geode_query_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(geode_query_duration_seconds_bucket[5m]))

# Slow query rate (queries > 1s)
rate(geode_slow_queries_total[5m])

# Query errors by type
sum by (error_type) (rate(geode_query_errors_total[5m]))

Transaction Metrics:

# Transaction commit rate
rate(geode_transactions_total{status="committed"}[5m])

# Transaction rollback rate
rate(geode_transactions_total{status="rolled_back"}[5m])

# Transaction conflict rate
rate(geode_transaction_conflicts_total[5m])

# Active transactions
geode_active_transactions

# Transaction duration
histogram_quantile(0.95, rate(geode_transaction_duration_seconds_bucket[5m]))

Connection Metrics:

# Active connections
geode_active_connections

# Connection pool utilization
geode_active_connections / geode_max_connections * 100

# Connection error rate
rate(geode_connection_errors_total[5m])

# Connections by client type
sum by (client_type) (geode_active_connections)

Memory and Resource Metrics:

# Memory usage
geode_memory_used_bytes

# Memory usage percentage
geode_memory_used_bytes / geode_memory_total_bytes * 100

# Cache hit rate
rate(geode_cache_hits_total[5m]) /
  (rate(geode_cache_hits_total[5m]) + rate(geode_cache_misses_total[5m]))

# MVCC version count
geode_mvcc_versions_total

# Garbage collection time
rate(geode_gc_duration_seconds_sum[5m])

Storage Metrics:

# Disk space used
geode_disk_used_bytes

# Disk space available
geode_disk_free_bytes

# WAL size
geode_wal_size_bytes

# Disk I/O operations
rate(geode_disk_io_operations_total[5m])

# Checkpoint duration
geode_checkpoint_duration_seconds

Index Metrics:

# Index size
geode_index_size_bytes

# Index lookups
rate(geode_index_lookups_total[5m])

# Index hit rate
rate(geode_index_hits_total[5m]) / rate(geode_index_lookups_total[5m])

Alerting Rules

Critical Alerts:

groups:
  - name: geode_critical
    interval: 30s
    rules:
      - alert: GeodeDown
        expr: up{job="geode"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Geode instance {{ $labels.instance }} is down"
          description: "Geode has been unreachable for more than 1 minute"

      - alert: HighQueryErrorRate
        expr: |
          rate(geode_queries_total{status="error"}[5m]) > 10          
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High query error rate on {{ $labels.instance }}"
          description: "Query error rate is {{ $value }} errors/sec"

      - alert: DiskSpaceCritical
        expr: |
          geode_disk_free_bytes / geode_disk_total_bytes < 0.05          
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Only {{ $value | humanizePercentage }} disk space remaining"

      - alert: MemoryExhaustion
        expr: |
          geode_memory_used_bytes / geode_memory_total_bytes > 0.95          
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Memory near exhaustion on {{ $labels.instance }}"
          description: "Memory usage at {{ $value | humanizePercentage }}"
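The `for:` clauses above control when an alert transitions from pending to firing: the expression must stay true for the full duration, and any false evaluation resets the clock. A simplified Python sketch of that state machine (illustrative only, not Prometheus internals):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertState:
    pending_since: Optional[float] = None  # eval time when the expr first became true

def evaluate(state, expr_true, now, for_seconds):
    """Mimic a rule with a `for:` clause across evaluation cycles (sketch):
    inactive -> pending while the expression holds -> firing once it has
    held continuously for `for_seconds`."""
    if not expr_true:
        state.pending_since = None  # any false evaluation resets the timer
        return "inactive"
    if state.pending_since is None:
        state.pending_since = now
    if now - state.pending_since >= for_seconds:
        return "firing"
    return "pending"
```

With GeodeDown's `for: 1m`, for example, a single failed scrape leaves the alert pending; it only fires after a full minute of consecutive failures.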

Warning Alerts:

  - name: geode_warnings
    interval: 1m
    rules:
      - alert: SlowQueryRateIncreasing
        expr: |
          rate(geode_slow_queries_total[5m]) > 5          
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow query rate increasing on {{ $labels.instance }}"
          description: "{{ $value }} slow queries per second"

      - alert: HighTransactionConflicts
        expr: |
          rate(geode_transaction_conflicts_total[5m]) > 100          
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High transaction conflict rate"
          description: "{{ $value }} conflicts per second"

      - alert: ConnectionPoolPressure
        expr: |
          geode_active_connections / geode_max_connections > 0.8          
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Connection pool pressure on {{ $labels.instance }}"
          description: "{{ $value | humanizePercentage }} of connections used"

      - alert: LongRunningQueries
        expr: |
          geode_active_queries{duration_seconds=">60"} > 5          
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Multiple long-running queries detected"
          description: "{{ $value }} queries running for >60 seconds"

      - alert: WalSizeGrowing
        expr: |
          deriv(geode_wal_size_bytes[30m]) > 1e6
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "WAL size growing rapidly on {{ $labels.instance }}"
          description: "WAL growing at {{ $value | humanize1024 }}/sec"

Grafana Dashboard Queries

Query Performance Dashboard:

# Panel: Query Rate
sum(rate(geode_queries_total[5m])) by (status)

# Panel: Query Latency Percentiles
histogram_quantile(0.50, sum(rate(geode_query_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(geode_query_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(geode_query_duration_seconds_bucket[5m])) by (le))

# Panel: Top Slow Query Types (p99 by type)
topk(10, histogram_quantile(0.99, sum(rate(geode_query_duration_seconds_bucket[5m])) by (le, query_type)))

# Panel: Query Errors by Type
sum by (error_type) (rate(geode_query_errors_total[5m]))

System Health Dashboard:

# Panel: CPU Usage
rate(geode_cpu_seconds_total[5m]) * 100

# Panel: Memory Usage
geode_memory_used_bytes / geode_memory_total_bytes * 100

# Panel: Disk Usage
geode_disk_used_bytes / geode_disk_total_bytes * 100

# Panel: Active Connections
geode_active_connections

Transaction Dashboard:

# Panel: Transaction Throughput
rate(geode_transactions_total{status="committed"}[5m])

# Panel: Transaction Success Rate
rate(geode_transactions_total{status="committed"}[5m]) /
  rate(geode_transactions_total[5m]) * 100

# Panel: Transaction Conflicts
rate(geode_transaction_conflicts_total[5m])

# Panel: Transaction Duration Heatmap
sum(rate(geode_transaction_duration_seconds_bucket[5m])) by (le)

Recording Rules for Performance

Pre-aggregate expensive queries using recording rules:

groups:
  - name: geode_recordings
    interval: 15s
    rules:
      - record: instance_job:geode_queries:rate5m
        expr: sum(rate(geode_queries_total[5m])) by (job, instance)

      - record: instance_job:geode_query_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum(rate(geode_query_duration_seconds_bucket[5m])) by (job, instance, le))

      - record: instance:geode_memory_usage:percent
        expr: |
          geode_memory_used_bytes / geode_memory_total_bytes * 100

      - record: instance:geode_cache_hit_rate:ratio5m
        expr: |
          rate(geode_cache_hits_total[5m]) /
            (rate(geode_cache_hits_total[5m]) + rate(geode_cache_misses_total[5m]))

Advanced Monitoring Patterns

Predict Disk Fullness:

predict_linear(geode_disk_free_bytes[1h], 4 * 3600) < 0
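predict_linear() fits a least-squares line to the samples in the range and extrapolates it forward. A stdlib Python sketch of the same idea (simplified: it extrapolates from the last sample rather than the query evaluation timestamp, and omits counter-reset handling):

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares line over (timestamp, value) samples, extrapolated
    `horizon_seconds` past the newest sample -- the same fit PromQL's
    predict_linear() performs (simplified sketch)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
             / sum((t - mean_t) ** 2 for t, _ in samples))
    intercept = mean_v - slope * mean_t
    return slope * (samples[-1][0] + horizon_seconds) + intercept

# Free disk shrinking linearly: 100 GB at t=0, losing ~10 GB per minute
free_gb = [(0, 100.0), (60, 90.0), (120, 80.0)]
print(predict_linear(free_gb, 480))  # ~0: space exhausted 8 minutes past the last sample
```

A negative predicted value is exactly what the alert expression above keys on: the fitted line crosses zero within the lookahead window.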

Detect Anomalies in Query Rate:

abs(rate(geode_queries_total[5m]) -
    avg_over_time(rate(geode_queries_total[5m])[1h:5m])) >
  2 * stddev_over_time(rate(geode_queries_total[5m])[1h:5m])
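The expression above flags any point that drifts more than two standard deviations from the hourly average. The same check in plain Python (an illustrative sketch; the 2-sigma threshold is a tunable assumption):

```python
import statistics

def is_anomalous(history, current, z_threshold=2.0):
    """True when `current` sits more than `z_threshold` standard deviations
    from the historical mean -- the same band the PromQL expression builds
    from avg_over_time and stddev_over_time."""
    mean = statistics.fmean(history)
    stddev = statistics.pstdev(history)
    if stddev == 0:
        # Perfectly flat history: any deviation at all is anomalous
        return current != mean
    return abs(current - mean) > z_threshold * stddev
```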

Correlate Query Performance with Memory Pressure:

rate(geode_query_duration_seconds_sum[5m]) /
  rate(geode_query_duration_seconds_count[5m])
  and on (instance)
  geode_memory_used_bytes / geode_memory_total_bytes > 0.8
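PromQL's `and` operator intersects two instant vectors by label set, keeping left-hand samples that have a matching sample on the right. A Python sketch of that matching, restricted to an explicit `on`-style key for clarity:

```python
def vector_and(left, right, on=("instance",)):
    """Sketch of PromQL `lhs and on (instance) rhs`: keep samples from `left`
    whose values for the `on` labels also appear on some sample in `right`.
    Each vector is a list of (labels_dict, value) pairs."""
    right_keys = {tuple(labels.get(name) for name in on) for labels, _ in right}
    return [(labels, value) for labels, value in left
            if tuple(labels.get(name) for name in on) in right_keys]

# Only node-1 is under memory pressure, so only its latency sample survives
latency = [({"instance": "geode-node-1"}, 0.42), ({"instance": "geode-node-2"}, 0.08)]
high_mem = [({"instance": "geode-node-1"}, 0.91)]
print(vector_and(latency, high_mem))
```

Note that the result carries the left-hand side's values; the right-hand side acts purely as a filter.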

Federation for Multi-Cluster Monitoring

Monitor multiple Geode clusters from a central Prometheus:

# Central Prometheus configuration
scrape_configs:
  - job_name: 'federate-us-east'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="geode"}'
    static_configs:
      - targets:
        - 'prometheus-us-east:9090'
        labels:
          region: 'us-east-1'

  - job_name: 'federate-us-west'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="geode"}'
    static_configs:
      - targets:
        - 'prometheus-us-west:9090'
        labels:
          region: 'us-west-2'

Advanced Prometheus Patterns

Dynamic Service Discovery

For dynamic Geode clusters, use service discovery:

# Consul service discovery
scrape_configs:
  - job_name: 'geode-consul'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['geode']
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job
      - source_labels: [__meta_consul_node]
        target_label: instance
      - source_labels: [__meta_consul_tags]
        regex: '.*,environment=([^,]+),.*'
        target_label: environment

Custom Exporters

Create custom exporters for Geode-specific metrics:

# geode_exporter.py
from prometheus_client import start_http_server, Gauge
import asyncio
from geode_client import Client

# Define custom metrics
active_queries_gauge = Gauge('geode_custom_active_queries',
                             'Currently executing queries',
                             ['query_type'])

cache_memory_gauge = Gauge('geode_custom_cache_memory_bytes',
                          'Memory used by query cache')

async def collect_custom_metrics(client):
    """Collect custom Geode metrics"""
    while True:
        # Active queries by type
        result, _ = await client.query("""
            CALL system.active_queries()
            RETURN query_type, count(*) as count
        """)

        for row in result.rows:
            active_queries_gauge.labels(
                query_type=row['query_type']
            ).set(row['count'])

        # Cache memory usage
        cache_stats, _ = await client.query("""
            CALL system.cache_stats()
            RETURN memory_bytes
        """)

        cache_memory_gauge.set(cache_stats.rows[0]['memory_bytes'])

        await asyncio.sleep(15)  # Collection interval (Prometheus scrapes the HTTP endpoint separately)

if __name__ == '__main__':
    # Serve on a dedicated exporter port (9090 is Prometheus's own default)
    start_http_server(9184)
    client = Client("localhost:3141")
    asyncio.run(collect_custom_metrics(client))

Metric Relabeling

Transform metrics during scraping:

scrape_configs:
  - job_name: 'geode'
    static_configs:
      - targets: ['localhost:8080']
    metric_relabel_configs:
      # Drop high-cardinality debug metrics in production
      - source_labels: [__name__]
        regex: 'geode_debug_.*'
        action: drop

      # Rename legacy metrics
      - source_labels: [__name__]
        regex: 'geode_old_metric_name'
        target_label: __name__
        replacement: 'geode_new_metric_name'

      # Attach a cluster label to instance-level metrics
      - source_labels: [instance]
        regex: 'geode-node-.*'
        target_label: cluster
        replacement: 'geode-production'
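Keep in mind that Prometheus fully anchors relabeling regexes: 'geode_debug_.*' must match the entire metric name, not merely a substring. A quick Python illustration of the difference:

```python
import re

def relabel_matches(pattern, value):
    """Prometheus treats a relabel regex as fully anchored, i.e. `pattern`
    must match the entire source label value (equivalent to ^(?:pattern)$)."""
    return re.fullmatch(pattern, value) is not None

print(relabel_matches('geode_debug_.*', 'geode_debug_lock_waits'))  # True
print(relabel_matches('geode_debug_', 'geode_debug_lock_waits'))    # False: no trailing .*
```

This is a common source of relabel rules that silently never fire: a pattern that would match with an unanchored search fails against the whole-string match Prometheus actually performs.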

Alert Management

Alertmanager Configuration

# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'team-database'
  routes:
    - match:
        severity: critical
      receiver: 'team-database-pager'
      continue: true

    - match:
        severity: warning
      receiver: 'team-database-slack'

receivers:
  - name: 'team-database'
    email_configs:
      - to: '[email protected]'

  - name: 'team-database-pager'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_KEY'

  - name: 'team-database-slack'
    slack_configs:
      - channel: '#database-alerts'
        title: 'Geode Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

Alert Inhibition Rules

Prevent alert storms with inhibition:

# alertmanager.yml (continued)
inhibit_rules:
  # Don't alert on high memory if instance is down
  - source_match:
      severity: 'critical'
      alertname: 'GeodeDown'
    target_match:
      severity: 'warning'
    equal: ['instance']

  # Don't alert on slow queries if database is overloaded
  - source_match:
      alertname: 'HighQueryErrorRate'
    target_match:
      alertname: 'SlowQueryRateIncreasing'
    equal: ['instance']

Alert Templates

Create informative alert messages:

groups:
  - name: geode_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          rate(geode_queries_total{status="error"}[5m]) > 10          
        annotations:
          summary: "High query error rate on {{ $labels.instance }}"
          description: |
            Query error rate is {{ $value | humanize }} errors/sec.

            Current stats:
            - Instance: {{ $labels.instance }}
            - Error rate: {{ $value | humanize }}/sec

            Runbook: https://docs.geodedb.com/runbooks/high-error-rate            
          dashboard: "https://grafana.example.com/d/geode-errors"
          runbook_url: "https://docs.geodedb.com/runbooks/high-error-rate"

Grafana Integration

Provisioning Dashboards

# grafana/provisioning/dashboards/geode.yml
apiVersion: 1

providers:
  - name: 'Geode'
    orgId: 1
    folder: 'Database'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards/geode

Dashboard Variables

Create dynamic dashboards with variables:

{
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up{job=\"geode\"}, instance)",
        "multi": true,
        "includeAll": true
      },
      {
        "name": "time_range",
        "type": "interval",
        "query": "1m,5m,15m,30m,1h,6h,12h,1d",
        "current": {
          "text": "5m",
          "value": "5m"
        }
      }
    ]
  },
  "panels": [
    {
      "title": "Query Rate",
      "targets": [
        {
          "expr": "sum(rate(geode_queries_total{instance=~\"$instance\"}[$time_range]))"
        }
      ]
    }
  ]
}
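Behind a label_values() variable, Grafana calls Prometheus's label-values HTTP API. A small Python helper that constructs such a request URL (the helper name is illustrative, and no request is actually sent here):

```python
from urllib.parse import urlencode

def label_values_url(base_url, label, selector=None):
    """Build the Prometheus HTTP API URL behind a Grafana label_values()
    variable: GET /api/v1/label/<label>/values, optionally scoped to a
    series selector via the match[] parameter."""
    url = f"{base_url}/api/v1/label/{label}/values"
    if selector:
        url += "?" + urlencode({"match[]": selector})
    return url

print(label_values_url("http://prometheus:9090", "instance", '{job="geode"}'))
```

Scoping with a selector keeps the variable's dropdown limited to instances that actually expose Geode metrics, rather than every instance Prometheus knows about.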

Panel Examples

Heatmap Panel for query duration distribution:

{
  "type": "heatmap",
  "title": "Query Duration Heatmap",
  "targets": [
    {
      "expr": "sum(rate(geode_query_duration_seconds_bucket{instance=~\"$instance\"}[$time_range])) by (le)",
      "format": "heatmap",
      "legendFormat": "{{le}}"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}

Status Panel for system health:

{
  "type": "stat",
  "title": "System Status",
  "targets": [
    {
      "expr": "up{job=\"geode\"}",
      "instant": true
    }
  ],
  "mappings": [
    {
      "value": 1,
      "text": "UP",
      "color": "green"
    },
    {
      "value": 0,
      "text": "DOWN",
      "color": "red"
    }
  ]
}

Advanced Querying Techniques

Quantile Aggregation

Calculate percentiles across multiple instances:

# P99 query latency across all instances
histogram_quantile(0.99,
  sum(rate(geode_query_duration_seconds_bucket[5m])) by (le)
)

# P99 per instance
histogram_quantile(0.99,
  sum(rate(geode_query_duration_seconds_bucket[5m])) by (le, instance)
)
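histogram_quantile() locates the bucket containing the requested rank and interpolates linearly inside it, which is why its accuracy depends on how the bucket boundaries are chosen. A Python sketch of the calculation, using the bucket counts from the /metrics excerpt earlier in this guide (illustrative, not the PromQL implementation):

```python
import math

def histogram_quantile(q, buckets):
    """Sketch of PromQL histogram_quantile(): `buckets` maps upper bound (le)
    to cumulative count. Locate the bucket holding the q-th observation and
    interpolate linearly within it; like PromQL, results landing in the +Inf
    bucket are capped at the highest finite bound."""
    bounds = sorted(buckets)
    rank = q * buckets[bounds[-1]]      # total count lives in the +Inf bucket
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if rank <= count:
            if math.isinf(bound):
                return prev_bound       # cap at the highest finite bound
            # linear interpolation within this bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return bounds[-1]

# Bucket counts from the /metrics excerpt earlier in this guide
buckets = {0.01: 45234, 0.05: 89432, 0.1: 112847, math.inf: 125847}
print(histogram_quantile(0.50, buckets))  # ~0.026s median
```

Note the p99 here would return 0.1 exactly: the 99th-percentile observation falls in the +Inf bucket, so the result is capped at the largest finite boundary. If your p99 keeps landing on your highest bucket bound, add finer buckets above it.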

Rate Calculation Windows

Choose appropriate time windows:

# Very short window (1m) - noisy but responsive
rate(geode_queries_total[1m])

# Medium window (5m) - balanced
rate(geode_queries_total[5m])

# Long window (1h) - smooth but slow to react
rate(geode_queries_total[1h])

# Use irate for instant rate (last 2 samples)
irate(geode_queries_total[5m])
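The difference between rate() and irate() comes down to which samples they consider. A stripped-down Python sketch (ignoring counter resets and the window-boundary extrapolation that real rate() performs):

```python
def prom_rate(samples):
    """Per-second average rate across all (timestamp, value) counter samples
    in the window, like rate() -- minus counter-reset handling and boundary
    extrapolation (simplified sketch)."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

def prom_irate(samples):
    """Instant rate from only the last two samples in the window, like irate()."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter samples at 15s intervals: traffic drops sharply in the last interval
samples = [(0, 0), (15, 150), (30, 300), (45, 330)]
print(prom_rate(samples), prom_irate(samples))
```

Because irate() sees only the final interval, it reacts instantly to the drop, while rate() still reflects the whole window; this is why irate() suits fast-moving graphs but makes for twitchy alerts.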

Subquery Patterns

Calculate rate of rate (acceleration):

# Query acceleration (change in query rate)
deriv(
  rate(geode_queries_total[5m])[10m:1m]
)

# Predict future value
predict_linear(geode_queries_total[1h], 3600)

Monitoring at Scale

Hierarchical Federation

For large deployments with multiple regions:

# Regional Prometheus federates from local instances
- job_name: 'federate-local'
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="geode"}'
  static_configs:
    - targets:
      - 'prometheus-az1:9090'
      - 'prometheus-az2:9090'
      - 'prometheus-az3:9090'

# Global Prometheus federates from regional instances
- job_name: 'federate-regional'
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="geode"}'
  static_configs:
    - targets:
      - 'prometheus-us-east:9090'
      - 'prometheus-us-west:9090'
      - 'prometheus-eu-west:9090'

Remote Write for Long-Term Storage

Send metrics to long-term storage:

remote_write:
  - url: "https://prometheus-remote-storage.example.com/api/v1/write"
    basic_auth:
      username: 'geode-metrics'
      password: 'SECRET'
    write_relabel_configs:
      # Only send aggregate metrics for long-term storage
      - source_labels: [__name__]
        regex: 'geode_(queries_total|query_duration_seconds_bucket|memory_used_bytes)'
        action: keep
    queue_config:
      capacity: 10000
      max_shards: 10
      min_shards: 1
      max_samples_per_send: 5000

Thanos for Global View

Deploy Thanos for unlimited retention and global querying:

# Prometheus with Thanos sidecar
global:
  external_labels:
    cluster: 'geode-production'
    region: 'us-east-1'

# Thanos sidecar configuration
--tsdb.path=/prometheus
--prometheus.url=http://localhost:9090
--objstore.config-file=/etc/thanos/bucket.yml
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902

Best Practices

Appropriate Scrape Intervals: Balance data resolution against storage overhead. 15-30 seconds works well for most deployments.

Use Recording Rules: Pre-compute expensive queries to reduce dashboard load times.

Set Alert Thresholds Based on Baselines: Monitor normal behavior before setting alert thresholds to minimize false positives.

Implement Alert Routing: Use Alertmanager to route alerts to appropriate teams and communication channels.

Retain Historical Data: Configure appropriate retention periods to support trend analysis and capacity planning.

Monitor Prometheus Itself: Ensure your monitoring system is healthy and performant.

Label Consistently: Use consistent label naming across all metrics for easier querying and aggregation.

Dashboard Organization: Create focused dashboards for different personas (developers, operators, executives).

Metric Naming: Follow Prometheus naming conventions (unit suffix, descriptive names).

Cardinality Management: Keep label cardinality bounded to prevent memory issues.

Troubleshooting Common Issues

Missing Metrics: Verify Geode metrics endpoint is accessible and Prometheus can reach it.

High Cardinality: Avoid labels with unbounded values (e.g., user IDs, session IDs).

Storage Issues: Monitor Prometheus disk usage and adjust retention policies as needed.

Slow Queries: Use recording rules to pre-aggregate complex queries used in dashboards.

Scrape Failures: Check network connectivity, TLS certificates, and authentication.

Out of Memory: Reduce retention period, use recording rules, or add more RAM.

Slow Dashboards: Optimize queries, use recording rules, reduce time ranges.

Production Deployment Checklist

  • Scrape intervals configured appropriately
  • Recording rules created for expensive queries
  • Alert rules defined and tested
  • Alertmanager configured with proper routing
  • Dashboards created for key metrics
  • Retention period set based on requirements
  • Backup strategy for Prometheus data
  • Monitoring of Prometheus itself
  • High availability setup (if required)
  • Documentation of metrics and alerts
  • Runbooks created for common alerts
  • Load testing of monitoring stack

Further Reading

  • Prometheus Configuration Guide
  • Grafana Dashboard Templates for Geode
  • Alert Runbook Examples
  • Performance Tuning with Metrics
  • Multi-Cluster Monitoring Patterns
  • Prometheus Best Practices
  • Recording Rules Guide
  • Federation Patterns

Related Articles