<!-- CANARY: REQ=REQ-DOCS-001; FEATURE="Docs"; ASPECT=Documentation; STATUS=TESTED; OWNER=docs; UPDATED=2026-01-15 --> <p>Prometheus is the industry-standard monitoring solution for cloud-native applications, and Geode provides first-class Prometheus integration. Geode exposes comprehensive metrics covering queries, transactions, connections, storage, memory, and system health through a Prometheus-compatible <code>/metrics</code> endpoint.</p> <p>By integrating Geode with Prometheus, you gain real-time visibility into database performance, can set up intelligent alerting on critical conditions, and build comprehensive dashboards for operational insights. Combined with Grafana for visualization, Prometheus and Geode form a powerful observability stack for production graph database deployments.</p> <p>This guide covers Prometheus integration patterns, essential metrics, alerting rules, and visualization strategies for monitoring Geode effectively.</p> <h3 id="prometheus-architecture-with-geode" class="position-relative d-flex align-items-center group"> <span>Prometheus Architecture with Geode</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="prometheus-architecture-with-geode" aria-haspopup="dialog" aria-label="Share link: Prometheus Architecture with Geode"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><div id="headingShareModal" class="heading-share-modal" role="dialog" aria-modal="true" aria-labelledby="headingShareTitle" hidden> <div class="hsm-dialog" role="document"> <div class="hsm-header"> <h2 id="headingShareTitle" class="h6 mb-0 fw-bold">Share this section</h2> <button type="button" class="hsm-close" aria-label="Close"> <i class="fa-solid fa-xmark"></i> </button> </div> <div class="hsm-body"> <label for="headingShareInput" class="form-label small text-muted mb-1 text-uppercase fw-bold" style="font-size: 0.7rem; letter-spacing: 0.5px;">Permalink</label> <div class="input-group mb-4 hsm-url-group"> <input id="headingShareInput" type="text" class="form-control font-monospace" readonly aria-readonly="true" style="font-size: 0.85rem;" /> <button class="btn btn-primary hsm-copy" type="button" aria-label="Copy" title="Copy"> <i class="fa-duotone fa-clipboard" aria-hidden="true"></i> </button> </div> <div class="small fw-bold mb-2 text-muted text-uppercase" style="font-size: 0.7rem; letter-spacing: 0.5px;">Share via</div> <div class="hsm-share-grid"> <a id="share-twitter" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer"> <i class="fa-brands fa-twitter me-2"></i>Twitter </a> <a id="share-linkedin" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer"> <i class="fa-brands fa-linkedin me-2"></i>LinkedIn </a> <a id="share-facebook" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer"> <i class="fa-brands fa-facebook me-2"></i>Facebook </a> </div> </div> </div> </div> <style> .heading-share-modal { position: fixed; inset: 0; display: flex; justify-content: center; align-items: center; background: rgba(0, 0, 0, 0.6); z-index: 1050; padding: 1rem; backdrop-filter: blur(4px); -webkit-backdrop-filter: blur(4px); } .heading-share-modal[hidden] { display: none !important; } .hsm-dialog { max-width: 420px; width: 100%; background: var(--bs-body-bg, #fff); color: var(--bs-body-color, #212529); border: 1px solid var(--bs-border-color, rgba(0,0,0,0.1)); border-radius: 1rem; box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.25); overflow: hidden; animation: hsm-fade-in 0.2s ease-out; } @keyframes hsm-fade-in { from { opacity: 0; transform: scale(0.95); } to { opacity: 1; transform: scale(1); } } [data-bs-theme="dark"] .hsm-dialog { background: #1e293b; border-color: rgba(255,255,255,0.1); color: #f8f9fa; } .hsm-header { display: flex; justify-content: space-between; align-items: center; padding: 1rem 1.5rem; border-bottom: 1px solid var(--bs-border-color, rgba(0,0,0,0.1)); background: rgba(0,0,0,0.02); } [data-bs-theme="dark"] .hsm-header { background: rgba(255,255,255,0.02); border-color: rgba(255,255,255,0.1); } .hsm-close { background: transparent; border: none; color: inherit; opacity: 0.5; padding: 0.25rem 0.5rem; border-radius: 0.25rem; font-size: 1.2rem; line-height: 1; transition: opacity 0.2s; } .hsm-close:hover { opacity: 1; } .hsm-body { padding: 1.5rem; } .hsm-url-group { display: flex !important; align-items: stretch; } .hsm-url-group .form-control { flex: 1; min-width: 0; margin: 0; background: var(--bs-secondary-bg, #f8f9fa); border-color: var(--bs-border-color, #dee2e6); border-top-right-radius: 0; border-bottom-right-radius: 0; height: 42px; } .hsm-url-group .btn { flex: 0 0 auto; margin: 0; margin-left: -1px; border-top-left-radius: 0; border-bottom-left-radius: 0; height: 42px; display: flex; align-items: center; justify-content: center; padding: 0 1.25rem; z-index: 2; } [data-bs-theme="dark"] .hsm-url-group .form-control { background: #0f172a; border-color: #334155; color: #e2e8f0; } .hsm-share-grid { display: flex; flex-direction: column; gap: 0.5rem; } .hsm-share-grid .btn { display: flex; align-items: center; justify-content: center; font-size: 0.9rem; padding: 0.6rem; border-color: var(--bs-border-color); width: 100%; } [data-bs-theme="dark"] .hsm-share-grid .btn { color: #e2e8f0; border-color: #475569; } [data-bs-theme="dark"] .hsm-share-grid .btn:hover { background: #334155; border-color: #cbd5e1; } </style> <script> (function(){ const modal = document.getElementById('headingShareModal'); if(!modal) return; const input = modal.querySelector('#headingShareInput'); const copyBtn = modal.querySelector('.hsm-copy'); const twitter = modal.querySelector('#share-twitter'); const linkedin = modal.querySelector('#share-linkedin'); const facebook = modal.querySelector('#share-facebook'); const closeBtn = modal.querySelector('.hsm-close'); let lastFocus=null; let trapBound=false; function buildUrl(id){ return window.location.origin + window.location.pathname + '#' + id; } function isOpen(){ return !modal.hasAttribute('hidden'); } function hydrate(id){ const url=buildUrl(id); input.value=url; const enc=encodeURIComponent(url); const text=encodeURIComponent(document.title); if(twitter) twitter.href=`https://twitter.com/intent/tweet?url=${enc}&text=${text}`; if(linkedin) linkedin.href=`https://www.linkedin.com/sharing/share-offsite/?url=${enc}`; if(facebook) facebook.href=`https://www.facebook.com/sharer/sharer.php?u=${enc}`; } function openModal(id){ lastFocus=document.activeElement; hydrate(id); if(!isOpen()){ modal.removeAttribute('hidden'); } requestAnimationFrame(()=>{ input.focus(); }); trapFocus(); } function closeModal(){ if(!isOpen()) return; modal.setAttribute('hidden',''); if(lastFocus && typeof lastFocus.focus==='function') lastFocus.focus(); } function copyCurrent(){ try{ navigator.clipboard.writeText(input.value).then(()=>feedback(true),()=>fallback()); } catch(e){ fallback(); } } function fallback(){ input.select(); try{ document.execCommand('copy'); feedback(true);}catch(e){ feedback(false);} } function feedback(ok){ if(!copyBtn) return; const icon=copyBtn.querySelector('i'); if(!icon) return; const prev=copyBtn.getAttribute('data-prev')||icon.className; if(!copyBtn.getAttribute('data-prev')) copyBtn.setAttribute('data-prev',prev); icon.className= ok ? 'fa-duotone fa-clipboard-check':'fa-duotone fa-circle-exclamation'; setTimeout(()=>{ icon.className=prev; },1800); } function handleShareClick(e){ e.preventDefault(); const btn=e.currentTarget; const id=btn.getAttribute('data-share-target'); if(id) openModal(id); } function bindShareButtons(){ document.querySelectorAll('.h-share').forEach(btn=>{ if(!btn.dataset.hShareBound){ btn.addEventListener('click', handleShareClick); btn.dataset.hShareBound='1'; } }); } bindShareButtons(); if(document.readyState==='loading'){ document.addEventListener('DOMContentLoaded', bindShareButtons); } else { requestAnimationFrame(bindShareButtons); } document.addEventListener('click', function(e){ const shareBtn=e.target.closest && e.target.closest('.h-share'); if(shareBtn && !shareBtn.dataset.hShareBound){ handleShareClick.call(shareBtn, e); } }, true); document.addEventListener('click', e=>{ if(e.target===modal) closeModal(); if(e.target.closest && e.target.closest('.hsm-close')){ e.preventDefault(); closeModal(); } if(copyBtn && (e.target===copyBtn || (e.target.closest && e.target.closest('.hsm-copy')))) { e.preventDefault(); copyCurrent(); } }); document.addEventListener('keydown', e=>{ if(e.key==='Escape' && isOpen()) closeModal(); }); function trapFocus(){ if(trapBound) return; trapBound=true; modal.addEventListener('keydown', f=>{ if(f.key==='Tab' && isOpen()){ const focusable=[...modal.querySelectorAll('a[href],button,input,textarea,select,[tabindex]:not([tabindex="-1"])')].filter(el=>!el.hasAttribute('disabled')); if(!focusable.length) return; const first=focusable[0]; const last=focusable[focusable.length-1]; if(f.shiftKey && document.activeElement===first){ f.preventDefault(); last.focus(); } else if(!f.shiftKey && document.activeElement===last){ f.preventDefault(); first.focus(); } } }); } if(closeBtn) closeBtn.addEventListener('click', e=>{ e.preventDefault(); closeModal(); }); })(); </script><p>Prometheus uses a pull-based model where it periodically scrapes metrics from instrumented applications. Geode exposes metrics at <code>http://geode-host:8080/metrics</code> in Prometheus exposition format:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># HELP geode_queries_total Total number of queries executed </span></span><span class="line"><span class="cl"># TYPE geode_queries_total counter </span></span><span class="line"><span class="cl">geode_queries_total{status=&#34;success&#34;} 125847 </span></span><span class="line"><span class="cl">geode_queries_total{status=&#34;error&#34;} 342 </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"># HELP geode_query_duration_seconds Query execution duration </span></span><span class="line"><span class="cl"># TYPE geode_query_duration_seconds histogram </span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le=&#34;0.01&#34;} 45234 </span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le=&#34;0.05&#34;} 89432 </span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le=&#34;0.1&#34;} 112847 </span></span><span class="line"><span class="cl">geode_query_duration_seconds_sum 12847.3 </span></span><span class="line"><span class="cl">geode_query_duration_seconds_count 125847 </span></span></code></pre></div> <h3 id="configuring-prometheus-scraping" class="position-relative d-flex align-items-center group"> <span>Configuring Prometheus Scraping</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="configuring-prometheus-scraping" aria-haspopup="dialog" aria-label="Share link: Configuring Prometheus Scraping"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Basic Prometheus Configuration</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># prometheus.yml</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">evaluation_interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">external_labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-production&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;localhost:8080&#39;</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">instance</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-primary&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;production&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_interval</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;/metrics&#39;</span><span class="w"> </span></span></span></code></pre></div><p><strong>Multi-Instance Configuration</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-cluster&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;geode-node-1:8080&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;geode-node-2:8080&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;geode-node-3:8080&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-prod&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">datacenter</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;us-east-1&#39;</span><span class="w"> </span></span></span></code></pre></div><p><strong>Kubernetes Service Discovery</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-k8s&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">kubernetes_sd_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="l">pod</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">namespaces</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">names</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">geode</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">relabel_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_pod_label_app]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">keep</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="l">geode</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_pod_name]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">instance</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_namespace]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">namespace</span><span class="w"> </span></span></span></code></pre></div> <h3 id="essential-geode-metrics" class="position-relative d-flex align-items-center group"> <span>Essential Geode Metrics</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="essential-geode-metrics" aria-haspopup="dialog" aria-label="Share link: Essential Geode Metrics"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Query Performance Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Query rate by status</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Average query duration (p50, p95, p99)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.50</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Slow query rate (queries &gt; 1s)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_slow_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Query errors by type</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">error_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span></code></pre></div><p><strong>Transaction Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Transaction commit rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">&#34;</span><span class="s">committed</span><span class="p">&#34;}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction rollback rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">&#34;</span><span class="s">rolled_back</span><span class="p">&#34;}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction conflict rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_conflicts_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Active transactions</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_transactions</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction duration</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span></code></pre></div><p><strong>Connection Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Active connections</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connection pool utilization</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_max_connections</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connection error rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_connection_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connections by client type</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">client_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="nv">geode_active_connections</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div><p><strong>Memory and Resource Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Memory usage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Memory usage percentage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Cache hit rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_misses_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># MVCC version count</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_mvcc_versions_total</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Garbage collection time</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_gc_duration_seconds_sum</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div><p><strong>Storage Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Disk space used</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_used_bytes</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Disk space available</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_free_bytes</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># WAL size</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_wal_size_bytes</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Disk I/O operations</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_disk_io_operations_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Checkpoint duration</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_checkpoint_duration_seconds</span><span class="w"> </span></span></span></code></pre></div><p><strong>Index Metrics</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Index size</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_index_size_bytes</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Index lookups</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_lookups_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Index hit rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_lookups_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div> <h3 id="alerting-rules" class="position-relative d-flex align-items-center group"> <span>Alerting Rules</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="alerting-rules" aria-haspopup="dialog" aria-label="Share link: Alerting Rules"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Critical Alerts</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">30s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">GeodeDown</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">up{job=&#34;geode&#34;} == 0</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">1m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Geode instance {{ $labels.instance }} is down&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Geode has been unreachable for more than 1 minute&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighQueryErrorRate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_queries_total{status=&#34;error&#34;}[5m]) &gt; 10</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;High query error rate on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Query error rate is {{ $value }} errors/sec&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">DiskSpaceCritical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> geode_disk_free_bytes / geode_disk_total_bytes &lt; 0.05</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Critical disk space on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Only {{ $value | humanizePercentage }} disk space remaining&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">MemoryExhaustion</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> geode_memory_used_bytes / geode_memory_total_bytes &gt; 0.95</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Memory near exhaustion on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Memory usage at {{ $value | humanizePercentage }}&#34;</span><span class="w"> </span></span></span></code></pre></div><p><strong>Warning Alerts</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_warnings</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">1m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">SlowQueryRateIncreasing</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_slow_queries_total[5m]) &gt; 5</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Slow query rate increasing on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{ $value }} slow queries per second&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighTransactionConflicts</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_transaction_conflicts_total[5m]) &gt; 100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;High transaction conflict rate&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{ $value }} conflicts per second&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">ConnectionPoolPressure</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> geode_active_connections / geode_max_connections &gt; 0.8</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">15m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Connection pool pressure on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{ $value | humanizePercentage }} of connections used&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">LongRunningQueries</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> geode_active_queries{duration_seconds=&#34;&gt;60&#34;} &gt; 5</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Multiple long-running queries detected&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{ $value }} queries running for &gt;60 seconds&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">WalSizeGrowing</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> deriv(geode_wal_size_bytes[30m]) &gt; 1e9</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">30m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;WAL size growing rapidly on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;WAL growing at {{ $value | humanize1024 }}/sec&#34;</span><span class="w"> </span></span></span></code></pre></div> <h3 id="grafana-dashboard-queries" class="position-relative d-flex align-items-center group"> <span>Grafana Dashboard Queries</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="grafana-dashboard-queries" aria-haspopup="dialog" aria-label="Share link: Grafana Dashboard Queries"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Query Performance Dashboard</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: Query Rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">status</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Query Latency Percentiles</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.50</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Top Slow Queries</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">topk</span><span class="o">(</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nv">geode_query_duration_seconds</span><span class="p">{</span><span class="nl">quantile</span><span class="o">=</span><span class="p">&#34;</span><span class="s">0.99</span><span class="p">&#34;}</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Query Errors by Type</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">error_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span></span></span></code></pre></div><p><strong>System Health Dashboard</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: CPU Usage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cpu_seconds_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Memory Usage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Disk Usage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_disk_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Active Connections</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w"> </span></span></span></code></pre></div><p><strong>Transaction Dashboard</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: Transaction Throughput</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">&#34;</span><span class="s">committed</span><span class="p">&#34;}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Success Rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">&#34;</span><span class="s">committed</span><span class="p">&#34;}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Conflicts</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_conflicts_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Duration Heatmap</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div> <h3 id="recording-rules-for-performance" class="position-relative d-flex align-items-center group"> <span>Recording Rules for Performance</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="recording-rules-for-performance" aria-haspopup="dialog" aria-label="Share link: Recording Rules for Performance"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p>Pre-aggregate expensive queries using recording rules:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_recordings</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_query_rate:5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">sum(rate(geode_queries_total[5m])) by (job, instance)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_query_latency_p95:5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">histogram_quantile(0.95, sum(rate(geode_query_duration_seconds_bucket[5m])) by (job, instance, le))</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_memory_usage_percent:current</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> geode_memory_used_bytes / geode_memory_total_bytes * 100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_cache_hit_rate:5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_cache_hits_total[5m]) / </span></span></span><span class="line"><span class="cl"><span class="sd"> (rate(geode_cache_hits_total[5m]) + rate(geode_cache_misses_total[5m]))</span><span class="w"> </span></span></span></code></pre></div> <h3 id="advanced-monitoring-patterns" class="position-relative d-flex align-items-center group"> <span>Advanced Monitoring Patterns</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="advanced-monitoring-patterns" aria-haspopup="dialog" aria-label="Share link: Advanced Monitoring Patterns"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Predict Disk Fullness</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">predict_linear</span><span class="o">(</span><span class="nv">geode_disk_free_bytes</span><span class="p">[</span><span class="s">1h</span><span class="p">],</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">3600</span><span class="o">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span></span></span></code></pre></div><p><strong>Detect Anomalies in Query Rate</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">abs</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">avg_over_time</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">1h</span><span class="err">:</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">stddev_over_time</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">1h</span><span class="err">:</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div><p><strong>Correlate Query Performance with Memory Pressure</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_sum</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_count</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="ow">and</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mf">0.8</span><span class="w"> </span></span></span></code></pre></div> <h3 id="federation-for-multi-cluster-monitoring" class="position-relative d-flex align-items-center group"> <span>Federation for Multi-Cluster Monitoring</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="federation-for-multi-cluster-monitoring" aria-haspopup="dialog" aria-label="Share link: Federation for Multi-Cluster Monitoring"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p>Monitor multiple Geode clusters from a central Prometheus:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Central Prometheus configuration</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;federate-us-east&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;/federate&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">&#39;match[]&#39;</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;{job=&#34;geode&#34;}&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-us-east:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;us-east-1&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;federate-us-west&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;/federate&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">&#39;match[]&#39;</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;{job=&#34;geode&#34;}&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-us-west:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;us-west-2&#39;</span><span class="w"> </span></span></span></code></pre></div> <h3 id="advanced-prometheus-patterns" class="position-relative d-flex align-items-center group"> <span>Advanced Prometheus Patterns</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="advanced-prometheus-patterns" aria-haspopup="dialog" aria-label="Share link: Advanced Prometheus Patterns"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3> <h4 id="dynamic-service-discovery" class="position-relative d-flex align-items-center group"> <span>Dynamic Service Discovery</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="dynamic-service-discovery" aria-haspopup="dialog" aria-label="Share link: Dynamic Service Discovery"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>For dynamic Geode clusters, use service discovery:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Consul service discovery</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-consul&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">consul_sd_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;localhost:8500&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">services</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;geode&#39;</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">relabel_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_service]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">job</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_node]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">instance</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_tags]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;.*,environment=([^,]+),.*&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">environment</span><span class="w"> </span></span></span></code></pre></div> <h4 id="custom-exporters" class="position-relative d-flex align-items-center group"> <span>Custom Exporters</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="custom-exporters" aria-haspopup="dialog" aria-label="Share link: Custom Exporters"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Create custom exporters for Geode-specific metrics:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># geode_exporter.py</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">prometheus_client</span> <span class="kn">import</span> <span class="n">start_http_server</span><span class="p">,</span> <span class="n">Gauge</span><span class="p">,</span> <span class="n">Counter</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">asyncio</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">geode_client</span> <span class="kn">import</span> <span class="n">Client</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Define custom metrics</span> </span></span><span class="line"><span class="cl"><span class="n">active_queries_gauge</span> <span class="o">=</span> <span class="n">Gauge</span><span class="p">(</span><span class="s1">&#39;geode_custom_active_queries&#39;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="s1">&#39;Currently executing queries&#39;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="p">[</span><span class="s1">&#39;query_type&#39;</span><span class="p">])</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">cache_memory_gauge</span> <span class="o">=</span> <span class="n">Gauge</span><span class="p">(</span><span class="s1">&#39;geode_custom_cache_memory_bytes&#39;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="s1">&#39;Memory used by query cache&#39;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">collect_custom_metrics</span><span class="p">(</span><span class="n">client</span><span class="p">):</span> </span></span><span class="line"><span class="cl"> <span class="s2">&#34;&#34;&#34;Collect custom Geode metrics&#34;&#34;&#34;</span> </span></span><span class="line"><span class="cl"> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="c1"># Active queries by type</span> </span></span><span class="line"><span class="cl"> <span class="n">result</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">&#34;&#34;&#34; </span></span></span><span class="line"><span class="cl"><span class="s2"> CALL system.active_queries() </span></span></span><span class="line"><span class="cl"><span class="s2"> RETURN query_type, count(*) as count </span></span></span><span class="line"><span class="cl"><span class="s2"> &#34;&#34;&#34;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">rows</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">active_queries_gauge</span><span class="o">.</span><span class="n">labels</span><span class="p">(</span> </span></span><span class="line"><span class="cl"> <span class="n">query_type</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="s1">&#39;query_type&#39;</span><span class="p">]</span> </span></span><span class="line"><span class="cl"> <span class="p">)</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="s1">&#39;count&#39;</span><span class="p">])</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># Cache memory usage</span> </span></span><span class="line"><span class="cl"> <span class="n">cache_stats</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">&#34;&#34;&#34; </span></span></span><span class="line"><span class="cl"><span class="s2"> CALL system.cache_stats() </span></span></span><span class="line"><span class="cl"><span class="s2"> RETURN memory_bytes </span></span></span><span class="line"><span class="cl"><span class="s2"> &#34;&#34;&#34;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="n">cache_memory_gauge</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">cache_stats</span><span class="o">.</span><span class="n">rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">&#39;memory_bytes&#39;</span><span class="p">])</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">15</span><span class="p">)</span> <span class="c1"># Scrape interval</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">start_http_server</span><span class="p">(</span><span class="mi">9090</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="s2">&#34;localhost:3141&#34;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">collect_custom_metrics</span><span class="p">(</span><span class="n">client</span><span class="p">))</span> </span></span></code></pre></div> <h4 id="metric-relabeling" class="position-relative d-flex align-items-center group"> <span>Metric Relabeling</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="metric-relabeling" aria-haspopup="dialog" aria-label="Share link: Metric Relabeling"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Transform metrics during scraping:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;localhost:8080&#39;</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metric_relabel_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Drop high-cardinality debug metrics in production</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode_debug_.*&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">drop</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Rename legacy metrics</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode_old_metric_name&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">__name__</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">replacement</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode_new_metric_name&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Aggregate instance-level metrics to cluster level</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">instance]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-node-.*&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">cluster</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">replacement</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-production&#39;</span><span class="w"> </span></span></span></code></pre></div> <h3 id="alert-management" class="position-relative d-flex align-items-center group"> <span>Alert Management</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="alert-management" aria-haspopup="dialog" aria-label="Share link: Alert Management"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3> <h4 id="alertmanager-configuration" class="position-relative d-flex align-items-center group"> <span>Alertmanager Configuration</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="alertmanager-configuration" aria-haspopup="dialog" aria-label="Share link: Alertmanager Configuration"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># alertmanager.yml</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">resolve_timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">slack_api_url</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">route</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_by</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;alertname&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;cluster&#39;</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_wait</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_interval</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">repeat_interval</span><span class="p">:</span><span class="w"> </span><span class="l">12h</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">routes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database-pager&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">continue</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database-slack&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">receivers</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">email_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">to</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;[email protected]&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database-pager&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">pagerduty_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">service_key</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;YOUR_PAGERDUTY_KEY&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;team-database-slack&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">slack_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">channel</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;#database-alerts&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;Geode Alert&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">text</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}&#39;</span><span class="w"> </span></span></span></code></pre></div> <h4 id="alert-inhibition-rules" class="position-relative d-flex align-items-center group"> <span>Alert Inhibition Rules</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="alert-inhibition-rules" aria-haspopup="dialog" aria-label="Share link: Alert Inhibition Rules"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Prevent alert storms with inhibition:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># alertmanager.yml (continued)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">inhibit_rules</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Don&#39;t alert on high memory if instance is down</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;critical&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;GeodeDown&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;warning&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">equal</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;instance&#39;</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Don&#39;t alert on slow queries if database is overloaded</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;HighQueryErrorRate&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_match</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;SlowQueryRateIncreasing&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">equal</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;instance&#39;</span><span class="p">]</span><span class="w"> </span></span></span></code></pre></div> <h4 id="alert-templates" class="position-relative d-flex align-items-center group"> <span>Alert Templates</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="alert-templates" aria-haspopup="dialog" aria-label="Share link: Alert Templates"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Create informative alert messages:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_alerts</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighErrorRate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_queries_total{status=&#34;error&#34;}[5m]) &gt; 10</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;High query error rate on {{ $labels.instance }}&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> Query error rate is {{ $value | humanize }} errors/sec. </span></span></span><span class="line"><span class="cl"><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> Current stats: </span></span></span><span class="line"><span class="cl"><span class="sd"> - Instance: {{ $labels.instance }} </span></span></span><span class="line"><span class="cl"><span class="sd"> - Error rate: {{ $value | humanize }}/sec </span></span></span><span class="line"><span class="cl"><span class="sd"> - Time: {{ $labels.timestamp }} </span></span></span><span class="line"><span class="cl"><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> Runbook: https://docs.geodedb.com/runbooks/high-error-rate</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">dashboard</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;https://grafana.example.com/d/geode-errors&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">runbook_url</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;https://docs.geodedb.com/runbooks/high-error-rate&#34;</span><span class="w"> </span></span></span></code></pre></div> <h3 id="grafana-integration" class="position-relative d-flex align-items-center group"> <span>Grafana Integration</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="grafana-integration" aria-haspopup="dialog" aria-label="Share link: Grafana Integration"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3> <h4 id="provisioning-dashboards" class="position-relative d-flex align-items-center group"> <span>Provisioning Dashboards</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="provisioning-dashboards" aria-haspopup="dialog" aria-label="Share link: Provisioning Dashboards"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># grafana/provisioning/dashboards/geode.yml</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">providers</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;Geode&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">orgId</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">folder</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;Database&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">file</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">disableDeletion</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">updateIntervalSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">allowUiUpdates</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">options</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/grafana/dashboards/geode</span><span class="w"> </span></span></span></code></pre></div> <h4 id="dashboard-variables" class="position-relative d-flex align-items-center group"> <span>Dashboard Variables</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="dashboard-variables" aria-haspopup="dialog" aria-label="Share link: Dashboard Variables"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Create dynamic dashboards with variables:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;templating&#34;</span><span class="p">:</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;list&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;instance&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;query&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;datasource&#34;</span><span class="p">:</span> <span class="s2">&#34;Prometheus&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;query&#34;</span><span class="p">:</span> <span class="s2">&#34;label_values(geode_up, instance)&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;multi&#34;</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;includeAll&#34;</span><span class="p">:</span> <span class="kc">true</span> </span></span><span class="line"><span class="cl"> <span class="p">},</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;time_range&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;interval&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;query&#34;</span><span class="p">:</span> <span class="s2">&#34;1m,5m,15m,30m,1h,6h,12h,1d&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;current&#34;</span><span class="p">:</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;text&#34;</span><span class="p">:</span> <span class="s2">&#34;5m&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;value&#34;</span><span class="p">:</span> <span class="s2">&#34;5m&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">]</span> </span></span><span class="line"><span class="cl"> <span class="p">},</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;panels&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;title&#34;</span><span class="p">:</span> <span class="s2">&#34;Query Rate&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;targets&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;expr&#34;</span><span class="p">:</span> <span class="s2">&#34;sum(rate(geode_queries_total{instance=~\&#34;$instance\&#34;}[$time_range]))&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">]</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">]</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div> <h4 id="panel-examples" class="position-relative d-flex align-items-center group"> <span>Panel Examples</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="panel-examples" aria-haspopup="dialog" aria-label="Share link: Panel Examples"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p><strong>Heatmap Panel</strong> for query duration distribution:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;heatmap&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;title&#34;</span><span class="p">:</span> <span class="s2">&#34;Query Duration Heatmap&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;targets&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;expr&#34;</span><span class="p">:</span> <span class="s2">&#34;sum(rate(geode_query_duration_seconds_bucket{instance=~\&#34;$instance\&#34;}[$time_range])) by (le)&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;format&#34;</span><span class="p">:</span> <span class="s2">&#34;heatmap&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;legendFormat&#34;</span><span class="p">:</span> <span class="s2">&#34;{{le}}&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">],</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;dataFormat&#34;</span><span class="p">:</span> <span class="s2">&#34;tsbuckets&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;yAxis&#34;</span><span class="p">:</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;format&#34;</span><span class="p">:</span> <span class="s2">&#34;s&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div><p><strong>Status Panel</strong> for system health:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;stat&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;title&#34;</span><span class="p">:</span> <span class="s2">&#34;System Status&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;targets&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;expr&#34;</span><span class="p">:</span> <span class="s2">&#34;up{job=\&#34;geode\&#34;}&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;instant&#34;</span><span class="p">:</span> <span class="kc">true</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">],</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;mappings&#34;</span><span class="p">:</span> <span class="p">[</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;value&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;text&#34;</span><span class="p">:</span> <span class="s2">&#34;UP&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;color&#34;</span><span class="p">:</span> <span class="s2">&#34;green&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">},</span> </span></span><span class="line"><span class="cl"> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;value&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;text&#34;</span><span class="p">:</span> <span class="s2">&#34;DOWN&#34;</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="nt">&#34;color&#34;</span><span class="p">:</span> <span class="s2">&#34;red&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">]</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div> <h3 id="advanced-querying-techniques" class="position-relative d-flex align-items-center group"> <span>Advanced Querying Techniques</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="advanced-querying-techniques" aria-haspopup="dialog" aria-label="Share link: Advanced Querying Techniques"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3> <h4 id="quantile-aggregation" class="position-relative d-flex align-items-center group"> <span>Quantile Aggregation</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="quantile-aggregation" aria-haspopup="dialog" aria-label="Share link: Quantile Aggregation"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Calculate percentiles across multiple instances:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># P99 query latency across all instances</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># P99 per instance</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="p">,</span><span class="w"> </span><span class="nv">instance</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div> <h4 id="rate-calculation-windows" class="position-relative d-flex align-items-center group"> <span>Rate Calculation Windows</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="rate-calculation-windows" aria-haspopup="dialog" aria-label="Share link: Rate Calculation Windows"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Choose appropriate time windows:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Very short window (1m) - noisy but responsive</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Medium window (5m) - balanced</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Long window (1h) - smooth but slow to react</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1h</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Use irate for instant rate (last 2 samples)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">irate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div> <h4 id="subquery-patterns" class="position-relative d-flex align-items-center group"> <span>Subquery Patterns</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="subquery-patterns" aria-haspopup="dialog" aria-label="Share link: Subquery Patterns"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Calculate rate of rate (acceleration):</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Query acceleration (change in query rate)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">deriv</span><span class="o">(</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">10m</span><span class="err">:</span><span class="s">1m</span><span class="p">]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Predict future value</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">predict_linear</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1h</span><span class="p">],</span><span class="w"> </span><span class="mi">3600</span><span class="o">)</span><span class="w"> </span></span></span></code></pre></div> <h3 id="monitoring-at-scale" class="position-relative d-flex align-items-center group"> <span>Monitoring at Scale</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="monitoring-at-scale" aria-haspopup="dialog" aria-label="Share link: Monitoring at Scale"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3> <h4 id="hierarchical-federation" class="position-relative d-flex align-items-center group"> <span>Hierarchical Federation</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="hierarchical-federation" aria-haspopup="dialog" aria-label="Share link: Hierarchical Federation"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>For large deployments with multiple regions:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Regional Prometheus federates from local instances</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;federate-local&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;/federate&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">&#39;match[]&#39;</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;{job=&#34;geode&#34;}&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-az1:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-az2:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-az3:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c"># Global Prometheus federates from regional instances</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;federate-regional&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;/federate&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">&#39;match[]&#39;</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;{job=&#34;geode&#34;}&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-us-east:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-us-west:9090&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">&#39;prometheus-eu-west:9090&#39;</span><span class="w"> </span></span></span></code></pre></div> <h4 id="remote-write-for-long-term-storage" class="position-relative d-flex align-items-center group"> <span>Remote Write for Long-Term Storage</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="remote-write-for-long-term-storage" aria-haspopup="dialog" aria-label="Share link: Remote Write for Long-Term Storage"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Send metrics to long-term storage:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">remote_write</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;https://prometheus-remote-storage.example.com/api/v1/write&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">basic_auth</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">username</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-metrics&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">password</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;SECRET&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">write_relabel_configs</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Only send aggregate metrics for long-term storage</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode_(queries_total|query_duration_seconds_bucket|memory_used_bytes)&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">keep</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">queue_config</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">capacity</span><span class="p">:</span><span class="w"> </span><span class="m">10000</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">max_shards</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">min_shards</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">max_samples_per_send</span><span class="p">:</span><span class="w"> </span><span class="m">5000</span><span class="w"> </span></span></span></code></pre></div> <h4 id="thanos-for-global-view" class="position-relative d-flex align-items-center group"> <span>Thanos for Global View</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="thanos-for-global-view" aria-haspopup="dialog" aria-label="Share link: Thanos for Global View"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h4><p>Deploy Thanos for unlimited retention and global querying:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Prometheus with Thanos sidecar</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">external_labels</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;geode-production&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;us-east-1&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c"># Thanos sidecar configuration</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">storage.tsdb.path=/prometheus</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">objstore.config-file=/etc/thanos/bucket.yml</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">grpc-address=0.0.0.0:10901</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">http-address=0.0.0.0:10902</span><span class="w"> </span></span></span></code></pre></div> <h3 id="best-practices" class="position-relative d-flex align-items-center group"> <span>Best Practices</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="best-practices" aria-haspopup="dialog" aria-label="Share link: Best Practices"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Appropriate Scrape Intervals</strong>: Balance between data resolution and storage overhead. 15-30 seconds works well for most deployments.</p> <p><strong>Use Recording Rules</strong>: Pre-compute expensive queries to reduce dashboard load times.</p> <p><strong>Set Alert Thresholds Based on Baselines</strong>: Monitor normal behavior before setting alert thresholds to minimize false positives.</p> <p><strong>Implement Alert Routing</strong>: Use Alertmanager to route alerts to appropriate teams and communication channels.</p> <p><strong>Retain Historical Data</strong>: Configure appropriate retention periods to support trend analysis and capacity planning.</p> <p><strong>Monitor Prometheus Itself</strong>: Ensure your monitoring system is healthy and performant.</p> <p><strong>Label Consistently</strong>: Use consistent label naming across all metrics for easier querying and aggregation.</p> <p><strong>Dashboard Organization</strong>: Create focused dashboards for different personas (developers, operators, executives).</p> <p><strong>Metric Naming</strong>: Follow Prometheus naming conventions (unit suffix, descriptive names).</p> <p><strong>Cardinality Management</strong>: Keep label cardinality bounded to prevent memory issues.</p> <h3 id="troubleshooting-common-issues" class="position-relative d-flex align-items-center group"> <span>Troubleshooting Common Issues</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="troubleshooting-common-issues" aria-haspopup="dialog" aria-label="Share link: Troubleshooting Common Issues"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><p><strong>Missing Metrics</strong>: Verify Geode metrics endpoint is accessible and Prometheus can reach it.</p> <p><strong>High Cardinality</strong>: Avoid labels with unbounded values (e.g., user IDs, session IDs).</p> <p><strong>Storage Issues</strong>: Monitor Prometheus disk usage and adjust retention policies as needed.</p> <p><strong>Slow Queries</strong>: Use recording rules to pre-aggregate complex queries used in dashboards.</p> <p><strong>Scrape Failures</strong>: Check network connectivity, TLS certificates, and authentication.</p> <p><strong>Out of Memory</strong>: Reduce retention period, use recording rules, or add more RAM.</p> <p><strong>Slow Dashboards</strong>: Optimize queries, use recording rules, reduce time ranges.</p> <h3 id="production-deployment-checklist" class="position-relative d-flex align-items-center group"> <span>Production Deployment Checklist</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="production-deployment-checklist" aria-haspopup="dialog" aria-label="Share link: Production Deployment Checklist"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><ul> <li><input disabled="" type="checkbox"> Scrape intervals configured appropriately</li> <li><input disabled="" type="checkbox"> Recording rules created for expensive queries</li> <li><input disabled="" type="checkbox"> Alert rules defined and tested</li> <li><input disabled="" type="checkbox"> Alertmanager configured with proper routing</li> <li><input disabled="" type="checkbox"> Dashboards created for key metrics</li> <li><input disabled="" type="checkbox"> Retention period set based on requirements</li> <li><input disabled="" type="checkbox"> Backup strategy for Prometheus data</li> <li><input disabled="" type="checkbox"> Monitoring of Prometheus itself</li> <li><input disabled="" type="checkbox"> High availability setup (if required)</li> <li><input disabled="" type="checkbox"> Documentation of metrics and alerts</li> <li><input disabled="" type="checkbox"> Runbooks created for common alerts</li> <li><input disabled="" type="checkbox"> Load testing of monitoring stack</li> </ul> <h3 id="related-topics" class="position-relative d-flex align-items-center group"> <span>Related Topics</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="related-topics" aria-haspopup="dialog" aria-label="Share link: Related Topics"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><ul> <li><a href="/tags/monitoring/" >Monitoring and Observability</a> </li> <li><a href="/tags/metrics/" >Metrics and Performance</a> </li> <li><a href="/tags/observability/" >Observability Best Practices</a> </li> <li><a href="/tags/tools/" >Grafana Dashboards</a> </li> <li><a href="/tags/operations/" >Alerting Strategies</a> </li> <li><a href="/tags/performance/" >Performance Tuning</a> </li> </ul> <h3 id="further-reading" class="position-relative d-flex align-items-center group"> <span>Further Reading</span> <button type="button" class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1" data-share-target="further-reading" aria-haspopup="dialog" aria-label="Share link: Further Reading"> <i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i> <span class="visually-hidden">Share link</span> </button> </h3><ul> <li>Prometheus Configuration Guide</li> <li>Grafana Dashboard Templates for Geode</li> <li>Alert Runbook Examples</li> <li>Performance Tuning with Metrics</li> <li>Multi-Cluster Monitoring Patterns</li> <li>Prometheus Best Practices</li> <li>Recording Rules Guide</li> <li>Federation Patterns</li> </ul>

Related Articles