<!-- CANARY: REQ=REQ-DOCS-001; FEATURE="Docs"; ASPECT=Documentation; STATUS=TESTED; OWNER=docs; UPDATED=2026-01-15 -->
<p>Prometheus is the industry-standard monitoring solution for cloud-native applications, and Geode provides first-class Prometheus integration. Geode exposes comprehensive metrics covering queries, transactions, connections, storage, memory, and system health through a Prometheus-compatible <code>/metrics</code> endpoint.</p>
<p>By integrating Geode with Prometheus, you gain real-time visibility into database performance, can set up intelligent alerting on critical conditions, and build comprehensive dashboards for operational insights. Combined with Grafana for visualization, Prometheus and Geode form a powerful observability stack for production graph database deployments.</p>
<p>This guide covers Prometheus integration patterns, essential metrics, alerting rules, and visualization strategies for monitoring Geode effectively.</p>
<h3 id="prometheus-architecture-with-geode" class="position-relative d-flex align-items-center group">
<span>Prometheus Architecture with Geode</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="prometheus-architecture-with-geode"
aria-haspopup="dialog"
aria-label="Share link: Prometheus Architecture with Geode">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><div id="headingShareModal" class="heading-share-modal" role="dialog" aria-modal="true" aria-labelledby="headingShareTitle" hidden>
<div class="hsm-dialog" role="document">
<div class="hsm-header">
<h2 id="headingShareTitle" class="h6 mb-0 fw-bold">Share this section</h2>
<button type="button" class="hsm-close" aria-label="Close">
<i class="fa-solid fa-xmark"></i>
</button>
</div>
<div class="hsm-body">
<label for="headingShareInput" class="form-label small text-muted mb-1 text-uppercase fw-bold" style="font-size: 0.7rem; letter-spacing: 0.5px;">Permalink</label>
<div class="input-group mb-4 hsm-url-group">
<input id="headingShareInput" type="text" class="form-control font-monospace" readonly aria-readonly="true" style="font-size: 0.85rem;" />
<button class="btn btn-primary hsm-copy" type="button" aria-label="Copy" title="Copy">
<i class="fa-duotone fa-clipboard" aria-hidden="true"></i>
</button>
</div>
<div class="small fw-bold mb-2 text-muted text-uppercase" style="font-size: 0.7rem; letter-spacing: 0.5px;">Share via</div>
<div class="hsm-share-grid">
<a id="share-twitter" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer">
<i class="fa-brands fa-twitter me-2"></i>Twitter
</a>
<a id="share-linkedin" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer">
<i class="fa-brands fa-linkedin me-2"></i>LinkedIn
</a>
<a id="share-facebook" class="btn btn-outline-secondary w-100" target="_blank" rel="noopener noreferrer">
<i class="fa-brands fa-facebook me-2"></i>Facebook
</a>
</div>
</div>
</div>
</div>
<style>
.heading-share-modal {
position: fixed;
inset: 0;
display: flex;
justify-content: center;
align-items: center;
background: rgba(0, 0, 0, 0.6);
z-index: 1050;
padding: 1rem;
backdrop-filter: blur(4px);
-webkit-backdrop-filter: blur(4px);
}
.heading-share-modal[hidden] { display: none !important; }
.hsm-dialog {
max-width: 420px;
width: 100%;
background: var(--bs-body-bg, #fff);
color: var(--bs-body-color, #212529);
border: 1px solid var(--bs-border-color, rgba(0,0,0,0.1));
border-radius: 1rem;
box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.25);
overflow: hidden;
animation: hsm-fade-in 0.2s ease-out;
}
@keyframes hsm-fade-in {
from { opacity: 0; transform: scale(0.95); }
to { opacity: 1; transform: scale(1); }
}
[data-bs-theme="dark"] .hsm-dialog {
background: #1e293b;
border-color: rgba(255,255,255,0.1);
color: #f8f9fa;
}
.hsm-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 1rem 1.5rem;
border-bottom: 1px solid var(--bs-border-color, rgba(0,0,0,0.1));
background: rgba(0,0,0,0.02);
}
[data-bs-theme="dark"] .hsm-header {
background: rgba(255,255,255,0.02);
border-color: rgba(255,255,255,0.1);
}
.hsm-close {
background: transparent;
border: none;
color: inherit;
opacity: 0.5;
padding: 0.25rem 0.5rem;
border-radius: 0.25rem;
font-size: 1.2rem;
line-height: 1;
transition: opacity 0.2s;
}
.hsm-close:hover {
opacity: 1;
}
.hsm-body {
padding: 1.5rem;
}
.hsm-url-group {
display: flex !important;
align-items: stretch;
}
.hsm-url-group .form-control {
flex: 1;
min-width: 0;
margin: 0;
background: var(--bs-secondary-bg, #f8f9fa);
border-color: var(--bs-border-color, #dee2e6);
border-top-right-radius: 0;
border-bottom-right-radius: 0;
height: 42px;
}
.hsm-url-group .btn {
flex: 0 0 auto;
margin: 0;
margin-left: -1px;
border-top-left-radius: 0;
border-bottom-left-radius: 0;
height: 42px;
display: flex;
align-items: center;
justify-content: center;
padding: 0 1.25rem;
z-index: 2;
}
[data-bs-theme="dark"] .hsm-url-group .form-control {
background: #0f172a;
border-color: #334155;
color: #e2e8f0;
}
.hsm-share-grid {
display: flex;
flex-direction: column;
gap: 0.5rem;
}
.hsm-share-grid .btn {
display: flex;
align-items: center;
justify-content: center;
font-size: 0.9rem;
padding: 0.6rem;
border-color: var(--bs-border-color);
width: 100%;
}
[data-bs-theme="dark"] .hsm-share-grid .btn {
color: #e2e8f0;
border-color: #475569;
}
[data-bs-theme="dark"] .hsm-share-grid .btn:hover {
background: #334155;
border-color: #cbd5e1;
}
</style>
<script>
(function(){
const modal = document.getElementById('headingShareModal');
if(!modal) return;
const input = modal.querySelector('#headingShareInput');
const copyBtn = modal.querySelector('.hsm-copy');
const twitter = modal.querySelector('#share-twitter');
const linkedin = modal.querySelector('#share-linkedin');
const facebook = modal.querySelector('#share-facebook');
const closeBtn = modal.querySelector('.hsm-close');
let lastFocus=null;
let trapBound=false;
function buildUrl(id){ return window.location.origin + window.location.pathname + '#' + id; }
function isOpen(){ return !modal.hasAttribute('hidden'); }
function hydrate(id){
const url=buildUrl(id);
input.value=url;
const enc=encodeURIComponent(url);
const text=encodeURIComponent(document.title);
if(twitter) twitter.href=`https://twitter.com/intent/tweet?url=${enc}&text=${text}`;
if(linkedin) linkedin.href=`https://www.linkedin.com/sharing/share-offsite/?url=${enc}`;
if(facebook) facebook.href=`https://www.facebook.com/sharer/sharer.php?u=${enc}`;
}
function openModal(id){
lastFocus=document.activeElement;
hydrate(id);
if(!isOpen()){
modal.removeAttribute('hidden');
}
requestAnimationFrame(()=>{ input.focus(); });
trapFocus();
}
function closeModal(){
if(!isOpen()) return;
modal.setAttribute('hidden','');
if(lastFocus && typeof lastFocus.focus==='function') lastFocus.focus();
}
function copyCurrent(){
try{ navigator.clipboard.writeText(input.value).then(()=>feedback(true),()=>fallback()); }
catch(e){ fallback(); }
}
function fallback(){ input.select(); try{ document.execCommand('copy'); feedback(true);}catch(e){ feedback(false);} }
function feedback(ok){ if(!copyBtn) return; const icon=copyBtn.querySelector('i'); if(!icon) return; const prev=copyBtn.getAttribute('data-prev')||icon.className; if(!copyBtn.getAttribute('data-prev')) copyBtn.setAttribute('data-prev',prev); icon.className= ok ? 'fa-duotone fa-clipboard-check':'fa-duotone fa-circle-exclamation'; setTimeout(()=>{ icon.className=prev; },1800); }
function handleShareClick(e){ e.preventDefault(); const btn=e.currentTarget; const id=btn.getAttribute('data-share-target'); if(id) openModal(id); }
function bindShareButtons(){
document.querySelectorAll('.h-share').forEach(btn=>{
if(!btn.dataset.hShareBound){ btn.addEventListener('click', handleShareClick); btn.dataset.hShareBound='1'; }
});
}
bindShareButtons();
if(document.readyState==='loading'){
document.addEventListener('DOMContentLoaded', bindShareButtons);
} else {
requestAnimationFrame(bindShareButtons);
}
document.addEventListener('click', function(e){
const shareBtn=e.target.closest && e.target.closest('.h-share');
if(shareBtn && !shareBtn.dataset.hShareBound){ handleShareClick.call(shareBtn, e); }
}, true);
document.addEventListener('click', e=>{
if(e.target===modal) closeModal();
if(e.target.closest && e.target.closest('.hsm-close')){ e.preventDefault(); closeModal(); }
if(copyBtn && (e.target===copyBtn || (e.target.closest && e.target.closest('.hsm-copy')))) { e.preventDefault(); copyCurrent(); }
});
document.addEventListener('keydown', e=>{ if(e.key==='Escape' && isOpen()) closeModal(); });
function trapFocus(){
if(trapBound) return;
trapBound=true;
modal.addEventListener('keydown', f=>{ if(f.key==='Tab' && isOpen()){ const focusable=[...modal.querySelectorAll('a[href],button,input,textarea,select,[tabindex]:not([tabindex="-1"])')].filter(el=>!el.hasAttribute('disabled')); if(!focusable.length) return; const first=focusable[0]; const last=focusable[focusable.length-1]; if(f.shiftKey && document.activeElement===first){ f.preventDefault(); last.focus(); } else if(!f.shiftKey && document.activeElement===last){ f.preventDefault(); first.focus(); } } });
}
if(closeBtn) closeBtn.addEventListener('click', e=>{ e.preventDefault(); closeModal(); });
})();
</script><p>Prometheus uses a pull-based model where it periodically scrapes metrics from instrumented applications. Geode exposes metrics at <code>http://geode-host:8080/metrics</code> in Prometheus exposition format:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># HELP geode_queries_total Total number of queries executed
</span></span><span class="line"><span class="cl"># TYPE geode_queries_total counter
</span></span><span class="line"><span class="cl">geode_queries_total{status="success"} 125847
</span></span><span class="line"><span class="cl">geode_queries_total{status="error"} 342
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># HELP geode_query_duration_seconds Query execution duration
</span></span><span class="line"><span class="cl"># TYPE geode_query_duration_seconds histogram
</span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le="0.01"} 45234
</span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le="0.05"} 89432
</span></span><span class="line"><span class="cl">geode_query_duration_seconds_bucket{le="0.1"} 112847
</span></span><span class="line"><span class="cl">geode_query_duration_seconds_sum 12847.3
</span></span><span class="line"><span class="cl">geode_query_duration_seconds_count 125847
</span></span></code></pre></div>
<h3 id="configuring-prometheus-scraping" class="position-relative d-flex align-items-center group">
<span>Configuring Prometheus Scraping</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="configuring-prometheus-scraping"
aria-haspopup="dialog"
aria-label="Share link: Configuring Prometheus Scraping">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Basic Prometheus Configuration</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># prometheus.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">evaluation_interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">external_labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-production'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'localhost:8080'</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">instance</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-primary'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="s1">'production'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_interval</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">scrape_timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">'/metrics'</span><span class="w">
</span></span></span></code></pre></div><p><strong>Multi-Instance Configuration</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-cluster'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'geode-node-1:8080'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'geode-node-2:8080'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'geode-node-3:8080'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-prod'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">datacenter</span><span class="p">:</span><span class="w"> </span><span class="s1">'us-east-1'</span><span class="w">
</span></span></span></code></pre></div><p><strong>Kubernetes Service Discovery</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-k8s'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">kubernetes_sd_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="l">pod</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">namespaces</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">names</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">geode</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">relabel_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_pod_label_app]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">keep</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="l">geode</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_pod_name]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_kubernetes_namespace]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">namespace</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="essential-geode-metrics" class="position-relative d-flex align-items-center group">
<span>Essential Geode Metrics</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="essential-geode-metrics"
aria-haspopup="dialog"
aria-label="Share link: Essential Geode Metrics">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Query Performance Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Query rate by status</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Average query duration (p50, p95, p99)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.50</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Slow query rate (queries > 1s)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_slow_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Query errors by type</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">error_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span></code></pre></div><p><strong>Transaction Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Transaction commit rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">"</span><span class="s">committed</span><span class="p">"}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction rollback rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">"</span><span class="s">rolled_back</span><span class="p">"}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction conflict rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_conflicts_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Active transactions</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_transactions</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Transaction duration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span></code></pre></div><p><strong>Connection Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Active connections</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connection pool utilization</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_max_connections</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connection error rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_connection_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Connections by client type</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">client_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="nv">geode_active_connections</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div><p><strong>Memory and Resource Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Memory usage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Memory usage percentage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Cache hit rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cache_misses_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># MVCC version count</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_mvcc_versions_total</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Garbage collection time</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_gc_duration_seconds_sum</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div><p><strong>Storage Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Disk space used</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_used_bytes</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Disk space available</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_free_bytes</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># WAL size</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_wal_size_bytes</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Disk I/O operations</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_disk_io_operations_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Checkpoint duration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_checkpoint_duration_seconds</span><span class="w">
</span></span></span></code></pre></div><p><strong>Index Metrics</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Index size</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_index_size_bytes</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Index lookups</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_lookups_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Index hit rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_hits_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_index_lookups_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="alerting-rules" class="position-relative d-flex align-items-center group">
<span>Alerting Rules</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="alerting-rules"
aria-haspopup="dialog"
aria-label="Share link: Alerting Rules">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Critical Alerts</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">30s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">GeodeDown</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">up{job="geode"} == 0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">1m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Geode instance {{ $labels.instance }} is down"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"Geode has been unreachable for more than 1 minute"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighQueryErrorRate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_queries_total{status="error"}[5m]) > 10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"High query error rate on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"Query error rate is {{ $value }} errors/sec"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">DiskSpaceCritical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> geode_disk_free_bytes / geode_disk_total_bytes < 0.05</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Critical disk space on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"Only {{ $value | humanizePercentage }} disk space remaining"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">MemoryExhaustion</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> geode_memory_used_bytes / geode_memory_total_bytes > 0.95</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Memory near exhaustion on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"Memory usage at {{ $value | humanizePercentage }}"</span><span class="w">
</span></span></span></code></pre></div><p><strong>Warning Alerts</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_warnings</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">1m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">SlowQueryRateIncreasing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_slow_queries_total[5m]) > 5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Slow query rate increasing on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"{{ $value }} slow queries per second"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighTransactionConflicts</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_transaction_conflicts_total[5m]) > 100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">10m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"High transaction conflict rate"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"{{ $value }} conflicts per second"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">ConnectionPoolPressure</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> geode_active_connections / geode_max_connections > 0.8</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">15m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Connection pool pressure on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"{{ $value | humanizePercentage }} of connections used"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">LongRunningQueries</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> geode_active_queries{duration_seconds=">60"} > 5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"Multiple long-running queries detected"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"{{ $value }} queries running for >60 seconds"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">WalSizeGrowing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> deriv(geode_wal_size_bytes[30m]) > 1e9</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">for</span><span class="p">:</span><span class="w"> </span><span class="l">30m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"WAL size growing rapidly on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s2">"WAL growing at {{ $value | humanize1024 }}/sec"</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="grafana-dashboard-queries" class="position-relative d-flex align-items-center group">
<span>Grafana Dashboard Queries</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="grafana-dashboard-queries"
aria-haspopup="dialog"
aria-label="Share link: Grafana Dashboard Queries">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Query Performance Dashboard</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: Query Rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">status</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Query Latency Percentiles</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.50</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Top Slow Queries</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">topk</span><span class="o">(</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nv">geode_query_duration_seconds</span><span class="p">{</span><span class="nl">quantile</span><span class="o">=</span><span class="p">"</span><span class="s">0.99</span><span class="p">"}</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Query Errors by Type</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">error_type</span><span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_errors_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w">
</span></span></span></code></pre></div><p><strong>System Health Dashboard</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: CPU Usage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_cpu_seconds_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Memory Usage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Disk Usage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_disk_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_disk_total_bytes</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Active Connections</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nv">geode_active_connections</span><span class="w">
</span></span></span></code></pre></div><p><strong>Transaction Dashboard</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Panel: Transaction Throughput</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">"</span><span class="s">committed</span><span class="p">"}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Success Rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">{</span><span class="nl">status</span><span class="o">=</span><span class="p">"</span><span class="s">committed</span><span class="p">"}[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transactions_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Conflicts</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_conflicts_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Panel: Transaction Duration Heatmap</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_transaction_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="recording-rules-for-performance" class="position-relative d-flex align-items-center group">
<span>Recording Rules for Performance</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="recording-rules-for-performance"
aria-haspopup="dialog"
aria-label="Share link: Recording Rules for Performance">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p>Pre-aggregate expensive queries using recording rules:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_recordings</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">15s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_query_rate:5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">sum(rate(geode_queries_total[5m])) by (job, instance)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_query_latency_p95:5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="l">histogram_quantile(0.95, sum(rate(geode_query_duration_seconds_bucket[5m])) by (job, instance, le))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_memory_usage_percent:current</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> geode_memory_used_bytes / geode_memory_total_bytes * 100</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">record</span><span class="p">:</span><span class="w"> </span><span class="l">job:geode_cache_hit_rate:5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_cache_hits_total[5m]) /
</span></span></span><span class="line"><span class="cl"><span class="sd"> (rate(geode_cache_hits_total[5m]) + rate(geode_cache_misses_total[5m]))</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="advanced-monitoring-patterns" class="position-relative d-flex align-items-center group">
<span>Advanced Monitoring Patterns</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="advanced-monitoring-patterns"
aria-haspopup="dialog"
aria-label="Share link: Advanced Monitoring Patterns">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Predict Disk Fullness</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">predict_linear</span><span class="o">(</span><span class="nv">geode_disk_free_bytes</span><span class="p">[</span><span class="s">1h</span><span class="p">],</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">3600</span><span class="o">)</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span></span></span></code></pre></div><p><strong>Detect Anomalies in Query Rate</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">abs</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">-</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">avg_over_time</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">1h</span><span class="err">:</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="o">></span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">stddev_over_time</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">1h</span><span class="err">:</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div><p><strong>Correlate Query Performance with Memory Pressure</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_sum</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w"> </span><span class="o">/</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_count</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="ow">and</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nv">geode_memory_used_bytes</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nv">geode_memory_total_bytes</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">0.8</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="federation-for-multi-cluster-monitoring" class="position-relative d-flex align-items-center group">
<span>Federation for Multi-Cluster Monitoring</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="federation-for-multi-cluster-monitoring"
aria-haspopup="dialog"
aria-label="Share link: Federation for Multi-Cluster Monitoring">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p>Monitor multiple Geode clusters from a central Prometheus:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Central Prometheus configuration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'federate-us-east'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">'/federate'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">'match[]'</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'{job="geode"}'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-us-east:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">'us-east-1'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'federate-us-west'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">'/federate'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">'match[]'</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'{job="geode"}'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-us-west:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">'us-west-2'</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="advanced-prometheus-patterns" class="position-relative d-flex align-items-center group">
<span>Advanced Prometheus Patterns</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="advanced-prometheus-patterns"
aria-haspopup="dialog"
aria-label="Share link: Advanced Prometheus Patterns">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3>
<h4 id="dynamic-service-discovery" class="position-relative d-flex align-items-center group">
<span>Dynamic Service Discovery</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="dynamic-service-discovery"
aria-haspopup="dialog"
aria-label="Share link: Dynamic Service Discovery">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>For dynamic Geode clusters, use service discovery:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Consul service discovery</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-consul'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">consul_sd_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="s1">'localhost:8500'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">services</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'geode'</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">relabel_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_service]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">job</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_node]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__meta_consul_tags]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">'.*,environment=([^,]+),.*'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">environment</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="custom-exporters" class="position-relative d-flex align-items-center group">
<span>Custom Exporters</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="custom-exporters"
aria-haspopup="dialog"
aria-label="Share link: Custom Exporters">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Create custom exporters for Geode-specific metrics:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># geode_exporter.py</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">prometheus_client</span> <span class="kn">import</span> <span class="n">start_http_server</span><span class="p">,</span> <span class="n">Gauge</span><span class="p">,</span> <span class="n">Counter</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">asyncio</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">geode_client</span> <span class="kn">import</span> <span class="n">Client</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Define custom metrics</span>
</span></span><span class="line"><span class="cl"><span class="n">active_queries_gauge</span> <span class="o">=</span> <span class="n">Gauge</span><span class="p">(</span><span class="s1">'geode_custom_active_queries'</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="s1">'Currently executing queries'</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="p">[</span><span class="s1">'query_type'</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">cache_memory_gauge</span> <span class="o">=</span> <span class="n">Gauge</span><span class="p">(</span><span class="s1">'geode_custom_cache_memory_bytes'</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="s1">'Memory used by query cache'</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">collect_custom_metrics</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="s2">"""Collect custom Geode metrics"""</span>
</span></span><span class="line"><span class="cl"> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="c1"># Active queries by type</span>
</span></span><span class="line"><span class="cl"> <span class="n">result</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">"""
</span></span></span><span class="line"><span class="cl"><span class="s2"> CALL system.active_queries()
</span></span></span><span class="line"><span class="cl"><span class="s2"> RETURN query_type, count(*) as count
</span></span></span><span class="line"><span class="cl"><span class="s2"> """</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">rows</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="n">active_queries_gauge</span><span class="o">.</span><span class="n">labels</span><span class="p">(</span>
</span></span><span class="line"><span class="cl"> <span class="n">query_type</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="s1">'query_type'</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"> <span class="p">)</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="s1">'count'</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="c1"># Cache memory usage</span>
</span></span><span class="line"><span class="cl"> <span class="n">cache_stats</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">"""
</span></span></span><span class="line"><span class="cl"><span class="s2"> CALL system.cache_stats()
</span></span></span><span class="line"><span class="cl"><span class="s2"> RETURN memory_bytes
</span></span></span><span class="line"><span class="cl"><span class="s2"> """</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">cache_memory_gauge</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">cache_stats</span><span class="o">.</span><span class="n">rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">'memory_bytes'</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">15</span><span class="p">)</span> <span class="c1"># Scrape interval</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="n">start_http_server</span><span class="p">(</span><span class="mi">9090</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="s2">"localhost:3141"</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">collect_custom_metrics</span><span class="p">(</span><span class="n">client</span><span class="p">))</span>
</span></span></code></pre></div>
<h4 id="metric-relabeling" class="position-relative d-flex align-items-center group">
<span>Metric Relabeling</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="metric-relabeling"
aria-haspopup="dialog"
aria-label="Share link: Metric Relabeling">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Transform metrics during scraping:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'localhost:8080'</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metric_relabel_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Drop high-cardinality debug metrics in production</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode_debug_.*'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">drop</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Rename legacy metrics</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode_old_metric_name'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">__name__</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">replacement</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode_new_metric_name'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Aggregate instance-level metrics to cluster level</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">instance]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-node-.*'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_label</span><span class="p">:</span><span class="w"> </span><span class="l">cluster</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">replacement</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-production'</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="alert-management" class="position-relative d-flex align-items-center group">
<span>Alert Management</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="alert-management"
aria-haspopup="dialog"
aria-label="Share link: Alert Management">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3>
<h4 id="alertmanager-configuration" class="position-relative d-flex align-items-center group">
<span>Alertmanager Configuration</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="alertmanager-configuration"
aria-haspopup="dialog"
aria-label="Share link: Alertmanager Configuration">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># alertmanager.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">resolve_timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">slack_api_url</span><span class="p">:</span><span class="w"> </span><span class="s1">'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">route</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_by</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'alertname'</span><span class="p">,</span><span class="w"> </span><span class="s1">'cluster'</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_wait</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_interval</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">repeat_interval</span><span class="p">:</span><span class="w"> </span><span class="l">12h</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">routes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">critical</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database-pager'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">continue</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l">warning</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">receiver</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database-slack'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">receivers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">email_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">to</span><span class="p">:</span><span class="w"> </span><span class="s1">'[email protected]'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database-pager'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">pagerduty_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">service_key</span><span class="p">:</span><span class="w"> </span><span class="s1">'YOUR_PAGERDUTY_KEY'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'team-database-slack'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">slack_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">channel</span><span class="p">:</span><span class="w"> </span><span class="s1">'#database-alerts'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s1">'Geode Alert'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">text</span><span class="p">:</span><span class="w"> </span><span class="s1">'{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="alert-inhibition-rules" class="position-relative d-flex align-items-center group">
<span>Alert Inhibition Rules</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="alert-inhibition-rules"
aria-haspopup="dialog"
aria-label="Share link: Alert Inhibition Rules">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Prevent alert storms with inhibition:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># alertmanager.yml (continued)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">inhibit_rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Don't alert on high memory if instance is down</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="s1">'critical'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">'GeodeDown'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="s1">'warning'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">equal</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'instance'</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Don't alert on slow queries if database is overloaded</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">'HighQueryErrorRate'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">target_match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">alertname</span><span class="p">:</span><span class="w"> </span><span class="s1">'SlowQueryRateIncreasing'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">equal</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'instance'</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="alert-templates" class="position-relative d-flex align-items-center group">
<span>Alert Templates</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="alert-templates"
aria-haspopup="dialog"
aria-label="Share link: Alert Templates">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Create informative alert messages:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">groups</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">geode_alerts</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">alert</span><span class="p">:</span><span class="w"> </span><span class="l">HighErrorRate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">expr</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> rate(geode_queries_total{status="error"}[5m]) > 10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="s2">"High query error rate on {{ $labels.instance }}"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> Query error rate is {{ $value | humanize }} errors/sec.
</span></span></span><span class="line"><span class="cl"><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> Current stats:
</span></span></span><span class="line"><span class="cl"><span class="sd"> - Instance: {{ $labels.instance }}
</span></span></span><span class="line"><span class="cl"><span class="sd"> - Error rate: {{ $value | humanize }}/sec
</span></span></span><span class="line"><span class="cl"><span class="sd"> - Time: {{ $labels.timestamp }}
</span></span></span><span class="line"><span class="cl"><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd"> Runbook: https://docs.geodedb.com/runbooks/high-error-rate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">dashboard</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://grafana.example.com/d/geode-errors"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">runbook_url</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://docs.geodedb.com/runbooks/high-error-rate"</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="grafana-integration" class="position-relative d-flex align-items-center group">
<span>Grafana Integration</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="grafana-integration"
aria-haspopup="dialog"
aria-label="Share link: Grafana Integration">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3>
<h4 id="provisioning-dashboards" class="position-relative d-flex align-items-center group">
<span>Provisioning Dashboards</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="provisioning-dashboards"
aria-haspopup="dialog"
aria-label="Share link: Provisioning Dashboards">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># grafana/provisioning/dashboards/geode.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">providers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'Geode'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">orgId</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">folder</span><span class="p">:</span><span class="w"> </span><span class="s1">'Database'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">file</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">disableDeletion</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">updateIntervalSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">allowUiUpdates</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">options</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/grafana/dashboards/geode</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="dashboard-variables" class="position-relative d-flex align-items-center group">
<span>Dashboard Variables</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="dashboard-variables"
aria-haspopup="dialog"
aria-label="Share link: Dashboard Variables">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Create dynamic dashboards with variables:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"templating"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"list"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"name"</span><span class="p">:</span> <span class="s2">"instance"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"query"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"datasource"</span><span class="p">:</span> <span class="s2">"Prometheus"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"query"</span><span class="p">:</span> <span class="s2">"label_values(geode_up, instance)"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"multi"</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"includeAll"</span><span class="p">:</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl"> <span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"name"</span><span class="p">:</span> <span class="s2">"time_range"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"interval"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"query"</span><span class="p">:</span> <span class="s2">"1m,5m,15m,30m,1h,6h,12h,1d"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"current"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"text"</span><span class="p">:</span> <span class="s2">"5m"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"value"</span><span class="p">:</span> <span class="s2">"5m"</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">]</span>
</span></span><span class="line"><span class="cl"> <span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"panels"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"title"</span><span class="p">:</span> <span class="s2">"Query Rate"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"targets"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"expr"</span><span class="p">:</span> <span class="s2">"sum(rate(geode_queries_total{instance=~\"$instance\"}[$time_range]))"</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">]</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h4 id="panel-examples" class="position-relative d-flex align-items-center group">
<span>Panel Examples</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="panel-examples"
aria-haspopup="dialog"
aria-label="Share link: Panel Examples">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p><strong>Heatmap Panel</strong> for query duration distribution:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"heatmap"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"title"</span><span class="p">:</span> <span class="s2">"Query Duration Heatmap"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"targets"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"expr"</span><span class="p">:</span> <span class="s2">"sum(rate(geode_query_duration_seconds_bucket{instance=~\"$instance\"}[$time_range])) by (le)"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"format"</span><span class="p">:</span> <span class="s2">"heatmap"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"legendFormat"</span><span class="p">:</span> <span class="s2">"{{le}}"</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">],</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"dataFormat"</span><span class="p">:</span> <span class="s2">"tsbuckets"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"yAxis"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"format"</span><span class="p">:</span> <span class="s2">"s"</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p><strong>Status Panel</strong> for system health:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"stat"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"title"</span><span class="p">:</span> <span class="s2">"System Status"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"targets"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"expr"</span><span class="p">:</span> <span class="s2">"up{job=\"geode\"}"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"instant"</span><span class="p">:</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">],</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"mappings"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"value"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"text"</span><span class="p">:</span> <span class="s2">"UP"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"color"</span><span class="p">:</span> <span class="s2">"green"</span>
</span></span><span class="line"><span class="cl"> <span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"value"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"text"</span><span class="p">:</span> <span class="s2">"DOWN"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="nt">"color"</span><span class="p">:</span> <span class="s2">"red"</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="advanced-querying-techniques" class="position-relative d-flex align-items-center group">
<span>Advanced Querying Techniques</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="advanced-querying-techniques"
aria-haspopup="dialog"
aria-label="Share link: Advanced Querying Techniques">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3>
<h4 id="quantile-aggregation" class="position-relative d-flex align-items-center group">
<span>Quantile Aggregation</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="quantile-aggregation"
aria-haspopup="dialog"
aria-label="Share link: Quantile Aggregation">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Calculate percentiles across multiple instances:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># P99 query latency across all instances</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># P99 per instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">histogram_quantile</span><span class="o">(</span><span class="mf">0.99</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="k">sum</span><span class="o">(</span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_query_duration_seconds_bucket</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">))</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="o">(</span><span class="nv">le</span><span class="p">,</span><span class="w"> </span><span class="nv">instance</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="rate-calculation-windows" class="position-relative d-flex align-items-center group">
<span>Rate Calculation Windows</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="rate-calculation-windows"
aria-haspopup="dialog"
aria-label="Share link: Rate Calculation Windows">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Choose appropriate time windows:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Very short window (1m) - noisy but responsive</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Medium window (5m) - balanced</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Long window (1h) - smooth but slow to react</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1h</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Use irate for instant rate (last 2 samples)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">irate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="subquery-patterns" class="position-relative d-flex align-items-center group">
<span>Subquery Patterns</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="subquery-patterns"
aria-haspopup="dialog"
aria-label="Share link: Subquery Patterns">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Calculate rate of rate (acceleration):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-promql" data-lang="promql"><span class="line"><span class="cl"><span class="c1"># Query acceleration (change in query rate)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">deriv</span><span class="o">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="kr">rate</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">5m</span><span class="p">]</span><span class="o">)</span><span class="p">[</span><span class="s">10m</span><span class="err">:</span><span class="s">1m</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1"># Predict future value</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="kr">predict_linear</span><span class="o">(</span><span class="nv">geode_queries_total</span><span class="p">[</span><span class="s">1h</span><span class="p">],</span><span class="w"> </span><span class="mi">3600</span><span class="o">)</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="monitoring-at-scale" class="position-relative d-flex align-items-center group">
<span>Monitoring at Scale</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="monitoring-at-scale"
aria-haspopup="dialog"
aria-label="Share link: Monitoring at Scale">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3>
<h4 id="hierarchical-federation" class="position-relative d-flex align-items-center group">
<span>Hierarchical Federation</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="hierarchical-federation"
aria-haspopup="dialog"
aria-label="Share link: Hierarchical Federation">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>For large deployments with multiple regions:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Regional Prometheus federates from local instances</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'federate-local'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">'/federate'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">'match[]'</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'{job="geode"}'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-az1:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-az2:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-az3:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c"># Global Prometheus federates from regional instances</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">'federate-regional'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">honor_labels</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="s1">'/federate'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">params</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">'match[]'</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'{job="geode"}'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">targets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-us-east:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-us-west:9090'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s1">'prometheus-eu-west:9090'</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="remote-write-for-long-term-storage" class="position-relative d-flex align-items-center group">
<span>Remote Write for Long-Term Storage</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="remote-write-for-long-term-storage"
aria-haspopup="dialog"
aria-label="Share link: Remote Write for Long-Term Storage">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Send metrics to long-term storage:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">remote_write</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://prometheus-remote-storage.example.com/api/v1/write"</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">basic_auth</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">username</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-metrics'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">password</span><span class="p">:</span><span class="w"> </span><span class="s1">'SECRET'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">write_relabel_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Only send aggregate metrics for long-term storage</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode_(queries_total|query_duration_seconds_bucket|memory_used_bytes)'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">keep</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">queue_config</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">capacity</span><span class="p">:</span><span class="w"> </span><span class="m">10000</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">max_shards</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">min_shards</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">max_samples_per_send</span><span class="p">:</span><span class="w"> </span><span class="m">5000</span><span class="w">
</span></span></span></code></pre></div>
<h4 id="thanos-for-global-view" class="position-relative d-flex align-items-center group">
<span>Thanos for Global View</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="thanos-for-global-view"
aria-haspopup="dialog"
aria-label="Share link: Thanos for Global View">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h4><p>Deploy Thanos for unlimited retention and global querying:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Prometheus with Thanos sidecar</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">global</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">external_labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">cluster</span><span class="p">:</span><span class="w"> </span><span class="s1">'geode-production'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="s1">'us-east-1'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c"># Thanos sidecar configuration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">storage.tsdb.path=/prometheus</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">objstore.config-file=/etc/thanos/bucket.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">grpc-address=0.0.0.0:10901</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>--<span class="l">http-address=0.0.0.0:10902</span><span class="w">
</span></span></span></code></pre></div>
<h3 id="best-practices" class="position-relative d-flex align-items-center group">
<span>Best Practices</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="best-practices"
aria-haspopup="dialog"
aria-label="Share link: Best Practices">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Appropriate Scrape Intervals</strong>: Balance between data resolution and storage overhead. 15-30 seconds works well for most deployments.</p>
<p><strong>Use Recording Rules</strong>: Pre-compute expensive queries to reduce dashboard load times.</p>
<p><strong>Set Alert Thresholds Based on Baselines</strong>: Monitor normal behavior before setting alert thresholds to minimize false positives.</p>
<p><strong>Implement Alert Routing</strong>: Use Alertmanager to route alerts to appropriate teams and communication channels.</p>
<p><strong>Retain Historical Data</strong>: Configure appropriate retention periods to support trend analysis and capacity planning.</p>
<p><strong>Monitor Prometheus Itself</strong>: Ensure your monitoring system is healthy and performant.</p>
<p><strong>Label Consistently</strong>: Use consistent label naming across all metrics for easier querying and aggregation.</p>
<p><strong>Dashboard Organization</strong>: Create focused dashboards for different personas (developers, operators, executives).</p>
<p><strong>Metric Naming</strong>: Follow Prometheus naming conventions (unit suffix, descriptive names).</p>
<p><strong>Cardinality Management</strong>: Keep label cardinality bounded to prevent memory issues.</p>
<h3 id="troubleshooting-common-issues" class="position-relative d-flex align-items-center group">
<span>Troubleshooting Common Issues</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="troubleshooting-common-issues"
aria-haspopup="dialog"
aria-label="Share link: Troubleshooting Common Issues">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><p><strong>Missing Metrics</strong>: Verify Geode metrics endpoint is accessible and Prometheus can reach it.</p>
<p><strong>High Cardinality</strong>: Avoid labels with unbounded values (e.g., user IDs, session IDs).</p>
<p><strong>Storage Issues</strong>: Monitor Prometheus disk usage and adjust retention policies as needed.</p>
<p><strong>Slow Queries</strong>: Use recording rules to pre-aggregate complex queries used in dashboards.</p>
<p><strong>Scrape Failures</strong>: Check network connectivity, TLS certificates, and authentication.</p>
<p><strong>Out of Memory</strong>: Reduce retention period, use recording rules, or add more RAM.</p>
<p><strong>Slow Dashboards</strong>: Optimize queries, use recording rules, reduce time ranges.</p>
<h3 id="production-deployment-checklist" class="position-relative d-flex align-items-center group">
<span>Production Deployment Checklist</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="production-deployment-checklist"
aria-haspopup="dialog"
aria-label="Share link: Production Deployment Checklist">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><ul>
<li><input disabled="" type="checkbox"> Scrape intervals configured appropriately</li>
<li><input disabled="" type="checkbox"> Recording rules created for expensive queries</li>
<li><input disabled="" type="checkbox"> Alert rules defined and tested</li>
<li><input disabled="" type="checkbox"> Alertmanager configured with proper routing</li>
<li><input disabled="" type="checkbox"> Dashboards created for key metrics</li>
<li><input disabled="" type="checkbox"> Retention period set based on requirements</li>
<li><input disabled="" type="checkbox"> Backup strategy for Prometheus data</li>
<li><input disabled="" type="checkbox"> Monitoring of Prometheus itself</li>
<li><input disabled="" type="checkbox"> High availability setup (if required)</li>
<li><input disabled="" type="checkbox"> Documentation of metrics and alerts</li>
<li><input disabled="" type="checkbox"> Runbooks created for common alerts</li>
<li><input disabled="" type="checkbox"> Load testing of monitoring stack</li>
</ul>
<h3 id="related-topics" class="position-relative d-flex align-items-center group">
<span>Related Topics</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="related-topics"
aria-haspopup="dialog"
aria-label="Share link: Related Topics">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><ul>
<li><a
href="/tags/monitoring/"
>Monitoring and Observability</a>
</li>
<li><a
href="/tags/metrics/"
>Metrics and Performance</a>
</li>
<li><a
href="/tags/observability/"
>Observability Best Practices</a>
</li>
<li><a
href="/tags/tools/"
>Grafana Dashboards</a>
</li>
<li><a
href="/tags/operations/"
>Alerting Strategies</a>
</li>
<li><a
href="/tags/performance/"
>Performance Tuning</a>
</li>
</ul>
<h3 id="further-reading" class="position-relative d-flex align-items-center group">
<span>Further Reading</span>
<button type="button"
class="h-share btn btn-link p-0 text-decoration-none link-secondary opacity-50 hover-opacity-100 transition-all ms-1"
data-share-target="further-reading"
aria-haspopup="dialog"
aria-label="Share link: Further Reading">
<i class="fa-sharp-duotone fa-solid fa-share-nodes" aria-hidden="true" style="font-size: 0.8em;"></i>
<span class="visually-hidden">Share link</span>
</button>
</h3><ul>
<li>Prometheus Configuration Guide</li>
<li>Grafana Dashboard Templates for Geode</li>
<li>Alert Runbook Examples</li>
<li>Performance Tuning with Metrics</li>
<li>Multi-Cluster Monitoring Patterns</li>
<li>Prometheus Best Practices</li>
<li>Recording Rules Guide</li>
<li>Federation Patterns</li>
</ul>
Related Articles
Monitoring
Set up comprehensive monitoring for Geode including Prometheus metrics, Grafana dashboards, alerting, and log aggregation
Monitoring and Telemetry
Monitor Geode with health checks and Prometheus metrics, enable optional paging telemetry, and configure tamper-evident audit logging with tracing IDs
Monitoring Guide
Monitor Geode with built-in metrics, Prometheus, Grafana, and alerting