Fraud and Anomaly Detection
Build a real-time fraud detection system using graph patterns, machine learning embeddings, and change data capture (CDC).
Problem Statement
Financial fraud is a graph problem:
- Transaction rings: Multiple accounts transferring funds in circles
- Velocity anomalies: Unusual transaction patterns (frequency, amount, location)
- Network effects: Compromised accounts infecting connected accounts
- Synthetic identities: Fabricated identities with suspicious relationship patterns
Traditional rule-based systems miss complex patterns. Graph databases excel at detecting these relationships.
Graph Model
Schema
-- Create graph
CREATE GRAPH PaymentNetwork;
USE PaymentNetwork;
-- Nodes
(:Account {
id: UUID,
holder_name: String,
email: String,
created_at: Timestamp,
risk_score: Float,
embedding: VectorF32 -- For ML-based similarity
})
(:Transaction {
id: UUID,
amount: Decimal,
timestamp: Timestamp,
status: String, -- 'pending', 'completed', 'flagged', 'blocked'
merchant_category: String,
location: LatLon
})
(:Device {
id: UUID,
fingerprint: String,
ip_addr: IpAddr,
first_seen: Timestamp
})
-- Relationships
(:Account)-[:SENT {timestamp: Timestamp}]->(:Transaction)
(:Transaction)-[:RECEIVED_BY]->(:Account)
(:Account)-[:USED_DEVICE {first_used: Timestamp}]->(:Device)
(:Account)-[:SHARES_INFO {info_type: String}]->(:Account) -- email, phone, address
Example Data
-- Create accounts
CREATE
(:Account {id: gen_random_uuid(), holder_name: "Alice Smith", email: "[email protected]", created_at: timestamp(), risk_score: 0.1}),
(:Account {id: gen_random_uuid(), holder_name: "Bob Jones", email: "[email protected]", created_at: timestamp(), risk_score: 0.2}),
(:Account {id: gen_random_uuid(), holder_name: "Charlie Wilson", email: "[email protected]", created_at: timestamp(), risk_score: 0.8});
-- Create transactions
MATCH (alice:Account {holder_name: "Alice Smith"}), (bob:Account {holder_name: "Bob Jones"})
CREATE (tx:Transaction {id: gen_random_uuid(), amount: decimal('1000.00'), timestamp: timestamp(), status: 'completed', merchant_category: 'retail'})
CREATE (alice)-[:SENT {timestamp: timestamp()}]->(tx)-[:RECEIVED_BY]->(bob);
-- Shared device (suspicious)
MATCH (alice:Account {holder_name: "Alice Smith"}), (charlie:Account {holder_name: "Charlie Wilson"})
CREATE (dev:Device {id: gen_random_uuid(), fingerprint: "abc123", ip_addr: '192.168.1.100'::ipaddr})
CREATE (alice)-[:USED_DEVICE {first_used: timestamp()}]->(dev)
CREATE (charlie)-[:USED_DEVICE {first_used: timestamp()}]->(dev);
Detection Patterns
1. Transaction Ring Detection
Circular money flows indicating wash trading or money laundering:
-- Find transaction cycles (A -> B -> C -> A)
MATCH path = (a:Account)-[:SENT]->(:Transaction)-[:RECEIVED_BY]->(b:Account)-[:SENT]->(:Transaction)-[:RECEIVED_BY]->(c:Account)-[:SENT]->(:Transaction)-[:RECEIVED_BY]->(a)
WHERE length(path) = 6 -- 3 transactions
RETURN
a.holder_name AS account1,
b.holder_name AS account2,
c.holder_name AS account3,
length(path) AS cycle_length;
2. Velocity Anomaly Detection
Accounts with unusual transaction frequency or amounts:
-- High-velocity accounts (>10 transactions in 1 hour)
MATCH (a:Account)-[:SENT]->(tx:Transaction)
WHERE tx.timestamp > timestamp() - interval('PT1H')
WITH a, COUNT(tx) AS tx_count, SUM(tx.amount) AS total_amount
WHERE tx_count > 10
RETURN a.holder_name, tx_count, total_amount
ORDER BY tx_count DESC;
-- Large transaction anomaly (>3 standard deviations)
MATCH (a:Account)-[:SENT]->(tx:Transaction)
WITH AVG(tx.amount) AS avg_amount, STDDEV(tx.amount) AS stddev_amount
MATCH (a:Account)-[:SENT]->(tx:Transaction)
WHERE tx.amount > avg_amount + (3 * stddev_amount)
RETURN a.holder_name, tx.amount, avg_amount, stddev_amount;
3. Shared Device/IP Detection
Multiple accounts using the same device or IP address:
-- Find accounts sharing devices
MATCH (a1:Account)-[:USED_DEVICE]->(d:Device)<-[:USED_DEVICE]-(a2:Account)
WHERE a1.id < a2.id -- Avoid duplicates
RETURN
d.fingerprint,
d.ip_addr,
COLLECT(a1.holder_name) AS account1,
COLLECT(a2.holder_name) AS account2;
4. Network Centrality (Mule Accounts)
Accounts acting as hubs (receiving/sending to many accounts):
-- Compute betweenness centrality to find mule accounts
CALL graph.betweenness('PaymentNetwork', {
relationship_type: 'SENT'
})
YIELD node, score
WITH node, score
WHERE score > 10.0 -- High betweenness = hub account
MATCH (a:Account)
WHERE a.id = node.id
RETURN a.holder_name, a.risk_score, score AS betweenness
ORDER BY score DESC;
ML-Based Anomaly Detection
Generate Embeddings
Use Node2Vec to create account embeddings capturing behavioral patterns:
-- Generate account embeddings
CALL graph.node2vec('PaymentNetwork', {
relationship_type: 'SENT',
dimensions: 128,
walk_length: 80,
num_walks: 10,
p: 1.0,
q: 1.0
})
YIELD node, embedding
WITH node, embedding
MATCH (a:Account)
WHERE a.id = node.id
SET a.embedding = embedding;
Detect Anomalies with Vector Similarity
Find accounts with unusual behavior (outliers):
-- For each account, find nearest neighbors
MATCH (a:Account)
MATCH (neighbor:Account)
WHERE neighbor.id <> a.id
AND vector_distance_cosine(a.embedding, neighbor.embedding) < 0.8
WITH a, COUNT(neighbor) AS neighbor_count
WHERE neighbor_count < 3 -- Fewer than 3 similar accounts = anomaly
RETURN a.holder_name, a.risk_score, neighbor_count
ORDER BY neighbor_count ASC;
Real-Time Fraud Detection
From REAL_TIME_ANALYTICS.md:
CDC Configuration
Capture transaction events and trigger real-time analysis:
# cdc-config.yaml
cdc:
enabled: true
webhooks:
- url: "https://fraud-detection.example.com/webhook"
events:
- node.created
- edge.created
filter: "label = 'Transaction' OR type = 'SENT'"
retry:
max_attempts: 3
backoff: exponential
Webhook Processing
Webhook receives transaction event:
{
"event": "node.created",
"graph": "PaymentNetwork",
"node": {
"id": 123456,
"labels": ["Transaction"],
"properties": {
"amount": "5000.00",
"timestamp": "2024-01-15T14:30:00Z",
"merchant_category": "wire_transfer"
}
},
"trace_id": "7c9e8d6f-5b4a-3c2d-1e0f-9a8b7c6d5e4f"
}
Fraud detection service:
# Pseudo-code
async def process_transaction(event):
tx_id = event['node']['id']
amount = Decimal(event['node']['properties']['amount'])
# Query graph for context
sender = await get_sender_account(tx_id)
receiver = await get_receiver_account(tx_id)
# Compute risk score
risk_score = 0.0
# Check velocity
recent_tx_count = await count_recent_transactions(sender.id, hours=1)
if recent_tx_count > 10:
risk_score += 0.3
# Check amount anomaly
avg_amount = await get_avg_transaction_amount(sender.id)
if amount > avg_amount * 3:
risk_score += 0.4
# Check for transaction ring
in_ring = await is_part_of_ring(sender.id, receiver.id)
if in_ring:
risk_score += 0.5
# Flag if high risk
if risk_score > 0.7:
await flag_transaction(tx_id, risk_score)
await send_alert(sender, receiver, risk_score)
return risk_score
Row-Level Security for Multi-Tenant Isolation
From ADVANCED_SECURITY_FEATURES_OCTOBER_2025.md:
Financial institutions need tenant isolation:
-- Create RLS policy: users only see their organization's data
CREATE POLICY org_isolation ON Account
FOR SELECT
USING (organization_id = current_user_org_id());
CREATE POLICY org_isolation_tx ON Transaction
FOR SELECT
USING (
EXISTS (
MATCH (a:Account)-[:SENT|RECEIVED_BY]->(tx:Transaction)
WHERE tx.id = this.id AND a.organization_id = current_user_org_id()
)
);
Benefit: Analysts at Bank A cannot see Bank B’s data, even with direct database access.
Audit Logging for Compliance
From AUDIT_LOGGING.md:
Track all fraud investigations for regulatory compliance:
security:
audit:
enabled: true
log_path: "/var/log/geode/fraud-audit.jsonl"
syslog:
enabled: true
address: "siem.example.com:514"
format: "CEF"
Audit log entry:
{
"timestamp": "2024-01-15T14:30:00.123Z",
"event_type": "query_executed",
"user": "fraud_analyst_alice",
"graph": "PaymentNetwork",
"execution_time_ms": 45.2,
"rows_returned": 3,
"trace_id": "...",
"prev_log_hash": "...",
"signature": "..."
}
Performance Optimization
Index Strategy
-- Index frequently queried properties
CREATE INDEX account_risk_score_idx ON Account(risk_score) USING btree;
CREATE INDEX tx_timestamp_idx ON Transaction(timestamp) USING btree;
CREATE INDEX tx_amount_idx ON Transaction(amount) USING btree;
-- Vector index for similarity queries
CREATE INDEX account_embedding_idx ON Account(embedding) USING vector;
-- Spatial index for location-based fraud
CREATE INDEX tx_location_idx ON Transaction(location) USING spatial;
Materialized Risk Scores
Pre-compute risk scores for fast lookups:
-- Batch job (run hourly)
MATCH (a:Account)
WITH a,
-- Compute risk factors
COUNT {MATCH (a)-[:SENT]->(:Transaction) WHERE timestamp > timestamp() - interval('P1D')} AS daily_tx_count,
AVG {MATCH (a)-[:SENT]->(tx:Transaction) RETURN tx.amount} AS avg_tx_amount
SET a.risk_score = CASE
WHEN daily_tx_count > 50 THEN 0.8
WHEN avg_tx_amount > 10000 THEN 0.6
ELSE 0.2
END;
-- Queries use cached risk_score
MATCH (a:Account)
WHERE a.risk_score > 0.7
RETURN a.holder_name, a.risk_score
ORDER BY a.risk_score DESC;
Complete Workflow
- Ingest transactions via application or bulk load
- CDC triggers webhook to fraud detection service
- Service queries graph for context (velocity, rings, centrality)
- ML model scores transaction using embeddings
- High-risk transactions flagged automatically
- Analysts investigate using GQL queries
- Audit logs capture all investigations for compliance
Next Steps
- Real-Time Analytics - CDC and streaming integration
- Graph Algorithms - Centrality and embeddings
- Security Guide - RLS and audit logging
- Indexing and Optimization - Performance tuning