How Moonlet Monitors Solana Validator Health
Moonlet operates institutional-grade validator clusters on Solana. To protect delegators and maximize rewards, we combine conventional observability tools with an AI-driven monitoring layer that detects anomalies before they affect uptime or vote-credits.
1 Key Metrics We Track 24/7
Category | Examples | Why It Matters to You |
---|---|---|
Consensus | Slot vote rate, skip-rate, delinquent status, vote credits | Vote credits directly determine epoch rewards. |
Proof-of-History | TPU/RPC PoH drift, slot lag behind the cluster | Ensures our node stays synchronized with the cluster's slot progression. |
Network | Packet loss, UDP RTT, gossip peer count | High latency → missed leader slots and lower rewards. |
System | CPU %, RAM, NVMe I/O, GPU load | Solana is hardware-intensive; saturation can stall vote signing. |
Security | SSH login anomalies, unsigned kernel modules, validator identity mismatch | Guards against key compromise and against behavior that could become slashable. |
Blockchain API | RPC health, rate-limit errors, block-commit times | Keeps the data shown in the Moonlet UI current. |
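Skip-rate is the share of a validator's assigned leader slots in which it produced no block. The sketch below computes it from the two counters that Solana's `getBlockProduction` RPC method reports per validator identity (leader slots and blocks produced). The helper names are illustrative and not part of any Moonlet or Solana library.

```python
def skip_rate(leader_slots: int, blocks_produced: int) -> float:
    """Fraction of assigned leader slots that were skipped (no block produced).

    Hypothetical helper: the inputs mirror the [leaderSlots, blocksProduced]
    pair reported per identity by Solana's getBlockProduction RPC method.
    """
    if leader_slots < 0 or blocks_produced < 0:
        raise ValueError("counts must be non-negative")
    if blocks_produced > leader_slots:
        raise ValueError("cannot produce more blocks than assigned leader slots")
    if leader_slots == 0:
        return 0.0  # no leader slots assigned yet, so nothing was skipped
    return (leader_slots - blocks_produced) / leader_slots


def skip_rate_from_rpc(by_identity: dict, identity: str) -> float:
    """Look up one identity in a getBlockProduction-style `byIdentity` map."""
    leader_slots, blocks_produced = by_identity[identity]
    return skip_rate(leader_slots, blocks_produced)


# Example: 400 leader slots, 388 blocks produced
print(f"{skip_rate(400, 388):.2%}")  # prints 3.00%
```

Early in an epoch a validator may have only a few leader slots, so a single skipped slot moves this ratio a lot.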
2 Our Monitoring Stack
Layer | Tooling |
---|---|
Collection | Prometheus exporters (systemd, Solana-Exporter, node-exporter), Loki log ingestion |
AI Anomaly Engine | Custom LSTM & Prophet models flag outliers in vote-credits, slot times, and resource trends (24-hour look-back) |
Alerting | PagerDuty, Slack, and on-call SMS for P1 events |
Dashboards | Grafana & internal Moonlet “Validator Pulse” panel—visible to ops and exposed read-only in the user dashboard |
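The production engine uses LSTM and Prophet models. As a deliberately simpler stand-in that illustrates the same idea of judging each new sample against a 24-hour look-back, the sketch below flags a value lying more than `k` standard deviations from the trailing-window mean. This is a rolling z-score, not the models named above; the window size, `k = 3`, and `min_samples` are assumptions.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Flag samples far from the trailing-window mean (a stand-in, not LSTM/Prophet)."""

    def __init__(self, window: int, k: float = 3.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # trailing look-back window
        self.k = k
        self.min_samples = min_samples       # don't judge until the window has data

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the window, then record it."""
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                is_anomaly = abs(value - mu) > self.k * sigma
            else:
                is_anomaly = value != mu  # flat history: any change stands out
        self.history.append(value)
        return is_anomaly


# 24 hours of 1-minute vote-credit samples -> a window of 1440 points
detector = RollingZScoreDetector(window=24 * 60)
```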
3 Automated Self-Healing
- Slot-skip spike > 3 % (5-minute window)
  - The AI engine triggers a restart of the validator TPU and toggles leader-schedule voting.
- PoH drift > 150 ms
  - The clock is resynced with Chrony (NTP); if the drift persists, we fail over to a standby node in another region.
- RPC 5xx error burst
  - Traffic is rerouted to a redundant RPC pool, and Kubernetes automatically replaces the faulty container.
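The three runbook rules can be expressed as threshold checks. In this sketch the 3 % skip-rate and 150 ms drift limits come from the rules themselves; the RPC 5xx burst threshold (50 errors per minute) and the action names are hypothetical placeholders, and the second stage of the PoH rule (failover if a resync does not help) is left to the caller.

```python
from dataclasses import dataclass

# Runbook thresholds. The first two come from the rules above; the 5xx burst
# threshold is a hypothetical placeholder because the article does not state one.
SKIP_RATE_LIMIT = 0.03        # 3 % of leader slots over a 5-minute window
POH_DRIFT_LIMIT_MS = 150.0    # milliseconds
RPC_5XX_BURST_PER_MIN = 50    # assumption


@dataclass
class HealthSample:
    skip_rate_5m: float    # fraction of leader slots skipped in the last 5 minutes
    poh_drift_ms: float    # measured PoH drift in milliseconds
    rpc_5xx_per_min: int   # RPC 5xx responses in the last minute


def remediation_actions(s: HealthSample) -> list[str]:
    """Return the (hypothetically named) self-healing actions a sample triggers."""
    actions = []
    if s.skip_rate_5m > SKIP_RATE_LIMIT:
        actions.append("restart_validator_tpu")
    if s.poh_drift_ms > POH_DRIFT_LIMIT_MS:
        actions.append("resync_clock_chrony")  # escalate to regional failover if drift persists
    if s.rpc_5xx_per_min >= RPC_5XX_BURST_PER_MIN:
        actions.append("reroute_rpc_replace_container")
    return actions
```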
Mean time to recovery (MTTR) last quarter: 58 seconds.
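MTTR is the mean of recovery time minus detection time across incidents. A minimal sketch, assuming each incident is recorded as a `(detected_at, recovered_at)` pair; the article does not say whether Moonlet's clock starts at detection or at the onset of the fault.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered_at - detected_at) over resolved incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)
```

For three hypothetical incidents lasting 30, 45, and 90 seconds, the function returns 55 seconds.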
4 What Users See in the Moonlet Dashboard
Indicator | Meaning |
---|---|
Green “Healthy” badge | Vote-credits ≥ 99 % of cluster average, skip-rate ≤ network median. |
Yellow “Degraded” badge | Momentary issue detected; self-healing in progress. Rewards typically unaffected. |
Red “Action” badge | Persistent performance drop (> 2 epochs). We notify delegators via in-app banner and email. |
Click Validator Details → Health to view live skip-rate, vote credits, commission, and historical uptime charts.
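The badge rules can be read as a small decision function. This sketch assumes a validator is Degraded whenever it misses the Healthy criteria for two epochs or fewer, and Action once the shortfall has persisted for more than two epochs; the `epochs_degraded` counter is a hypothetical input.

```python
def health_badge(vote_credits: float, cluster_avg_credits: float,
                 skip_rate: float, network_median_skip: float,
                 epochs_degraded: int) -> str:
    """Map validator metrics to a dashboard badge.

    epochs_degraded: consecutive epochs below the Healthy criteria
    (hypothetical counter; the article only says "> 2 epochs").
    """
    healthy = (vote_credits >= 0.99 * cluster_avg_credits
               and skip_rate <= network_median_skip)
    if healthy:
        return "Healthy"   # green
    if epochs_degraded > 2:
        return "Action"    # red: persistent drop, delegators are notified
    return "Degraded"      # yellow: momentary issue, self-healing in progress
```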
5 Compliance & Transparency
- SOC 2 Type II & ISO 27001 controls govern our monitoring pipeline (log integrity, access control, incident response).
- Weekly performance snapshots are published to a public JSON feed so delegators can audit our vote-credit history.
- All validator binaries are built reproducibly from source; checksums are logged and signed for every upgrade.
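Anyone auditing an upgrade can check a binary against its logged checksum by hashing it locally. The article does not name the hash algorithm, so this sketch assumes SHA-256 hex digests; verifying the signature over the checksum log is a separate step not shown here.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_binary(path: Path, expected_hex: str) -> bool:
    """Compare a binary's digest with a logged checksum (whitespace/case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()
```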
6 FAQ
Question | Answer |
---|---|
Will I lose rewards if the validator restarts? | Only marginally. Restarts are staggered outside leader slots, and vote-credits stay ≥ 99 %. |
How often do you upgrade the validator client? | Within 24 h of an official Solana stable release—earlier if it’s a critical security patch. |
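Can my stake be slashed? | Slashing is a protocol-level penalty, not something a validator applies to its delegators. Solana currently has limited slashing; our architecture (geo-redundant, AI-monitored) is built to avoid slashable events if broader slashing is enabled in the future. |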
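Staggering restarts outside leader slots amounts to finding a gap in the leader schedule long enough for the restart to finish. The hypothetical helper below returns the earliest such start slot. It assumes absolute slot numbers (the `getLeaderSchedule` RPC method returns slot indices relative to the first slot of the epoch, so add that offset) and a restart duration expressed in slots; at Solana's roughly 400 ms target slot time, a two-minute restart is about 300 slots.

```python
def next_restart_window(current_slot: int, leader_slots: list[int],
                        restart_slots: int, margin_slots: int = 0) -> int:
    """Earliest slot >= current_slot where a restart of `restart_slots`
    (plus a safety margin) overlaps none of our leader slots.

    leader_slots: absolute slot numbers where this validator is leader.
    """
    needed = restart_slots + margin_slots
    upcoming = sorted(s for s in leader_slots if s >= current_slot)
    start = current_slot
    for slot in upcoming:
        if slot - start >= needed:    # gap [start, slot) fits the whole restart
            return start
        start = max(start, slot + 1)  # move past this leader slot
    return start  # no later leader slots scheduled
```

This only protects block production: the node still misses votes while it is down, which is why vote credits dip slightly rather than staying at 100 %.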
Moonlet’s AI-backed observability ensures high vote-credit performance and near-zero downtime—so your SOL keeps compounding, epoch after epoch.