NordIQ Dashboard: Our Flagship Product

Out-of-the-box predictive monitoring. From 8-hour predictions to intelligent alerts—everything you need to prevent incidents before they impact your business. Deploy in hours, not months.

What is the NordIQ Dashboard?

The NordIQ Dashboard is our flagship out-of-the-box product: an AI-powered predictive monitoring platform that forecasts server incidents 30-60 minutes before they happen. Unlike traditional monitoring that alerts you when things break, we give you time to fix problems during business hours—not at 2 AM. Need something custom? Explore our custom AI solutions →

🔮

Predict Problems

8-hour prediction horizon with 30-60 minute early warning before critical incidents.

🧠

Understand Context

Profile-aware risk intelligence that knows the difference between normal and dangerous.

Act Proactively

Fix problems during business hours. No emergency pages. No user impact.

Core Features

Production-ready capabilities built for enterprise infrastructure teams.

🔮

8-Hour Prediction Horizon

Our Temporal Fusion Transformer (TFT) model analyzes 24 hours of historical data to predict server behavior 8 hours into the future. You get 30-60 minute advance warning before incidents become critical.

  • Predictions refreshed every 5 seconds
  • 88% accuracy on critical incidents
  • GPU-accelerated inference (<100ms per server)
  • Real-time WebSocket streaming to dashboard
🧠

Contextual Risk Intelligence

We don't just alert on raw thresholds. Our fuzzy logic system understands operational context—what's normal for your infrastructure, what's trending dangerous, and what requires immediate action.

  • Profile Awareness: Database at 98% memory = healthy (page cache). ML server at 98% = critical (OOM imminent).
  • Trend Analysis: 40% CPU steady = fine. 40% CPU climbing from 20% = dangerous trend detected.
  • Multi-Metric Correlation: High CPU alone = watch. High CPU + high memory + high I/O wait = critical compound stress.
  • Prediction-Aware: Current 40%, predicted 95% = early warning. Current 85%, predicted 60% = resolving issue.

Result: Intelligent alerts that understand your environment, not just arbitrary thresholds.
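The four signals above can be sketched as a simple scoring function. This is an illustrative approximation only: the profile ceilings, weights, and function names are invented for the example, not NordIQ's actual fuzzy-logic engine, which reasons over all 14 metrics.

```python
# Illustrative contextual risk scoring (0-100). The profile ceilings,
# weights, and blending below are invented for this sketch; the real
# engine applies fuzzy logic across all 14 metrics.

PROFILE_MEM_CEILING = {
    "database": 99,    # large page cache: high memory is healthy
    "ml_compute": 90,  # near-full memory means OOM is imminent
}

def contextual_risk(profile: str, cpu_now: float, cpu_predicted: float,
                    cpu_trend_per_hour: float, mem_pct: float) -> float:
    """Blend profile, trend, and prediction into one 0-100 risk score."""
    # Profile awareness: the same memory level means different things.
    ceiling = PROFILE_MEM_CEILING.get(profile, 95)
    mem_risk = min(100.0, max(0.0, mem_pct - ceiling) * 20)

    # Prediction-aware: low now but high forecast is an early warning;
    # high now with a falling forecast is a resolving issue, so discount it.
    if cpu_predicted < cpu_now:
        pred_risk = (cpu_now + cpu_predicted) / 2
    else:
        pred_risk = max(cpu_now, cpu_predicted)

    # Trend analysis: steady is fine, climbing is dangerous.
    trend_risk = min(100.0, max(0.0, cpu_trend_per_hour) * 2)

    return min(100.0, 0.5 * pred_risk + 0.3 * mem_risk + 0.2 * trend_risk)
```

With this sketch, a database at 98% memory scores far lower than an ML server at the same level, and a server forecast to spike scores higher than one forecast to recover.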

📊

7 Graduated Severity Levels

Traditional monitoring: everything is either OK or ON FIRE. Our system: graduated escalation with appropriate response times.

  • 🔴 Imminent Failure (90+): 5-minute SLA, CTO escalation, emergency response
  • 🔴 Critical (80-89): 15-minute SLA, page on-call engineer
  • 🟠 Danger (70-79): 30-minute SLA, team lead notification
  • 🟡 Warning (60-69): 1-hour SLA, team awareness
  • 🟢 Degrading (50-59): 2-hour SLA, email notification
  • 👁️ Watch (30-49): Background monitoring, no alerts
  • Healthy (0-29): Normal operation

Benefit: Right-sized responses. Less alert fatigue. Fewer false positives.
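The ladder above maps directly to a lookup table. The labels and score bands come from the list; the helper function itself is just an illustration of how a score becomes a response tier.

```python
# Severity ladder from the list above as a lookup table. Labels and
# score bands are from the page; the classify() helper is illustrative.

SEVERITY_LEVELS = [
    (90, "Imminent Failure", "5-minute SLA, CTO escalation"),
    (80, "Critical", "15-minute SLA, page on-call engineer"),
    (70, "Danger", "30-minute SLA, team lead notification"),
    (60, "Warning", "1-hour SLA, team awareness"),
    (50, "Degrading", "2-hour SLA, email notification"),
    (30, "Watch", "background monitoring, no alerts"),
    (0, "Healthy", "normal operation"),
]

def classify(risk_score: float) -> tuple:
    """Map a 0-100 risk score to its severity label and response."""
    for floor, label, response in SEVERITY_LEVELS:
        if risk_score >= floor:
            return label, response
    return "Healthy", "normal operation"
```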

🎯

Profile-Based Transfer Learning

New servers get accurate predictions immediately—no training period required. Our model learns patterns from similar servers and applies that intelligence to new infrastructure.

7 Server Profiles:

  • ML Compute: Training nodes, high CPU/memory bursts
  • Database: Oracle/Postgres, high disk I/O, large page cache
  • Web API: REST endpoints, high network throughput
  • Conductor/Management: Job scheduling, orchestration
  • Data Ingest: Kafka/Spark streaming, high write volume
  • Risk Analytics: Financial calculations, CPU-intensive
  • Generic: Fallback for unknown workloads

Benefits:

  • ✅ No retraining when adding servers of known types
  • ✅ 13% better accuracy than generic models
  • ✅ 80% less retraining frequency
  • ✅ Immediate production value for new infrastructure
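In practice, transfer learning here means a new server is tagged with one of the seven profiles and immediately served by that profile's pre-trained model. A hypothetical sketch of that lookup, with invented model IDs and a Generic fallback:

```python
# Hypothetical profile-to-model registry. Profile names follow the list
# above; the model IDs and registry structure are invented for this sketch.

PROFILE_MODELS = {
    "ml_compute": "tft-ml-compute-v3",
    "database": "tft-database-v3",
    "web_api": "tft-web-api-v3",
    "conductor": "tft-conductor-v3",
    "data_ingest": "tft-data-ingest-v3",
    "risk_analytics": "tft-risk-analytics-v3",
    "generic": "tft-generic-v3",
}

def model_for(profile: str) -> str:
    """Return the pre-trained model for a profile; no per-server training."""
    return PROFILE_MODELS.get(profile, PROFILE_MODELS["generic"])
```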
📈

14 Production LINBORG Metrics

We monitor the metrics that matter for real-world troubleshooting—not just CPU and memory.

CPU Metrics

  • User space CPU (application work)
  • System/kernel CPU (OS overhead)
  • I/O Wait (storage bottlenecks)—critical for troubleshooting
  • Idle CPU (displayed as % Used for clarity)
  • Java/Spark CPU (application-specific)

Memory & Storage

  • Memory utilization %
  • Swap usage (thrashing indicator)
  • Disk space usage

Network & System

  • Network ingress (MB/s)
  • Network egress (MB/s)
  • TCP backend connections
  • TCP frontend connections
  • System load average
  • Uptime (maintenance tracking)

Why I/O Wait matters: High I/O wait is "system troubleshooting 101"—it's the first metric experienced engineers check when diagnosing performance issues. Database servers expect 10-15% I/O wait (normal). ML compute servers should have <2% (high values indicate misconfiguration).
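Those profile-specific expectations can be encoded as bands. The 10-15% database and <2% ML figures are from the note above; the band table and helper are illustrative.

```python
# Expected I/O-wait bands per profile. The database (up to 15%) and
# ML compute (under 2%) figures follow the note above; the default band
# and helper function are illustrative assumptions.

IOWAIT_NORMAL_RANGE = {
    "database": (0.0, 15.0),   # heavy disk I/O is expected
    "ml_compute": (0.0, 2.0),  # higher suggests misconfiguration
}

def iowait_flag(profile: str, iowait_pct: float) -> str:
    """Flag I/O wait that falls outside the profile's normal band."""
    low, high = IOWAIT_NORMAL_RANGE.get(profile, (0.0, 5.0))
    return "normal" if low <= iowait_pct <= high else "investigate"
```

The same 12% reading is healthy on a database and a red flag on an ML node, which is exactly the profile-awareness point.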

Real-Time Streaming Architecture

Microservices-based design for high performance and scalability.

  • Inference Daemon: REST API + WebSocket streaming on port 8000
  • Metrics Generator: Collects data from production sources (port 8001)
  • Dashboard: Streamlit web UI with 10 specialized tabs (port 8501)
  • Performance: <100ms inference latency, <2s dashboard load time
  • Caching: Strategic caching provides 60% performance improvement
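A dashboard-side consumer of the daemon's stream might look like the sketch below. Only the port (8000) comes from the architecture notes; the `/ws/predictions` path, message shape, and field names are assumptions, and the connection loop needs the third-party `websockets` package.

```python
import asyncio
import json

# Hypothetical consumer for the inference daemon's WebSocket stream.
# The /ws/predictions path and message fields are assumptions; only
# port 8000 comes from the architecture notes.

def parse_prediction(raw: str) -> dict:
    """Turn one streamed JSON message into a flat prediction record."""
    msg = json.loads(raw)
    return {
        "server": msg["server"],
        "risk_score": float(msg["risk_score"]),
        "horizon_hours": msg.get("horizon_hours", 8),
    }

async def consume(url: str = "ws://localhost:8000/ws/predictions") -> None:
    import websockets  # third-party: pip install websockets
    async with websockets.connect(url) as ws:
        async for raw in ws:
            record = parse_prediction(raw)
            print(record["server"], record["risk_score"])

# Run against a live daemon with: asyncio.run(consume())
```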

Interactive Dashboard - 10 Specialized Tabs

Everything you need to monitor, analyze, and respond to your infrastructure—all in one place.

1. Fleet Overview

Your command center. Real-time view of all servers, risk scores, and predictions.

  • Environment status (Healthy, Caution, Warning, Critical)
  • Fleet-wide statistics (P1 alerts, degrading servers, total fleet health)
  • Server cards with current metrics and 8-hour predictions
  • Risk score (0-100) with contextual intelligence
  • Color-coded severity indicators
  • Expandable metric details per server

2. Server Heatmap

Visual fleet-wide view. Spot problems at a glance across all servers and metrics.

  • Interactive heatmap (servers × metrics)
  • Color-coded: Green (healthy) → Yellow (watch) → Red (critical)
  • Quickly identify which servers and metrics need attention
  • Profile grouping (see all databases together, all ML servers together)

3. Top Problem Servers

Focus on what matters. The 5 highest-risk servers that need immediate attention.

  • Ranked by risk score (highest risk first)
  • What's wrong (specific metrics exceeding thresholds)
  • Why it's dangerous (impact analysis)
  • What to do about it (recommended actions)
  • Predicted timeline to critical

4. Historical Trends

Understand patterns. 24-hour historical data for all servers and metrics.

  • Time-series charts (current vs. predicted)
  • Trend analysis (increasing, stable, decreasing)
  • Anomaly detection (unexpected spikes or drops)
  • Correlation analysis (which metrics move together)
  • Export data for reporting

5. Cost Avoidance Calculator

Prove ROI. Calculate the financial value of prevented incidents.

  • Prevented incidents this month/quarter/year
  • Estimated downtime avoided (hours)
  • Revenue protected (based on your hourly rate)
  • Operational costs saved (emergency response, overtime)
  • Customer satisfaction impact (prevented outages)
  • Real Example: 1 prevented 2-hour outage = $209,000 saved (for a $100M ARR SaaS company)
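The arithmetic behind the calculator is straightforward. The formula and inputs below are illustrative assumptions; the dashboard's calculator uses your own hourly rates and cost structure.

```python
# Back-of-the-envelope cost-avoidance math. The formula and all inputs
# are illustrative assumptions; plug in your own rates.

def cost_avoided(outage_hours: float, revenue_per_hour: float,
                 ops_cost_per_incident: float = 0.0,
                 engineers: int = 0, engineer_rate: float = 0.0) -> float:
    """Revenue protected plus emergency-response costs saved."""
    revenue_protected = outage_hours * revenue_per_hour
    response_cost = ops_cost_per_incident + engineers * engineer_rate * outage_hours
    return revenue_protected + response_cost
```

For example, a prevented 2-hour outage at $10,000/hour of revenue, with $5,000 of incident overhead and four engineers at $150/hour, works out to $26,200 avoided.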

6. Auto-Remediation

Take action automatically. Configure responses for common issues.

  • Memory Pressure: Restart service, clear cache, scale horizontally
  • CPU Spikes: Kill runaway processes, adjust thread pools
  • Disk Space: Archive logs, clean temp files, expand volume
  • Network Issues: Restart networking, failover to backup
  • Dry-run mode (test before enabling)
  • Approval workflows for high-risk actions
  • Audit trail (who did what, when)

7. Alert Routing

Right alerts to right people. Intelligent routing based on severity and team.

  • Integrations: PagerDuty, Slack, Microsoft Teams, Email, Webhooks
  • Routing Rules: Imminent Failure → CTO + On-Call, Critical → On-Call, Warning → Team Channel
  • Escalation Policies: Auto-escalate if not acknowledged within SLA
  • On-Call Schedules: Respect work hours, time zones, PTO
  • Alert Suppression: Maintenance windows, known issues
  • Alert Grouping: Combine related alerts to reduce noise
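The routing rules above reduce to a severity-to-targets table. Channel names and the dispatch helper are invented for this sketch; real delivery goes through the PagerDuty, Slack, and Teams integrations.

```python
# Illustrative routing table matching the rules above. Channel names and
# the route() helper are invented; real delivery uses the integrations.

ROUTING_RULES = {
    "Imminent Failure": ["cto", "on_call"],
    "Critical": ["on_call"],
    "Danger": ["team_lead"],
    "Warning": ["team_channel"],
    "Degrading": ["email"],
}

def route(severity: str, suppressed: bool = False) -> list:
    """Return notification targets; maintenance windows suppress all alerts."""
    if suppressed:
        return []
    return ROUTING_RULES.get(severity, [])  # Watch/Healthy: no alerts
```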

8. Advanced Diagnostics

Deep technical insights for engineers.

  • Model confidence scores per prediction
  • Attention weights (which metrics influenced the prediction)
  • Risk calculation breakdown (how we arrived at the score)
  • API health (daemon status, latency, error rates)
  • WebSocket connection status
  • Debug mode (raw prediction data, logs)

9. Documentation

Everything you need to know, right in the dashboard.

  • Getting started guide
  • Risk scoring explained (fuzzy logic, contextual intelligence)
  • Server profile definitions
  • Metric definitions (what each LINBORG metric means)
  • Severity levels and response SLAs
  • API reference
  • Troubleshooting common issues

10. Roadmap

See what's coming. Transparency about future features.

  • Planned features (next quarter)
  • Research explorations (future vision)
  • Customer-requested features
  • Vote on priorities (what do you need most?)

See NordIQ in Action

Real screenshots from our production dashboard

Technical Capabilities

Built for enterprise infrastructure teams with demanding requirements.

Performance

  • <100ms inference latency per server
  • <2 seconds dashboard load time
  • 60% faster with strategic caching
  • GPU-accelerated predictions (CUDA support)
  • 5-second refresh rate for real-time monitoring
  • Handles 1000+ servers per deployment

Data Sources

  • MongoDB: Time-series collections
  • Elasticsearch: Log aggregation
  • Prometheus: Metrics scraping
  • InfluxDB: Time-series database
  • REST API: Push metrics from any source
  • Custom Adapters: Build your own (5-15K)

Deployment Options

  • Self-Hosted: On-prem or your cloud (Docker, Kubernetes)
  • Managed SaaS: We host, you monitor
  • Hybrid: Data stays on-prem, we provide insights
  • Air-Gapped: No internet required
  • Multi-Region: Deploy across datacenters

Security & Compliance

  • API Key Authentication: Auto-generated, rotatable
  • Okta SSO: Enterprise single sign-on
  • LDAP/Active Directory: Integrate with existing auth
  • Role-Based Access Control: Admin, Engineer, Viewer roles
  • Audit Logs: Track all actions and changes
  • Encrypted Connections: TLS 1.3 for all traffic
  • SOC 2 Ready: Compliance documentation included

Integrations

  • PagerDuty: Incident management
  • Slack: Team notifications
  • Microsoft Teams: Enterprise chat
  • Email: SMTP/SendGrid/SES
  • Webhooks: Custom integrations
  • ServiceNow: ITSM ticketing
  • Jira: Project tracking

APIs & Extensibility

  • REST API: Full CRUD operations
  • WebSocket API: Real-time streaming
  • Python SDK: Build custom tools
  • Webhooks: React to events
  • Custom Dashboards: Embed predictions in your tools
  • Data Export: CSV, JSON, Parquet
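As a minimal sketch of the export path, prediction records can be serialized to the listed formats with the standard library alone. The field names here are assumptions for the example.

```python
import csv
import io
import json

# Minimal sketch of CSV/JSON export for prediction records. The field
# names ("server", "risk_score", "severity") are assumptions.

def export_json(records: list) -> str:
    """Serialize prediction records as pretty-printed JSON."""
    return json.dumps(records, indent=2)

def export_csv(records: list) -> str:
    """Serialize prediction records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["server", "risk_score", "severity"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```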

What Makes Us Different

We prevent incidents. Traditional monitoring just detects them.

| Capability | Traditional Monitoring | NordIQ Predictive |
| --- | --- | --- |
| Detection Time | ❌ Alerts AFTER problems occur | ✅ 30-60 minute advance warning |
| Alert Intelligence | ❌ Dumb thresholds (CPU > 80%) | ✅ Contextual risk scoring (fuzzy logic) |
| Profile Awareness | ❌ One-size-fits-all | ✅ 7 specialized server profiles |
| Alert Levels | ❌ OK or Critical (binary) | ✅ 7 graduated severity levels |
| New Servers | ❌ Need weeks of training data | ✅ Accurate predictions immediately |
| Multi-Metric | ❌ Isolated metrics (CPU alone) | ✅ Correlation analysis (14 metrics) |
| Response Time | ❌ Emergency pages at 2 AM | ✅ Fix during business hours |
| False Positives | ❌ High (alert fatigue) | ✅ Low (contextual intelligence) |
| User Impact | ❌ Users experience downtime first | ✅ Problems fixed before user impact |
| Cost | ❌ $100K-500K/year (Datadog, New Relic) | ✅ $5K-150K/year (transparent pricing) |

Ready to Try Predictive Monitoring?

See the difference contextual intelligence makes for your infrastructure.

Request a Demo

Proven Performance

Built in 158 hours with AI assistance. Production-ready from day one.

  • 88% Prediction Accuracy: on critical incidents (90+ risk score)
  • 30-60 min Early Warning Time: before problems become critical
  • 8 hours Prediction Horizon: see the future of your infrastructure
  • <100ms Inference Latency: per server prediction (GPU-accelerated)
  • 14 LINBORG Metrics: production-ready monitoring
  • 1000+ Servers Supported: per deployment instance

Development Speed: 158 hours with AI assistance = 5-8x faster than traditional development (800-1,200 hours). Cost reduction: 76-93%. Same quality, fraction of the time.

See NordIQ in Action

Request a personalized demo. See how predictive monitoring works with your infrastructure.

What you'll get:

  • 30-minute live demo with the founder (Craig)
  • Walk through all 10 dashboard tabs
  • See predictions for your server profiles
  • Calculate ROI for your environment
  • Discuss integration with your existing tools
  • Get answers to technical questions
Email Craig for a Demo

Message us on Facebook - Direct access to the founder. No sales pressure.
