NordIQ Dashboard: Our Flagship Product

Out-of-the-box predictive monitoring. From 8-hour predictions to intelligent alerts—everything you need to prevent incidents before they impact your business. Deploy in hours, not months.

What is the NordIQ Dashboard?

The NordIQ Dashboard is our flagship out-of-the-box product: an AI-powered predictive monitoring platform that forecasts server incidents 30-60 minutes before they happen. Unlike traditional monitoring that alerts you when things break, we give you time to fix problems during business hours—not at 2 AM. Need something custom? Explore our custom AI solutions →

🔮

Predict Problems

8-hour prediction horizon with 30-60 minute early warning before critical incidents.

🧠

Understand Context

Profile-aware risk intelligence that knows the difference between normal and dangerous.

Act Proactively

Fix problems during business hours. No emergency pages. No user impact.

Core Features

Production-ready capabilities built for enterprise infrastructure teams.

🔮

8-Hour Prediction Horizon

Our Temporal Fusion Transformer (TFT) model analyzes 24 hours of historical data to predict server behavior 8 hours into the future. You get 30-60 minute advance warning before incidents become critical.

  • Predictions refreshed every 5 seconds
  • 88% accuracy on critical incidents
  • GPU-accelerated inference (<100ms per server)
  • Real-time WebSocket streaming to dashboard
🧠

Contextual Risk Intelligence

We don't just alert on raw thresholds. Our fuzzy logic system understands operational context—what's normal for your infrastructure, what's trending dangerous, and what requires immediate action.

  • Profile Awareness: Database at 98% memory = healthy (page cache). ML server at 98% = critical (OOM imminent).
  • Trend Analysis: 40% CPU steady = fine. 40% CPU climbing from 20% = dangerous trend detected.
  • Multi-Metric Correlation: High CPU alone = watch. High CPU + high memory + high I/O wait = critical compound stress.
  • Prediction-Aware: Current 40%, predicted 95% = early warning. Current 85%, predicted 60% = resolving issue.

Result: Intelligent alerts that understand your environment, not just arbitrary thresholds.
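The four signals above can be sketched as a simple scoring function. This is an illustrative approximation only: the profile ceilings, weights, and function names are invented for the example, not NordIQ's actual fuzzy-logic engine, which reasons over all 14 metrics.

```python
# Illustrative contextual risk scoring (0-100). The profile ceilings,
# weights, and blending below are invented for this sketch; the real
# engine applies fuzzy logic across all 14 metrics.

PROFILE_MEM_CEILING = {
    "database": 99,    # large page cache: high memory is healthy
    "ml_compute": 90,  # near-full memory means OOM is imminent
}

def contextual_risk(profile: str, cpu_now: float, cpu_predicted: float,
                    cpu_trend_per_hour: float, mem_pct: float) -> float:
    """Blend profile, trend, and prediction into one 0-100 risk score."""
    # Profile awareness: the same memory level means different things.
    ceiling = PROFILE_MEM_CEILING.get(profile, 95)
    mem_risk = min(100.0, max(0.0, mem_pct - ceiling) * 20)

    # Prediction-aware: low now but high forecast is an early warning;
    # high now with a falling forecast is a resolving issue, so discount it.
    if cpu_predicted < cpu_now:
        pred_risk = (cpu_now + cpu_predicted) / 2
    else:
        pred_risk = max(cpu_now, cpu_predicted)

    # Trend analysis: steady is fine, climbing is dangerous.
    trend_risk = min(100.0, max(0.0, cpu_trend_per_hour) * 2)

    return min(100.0, 0.5 * pred_risk + 0.3 * mem_risk + 0.2 * trend_risk)
```

With this sketch, a database at 98% memory scores far lower than an ML server at the same level, and a server forecast to spike scores higher than one forecast to recover.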

📊

7 Graduated Severity Levels

Traditional monitoring: everything is either OK or ON FIRE. Our system: graduated escalation with appropriate response times.

  • 🔴 Imminent Failure (90+): 5-minute SLA, CTO escalation, emergency response
  • 🔴 Critical (80-89): 15-minute SLA, page on-call engineer
  • 🟠 Danger (70-79): 30-minute SLA, team lead notification
  • 🟡 Warning (60-69): 1-hour SLA, team awareness
  • 🟢 Degrading (50-59): 2-hour SLA, email notification
  • 👁️ Watch (30-49): Background monitoring, no alerts
  • Healthy (0-29): Normal operation

Benefit: Right-sized responses. Less alert fatigue. Fewer false positives.
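The ladder above maps directly to a lookup table. The labels and score bands come from the list; the helper function itself is just an illustration of how a score becomes a response tier.

```python
# Severity ladder from the list above as a lookup table. Labels and
# score bands are from the page; the classify() helper is illustrative.

SEVERITY_LEVELS = [
    (90, "Imminent Failure", "5-minute SLA, CTO escalation"),
    (80, "Critical", "15-minute SLA, page on-call engineer"),
    (70, "Danger", "30-minute SLA, team lead notification"),
    (60, "Warning", "1-hour SLA, team awareness"),
    (50, "Degrading", "2-hour SLA, email notification"),
    (30, "Watch", "background monitoring, no alerts"),
    (0, "Healthy", "normal operation"),
]

def classify(risk_score: float) -> tuple:
    """Map a 0-100 risk score to its severity label and response."""
    for floor, label, response in SEVERITY_LEVELS:
        if risk_score >= floor:
            return label, response
    return "Healthy", "normal operation"
```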

🎯

Profile-Based Transfer Learning

New servers get accurate predictions immediately—no training period required. Our model learns patterns from similar servers and applies that intelligence to new infrastructure.

7 Server Profiles:

  • ML Compute: Training nodes, high CPU/memory bursts
  • Database: Oracle/Postgres, high disk I/O, large page cache
  • Web API: REST endpoints, high network throughput
  • Conductor/Management: Job scheduling, orchestration
  • Data Ingest: Kafka/Spark streaming, high write volume
  • Risk Analytics: Financial calculations, CPU-intensive
  • Generic: Fallback for unknown workloads

Benefits:

  • ✅ No retraining when adding servers of known types
  • ✅ 13% better accuracy than generic models
  • ✅ 80% less retraining frequency
  • ✅ Immediate production value for new infrastructure
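In practice, transfer learning here means a new server is tagged with one of the seven profiles and immediately served by that profile's pre-trained model. A hypothetical sketch of that lookup, with invented model IDs and a Generic fallback:

```python
# Hypothetical profile-to-model registry. Profile names follow the list
# above; the model IDs and registry structure are invented for this sketch.

PROFILE_MODELS = {
    "ml_compute": "tft-ml-compute-v3",
    "database": "tft-database-v3",
    "web_api": "tft-web-api-v3",
    "conductor": "tft-conductor-v3",
    "data_ingest": "tft-data-ingest-v3",
    "risk_analytics": "tft-risk-analytics-v3",
    "generic": "tft-generic-v3",
}

def model_for(profile: str) -> str:
    """Return the pre-trained model for a profile; no per-server training."""
    return PROFILE_MODELS.get(profile, PROFILE_MODELS["generic"])
```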
📈

14 Production LINBORG Metrics

We monitor the metrics that matter for real-world troubleshooting—not just CPU and memory.

CPU Metrics

  • User space CPU (application work)
  • System/kernel CPU (OS overhead)
  • I/O Wait (storage bottlenecks)—critical for troubleshooting
  • Idle CPU (displayed as % Used for clarity)
  • Java/Spark CPU (application-specific)

Memory & Storage

  • Memory utilization %
  • Swap usage (thrashing indicator)
  • Disk space usage

Network & System

  • Network ingress (MB/s)
  • Network egress (MB/s)
  • TCP backend connections
  • TCP frontend connections
  • System load average
  • Uptime (maintenance tracking)

Why I/O Wait matters: High I/O wait is "system troubleshooting 101"—it's the first metric experienced engineers check when diagnosing performance issues. Database servers expect 10-15% I/O wait (normal). ML compute servers should have <2% (high values indicate misconfiguration).
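Those profile-specific expectations can be encoded as bands. The 10-15% database and <2% ML figures are from the note above; the band table and helper are illustrative.

```python
# Expected I/O-wait bands per profile. The database (up to 15%) and
# ML compute (under 2%) figures follow the note above; the default band
# and helper function are illustrative assumptions.

IOWAIT_NORMAL_RANGE = {
    "database": (0.0, 15.0),   # heavy disk I/O is expected
    "ml_compute": (0.0, 2.0),  # higher suggests misconfiguration
}

def iowait_flag(profile: str, iowait_pct: float) -> str:
    """Flag I/O wait that falls outside the profile's normal band."""
    low, high = IOWAIT_NORMAL_RANGE.get(profile, (0.0, 5.0))
    return "normal" if low <= iowait_pct <= high else "investigate"
```

The same 12% reading is healthy on a database and a red flag on an ML node, which is exactly the profile-awareness point.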

Real-Time Streaming Architecture

Microservices-based design for high performance and scalability.

  • Inference Daemon: REST API + WebSocket streaming on port 8000
  • Metrics Generator: Collects data from production sources (port 8001)
  • Dashboard: Streamlit web UI with 10 specialized tabs (port 8501)
  • Performance: <100ms inference latency, <2s dashboard load time
  • Caching: Strategic caching provides 60% performance improvement
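A dashboard-side consumer of the daemon's stream might look like the sketch below. Only the port (8000) comes from the architecture notes; the `/ws/predictions` path, message shape, and field names are assumptions, and the connection loop needs the third-party `websockets` package.

```python
import asyncio
import json

# Hypothetical consumer for the inference daemon's WebSocket stream.
# The /ws/predictions path and message fields are assumptions; only
# port 8000 comes from the architecture notes.

def parse_prediction(raw: str) -> dict:
    """Turn one streamed JSON message into a flat prediction record."""
    msg = json.loads(raw)
    return {
        "server": msg["server"],
        "risk_score": float(msg["risk_score"]),
        "horizon_hours": msg.get("horizon_hours", 8),
    }

async def consume(url: str = "ws://localhost:8000/ws/predictions") -> None:
    import websockets  # third-party: pip install websockets
    async with websockets.connect(url) as ws:
        async for raw in ws:
            record = parse_prediction(raw)
            print(record["server"], record["risk_score"])

# Run against a live daemon with: asyncio.run(consume())
```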

Interactive Dashboard - 10 Specialized Tabs

Everything you need to monitor, analyze, and respond to your infrastructure—all in one place.

1. Fleet Overview

Your command center. Real-time view of all servers, risk scores, and predictions.

  • Environment status (Healthy, Caution, Warning, Critical)
  • Fleet-wide statistics (P1 alerts, degrading servers, total fleet health)
  • Server cards with current metrics and 8-hour predictions
  • Risk score (0-100) with contextual intelligence
  • Color-coded severity indicators
  • Expandable metric details per server

2. Server Heatmap

Visual fleet-wide view. Spot problems at a glance across all servers and metrics.

  • Interactive heatmap (servers × metrics)
  • Color-coded: Green (healthy) → Yellow (watch) → Red (critical)
  • Quickly identify which servers and metrics need attention
  • Profile grouping (see all databases together, all ML servers together)

3. Top Problem Servers

Focus on what matters. The 5 highest-risk servers that need immediate attention.

  • Ranked by risk score (highest risk first)
  • What's wrong (specific metrics exceeding thresholds)
  • Why it's dangerous (impact analysis)
  • What to do about it (recommended actions)
  • Predicted timeline to critical

4. Historical Trends

Understand patterns. 24-hour historical data for all servers and metrics.

  • Time-series charts (current vs. predicted)
  • Trend analysis (increasing, stable, decreasing)
  • Anomaly detection (unexpected spikes or drops)
  • Correlation analysis (which metrics move together)
  • Export data for reporting

5. Cost Avoidance Calculator

Prove ROI. Calculate the financial value of prevented incidents.

  • Prevented incidents this month/quarter/year
  • Estimated downtime avoided (hours)
  • Revenue protected (based on your hourly rate)
  • Operational costs saved (emergency response, overtime)
  • Customer satisfaction impact (prevented outages)
  • Real Example: 1 prevented 2-hour outage = $209,000 saved (for a $100M ARR SaaS company)
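The arithmetic behind the calculator is straightforward. The formula and inputs below are illustrative assumptions; the dashboard's calculator uses your own hourly rates and cost structure.

```python
# Back-of-the-envelope cost-avoidance math. The formula and all inputs
# are illustrative assumptions; plug in your own rates.

def cost_avoided(outage_hours: float, revenue_per_hour: float,
                 ops_cost_per_incident: float = 0.0,
                 engineers: int = 0, engineer_rate: float = 0.0) -> float:
    """Revenue protected plus emergency-response costs saved."""
    revenue_protected = outage_hours * revenue_per_hour
    response_cost = ops_cost_per_incident + engineers * engineer_rate * outage_hours
    return revenue_protected + response_cost
```

For example, a prevented 2-hour outage at $10,000/hour of revenue, with $5,000 of incident overhead and four engineers at $150/hour, works out to $26,200 avoided.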

6. Auto-Remediation

Take action automatically. Configure responses for common issues.

  • Memory Pressure: Restart service, clear cache, scale horizontally
  • CPU Spikes: Kill runaway processes, adjust thread pools
  • Disk Space: Archive logs, clean temp files, expand volume
  • Network Issues: Restart networking, failover to backup
  • Dry-run mode (test before enabling)
  • Approval workflows for high-risk actions
  • Audit trail (who did what, when)

7. Alert Routing

Right alerts to right people. Intelligent routing based on severity and team.

  • Integrations: PagerDuty, Slack, Microsoft Teams, Email, Webhooks
  • Routing Rules: Imminent Failure → CTO + On-Call, Critical → On-Call, Warning → Team Channel
  • Escalation Policies: Auto-escalate if not acknowledged within SLA
  • On-Call Schedules: Respect work hours, time zones, PTO
  • Alert Suppression: Maintenance windows, known issues
  • Alert Grouping: Combine related alerts to reduce noise
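The routing rules above reduce to a severity-to-targets table. Channel names and the dispatch helper are invented for this sketch; real delivery goes through the PagerDuty, Slack, and Teams integrations.

```python
# Illustrative routing table matching the rules above. Channel names and
# the route() helper are invented; real delivery uses the integrations.

ROUTING_RULES = {
    "Imminent Failure": ["cto", "on_call"],
    "Critical": ["on_call"],
    "Danger": ["team_lead"],
    "Warning": ["team_channel"],
    "Degrading": ["email"],
}

def route(severity: str, suppressed: bool = False) -> list:
    """Return notification targets; maintenance windows suppress all alerts."""
    if suppressed:
        return []
    return ROUTING_RULES.get(severity, [])  # Watch/Healthy: no alerts
```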

8. Advanced Diagnostics

Deep technical insights for engineers.

  • Model confidence scores per prediction
  • Attention weights (which metrics influenced the prediction)
  • Risk calculation breakdown (how we arrived at the score)
  • API health (daemon status, latency, error rates)
  • WebSocket connection status
  • Debug mode (raw prediction data, logs)

9. Documentation

Everything you need to know, right in the dashboard.

  • Getting started guide
  • Risk scoring explained (fuzzy logic, contextual intelligence)
  • Server profile definitions
  • Metric definitions (what each LINBORG metric means)
  • Severity levels and response SLAs
  • API reference
  • Troubleshooting common issues

10. Roadmap

See what's coming. Transparency about future features.

  • Planned features (next quarter)
  • Research explorations (future vision)
  • Customer-requested features
  • Vote on priorities (what do you need most?)

See NordIQ in Action

Real screenshots from our production dashboard

Technical Capabilities

Built for enterprise infrastructure teams with demanding requirements.

Performance

  • <100ms inference latency per server
  • <2 seconds dashboard load time
  • 60% faster with strategic caching
  • GPU-accelerated predictions (CUDA support)
  • 5-second refresh rate for real-time monitoring
  • Handles 1000+ servers per deployment

Data Sources

  • MongoDB: Time-series collections
  • Elasticsearch: Log aggregation
  • Prometheus: Metrics scraping
  • InfluxDB: Time-series database
  • REST API: Push metrics from any source
  • Custom Adapters: Build your own (5-15K)

Deployment Options

  • Self-Hosted: On-prem or your cloud (Docker, Kubernetes)
  • Managed SaaS: We host, you monitor
  • Hybrid: Data stays on-prem, we provide insights
  • Air-Gapped: No internet required
  • Multi-Region: Deploy across datacenters

Security & Compliance

  • API Key Authentication: Auto-generated, rotatable
  • Okta SSO: Enterprise single sign-on
  • LDAP/Active Directory: Integrate with existing auth
  • Role-Based Access Control: Admin, Engineer, Viewer roles
  • Audit Logs: Track all actions and changes
  • Encrypted Connections: TLS 1.3 for all traffic
  • SOC 2 Ready: Compliance documentation included

Integrations

  • PagerDuty: Incident management
  • Slack: Team notifications
  • Microsoft Teams: Enterprise chat
  • Email: SMTP/SendGrid/SES
  • Webhooks: Custom integrations
  • ServiceNow: ITSM ticketing
  • Jira: Project tracking

APIs & Extensibility

  • REST API: Full CRUD operations
  • WebSocket API: Real-time streaming
  • Python SDK: Build custom tools
  • Webhooks: React to events
  • Custom Dashboards: Embed predictions in your tools
  • Data Export: CSV, JSON, Parquet
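As a minimal sketch of the export path, prediction records can be serialized to the listed formats with the standard library alone. The field names here are assumptions for the example.

```python
import csv
import io
import json

# Minimal sketch of CSV/JSON export for prediction records. The field
# names ("server", "risk_score", "severity") are assumptions.

def export_json(records: list) -> str:
    """Serialize prediction records as pretty-printed JSON."""
    return json.dumps(records, indent=2)

def export_csv(records: list) -> str:
    """Serialize prediction records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["server", "risk_score", "severity"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```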

What Makes Us Different

We prevent incidents. Traditional monitoring just detects them.

| Capability | Traditional Monitoring | NordIQ Predictive |
| --- | --- | --- |
| Detection Time | ❌ Alerts AFTER problems occur | ✅ 30-60 minute advance warning |
| Alert Intelligence | ❌ Dumb thresholds (CPU > 80%) | ✅ Contextual risk scoring (fuzzy logic) |
| Profile Awareness | ❌ One-size-fits-all | ✅ 7 specialized server profiles |
| Alert Levels | ❌ OK or Critical (binary) | ✅ 7 graduated severity levels |
| New Servers | ❌ Need weeks of training data | ✅ Accurate predictions immediately |
| Multi-Metric | ❌ Isolated metrics (CPU alone) | ✅ Correlation analysis (14 metrics) |
| Response Time | ❌ Emergency pages at 2 AM | ✅ Fix during business hours |
| False Positives | ❌ High (alert fatigue) | ✅ Low (contextual intelligence) |
| User Impact | ❌ Users experience downtime first | ✅ Problems fixed before user impact |
| Cost | ❌ $100K-500K/year (Datadog, New Relic) | ✅ $5K-150K/year (transparent pricing) |

Ready to Try Predictive Monitoring?

See the difference contextual intelligence makes for your infrastructure.

Request a Demo

Proven Performance

Built in 158 hours with AI assistance. Production-ready from day one.

  • 88% Prediction Accuracy: on critical incidents (90+ risk score)
  • 30-60 min Early Warning Time: before problems become critical
  • 8 hours Prediction Horizon: see the future of your infrastructure
  • <100ms Inference Latency: per server prediction (GPU-accelerated)
  • 14 LINBORG Metrics: production-ready monitoring
  • 1000+ Servers Supported: per deployment instance

Development Speed: 158 hours with AI assistance = 5-8x faster than traditional development (800-1,200 hours). Cost reduction: 76-93%. Same quality, fraction of the time.

See NordIQ in Action

Request a personalized demo. See how predictive monitoring works with your infrastructure.

What you'll get:

  • 30-minute live demo with the founder (Craig)
  • Walk through all 10 dashboard tabs
  • See predictions for your server profiles
  • Calculate ROI for your environment
  • Discuss integration with your existing tools
  • Get answers to technical questions
Email Craig for a Demo

Message us on Facebook - Direct access to the founder. No sales pressure.
