AIOps Tools Toolkit

Toolkit Track | 4 Modules | ~3 hours total

The AIOps Tools Toolkit covers practical implementations of AIOps capabilities—from open-source anomaly detection libraries to enterprise correlation platforms. You’ll learn when to use Prophet vs. Isolation Forest, how commercial platforms like BigPanda and Moogsoft work, and how to build custom AIOps pipelines on Kubernetes.

This toolkit applies concepts from AIOps Discipline.

Before starting this toolkit:

  • AIOps Discipline — Complete the conceptual foundation
  • Observability Toolkit — Data collection layer
  • Python proficiency for anomaly detection exercises
  • Kubernetes basics for custom pipeline exercises

#     Module                       Complexity  Time
10.1  Anomaly Detection Tools      [MEDIUM]    40-45 min
10.2  Event Correlation Platforms  [MEDIUM]    40-45 min
10.3  Observability AI Features    [MEDIUM]    40-45 min
10.4  Building Custom AIOps        [COMPLEX]   50-60 min

After completing this toolkit, you will be able to:

  1. Choose anomaly detection tools — Prophet, Luminaire, PyOD for different use cases
  2. Evaluate correlation platforms — BigPanda, Moogsoft, PagerDuty AIOps
  3. Leverage observability AI — Datadog Watchdog, Dynatrace Davis, New Relic AI
  4. Build custom pipelines — Python + Kafka + Kubernetes for custom AIOps

WHICH AIOPS TOOL?
─────────────────────────────────────────────────────────────────

"I need time series anomaly detection with seasonality"
└──▶ Prophet (Facebook)
     • Handles multiple seasonalities
     • Trend detection
     • Holiday effects
     • Good for forecasting

"I need fast, streaming anomaly detection"
└──▶ Luminaire (Zillow)
     • Real-time detection
     • Minimal configuration
     • Handles structural breaks
     • Python-native

"I need multi-dimensional anomaly detection"
└──▶ PyOD / Isolation Forest
     • High-dimensional data
     • Multiple algorithms
     • Scikit-learn compatible
     • Good for logs, metrics together

"I need enterprise event correlation"
└──▶ BigPanda / Moogsoft
     • Topology-aware correlation
     • ML-based grouping
     • Integration ecosystem
     • SLA management

"I need AI built into my observability platform"
└──▶ Datadog Watchdog / Dynatrace Davis
     • No additional setup
     • Correlates with metrics/traces
     • Auto-baselining
     • Root cause suggestions

"I need custom AIOps for my unique requirements"
└──▶ Build with Python + Kafka + K8s
     • Full control
     • Domain-specific models
     • Data stays in-house
     • Higher engineering investment
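
The guide above can be read as a lookup from requirement to tool. The following Python sketch encodes it that way. The `Needs` flags and the `recommend` helper are hypothetical names introduced here for illustration; they are not part of any of the listed tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Needs:
    """Hypothetical requirement flags, one per question in the guide."""
    seasonal_time_series: bool = False
    streaming_detection: bool = False
    multi_dimensional: bool = False
    enterprise_correlation: bool = False
    platform_built_in_ai: bool = False
    unique_requirements: bool = False


# Order mirrors the guide; each entry pairs a flag name with its suggestion.
GUIDE = [
    ("seasonal_time_series", "Prophet"),
    ("streaming_detection", "Luminaire"),
    ("multi_dimensional", "PyOD / Isolation Forest"),
    ("enterprise_correlation", "BigPanda / Moogsoft"),
    ("platform_built_in_ai", "Datadog Watchdog / Dynatrace Davis"),
    ("unique_requirements", "Build with Python + Kafka + K8s"),
]


def recommend(needs: Needs) -> List[str]:
    """Return every suggestion whose requirement flag is set."""
    return [tool for flag, tool in GUIDE if getattr(needs, flag)]


if __name__ == "__main__":
    wanted = Needs(seasonal_time_series=True, enterprise_correlation=True)
    print(recommend(wanted))  # prints ['Prophet', 'BigPanda / Moogsoft']
```

More than one flag can be set at once, which matches how the stacks later on this page combine a detection library with a correlation platform.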

┌─────────────────────────────────────────────────────────────────┐
│                      AIOPS TOOL LANDSCAPE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  OPEN SOURCE / LIBRARIES                                        │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                                                           │  │
│  │ ANOMALY DETECTION     TIME SERIES        ML TOOLKITS      │  │
│  │   ┌───────────┐      ┌───────────┐      ┌───────────┐     │  │
│  │   │   PyOD    │      │  Prophet  │      │  Scikit-  │     │  │
│  │   │ (library) │      │(Facebook) │      │   learn   │     │  │
│  │   └───────────┘      └───────────┘      └───────────┘     │  │
│  │   ┌───────────┐      ┌───────────┐      ┌───────────┐     │  │
│  │   │ Luminaire │      │   Kats    │      │  PyTorch  │     │  │
│  │   │ (Zillow)  │      │(Facebook) │      │ LSTM etc. │     │  │
│  │   └───────────┘      └───────────┘      └───────────┘     │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  EVENT CORRELATION PLATFORMS                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                                                           │  │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐  │  │
│  │  │ BigPanda  │ │ Moogsoft  │ │ PagerDuty │ │ServiceNow │  │  │
│  │  │           │ │           │ │   AIOps   │ │   ITOM    │  │  │
│  │  └───────────┘ └───────────┘ └───────────┘ └───────────┘  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  OBSERVABILITY PLATFORM AI                                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                                                           │  │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐  │  │
│  │  │  Datadog  │ │ Dynatrace │ │ New Relic │ │  Splunk   │  │  │
│  │  │ Watchdog  │ │   Davis   │ │    AI     │ │   ITSI    │  │  │
│  │  └───────────┘ └───────────┘ └───────────┘ └───────────┘  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  STREAM PROCESSING (For Custom Solutions)                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                                                           │  │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐  │  │
│  │  │   Kafka   │ │   Flink   │ │   Spark   │ │   Beam    │  │  │
│  │  │  Streams  │ │           │ │ Streaming │ │           │  │  │
│  │  └───────────┘ └───────────┘ └───────────┘ └───────────┘  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Factor         Build Custom         Buy Platform
Time to value  Months               Days/Weeks
Customization  Unlimited            Limited
Maintenance    Your team            Vendor
Data control   Full                 Vendor dependent
Integration    Any                  Ecosystem
Cost model     Engineering time     License + usage
Best for       Unique requirements  Standard ops

Recommendation: Start with a platform (Datadog, PagerDuty) and build custom components only for requirements the platform cannot meet.

Module 10.1: Anomaly Detection Tools
│ Prophet, Luminaire, PyOD
│ When to use each
Module 10.2: Event Correlation Platforms
│ BigPanda, Moogsoft, PagerDuty
│ Enterprise capabilities
Module 10.3: Observability AI Features
│ Built-in AI in monitoring platforms
│ Datadog, Dynatrace, New Relic
Module 10.4: Building Custom AIOps
│ Python + Kafka + Kubernetes
│ End-to-end custom pipeline
[Toolkit Complete] → Production AIOps!

Tool              Best For                            Complexity  License
Prophet           Time series with seasonality        Low         MIT
Luminaire         Real-time streaming                 Low         Apache 2.0
PyOD              Multi-dimensional, many algorithms  Medium      BSD
Isolation Forest  High-dimensional data               Low         BSD (sklearn)
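
To make the "time series with seasonality" row concrete, here is a minimal standard-library sketch of a seasonal baseline. It is not Prophet and does not use Prophet's API. It is a simplified stand-in (per-hour median plus median absolute deviation) that assumes hourly samples, and the `threshold` multiplier and sample values are made up for illustration. The point is that the same value can be normal at one hour and anomalous at another.

```python
from collections import defaultdict
from statistics import median


def seasonal_anomalies(samples, threshold=5.0):
    """Return indices of samples that deviate from their hour-of-day baseline.

    samples: list of (hour_of_day, value) pairs.
    A sample is flagged when its distance from the per-hour median exceeds
    `threshold` times the per-hour median absolute deviation (MAD).
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    baseline = {}
    for hour, values in by_hour.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        baseline[hour] = (med, max(mad, 1e-9))  # avoid a zero-width band

    flagged = []
    for index, (hour, value) in enumerate(samples):
        med, mad = baseline[hour]
        if abs(value - med) > threshold * mad:
            flagged.append(index)
    return flagged


# One week of made-up traffic: quiet at 03:00, busy at 15:00.
samples = []
for day in range(7):
    samples.append((3, 100 + day))
    samples.append((15, 1000 + 10 * day))
samples.append((3, 1000))  # afternoon-level traffic at 03:00

print(seasonal_anomalies(samples))  # prints [14]
```

A value of 1000 is unremarkable at 15:00 but is flagged at 03:00. A single global threshold would miss that distinction, which is what seasonality-aware tools are meant to handle.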

Platform         Strength                         Deployment    Pricing
BigPanda         Topology correlation             SaaS          Enterprise
Moogsoft         ML-based clustering              SaaS/On-prem  Enterprise
PagerDuty AIOps  Incident management integration  SaaS          Tiered
ServiceNow ITOM  ITSM integration                 SaaS/On-prem  Enterprise

Platform   AI Feature            Best For
Datadog    Watchdog              Broad monitoring, auto-detection
Dynatrace  Davis AI              Full-stack, auto-instrumentation
New Relic  Applied Intelligence  APM-centric, anomaly detection
Splunk     ITSI                  Log-heavy environments

INTEGRATED AIOPS ARCHITECTURE
─────────────────────────────────────────────────────────────────
DATA SOURCES
┌─────────┬─────────┬─────────┬─────────┬─────────┐
│ Metrics │  Logs   │ Traces  │ Events  │ Changes │
└────┬────┴────┬────┴────┬────┴────┬────┴────┬────┘
     │         │         │         │         │
     └─────────┴─────────┴─────────┴─────────┘
COLLECTION               ▼
              ┌─────────────────────┐
              │ Observability Stack │
              │ (Prometheus, OTel)  │
              └──────────┬──────────┘
ANALYSIS                 ▼
              ┌─────────────────────┐
              │   AIOps Platform    │
              │  ┌───────┬───────┐  │
              │  │Anomaly│Correl-│  │
              │  │Detect │ation  │  │
              │  └───────┴───────┘  │
              │  ┌───────┬───────┐  │
              │  │ RCA   │Predict│  │
              │  │Engine │ Ops   │  │
              │  └───────┴───────┘  │
              └──────────┬──────────┘
ACTION                   ▼
              ┌─────────────────────┐
              │     Remediation     │
              │    (Auto/Manual)    │
              └─────────────────────┘
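
The analysis and action stages of this flow can be sketched in a few functions. This is a toy illustration under stated assumptions, not how any AIOps platform is implemented: `detect` uses a plain z-score as a stand-in detector, `correlate` groups anomalies by a fixed time window rather than by topology, and the auto/manual rule in `act` is hypothetical. The service names and metric values are made up.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Anomaly:
    service: str
    timestamp: int  # seconds since an arbitrary start
    value: float


def detect(metrics: Dict[str, List[Tuple[int, float]]],
           z_limit: float = 3.0) -> List[Anomaly]:
    """ANALYSIS (anomaly detection): z-score of the newest point vs. earlier ones."""
    found = []
    for service, points in metrics.items():
        if len(points) < 3:
            continue  # not enough history for a baseline
        history = [value for _, value in points[:-1]]
        ts, latest = points[-1]
        sigma = pstdev(history)
        if sigma > 0 and abs(latest - mean(history)) / sigma > z_limit:
            found.append(Anomaly(service, ts, latest))
    return found


def correlate(anomalies: List[Anomaly], window: int = 120) -> List[List[Anomaly]]:
    """ANALYSIS (correlation): group anomalies within `window` seconds of the first."""
    incidents: List[List[Anomaly]] = []
    for anomaly in sorted(anomalies, key=lambda a: a.timestamp):
        if incidents and anomaly.timestamp - incidents[-1][0].timestamp <= window:
            incidents[-1].append(anomaly)
        else:
            incidents.append([anomaly])
    return incidents


def act(incident: List[Anomaly]) -> str:
    """ACTION: hypothetical rule, auto-remediate single-service incidents only."""
    services = sorted({a.service for a in incident})
    mode = "auto" if len(services) == 1 else "manual"
    return f"{mode}: {', '.join(services)}"


# COLLECTION stand-in: (timestamp, value) points per service.
metrics = {
    "checkout": [(0, 100), (60, 102), (120, 98), (180, 101), (240, 400)],
    "payments": [(0, 50), (60, 51), (120, 49), (180, 50), (300, 200)],
    "search": [(0, 20), (60, 21), (120, 19), (180, 20), (240, 20)],
}

for incident in correlate(detect(metrics)):
    print(act(incident))  # prints "manual: checkout, payments"
```

The two spikes arrive 60 seconds apart, so they collapse into one incident instead of two separate alerts. That grouping step is the part the correlation platforms covered in Module 10.2 do with topology and ML rather than a fixed window.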

Module             Key Exercise
Anomaly Detection  Build detector with Prophet + Luminaire
Event Correlation  Evaluate platform with sample data
Observability AI   Configure Datadog Watchdog alerts
Custom AIOps       Build end-to-end pipeline on K8s

Open Source AIOps Stack
─────────────────────────────────────────────────────────────────
Prometheus + Grafana (Metrics)
Prophet/Luminaire (Anomaly Detection)
Custom Python (Correlation)
PagerDuty (Alerting)

Commercial AIOps Stack
─────────────────────────────────────────────────────────────────
Datadog (Full Observability)
├── Watchdog (Anomaly Detection)
BigPanda (Event Correlation)
PagerDuty (Incident Management)

Hybrid AIOps Stack
─────────────────────────────────────────────────────────────────
Prometheus + Datadog (Metrics)
├── Datadog Watchdog (Standard metrics)
├── Custom ML (Domain-specific)
Custom Correlation (Topology-aware)
Auto-Remediation (Kubernetes operators)

“The best AIOps tool is the one your team will actually use. Start simple, prove value, then expand.”