monitoring-expert

Name: monitoring-expert
Author: jeffallan

Configures monitoring systems, implements structured logging pipelines, creates Prometheus/Grafana dashboards, defines alerting rules, and instruments distributed tracing. Implements Prometheus/Grafana stacks, conducts load testing, performs application profiling, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.

SKILL.md body

Monitoring Expert

An observability and performance specialist who implements comprehensive monitoring, alerting, tracing, and performance-testing systems.

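Core Workflow

Assess: identify what needs monitoring (SLIs, critical paths, business metrics)
Instrument: add logging, metrics, and traces to the application (see the examples below)
Collect: configure aggregation and storage (Prometheus scrape, log shipper, OTLP endpoint); verify that data is arriving before proceeding
Visualize: build dashboards using the RED (Rate/Errors/Duration) or USE (Utilization/Saturation/Errors) method
Alert: define threshold and anomaly alerts on critical paths; confirm there is no flood of false positives before shipping to production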
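
As a rough illustration of what the RED method measures, the sketch below derives rate, error ratio, and duration percentiles from a batch of request samples. The sample shape (`durationMs`, `status`), the function name `redSummary`, and the nearest-rank percentile are assumptions made for this example; in practice these values would normally come from Prometheus queries rather than application code.

```javascript
// Hypothetical request samples collected over one observation window.
const samples = [
  { durationMs: 40, status: 200 },
  { durationMs: 55, status: 200 },
  { durationMs: 70, status: 200 },
  { durationMs: 900, status: 503 },
];

// Summarize Rate, Errors, and Duration for a window of `windowSeconds`.
function redSummary(requests, windowSeconds) {
  const total = requests.length;
  const errors = requests.filter((r) => r.status >= 500).length;
  const sorted = requests.map((r) => r.durationMs).sort((a, b) => a - b);

  // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
  const percentile = (p) =>
    sorted.length === 0 ? null : sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];

  return {
    ratePerSecond: total / windowSeconds,
    errorRatio: total === 0 ? 0 : errors / total,
    p50Ms: percentile(50),
    p95Ms: percentile(95),
  };
}

const summary = redSummary(samples, 60);
console.log(summary);
```

With only four samples the 95th percentile is simply the slowest request, which is a reminder that percentiles need a reasonable sample count before they are worth putting on a dashboard.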

Quick Start Examples

Structured Logging (Node.js / Pino)

import pino from 'pino';

const logger = pino({ level: 'info' });

// Good — structured fields, includes correlation ID
logger.info({ requestId: req.id, userId: req.user.id, durationMs: elapsed }, 'order.created');

// Bad — string interpolation, no correlation
console.log(`Order created for user ${userId}`);
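
The logging example assumes a request ID is already at hand at the call site. One possible way to make it available everywhere is Node's built-in `AsyncLocalStorage`; the sketch below is dependency-free and the helper names `withRequestId` and `buildLogEntry` are hypothetical, not part of Pino.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

// Holds per-request context for the duration of one request's async call chain.
const requestContext = new AsyncLocalStorage();

// Run `fn` with a request ID bound to the current async context.
// A fresh UUID is generated when the caller does not supply an ID.
function withRequestId(requestId, fn) {
  return requestContext.run({ requestId: requestId ?? randomUUID() }, fn);
}

// Build a structured log entry that picks up the request ID automatically.
function buildLogEntry(event, fields = {}) {
  const ctx = requestContext.getStore();
  return { event, requestId: ctx ? ctx.requestId : undefined, ...fields };
}

const entry = withRequestId('req-123', () => buildLogEntry('order.created', { userId: 42 }));
console.log(JSON.stringify(entry));
```

In a web server, `withRequestId` would typically wrap the request handler, taking the ID from an incoming header when one is present.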

Prometheus Metrics (Node.js)

import { Counter, Histogram, register } from 'prom-client';

const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency',
  labelNames: ['method', 'route'],
  buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5],
});

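// Instrument every route (`app` is assumed to be an Express application).
// Label by the matched route pattern (e.g. /orders/:id) rather than the raw
// path so that label cardinality stays bounded.
app.use((req, res, next) => {
  const end = httpDuration.startTimer({ method: req.method });
  res.on('finish', () => {
    const route = req.route?.path ?? req.path; // raw path only if no route matched
    httpRequests.inc({ method: req.method, route, status: res.statusCode });
    end({ route });
  });
  next();
});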

// Expose scrape endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
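
To make the `buckets` setting more concrete, the following sketch mimics how a Prometheus histogram counts observations: buckets are cumulative, so each observation increments every bucket whose upper bound (`le`) is at least the observed value, plus the implicit `+Inf` bucket. This is a plain JavaScript model for illustration, not prom-client's implementation, and `cumulativeBuckets` is a hypothetical name.

```javascript
// Same upper bounds (in seconds) as the histogram defined above.
const BUCKETS = [0.05, 0.1, 0.3, 0.5, 1, 2, 5];

// Count observations into cumulative buckets, including the implicit +Inf bucket.
function cumulativeBuckets(observations, bounds = BUCKETS) {
  const rows = [...bounds, Infinity].map((le) => ({ le, count: 0 }));
  for (const value of observations) {
    for (const row of rows) {
      if (value <= row.le) row.count += 1; // every bucket at or above the value
    }
  }
  return rows;
}

// Five request durations in seconds; the 6 s request only lands in +Inf.
const rows = cumulativeBuckets([0.03, 0.2, 0.2, 0.7, 6]);
for (const { le, count } of rows) {
  console.log(`le=${le === Infinity ? '+Inf' : le} count=${count}`);
}
```

Observations above the largest bound are only visible in `+Inf`, so the top bucket should sit comfortably above the slowest latency you expect to care about.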

OpenTelemetry Tracing (Node.js)

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { trace, SpanStatusCode } from '@opentelemetry/api';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
});
sdk.start();

// Manual span around a critical operation
const tracer = trace.getTracer('order-service');
async function processOrder(orderId) {
  const span = tracer.startSpan('order.process');
  span.setAttribute('order.id', orderId);
  try {
    const result = await db.saveOrder(orderId);
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
}

Prometheus Alerting Rules

groups:
  - name: api.rules
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (route) (rate(http_requests_total{status=~"5.."}[5m]))
          / sum by (route) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% on {{ $labels.route }}"
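
The `for: 2m` clause means the expression must stay true across consecutive evaluations for two minutes before the alert moves from pending to firing. The sketch below is a simplified model of that behaviour, assuming a 30-second evaluation interval and made-up error ratios; `firingTimes` is a hypothetical helper, not a Prometheus API.

```javascript
// Return the evaluation timestamps (in seconds) at which the alert would be firing.
function firingTimes(evaluations, threshold, forSeconds) {
  let pendingSince = null; // time at which the condition first became true
  const firing = [];
  for (const { t, ratio } of evaluations) {
    if (ratio > threshold) {
      if (pendingSince === null) pendingSince = t;
      if (t - pendingSince >= forSeconds) firing.push(t);
    } else {
      pendingSince = null; // condition cleared, so the pending timer resets
    }
  }
  return firing;
}

// Hypothetical error ratios sampled every 30 seconds.
const evaluations = [
  { t: 0, ratio: 0.02 },
  { t: 30, ratio: 0.08 },
  { t: 60, ratio: 0.09 },
  { t: 90, ratio: 0.03 }, // brief recovery resets the timer
  { t: 120, ratio: 0.07 },
  { t: 150, ratio: 0.07 },
  { t: 180, ratio: 0.1 },
  { t: 210, ratio: 0.12 },
  { t: 240, ratio: 0.06 },
];

const fired = firingTimes(evaluations, 0.05, 120);
console.log(fired);
```

The short spike between 30 s and 60 s never fires, which is the purpose of `for`: it trades a little detection latency for fewer alerts on transient blips.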

k6 Load Test

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // ramp up
    { duration: '5m', target: 50 },   // sustained load
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile < 500 ms
    http_req_failed:   ['rate<0.01'],  // error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/orders');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

Reference Guide

Load detailed guidance based on context:

Topic	Reference	Load When
Logging	`references/structured-logging.md`	Pino, JSON logging
Metrics	`references/prometheus-metrics.md`	Counter, Histogram, Gauge
Tracing	`references/opentelemetry.md`	OpenTelemetry, spans
Alerting	`references/alerting-rules.md`	Prometheus alerts
Dashboards	`references/dashboards.md`	RED/USE method, Grafana
Performance Testing	`references/performance-testing.md`	Load testing, k6, Artillery, benchmarks
Profiling	`references/application-profiling.md`	CPU/memory profiling, bottlenecks
Capacity Planning	`references/capacity-planning.md`	Scaling, forecasting, budgets

Constraints

MUST DO

Use structured logging (JSON)
Include request IDs for correlation
Set up alerts for critical paths
Monitor business metrics, not just technical metrics
Use the appropriate metric type (counter/gauge/histogram)
Implement health check endpoints
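
A health check endpoint can be as small as the sketch below, which uses only Node's built-in `http` module. The `/healthz` path, the probe names, and the response shape are assumptions for illustration; real probes would ping the actual database, cache, or downstream services.

```javascript
const http = require('node:http');

// Hypothetical dependency probes; each resolves to true when the dependency is reachable.
const probes = {
  database: async () => true,
  cache: async () => true,
};

// Pure helper: turn probe results ({ name: boolean }) into a status code and JSON body.
function summarize(results) {
  const healthy = Object.values(results).every(Boolean);
  const checks = Object.fromEntries(
    Object.entries(results).map(([name, ok]) => [name, ok ? 'ok' : 'fail']),
  );
  return { statusCode: healthy ? 200 : 503, body: { status: healthy ? 'ok' : 'unhealthy', checks } };
}

// Run every probe; a probe that rejects counts as a failure.
async function runProbes(all) {
  const results = {};
  for (const [name, probe] of Object.entries(all)) {
    results[name] = await probe().then(Boolean, () => false);
  }
  return results;
}

const server = http.createServer(async (req, res) => {
  if (req.url !== '/healthz') {
    res.writeHead(404).end();
    return;
  }
  const { statusCode, body } = summarize(await runProbes(probes));
  res.writeHead(statusCode, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(body));
});

// server.listen(3000); // uncomment to serve GET /healthz on a port of your choice
```

Returning 503 when any probe fails lets load balancers and orchestrators take the instance out of rotation without parsing the response body.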

MUST NOT DO

Do not log sensitive data (passwords, tokens, PII)
Do not alert on every error (alert fatigue)
Do not use string interpolation in logs (use structured fields)
Do not skip correlation IDs in distributed systems
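
One way to keep sensitive values out of logs is to redact known field names before an entry is serialized. The sketch below is a minimal, dependency-free version; the key list and the `redact` helper are assumptions for this example, and a real deployment would extend the list to match its own data. Pino also offers a built-in `redact` option that serves the same purpose.

```javascript
// Field names treated as sensitive in this example (compared case-insensitively).
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'email']);

// Recursively copy a value, replacing sensitive fields with a placeholder.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.has(key.toLowerCase()) ? [key, '[REDACTED]'] : [key, redact(inner)]),
    );
  }
  return value;
}

const safe = redact({
  event: 'user.login',
  user: { id: 7, password: 'hunter2' },
  headers: { Authorization: 'Bearer abc' },
});
console.log(JSON.stringify(safe));
```

Name-based redaction only catches fields it knows about, so it complements, rather than replaces, the habit of not passing secrets to the logger in the first place.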

Documentation

License: MIT (quoted in full because the license is permissive) · Original repository

Details

Author: jeffallan
Repository: jeffallan/claude-skills
License: MIT
Last updated: unknown

Source: https://github.com/jeffallan/claude-skills / License: MIT
