
Classification Modeling

Name: Classification Modeling
Author: aj-geddes

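Builds binary and multiclass classification models for categorical prediction and classification tasks using logistic regression, decision trees, and ensemble methods. Covers the full workflow, from choosing a method suited to the goal through model evaluation and improvement, with the aim of improving classification accuracy.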

Original description:

Build binary and multiclass classification models using logistic regression, decision trees, and ensemble methods for categorical prediction and classification

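SKILL.md body

Classification Modeling

Overview

Classification modeling predicts a categorical target, assigning each observation to a discrete class or category based on its input features.

When to Use

Predicting binary outcomes such as customer churn, loan default, or email spam
Classifying into multiple categories, such as product type or sentiment
Building credit scoring models and risk assessment systems
Identifying disease diagnoses or medical conditions from patient data
Predicting customer purchase likelihood or marketing response
Detecting fraud, anomalies, and quality defects in production systems

Classification Types

Binary classification: two classes (yes/no, success/failure)
Multiclass classification: three or more classes
Multi-label classification: multiple classes per observation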
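
The difference between multiclass and multi-label targets shows up in the shape of the labels and predictions. The following is a minimal sketch on synthetic data, assuming scikit-learn is available; the variable names (`X_mc`, `Y_ml`, and so on) and the choice of `OneVsRestClassifier` for the multi-label case are illustrative assumptions, not something the skill prescribes.

```python
import numpy as np
from sklearn.datasets import make_classification, make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Multiclass: each observation gets exactly one of three classes.
X_mc, y_mc = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=0
)
mc_model = LogisticRegression(max_iter=1000).fit(X_mc, y_mc)
mc_proba = mc_model.predict_proba(X_mc[:5])  # one column per class, rows sum to 1

# Multi-label: each observation can carry several of four labels at once.
X_ml, Y_ml = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=4, random_state=0
)
# One independent binary classifier per label (an illustrative choice).
ml_model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_ml, Y_ml)
ml_pred = ml_model.predict(X_ml[:5])  # 0/1 indicator matrix, one column per label

print("Multiclass probabilities shape:", mc_proba.shape)
print("Multi-label predictions shape:", ml_pred.shape)
```

In the multiclass case the predicted probabilities compete with each other, while in the multi-label case each column is decided separately.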

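Common Algorithms

Logistic regression: linear classification
Decision trees: rule-based, non-linear
Random forest: an ensemble of decision trees
Gradient boosting: sequential tree building
SVM: support vector machines
Naive Bayes: probabilistic classifier

Key Metrics

Accuracy: overall fraction of correct predictions
Precision: true positives / (true positives + false positives)
Recall: true positives / (true positives + false negatives)
F1 score: harmonic mean of precision and recall
AUC-ROC: area under the receiver operating characteristic curve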
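
The metric definitions above can be checked by hand against scikit-learn. This is a small sketch on made-up labels (the arrays `y_true` and `y_pred` are hypothetical, chosen only for illustration) that computes each metric from the confusion-matrix counts and compares it with the library result.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels for illustration only (1 = positive class).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# Confusion-matrix counts.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")

# The hand-computed values agree with scikit-learn's implementations.
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
```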

Implementation in Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (
    confusion_matrix, classification_report, roc_auc_score, roc_curve,
    precision_recall_curve, f1_score, accuracy_score
)
import seaborn as sns

# Generate sample binary classification data
np.random.seed(42)
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_redundant=5, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train_scaled, y_train)
y_pred_lr = lr_model.predict(X_test_scaled)
y_proba_lr = lr_model.predict_proba(X_test_scaled)[:, 1]

print("Logistic Regression:")
print(classification_report(y_test, y_pred_lr))
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba_lr):.4f}\n")

# Decision Tree
dt_model = DecisionTreeClassifier(max_depth=10, random_state=42)
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
y_proba_dt = dt_model.predict_proba(X_test)[:, 1]

print("Decision Tree:")
print(classification_report(y_test, y_pred_dt))
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba_dt):.4f}\n")

# Random Forest
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
y_proba_rf = rf_model.predict_proba(X_test)[:, 1]

print("Random Forest:")
print(classification_report(y_test, y_pred_rf))
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba_rf):.4f}\n")

# Gradient Boosting
gb_model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
gb_model.fit(X_train, y_train)
y_pred_gb = gb_model.predict(X_test)
y_proba_gb = gb_model.predict_proba(X_test)[:, 1]

print("Gradient Boosting:")
print(classification_report(y_test, y_pred_gb))
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba_gb):.4f}\n")

# Confusion matrices
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

models = [
    (y_pred_lr, 'Logistic Regression'),
    (y_pred_dt, 'Decision Tree'),
    (y_pred_rf, 'Random Forest'),
    (y_pred_gb, 'Gradient Boosting'),
]

for idx, (y_pred, title) in enumerate(models):
    cm = confusion_matrix(y_test, y_pred)
    ax = axes[idx // 2, idx % 2]
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax)
    ax.set_title(title)
    ax.set_ylabel('True Label')
    ax.set_xlabel('Predicted Label')

plt.tight_layout()
plt.show()

# ROC Curves
plt.figure(figsize=(10, 8))

probas = [
    (y_proba_lr, 'Logistic Regression'),
    (y_proba_dt, 'Decision Tree'),
    (y_proba_rf, 'Random Forest'),
    (y_proba_gb, 'Gradient Boosting'),
]

for y_proba, label in probas:
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    auc = roc_auc_score(y_test, y_proba)
    plt.plot(fpr, tpr, label=f'{label} (AUC={auc:.4f})')

plt.plot([0, 1], [0, 1], 'k--', label='Random Classifier')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves Comparison')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Precision-Recall Curves
plt.figure(figsize=(10, 8))

for y_proba, label in probas:
    precision, recall, _ = precision_recall_curve(y_test, y_proba)
    f1 = f1_score(y_test, (y_proba > 0.5).astype(int))
    plt.plot(recall, precision, label=f'{label} (F1={f1:.4f})')

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curves')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Feature importance
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Tree-based feature importance
feature_importance_rf = pd.Series(
    rf_model.feature_importances_, index=range(X.shape[1])
).sort_values(ascending=False)

axes[0].barh(range(10), feature_importance_rf.values[:10])
axes[0].set_yticks(range(10))
axes[0].set_yticklabels([f'Feature {i}' for i in feature_importance_rf.index[:10]])
axes[0].set_title('Random Forest - Top 10 Features')
axes[0].set_xlabel('Importance')

# Logistic regression coefficients
lr_coef = pd.Series(lr_model.coef_[0], index=range(X.shape[1])).abs().sort_values(ascending=False)
axes[1].barh(range(10), lr_coef.values[:10])
axes[1].set_yticks(range(10))
axes[1].set_yticklabels([f'Feature {i}' for i in lr_coef.index[:10]])
axes[1].set_title('Logistic Regression - Top 10 Features (abs coef)')
axes[1].set_xlabel('Absolute Coefficient')

plt.tight_layout()
plt.show()

# Model comparison
results = pd.DataFrame({
    'Model': ['Logistic Regression', 'Decision Tree', 'Random Forest', 'Gradient Boosting'],
    'Accuracy': [
        accuracy_score(y_test, y_pred_lr),
        accuracy_score(y_test, y_pred_dt),
        accuracy_score(y_test, y_pred_rf),
        accuracy_score(y_test, y_pred_gb),
    ],
    'AUC-ROC': [
        roc_auc_score(y_test, y_proba_lr),
        roc_auc_score(y_test, y_proba_dt),
        roc_auc_score(y_test, y_proba_rf),
        roc_auc_score(y_test, y_proba_gb),
    ],
    'F1-Score': [
        f1_score(y_test, y_pred_lr),
        f1_score(y_test, y_pred_dt),
        f1_score(y_test, y_pred_rf),
        f1_score(y_test, y_pred_gb),
    ]
})

print("Model Comparison:")
print(results)

# Cross-validation
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42),
    X_train, y_train, cv=5, scoring='roc_auc'
)
print(f"\nCross-validation AUC scores: {cv_scores}")
print(f"Mean CV AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")

# Probability calibration
from sklearn.calibration import calibration_curve

prob_true, prob_pred = calibration_curve(y_test, y_proba_rf, n_bins=10)

plt.figure(figsize=(8, 6))
plt.plot(prob_pred, prob_true, 'o-', label='Random Forest')
plt.plot([0, 1], [0, 1], 'k--', label='Perfect Calibration')
plt.xlabel('Mean Predicted Probability')
plt.ylabel('Fraction of Positives')
plt.title('Calibration Curve')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

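Handling Class Imbalance

Oversampling: increase the number of minority-class samples
Undersampling: reduce the number of majority-class samples
SMOTE: Synthetic Minority Over-sampling Technique
Class weights: penalize misclassification of the minority class more heavily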
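
Two of these options can be sketched with scikit-learn alone: class weights via `class_weight="balanced"`, and random oversampling via `sklearn.utils.resample`. The synthetic 95/5 dataset and all variable names below are assumptions made for illustration. SMOTE is not shown because it lives in the separate imbalanced-learn package.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data: roughly 95% negatives, 5% positives.
X_imb, y_imb = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_imb, y_imb, test_size=0.25, stratify=y_imb, random_state=42
)

# Class weights: "balanced" weights each class inversely to its frequency.
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_tr
)
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)
recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
print("Class weights (neg, pos):", class_weights)
print(f"Minority recall, unweighted: {recall_plain:.3f}")
print(f"Minority recall, balanced:   {recall_weighted:.3f}")

# Random oversampling: resample minority rows (training split only)
# with replacement until both classes have the same count.
n_majority = int((y_tr == 0).sum())
minority_up = resample(
    X_tr[y_tr == 1], replace=True, n_samples=n_majority, random_state=42
)
X_bal = np.vstack([X_tr[y_tr == 0], minority_up])
y_bal = np.concatenate([
    np.zeros(n_majority, dtype=int), np.ones(n_majority, dtype=int)
])
print("Balanced training set positive rate:", y_bal.mean())
```

Resampling is applied to the training split only, so the test set keeps the original class distribution.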

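Threshold Selection

Default (0.5): appropriate when misclassification costs are equal
Custom threshold: based on business requirements
Optimal: the threshold that maximizes F1 or another threshold-dependent metric (AUC summarizes the whole ROC curve, so it compares models rather than selecting a threshold)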
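
One way to pick an F1-maximizing threshold is to scan the candidate thresholds returned by `precision_recall_curve`. The sketch below assumes a held-out validation split is used for the search (so the test set stays untouched); the dataset and variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split

X_thr, y_thr = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    weights=[0.9, 0.1], random_state=42
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_thr, y_thr, test_size=0.3, stratify=y_thr, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
proba_val = model.predict_proba(X_val)[:, 1]

# precision and recall have one more entry than thresholds; the final
# (precision=1, recall=0) point has no threshold, so it is dropped.
precision, recall, thresholds = precision_recall_curve(y_val, proba_val)
denom = precision[:-1] + recall[:-1]
f1_per_threshold = np.divide(
    2 * precision[:-1] * recall[:-1], denom,
    out=np.zeros_like(denom), where=denom > 0
)

best_idx = int(np.argmax(f1_per_threshold))
best_threshold = float(thresholds[best_idx])

# Compare against the default cut-off on the same validation data.
f1_default = f1_score(y_val, (proba_val >= 0.5).astype(int))
f1_tuned = f1_score(y_val, (proba_val >= best_threshold).astype(int))
print(f"Best threshold: {best_threshold:.3f}")
print(f"F1 at 0.5: {f1_default:.3f}, F1 at best threshold: {f1_tuned:.3f}")
```

When misclassification costs are known, the same scan can maximize an expected-cost function in place of F1.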

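Deliverables

Classification metrics (accuracy, precision, recall, F1)
Confusion matrices for all models
ROC curves and precision-recall curves
Feature importance analysis
Model comparison table
Recommendation for the best model
Probability calibration plot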

License: MIT (reproduced in full because the license is permissive)

Details

Author: aj-geddes
Repository: aj-geddes/useful-ai-prompts
License: MIT
Last updated: unknown


Source: https://github.com/aj-geddes/useful-ai-prompts / License: MIT
