
Correlation Analysis

Name: Correlation Analysis
Author: aj-geddes

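A skill that quantifies relationships between variables using correlation coefficients, correlation matrices, and association tests. It is useful for relationship analysis, multicollinearity detection, and other situations where you want to uncover connections within data.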

Original description

Measure relationships between variables using correlation coefficients, correlation matrices, and association tests for correlation measurement, relationship analysis, and multicollinearity detection

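SKILL.md body

Correlation Analysis

Overview

Correlation analysis measures the strength and direction of relationships between variables. It helps identify which features are related and detect multicollinearity.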

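When to Use

Identifying relationships between numeric variables
Detecting multicollinearity before regression modeling
Exploratory data analysis to understand feature dependencies
Feature selection and dimensionality reduction
Validating assumptions about relationships between variables
Comparing linear and non-linear associations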
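The feature-selection use case above can be sketched as a simple correlation filter. This is a minimal illustration, not part of the original skill: the helper name `drop_highly_correlated`, the 0.9 threshold, and the demo columns are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of each pair whose |Pearson r| exceeds `threshold`.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    corr = df.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


rng = np.random.default_rng(0)
a = rng.normal(size=300)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a * 2.0 + rng.normal(scale=0.01, size=300),  # near-duplicate of "a"
    "b": rng.normal(size=300),                              # independent feature
})

reduced = drop_highly_correlated(demo, threshold=0.9)
print(list(reduced.columns))
```

Which column of a correlated pair to keep is a modeling decision; this sketch simply keeps the one that appears first.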

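Types of Correlation

Pearson: linear correlation (continuous variables)
Spearman: rank-based correlation (ordinal data / monotonic non-linear relationships)
Kendall: rank correlation (a robust alternative)
Cramér's V: association between categorical variables
Mutual information: non-linear dependence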
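Cramér's V appears in the list above but not in the implementation section, so here is a hedged sketch of one common way to compute it from a chi-squared test on a contingency table. The function name `cramers_v` and the toy series are assumptions for illustration, and no bias correction is applied.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V (no bias correction) for two categorical series."""
    table = pd.crosstab(x, y)
    # correction=False disables Yates' continuity correction for 2x2 tables
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


color = pd.Series(["red", "red", "blue", "blue", "green", "green"] * 10)
# Fully determined by color, so the association should be perfect
size_same = color.map({"red": "S", "blue": "M", "green": "L"})
# Spread evenly across every color, so there should be no association
size_mixed = pd.Series(["S", "M"] * 30)

print(f"determined by color: V = {cramers_v(color, size_same):.3f}")
print(f"independent of color: V = {cramers_v(color, size_mixed):.3f}")
```

V ranges from 0 (no association) to 1 (perfect association) and, unlike Pearson's r, carries no sign.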

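Key Concepts

Correlation coefficient: ranges from -1 to +1
Positive correlation: the variables move together
Negative correlation: the variables move in opposite directions
Multicollinearity: high correlation among predictor variables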
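For reference, the standard textbook definitions behind two of these concepts can be written as follows. The notation ($x_i$, $y_i$, $\bar{x}$, $\bar{y}$, $R_j^2$) is introduced here and does not come from the original skill; $R_j^2$ denotes the coefficient of determination from regressing predictor $j$ on the remaining predictors.

```latex
% Pearson correlation coefficient for paired samples (x_i, y_i), i = 1..n
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{xy} \le 1

% Variance inflation factor for predictor j
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```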

Implementation in Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr, kendalltau

# Sample data
np.random.seed(42)
n = 200
age = np.random.uniform(20, 70, n)
income = age * 2000 + np.random.normal(0, 10000, n)
education_years = age / 2 + np.random.normal(0, 3, n)
satisfaction = income / 50000 + np.random.normal(0, 0.5, n)

df = pd.DataFrame({
    'age': age,
    'income': income,
    'education_years': education_years,
    'satisfaction': satisfaction,
    'years_employed': age - education_years - 6
})

# Pearson correlation (linear)
corr_matrix = df.corr(method='pearson')
print("Pearson Correlation Matrix:")
print(corr_matrix)

# Individual correlation with its p-value
corr_coef, p_value = pearsonr(df['age'], df['income'])
print(f"\nPearson correlation (age vs income): r={corr_coef:.4f}, p-value={p_value:.4f}")

# Spearman correlation (rank-based)
spearman_matrix = df.corr(method='spearman')
print("\nSpearman Correlation Matrix:")
print(spearman_matrix)

spearman_coef, p_value = spearmanr(df['age'], df['income'])
print(f"Spearman correlation (age vs income): rho={spearman_coef:.4f}, p-value={p_value:.4f}")

# Kendall tau correlation
kendall_coef, p_value = kendalltau(df['age'], df['income'])
print(f"Kendall correlation (age vs income): tau={kendall_coef:.4f}, p-value={p_value:.4f}")

# Correlation heatmaps
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Pearson heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0,
            square=True, ax=axes[0], vmin=-1, vmax=1)
axes[0].set_title('Pearson Correlation Heatmap')

# Spearman heatmap
sns.heatmap(spearman_matrix, annot=True, cmap='coolwarm', center=0,
            square=True, ax=axes[1], vmin=-1, vmax=1)
axes[1].set_title('Spearman Correlation Heatmap')

plt.tight_layout()
plt.show()

# Correlations with significance tests
def correlation_with_pvalue(df):
    rows = []
    for col1 in df.columns:
        for col2 in df.columns:
            if col1 < col2:  # avoid duplicates: visit each unordered pair once
                r, p = pearsonr(df[col1], df[col2])
                rows.append({
                    'Variable 1': col1,
                    'Variable 2': col2,
                    'Correlation': r,
                    'P-value': p,
                    'Significant': 'Yes' if p < 0.05 else 'No'
                })
    return pd.DataFrame(rows)

corr_table = correlation_with_pvalue(df)
print("\nCorrelation with P-values:")
print(corr_table)

# Scatter plots with regression lines
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

pairs = [('age', 'income'), ('age', 'education_years'),
         ('income', 'satisfaction'), ('education_years', 'years_employed')]

for idx, (var1, var2) in enumerate(pairs):
    ax = axes[idx // 2, idx % 2]
    ax.scatter(df[var1], df[var2], alpha=0.5)

    # Add the regression line
    z = np.polyfit(df[var1], df[var2], 1)
    p = np.poly1d(z)
    x_line = np.linspace(df[var1].min(), df[var1].max(), 100)
    ax.plot(x_line, p(x_line), "r--", linewidth=2)

    r, p_val = pearsonr(df[var1], df[var2])
    ax.set_title(f'{var1} vs {var2}\nr={r:.4f}, p={p_val:.4f}')
    ax.set_xlabel(var1)
    ax.set_ylabel(var2)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

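# Multicollinearity detection (VIF)
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Add an intercept column; without it the auxiliary regressions are uncentered
# and the VIF values can be misleading. years_employed is an exact linear
# combination of the other two columns, so very large (or infinite) VIFs are expected.
X = add_constant(df[['age', 'education_years', 'years_employed']])
vif_data = pd.DataFrame()
vif_data['Variable'] = X.columns[1:]
vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]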

print("\nVariance Inflation Factor (VIF):")
print(vif_data)
print("\nVIF > 10: High multicollinearity")
print("VIF > 5: Moderate multicollinearity")

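# Partial correlation (controlling for confounders)
def partial_correlation(df, x, y, control_vars):
    # Design matrix: intercept plus the control variables
    # (np.polyfit only accepts a 1-D x, so least squares is used instead)
    Z = np.column_stack([np.ones(len(df)), df[control_vars].values])

    # Residuals of x after regressing out the control variables
    beta_x = np.linalg.lstsq(Z, df[x].values, rcond=None)[0]
    x_residuals = df[x].values - Z @ beta_x

    # Residuals of y after regressing out the control variables
    beta_y = np.linalg.lstsq(Z, df[y].values, rcond=None)[0]
    y_residuals = df[y].values - Z @ beta_y

    return pearsonr(x_residuals, y_residuals)[0]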

partial_corr = partial_correlation(df, 'income', 'satisfaction', ['age'])
print(f"\nPartial correlation (income vs satisfaction, controlling for age): {partial_corr:.4f}")

# Distance correlation (non-linear relationships)
try:
    from dcor import distance_correlation
    dist_corr = distance_correlation(df['age'], df['income'])
    print(f"Distance correlation (age vs income): {dist_corr:.4f}")
except ImportError:
    print("dcor library not installed for distance correlation")

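# Stability of the correlation over time (the sample data has no time index,
# so the rolling window here runs over row order)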
fig, ax = plt.subplots(figsize=(12, 5))

rolling_corr = df['age'].rolling(window=50).corr(df['income'])
ax.plot(rolling_corr.index, rolling_corr.values)
ax.set_title('Rolling Correlation (age vs income, window=50)')
ax.set_ylabel('Correlation Coefficient')
ax.grid(True, alpha=0.3)
plt.show()

Interpretation Guidelines

|r| = 0.0-0.3: weak correlation
|r| = 0.3-0.7: moderate correlation
|r| = 0.7-1.0: strong correlation
p < 0.05: statistically significant
High VIF (>10): problematic multicollinearity
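The rule-of-thumb bands above can be turned into a small labeling helper. This is a sketch only: the function name `describe_correlation` is hypothetical, and because the listed ranges share their endpoints, assigning exactly 0.3 to "moderate" and exactly 0.7 to "strong" is an assumption made here.

```python
def describe_correlation(r: float, p_value: float, alpha: float = 0.05) -> str:
    """Label a correlation using the rule-of-thumb cut-offs (hypothetical helper)."""
    strength = abs(r)
    # Boundary handling is an assumption: 0.3 counts as moderate, 0.7 as strong
    if strength < 0.3:
        label = "weak"
    elif strength < 0.7:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
    significance = "significant" if p_value < alpha else "not significant"
    return f"{label} {direction} correlation ({significance} at alpha={alpha})"


print(describe_correlation(0.82, 0.001))
print(describe_correlation(-0.25, 0.20))
```

Strength and significance are reported separately on purpose: with a large sample, even a weak correlation can be statistically significant.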

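Important Notes

Correlation ≠ causation
Pearson can miss non-linear relationships
Outliers can distort correlations
Sample size affects significance
Temporal trends can create spurious correlations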
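Several of these caveats are easy to reproduce on synthetic data. The sketch below uses made-up arrays (all variable names and values are illustrative assumptions) to show a monotonic non-linear relationship, a U-shaped relationship, the effect of one outlier, and the effect of sample size on the p-value.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# 1) Monotonic but non-linear: ranks agree perfectly, the linear fit does not
x = np.arange(1, 21, dtype=float)
y_exp = np.exp(x / 2)
r_exp = pearsonr(x, y_exp)[0]
rho_exp = spearmanr(x, y_exp)[0]

# 2) U-shaped: y is fully determined by x, yet Pearson r is approximately 0
x_sym = np.linspace(-3, 3, 61)
r_u = pearsonr(x_sym, x_sym ** 2)[0]

# 3) One extreme point can manufacture a correlation between unrelated variables
rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = rng.normal(size=30)
r_clean = pearsonr(a, b)[0]
r_outlier = pearsonr(np.append(a, 20.0), np.append(b, 20.0))[0]


# 4) The same weak trend yields a much smaller p-value in a larger sample
def weak_trend_pvalue(n: int, seed: int = 1) -> float:
    g = np.random.default_rng(seed)
    u = g.normal(size=n)
    v = 0.2 * u + g.normal(size=n)
    return pearsonr(u, v)[1]


p_small = weak_trend_pvalue(20)
p_large = weak_trend_pvalue(2000)

print(f"exp curve: Pearson r={r_exp:.3f}, Spearman rho={rho_exp:.3f}")
print(f"U-shape:   Pearson r={r_u:.3f}")
print(f"outlier:   r without={r_clean:.3f}, r with={r_outlier:.3f}")
print(f"p-values:  n=20 -> {p_small:.3g}, n=2000 -> {p_large:.3g}")
```

Plotting the data before trusting a single coefficient is the usual safeguard against all four effects.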

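Visualization Strategies

Heatmaps for an overview
Scatter plots to inspect individual relationships
Pair plots for multivariate analysis
Rolling correlations for time-varying relationships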

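Deliverables

Correlation matrices (Pearson, Spearman)
Annotated correlation heatmaps
Statistical significance table
Scatter plots with regression lines
Multicollinearity assessment (VIF)
Partial correlation analysis
Relationship interpretation report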

License: MIT (quoted in full because it is a permissive license) · Original repository

Details

Author: aj-geddes
Repository: aj-geddes/useful-ai-prompts
License: MIT
Last updated: unknown


Source: https://github.com/aj-geddes/useful-ai-prompts / License: MIT
