
Statistical Hypothesis Testing

Name: Statistical Hypothesis Testing
Author: aj-geddes

Conduct statistical tests including t-tests, chi-square tests, ANOVA, and p-value analysis to validate hypotheses, check statistical significance, and evaluate A/B tests. Use it when you need to decide whether a difference in the data is significant, or to back up the reliability of experimental results statistically.

SKILL.md body

Statistical Hypothesis Testing

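Overview

Hypothesis testing provides a framework for data-driven decisions by testing whether an observed difference is statistically significant or could plausibly be due to chance.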

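Testing Framework

Null hypothesis (H0): no effect or difference exists
Alternative hypothesis (H1): an effect or difference exists
Significance level (α): threshold for rejecting H0 (typically 0.05)
P-value: probability of observing data at least as extreme as the observed data, assuming H0 is true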
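
To make the p-value definition concrete, here is a minimal sketch of a two-sided permutation test for a difference in means. The helper name `permutation_p_value`, the sample values, and the seed are illustrative assumptions and are not part of the original skill.

```python
import numpy as np

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration: under H0 the group labels are
    exchangeable, so we shuffle them and count how often the shuffled
    difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly above zero
    return (count + 1) / (n_permutations + 1)

alpha = 0.05
control = [85, 90, 88, 92, 87, 89, 91, 86, 88, 90]    # made-up values
treatment = [92, 95, 91, 98, 94, 96, 99, 93, 95, 97]  # made-up values
p = permutation_p_value(control, treatment)
print(f"p = {p:.4f}; reject H0 at alpha={alpha}: {p < alpha}")
```

The share of shuffles that match or exceed the observed difference is a direct estimate of "how surprising is this data if H0 holds", which is what the parametric tests approximate analytically.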

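Common Tests

T-test: compares the means of two groups
ANOVA: compares the means of multiple groups
Chi-square test: tests independence between categorical variables
Mann-Whitney U test: non-parametric alternative to the t-test
Kruskal-Wallis test: non-parametric alternative to ANOVA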

Python Implementation

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Sample data (seed fixed so the results are reproducible)
np.random.seed(42)
group_a = np.random.normal(100, 15, 50)  # Mean=100, SD=15
group_b = np.random.normal(105, 15, 50)  # Mean=105, SD=15

# Test 1: independent-samples t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"T-test: t={t_stat:.4f}, p-value={p_value:.4f}")
if p_value < 0.05:
    print("Reject null hypothesis: Groups are significantly different")
else:
    print("Fail to reject null hypothesis: No significant difference")

# Test 2: paired t-test (same subjects, two conditions)
before = np.array([85, 90, 88, 92, 87, 89, 91, 86, 88, 90])
after = np.array([92, 95, 91, 98, 94, 96, 99, 93, 95, 97])

t_stat, p_value = stats.ttest_rel(before, after)
print(f"\nPaired t-test: t={t_stat:.4f}, p-value={p_value:.4f}")

# Test 3: one-way ANOVA (multiple groups)
group1 = np.random.normal(100, 10, 30)
group2 = np.random.normal(105, 10, 30)
group3 = np.random.normal(102, 10, 30)

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"\nANOVA: F={f_stat:.4f}, p-value={p_value:.4f}")

# Test 4: chi-square test (categorical variables)
# Build the contingency table
contingency = np.array([
    [50, 30],  # Control: success, failure
    [45, 35]   # Treatment: success, failure
])

chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
print(f"\nChi-square: χ²={chi2:.4f}, p-value={p_value:.4f}")

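# Test 5: Mann-Whitney U test (non-parametric)
# alternative is passed explicitly because the default differs across SciPy versions
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')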
print(f"\nMann-Whitney U: U={u_stat:.4f}, p-value={p_value:.4f}")

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Compare distributions
axes[0, 0].hist(group_a, alpha=0.5, label='Group A', bins=20)
axes[0, 0].hist(group_b, alpha=0.5, label='Group B', bins=20)
axes[0, 0].set_title('Group Distributions')
axes[0, 0].legend()

# Q-Q plot for normality
stats.probplot(group_a, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot (Group A)')

# Before/after comparison
axes[1, 0].plot(before, 'o-', label='Before', alpha=0.7)
axes[1, 0].plot(after, 's-', label='After', alpha=0.7)
axes[1, 0].set_title('Paired Comparison')
axes[1, 0].legend()

# Effect size (Cohen's d)
cohens_d = (np.mean(group_a) - np.mean(group_b)) / np.sqrt(
    ((len(group_a)-1)*np.var(group_a, ddof=1) +
     (len(group_b)-1)*np.var(group_b, ddof=1)) /
    (len(group_a) + len(group_b) - 2)
)
axes[1, 1].text(0.5, 0.5, f"Cohen's d = {cohens_d:.4f}",
                ha='center', va='center', fontsize=14)
axes[1, 1].axis('off')

plt.tight_layout()
plt.show()

# Normality test (Shapiro-Wilk)
stat, p = stats.shapiro(group_a)
print(f"\nShapiro-Wilk normality test: W={stat:.4f}, p-value={p:.4f}")

# Effect size calculation
def calculate_effect_size(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    cohens_d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return cohens_d

effect_size = calculate_effect_size(group_a, group_b)
print(f"Effect size (Cohen's d): {effect_size:.4f}")

# Confidence interval
from scipy.stats import t as t_dist

def calculate_ci(data, confidence=0.95):
    n = len(data)
    mean = np.mean(data)
    se = np.std(data, ddof=1) / np.sqrt(n)
    margin = t_dist.ppf((1 + confidence) / 2, n - 1) * se
    return mean - margin, mean + margin

ci = calculate_ci(group_a)
print(f"95% CI for Group A: ({ci[0]:.2f}, {ci[1]:.2f})")

# Additional tests and analyses

# Test 6: Levene's test (equality of variances)
stat_levene, p_levene = stats.levene(group_a, group_b)
print("\nLevene's Test for Equal Variance:")
print(f"Statistic: {stat_levene:.4f}, P-value: {p_levene:.4f}")

# Test 7: Welch's t-test (does not assume equal variances)
t_stat_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print("\nWelch's t-test (unequal variance):")
print(f"t-stat: {t_stat_welch:.4f}, p-value: {p_welch:.4f}")

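# Power analysis (two-sided, two-sample t-test with equal group sizes)
def calculate_power(effect_size, sample_size, alpha=0.05):
    df = 2 * sample_size - 2
    t_critical = stats.t.ppf(1 - alpha/2, df)
    ncp = effect_size * np.sqrt(sample_size / 2)  # noncentrality parameter
    # Probability that |T| exceeds the critical value under H1 (both tails)
    power = (1 - stats.nct.cdf(t_critical, df, ncp)) + stats.nct.cdf(-t_critical, df, ncp)
    return power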

power = calculate_power(abs(effect_size), len(group_a))
print(f"\nStatistical Power: {power:.2%}")

# Bootstrap confidence interval
def bootstrap_ci(data, n_bootstrap=10000, ci=95):
    bootstrap_means = []
    for _ in range(n_bootstrap):
        sample = np.random.choice(data, size=len(data), replace=True)
        bootstrap_means.append(np.mean(sample))
    lower = np.percentile(bootstrap_means, (100-ci)/2)
    upper = np.percentile(bootstrap_means, ci + (100-ci)/2)
    return lower, upper

boot_ci = bootstrap_ci(group_a)
print(f"\nBootstrap 95% CI for Group A: ({boot_ci[0]:.2f}, {boot_ci[1]:.2f})")

# Multiple testing correction (Bonferroni)
num_tests = 4
bonferroni_alpha = 0.05 / num_tests
print(f"\nBonferroni Corrected Alpha: {bonferroni_alpha:.4f}")
print(f"Use this threshold for {num_tests} tests")

# Test 8: Kruskal-Wallis test (non-parametric ANOVA)
h_stat, p_kw = stats.kruskal(group1, group2, group3)
print("\nKruskal-Wallis Test (non-parametric ANOVA):")
print(f"H-statistic: {h_stat:.4f}, p-value: {p_kw:.4f}")

# Effect size for ANOVA
f_stat, p_anova = stats.f_oneway(group1, group2, group3)
# Compute eta-squared = SS_between / SS_total
groups = [group1, group2, group3]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()  # also correct when group sizes differ
ss_between = sum(len(g) * (np.mean(g) - grand_mean)**2 for g in groups)
ss_total = np.sum((all_values - grand_mean)**2)
eta_squared = ss_between / ss_total
print(f"\nEffect Size (Eta-squared): {eta_squared:.4f}")

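Interpretation Guidelines

p < 0.05: statistically significant (reject H0)
p ≥ 0.05: not statistically significant (fail to reject H0)
Effect size: magnitude of the difference (small/medium/large)
Confidence interval: range of plausible parameter values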
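
As a rough sketch, the guidelines above can be folded into one helper that reports the test decision together with an effect-size label. The function names are hypothetical, and the 0.2 / 0.5 / 0.8 cut-offs are Cohen's conventional benchmarks for d, assumed here because the source only says small/medium/large.

```python
def label_cohens_d(d):
    """Label |d| using Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"

def interpret(p_value, d, alpha=0.05):
    """Combine the significance decision with the effect-size label."""
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return (f"p={p_value:.4f} ({decision} at alpha={alpha}); "
            f"|d|={abs(d):.2f} ({label_cohens_d(d)} effect)")

print(interpret(0.003, -0.65))
print(interpret(0.20, 0.10))
```

Reporting both pieces together guards against reading a small p-value as a large effect: with big samples, a negligible difference can still be statistically significant.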

Assumptions Checklist

Independence of observations
Normality of distributions (parametric tests)
Homogeneity of variance
Adequate sample size
Random sampling
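
One way to act on the normality and equal-variance items is to check them in code before choosing a two-sample test. The sketch below is an assumption about how that could be wired together, not the skill's prescribed workflow; `compare_two_groups` is a hypothetical helper. Independence and random sampling cannot be tested this way and still have to be argued from the study design.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick a two-sample test based on simple assumption checks.

    Hypothetical helper: Shapiro-Wilk for normality, Levene for equal
    variances, then Student's t, Welch's t, or Mann-Whitney U.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    if p_norm_a <= alpha or p_norm_b <= alpha:
        # Normality rejected for at least one group: use the rank-based test
        stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", stat, p
    _, p_levene = stats.levene(a, b)
    equal_var = p_levene > alpha
    stat, p = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, stat, p

rng = np.random.default_rng(0)
pairs = [
    (rng.normal(100, 15, 50), rng.normal(105, 15, 50)),   # roughly normal
    (rng.exponential(1.0, 50), rng.exponential(1.5, 50)),  # right-skewed
]
for a, b in pairs:
    name, stat, p = compare_two_groups(a, b)
    print(f"{name}: statistic={stat:.4f}, p-value={p:.4f}")
```

Choosing a test from the same data that is then tested is a convenience heuristic; a common alternative is to default to Welch's t-test and use the checks only as diagnostics.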

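Common Pitfalls

Misinterpreting p-values
Multiple testing without correction
Ignoring effect sizes
Violating test assumptions
Confusing correlation with causation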
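
The multiple-testing pitfall is easy to see in a small simulation. The sketch below runs many comparisons where H0 is true by construction and counts how often at least one comes out "significant", with and without a Bonferroni threshold. The function name, the number of tests, and the simulation sizes are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def familywise_error_rate(num_tests=20, n_experiments=500, n=30, alpha=0.05, seed=0):
    """Simulate how often at least one of `num_tests` null comparisons rejects H0.

    Both groups in every comparison come from the same distribution, so any
    rejection is a false positive. Returns (uncorrected_rate, bonferroni_rate).
    """
    rng = np.random.default_rng(seed)
    # Shape: (experiments, tests, samples per group)
    a = rng.normal(0, 1, size=(n_experiments, num_tests, n))
    b = rng.normal(0, 1, size=(n_experiments, num_tests, n))
    _, p_values = stats.ttest_ind(a, b, axis=2)
    uncorrected = np.mean((p_values < alpha).any(axis=1))
    bonferroni = np.mean((p_values < alpha / num_tests).any(axis=1))
    return uncorrected, bonferroni

alpha, num_tests = 0.05, 20
uncorrected, bonferroni = familywise_error_rate(num_tests=num_tests, alpha=alpha)
theoretical = 1 - (1 - alpha) ** num_tests  # for independent tests
print(f"Theoretical rate without correction: {theoretical:.3f}")
print(f"Simulated rate without correction:   {uncorrected:.3f}")
print(f"Simulated rate with Bonferroni:      {bonferroni:.3f}")
```

With 20 independent tests at α = 0.05, the chance of at least one false positive is about 64%, which is why the per-test threshold has to shrink as the number of tests grows.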

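Deliverables

Test results with p-values and test statistics
Effect size calculations
Distribution visualizations
Confidence intervals
Interpretation and business implications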
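
The numeric deliverables can be collected in one place so they are always reported together. This is a minimal sketch under stated assumptions: `two_group_report` and its dictionary layout are hypothetical, Welch's t-test is chosen as the default, and the sample data is simulated.

```python
import numpy as np
from scipy import stats

def two_group_report(a, b, alpha=0.05):
    """Bundle test statistic, p-value, effect size, and CI for two groups."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    v1, v2 = a.var(ddof=1), b.var(ddof=1)
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test

    # Cohen's d with the pooled standard deviation
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    diff = a.mean() - b.mean()
    d = diff / pooled_sd

    # CI for the mean difference using Welch-Satterthwaite degrees of freedom
    se = np.sqrt(v1 / n1 + v2 / n2)
    df = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    margin = stats.t.ppf(1 - alpha / 2, df) * se
    return {
        "test": "Welch's t-test",
        "statistic": float(t_stat),
        "p_value": float(p_value),
        "cohens_d": float(d),
        "mean_difference": float(diff),
        "ci": (float(diff - margin), float(diff + margin)),
        "significant": bool(p_value < alpha),
    }

rng = np.random.default_rng(1)
report = two_group_report(rng.normal(100, 15, 50), rng.normal(105, 15, 50))
for key, value in report.items():
    print(f"{key}: {value}")
```

The interpretation and business implications still need to be written by a person; the dictionary only makes sure the supporting numbers are complete and consistent with each other.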

License: MIT (quoted in full because the license is permissive)

Details

Author: aj-geddes
Repository: aj-geddes/useful-ai-prompts
License: MIT
Last updated: unknown


Source: https://github.com/aj-geddes/useful-ai-prompts / License: MIT
