
agent-governance

Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:

- Building AI agents that call external tools (APIs, databases, file systems)
- Implementing policy-based access controls for agent tool usage
- Adding semantic intent classification to detect dangerous prompts
- Creating trust scoring systems for multi-agent workflows
- Building audit trails for agent actions and decisions
- Enforcing rate limits, content filters, or tool restrictions on agents
- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen)

SKILL.md contents

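Agent Governance Patterns

Patterns for adding safety, trust, and policy enforcement to AI agent systems.

Overview

Governance patterns ensure that AI agents operate within defined boundaries. They control which tools an agent can call, what content it can process, and how many calls it can make, and they maintain accountability through an audit trail.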

User request → Intent classification → Policy check → Tool execution → Audit log
                         ↓                   ↓               ↓
                 Threat detection       Allow/deny    Trust score update
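
The flow above can be sketched as a single request handler. This is a minimal, self-contained illustration: `handle_request`, `TOOLS`, `ALLOWED_TOOLS`, and the single threat pattern are hypothetical stand-ins, not part of the skill's API.

```python
import re
import time

# Hypothetical stand-ins for illustration only.
THREAT_PATTERN = re.compile(r"ignore\s+(previous|all)\s+instructions", re.IGNORECASE)
ALLOWED_TOOLS = {"search"}
TOOLS = {"search": lambda query: f"Results for: {query}"}

audit_log: list[dict] = []
trust = {"score": 0.5}

def handle_request(prompt: str, tool_name: str) -> str:
    """Run one request through the stages: intent -> policy -> tool -> audit."""
    def audit(action: str) -> None:
        audit_log.append({"tool": tool_name, "action": action, "timestamp": time.time()})

    # 1. Intent classification (threat detection) happens before anything runs.
    if THREAT_PATTERN.search(prompt):
        audit("denied:intent")
        raise PermissionError("Threat detected in prompt")

    # 2. Policy check (allow or deny).
    if tool_name not in ALLOWED_TOOLS:
        audit("denied:policy")
        raise PermissionError(f"Tool '{tool_name}' is not allowed")

    # 3. Tool execution, followed by a trust score update.
    result = TOOLS[tool_name](prompt)
    trust["score"] = min(1.0, trust["score"] + 0.05 * (1 - trust["score"]))

    # 4. Audit log.
    audit("allowed")
    return result

print(handle_request("latest quarterly report", "search"))
```

Each stage can stop the request on its own, and every outcome, including a denial, leaves an audit entry.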

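When to Use

Agents with tool access: any agent that calls external tools (APIs, databases, shell commands)
Multi-agent systems: agents that delegate to other agents need trust boundaries
Production deployments: compliance, audit, and safety requirements
Sensitive operations: financial transactions, data access, infrastructure management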

Pattern 1: Governance Policy

Define what an agent is allowed to do as a composable, serializable policy object.

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import re

class PolicyAction(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REVIEW = "review"  # flag for human review

@dataclass
class GovernancePolicy:
    """Declarative policy controlling agent behavior."""
    name: str
    allowed_tools: list[str] = field(default_factory=list)       # allowlist
    blocked_tools: list[str] = field(default_factory=list)       # blocklist
    blocked_patterns: list[str] = field(default_factory=list)    # content filters
    max_calls_per_request: int = 100                             # rate limit
    require_human_approval: list[str] = field(default_factory=list)  # tools needing approval

    def check_tool(self, tool_name: str) -> PolicyAction:
        """Check if a tool is allowed by this policy."""
        if tool_name in self.blocked_tools:
            return PolicyAction.DENY
        if tool_name in self.require_human_approval:
            return PolicyAction.REVIEW
        if self.allowed_tools and tool_name not in self.allowed_tools:
            return PolicyAction.DENY
        return PolicyAction.ALLOW

    def check_content(self, content: str) -> Optional[str]:
        """Check content against blocked patterns. Returns matched pattern or None."""
        for pattern in self.blocked_patterns:
            if re.search(pattern, content, re.IGNORECASE):
                return pattern
        return None

Policy Composition

Combine multiple policies (for example, organization-wide + team + agent-specific):

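def compose_policies(*policies: GovernancePolicy) -> GovernancePolicy:
    """Merge policies with most-restrictive-wins semantics."""
    combined = GovernancePolicy(name="composed")
    allowlist: Optional[list[str]] = None  # None until some policy supplies an allowlist

    for policy in policies:
        combined.blocked_tools.extend(policy.blocked_tools)
        combined.blocked_patterns.extend(policy.blocked_patterns)
        combined.require_human_approval.extend(policy.require_human_approval)
        combined.max_calls_per_request = min(
            combined.max_calls_per_request,
            policy.max_calls_per_request
        )
        if policy.allowed_tools:
            if allowlist is None:
                allowlist = list(policy.allowed_tools)
            else:
                # Intersect allowlists: a tool must be allowed by every layer
                allowlist = [t for t in allowlist if t in policy.allowed_tools]

    if allowlist is not None:
        if not allowlist:
            # check_tool treats an empty allowed_tools list as "no allowlist",
            # so an empty intersection must not silently allow every tool
            raise ValueError("Composed allowlists have no tools in common")
        combined.allowed_tools = allowlist

    return combined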


# Usage: layer policies from broad to specific
org_policy = GovernancePolicy(
    name="org-wide",
    blocked_tools=["shell_exec", "delete_database"],
    blocked_patterns=[r"(?i)(api[_-]?key|secret|password)\s*[:=]"],
    max_calls_per_request=50
)
team_policy = GovernancePolicy(
    name="data-team",
    allowed_tools=["query_db", "read_file", "write_report"],
    require_human_approval=["write_report"]
)
agent_policy = compose_policies(org_policy, team_policy)

Policies as YAML

Store policies as configuration rather than code:

# governance-policy.yaml
name: production-agent
allowed_tools:
  - search_documents
  - query_database
  - send_email
blocked_tools:
  - shell_exec
  - delete_record
blocked_patterns:
  - "(?i)(api[_-]?key|secret|password)\\s*[:=]"
  - "(?i)(drop|truncate|delete from)\\s+\\w+"
max_calls_per_request: 25
require_human_approval:
  - send_email

import yaml

def load_policy(path: str) -> GovernancePolicy:
    with open(path) as f:
        data = yaml.safe_load(f)
    return GovernancePolicy(**data)
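
`GovernancePolicy(**data)` raises a bare `TypeError` on an unknown key and accepts an invalid regular expression until it is first used, so it can help to validate the parsed data before constructing the policy. A minimal sketch, assuming the field names of the `GovernancePolicy` dataclass above; `validate_policy_data` is a hypothetical helper, not part of the skill.

```python
import re

# Field names mirror the GovernancePolicy dataclass shown earlier.
LIST_FIELDS = {"allowed_tools", "blocked_tools", "blocked_patterns", "require_human_approval"}
KNOWN_FIELDS = LIST_FIELDS | {"name", "max_calls_per_request"}

def validate_policy_data(data: dict) -> list[str]:
    """Return a list of problems found in parsed policy data (empty if valid)."""
    problems = []
    if not isinstance(data.get("name"), str):
        problems.append("name: required string")
    for key in sorted(set(data) - KNOWN_FIELDS, key=str):
        problems.append(f"{key}: unknown field")
    for key in sorted(LIST_FIELDS & set(data)):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key}: must be a list of strings")
    limit = data.get("max_calls_per_request", 1)
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        problems.append("max_calls_per_request: must be a positive integer")
    patterns = data.get("blocked_patterns", [])
    for pattern in (patterns if isinstance(patterns, list) else []):
        if isinstance(pattern, str):
            try:
                re.compile(pattern)  # surface bad regexes at load time
            except re.error as exc:
                problems.append(f"blocked_patterns: invalid regex {pattern!r} ({exc})")
    return problems
```

Calling this inside `load_policy` and refusing to start when the list is non-empty keeps a typo in the YAML from quietly weakening a policy.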

Pattern 2: Semantic Intent Classification

Use pattern-based signals to detect dangerous intent before a prompt reaches the agent.

import re
from dataclasses import dataclass

@dataclass
class IntentSignal:
    category: str       # e.g., "data_exfiltration", "privilege_escalation"
    confidence: float   # 0.0 to 1.0
    evidence: str       # what triggered the detection

# Weighted signal patterns for threat detection
THREAT_SIGNALS = [
    # Data exfiltration
    (r"(?i)send\s+(all|every|entire)\s+\w+\s+to\s+", "data_exfiltration", 0.8),
    (r"(?i)export\s+.*\s+to\s+(external|outside|third.?party)", "data_exfiltration", 0.9),
    (r"(?i)curl\s+.*\s+-d\s+", "data_exfiltration", 0.7),

    # Privilege escalation
    (r"(?i)(sudo|as\s+root|admin\s+access)", "privilege_escalation", 0.8),
    (r"(?i)chmod\s+777", "privilege_escalation", 0.9),

    # System destruction
    (r"(?i)(rm\s+-rf|del\s+/[sq]|format\s+c:)", "system_destruction", 0.95),
    (r"(?i)(drop\s+database|truncate\s+table)", "system_destruction", 0.9),

    # Prompt injection
    (r"(?i)ignore\s+(previous|above|all)\s+(instructions?|rules?)", "prompt_injection", 0.9),
    (r"(?i)you\s+are\s+now\s+(a|an)\s+", "prompt_injection", 0.7),
]

def classify_intent(content: str) -> list[IntentSignal]:
    """Classify content for threat signals."""
    signals = []
    for pattern, category, weight in THREAT_SIGNALS:
        match = re.search(pattern, content)
        if match:
            signals.append(IntentSignal(
                category=category,
                confidence=weight,
                evidence=match.group()
            ))
    return signals

def is_safe(content: str, threshold: float = 0.7) -> bool:
    """Quick check: True if no signal reaches the confidence threshold."""
    signals = classify_intent(content)
    return not any(s.confidence >= threshold for s in signals)

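Key insight: intent classification runs before tool execution and acts as a pre-flight safety check. This is fundamentally different from output guardrails, which check only after generation.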
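
When several patterns fire on the same prompt, it can be useful to collapse the signals into one score per category before deciding. A minimal sketch, assuming signals shaped like `IntentSignal` above; `summarize_signals`, the sample data, and the 0.85 cutoff are hypothetical, and taking the maximum is only one possible aggregation.

```python
from dataclasses import dataclass

@dataclass
class IntentSignal:  # same shape as the IntentSignal defined above
    category: str
    confidence: float
    evidence: str

def summarize_signals(signals: list[IntentSignal]) -> dict[str, float]:
    """Keep the highest confidence seen for each threat category."""
    summary: dict[str, float] = {}
    for signal in signals:
        summary[signal.category] = max(summary.get(signal.category, 0.0), signal.confidence)
    return summary

signals = [
    IntentSignal("prompt_injection", 0.9, "ignore previous instructions"),
    IntentSignal("prompt_injection", 0.7, "you are now a "),
    IntentSignal("privilege_escalation", 0.8, "sudo"),
]
summary = summarize_signals(signals)
blocked = sorted(c for c, score in summary.items() if score >= 0.85)
print(summary)  # {'prompt_injection': 0.9, 'privilege_escalation': 0.8}
print(blocked)  # ['prompt_injection']
```

A per-category view also makes it possible to apply different thresholds to different threat types.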

Pattern 3: Tool-Level Governance Decorator

Wrap individual tool functions with governance checks:

import functools
import time
from collections import defaultdict

# Calls counted per policy name. The counter is process-wide, so call
# reset_call_count() at the start of each request to get a per-request limit.
_call_counters: dict[str, int] = defaultdict(int)

def reset_call_count(policy: GovernancePolicy) -> None:
    """Reset the call counter for a policy at the start of a request."""
    _call_counters[policy.name] = 0

def govern(policy: GovernancePolicy, audit_trail=None):
    """Decorator that applies a governance policy to an async tool function."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            tool_name = func.__name__

            def deny(reason: str) -> PermissionError:
                """Record the denial in the audit trail and build the error to raise."""
                if audit_trail is not None:
                    audit_trail.append({
                        "tool": tool_name,
                        "action": "denied",
                        "reason": reason,
                        "timestamp": time.time()
                    })
                return PermissionError(reason)

            # 1. Check the tool allowlist/blocklist
            action = policy.check_tool(tool_name)
            if action == PolicyAction.DENY:
                raise deny(f"Policy '{policy.name}' blocks tool '{tool_name}'")
            if action == PolicyAction.REVIEW:
                raise deny(f"Tool '{tool_name}' requires human approval")

            # 2. Check the rate limit
            _call_counters[policy.name] += 1
            if _call_counters[policy.name] > policy.max_calls_per_request:
                raise deny(f"Rate limit exceeded: {policy.max_calls_per_request} calls")

            # 3. Check string arguments against the content filters
            for arg in list(args) + list(kwargs.values()):
                if isinstance(arg, str):
                    matched = policy.check_content(arg)
                    if matched:
                        raise deny(f"Blocked pattern detected: {matched}")

            # 4. Execute and audit
            start = time.monotonic()
            try:
                result = await func(*args, **kwargs)
                if audit_trail is not None:
                    audit_trail.append({
                        "tool": tool_name,
                        "action": "allowed",
                        "duration_ms": (time.monotonic() - start) * 1000,
                        "timestamp": time.time()
                    })
                return result
            except Exception as e:
                if audit_trail is not None:
                    audit_trail.append({
                        "tool": tool_name,
                        "action": "error",
                        "error": str(e),
                        "timestamp": time.time()
                    })
                raise

        return wrapper
    return decorator


# Usage with any agent framework
audit_log = []
policy = GovernancePolicy(
    name="search-agent",
    allowed_tools=["search", "summarize"],
    blocked_patterns=[r"(?i)password"],
    max_calls_per_request=10
)

@govern(policy, audit_trail=audit_log)
async def search(query: str) -> str:
    """Search documents (governed by policy)."""
    return f"Results for: {query}"

# Allowed: await search("latest quarterly report")
# Blocked: await search("show me the admin password")
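
The name `max_calls_per_request` implies a budget that starts fresh for every request. One way to scope a counter to a request, including under asyncio, is the standard-library `contextvars` module. A minimal sketch; `CallBudget`, `start_request`, and `charge` are hypothetical names, not part of the skill.

```python
import contextvars
from dataclasses import dataclass

@dataclass
class CallBudget:
    limit: int
    used: int = 0

# A task that calls start_request() gets its own budget; tasks it spawns
# afterwards inherit the same CallBudget object and draw from it together.
_budget = contextvars.ContextVar("call_budget")

def start_request(limit: int) -> None:
    """Begin a new request with a fresh call budget."""
    _budget.set(CallBudget(limit=limit))

def charge() -> None:
    """Count one tool call; raise once the request's budget is exhausted."""
    budget = _budget.get(None)
    if budget is None:
        # Fail closed: a call made outside a tracked request is refused.
        raise PermissionError("No active request budget")
    budget.used += 1
    if budget.used > budget.limit:
        raise PermissionError(f"Rate limit exceeded: {budget.limit} calls")
```

A request handler would call `start_request(policy.max_calls_per_request)` once on entry, and the governance wrapper would call `charge()` before each tool call.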

Pattern 4: Trust Scoring

Track an agent's reliability over time with a decay-based trust score:

from dataclasses import dataclass, field
import math
import time

@dataclass
class TrustScore:
    """Trust score with time decay."""
    score: float = 0.5          # 0.0 (untrusted) to 1.0 (fully trusted)
    successes: int = 0
    failures: int = 0
    last_updated: float = field(default_factory=time.time)

    def record_success(self, reward: float = 0.05):
        self.successes += 1
        self.score = min(1.0, self.score + reward * (1 - self.score))
        self.last_updated = time.time()

    def record_failure(self, penalty: float = 0.15):
        self.failures += 1
        self.score = max(0.0, self.score - penalty * self.score)
        self.last_updated = time.time()

    def current(self, decay_rate: float = 0.001) -> float:
        """Return the score with time decay applied: trust drops during inactivity."""
        elapsed = time.time() - self.last_updated
        decay = math.exp(-decay_rate * elapsed)
        return self.score * decay

    @property
    def reliability(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total > 0 else 0.0


# Usage in a multi-agent system
trust = TrustScore()

# The agent completes tasks successfully
trust.record_success()  # 0.525
trust.record_success()  # 0.549

# The agent produces an error
trust.record_failure()  # 0.466

# Gate sensitive operations on trust
if trust.current() >= 0.7:
    # Allow autonomous operation
    pass
elif trust.current() >= 0.4:
    # Allow with human oversight
    pass
else:
    # Deny, or require explicit approval
    pass
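
Because `current()` applies exponential decay to elapsed seconds, the default `decay_rate` of 0.001 halves a score after ln(2)/0.001 ≈ 693 seconds of inactivity (about 11.6 minutes). The sketch below works that out and derives a rate for a chosen half-life; both helper functions are hypothetical, not part of the skill.

```python
import math

def half_life_seconds(decay_rate: float) -> float:
    """Idle time after which exp(-decay_rate * t) equals 0.5."""
    return math.log(2) / decay_rate

def decay_rate_for_half_life(seconds: float) -> float:
    """Decay rate that halves a score after the given idle time."""
    return math.log(2) / seconds

print(round(half_life_seconds(0.001)))  # 693

# Rate for a score that should halve after one idle day
one_day = 24 * 3600
rate = decay_rate_for_half_life(one_day)
print(math.isclose(math.exp(-rate * one_day), 0.5))  # True
```

Choosing the half-life first and deriving the rate from it tends to be easier to reason about than picking a rate directly.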

Multi-agent trust: in systems where agents delegate to other agents, each agent maintains trust scores for the agents it delegates to:

class AgentTrustRegistry:
    def __init__(self):
        self.scores: dict[str, TrustScore] = {}

    def get_trust(self, agent_id: str) -> TrustScore:
        if agent_id not in self.scores:
            self.scores[agent_id] = TrustScore()
        return self.scores[agent_id]

    def most_trusted(self, agents: list[str]) -> str:
        return max(agents, key=lambda a: self.get_trust(a).current())

    def meets_threshold(self, agent_id: str, threshold: float) -> bool:
        return self.get_trust(agent_id).current() >= threshold

Pattern 5: Audit Trail

An append-only audit log of every agent action, essential for compliance and debugging:

from dataclasses import dataclass, field
import json
import time

@dataclass
class AuditEntry:
    timestamp: float
    agent_id: str
    tool_name: str
    action: str           # "allowed", "denied", "error"
    policy_name: str
    details: dict = field(default_factory=dict)

class AuditTrail:
    """Append-only audit trail for agent governance events."""
    def __init__(self):
        self._entries: list[AuditEntry] = []

    def log(self, agent_id: str, tool_name: str, action: str,
            policy_name: str, **details):
        self._entries.append(AuditEntry(
            timestamp=time.time(),
            agent_id=agent_id,
            tool_name=tool_name,
            action=action,
            policy_name=policy_name,
            details=details
        ))

    def denied(self) -> list[AuditEntry]:
        """Return all denied actions (useful for security review)."""
        return [e for e in self._entries if e.action == "denied"]

    def by_agent(self, agent_id: str) -> list[AuditEntry]:
        return [e for e in self._entries if e.agent_id == agent_id]

    def export_jsonl(self, path: str):
        """Export as JSON Lines for log aggregation systems."""
        with open(path, "w") as f:
            for entry in self._entries:
                f.write(json.dumps({
                    "timestamp": entry.timestamp,
                    "agent_id": entry.agent_id,
                    "tool": entry.tool_name,
                    "action": entry.action,
                    "policy": entry.policy_name,
                    **entry.details
                }) + "\n")
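
Once exported, the JSON Lines file can be inspected with a few lines of standard-library code. A minimal sketch that counts denied actions per agent, assuming the key names written by `export_jsonl` above; `denials_by_agent` is a hypothetical helper.

```python
import json
from collections import Counter

def denials_by_agent(path: str) -> Counter:
    """Count "denied" entries per agent_id in an exported JSON Lines file."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry.get("action") == "denied":
                counts[entry.get("agent_id", "unknown")] += 1
    return counts
```

For example, `denials_by_agent("audit.jsonl").most_common(5)` lists the agents with the most denied actions, where `audit.jsonl` is a placeholder path.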

Pattern 6: Framework Integration

PydanticAI

from pydantic_ai import Agent

policy = GovernancePolicy(
    name="support-bot",
    allowed_tools=["search_docs", "create_ticket"],
    blocked_patterns=[r"(?i)(ssn|social\s+security|credit\s+card)"],
    max_calls_per_request=20
)

agent = Agent("openai:gpt-4o", system_prompt="You are a support assistant.")

# kb and tickets stand in for application-specific clients that are not defined here
@agent.tool
@govern(policy)
async def search_docs(ctx, query: str) -> str:
    """Search the knowledge base (governed by policy)."""
    return await kb.search(query)

@agent.tool
@govern(policy)
async def create_ticket(ctx, title: str, body: str) -> str:
    """Create a support ticket (governed by policy)."""
    return await tickets.create(title=title, body=body)

CrewAI

from crewai import Agent, Task, Crew

policy = GovernancePolicy(
    name="research-crew",
    allowed_tools=["search", "analyze"],
    max_calls_per_request=30
)

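# Apply governance at the crew level
def governed_crew_run(crew: Crew, policy: GovernancePolicy):
    """Wrap a crew run with governance checks."""
    # govern() records entries with .append(), so a plain list serves as the trail
    audit: list[dict] = []
    for agent in crew.agents:
        for tool in agent.tools:
            # Assumes each tool exposes an async callable as tool.func
            original = tool.func
            tool.func = govern(policy, audit_trail=audit)(original)
    result = crew.kickoff()
    return result, audit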

OpenAI Agents SDK

from agents import Agent, function_tool

policy = GovernancePolicy(
    name="coding-agent",
    allowed_tools=["read_file", "write_file", "run_tests"],
    blocked_tools=["shell_exec"],
    max_calls_per_request=50
)

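@function_tool
@govern(policy)
async def read_file(path: str) -> str:
    """Read file contents (governed by policy)."""
    import os
    base = os.path.realpath(".")
    safe_path = os.path.realpath(path)
    # Compare whole path components; a plain prefix test would also accept
    # sibling directories whose names merely start with the base path
    if os.path.commonpath([base, safe_path]) != base:
        raise ValueError("Path traversal blocked by governance")
    with open(safe_path) as f:
        return f.read()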

Governance Levels

Match governance strictness to the risk level:

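Level	Controls	Use case
Open	Audit only, no restrictions	Internal development/testing
Standard	Tool allowlist + content filters	General production agents
Strict	All controls + human approval for sensitive operations	Finance, healthcare, legal
Locked	Allowlist only, no dynamic tools, full audit	Compliance-critical systems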
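
One way to make the levels concrete is to encode each one as a set of switches that the enforcement layer reads. A minimal sketch; `GovernanceLevel`, `LevelControls`, and `controls_for` are hypothetical names, and where the table is silent (for example, whether Locked also applies content filters) the value chosen here is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

class GovernanceLevel(Enum):
    OPEN = "open"
    STANDARD = "standard"
    STRICT = "strict"
    LOCKED = "locked"

@dataclass(frozen=True)
class LevelControls:
    audit: bool
    tool_allowlist: bool
    content_filters: bool
    human_approval: bool
    dynamic_tools: bool  # may tools be registered at runtime?

CONTROLS = {
    # Open: audit only, no restrictions.
    GovernanceLevel.OPEN: LevelControls(
        audit=True, tool_allowlist=False, content_filters=False,
        human_approval=False, dynamic_tools=True),
    # Standard: tool allowlist plus content filters (audit assumed on).
    GovernanceLevel.STANDARD: LevelControls(
        audit=True, tool_allowlist=True, content_filters=True,
        human_approval=False, dynamic_tools=True),
    # Strict: all controls plus human approval for sensitive operations.
    GovernanceLevel.STRICT: LevelControls(
        audit=True, tool_allowlist=True, content_filters=True,
        human_approval=True, dynamic_tools=True),
    # Locked: allowlist only, no dynamic tools, full audit
    # (content filters and human approval assumed on).
    GovernanceLevel.LOCKED: LevelControls(
        audit=True, tool_allowlist=True, content_filters=True,
        human_approval=True, dynamic_tools=False),
}

def controls_for(level_name: str) -> LevelControls:
    """Look up controls by name; unknown names fall back to the strictest level."""
    try:
        return CONTROLS[GovernanceLevel(level_name)]
    except ValueError:
        return CONTROLS[GovernanceLevel.LOCKED]
```

Falling back to the strictest level on an unrecognized name keeps a configuration typo from loosening controls.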

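Best Practices

Practice	Rationale
Policy as configuration	Store policies in YAML/JSON rather than hardcoding them, so they can change without a deploy
Most restrictive wins	When composing policies, deny always overrides allow
Pre-flight intent check	Classify intent before tool execution, not after
Trust decay	Trust scores decay over time, so sustained good behavior is required
Append-only audit	Never modify or delete audit entries; immutability is what makes the log usable for compliance
Fail closed	If a governance check errors, deny rather than allow
Separate policy from logic	Governance enforcement is independent of the agent's business logic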
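
The fail-closed practice can be sketched as a wrapper around any check: if the check itself raises, the result is a denial. `fail_closed` and `flaky_check` are hypothetical names used only for illustration.

```python
from typing import Callable

def fail_closed(check: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a governance check so that any error inside it counts as a denial."""
    def guarded(tool_name: str) -> bool:
        try:
            return bool(check(tool_name))
        except Exception:
            # The check could not reach a decision, so deny rather than allow.
            return False
    return guarded

def flaky_check(tool_name: str) -> bool:
    """Hypothetical check that fails when its policy backend is unavailable."""
    if tool_name == "search":
        return True
    raise ConnectionError("policy service unavailable")

is_allowed = fail_closed(flaky_check)
print(is_allowed("search"))      # True
print(is_allowed("send_email"))  # False
```

In a real system the swallowed exception would also be logged, so that an outage in the policy backend is visible rather than silent.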

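Quick-Start Checklist

## Agent Governance Implementation Checklist

### Setup
- [ ] Define a governance policy (allowed tools, blocked patterns, rate limits)
- [ ] Choose a governance level (open/standard/strict/locked)
- [ ] Set up audit trail storage

### Implementation
- [ ] Add the @govern decorator to every tool function
- [ ] Add intent classification to user input handling
- [ ] Implement trust scoring for multi-agent interactions
- [ ] Wire up audit trail export

### Validation
- [ ] Test that blocked tools are properly denied
- [ ] Test that content filters catch sensitive patterns
- [ ] Test rate-limit behavior
- [ ] Confirm that the audit trail captures every event
- [ ] Test policy composition (most restrictive wins)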

Details

Author: MoonAxis
Repository: MoonAxis/azure-stack
License: MIT
Last updated: 2026/3/12


Source: https://github.com/MoonAxis/azure-stack / License: MIT