
sarif-parsing

Name: sarif-parsing
Author: trailofbits

Parses and processes SARIF files from static analysis tools such as CodeQL, Semgrep, or other scanners. Triggers on "parse sarif", "read scan results", "aggregate findings", "deduplicate alerts", or "process sarif output". Handles filtering, deduplication, format conversion, and CI/CD integration of SARIF data. Does NOT run scans; use the Semgrep or CodeQL skills for that.

SKILL.md body

SARIF Parsing Best Practices

You are a SARIF parsing expert. Your role is to help users effectively read, analyze, and process SARIF files produced by static analysis tools.

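When to Use

Use this skill when:

Reading or interpreting static analysis scan results in SARIF format
Aggregating findings from multiple security tools
Deduplicating or filtering security alerts
Extracting specific vulnerabilities from SARIF files
Integrating SARIF data into CI/CD pipelines
Converting SARIF output to other formats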

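When NOT to Use

Do NOT use this skill for:

Running static analysis scans (use the CodeQL or Semgrep skills instead)
Writing CodeQL or Semgrep rules (use their respective skills)
Analyzing source code directly (SARIF is for processing existing scan results)
Triaging findings without SARIF input (use the variant-analysis or audit skills)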

SARIF Structure Overview

SARIF 2.1.0 is the current OASIS standard. Every SARIF file has this hierarchical structure:

sarifLog
├── version: "2.1.0"
├── $schema: (optional, enables IDE validation)
└── runs[] (array of analysis runs)
    ├── tool
    │   ├── driver
    │   │   ├── name (required)
    │   │   ├── version
    │   │   └── rules[] (rule definitions)
    │   └── extensions[] (plugins)
    ├── results[] (findings)
    │   ├── ruleId
    │   ├── level (error/warning/note)
    │   ├── message.text
    │   ├── locations[]
    │   │   └── physicalLocation
    │   │       ├── artifactLocation.uri
    │   │       └── region (startLine, startColumn, etc.)
    │   ├── fingerprints{}
    │   └── partialFingerprints{}
    └── artifacts[] (scanned files metadata)
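
To make the hierarchy concrete, here is a minimal sketch of a SARIF log built as a Python dictionary and walked from the top down. The tool name `ExampleScanner`, the rule ID `EX001`, the message, and the file path are made up for illustration, and `describe` is a hypothetical helper.

```python
import json

# Minimal SARIF 2.1.0 log; tool name, rule ID, and path are illustrative only.
minimal_log = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "ExampleScanner", "rules": [{"id": "EX001"}]}},
            "results": [
                {
                    "ruleId": "EX001",
                    "level": "error",
                    "message": {"text": "Hard-coded credential"},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": "src/auth.py"},
                                "region": {"startLine": 12},
                            }
                        }
                    ],
                }
            ],
        }
    ],
}


def describe(log: dict) -> list[str]:
    """Return one 'tool: rule at file:line' string per result."""
    lines = []
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]  # driver.name is required
        for result in run.get("results") or []:
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            uri = phys.get("artifactLocation", {}).get("uri", "?")
            line = phys.get("region", {}).get("startLine", "?")
            lines.append(f"{tool}: {result.get('ruleId')} at {uri}:{line}")
    return lines


# Round-trip through JSON to mimic reading the log from disk.
print(describe(json.loads(json.dumps(minimal_log))))
```

Everything below `runs[]` except `tool.driver.name` is treated as optional here, which is why the walk falls back to placeholders instead of indexing directly.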

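Why Fingerprinting Matters

Without stable fingerprints, findings cannot be tracked across runs:

Baseline comparison: "Is this a new finding, or have we seen it before?"
Regression detection: "Did this pull request introduce new vulnerabilities?"
Suppression: "Ignore this known false positive in future runs"

Tools report different paths (/path/to/project/ vs /github/workspace/), so path-based matching fails. Fingerprints hash the content (code snippet, rule ID, relative location) to create an identifier that stays stable regardless of environment.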
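
The difference between path-based and content-based matching can be sketched in a few lines. This is an illustration only: `content_fingerprint` is a hypothetical helper, the two findings are invented, and real tools use their own fingerprinting algorithms.

```python
import hashlib


def content_fingerprint(rule_id: str, snippet: str) -> str:
    """Hash the rule ID and whitespace-normalized snippet, ignoring the path."""
    normalized = " ".join(snippet.split())
    return hashlib.sha256(f"{rule_id}\n{normalized}".encode()).hexdigest()[:16]


# The same finding as reported from a laptop and from a CI runner.
local = {"rule": "EX001", "path": "/path/to/project/src/auth.py",
         "snippet": "password = 'hunter2'"}
ci = {"rule": "EX001", "path": "/github/workspace/src/auth.py",
      "snippet": "    password = 'hunter2'"}

# Keying on (rule, absolute path) treats the two reports as different findings.
path_keys_match = (local["rule"], local["path"]) == (ci["rule"], ci["path"])

# Keying on hashed content treats them as the same finding.
content_keys_match = (content_fingerprint(local["rule"], local["snippet"])
                      == content_fingerprint(ci["rule"], ci["snippet"]))

print(path_keys_match, content_keys_match)
```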

Tool Selection Guide

Use Case	Tool	Installation
Quick CLI queries	jq	`brew install jq` / `apt install jq`
Python scripting (simple)	pysarif	`pip install pysarif`
Python scripting (advanced)	sarif-tools	`pip install sarif-tools`
.NET applications	SARIF SDK	NuGet package
JavaScript/Node.js	sarif-js	npm package
Go applications	garif	`go get github.com/chavacava/garif`
Validation	SARIF Validator	sarifweb.azurewebsites.net

Strategy 1: Quick Analysis with jq

For rapid exploration and one-off queries:

# Pretty print the file
jq '.' results.sarif

# Count total findings
jq '[.runs[].results[]] | length' results.sarif

# List all rule IDs triggered
jq '[.runs[].results[].ruleId] | unique' results.sarif

# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif

# Get findings with file locations
jq '.runs[].results[] | {
  rule: .ruleId,
  message: .message.text,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif

# Filter by severity and get count per rule
jq '[.runs[].results[] | select(.level == "error")] | group_by(.ruleId) | map({rule: .[0].ruleId, count: length})' results.sarif

# Extract findings for a specific file
jq --arg file "src/auth.py" '.runs[].results[] | select(.locations[].physicalLocation.artifactLocation.uri | contains($file))' results.sarif
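
Where jq is not installed, the count-per-rule query above can be approximated with the Python standard library. This is a sketch, not a replacement for the jq templates; `errors_per_rule` and the sample data are hypothetical. Like the jq query, it only counts results that carry an explicit `"level": "error"`.

```python
from collections import Counter


def errors_per_rule(log: dict) -> dict[str, int]:
    """Count results whose level is "error", grouped by ruleId."""
    counts = Counter(
        result.get("ruleId", "unknown")
        for run in log.get("runs", [])
        for result in run.get("results") or []
        if result.get("level") == "error"
    )
    return dict(counts)


# Invented sample: three errors across two rules, plus one warning.
sample = {
    "runs": [
        {"results": [
            {"ruleId": "EX001", "level": "error"},
            {"ruleId": "EX001", "level": "error"},
            {"ruleId": "EX002", "level": "warning"},
            {"ruleId": "EX003", "level": "error"},
        ]}
    ]
}

print(errors_per_rule(sample))
```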

Strategy 2: Python with pysarif

For programmatic access with a full object model:

from pysarif import load_from_file, save_to_file

# Load SARIF file
sarif = load_from_file("results.sarif")

# Iterate through runs and results
for run in sarif.runs:
    tool_name = run.tool.driver.name
    print(f"Tool: {tool_name}")

    for result in run.results:
        print(f"  [{result.level}] {result.rule_id}: {result.message.text}")

        if result.locations:
            loc = result.locations[0].physical_location
            if loc and loc.artifact_location:
                print(f"    File: {loc.artifact_location.uri}")
                if loc.region:
                    print(f"    Line: {loc.region.start_line}")

# Save modified SARIF
save_to_file(sarif, "modified.sarif")

Strategy 3: Python with sarif-tools

For aggregation, reporting, and CI/CD integration:

from sarif import loader

# Load single file
sarif_data = loader.load_sarif_file("results.sarif")

# Or load multiple files
sarif_set = loader.load_sarif_files(["tool1.sarif", "tool2.sarif"])

# Get summary report
report = sarif_data.get_report()

# Get histogram by severity
errors = report.get_issue_type_histogram_for_severity("error")
warnings = report.get_issue_type_histogram_for_severity("warning")

# Filter results
high_severity = [r for r in sarif_data.get_results()
                 if r.get("level") == "error"]

sarif-tools CLI commands:

# Summary of findings
sarif summary results.sarif

# List all results with details
sarif ls results.sarif

# Get results by severity
sarif ls --level error results.sarif

# Diff two SARIF files (find new/fixed issues)
sarif diff baseline.sarif current.sarif

# Convert to other formats
sarif csv results.sarif > results.csv
sarif html results.sarif > report.html
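
When the CLI is not available, a flat CSV export can also be sketched with the standard library alone. The column layout below is an assumption chosen for illustration and is not the format that `sarif csv` produces; `results_to_csv` and the sample log are hypothetical.

```python
import csv
import io


def results_to_csv(log: dict) -> str:
    """Flatten SARIF results into CSV text with one row per result."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tool", "rule_id", "level", "file", "line", "message"])
    for run in log.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "")
        for result in run.get("results") or []:
            # Only the first location is exported; extra locations are ignored.
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            writer.writerow([
                tool,
                result.get("ruleId", ""),
                result.get("level", ""),
                phys.get("artifactLocation", {}).get("uri", ""),
                phys.get("region", {}).get("startLine", ""),
                result.get("message", {}).get("text", ""),
            ])
    return buffer.getvalue()


# Invented sample with a comma in the message to exercise CSV quoting.
sample = {
    "runs": [{
        "tool": {"driver": {"name": "ExampleScanner"}},
        "results": [{
            "ruleId": "EX001",
            "level": "error",
            "message": {"text": "Hard-coded credential, rotate it"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "src/auth.py"},
                "region": {"startLine": 12},
            }}],
        }],
    }]
}

print(results_to_csv(sample))
```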

Strategy 4: Aggregating Multiple SARIF Files

When combining results from multiple tools:

import json
from pathlib import Path

def aggregate_sarif_files(sarif_paths: list[str]) -> dict:
    """Combine multiple SARIF files into one."""
    aggregated = {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": []
    }

    for path in sarif_paths:
        with open(path) as f:
            sarif = json.load(f)
            aggregated["runs"].extend(sarif.get("runs", []))

    return aggregated

def deduplicate_results(sarif: dict) -> dict:
    """Remove duplicate findings based on fingerprints."""
    seen_fingerprints = set()

    for run in sarif["runs"]:
        unique_results = []
        for result in run.get("results", []):
            # Use partialFingerprints or create key from location
            fp = None
            if result.get("partialFingerprints"):
                fp = tuple(sorted(result["partialFingerprints"].items()))
            elif result.get("fingerprints"):
                fp = tuple(sorted(result["fingerprints"].items()))
            else:
                # Fallback: create fingerprint from rule + location
                # "or [{}]" also covers an empty locations array, which would raise IndexError
                loc = (result.get("locations") or [{}])[0]
                phys = loc.get("physicalLocation", {})
                fp = (
                    result.get("ruleId"),
                    phys.get("artifactLocation", {}).get("uri"),
                    phys.get("region", {}).get("startLine")
                )

            if fp not in seen_fingerprints:
                seen_fingerprints.add(fp)
                unique_results.append(result)

        run["results"] = unique_results

    return sarif

Strategy 5: Extracting Actionable Data

import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    rule_id: str
    level: str
    message: str
    file_path: Optional[str]
    start_line: Optional[int]
    end_line: Optional[int]
    fingerprint: Optional[str]

def extract_findings(sarif_path: str) -> list[Finding]:
    """Extract structured findings from SARIF file."""
    with open(sarif_path) as f:
        sarif = json.load(f)

    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # "or [{}]" also covers an empty locations array, which would raise IndexError
            loc = (result.get("locations") or [{}])[0]
            phys = loc.get("physicalLocation", {})
            region = phys.get("region", {})

            findings.append(Finding(
                rule_id=result.get("ruleId", "unknown"),
                level=result.get("level", "warning"),
                message=result.get("message", {}).get("text", ""),
                file_path=phys.get("artifactLocation", {}).get("uri"),
                start_line=region.get("startLine"),
                end_line=region.get("endLine"),
                fingerprint=next(iter(result.get("partialFingerprints", {}).values()), None)
            ))

    return findings

# Filter and prioritize
def prioritize_findings(findings: list[Finding]) -> list[Finding]:
    """Sort findings by severity."""
    severity_order = {"error": 0, "warning": 1, "note": 2, "none": 3}
    return sorted(findings, key=lambda f: severity_order.get(f.level, 99))

Common Pitfalls and Solutions

1. Path Normalization Issues

Different tools report paths differently (absolute, relative, URI-encoded):

from urllib.parse import unquote
from pathlib import Path

def normalize_path(uri: str, base_path: str = "") -> str:
    """Normalize SARIF artifact URI to consistent path."""
    # Remove file:// prefix if present
    if uri.startswith("file://"):
        uri = uri[7:]

    # URL decode
    uri = unquote(uri)

    # Handle relative paths
    if not Path(uri).is_absolute() and base_path:
        uri = str(Path(base_path) / uri)

    # Normalize separators
    return str(Path(uri))
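
A complementary approach is to reduce every URI to a repository-relative path, so that /path/to/project/src/auth.py and /github/workspace/src/auth.py compare equal. The sketch below is hypothetical (`repo_relative_path` is not part of the skill's helpers). It assumes POSIX-style paths, and it takes candidate roots from a caller-supplied `repo_root` or from the run's `originalUriBaseIds`, which SARIF uses to declare base URIs such as `%SRCROOT%`.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def repo_relative_path(artifact_location: dict, run: dict, repo_root: str = "") -> str:
    """Return a repository-relative POSIX path for a SARIF artifactLocation."""
    # Drop any file:// scheme and decode percent-escapes.
    uri_path = unquote(urlparse(artifact_location.get("uri", "")).path)
    path = PurePosixPath(uri_path)
    if not path.is_absolute():
        # Already relative, typically to a uriBaseId such as %SRCROOT%.
        return str(path)

    # Candidate roots: caller-supplied root first, then the run's declared bases.
    roots = [repo_root] if repo_root else []
    for base in (run.get("originalUriBaseIds") or {}).values():
        roots.append(unquote(urlparse(base.get("uri", "")).path))

    for root in roots:
        if not root:
            continue
        try:
            return str(path.relative_to(PurePosixPath(root)))
        except ValueError:
            continue  # path is not under this root; try the next one
    return str(path)  # no root matched; leave the absolute path as-is


ci_run = {"originalUriBaseIds": {"%SRCROOT%": {"uri": "file:///github/workspace/"}}}
print(repo_relative_path({"uri": "file:///github/workspace/src/auth.py"}, ci_run))
print(repo_relative_path({"uri": "/path/to/project/src/auth.py"}, {}, "/path/to/project"))
```

Windows drive-letter URIs are not handled by this sketch and would need separate treatment.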

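2. Fingerprint Mismatch Across Runs

Fingerprints may not match if:

File paths differ between environments
A tool version changed its fingerprinting algorithm
Code was reformatted (changing line numbers)

Solution: Use multiple fingerprint strategies: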

import hashlib
from typing import Optional

def compute_stable_fingerprint(result: dict, file_content: Optional[str] = None) -> str:
    """Compute environment-independent fingerprint."""
    components = [
        result.get("ruleId", ""),
        result.get("message", {}).get("text", "")[:100],  # First 100 chars
    ]

    # Add code snippet if available
    if file_content and result.get("locations"):
        region = result["locations"][0].get("physicalLocation", {}).get("region", {})
        if region.get("startLine"):
            lines = file_content.split("\n")
            line_idx = region["startLine"] - 1
            if 0 <= line_idx < len(lines):
                # Strip surrounding whitespace so indentation changes don't alter the hash
                components.append(lines[line_idx].strip())

    # Join with a separator so adjacent components cannot run together
    return hashlib.sha256("\x1f".join(components).encode()).hexdigest()[:16]

3. Missing or Incomplete Data

SARIF allows many optional fields. Always use defensive access:

def safe_get_location(result: dict) -> tuple[str, int]:
    """Safely extract file and line from result."""
    try:
        loc = result.get("locations", [{}])[0]
        phys = loc.get("physicalLocation", {})
        file_path = phys.get("artifactLocation", {}).get("uri", "unknown")
        line = phys.get("region", {}).get("startLine", 0)
        return file_path, line
    except (IndexError, KeyError, TypeError):
        return "unknown", 0
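
The `level` field is itself optional, so a filter on `level == "error"` can miss results whose severity is only declared on the rule. The sketch below follows, in simplified form, the default that the SARIF 2.1.0 specification describes for an omitted `level`: fall back to the rule's `defaultConfiguration.level`, then to "warning", and treat non-"fail" kinds as "none". `effective_level` is a hypothetical helper; it assumes rules live in `tool.driver.rules` and ignores tool extensions and per-invocation configuration overrides.

```python
def effective_level(result: dict, run: dict) -> str:
    """Resolve a result's severity level when `level` may be omitted."""
    if "level" in result:
        return result["level"]
    # Results whose kind is not "fail" (e.g. "pass") carry no severity.
    if result.get("kind", "fail") != "fail":
        return "none"

    rules = run.get("tool", {}).get("driver", {}).get("rules") or []
    index = result.get("ruleIndex", -1)
    if isinstance(index, int) and 0 <= index < len(rules):
        rule = rules[index]  # ruleIndex points into driver.rules
    else:
        # Fall back to matching on the rule ID.
        rule = next((r for r in rules if r.get("id") == result.get("ruleId")), None)

    if rule:
        return rule.get("defaultConfiguration", {}).get("level", "warning")
    return "warning"


# Invented run: EX001 declares a default level, EX002 does not.
example_run = {"tool": {"driver": {"name": "ExampleScanner", "rules": [
    {"id": "EX001", "defaultConfiguration": {"level": "error"}},
    {"id": "EX002"},
]}}}

print(effective_level({"ruleId": "EX001"}, example_run))
print(effective_level({"ruleId": "EX002", "ruleIndex": 1}, example_run))
```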

4. Large File Performance

For very large SARIF files (100MB+):

import ijson  # pip install ijson

def stream_results(sarif_path: str):
    """Stream results without loading entire file."""
    with open(sarif_path, "rb") as f:
        # Stream through results arrays
        for result in ijson.items(f, "runs.item.results.item"):
            yield result

5. Schema Validation

Validate before processing to catch malformed files:

# Using ajv-cli
npm install -g ajv-cli
ajv validate -s sarif-schema-2.1.0.json -d results.sarif

# Using Python jsonschema
pip install jsonschema

from jsonschema import validate, ValidationError
import json

def validate_sarif(sarif_path: str, schema_path: str) -> bool:
    """Validate SARIF file against schema."""
    with open(sarif_path) as f:
        sarif = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)

    try:
        validate(sarif, schema)
        return True
    except ValidationError as e:
        print(f"Validation error: {e.message}")
        return False
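
When the schema file is not at hand, a lightweight structural pre-check can still catch the most obvious problems before any processing starts. The sketch below is not a substitute for schema validation; `quick_structure_check` is a hypothetical helper that only inspects the version string, the `runs` array, `tool.driver.name`, and the type of `results`.

```python
def quick_structure_check(log: object) -> list[str]:
    """Return a list of structural problems; an empty list means the basics look sane."""
    if not isinstance(log, dict):
        return ["top level is not a JSON object"]

    problems = []
    if log.get("version") != "2.1.0":
        problems.append(f"unexpected version: {log.get('version')!r}")

    runs = log.get("runs")
    if not isinstance(runs, list):
        problems.append("'runs' is missing or not an array")
        return problems

    for i, run in enumerate(runs):
        # Walk tool -> driver -> name without assuming any level is a dict.
        tool = run.get("tool") if isinstance(run, dict) else None
        driver = tool.get("driver") if isinstance(tool, dict) else None
        name = driver.get("name") if isinstance(driver, dict) else None
        if not isinstance(name, str):
            problems.append(f"runs[{i}].tool.driver.name is missing")

        results = run.get("results") if isinstance(run, dict) else None
        if results is not None and not isinstance(results, list):
            problems.append(f"runs[{i}].results is not an array")
    return problems


print(quick_structure_check({"version": "2.0.0", "runs": [{}]}))
```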

CI/CD Integration Patterns

GitHub Actions

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

- name: Check for high severity
  run: |
    HIGH_COUNT=$(jq '[.runs[].results[] | select(.level == "error")] | length' results.sarif)
    if [ "$HIGH_COUNT" -gt 0 ]; then
      echo "Found $HIGH_COUNT high severity issues"
      exit 1
    fi

Fail on New Issues

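from sarif import loader

# Assumed helper (not defined in the original): same key as Strategy 4's deduplication
def get_fingerprint(result: dict) -> tuple:
    fps = result.get("partialFingerprints") or result.get("fingerprints")
    if fps:
        return tuple(sorted(fps.items()))
    phys = (result.get("locations") or [{}])[0].get("physicalLocation", {})
    return (result.get("ruleId"),
            phys.get("artifactLocation", {}).get("uri"),
            phys.get("region", {}).get("startLine"))

def check_for_regressions(baseline: str, current: str) -> int:
    """Return count of new issues not in baseline."""
    baseline_data = loader.load_sarif_file(baseline)
    current_data = loader.load_sarif_file(current)

    baseline_fps = {get_fingerprint(r) for r in baseline_data.get_results()}
    new_issues = [r for r in current_data.get_results()
                  if get_fingerprint(r) not in baseline_fps]

    return len(new_issues)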

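Key Principles

Validate first: Check SARIF structure before processing
Handle optionals: Many fields are optional; use defensive access
Normalize paths: Tools report paths differently; normalize early
Fingerprint wisely: Combine multiple strategies for stable deduplication
Stream large files: Use ijson for files of 100MB or more
Aggregate thoughtfully: Preserve tool metadata when combining files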
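
As an illustration of the last principle, results can be flattened into a single list while still recording which tool produced each one. This is a sketch under assumptions: `flatten_with_tool` is a hypothetical helper, the wrapper keys `tool`, `tool_version`, and `result` are invented for this example, and the tool names are placeholders.

```python
def flatten_with_tool(logs: list[dict]) -> list[dict]:
    """Flatten results from several SARIF logs, keeping the producing tool."""
    flat = []
    for log in logs:
        for run in log.get("runs", []):
            driver = run.get("tool", {}).get("driver", {})
            for result in run.get("results") or []:
                flat.append({
                    "tool": driver.get("name", "unknown"),
                    "tool_version": driver.get("version"),  # optional in SARIF
                    "result": result,  # original result object, unmodified
                })
    return flat


# Two invented logs from two placeholder tools.
logs = [
    {"runs": [{"tool": {"driver": {"name": "ToolA", "version": "1.0"}},
               "results": [{"ruleId": "A1"}]}]},
    {"runs": [{"tool": {"driver": {"name": "ToolB"}},
               "results": [{"ruleId": "B1"}, {"ruleId": "B2"}]}]},
]

for item in flatten_with_tool(logs):
    print(item["tool"], item["tool_version"], item["result"]["ruleId"])
```

Keeping the original result object untouched means the flattened list can still be fed to fingerprinting or deduplication code that expects plain SARIF results.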

Skill Resources

For ready-to-use query templates, see {baseDir}/resources/jq-queries.md:

40+ jq queries for SARIF operations
Severity filtering, rule extraction, aggregation patterns

For Python utilities, see {baseDir}/resources/sarif_helpers.py:

normalize_path() - Handle tool-specific path formats
compute_fingerprint() - Stable fingerprinting that ignores paths
deduplicate_results() - Remove duplicates across runs


License: CC-BY-SA-4.0 (quoted in full because the license is permissive)

Details

Author: trailofbits
Repository: trailofbits/skills
License: CC-BY-SA-4.0
Last updated: unknown


Source: https://github.com/trailofbits/skills / License: CC-BY-SA-4.0
