
firecrawl-scraper

Name: firecrawl-scraper
Author: benedictking

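A skill that uses the Firecrawl API for advanced content retrieval and conversion: scraping web pages, extracting structured data, taking screenshots, parsing PDFs, and crawling entire sites. Use it when you need to extract web content or convert it to another format.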

Original description

Web scraping skill using Firecrawl API for deep content extraction, format conversion, and page interaction. Use when you need to scrape web pages, extract structured data, take screenshots, parse PDFs, or crawl entire websites. Triggers: firecrawl, scrape, extract content, screenshot, parse pdf, crawl website, 抓取网页, 提取内容, 网页截图

SKILL.md body

Firecrawl Scraper Skill

Trigger conditions and endpoint selection

Choose the Firecrawl endpoint based on the user's intent:

scrape: extract content from a single web page (markdown, html, json, screenshot, pdf)
crawl: crawl an entire website (with depth control and path filtering)
map: quickly get a list of all URLs on a website
batch-scrape: scrape multiple URLs in parallel
crawl-status: when a crawl job ID is given, check crawl progress and results (optional: --wait)
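
The selection rules above can be written down as a small decision function. This is only a sketch: `choose_endpoint` and its flags (`whole_site`, `list_urls_only`, `crawl_id`) are hypothetical names invented for illustration and are not part of the skill, and the order in which the checks run is an assumption.

```python
from typing import Optional, Sequence


def choose_endpoint(
    urls: Sequence[str],
    *,
    whole_site: bool = False,
    list_urls_only: bool = False,
    crawl_id: Optional[str] = None,
) -> str:
    """Map a request to an endpoint name (hypothetical helper, not in the skill)."""
    if crawl_id:            # a crawl job ID was supplied: check that job
        return "crawl-status"
    if list_urls_only:      # only the list of URLs on the site is wanted
        return "map"
    if whole_site:          # follow links across the whole site
        return "crawl"
    if len(urls) > 1:       # several independent pages
        return "batch-scrape"
    return "scrape"         # default: a single page


print(choose_endpoint(["https://example.com"]))
print(choose_endpoint(["https://example.com"], whole_site=True))
```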

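Recommended architecture (main skill + sub-skill)

This skill uses a two-stage architecture:

Main skill (current context): understand the user's question → choose the endpoint → assemble the JSON payload
Sub-skill (forked context): only executes the HTTP call, so the response does not waste tokens in the conversation history

How to run

Use the Task tool to invoke the firecrawl-fetcher sub-skill, passing the command and the JSON payload (via stdin):

Task parameters: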
- subagent_type: Bash
- description: "Call Firecrawl API"
- prompt: cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs <scrape|crawl|map|batch-scrape|crawl-status> [--wait]
  { ...payload... }
  JSON
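
For readers who want to drive the same script outside the Task tool, the heredoc-and-pipe pattern above corresponds roughly to the following sketch. The script path and endpoint names come from this document; `build_command` and `call_firecrawl` are hypothetical helper names, and the assumption that the script prints its result to stdout is not confirmed by the source. `crawl-status` is left out because it takes the job ID as an argument rather than a payload on stdin.

```python
import json
import subprocess
from typing import Any, Dict, List

# Script path as given in this document.
SCRIPT = ".claude/skills/firecrawl-scraper/firecrawl-api.cjs"
# Endpoints that read a JSON payload from stdin.
STDIN_ENDPOINTS = {"scrape", "crawl", "map", "batch-scrape"}


def build_command(endpoint: str, wait: bool = False) -> List[str]:
    """Build the argv list for the Node script (hypothetical helper)."""
    if endpoint not in STDIN_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    cmd = ["node", SCRIPT, endpoint]
    if wait:
        cmd.append("--wait")
    return cmd


def call_firecrawl(endpoint: str, payload: Dict[str, Any], wait: bool = False) -> str:
    """Pipe the payload to the script on stdin and return whatever it prints."""
    result = subprocess.run(
        build_command(endpoint, wait),
        input=json.dumps(payload),  # same role as the heredoc in the shell form
        capture_output=True,
        text=True,
        check=True,                 # raise if the script exits non-zero
    )
    return result.stdout
```

Passing the payload on stdin rather than as an argument avoids shell-quoting problems with nested JSON.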

Payload examples

1) Scrape a single page

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown", "links"],
  "onlyMainContent": true,
  "includeTags": [],
  "excludeTags": ["nav", "footer"],
  "waitFor": 0,
  "timeout": 30000
}
JSON

Available formats:

"markdown", "html", "rawHtml", "links", "images", "summary"
{"type": "json", "prompt": "Extract product info", "schema": {...}}
{"type": "screenshot", "fullPage": true, "quality": 85}
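
Because `formats` mixes plain strings with objects, it can help to build and check the list in code before sending it. The sketch below only encodes the values listed above; the helper names (`json_format`, `screenshot_format`, `check_formats`) are hypothetical, and the Firecrawl API may accept formats that this document does not list.

```python
from typing import Any, Dict, List, Union

Format = Union[str, Dict[str, Any]]

# String formats and object format types listed in this document.
STRING_FORMATS = {"markdown", "html", "rawHtml", "links", "images", "summary"}
OBJECT_FORMAT_TYPES = {"json", "screenshot"}


def json_format(prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build the object form used for structured extraction."""
    return {"type": "json", "prompt": prompt, "schema": schema}


def screenshot_format(full_page: bool = True, quality: int = 85) -> Dict[str, Any]:
    """Build the object form used for screenshots."""
    return {"type": "screenshot", "fullPage": full_page, "quality": quality}


def check_formats(formats: List[Format]) -> List[Format]:
    """Raise ValueError for any entry that this document does not list."""
    for fmt in formats:
        if isinstance(fmt, str):
            if fmt not in STRING_FORMATS:
                raise ValueError(f"unlisted string format: {fmt!r}")
        elif not isinstance(fmt, dict) or fmt.get("type") not in OBJECT_FORMAT_TYPES:
            raise ValueError(f"unlisted object format: {fmt!r}")
    return list(formats)


formats = check_formats(["markdown", "links", screenshot_format(full_page=False)])
```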

2) Scrape with actions (page interaction)

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "actions": [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "milliseconds": 1000},
    {"type": "scroll", "direction": "down", "amount": 500}
  ]
}
JSON

Available actions:

wait, click, write, press, scroll, screenshot, scrape, executeJavascript
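
A typo in an action `type` is easy to miss in hand-written JSON, so a local check before the call may be worthwhile. The following sketch validates an `actions` list against the names above. `validate_actions` is a hypothetical helper, and treating `milliseconds`, `selector`, and `direction` as required is an assumption drawn from the example payload, not from the Firecrawl API reference.

```python
from typing import Any, Dict, List

# Action types listed in this document.
KNOWN_ACTIONS = {
    "wait", "click", "write", "press",
    "scroll", "screenshot", "scrape", "executeJavascript",
}
# Field each type carries in the example payload (assumed to be required).
EXAMPLE_FIELDS = {"wait": "milliseconds", "click": "selector", "scroll": "direction"}


def validate_actions(actions: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in KNOWN_ACTIONS:
            problems.append(f"action {i}: unknown type {kind!r}")
            continue
        field = EXAMPLE_FIELDS.get(kind)
        if field and field not in action:
            problems.append(f"action {i}: {kind} is missing {field!r}")
    return problems


actions = [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#load-more"},
]
print(validate_actions(actions))
```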

3) Parse a PDF

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/document.pdf",
  "formats": ["markdown"],
  "parsers": ["pdf"]
}
JSON

4) Extract structured JSON

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/product",
  "formats": [
    {
      "type": "json",
      "prompt": "Extract product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"},
          "description": {"type": "string"}
        },
        "required": ["name", "price"]
      }
    }
  ]
}
JSON

5) Crawl an entire website

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "includePaths": ["^/docs/.*"],
  "excludePaths": ["^/blog/.*"],
  "maxDiscoveryDepth": 3,
  "limit": 100,
  "allowExternalLinks": false,
  "allowSubdomains": false
}
JSON
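
To preview which pages the `includePaths` and `excludePaths` patterns in this payload would select, the filters can be approximated locally. This is a sketch under assumptions: it supposes the patterns are regular expressions applied to the URL path, that exclusions win over inclusions, and that an empty include list allows everything. Firecrawl performs the real filtering and may behave differently. `path_allowed` is a hypothetical helper.

```python
import re
from typing import Sequence
from urllib.parse import urlparse


def path_allowed(url: str, include_paths: Sequence[str], exclude_paths: Sequence[str]) -> bool:
    """Approximate the crawl path filters locally (assumed semantics)."""
    path = urlparse(url).path
    # Assumed: any matching exclude pattern rejects the URL.
    if any(re.search(pattern, path) for pattern in exclude_paths):
        return False
    # Assumed: when include patterns are given, at least one must match.
    if include_paths:
        return any(re.search(pattern, path) for pattern in include_paths)
    return True


include = ["^/docs/.*"]
exclude = ["^/blog/.*"]
for candidate in (
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/blog/launch",
    "https://docs.example.com/pricing",
):
    print(candidate, path_allowed(candidate, include, exclude))
```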

5.1) Crawl and wait for completion

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl --wait
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "limit": 100
}
JSON

6) Map a website's URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs map
{
  "url": "https://example.com",
  "search": "documentation",
  "limit": 5000
}
JSON

7) Batch-scrape multiple URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs batch-scrape
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "formats": ["markdown"]
}
JSON

8) Check crawl status

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id>

Wait for completion:

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id> --wait
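
The source does not show how `--wait` is implemented. Conceptually, waiting for a crawl is a polling loop, which might look like the sketch below. Everything here is an assumption made for illustration: the `status` field and its `completed` and `failed` values, the polling interval, and the injected `get_status` callable, which stands in for whatever actually fetches the job state.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    get_status: Callable[[str], Dict[str, Any]],
    crawl_id: str,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_status until the job reaches a terminal state (assumed values)."""
    for _ in range(max_polls):
        result = get_status(crawl_id)
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(interval)  # back off before asking again
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")


# Usage with a fake status source, so that no network call is made.
fake_states = iter([{"status": "scraping"}, {"status": "completed", "data": []}])
final = wait_for_crawl(lambda _id: next(fake_states), "job-123", sleep=lambda _s: None)
print(final["status"])
```

Injecting `get_status` and `sleep` keeps the loop testable without touching the API.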

Key features

Formats

markdown: clean markdown content
html: parsed HTML
rawHtml: the original HTML
links: all links on the page
images: all images on the page
summary: AI-generated summary
json: structured data extraction with a schema
screenshot: page screenshot (PNG)

Content control

onlyMainContent: extract only the main content (default: true)
includeTags: CSS selectors to include
excludeTags: CSS selectors to exclude
waitFor: time to wait before scraping (milliseconds)
maxAge: cache lifetime (default: 48 hours)

Actions (browser automation)

wait: wait for a specified time
click: click an element by selector
write: type text into a field
press: press a keyboard key
scroll: scroll the page
executeJavascript: run custom JS

Crawl options

includePaths: regex patterns to include
excludePaths: regex patterns to exclude
maxDiscoveryDepth: maximum crawl depth
limit: maximum number of pages to crawl
allowExternalLinks: follow external links
allowSubdomains: follow subdomains

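Environment variables and API key

There are two ways to set the API key (precedence: environment variable > .env):

Environment variable: FIRECRAWL_API_KEY
.env file: place it at .claude/skills/firecrawl-scraper/.env (you can copy it from .env.example)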
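
The precedence rule above (environment variable first, then the `.env` file) can be illustrated with a short sketch. This is not the skill's actual loader: `key_from_dotenv` and `resolve_api_key` are hypothetical names, and the simple `NAME=value` parsing with optional quotes is an assumption about the file's format.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Location given in this document.
ENV_FILE = Path(".claude/skills/firecrawl-scraper/.env")
KEY_NAME = "FIRECRAWL_API_KEY"


def key_from_dotenv(text: str) -> Optional[str]:
    """Find KEY_NAME in simple NAME=value lines; comment lines never match."""
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.strip() == KEY_NAME:
            return value.strip().strip("\"'") or None
    return None


def resolve_api_key(
    environ: Mapping[str, str] = os.environ,
    env_file: Path = ENV_FILE,
) -> Optional[str]:
    """Environment variable wins; the .env file is only a fallback."""
    key = environ.get(KEY_NAME)
    if key:
        return key
    if env_file.is_file():
        return key_from_dotenv(env_file.read_text(encoding="utf-8"))
    return None
```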

Response format

All endpoints return JSON containing:

success: a boolean indicating success
data: the extracted content (its format depends on the endpoint)
For crawls: a job ID is returned. Use crawl-status (or GET /v2/crawl/{id}) to check the status
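
A caller usually wants to branch on this envelope before using the content. The sketch below shows one way to do that. Only `success` and `data` are documented above: the `id` field for the crawl job ID and the `error` field for failures are assumed names, and `unwrap_response` is a hypothetical helper.

```python
import json
from typing import Any, Tuple


def unwrap_response(raw: str) -> Tuple[str, Any]:
    """Return ("data", content) or ("job", job_id); raise if success is false."""
    body = json.loads(raw)
    if not body.get("success"):
        # "error" is an assumed field name for the failure message.
        raise RuntimeError(f"Firecrawl call failed: {body.get('error', 'no error message')}")
    if "data" in body:
        return ("data", body["data"])
    # Assumed: a freshly started crawl carries its job ID under "id".
    return ("job", body.get("id"))


kind, value = unwrap_response('{"success": true, "data": {"markdown": "# Title"}}')
print(kind, value)
```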

License: MIT (a permissive license, so the full text is quoted here)


Source: https://github.com/benedictking/firecrawl-scraper / License: MIT