playwright-scraper

Name: playwright-scraper
Author: alphaonedev

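A skill for web scraping with Playwright. It covers fetching dynamic content, handling authentication flows, automating pagination, extracting data, and taking screenshots. Use it to collect information from sites that need JavaScript rendering or from pages behind a login.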

Original description:

Playwright web scraping: dynamic content, auth flows, pagination, data extraction, screenshots

SKILL.md contents

playwright-scraper

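Purpose

This skill performs web scraping with the Node.js version of Playwright, a browser automation library. It focuses on dynamic content, authentication flows, pagination, data extraction, and screenshots, so that modern, JavaScript-heavy websites can be scraped reliably.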

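When to use

Use this skill for sites whose content is rendered by JavaScript (for example React or Angular apps), sites that require a login (such as dashboards), tasks that must walk through multi-page results (such as search results), or when you need to capture visual data (such as screenshots for verification). Avoid it for static HTML sites, where a simpler HTTP client such as request is enough.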

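Key features

Loads and interacts with dynamic content through Playwright's browser control.
Manages authentication flows, such as login forms and API-token logins.
Handles pagination by navigating between pages, clicking "Next" buttons, or parsing URLs.
Extracts data with selectors and offers JSON output or saving to a file.
Captures screenshots or full-page PDFs for debugging or reporting.
Supports headless or headed (visible) browser mode for flexibility.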
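The screenshot and PDF capture listed above can be wrapped in a small helper. This is a minimal sketch, assuming a Playwright Page object is passed in; the name captureArtifacts and its file-naming scheme are hypothetical. page.screenshot({ path, fullPage }) and page.pdf({ path, format, printBackground }) are Playwright calls, but page.pdf() only works in headless Chromium, so the PDF step is opt-in.

```javascript
const path = require('path');

// Hypothetical helper: capture a full-page screenshot and, optionally, a PDF.
// `page` is assumed to be a Playwright Page. page.pdf() is only supported in
// headless Chromium, so callers should pass `pdf: true` only in that case.
async function captureArtifacts(page, outDir, baseName, { pdf = false } = {}) {
  const screenshotPath = path.join(outDir, `${baseName}.png`);
  // fullPage: true captures the whole scrollable page, not just the viewport.
  await page.screenshot({ path: screenshotPath, fullPage: true });

  const written = [screenshotPath];
  if (pdf) {
    const pdfPath = path.join(outDir, `${baseName}.pdf`);
    // printBackground keeps CSS backgrounds, which are omitted by default.
    await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
    written.push(pdfPath);
  }
  return written; // paths of the files that were requested
}
```

For example, after navigating a page in headless Chromium, await captureArtifacts(page, 'out', 'dashboard', { pdf: true }) writes dashboard.png and dashboard.pdf under out/.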

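Usage patterns

Always initialize a browser context first, then create pages from it for navigation. Playwright's Node.js API is promise-based, so await every call inside async functions. For authenticated scraping, handle cookies or sessions per context; each browser context has its own isolated cookie store, so sessions do not leak between contexts. Structure scripts as a loop for pagination, and wrap interactions with flaky elements in try/catch. For reusability, pass configuration through a JSON file or environment variables.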
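The browser, context, page order described above, with cleanup guaranteed, can be sketched as a helper. This assumes browserType is a Playwright browser type such as chromium; the name withPage and its option shape are hypothetical. Each call gets its own BrowserContext, so cookies and sessions from one run are not visible to another.

```javascript
// Hypothetical helper: launch a browser, open an isolated context and a page,
// run `task(page, context)`, and always close the browser afterwards.
// `browserType` is assumed to be a Playwright BrowserType (e.g. chromium).
async function withPage(browserType, { launch = {}, context = {} } = {}, task) {
  const browser = await browserType.launch({ headless: true, ...launch });
  try {
    // A fresh context has its own cookies and storage, isolated from others.
    const ctx = await browser.newContext(context);
    const page = await ctx.newPage();
    return await task(page, ctx);
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}
```

Usage, inside an async function: const { chromium } = require('playwright'); const title = await withPage(chromium, {}, async (page) => { await page.goto('https://example.com'); return page.title(); });. Passing { launch: { headless: false } } opens a visible browser instead.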

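Common commands/API

This skill uses Playwright's Node.js API. Install it with npm install playwright; if the browser binaries are missing, download them with npx playwright install. The key calls are (run them inside an async function, since every Playwright call returns a promise):

Launch a browser: const { chromium } = require('playwright'); const browser = await chromium.launch({ headless: true });
Navigate to a page: const page = await browser.newPage(); await page.goto('https://example.com');
Log in: await page.fill('#username', process.env.PLAYWRIGHT_USERNAME); await page.fill('#password', process.env.PLAYWRIGHT_PASSWORD); await page.click('#login'); (Windows already defines USERNAME as the OS account name, so prefixed variable names avoid picking it up by accident.)
Extract data: const data = await page.evaluate(() => document.querySelector('#target').innerText); console.log(data);
Paginate: while (await page.$('#next-button')) { /* extract the current page's .item elements here */ await page.click('#next-button'); await page.waitForLoadState(); } This suits pagination that loads a new page. Waiting for '.item' after the click can resolve immediately because the previous page's items still match, and the loop never ends if the last page keeps a disabled "Next" button.
Take a screenshot: await page.screenshot({ path: 'screenshot.png' }); (add fullPage: true to capture the entire scrollable page).
CLI flags: --headed (visible browser) and --timeout 30000 (timeout in milliseconds) are options of the Playwright Test runner, npx playwright test. A standalone script run with node does not read them; pass headless: false to launch() and call page.setDefaultTimeout(30000) instead.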
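The login calls above can be combined into one helper that fails fast on missing credentials and detects a failed login. This is a sketch under assumptions: it reads the PLAYWRIGHT_USERNAME and PLAYWRIGHT_PASSWORD variables recommended in the integration notes, the selectors (#username, #password, #login, #error-message) come from this document's examples, and successSelector is a hypothetical parameter naming an element that appears only after a successful login. Adjust all of them for the target site.

```javascript
// Hypothetical helper: read credentials from the environment and fail fast
// if either variable is missing.
function readCredentials(env = process.env) {
  const username = env.PLAYWRIGHT_USERNAME;
  const password = env.PLAYWRIGHT_PASSWORD;
  if (!username || !password) {
    throw new Error('Set PLAYWRIGHT_USERNAME and PLAYWRIGHT_PASSWORD');
  }
  return { username, password };
}

// Fill the login form, then wait until either the success marker or the error
// element appears. `page` is assumed to be a Playwright Page.
async function login(page, { username, password }, successSelector, {
  errorSelector = '#error-message',
  timeout = 10000,
} = {}) {
  if (!successSelector) throw new Error('successSelector is required');
  await page.fill('#username', username);
  await page.fill('#password', password);
  await page.click('#login');
  // A CSS selector list matches whichever of the two elements shows up first.
  await page.waitForSelector(`${successSelector}, ${errorSelector}`, { timeout });
  if (await page.$(errorSelector)) {
    throw new Error('Login failed');
  }
}
```

Usage, inside an async function: await login(page, readCredentials(), '#dashboard-data');. Waiting for either outcome avoids reading the page before the post-login content has rendered.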

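Integration notes

Integrate the skill by importing Playwright into a Node.js project. For authentication, read credentials from environment variables such as $PLAYWRIGHT_USERNAME and $PLAYWRIGHT_PASSWORD rather than hard-coding them. Configuration format: use a JSON file, for example { "url": "https://target.com", "selector": "#data-element" }, and pass it as a script argument: node scraper.js --config config.json. In larger systems, chain the scraper with tools such as Puppeteer (when migrating between the two) or export the page.evaluate results to a database. Check that your Node.js version meets the minimum required by the installed Playwright release (current releases no longer support Node.js 14), and configure a proxy when launching the browser: chromium.launch({ proxy: { server: 'http://myproxy.com:8080' } }).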
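The --config convention above can be implemented with Node's standard library alone. This is a minimal sketch: the loadConfig name, the required url and selector keys, the optional headless key, and the PLAYWRIGHT_PROXY environment variable are assumptions for illustration, not part of Playwright. The returned launchOptions object is shaped for chromium.launch(), whose proxy.server option is a real Playwright setting.

```javascript
const fs = require('fs');

// Hypothetical loader for: node scraper.js --config config.json
// Reads the JSON file named after --config and checks the keys the
// examples in this document rely on.
function loadConfig(argv = process.argv.slice(2), env = process.env) {
  const i = argv.indexOf('--config');
  if (i === -1 || !argv[i + 1]) {
    throw new Error('Usage: node scraper.js --config <file.json>');
  }
  const config = JSON.parse(fs.readFileSync(argv[i + 1], 'utf8'));
  for (const key of ['url', 'selector']) {
    if (typeof config[key] !== 'string' || config[key] === '') {
      throw new Error(`Config is missing "${key}"`);
    }
  }
  // Headless unless the (assumed) config key "headless" is explicitly false.
  const launchOptions = { headless: config.headless !== false };
  // Optional proxy, e.g. PLAYWRIGHT_PROXY=http://myproxy.com:8080 (assumed name).
  if (env.PLAYWRIGHT_PROXY) {
    launchOptions.proxy = { server: env.PLAYWRIGHT_PROXY };
  }
  return { ...config, launchOptions };
}
```

A scraper might then call const config = loadConfig(); const browser = await chromium.launch(config.launchOptions); and use config.url and config.selector for navigation and extraction.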

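Error handling

Anticipate common errors such as timeouts while dynamic content loads and selectors that match nothing. Give page.waitForSelector an explicit timeout: await page.waitForSelector('#element', { timeout: 10000 }).catch(err => console.error('Element not found:', err));. For network problems, wrap page.goto in try/catch and rethrow after cleanup so the script does not carry on with a closed browser: try { await page.goto(url, { waitUntil: 'networkidle' }); } catch (e) { console.error('Navigation failed:', e.message); await browser.close(); throw e; }. Detect failed logins by checking for an error element: if (await page.$('#error-message')) { throw new Error('Login failed'); }. Log the details, and retry failed operations in a loop, up to 3 times.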
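The "retry up to 3 times" advice can be written as a generic helper. This is a sketch: withRetry, its options, and the linear backoff are illustrative choices, not a Playwright API.

```javascript
// Hypothetical helper: run `operation` up to `attempts` times, logging each
// failure and waiting a little longer before every retry (linear backoff).
// If every attempt fails, the last error is rethrown to the caller.
async function withRetry(operation, { attempts = 3, delayMs = 1000, log = console.error } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      log(`Attempt ${attempt}/${attempts} failed: ${err.message}`);
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

For navigation this becomes await withRetry(() => page.goto(url, { waitUntil: 'networkidle' }));. The helper leaves the browser open, so close it in a finally block around the whole scrape rather than inside each attempt.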

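Worked examples

Scraping a logged-in dashboard: first set the environment variables: export PLAYWRIGHT_USERNAME='user@example.com' and export PLAYWRIGHT_PASSWORD='securepass'. Then run, inside an async function: const { chromium } = require('playwright'); const browser = await chromium.launch(); const page = await browser.newPage(); await page.goto('https://dashboard.com/login'); await page.fill('#username', process.env.PLAYWRIGHT_USERNAME); await page.fill('#password', process.env.PLAYWRIGHT_PASSWORD); await page.click('#submit'); await page.waitForSelector('#dashboard-data'); const data = await page.evaluate(() => document.querySelector('#dashboard-data').innerText); console.log(data); await browser.close(); This waits for the dashboard to render after login and then extracts data from the protected page.
Paginating search results: script (also inside an async function): const { chromium } = require('playwright'); const browser = await chromium.launch(); const page = await browser.newPage(); await page.goto('https://search.com?q=query'); const items = []; while (true) { items.push(...await page.$$eval('.result-item', elements => elements.map(el => el.innerText))); const nextButton = await page.$('#next-page'); if (!nextButton || await nextButton.isDisabled()) break; await nextButton.click(); await page.waitForTimeout(2000); } console.log(items); await browser.close(); This collects results across multiple pages and stops when the "Next" control is missing or disabled. The fixed 2-second wait is simple but fragile; Playwright's documentation recommends waitForTimeout only for debugging.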
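A sturdier version of the pagination loop waits for the result list to actually change instead of sleeping for a fixed 2 seconds, and caps the number of pages. This is a sketch assuming a Playwright Page and the .result-item and #next-page selectors from the example; collectPages and maxPages are hypothetical names. It relies on page.waitForFunction(fn, arg, options), which re-evaluates fn in the browser until it returns a truthy value.

```javascript
// Hypothetical helper: collect the text of every item across paginated results.
// Stops when the "next" control is missing or disabled, or after maxPages pages.
async function collectPages(page, {
  itemSelector = '.result-item',
  nextSelector = '#next-page',
  maxPages = 50,
  timeout = 10000,
} = {}) {
  const items = [];
  for (let pageNo = 1; ; pageNo++) {
    const texts = await page.$$eval(itemSelector, (els) => els.map((el) => el.innerText));
    items.push(...texts);
    if (pageNo >= maxPages) break;

    const next = await page.$(nextSelector);
    if (!next || (await next.isDisabled())) break;

    const firstBefore = texts[0] ?? null;
    await next.click();
    // Wait, in the browser, until the first item's text differs from the one
    // read before the click, i.e. the next page of results has rendered.
    await page.waitForFunction(
      ([sel, prev]) => {
        const first = document.querySelector(sel);
        return first !== null && first.innerText !== prev;
      },
      [itemSelector, firstBefore],
      { timeout },
    );
  }
  return items;
}
```

After await page.goto('https://search.com?q=query'), calling const items = await collectPages(page, { maxPages: 10 }); returns the texts from up to ten pages. If consecutive pages can begin with the same text, this check would time out, so wait on another signal such as a page-number element instead.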

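Graph relationships

Related: "selenium-automation" (alternative browser automation tool)
Depends on: "node-runtime" (to run Playwright)
Complements: "data-extraction" (post-processing of scraped data)
In cluster: "community" (shared with other open-source tools)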

License: MIT (quoted in full because it is a permissive license) · Original repository

Details

Author: alphaonedev
Repository: alphaonedev/openclaw-graph
License: MIT
Last updated: unknown


Source: https://github.com/alphaonedev/openclaw-graph / License: MIT

Related skills

doubt-driven-development

apprun-skills

desloppify

debugging-and-error-recovery

test-driven-development

incremental-implementation