汎用LLM・AI開発⭐ リポ 2品質スコア 59/100

seed-prover-automated-theorem-proving

Name: seed-prover-automated-theorem-proving
Author: ADu2021

形式検証フィードバックと長鎖の思考プロセスを組み合わせたフレームワークです。自動定理証明のための深く幅広い数学推論を実現します。補題ベースの改善とテスト時推論戦略により、形式化されたIMO問題で78.1%の成果を達成します。

description の原文を見る

Framework combining formal verification feedback with long chain-of-thought reasoning to enable deep and broad mathematical reasoning for automated theorem proving. Achieves 78.1% on formalized IMO problems through lemma-based refinement and test-time inference strategies.

SKILL.md 本文

Seed-Prover: 自動定理証明による形式的検証

Seed-Prover は、強化学習と Lean 証明支援系からの形式的検証フィードバックを組み合わせることで、自動定理証明における大きな進歩を実現しています。このシステムにより、言語モデルが複数の推論戦略を通じて数学的証明を反復的に改良でき、形式的数学の最先端パフォーマンスを達成しています。

コアコンセプト

重要な洞察は、形式的検証が明確で曖昧性のない教師信号を提供することです。証明は Lean で受け入れられるか（正しい）、または具体的なエラーを示して改良を導きます。Seed-Prover は言語モデルの出力だけに頼るのではなく、この構造化されたフィードバックを使用して以下を実現します:

Lean エラーメッセージに基づいて証明を反復的に改良
以前に証明された補題を活用して段階的により複雑な証明を構築
自己要約を適用して本質的な証明構造を抽出
3つのテスト時推論戦略を採用して深さ（単一証明の探索）と広さ（複数証明の試行）の両方の推論を実現

アーキテクチャ概要

システムは以下のコンポーネントで構成されています:

証明生成モジュール: 言語モデルが Lean 形式構文で証明を生成
形式的検証ループ: Lean 型チェッカーが具体的なエラーメッセージ付きで二値フィードバック（受理/拒否）を提供
補題ライブラリ: 以前に証明された定理のリポジトリで、合成的な証明構築を実現
改良エージェント: 検証フィードバックを受け取り、証明を再構成
テスト時戦略:
- 深い推論: 単一の証明ブランチを詳細な探索で拡張
- 広い推論: 複数の証明仮説を並列生成
- ハイブリッド: 深さと広さの戦略を適応的に組み合わせ

実装ステップ

ステップ 1: 形式的検証インフラストラクチャのセットアップ

システムは Lean 証明支援系との統合を必要とし、証明の試行に対して構造化されたフィードバックを提供します:

import subprocess
import json
from typing import Tuple

class LeanVerifier:
    """Wrapper for Lean formal verification"""
    def __init__(self, lean_path: str = "lean"):
        self.lean_path = lean_path

    def verify_proof(self, theorem: str, proof: str) -> Tuple[bool, str]:
        """
        Verify a proof in Lean and return status and feedback.

        Args:
            theorem: The theorem statement in Lean syntax
            proof: The proof attempt in Lean syntax

        Returns:
            (is_valid, feedback) where is_valid is bool and feedback is error msg or success
        """
        lean_code = f"{theorem}\n{proof}"

        try:
            result = subprocess.run(
                [self.lean_path, "--stdin"],
                input=lean_code.encode(),
                capture_output=True,
                timeout=5
            )

            if result.returncode == 0:
                return True, "Proof accepted"
            else:
                error_msg = result.stderr.decode()
                return False, error_msg

        except subprocess.TimeoutExpired:
            return False, "Proof verification timeout"

LeanVerifier クラスは Lean との通信を処理し、証明が型チェックされたかどうかをキャプチャし、改良を導く具体的なエラーメッセージを返します。

ステップ 2: 補題ライブラリと証明状態トラッカーの実装

以前に証明された補題のデータベースを保持して、合成的な証明構築を実現します:

class LemmaLibrary:
    """Stores and retrieves previously proved lemmas"""
    def __init__(self):
        self.lemmas: dict[str, str] = {}  # name -> proof code
        self.theorem_statements: dict[str, str] = {}  # name -> statement

    def add_lemma(self, name: str, theorem: str, proof: str):
        """Store a successfully verified lemma"""
        self.lemmas[name] = proof
        self.theorem_statements[name] = theorem

    def get_applicable_lemmas(self, goal: str, top_k: int = 5) -> list[Tuple[str, str]]:
        """
        Retrieve lemmas relevant to current goal using semantic matching.
        In practice, use embeddings to find related lemmas.
        """
        # Simplified: could use embedding similarity in production
        relevant = []
        for name, statement in self.theorem_statements.items():
            if self._relevance_score(statement, goal) > 0.7:
                relevant.append((name, self.lemmas[name]))

        return relevant[:top_k]

    def _relevance_score(self, lemma: str, goal: str) -> float:
        """Simple relevance scoring; use embeddings in production"""
        # Placeholder for semantic similarity computation
        common_terms = set(lemma.split()) & set(goal.split())
        return len(common_terms) / max(len(lemma.split()), 1)

このコンポーネントはシステムが以前の成功に基づいて構築でき、冗長な証明探索を削減することを可能にします。

ステップ 3: 検証フィードバックによる改良の実装

Lean のエラーメッセージを使用して証明の再定式化を導きます:

class ProofRefinement:
    """Refines proofs based on verification feedback"""
    def __init__(self, llm, verifier: LeanVerifier, lemma_library: LemmaLibrary):
        self.llm = llm  # Language model for generation
        self.verifier = verifier
        self.lemma_library = lemma_library

    def refine_proof(self, theorem: str, proof_attempt: str,
                    feedback: str, context_lemmas: list[str]) -> str:
        """
        Given a failed proof and Lean's error message, generate improved version.

        Args:
            theorem: The theorem statement
            proof_attempt: Previous attempt that failed
            feedback: Error message from Lean
            context_lemmas: Available lemmas to use

        Returns:
            Refined proof attempt
        """
        prompt = f"""You are a formal proof assistant. Fix the following Lean proof.

Theorem: {theorem}

Previous proof attempt:
{proof_attempt}

Error feedback:
{feedback}

Available lemmas:
{chr(10).join(context_lemmas)}

Generate a corrected proof:"""

        refined = self.llm.generate(prompt, max_tokens=1500)
        return refined

    def iterative_refinement(self, theorem: str, max_iterations: int = 5) -> Tuple[bool, str]:
        """
        Iteratively refine proof until it verifies or max iterations reached.
        """
        # Get relevant lemmas
        lemmas = self.lemma_library.get_applicable_lemmas(theorem)
        lemma_strs = [f"-- {name}: {stmt}" for name, stmt in lemmas]

        # Generate initial proof
        proof = self.llm.generate(f"Prove: {theorem}", max_tokens=1500)

        for iteration in range(max_iterations):
            is_valid, feedback = self.verifier.verify_proof(theorem, proof)

            if is_valid:
                return True, proof

            # Refine based on feedback
            proof = self.refine_proof(theorem, proof, feedback, lemma_strs)

        return False, proof

この改良ループは Seed-Prover の核であり、形式的検証信号を使用して反復的な改善を導きます。

ステップ 4: テスト時推論戦略の実装

深さと広さの両方の推論モードを実装します:

class TestTimeStrategies:
    """Multiple reasoning strategies at inference time"""
    def __init__(self, prover: ProofRefinement):
        self.prover = prover

    def deep_reasoning(self, theorem: str, max_depth: int = 10) -> Tuple[bool, str]:
        """
        Deep reasoning: explore a single proof path exhaustively.
        Uses extended chain-of-thought with detailed intermediate steps.
        """
        prompt = f"""Prove this theorem with detailed step-by-step reasoning:

{theorem}

Provide extensive intermediate steps and justifications. Show your complete reasoning process."""

        proof = self.prover.llm.generate(prompt, max_tokens=3000)  # Extended context
        is_valid, _ = self.prover.verifier.verify_proof(theorem, proof)
        return is_valid, proof

    def broad_reasoning(self, theorem: str, num_attempts: int = 3) -> Tuple[bool, str]:
        """
        Broad reasoning: generate multiple proof hypotheses in parallel.
        Returns the first valid proof found.
        """
        for attempt in range(num_attempts):
            prompt = f"""Generate a proof for: {theorem}

Attempt {attempt + 1} - Try a different approach from previous attempts."""

            proof = self.prover.llm.generate(prompt, max_tokens=1500)
            is_valid, _ = self.prover.verifier.verify_proof(theorem, proof)

            if is_valid:
                return True, proof

        return False, None

    def hybrid_reasoning(self, theorem: str) -> Tuple[bool, str]:
        """
        Hybrid: apply broad reasoning first to find viable approaches,
        then deep reasoning to refine the best candidate.
        """
        # Phase 1: Broad search for viable approaches
        is_valid, candidate = self.broad_reasoning(theorem, num_attempts=2)

        if not is_valid:
            # Phase 2: If broad search fails, apply deep reasoning
            return self.deep_reasoning(theorem)

        # Phase 3: Refine the candidate with deep reasoning
        theorem_with_hint = f"{theorem}\n-- Hint: Start with the approach above"
        return self.deep_reasoning(theorem_with_hint)

これらの戦略により、システムは証明の複雑さに基づいて推論の深さを適応させることができます。

ステップ 5: 証明圧縮のための自己要約の統合

システムが主要な証明構造を抽出および要約することを可能にします:

class ProofSummarization:
    """Summarizes proofs to extract essential structures"""
    def __init__(self, llm):
        self.llm = llm

    def summarize_proof(self, theorem: str, proof: str) -> str:
        """
        Extract high-level proof structure, removing low-level tactics.
        """
        prompt = f"""Summarize the key logical steps of this proof, ignoring low-level Lean tactics:

Theorem: {theorem}

Proof:
{proof}

Provide a high-level summary of the proof's logical structure:"""

        summary = self.llm.generate(prompt, max_tokens=500)
        return summary

    def extract_subgoals(self, proof: str) -> list[str]:
        """
        Parse proof to extract intermediate subgoals.
        These become candidates for lemmatization.
        """
        # Simple heuristic: find "have" and "show" statements
        lines = proof.split('\n')
        subgoals = []

        for line in lines:
            if 'have ' in line or 'show ' in line:
                subgoals.append(line.strip())

        return subgoals

これにより、中間的な結果が補題としてキャプチャされる反復的な証明構築が可能になります。

実践的なガイダンス

Seed-Prover を使用するべき場合:

Lean で十分に定義された形式数学問題（最適なケース）
競技問題（IMO スタイル）で豊かな数学構造を持つもの
補題の再利用が重要な価値を持つ多段階推論
証明探索が複数の異なる試行から利益を得る場合

Seed-Prover を使用すべきでない場合:

非形式的な自然言語数学推論
形式的検証インフラストラクチャのないドメイン
反復的な改良が遅すぎるリアルタイムアプリケーション
単純な記号計算で解決可能な問題

主要なハイパーパラメータと調整:

max_iterations: 改良の深さを制御（通常 5～10 で十分）
num_attempts（広い推論）: 探索コストと成功率のバランス（2～5 試行）
max_depth（深い推論）: 詳細な推論と計算のトレードオフ
補題ライブラリサイズ: より多くの補題は役立ちますが、取得が遅くなります。トップ k 関連性を維持
Lean 検証のタイムアウト: 標準 5 秒。複雑な証明では増加

予想されるパフォーマンス特性:

IMO 問題: 最適な構成で約 78% の成功率
平均して証明あたり約 1.5～3 回の検証呼び出しが必要
各反復は通常、漸進的な進歩をもたらします（ソリューションへのジャンプではない）
関連する補題ライブラリでパフォーマンスが大幅に向上（26～35% の改善が観測）

参考文献

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving. arXiv:2507.23726

ライセンス: MIT(寛容ライセンスのため全文を引用しています) · 原本リポジトリ

詳細情報

作者: ADu2021
リポジトリ: ADu2021/skillXiv
ライセンス: MIT
最終更新: 2026/3/26

GitHubで原本を見る →フィードバックを送る

Source: https://github.com/ADu2021/skillXiv / ライセンス: MIT

seed-prover-automated-theorem-proving

SKILL.md 本文

Seed-Prover: 自動定理証明による形式的検証

コアコンセプト

アーキテクチャ概要

実装ステップ

実践的なガイダンス

参考文献

詳細情報

関連スキル

agent-browser

anyskill

engram

skyvern

pinchbench

openui