
agent-memory-systems

Name: agent-memory-systems
Author: sickn33

Description: "Memory is the cornerstone of intelligent agents. Without it, every interaction starts from zero. This skill covers the architecture of agent memory: short-term (context window), long-term (vector stores), and the cognitive architectures that organize them."

SKILL.md body

Agent Memory Systems

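Memory is the cornerstone of intelligent agents. Without it, every interaction starts from zero. This skill covers the architecture of agent memory: short-term memory (context window), long-term memory (vector stores), and the cognitive architectures that organize them.

Key insight: memory is not just storage, it is retrieval. A million stored facts mean nothing if the agent cannot find the right one. Chunking, embedding, and retrieval strategies determine whether an agent remembers or forgets.

The field is fragmented by inconsistent terminology. This skill uses the CoALA cognitive architecture framework: semantic memory (facts), episodic memory (experiences), and procedural memory (how-to knowledge).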

Principles

Memory quality = retrieval quality, not storage quantity
Chunk for retrieval, not for storage
Context isolation is the enemy of memory
Right memory type for right information
Decay old memories - not everything should be forever
Test retrieval accuracy before production
Background memory formation beats real-time

Capabilities

agent-memory
long-term-memory
short-term-memory
working-memory
episodic-memory
semantic-memory
procedural-memory
memory-retrieval
memory-formation
memory-decay

Scope

vector-database-operations → data-engineer
rag-pipeline-architecture → llm-architect
embedding-model-selection → ml-engineer
knowledge-graph-design → knowledge-engineer

Tooling

Memory_frameworks

LangMem (LangChain) - When: LangGraph agents with persistent memory Note: Semantic, episodic, procedural memory types
MemGPT / Letta - When: Virtual context management, OS-style memory Note: Hierarchical memory tiers, automatic paging
Mem0 - When: User memory layer for personalization Note: Designed for user preferences and history

Vector_stores

Pinecone - When: Managed, enterprise-scale (billions of vectors) Note: Best query performance, highest cost
Qdrant - When: Complex metadata filtering, open-source Note: Rust-based, excellent filtering
Weaviate - When: Hybrid search, knowledge graph features Note: GraphQL interface, good for relationships
ChromaDB - When: Prototyping, small/medium apps Note: Developer-friendly, ~20ms p50 at 100K vectors
pgvector - When: Already using PostgreSQL, simpler setup Note: Good for <1M vectors, familiar tooling

Embedding_models

OpenAI text-embedding-3-large - When: Best quality, 3072 dimensions Note: $0.13/1M tokens
OpenAI text-embedding-3-small - When: Good balance, 1536 dimensions Note: $0.02/1M tokens, 5x cheaper
nomic-embed-text-v1.5 - When: Open-source, local deployment Note: 768 dimensions, good quality
all-MiniLM-L6-v2 - When: Lightweight, fast local embedding Note: 384 dimensions, lowest latency

Patterns

Memory Type Architecture

Choose the right memory type for each kind of information

When to use: Designing agent memory system

MEMORY TYPE ARCHITECTURE (CoALA Framework):

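Three memory types for different purposes:

1. Semantic Memory: facts and knowledge
   - What you know about the world
   - User preferences, domain knowledge
   - Stored in profiles (structured) or collections (unstructured)

2. Episodic Memory: experiences and events
   - What happened (timestamped events)
   - Past conversations, task outcomes
   - Used to learn from experience

3. Procedural Memory: how-to knowledge
   - Rules, skills, workflows
   - Often implemented as few-shot examples
   - "How did I solve this before?"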

LangMem Implementation

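import os
from datetime import datetime

# NOTE: `MemoryStore` and its semantic/episodic/procedural accessors are shown
# here as an illustrative interface; verify the names against the LangMem
# version you install before relying on them.
from langmem import MemoryStore
from langgraph.graph import StateGraph

# Initialize memory store
memory = MemoryStore(
    connection_string=os.environ["POSTGRES_URL"]
)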

# Semantic memory: user profile
await memory.semantic.upsert(
    namespace="user_profile",
    key=user_id,
    content={
        "name": "Alice",
        "preferences": ["dark mode", "concise responses"],
        "expertise_level": "developer",
    }
)

# Episodic memory: past interaction
await memory.episodic.add(
    namespace="conversations",
    content={
        "timestamp": datetime.now(),
        "summary": "Helped debug authentication issue",
        "outcome": "resolved",
        "key_insights": ["Token expiry was root cause"],
    },
    metadata={"user_id": user_id, "topic": "debugging"}
)

# Procedural memory: learned pattern
await memory.procedural.add(
    namespace="skills",
    content={
        "task_type": "debug_auth",
        "steps": ["Check token expiry", "Verify refresh flow"],
        "example_interaction": few_shot_example,
    }
)

Memory Retrieval at Runtime

async def prepare_context(user_id, query):
    # Get user profile (semantic)
    profile = await memory.semantic.get(
        namespace="user_profile",
        key=user_id
    )

    # Find relevant past experiences (episodic)
    similar_experiences = await memory.episodic.search(
        namespace="conversations",
        query=query,
        filter={"user_id": user_id},
        limit=3
    )

    # Find relevant skills (procedural)
    relevant_skills = await memory.procedural.search(
        namespace="skills",
        query=query,
        limit=2
    )

    return {
        "profile": profile,
        "past_experiences": similar_experiences,
        "relevant_skills": relevant_skills,
    }

Vector Store Selection Pattern

Choose the right vector database for your use case

When to use: Setting up persistent memory storage

VECTOR STORE SELECTION:

Decision matrix:

|            | Pinecone | Qdrant | Weaviate | ChromaDB | pgvector |
|------------|----------|--------|----------|----------|----------|
| Scale      | Billions | 100M+  | 100M+    | 1M       | 1M       |
| Managed    | Yes      | Both   | Both     | Self     | Self     |
| Filtering  | Basic    | Best   | Good     | Basic    | SQL      |
| Hybrid     | No       | Yes    | Best     | No       | Yes      |
| Cost       | High     | Medium | Medium   | Free     | Free     |
| Latency    | 5ms      | 7ms    | 10ms     | 20ms     | 15ms     |

Pinecone (Enterprise Scale)

import os
from datetime import datetime
from uuid import uuid4

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("agent-memory")

# Upsert with metadata
index.upsert(
    vectors=[
        {
            "id": f"memory-{uuid4()}",
            "values": embedding,
            "metadata": {
                "user_id": user_id,
                "timestamp": datetime.now().isoformat(),
                "type": "episodic",
                "content": memory_text,
            }
        }
    ],
    namespace=namespace
)

# Query with filter
results = index.query(
    vector=query_embedding,
    filter={"user_id": user_id, "type": "episodic"},
    top_k=5,
    include_metadata=True
)

Qdrant (Complex Filtering)

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny, MatchValue

client = QdrantClient(url="http://localhost:6333")

# Complex filtering with Qdrant
# (newer qdrant-client releases deprecate `search` in favor of `query_points`)
results = client.search(
    collection_name="agent_memory",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=user_id)),
            FieldCondition(key="type", match=MatchValue(value="semantic")),
        ],
        should=[
            FieldCondition(key="topic", match=MatchAny(any=["auth", "security"])),
        ]
    ),
    limit=5
)

ChromaDB (Prototyping)

from uuid import uuid4

import chromadb

client = chromadb.PersistentClient(path="./memory_db")
collection = client.get_or_create_collection("agent_memory")

# Simple and fast for prototypes
collection.add(
    ids=[str(uuid4())],
    embeddings=[embedding],
    documents=[memory_text],
    metadatas=[{"user_id": user_id, "type": "episodic"}]
)

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"user_id": user_id}
)

Chunking Strategy Pattern

Split documents into retrievable chunks

When to use: Processing documents for memory storage

CHUNKING STRATEGIES:

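The chunking dilemma:
- Too large: the vector loses specificity
- Too small: the chunk loses context

The optimal chunk size depends on:
- Document type (code vs. prose vs. data)
- Query patterns (factual vs. exploratory)
- Embedding model (each has its own sweet spot)

General guidance: 256-512 tokens for most use cases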
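
The token-based guidance above can be sketched as a sliding window with overlap. This is a minimal sketch, assuming whitespace-separated words stand in for model tokens (a real tokenizer will count differently); `chunk_by_tokens` and its parameters are hypothetical names, not part of any library.

```python
def chunk_by_tokens(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into windows of `chunk_size` tokens that share `overlap` tokens.

    Assumption: whitespace-separated words approximate tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

Swapping `text.split()` for the tokenizer of the embedding model in use would make the sizes match what the model actually sees.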

Fixed-Size Chunking (Baseline)

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Characters
    chunk_overlap=50,    # Overlap prevents cutting sentences
    separators=["\n\n", "\n", ". ", " ", ""]  # Priority order
)

chunks = splitter.split_text(document)

Semantic Chunking (Better Quality)

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Splits based on semantic similarity
splitter = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95
)

chunks = splitter.split_text(document)

Structure-Aware Chunking (Documents with Hierarchy)

from langchain_text_splitters import MarkdownHeaderTextSplitter

# Respect document structure
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)

chunks = splitter.split_text(markdown_doc)
# Each chunk has header metadata for context

Contextual Chunking (Anthropic's Approach)

# Add context to each chunk before embedding
# Reduces retrieval failures by 35%

def add_context_to_chunk(chunk, document_summary):
    context_prompt = f'''
    Document summary: {document_summary}

    The following is a chunk from this document:
    {chunk}
    '''
    return context_prompt

# Embed the contextualized chunk, not the raw chunk
for chunk in chunks:
    contextualized = add_context_to_chunk(chunk, summary)
    embedding = embed(contextualized)
    store(chunk, embedding)  # Store original, embed contextualized

Code-Specific Chunking

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Language-aware splitting
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=200
)

# Respects function/class boundaries
chunks = python_splitter.split_text(python_code)

Background Memory Formation

Process memories asynchronously for higher quality

When to use: You want higher recall without slowing interactions

BACKGROUND MEMORY FORMATION:

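Real-time memory extraction slows down conversations and adds
complexity to agent tool calls. Background processing after the
conversation ends produces higher-quality memories.

Pattern: subconscious memory formation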

LangGraph Background Processing

from langgraph.graph import StateGraph
from langgraph.checkpoint.postgres import PostgresSaver

async def background_memory_processor(thread_id: str):
    # Run after conversation ends or goes idle
    conversation = await load_conversation(thread_id)

    # Extract insights without time pressure
    # f-string so that {conversation} is actually interpolated into the prompt
    insights = await llm.invoke(f'''
        Analyze this conversation and extract:
        1. Key facts learned about the user
        2. User preferences revealed
        3. Tasks completed or pending
        4. Patterns in user behavior

        Be thorough - this runs in background.

        Conversation:
        {conversation}
    ''')

    # Store to long-term memory
    for insight in insights:
        await memory.semantic.upsert(
            namespace="user_insights",
            key=generate_key(insight),
            content=insight,
            metadata={"source_thread": thread_id}
        )

# Trigger on conversation end or idle timeout
@on_conversation_idle(timeout_minutes=5)
async def process_conversation(thread_id):
    await background_memory_processor(thread_id)
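
The `@on_conversation_idle` decorator above is not defined in this document. One way such a trigger could work is a per-thread timer that restarts on every message. The sketch below uses only `asyncio`; `IdleTrigger`, `touch`, and `timeout_s` are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable


class IdleTrigger:
    """Run `callback(thread_id)` once a thread has been quiet for `timeout_s` seconds."""

    def __init__(self, callback: Callable[[str], Awaitable[None]], timeout_s: float):
        self._callback = callback
        self._timeout_s = timeout_s
        self._timers: dict[str, asyncio.Task] = {}

    def touch(self, thread_id: str) -> None:
        """Call on every new message; restarts the idle countdown for the thread."""
        existing = self._timers.get(thread_id)
        if existing is not None:
            existing.cancel()  # the conversation is still active
        self._timers[thread_id] = asyncio.create_task(self._wait_then_fire(thread_id))

    async def _wait_then_fire(self, thread_id: str) -> None:
        await asyncio.sleep(self._timeout_s)
        self._timers.pop(thread_id, None)
        await self._callback(thread_id)
```

Under these assumptions, the message handler would call `trigger.touch(thread_id)` on each incoming message, with `background_memory_processor` passed as the callback and `timeout_s=300` for a five-minute idle window.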

Memory Consolidation (Like Sleep)

# Periodically consolidate and deduplicate memories

async def consolidate_memories(user_id: str):
    # Get all memories for user
    memories = await memory.semantic.list(
        namespace="user_insights",
        filter={"user_id": user_id}
    )

    # Find similar memories (potential duplicates)
    clusters = cluster_by_similarity(memories, threshold=0.9)

    # Merge similar memories
    for cluster in clusters:
        if len(cluster) > 1:
            merged = await llm.invoke(f'''
                Consolidate these related memories into one:
                {cluster}

                Preserve all important information.
            ''')
            await memory.semantic.upsert(
                namespace="user_insights",
                key=generate_key(merged),
                content=merged
            )
            # Delete originals
            for old in cluster:
                await memory.semantic.delete(old.id)
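
`cluster_by_similarity` is used above without a definition. A minimal sketch of one possible implementation follows: greedy single-pass clustering on cosine similarity. It assumes each memory is a dict carrying an `"embedding"` list of floats, which is an assumption of this example rather than a documented record shape.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(memories: list[dict], threshold: float = 0.9) -> list[list[dict]]:
    """Greedy clustering: each memory joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one.

    Assumption: every memory dict has an "embedding" key.
    """
    clusters: list[list[dict]] = []
    for mem in memories:
        for cluster in clusters:
            if cosine_similarity(mem["embedding"], cluster[0]["embedding"]) >= threshold:
                cluster.append(mem)
                break
        else:
            clusters.append([mem])  # no existing cluster was close enough
    return clusters
```

Greedy clustering is order-dependent and compares against the seed only; for large stores a proper clustering algorithm or an approximate nearest-neighbor query per memory is likely a better fit.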

Memory Decay Pattern

Forget old, irrelevant memories

When to use: Memory grows large, retrieval slows down

MEMORY DECAY:

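Not every memory should be kept forever:
- Old preferences may be outdated
- Task details lose relevance
- Contradictory memories confuse retrieval

Implement intelligent decay based on:
- Recency (when was it created or last accessed?)
- Frequency (how often is it retrieved?)
- Importance (core fact or minor detail?)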

Time-Based Decay

from datetime import datetime, timedelta

async def decay_old_memories(namespace: str, max_age_days: int):
    cutoff = datetime.now() - timedelta(days=max_age_days)

    old_memories = await memory.episodic.list(
        namespace=namespace,
        filter={"last_accessed": {"$lt": cutoff.isoformat()}}
    )

    for mem in old_memories:
        # Soft delete (mark as archived)
        await memory.episodic.update(
            id=mem.id,
            metadata={"archived": True, "archived_at": datetime.now()}
        )

Utility-Based Decay (MIRIX Approach)

def calculate_memory_utility(memory):
    '''
    Composite utility score inspired by cognitive science:
    - Recency: when was it last accessed?
    - Frequency: how often is it accessed?
    - Importance: how important is this information?
    '''
    now = datetime.now()

    # Recency score (exponential decay with 72h half-life)
    hours_since_access = (now - memory.last_accessed).total_seconds() / 3600
    recency_score = 0.5 ** (hours_since_access / 72)

    # Frequency score
    frequency_score = min(memory.access_count / 10, 1.0)

    # Importance (from metadata or heuristic)
    importance = memory.metadata.get("importance", 0.5)

    # Weighted combination
    utility = (
        0.4 * recency_score +
        0.3 * frequency_score +
        0.3 * importance
    )

    return utility

async def prune_low_utility_memories(threshold=0.2):
    all_memories = await memory.list_all()
    for mem in all_memories:
        if calculate_memory_utility(mem) < threshold:
            await memory.archive(mem.id)
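
To make the weighting concrete, here is a worked example of the same formula (0.4 recency, 0.3 frequency, 0.3 importance, 72-hour half-life). `MemoryRecord` is a hypothetical record shape, and `now` is passed in explicitly so the numbers are reproducible.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:  # hypothetical record shape for this example
    last_accessed: datetime
    access_count: int
    metadata: dict = field(default_factory=dict)


def utility(mem: MemoryRecord, now: datetime) -> float:
    """Same weights and half-life as calculate_memory_utility, with `now` injected."""
    hours = (now - mem.last_accessed).total_seconds() / 3600
    recency = 0.5 ** (hours / 72)
    frequency = min(mem.access_count / 10, 1.0)
    importance = mem.metadata.get("importance", 0.5)
    return 0.4 * recency + 0.3 * frequency + 0.3 * importance


now = datetime(2025, 1, 10, 12, 0, 0)
# Accessed exactly one half-life ago, 5 accesses, default importance:
# 0.4 * 0.5 + 0.3 * 0.5 + 0.3 * 0.5 = 0.5
fresh = MemoryRecord(last_accessed=now - timedelta(hours=72), access_count=5)
# Untouched for 30 days (10 half-lives), never retrieved, low importance
stale = MemoryRecord(last_accessed=now - timedelta(days=30), access_count=0,
                     metadata={"importance": 0.1})

print(round(utility(fresh, now), 3), round(utility(stale, now), 3))
```

With the default pruning threshold of 0.2, the first record is kept and the second is archived.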

Sharp Edges

Chunking Isolates Information From Its Context

Severity: CRITICAL

Situation: Processing documents for vector storage

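Symptoms: Retrieval finds chunks that make no sense on their own. Agent answers miss the big picture. "The function returns X" is retrieved without knowing which function. References to "this" are retrieved without knowing what "this" is.

Why this breaks: Chunking a document for AI processing fragments the overall narrative and reduces it to isolated pieces that often miss the big picture. A chunk about "the configuration" is nearly useless without knowing which system is being configured.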

Recommended fix:

Contextual Chunking (Anthropic's approach)

Add document context to each chunk before embedding. Reduces retrieval failures by 35%.

def contextualize_chunk(chunk, document):
    summary = summarize(document)

    # LLM generates context for chunk
    context = llm.invoke(f'''
        Document summary: {summary}

        Generate a brief context statement for this chunk
        that would help someone understand what it refers to:

        {chunk}
    ''')

    return f"{context}\n\n{chunk}"

# Embed the contextualized version
for chunk in chunks:
    contextualized = contextualize_chunk(chunk, full_doc)
    embedding = embed(contextualized)
    # Store original chunk, embed contextualized
    store(original=chunk, embedding=embedding)

Hierarchical Chunking

# Store at multiple granularities
chunks_small = split(doc, size=256)
chunks_medium = split(doc, size=512)
chunks_large = split(doc, size=1024)

# Retrieve at the appropriate level based on the query
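
The snippet above leaves `split` and the level selection undefined. The following is a rough sketch of both, assuming word counts approximate tokens and using a simple cue-word heuristic to route queries. The cue list and the 8-word cutoff are illustrative assumptions, not tuned values.

```python
def split(doc: str, size: int) -> list[str]:
    """Split on whitespace into chunks of `size` words (a stand-in for tokens)."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_levels(doc: str) -> dict[str, list[str]]:
    """Index the same document at three granularities."""
    return {
        "small": split(doc, 256),
        "medium": split(doc, 512),
        "large": split(doc, 1024),
    }


EXPLORATORY_CUES = ("why", "how", "explain", "overview", "compare")  # assumed cue words


def pick_level(query: str) -> str:
    """Heuristic router: exploratory questions get larger chunks,
    short factual questions get the smallest ones."""
    words = query.lower().split()
    if any(cue in words for cue in EXPLORATORY_CUES):
        return "large"
    return "small" if len(words) <= 8 else "medium"
```

A production router might instead query all levels and let a reranker choose, at the cost of extra retrieval calls.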

Chunk Size Mismatched to Query Patterns

Severity: HIGH

Situation: Configuring chunking for memory storage

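Symptoms: High-quality documents produce low-quality retrieval. Simple questions miss relevant information. Complex questions retrieve fragments instead of complete answers.

Why this breaks: The optimal chunk size depends on query patterns:

Factual queries need small, specific chunks
Conceptual queries need larger context
Code needs function-level boundaries

The sweet spot varies by document type and embedding model. A default of 1000 characters is rarely right for any of them.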

Recommended fix:

Test different sizes

def evaluate_chunk_size(documents, test_queries, chunk_size):
    chunks = split_documents(documents, size=chunk_size)
    index = build_index(chunks)

    correct_retrievals = 0
    for query, expected_chunk in test_queries:
        results = index.search(query, k=5)
        if expected_chunk in results:
            correct_retrievals += 1

    return correct_retrievals / len(test_queries)

# Test multiple sizes
for size in [256, 512, 768, 1024]:
    recall = evaluate_chunk_size(docs, test_queries, size)
    print(f"Size {size}: Recall@5 = {recall:.2%}")
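
The evaluation loop above depends on helpers (`split_documents`, `build_index`) that are not shown. The metric itself can be isolated into a small function that works on already-retrieved results. `recall_at_k` is a hypothetical helper, and chunk identifiers are assumed to be plain strings.

```python
def recall_at_k(retrieved: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Fraction of queries whose expected chunk id appears in the top-k results.

    `retrieved[i]` is the ranked list of chunk ids returned for query i,
    and `expected[i]` is the chunk id that should have been found.
    """
    if not expected:
        return 0.0
    hits = sum(1 for results, want in zip(retrieved, expected) if want in results[:k])
    return hits / len(expected)
```

Keeping the metric separate from index construction makes it easier to compare chunk sizes, embedding models, and rerankers against the same labeled query set.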

Recommended sizes by content type

CHUNK_SIZES = {
    "documentation": 512,   # Complete concepts
    "code": 1000,          # Function-level
    "conversation": 256,   # Turn-level
    "articles": 768,       # Paragraph-level
}

Use overlap to prevent boundary issues

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,  # 10% overlap
)

Semantic Search Returns Irrelevant Results

Severity: HIGH

Situation: Querying memory for context

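Symptoms: The agent retrieves memories that look related but are not useful. "Tell me about the user's preferences" returns conversations about preferences in general rather than this user's. Relevance scores are high but the content is wrong.

Why this breaks: Semantic similarity is not the same as relevance. "The user likes Python" and "Python is a programming language" are semantically similar but are very different kinds of information. Without metadata filtering, retrieval is just word matching.

Recommended fix:

Always apply metadata filtering first

# Do not rely on semantic similarity alone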

# Bad: Only semantic search
results = index.query(
    vector=query_embedding,
    top_k=5
)

# Good: Filter then search
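results = index.query(
    vector=query_embedding,
    filter={
        "user_id": current_user.id,
        "type": "preference",
        # Range operator in the Mongo-style syntax used elsewhere in this
        # document; assumes a numeric `created_at` timestamp in metadata
        "created_at": {"$gte": cutoff_timestamp},
    },
    top_k=5
)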

Use hybrid search (semantic + keyword)

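from qdrant_client import QdrantClient, models

client = QdrantClient(...)

# Hybrid search with fusion. Assumes the collection defines named vectors
# "dense" and "sparse", and that `sparse_indices` / `sparse_values` come from
# a sparse encoder (for example BM25 or SPLADE) applied to the query text.
results = client.query_points(
    collection_name="memories",
    prefetch=[
        models.Prefetch(query=semantic_embedding, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # Reciprocal Rank Fusion
    limit=5,
)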

Rerank results with a cross-encoder

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Initial retrieval (recall-oriented)
candidates = index.query(query_embedding, top_k=20)

# Rerank (precision-oriented)
pairs = [(query, c.text) for c in candidates]
scores = reranker.predict(pairs)
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)

Old Memories Override Current Information

Severity: HIGH

Situation: User preferences or facts change over time

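Symptoms: The agent uses outdated preferences. A six-month-old "user prefers dark mode" overrides a recent "switch to light mode" request. The agent confidently uses stale data.

Why this breaks: Vector stores have no temporal awareness by default. A memory from a year ago carries the same retrieval weight as one from today. For preferences and mutable facts, recent information should usually override older information.

Recommended fix:

Add temporal scoring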

from datetime import datetime, timedelta

def time_decay_score(memory, half_life_days=30):
    age = (datetime.now() - memory.created_at).days
    decay = 0.5 ** (age / half_life_days)
    return decay

def retrieve_with_recency(query, user_id):
    # Get candidates
    candidates = index.query(
        vector=embed(query),
        filter={"user_id": user_id},
        top_k=20
    )

    # Apply time decay
    for candidate in candidates:
        time_score = time_decay_score(candidate)
        candidate.final_score = candidate.similarity * 0.7 + time_score * 0.3

    # Re-sort by final score
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:5]
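
A worked example of the blend above (0.7 similarity, 0.3 time decay, 30-day half-life) shows how a recent but slightly less similar memory can outrank an older one. The similarity values and ages are made up for illustration; `blended_score` is a hypothetical helper.

```python
def blended_score(similarity: float, age_days: float, half_life_days: float = 30) -> float:
    """0.7 * similarity + 0.3 * exponential time decay, as in retrieve_with_recency."""
    decay = 0.5 ** (age_days / half_life_days)
    return 0.7 * similarity + 0.3 * decay


# "user prefers dark mode", stored 180 days ago, very similar to the query
old_pref = blended_score(similarity=0.90, age_days=180)
# "switch to light mode", stored yesterday, somewhat less similar
new_pref = blended_score(similarity=0.80, age_days=1)

print(f"old: {old_pref:.3f}, new: {new_pref:.3f}")
```

After six half-lives the older memory keeps almost none of its recency weight, so the newer preference ranks first even though its raw similarity is lower.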

Update preferences instead of appending

async def update_preference(user_id, category, value):
    # Delete old preference
    await memory.delete(
        filter={"user_id": user_id, "type": "preference", "category": category}
    )

    # Store new preference
    await memory.upsert(
        id=f"pref-{user_id}-{category}",
        content={"category": category, "value": value},
        metadata={"updated_at": datetime.now()}
    )

Explicit versioning of facts

await memory.upsert(
    id=f"fact-{fact_id}-v{version}",
    content=new_fact,
    metadata={
        "version": version,
        "supersedes": previous_id,
        "valid_from": datetime.now()
    }
)
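
Versioned records only help if retrieval ignores the superseded ones. A minimal sketch of that read-side filter follows, assuming each record is a dict with an `"id"` and a `"metadata"` dict shaped like the upsert above; `current_facts` is a hypothetical helper.

```python
def current_facts(records: list[dict]) -> list[dict]:
    """Return only records that no other record supersedes.

    Assumption: metadata may contain a "supersedes" key naming the id
    of the record it replaces.
    """
    superseded = {
        r["metadata"]["supersedes"]
        for r in records
        if r["metadata"].get("supersedes") is not None
    }
    return [r for r in records if r["id"] not in superseded]
```

Older versions stay in the store for auditing, while only the head of each version chain reaches the agent's context.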

Contradictory Memories Retrieved Together

Severity: MEDIUM

Situation: User has changed preferences or provided conflicting info

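Symptoms: The agent retrieves "user prefers dark mode" and "user prefers light mode" into the same context. It gives inconsistent answers and appears confused or amnesiac to the user.

Why this breaks: Without conflict resolution, old and new information coexist. Semantic search may return both because they concern the same topic (preferences). The agent has no way to know which one is current.

Recommended fix:

Detect conflicts at storage time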

async def store_with_conflict_check(memory, user_id):
    # Find potentially conflicting memories
    similar = await index.query(
        vector=embed(memory.content),
        filter={"user_id": user_id, "type": memory.type},
        threshold=0.9,  # Very similar
        top_k=5
    )

    for existing in similar:
        if is_contradictory(memory.content, existing.content):
            # Ask for resolution
            resolution = await resolve_conflict(memory, existing)
            if resolution == "replace":
                await index.delete(existing.id)
            elif resolution == "version":
                await mark_superseded(existing.id, memory.id)

    await index.upsert(memory)

Conflict detection heuristic

def is_contradictory(new_content, old_content):
    # Use LLM to detect contradiction
    result = llm.invoke(f'''
        Do these two statements contradict each other?

        Statement 1: {old_content}
        Statement 2: {new_content}

        Respond with just YES or NO.
    ''')
    return result.strip().upper() == "YES"

Periodic consolidation

async def consolidate_memories(user_id):
    all_memories = await index.list(filter={"user_id": user_id})
    clusters = cluster_by_topic(all_memories)

    for cluster in clusters:
        if has_conflicts(cluster):
            resolved = await llm.invoke(f'''
                These memories may conflict. Create one consolidated
                memory that represents the current truth:
                {cluster}
            ''')
            await replace_cluster(cluster, resolved)

Retrieved Memories Exceed Context Window

Severity: MEDIUM

Situation: Retrieving too many memories at once

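Symptoms: Token limit errors. The agent truncates important information. The system prompt gets cut. Retrieved memories compete with the user query for space.

Why this breaks: Retrieval typically returns the top-k results. If k is too large or the chunks are big, retrieved context overwhelms the window. Important information (system prompt, recent messages) gets pushed out.

Recommended fix:

Budget tokens across memory types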

TOKEN_BUDGET = {
    "system_prompt": 500,
    "user_profile": 200,
    "recent_messages": 2000,
    "retrieved_memories": 1000,
    "current_query": 500,
    "buffer": 300,  # Safety margin
}

def budget_aware_retrieval(query, context_limit=4000):
    # Reserve the fixed allocations up front: system prompt, the query itself, safety buffer
    remaining = (
        context_limit
        - TOKEN_BUDGET["system_prompt"]
        - TOKEN_BUDGET["current_query"]
        - TOKEN_BUDGET["buffer"]
    )

    # Prioritize recent messages
    recent = get_recent_messages(limit=TOKEN_BUDGET["recent_messages"])
    remaining -= count_tokens(recent)

    # Then user profile
    profile = get_user_profile(limit=TOKEN_BUDGET["user_profile"])
    remaining -= count_tokens(profile)

    # Finally retrieved memories with whatever budget is left (never negative)
    memories = retrieve_memories(query, max_tokens=max(remaining, 0))

    return build_context(profile, recent, memories)

Dynamic k based on chunk size

def retrieve_with_budget(query, max_tokens=1000):
    avg_chunk_tokens = 150  # From your data
    max_k = max_tokens // avg_chunk_tokens

    results = index.query(query, top_k=max_k)

    # Trim if still over budget
    total_tokens = 0
    filtered = []
    for result in results:
        tokens = count_tokens(result.text)
        if total_tokens + tokens <= max_tokens:
            filtered.append(result)
            total_tokens += tokens
        else:
            break

    return filtered

Query and Document Embeddings From Different Models

Severity: MEDIUM

Situation: Upgrading embedding model or mixing providers

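Symptoms: Retrieval quality drops suddenly. Relevant documents are not found. Random results are returned. Retrieval works for new documents and fails for old ones.

Why this breaks: Embedding models produce different vector spaces. A query embedded with text-embedding-3 will not match documents embedded with text-embedding-ada-002. Mixing models yields meaningless similarity scores.

Recommended fix:

Track the embedding model in metadata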

await index.upsert(
    id=doc_id,
    vector=embedding,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_version": "2024-01",
        "content": content
    }
)

Filter by model version at query time

results = index.query(
    vector=query_embedding,
    filter={"embedding_model": current_model},
    top_k=10
)

Migration strategy for model upgrades

async def migrate_embeddings(old_model, new_model):
    # Get all documents with old model
    old_docs = await index.list(filter={"embedding_model": old_model})

    for doc in old_docs:
        # Re-embed with new model
        new_embedding = await embed(doc.content, model=new_model)

        # Update in place
        await index.update(
            id=doc.id,
            vector=new_embedding,
            metadata={"embedding_model": new_model}
        )

Use a separate collection during migration

# Old collection: production queries
# New collection: re-embedding in progress
# Switch over when complete
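
The three comments above describe a dual-collection cutover. The sketch below illustrates the idea with a toy in-memory store; `AliasedStore`, `migrate`, and the `active` pointer are hypothetical stand-ins for whatever collection and alias mechanism the chosen vector database provides.

```python
from typing import Callable, Optional

Embedder = Callable[[str], list[float]]


class AliasedStore:
    """Toy in-memory stand-in for a vector DB with collections and one read alias."""

    def __init__(self) -> None:
        self.collections: dict[str, dict[str, dict]] = {}
        self.active: Optional[str] = None  # collection that serves production queries

    def upsert(self, collection: str, doc_id: str, content: str,
               vector: list[float], model: str) -> None:
        self.collections.setdefault(collection, {})[doc_id] = {
            "content": content, "vector": vector, "embedding_model": model,
        }


def migrate(store: AliasedStore, old: str, new: str,
            embed_new: Embedder, new_model: str) -> None:
    """Re-embed every document from `old` into `new`, then switch reads over."""
    store.collections.setdefault(new, {})
    for doc_id, doc in store.collections[old].items():
        store.upsert(new, doc_id, doc["content"], embed_new(doc["content"]), new_model)
    # Switch only after the new collection is complete, so queries never
    # compare vectors produced by two different embedding models.
    if len(store.collections[new]) == len(store.collections[old]):
        store.active = new
```

The old collection is left untouched, which keeps rollback as simple as pointing `active` back at it.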

Validation Checks

In-Memory Store in Production Code

Severity: ERROR

In-memory stores lose data on restart

Message: In-memory store detected. Use persistent storage (Postgres, Qdrant, Pinecone) for production.

Vector Upsert Without Metadata

Severity: WARNING

Vectors should carry metadata for filtering

Message: Vector upsert without metadata. Add user_id, type, timestamp for proper filtering.

Query Without User Filtering

Severity: ERROR

Queries must filter by user to prevent data leakage

Message: Vector query without user filtering. Always filter by user_id to prevent data leakage.

Hardcoded Chunk Size Without Justification

Severity: INFO

Chunk sizes should be tested and justified

Message: Hardcoded chunk size. Test different sizes for your content type and measure retrieval accuracy.

Chunking Without Overlap

Severity: WARNING

Chunk overlap prevents boundary issues

Message: Text splitting without overlap. Add chunk_overlap (10-20%) to prevent boundary issues.

Semantic Search Without Filters

Severity: WARNING

Pure semantic search often returns irrelevant results

Message: Pure semantic search. Add metadata filters (user, type, time) for better relevance.

Retrieval Without Result Limit

Severity: WARNING

Unbounded retrieval can overflow the context

Message: Retrieval without limit. Set top_k to prevent context overflow.

Embeddings Without Model Version Tracking

Severity: WARNING

Track the embedding model to handle migrations

Message: Store embedding model version in metadata to handle model migrations.

Different Models for Document and Query Embedding

Severity: ERROR

Documents and queries must use the same embedding model

Message: Ensure same embedding model for indexing and querying.
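
Several of the query-side checks above can be expressed as a small runtime guard. This is a sketch only: it assumes a query is described by a dict that mirrors keyword arguments such as `filter` and `top_k`, and `validate_query` is a hypothetical helper, not part of any linter or library.

```python
def validate_query(params: dict) -> list[str]:
    """Return validation messages for a vector query described as a dict.

    Covers three checks from the list above: user filtering, use of any
    metadata filter, and presence of a result limit.
    """
    messages = []
    query_filter = params.get("filter") or {}
    if "user_id" not in query_filter:
        messages.append("ERROR: Vector query without user filtering.")
    if not query_filter:
        messages.append("WARNING: Pure semantic search.")
    if params.get("top_k") is None:
        messages.append("WARNING: Retrieval without limit.")
    return messages
```

Calling this in a thin wrapper around the store's query method would surface the problems in tests rather than in production.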

Collaboration

Delegation Triggers

user needs vector database at scale -> data-engineer (Production vector store operations)
user needs embedding model optimization -> ml-engineer (Custom embeddings, fine-tuning)
user needs knowledge graph -> knowledge-engineer (Graph-based memory structures)
user needs RAG pipeline -> llm-architect (End-to-end retrieval augmented generation)
user needs multi-agent shared memory -> multi-agent-orchestration (Memory sharing between agents)

Related Skills

Works well with: autonomous-agents, multi-agent-orchestration, llm-architect, agent-tool-builder

When to Use

User mentions or implies: agent memory
User mentions or implies: long-term memory
User mentions or implies: memory systems
User mentions or implies: remember across sessions
User mentions or implies: memory retrieval
User mentions or implies: episodic memory
User mentions or implies: semantic memory
User mentions or implies: vector store
User mentions or implies: rag
User mentions or implies: langmem
User mentions or implies: memgpt
User mentions or implies: conversation history

Limitations

Use this skill only when the task clearly matches the scope described above.
Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.

License: MIT (quoted in full because the license is permissive)

Details

Author: sickn33
Repository: sickn33/antigravity-awesome-skills
License: MIT
Last updated: unknown

Source: https://github.com/sickn33/antigravity-awesome-skills / License: MIT
