All notable changes to SMEAGLE will be documented in this file.
The format is based on Keep a Changelog.
- deploy-do.yml workflow with SSH-based deploy/destroy lifecycle, a persistent DO Volume for the model cache, and full-precision OLMo 3.1 32B Think (BF16)
- docker-compose.olmo-32b-think-bf16.yml for allenai/Olmo-3.1-32B-Think at full 65k context on H200
- docker-compose.olmo-32b-fp8.yml for kaitchup/Olmo-3.1-32B-Instruct-fp8-dynamic with FP8 KV cache on L40S
- docker-compose.olmo-32b-8bit.yml for cyankiwi/Olmo-3.1-32B-Think-AWQ-8bit as an alternative
- Replaced cross-encoder/ms-marco-MiniLM-L-12-v2 (33M params) with mixedbread-ai/mxbai-rerank-large-v2 (435M params) for higher-quality passage reranking
- Switched to kaitchup/Olmo-3.1-32B-Instruct-fp8-dynamic (FP8, 20k context, FP8 KV cache) instead of the AWQ 4-bit Think variant. Weave traces showed peak usage at ~16k tokens, so 20k context fits comfortably on an L40S 48GB
- LLM_REPETITION_PENALTY raised from 1.1 to 1.25 to reduce circular chain-of-thought
- /smeagle/{service} log groups with 14-day retention
- [DEPLOY_TIMELINE] lines for each deploy stage
- make update-ip refreshes security group rules across both regions
- corpus/source_perspectives.json maps all 88 sources to perspective categories
- Sweep runner (eval/sweep.py): define a manifest of parameter variations (repetition penalty, top_k, temperature, etc.), run all experiments in one command, and generate a markdown comparison report (see the manifest sketch after this list)
- [S##] references tracked per response in Weave evals
- Replaced httpx with openai.AsyncOpenAI for auto-instrumented token tracking and retries
- LLM_REPETITION_PENALTY (1.1) and LLM_MAX_TOKENS (4,096) curb circular chain-of-thought
- rerank_enabled, max_context_tokens, top_k_*, and llm_model now match deployed values
- <think> tag handling; the skip-RAG path also captures chain-of-thought
- --max-model-len 49152 and MAX_CONTEXT_CHARS 80000 for reasoning headroom
- Cross-encoder reranking with cross-encoder/ms-marco-MiniLM-L-12-v2 (Microsoft Research): reranks the merged vector+lexical candidate pool before final top-K truncation for more precise passage selection (see the reranking sketch after this list)
- top_k_rerank setting: new config parameter (default 50) caps how many candidates enter the cross-encoder, balancing accuracy vs latency
- GET /v1/bundles/{bundle_id}/health returns chunk size distribution, metadata coverage, duplicate detection, and index consistency diagnostics for ingestion tuning (see the request sketch after this list)
- answer_choice field: API responses now include an explicit answer_choice field for A/B/C questions, parsed via dedicated logic independent of LLM output format (see the parsing sketch after this list)
- compare_rag now prints RAG vs No-RAG scoring directly to the terminal after completion (previously only in HTML reports)
- compare_rag auto-loads environment variables from the .env file for API_KEY and API_URL
- feb8withletters-mini.txt (3 questions) for quick smoke tests
- docker-compose.dev.yml mirrors the production stack (Caddy, OAuth, all services) but routes LLM calls to the remote AWS GPU instance via SSM port forwarding; no local GPU required
- make dev / make dev-down / make dev-logs / make dev-tunnel, etc.
- Live reload of api/ and tools/ via volume mounts + uvicorn --reload
- make dev-ingest
- Switched the LLM from Ministral-3-14B-Instruct-2512 (FP8) to Ministral-3-14B-Reasoning-2512 (BF16) for chain-of-thought reasoning. Hypothesis: native [THINK] reasoning should improve worldview-vs-source discrimination where prompt engineering alone could not.
- Context window reduced from 64k to 32k tokens to fit BF16 weights in VRAM
- top_k_final reduced from 20 to 8 to cut noise for the 14B model
- Removed Tags: and Sources: metadata lines from the constitution prompt to eliminate source number collisions with RAG citation labels ([S9], [S10], etc.)
- SmeagleModel in weave_eval.py now logs full server-side config (LLM model, top_k, chunk_size, chunk_overlap, rerank) per eval run for reproducibility
- Embedding model switched from all-MiniLM-L6-v2 (384d, 256 tokens) to nomic-ai/nomic-embed-text-v1.5 (768d, 8,192 tokens) for significantly better retrieval quality; chunks are now fully represented without truncation
- task field (search_query / search_document) for asymmetric retrieval, as required by the nomic model (see the embedding sketch after this list)
- Removed --concurrency and --multi-turn flags (questions now always run sequentially)
- Prompts moved from eval/prompts/ to api/rag/prompts/default.txt to ship with the API code in Docker containers
- compare_rag progress prints now appear immediately in all terminal environments
- docker-compose.local.yml, docker-compose.local-auth.yml, and docker-compose.override.yml: all replaced by docker-compose.dev.yml
- make local, make local-simple, and related targets replaced by the make dev family
- api/gpu_manager.py, admin GPU start/stop/status endpoints, and the frontend GPU control panel: no longer needed since everything runs on a single GPU instance
- deploy.yml
- eval.prompts
- eval.questions
- [S1], [S2] references matching the master spreadsheet, with Google Drive links
- skip_rag and system_prompt API parameters for A/B testing (see the request sketch after this list)
- /v1/config endpoint exposing RAG configuration
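
The sweep manifest format is not documented in this changelog. Below is a minimal sketch, assuming a JSON manifest with one override set per experiment; the file path, field names, and the --manifest/--report flags are illustrative, not the actual interface of eval/sweep.py.

```python
# Illustrative sweep manifest for eval/sweep.py; the real schema and CLI flags may differ.
import json

manifest = {
    "experiments": [
        {"name": "baseline", "overrides": {}},
        {"name": "rep-penalty-1.25", "overrides": {"LLM_REPETITION_PENALTY": 1.25}},
        {"name": "top-k-12", "overrides": {"top_k_final": 12}},
        {"name": "temp-0.2", "overrides": {"LLM_TEMPERATURE": 0.2}},
    ]
}

with open("eval/sweeps/example.json", "w") as f:
    json.dump(manifest, f, indent=2)

# Hypothetical invocation: run every experiment and emit a markdown comparison report.
#   python -m eval.sweep --manifest eval/sweeps/example.json --report sweep_report.md
```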
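The reranking stage described above can be reproduced with the sentence-transformers CrossEncoder API. This is a minimal sketch of that stage, not SMEAGLE's actual retrieval code, and the candidate dict shape is assumed.

```python
# Sketch: rerank the merged vector+lexical candidate pool with a cross-encoder,
# capping the pool at top_k_rerank before scoring and keeping top_k_final passages.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

def rerank(query: str, candidates: list[dict], top_k_rerank: int = 50, top_k_final: int = 8) -> list[dict]:
    pool = candidates[:top_k_rerank]                      # cap cross-encoder input (accuracy vs latency)
    scores = reranker.predict([(query, c["text"]) for c in pool])
    ranked = sorted(zip(pool, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_k_final]]           # final top-K truncation
```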
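Pulling the ingestion diagnostics is a plain GET against the new bundle health endpoint. In the sketch below, the bearer-token auth scheme, the example bundle id, and the printed response keys are assumptions; only the path comes from the changelog.

```python
# Sketch: fetch ingestion diagnostics for one bundle. Auth scheme and response keys are assumed.
import os
import httpx

API_URL = os.environ["API_URL"]          # loaded from .env by the eval tooling
API_KEY = os.environ["API_KEY"]
bundle_id = "example-bundle"             # hypothetical id

resp = httpx.get(
    f"{API_URL}/v1/bundles/{bundle_id}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
health = resp.json()
print(health.get("chunk_size_distribution"))
print(health.get("metadata_coverage"))
print(health.get("duplicates"))
```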
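The changelog does not show the answer_choice parsing logic. Below is a minimal sketch of choice extraction that stays independent of the LLM's output format, including stripping <think> chain-of-thought first; the regexes are illustrative, not SMEAGLE's implementation.

```python
# Sketch only: extract an explicit A/B/C answer choice from free-form model output.
import re
from typing import Optional

def parse_answer_choice(text: str) -> Optional[str]:
    # Drop chain-of-thought wrapped in <think> tags before looking for the choice.
    visible = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Accept forms like "Answer: B", "choice (B)", or a bare "B" on its own line.
    m = re.search(r"\b(?:answer|choice)\s*[:\-]?\s*\(?([ABC])\)?", visible, re.IGNORECASE)
    if not m:
        m = re.search(r"^\s*\(?([ABC])\)?\s*$", visible, re.MULTILINE)
    return m.group(1).upper() if m else None

print(parse_answer_choice("<think>weighing options...</think>\nAnswer: B"))  # -> "B"
```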
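The task field maps to the text prefixes nomic-embed-text-v1.5 expects for asymmetric retrieval. A minimal sketch with sentence-transformers follows (placeholder texts, not SMEAGLE's ingestion code).

```python
# Sketch: asymmetric retrieval prefixes for nomic-embed-text-v1.5.
# Documents are embedded with "search_document: ", queries with "search_query: ".
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

chunks = ["First source passage ...", "Second source passage ..."]   # placeholder texts
question = "What does the source claim about X?"                      # placeholder query

doc_vecs = model.encode(["search_document: " + c for c in chunks])    # ingest time
query_vec = model.encode("search_query: " + question)                 # query time
```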
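For A/B testing, skip_rag and system_prompt can be toggled per request, and /v1/config confirms what the server is actually running. In this sketch the /v1/ask path, the auth scheme, and the payload fields other than skip_rag and system_prompt are assumptions; only /v1/config and the two parameters come from the changelog.

```python
# Sketch: read the deployed RAG config, then ask the same question with retrieval skipped.
import os
import httpx

API_URL = os.environ["API_URL"]
headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

config = httpx.get(f"{API_URL}/v1/config", headers=headers, timeout=30).json()
print(config)   # rerank_enabled, max_context_tokens, top_k_*, llm_model, ...

payload = {
    "question": "Which option best matches the source's worldview?",
    "skip_rag": True,                                    # A/B test: bypass retrieval
    "system_prompt": "Answer with a single letter: A, B, or C.",
}
no_rag = httpx.post(f"{API_URL}/v1/ask", headers=headers, json=payload, timeout=120).json()
print(no_rag.get("answer_choice"))
```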