# Changelog
All notable changes to Quoriv will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [1.0.1] — 2026-05-19

### Fixed

- Dependency drift on fresh installs. `pyproject.toml` originally pinned `deepagents>=0.0.1` and the LangChain ecosystem with open upper bounds. A fresh `pip install quoriv==1.0.0` therefore resolved `deepagents` 0.7+, which moved `LocalShellBackend` out of `deepagents.backends`, and `quoriv chat` crashed at import time with `ImportError: cannot import name 'LocalShellBackend' from 'deepagents.backends'`.
- Tightened all agent-runtime pins to the version windows v1.0 was actually tested against:
  - `deepagents>=0.6.1,<0.7.0`
  - `langchain>=1.0,<2.0`, `langchain-core>=1.0,<2.0`, `langgraph>=1.0,<2.0`, `langgraph-checkpoint-sqlite>=2,<3`, `langchain-openai>=1.0,<2.0`
  - Provider extras (`anthropic`, `gemini`, `ollama`) and the `all-providers` bundle now carry matching major-version caps.
  - `pydantic>=2.7,<3.0`, `pydantic-settings>=2.3,<3.0`, `typer>=0.12,<1.0`: same rationale, lower blast radius.
- No code changes; this is purely a dependency-constraint hotfix. `quoriv chat` boots cleanly on `pip install quoriv==1.0.1` from a fresh interpreter.

## [1.0.0] — 2026-05-19

First stable release. The project reached feature-completeness at the end of Phase 4:
- full chat / TUI loop
- six model providers (OpenAI, Anthropic, Gemini, Ollama, vLLM, OpenRouter)
- permission modes with HITL
- AST, git, test, and web tools
- MCP and Python plugins, plus hooks
- replay viewer
- eval harness with runner and CLI
- opt-in telemetry with an HTTP backend
- tag-triggered PyPI release pipeline (OIDC trusted publishing)
- cross-platform PyInstaller binaries
- an mkdocs-material documentation site
### Added

#### Phase 0, Day 1 — Foundation scaffold

- `pyproject.toml` with full Phase 1 dependencies and optional groups (`ast`, `mcp`, `anthropic`, `gemini`, `ollama`, `dev`, `docs`)
- Apache 2.0 license
- README with project overview, architecture, and usage examples
- `CONTRIBUTING.md` with development workflow and architectural rules
- `SECURITY.md` with disclosure policy
- `.gitignore` covering Python, IDE, and Quoriv-specific artifacts
- `.pre-commit-config.yaml` with ruff, mypy, and standard hooks
- GitHub Actions CI: `test.yml` (Windows / macOS / Linux × Python 3.11 / 3.12), `lint.yml` (ruff + mypy)
- Issue templates: bug report, feature request
#### Phase 0, Day 2 — Config layer + folder skeleton

- `src/quoriv/` folder skeleton with subpackages for `core`, `models`, `tools`, `permissions`, `plugins`, `plugins.mcp`, `ui`, `config`, `observability`, `repo`
- `src/quoriv/py.typed` (PEP 561 marker: Quoriv ships types)
- `quoriv.config.schema`: Pydantic v2 models (`QuorivConfig`, `ModelConfig`, `PermissionsConfig`, `UIConfig`, `ToolsConfig`) with `extra="forbid"` strictness and `Literal` types for enums (`PermissionMode`, `Theme`)
- `quoriv.config.loader`: TOML loader that merges global (`~/.quoriv/config.toml`) and project (`.quoriv/config.toml`, walked up from cwd) configuration; `global_config_path`, `project_config_path`, `load_config`, `_deep_merge`
- `config.example.toml`: annotated example at repo root
- `tests/conftest.py` with `fake_home` fixture
- 36 unit tests for config schema + loader
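
The global-then-project precedence of `load_config` amounts to a recursive dictionary merge. Below is a minimal sketch of such a merge. It assumes project values win key by key and that nested TOML tables merge rather than replace. `deep_merge` and the config keys in the example are hypothetical stand-ins, not the real `_deep_merge` or schema fields.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins, merging nested dicts recursively.

    Hypothetical sketch of a `_deep_merge`-style helper; neither input is mutated.
    """
    merged: dict[str, Any] = dict(base)
    for key, value in override.items():
        existing = merged.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            # Both sides are tables: merge them instead of replacing wholesale.
            merged[key] = deep_merge(existing, value)
        else:
            # Scalars, lists, or type mismatches: the project-level value wins.
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
global_cfg = {"model": {"default": "openai:gpt-4o", "temperature": 0.2}, "ui": {"theme": "dark"}}
project_cfg = {"model": {"default": "ollama:llama3"}}
print(deep_merge(global_cfg, project_cfg))
# {'model': {'default': 'ollama:llama3', 'temperature': 0.2}, 'ui': {'theme': 'dark'}}
```

In this sketch, lists are replaced wholesale rather than concatenated. That is one of several reasonable policies.
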
#### Phase 0, Day 3 — API keys + model factory

- `quoriv.config.keychain`: `keyring`-backed API key storage with env-var fallback precedence (`PROVIDER_ENV_VARS`, `set_api_key`, `get_api_key`, `delete_api_key`, `list_known_providers`)
- `quoriv.models.base`: `ModelSpec` (parses `"provider:name"` with a first-colon split for Ollama tags), `ModelCapabilities`, `MissingAPIKeyError`
- `quoriv.models.factory`: `get_model("provider:name")` with lazy provider loading via `importlib`; `UnknownProviderError`; `list_providers`
- `quoriv.models.openai`: OpenAI provider via `langchain-openai`, resolving keys through the keychain
- `fake_keyring` test fixture (in-memory keychain + env-var isolation)
- Tests for keychain (12), base (14), factory (7), openai (5): 74 tests total, all passing
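
The first-colon split matters because Ollama model tags contain colons of their own. Below is a minimal sketch of the parsing rule. It uses a hypothetical `ModelSpecSketch` class and `str.partition`. The real `ModelSpec` field names and error type may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpecSketch:
    """Hypothetical stand-in for quoriv.models.base.ModelSpec."""

    provider: str
    name: str

    @classmethod
    def parse(cls, spec: str) -> "ModelSpecSketch":
        # str.partition splits on the FIRST colon only, so "llama3:8b" stays intact.
        provider, sep, name = spec.partition(":")
        if not sep or not provider or not name:
            raise ValueError(f"expected 'provider:name', got {spec!r}")
        return cls(provider=provider, name=name)


print(ModelSpecSketch.parse("ollama:llama3:8b"))
# ModelSpecSketch(provider='ollama', name='llama3:8b')
```
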
#### Phase 0, Day 4 — CLI + chat loop (no DeepAgents yet)

- `quoriv.__main__`: makes `python -m quoriv` work the same as the `quoriv` console script
- `quoriv.cli`: Typer app with commands:
  - `quoriv chat [--model] [--mode]`: start an interactive session
  - `quoriv doctor`: health-check Rich table (Python, configured models, permission mode, API key status per provider)
  - `quoriv version`: print the installed version
  - `quoriv config show`: print the loaded merged configuration as JSON
  - `quoriv config set <provider>`: prompt for an API key (hidden input) and store it in the OS keychain
  - `quoriv config list-providers`: table of known providers, env-var names, and whether a key is configured
- `quoriv.app`: async chat loop using `rich.Console` + `prompt_toolkit.PromptSession`; streams responses via LangChain `model.astream(messages)`; slash commands `/help`, `/clear`, `/exit`, `/quit`; graceful Ctrl+C handling; helpful prompt when an API key is missing
- Tests for the CLI commands using `typer.testing.CliRunner` (interactive `chat` deferred to Phase 1 integration tests)
#### Phase 0, Day 5 — DeepAgents wired

- `quoriv.core.agent.build_agent`: constructs a session-scoped `CompiledStateGraph` via `deepagents.create_deep_agent`. It uses:
  - `LocalShellBackend(root_dir=cwd)` for real file ops and shell
  - an in-memory `MemorySaver` checkpointer for multi-turn state
  - always-on `PATH_PROTECTION` rules denying writes to `.env*`, `.git/**`, `.ssh/**`, and `secrets/**`
- `quoriv.core.PATH_PROTECTION`: tuple of `FilesystemPermission` deny rules that no permission mode can disable (security invariant)
- `quoriv.core.events`: Rich-rendering helpers for LangGraph events: `render_token`, `render_tool_start`, `render_tool_end`, `_format_args`
- `quoriv.app` rewritten to drive the DeepAgent:
  - Replaced direct `model.astream(messages)` with `agent.astream_events({"messages": [HumanMessage]}, version="v2")`
  - Per-session `thread_id` keys the checkpointer; `/clear` rotates to a new thread for a fresh conversation
  - Event-kind dispatch for `on_chat_model_stream`, `on_tool_start`, `on_tool_end`
- `quoriv chat` gains a `--cwd` option to target a specific repo root
- New tests:
  - `tests/unit/core/test_agent.py` checks that:
    - `PATH_PROTECTION` has the expected shape (5 rules covering env/git/ssh/secrets)
    - `build_agent` raises `MissingAPIKeyError` without keys
    - `build_agent` returns a graph exposing `astream_events`/`ainvoke`/`invoke` when keys are present
    - `build_agent` honors `model_override`
  - `tests/unit/core/test_events.py` covers:
    - `render_token`, `render_tool_start`, `render_tool_end` (short/long/None/multiline cases)
    - `_format_args` (key=value, truncation per-value and per-call, non-dict fallback, empty dict)

With Day 5 wired, `quoriv chat` now has the full DeepAgents built-in toolset available out of the box: `write_todos`, `ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`, `execute`, and `task` (for sub-agent delegation).
#### Phase 1 Slice 1 — Permission modes wired

- `quoriv.permissions.modes`: new module exporting:
  - `PermissionMode` (`Literal["read-only", "ask", "auto", "yolo"]`)
  - `WRITE_TOOLS` (`{"write_file", "edit_file"}`)
  - `SHELL_TOOLS` (`{"execute"}`)
  - `interrupt_on_for_mode(mode)`
  - `is_read_only(mode)`
- `quoriv.permissions.paths`: canonical home for `PATH_PROTECTION` (moved out of `core/agent.py`, where it was a transient placeholder)
- `quoriv.permissions.__init__`: re-exports the public surface from both submodules
- `quoriv.core.agent.build_agent` now accepts a `mode: PermissionMode = "ask"` parameter and applies the compiled `interrupt_on=` dict via `interrupt_on_for_mode(mode)`. Modes compile as:
  - `yolo` → `{}` (no prompts)
  - `auto` → `{"execute": True}` (prompt only before shell)
  - `ask`, `read-only` → `{"write_file": True, "edit_file": True, "execute": True}` (prompt before every write or shell call). Hard write denial in `read-only` is enforced at the approval-prompt UI (Slice 2).
- `quoriv.app.run_chat` validates the mode string against `ALLOWED_MODES` and passes it through to `build_agent`
- 24 new tests:
  - `tests/unit/permissions/test_modes.py`: 16 tests covering all 4 modes, tool-set membership and disjointness, dict freshness, `is_read_only`
  - `tests/unit/permissions/test_paths.py`: 8 tests covering rule shape, env/git/ssh/secrets coverage, POSIX rooting, immutability
  - `tests/unit/core/test_agent.py` updated:
    - `TestPathProtection` removed (moved with the data to `test_paths.py`)
    - `TestBuildAgentModes` added (parametrized build for all 4 modes)
##### Changed

- `PATH_PROTECTION` is no longer re-exported from `quoriv.core`; import it from `quoriv.permissions` instead.

Test count: 108 → 130 (+22). All ruff / ruff format / mypy strict / pytest gates green.
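
The mode table above can be read as a small pure function. Below is a sketch reconstructed from it. It covers Slice 1 behaviour only, before the git write tools existed. The `frozenset` tool sets and the sorted key order are choices of this sketch, not documented details of `quoriv.permissions.modes`.

```python
from typing import Literal

PermissionMode = Literal["read-only", "ask", "auto", "yolo"]

WRITE_TOOLS = frozenset({"write_file", "edit_file"})
SHELL_TOOLS = frozenset({"execute"})


def interrupt_on_for_mode(mode: PermissionMode) -> dict[str, bool]:
    """Map a permission mode to an `interrupt_on=` dict (Slice 1 behaviour)."""
    if mode == "yolo":
        gated: frozenset[str] = frozenset()
    elif mode == "auto":
        gated = SHELL_TOOLS  # prompt only before shell commands
    else:  # "ask" and "read-only" both prompt before every write or shell call
        gated = WRITE_TOOLS | SHELL_TOOLS
    # Build a fresh dict on every call so callers never share mutable state.
    return {tool: True for tool in sorted(gated)}
```

Returning a new dict per call matches the "dict freshness" property the Slice 1 tests check. Mutating one session's mapping cannot leak into another.
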
#### Phase 1 Slice 2 — Approval prompt UI for HITL pauses

- `quoriv.ui.prompts`: new module with `prompt_approval()`, `ApprovalDecision` (frozen dataclass), `parse_choice()`, the `READ_ONLY_DENIAL_MESSAGE` constant, and the `DecisionType` `Literal`
  - Renders a yellow Rich `Panel` showing the tool name, JSON-formatted args, and middleware description. Prompts via `prompt_toolkit` for `a`/`r` (aliases: `approve`/`reject`/`y`/`yes`/`n`/`no`/`deny`).
  - `auto_deny=True` (used in `read-only` mode) renders the panel, then auto-rejects with an explanatory message back to the agent
- `quoriv.app._stream_agent` refactored into a `_drive_turn` loop that:
  - streams events
  - calls `agent.aget_state()` to detect pending HITL interrupts
  - prompts the user for each `ActionRequest`
  - resumes with `Command(resume={"decisions": [...]})`
- 28 new tests for:
  - `parse_choice` (approve/reject aliases, invalid input)
  - `ApprovalDecision` (defaults, frozen behavior)
  - `prompt_approval` (`auto_deny` path)
  - `_render_approval_panel`
  - `_format_args`

Test count: 130 → 158 (+28). All gates green.
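
`parse_choice` normalises free-form input into a decision using the aliases listed above. Below is a minimal sketch. It assumes that input is case-insensitive and whitespace-trimmed, and that unrecognised input yields `None`. The real function may instead raise or re-prompt; its invalid-input behaviour is not documented here.

```python
from typing import Literal, Optional

DecisionType = Literal["approve", "reject"]

# Alias table built from the aliases listed in this changelog entry.
_ALIASES: dict[str, DecisionType] = {
    "a": "approve", "approve": "approve", "y": "approve", "yes": "approve",
    "r": "reject", "reject": "reject", "n": "reject", "no": "reject", "deny": "reject",
}


def parse_choice(raw: str) -> Optional[DecisionType]:
    """Normalise user input to a decision, or None when it isn't recognised."""
    return _ALIASES.get(raw.strip().lower())
```
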
#### Phase 1 Slice 3 — Markdown streaming + edit_file diff renderer

- `quoriv.ui.stream`: new module with `StreamRenderer`, a Rich `Live` + `Markdown` wrapper that accumulates streamed tokens and live-renders them with markdown semantics (bold, code blocks, lists, syntax highlighting).
  - Properties: `is_streaming`, `buffer`.
  - Methods: `push(text)`, `finalize() -> str`. `finalize` is safe to call on idle state.
- `quoriv.ui.diff`: new module with two functions. Both handle the no-changes and missing-file-path cases.
  - `compute_diff()`: pure function returning `difflib.unified_diff` text.
  - `render_edit_diff()`: renders the diff with Rich `Syntax(theme="ansi_dark")` and an `edit_file` header line.
- `quoriv.app._stream_events` rewritten:
  - Each call now owns a `StreamRenderer` instance (lifecycle managed by `try`/`finally`)
  - `on_chat_model_stream` → `renderer.push(text)` (replaces raw `render_token`)
  - `on_chat_model_end` and `on_tool_start` → `renderer.finalize()` to close the `Live` cleanly
  - `on_tool_start` for `edit_file` → `render_edit_diff()` (colored unified diff) instead of the generic header
  - All other tools fall through to the existing `render_tool_start`/`render_tool_end`
- 16 new tests:
  - `test_stream.py`: initial state, empty-push no-op, accumulation, finalize semantics, restart after finalize
  - `test_diff.py`: identical strings → empty diff, change → unified diff, file path in headers, context lines respected, addition/removal only, render handles no-changes and missing path

Test count: 158 → 174 (+16). All gates green. Source files: 25 → 27.
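
`compute_diff()` is described as a pure wrapper around `difflib.unified_diff`. Below is a sketch of what such a wrapper can look like. The signature, the `a/`/`b/` header prefixes, and the `context` parameter are assumptions, not the module's documented API.

```python
import difflib


def compute_diff(old: str, new: str, file_path: str = "file", context: int = 3) -> str:
    """Return unified-diff text for an edit; empty string when nothing changed."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{file_path}",
        tofile=f"b/{file_path}",
        n=context,  # number of unchanged context lines around each hunk
    )
    return "".join(lines)


print(compute_diff("x = 1\ny = 2\n", "x = 1\ny = 3\n", "demo.py"), end="")
# --- a/demo.py
# +++ b/demo.py
# @@ -1,2 +1,2 @@
#  x = 1
# -y = 2
# +y = 3
```

Because `unified_diff` yields nothing for identical inputs, the "no changes" case falls out naturally as an empty string.
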
#### Phase 1 Slice 1b — PathProtectionMiddleware (custom guard)

- `quoriv.permissions.guard`: new module with `PathProtectionMiddleware`, a `langchain.agents.middleware.AgentMiddleware` subclass that enforces `PATH_PROTECTION` deny rules at the middleware layer.
  - It runs in `after_model` (and `aafter_model`) and scans the latest `AIMessage.tool_calls` for path-bearing tools.
  - It replaces denied calls with synthetic error `ToolMessage` objects, so the agent observes the rejection on its next turn. This is a hard denial that bypasses HITL.
  - The `_TOOL_OPERATION` map covers DeepAgents' built-in filesystem tools (`ls`, `read_file`, `glob`, `grep` → read; `write_file`, `edit_file` → write). Tools outside the map are treated as path-irrelevant and pass through.
  - `_check_denial` uses `wcmatch.glob.globmatch` with the `BRACE | GLOBSTAR` flags. These are the same semantics as DeepAgents' own `FilesystemMiddleware`, so deny patterns match identically whether or not DeepAgents later adopts native, sandbox-compatible `permissions=`.
  - `_extract_path` reads `file_path` or `path` from the tool call's args dict (the two argument names DeepAgents' file tools use).
- `quoriv.permissions.__init__` re-exports `PathProtectionMiddleware` from `guard`.
- `quoriv.core.agent.build_agent` now wires `middleware=[PathProtectionMiddleware(list(PATH_PROTECTION))]` into `create_deep_agent`.
  - The custom guard layer is required because DeepAgents 0.6.1 raises `NotImplementedError` when `permissions=` is combined with a `SandboxBackendProtocol` backend.
  - `LocalShellBackend` is such a backend, and we need it for real shell execution.
- 26 new tests in `tests/unit/permissions/test_guard.py` covering:
  - rule passthrough and allow-rule precedence
  - every denied tool name (write/edit/read/ls/glob/grep)
  - path arg variants (`file_path` vs `path`)
  - glob patterns (`*.env`, `.git/**`, `secrets/**`)
  - unrelated tools passing through
  - no-`AIMessage` / no-tool-calls early exits
  - multiple tool calls in one message (some denied, some kept)
  - `aafter_model` delegating to sync
  - immutable rules view
  - integration with `PATH_PROTECTION` itself

Test count: 174 → 200 (+26). All gates green.
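
The guard's per-call decision combines three steps: look up the tool's operation, extract the path argument, and match it against the deny patterns. Below is a minimal sketch of that decision.

- `wcmatch` is not in the standard library, so the sketch swaps in `fnmatch.fnmatchcase`. That function's `*` also crosses `/`, and it has no brace expansion, so it only approximates `globmatch` with `BRACE | GLOBSTAR`.
- `DENY_RULES` and its patterns are hypothetical.
- The sketch denies reads and writes alike, since the Slice 1b tests cover denied read tools.

```python
from fnmatch import fnmatchcase
from typing import Any, Literal, Optional

Operation = Literal["read", "write"]

# Tool -> operation map, mirroring the `_TOOL_OPERATION` description above.
TOOL_OPERATION: dict[str, Operation] = {
    "ls": "read",
    "read_file": "read",
    "glob": "read",
    "grep": "read",
    "write_file": "write",
    "edit_file": "write",
}

# Hypothetical (pattern, operations) deny rules; the real PATH_PROTECTION holds
# FilesystemPermission objects whose exact patterns are not reproduced here.
DENY_RULES: tuple[tuple[str, frozenset[Operation]], ...] = (
    ("/.env*", frozenset({"read", "write"})),
    ("/.git/**", frozenset({"read", "write"})),
    ("/.ssh/**", frozenset({"read", "write"})),
    ("/secrets/**", frozenset({"read", "write"})),
)


def extract_path(args: dict[str, Any]) -> Optional[str]:
    """DeepAgents' file tools pass their target as `file_path` or `path`."""
    value = args.get("file_path", args.get("path"))
    return value if isinstance(value, str) else None


def is_denied(tool_name: str, args: dict[str, Any]) -> bool:
    """Return True when a path-bearing tool call hits a deny rule."""
    operation = TOOL_OPERATION.get(tool_name)
    if operation is None:
        return False  # tools outside the map are path-irrelevant and pass through
    path = extract_path(args)
    if path is None:
        return False
    # Caveat: fnmatch's "*" also crosses "/", unlike wcmatch's GLOBSTAR semantics.
    return any(
        operation in operations and fnmatchcase(path, pattern)
        for pattern, operations in DENY_RULES
    )
```
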
#### Phase 1 Slice 4 (minimal) — Python find_symbol tool

- `quoriv.tools.ast_tools`: new module with `find_symbol`, a `@tool`-decorated callable that walks `*.py` files under a path and returns every definition matching a target name. It returns a list of records: `{file, lineno, col_offset, kind, name, parent}`.
  - Symbol kinds: `function`, `async_function`, `class`, `variable` (module- or class-level `Name = ...` assignments). The search descends one level into class bodies, so methods are reported with `parent=<ClassName>`.
  - The implementation uses only the stdlib `ast` module (no tree-sitter yet).
  - It skips `.venv`, `venv`, `__pycache__`, `.git`, `build`, and `dist` directories so third-party code doesn't pollute results.
  - It silently skips files that fail to parse or decode.
  - It accepts either a directory or a single file path. Nonexistent paths return `[]`.
- `quoriv.tools.__init__`: `QUORIV_TOOLS = [find_symbol]`; `quoriv.core.agent` registers it via `tools=list(QUORIV_TOOLS)` in `create_deep_agent`. The DeepAgents built-ins (`ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`, `execute`, `task`, `write_todos`) remain; `find_symbol` is purely additive.
- 12 new tests in `tests/unit/tools/test_ast_tools.py` covering:
  - function, async function, class, method-with-parent, and module-level variable
  - no match
  - subdirectory recursion
  - `.venv` + `__pycache__` skip
  - syntactically broken file skip
  - nonexistent path and single-file path
  - `BaseTool` registration

Slice 4b (tree-sitter expansion for ~30 languages, `go_to_definition`, `find_references`) is deferred.
Test count: 200 → 212 (+12). All gates green.
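
The core of `find_symbol` is an `ast` walk over module-level statements that descends one level into class bodies. Below is a sketch of that walk over a single source string. It omits the directory walk, the skip list, and the `file` field. `find_in_source` and `SymbolRecord` are hypothetical names. The sketch only recognises plain `Name = ...` assignments, not annotated or tuple targets.

```python
import ast
from typing import Optional, TypedDict


class SymbolRecord(TypedDict):
    lineno: int
    col_offset: int
    kind: str
    name: str
    parent: Optional[str]


_DEF_KINDS = {
    ast.FunctionDef: "function",
    ast.AsyncFunctionDef: "async_function",
    ast.ClassDef: "class",
}


def find_in_source(source: str, target: str) -> list[SymbolRecord]:
    """Find definitions named `target` at module level and one level into classes."""
    records: list[SymbolRecord] = []

    def visit(body: list[ast.stmt], parent: Optional[str]) -> None:
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                if node.name == target:
                    records.append(SymbolRecord(
                        lineno=node.lineno, col_offset=node.col_offset,
                        kind=_DEF_KINDS[type(node)], name=node.name, parent=parent,
                    ))
                if isinstance(node, ast.ClassDef) and parent is None:
                    visit(node.body, node.name)  # descend exactly one level into classes
            elif isinstance(node, ast.Assign):
                for tgt in node.targets:
                    if isinstance(tgt, ast.Name) and tgt.id == target:
                        records.append(SymbolRecord(
                            lineno=node.lineno, col_offset=node.col_offset,
                            kind="variable", name=tgt.id, parent=parent,
                        ))

    visit(ast.parse(source).body, None)
    return records


SAMPLE = '''
class Greeter:
    def greet(self): ...

def greet(): ...

async def fetch(): ...

greet_count = 0
'''
```

Running `find_in_source(SAMPLE, "greet")` yields the method first, with `parent="Greeter"`, then the module-level function with `parent=None`.
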
#### Phase 1 Slice 5 — Git tools (read-only)

- `quoriv.tools.git`: new module with four plain `@tool` callables shelling out to `git` via `subprocess.run` (`shell=False`, list args, `cwd`-bound, UTF-8 decoded with `errors="replace"`):
  - `git_status(cwd=".")`: returns `{branch, ahead, behind, is_clean, files}` (each file: `{path, index, worktree, old_path?}`). Detached HEAD reports `branch=None`. Renames keep `old_path`.
  - `git_diff(path=None, staged=False, revision_range=None, cwd=".")`: returns `{diff, is_empty}`. Combines path scoping with working-tree, staged (`--cached`), or revision-range diffs.
  - `git_log(limit=20, path=None, cwd=".")`: returns `{entries, count}`, where each entry is `{sha, short_sha, author, email, date, subject}`. Uses a `\x1f` field separator and ISO dates for unambiguous parsing.
  - `git_blame(file, line_start=None, line_end=None, cwd=".")`: returns `{file, entries}` with `{sha, author, date, lineno, content}` per line. `-L start,end` (or a single line) scopes the blame.
- Uniform failure shape across all four tools: `{"error": "<message>"}`. It is used for:
  - non-zero git exits
  - not-a-repo errors
  - missing-file errors
  - invalid args (e.g., `limit < 1`)
  - the git-not-on-`PATH` case (`FileNotFoundError` → `"git executable not found on PATH (...)"`)
- Parser helpers `_parse_status_porcelain` and `_parse_branch_line` are exported for direct unit tests. The branch-line parser covers `## main`, `## main...origin/main`, `[ahead N]`, `[behind M]`, `[ahead N, behind M]`, and `## HEAD (no branch)`.
- Write operations (`git add` / `git commit` / `git stash` / ...) are intentionally not in this slice. They land later behind `interrupt_on=`, so HITL prompts before mutating the working tree.
- `quoriv.tools.__init__`: `QUORIV_TOOLS` now exposes `[find_symbol, git_status, git_diff, git_log, git_blame]`; `__all__` re-exports each tool by name.
- 42 new tests in `tests/unit/tools/test_git.py`:
  - `TestParseBranchLine` (6): every branch-line variant
  - `TestParseStatusPorcelain` (6): modified / staged / untracked / rename / too-short / malformed
  - `TestGitStatus` (6): clean repo, untracked file, modified-then-staged, not-a-repo, `FileNotFoundError` for missing git binary, default `cwd` via `monkeypatch.chdir`
  - `TestGitDiff` (7): no changes, working tree, staged, revision range, path scoping, not-a-repo, bad revision
  - `TestGitLog` (7): reverse-chronological ordering, entry-field shape, limit, path filter, invalid `limit`, empty repo, not-a-repo
  - `TestGitBlame` (5): full file, line range, single line, missing file, not-a-repo
  - `TestToolRegistration` (5): each tool is a `BaseTool` with the right `.name` and is present in `QUORIV_TOOLS`
- Test infrastructure: a small in-file helper trio (`_git`, `_init_repo`, `_commit`) builds deterministic repos. It pins the `GIT_AUTHOR_*`/`GIT_COMMITTER_*` env vars and sets `commit.gpgsign=false` per repo, so tests are stable across hosts.

Test count: 212 → 254 (+42). All gates green.
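
The branch-line variants listed above come from the `## ...` header that `git status --porcelain -b` prints. Below is a sketch of a parser for them. `parse_branch_line` and its return shape are hypothetical. Raising `ValueError` on unrecognised input is this sketch's choice; it is not necessarily how `_parse_branch_line` behaves.

```python
import re
from typing import Optional, TypedDict


class BranchInfo(TypedDict):
    branch: Optional[str]
    ahead: int
    behind: int


# "## <branch>[...<upstream>][ [ahead N][, behind M]]"
_BRANCH_RE = re.compile(
    r"^## (?P<branch>.+?)(?:\.\.\.(?P<upstream>\S+))?(?: \[(?P<track>[^\]]*)\])?$"
)


def parse_branch_line(line: str) -> BranchInfo:
    if line.startswith("## HEAD (no branch)"):
        return {"branch": None, "ahead": 0, "behind": 0}  # detached HEAD
    match = _BRANCH_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a porcelain branch line: {line!r}")
    ahead = behind = 0
    # The bracketed tracking info looks like "ahead 2", "behind 1", or "ahead 2, behind 1".
    for part in (match["track"] or "").split(", "):
        if part.startswith("ahead "):
            ahead = int(part.split()[1])
        elif part.startswith("behind "):
            behind = int(part.split()[1])
    return {"branch": match["branch"], "ahead": ahead, "behind": behind}
```

The lazy `branch` group stops at the first `...`. That is safe because git ref names cannot contain `..`.
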
#### Phase 1 Slice 7 — SQLite session persistence + /save / /load / /resume

- `langgraph-checkpoint-sqlite>=2.0.0` added to runtime dependencies (resolves to 3.1.0). `aiosqlite>=0.20.0` was already in place.
- `quoriv.core.persistence`: new module with:
  - Path helpers: `quoriv_dir(cwd)`, `db_path(cwd)`, `registry_path(cwd)`, `ensure_quoriv_dir(cwd)`. Both files are per-project, mirroring how `.git/` is per-repo:
    - the agent's SQLite checkpointer lives at `<cwd>/.quoriv/sessions.db`
    - the named-session sidecar lives at `<cwd>/.quoriv/sessions.json`
  - `NamedSession` frozen dataclass: `{name, thread_id, saved_at}` (ISO-8601 UTC timestamp).
  - `SessionRegistry`: file-backed `name → thread_id` mapping.
    - It is loaded on construction and written eagerly on every mutation.
    - Malformed or missing files reset to an empty registry rather than raising. The underlying SQLite DB is the real source of truth, so a corrupted name index is a recoverable convenience-layer issue.
    - Public API: `for_cwd(cwd)`, `save(name, thread_id, *, now=None)`, `load(name)`, `list_named()`, `most_recent()`, `remove(name)`, `path`.
- `quoriv.core.__init__` re-exports `SessionRegistry`, `NamedSession`, `db_path`, `ensure_quoriv_dir`, `quoriv_dir`, `registry_path`.
- `quoriv.app.run_chat` rewritten to manage the saver lifecycle:
  - It resolves `cwd` to an absolute `Path` and calls `ensure_quoriv_dir`.
  - It opens `AsyncSqliteSaver.from_conn_string(str(sessions_db))` via `async with` and passes the saver to `build_agent(..., checkpointer=saver)`.
  - The prompt-loop body moved into `_interactive_loop(console, agent, registry, mode)` so the lifecycle stays explicit.
- New slash commands (with handler helpers `_handle_save`, `_handle_load`, `_handle_resume`, `_print_saved_sessions`):
  - `/save [name]`: anchor the current thread under `name` (default: first 8 chars of the thread id). Overwrites any prior entry under that name.
  - `/load`: list saved sessions (most recent first), or `/load <name>` to switch the active thread.
  - `/resume`: switch to the most recently saved thread (by `saved_at`).
- The `_handle_slash` signature now takes a `SessionRegistry`.
  - Legacy commands (`/help`, `/clear`, `/exit`, `/quit`) still work, and `/help` surfaces the new entries.
  - `SLASH_COMMANDS` extended accordingly.
- 44 new tests:
  - `tests/unit/core/test_persistence.py` (26):
    - path helpers
    - construction (empty / no file on init)
    - `save` returns / persists / round-trips / overwrites / timestamps
    - `load` known / unknown
    - `list_named` ordering
    - `most_recent` by `saved_at`
    - `remove` existing / unknown / persists
    - malformed-file recovery (bad JSON, non-dict root, missing `sessions` key, non-list value, dropped-field entries)
  - `tests/unit/test_app_slash.py` (18):
    - `SLASH_COMMANDS` table
    - `/save` with name / default name / reports / empty thread id / overwrites
    - `/load` known / unknown / empty list / populated list
    - `/resume` most recent / empty registry
    - legacy `/exit` / `/quit` / `/clear` / `/help` / unknown-command paths

Test count: 254 → 298 (+44). All gates green.
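
The registry's "malformed means empty" recovery policy is easiest to see on the load path. Below is a minimal sketch of that path plus a `most_recent` lookup. It assumes a sidecar layout of `{"sessions": [{"name", "thread_id", "saved_at"}, ...]}`, inferred from the recovery cases the tests name (non-dict root, missing `sessions` key, non-list value). `load_sessions` is a hypothetical helper, not part of the `SessionRegistry` API.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class NamedSession:
    name: str
    thread_id: str
    saved_at: str  # ISO-8601 UTC timestamp


def load_sessions(path: Path) -> dict[str, NamedSession]:
    """Read the sidecar file, treating any malformed content as an empty registry."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, undecodable bytes, or bad JSON
        return {}  # the SQLite checkpoints are untouched; only the name index resets
    entries = raw.get("sessions") if isinstance(raw, dict) else None
    if not isinstance(entries, list):
        return {}
    sessions: dict[str, NamedSession] = {}
    for entry in entries:
        try:
            session = NamedSession(
                name=str(entry["name"]),
                thread_id=str(entry["thread_id"]),
                saved_at=str(entry["saved_at"]),
            )
        except (KeyError, TypeError):
            continue  # drop entries with missing fields or a non-dict shape
        sessions[session.name] = session
    return sessions


def most_recent(sessions: dict[str, NamedSession]) -> Optional[NamedSession]:
    # ISO-8601 UTC strings written in one consistent format sort chronologically as text.
    return max(sessions.values(), key=lambda s: s.saved_at, default=None)
```
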
#### Phase 1 Slice 5b — Git write tools (HITL-gated)

- `quoriv.tools.git` extended with three write callables. They share the read tools' `subprocess.run` invocation (`shell=False`, list args, `cwd`-bound) and `dict[str, Any]` return shape:
  - `git_add(paths=None, cwd=".")`: stages specific paths or all changes (`git add -A`).
    - Returns `{"staged_files": list[str]}` from `git diff --cached --name-only` after the add.
    - An empty `paths` list is treated as "add all".
  - `git_commit(message, cwd=".")`: creates a commit from the current index.
    - Returns `{"sha", "short_sha", "subject", "branch": str | None}`.
    - These fields are parsed from `git rev-parse`/`git log -1` rather than the locale-sensitive `git commit` output.
    - An empty message is rejected locally with a structured error.
  - `git_stash(message=None, include_untracked=False, cwd=".")`: pushes the working tree onto the stash.
    - Returns `{"stashed": bool, "message": str | None}`.
    - `stashed=False` when git printed `"No local changes to save"`.
  - All three respect local git config: no `--no-gpg-sign`, no `--no-verify`. Tests configure `commit.gpgsign=false` per fixture repo so signing-required hosts do not block the suite.
- `quoriv.permissions.modes.GIT_WRITE_TOOLS = frozenset({"git_add", "git_commit", "git_stash"})`: Quoriv-specific git tools that mutate repo state.
  - They are gated alongside `WRITE_TOOLS` in `ask`/`read-only` modes.
  - `auto` mode lets them run silently (like `write_file`); `yolo` lets everything through.
  - `interrupt_on_for_mode("ask")` and `interrupt_on_for_mode("read-only")` now include the new tool names; `auto` and `yolo` deliberately do not.
  - `quoriv.permissions.__init__` re-exports `GIT_WRITE_TOOLS`.
- `quoriv.tools.__init__` extends `QUORIV_TOOLS` to include the three new tools and re-exports them.
- 23 new tests:
  - `tests/unit/permissions/test_modes.py` (5): `GIT_WRITE_TOOLS` membership shape, pairwise-disjoint with `WRITE_TOOLS` and `SHELL_TOOLS`, `auto` does NOT gate them, `ask` does, `read-only` does.
  - `tests/unit/tools/test_git.py` (18):
    - `TestGitAdd` (6): add-all, specific paths, empty-paths default, nonexistent-path error, clean repo returns empty, not-a-repo error.
    - `TestGitCommit` (5): staged commit, nothing-staged error, empty message rejected locally, subject is first line, not-a-repo error.
    - `TestGitStash` (5): with changes, with no changes (`stashed=False`), with message, `-u` includes untracked, not-a-repo error.
    - `TestToolRegistration` now parametrizes over all 7 git tools.

Test count: 298 → 321 (+23). All gates green.
#### Phase 1 Slice 6 — Language-aware run_tests tool

- `quoriv.tools.tests`: new module with one `@tool` callable, `run_tests(framework=None, path=None, cwd=".")`. It auto-detects the project's test framework from marker files and runs the suite via `subprocess.run` (`shell=False`, list args, `cwd`-bound).
  - Detection runs in order and the first match wins. A polyglot repo with both `pyproject.toml` and `package.json` therefore defaults to Python, matching Quoriv's own layout.
    - `pyproject.toml` / `pytest.ini` / `setup.cfg` → `pytest`
    - `package.json` → `npm test`
    - `Cargo.toml` → `cargo test`
    - `go.mod` → `go test ./...`
  - Command construction (`_build_command`) gives each framework its idiomatic invocation: `pytest -q`, `npm test --silent`, `cargo test`, `go test ./...`.
  - Path scoping uses each framework's native convention:
    - pytest: the path is a positional arg
    - npm/cargo: the path goes after `--`
    - go: the path replaces `./...`
  - On success, returns `{framework, command, exit_code, passed, stdout, stderr}`. The structured shape lets the LLM check `passed: bool` without parsing free-form output.
  - On failure, returns `{"error": "..."}` plus the attempted `framework`/`command` when known, so the agent can surface what was tried. Failures include no detection, a missing cwd, an unrecognized override, and a runner binary not on PATH.
- `quoriv.tools.__init__` registers `run_tests` in `QUORIV_TOOLS` and re-exports it.
- `run_tests` deliberately stays outside `GIT_WRITE_TOOLS`, because it executes a runner locally without mutating repo state. The session's existing shell-execution gate applies via DeepAgents' `execute` if the underlying runner shells out further.
- 29 new tests in `tests/unit/tools/test_runner.py`:
  - `TestDetectFramework` (8): each marker file maps to the right framework, empty dir returns `None`, polyglot tie-break favors Python.
  - `TestBuildCommand` (9): default + with-path for each framework, unknown framework raises `ValueError`.
  - `TestRunTests` (10):
    - pass, and failure sets `passed=False`
    - framework override (works even with no marker files)
    - path scoping for pytest
    - no-framework-detected error and unknown-framework override error
    - nonexistent-cwd error
    - runner-binary-missing error (with `framework` + `command` echo)
    - subprocess called with resolved absolute `cwd`
    - `shell=` never set (subprocess defaults to `shell=False`)
  - `TestToolRegistration` (2): `run_tests` is a `BaseTool` with the right name and is present in `QUORIV_TOOLS`.

Slice 6b (parsed test-count summary from each runner's output) is deferred.
Test count: 321 → 350 (+29). All gates green.
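
The detection order and path-scoping conventions above fit in two small pure functions. Below is a sketch under the assumption that frameworks are identified by the short names `pytest`, `npm`, `cargo`, and `go`. These are hypothetical identifiers, as are `detect_framework` and `build_command`; they are not Quoriv's `_build_command`.

```python
from pathlib import Path
from typing import Optional

# Marker files checked in order; the first match wins (Python before npm for polyglot repos).
_MARKERS: tuple[tuple[str, tuple[str, ...]], ...] = (
    ("pytest", ("pyproject.toml", "pytest.ini", "setup.cfg")),
    ("npm", ("package.json",)),
    ("cargo", ("Cargo.toml",)),
    ("go", ("go.mod",)),
)


def detect_framework(root: Path) -> Optional[str]:
    for framework, markers in _MARKERS:
        if any((root / marker).is_file() for marker in markers):
            return framework
    return None


def build_command(framework: str, path: Optional[str] = None) -> list[str]:
    """Argument list for subprocess.run(shell=False), scoped to `path` when given."""
    if framework == "pytest":
        return ["pytest", "-q"] + ([path] if path else [])  # positional path
    if framework == "npm":
        return ["npm", "test", "--silent"] + (["--", path] if path else [])
    if framework == "cargo":
        # cargo forwards args after "--" to the test binary, which treats them as a name filter.
        return ["cargo", "test"] + (["--", path] if path else [])
    if framework == "go":
        return ["go", "test", path or "./..."]  # the path replaces ./...
    raise ValueError(f"unknown framework: {framework!r}")
```

Returning an argument list rather than a string keeps `shell=False` natural and avoids quoting problems with paths that contain spaces.
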
#### Phase 1 Slice 8 — Status line + introspection slash commands

- `quoriv.app`: persistent `bottom_toolbar` wired into the `PromptSession`.
  - New pure helper `_build_status_line(model_id, mode, cwd, thread_id)` returns the formatted bar `<model_id> | mode=<mode> | <cwd> | thread=<first 8 chars of thread_id>`.
  - The bottom-bar callable closes over the loop's `thread_id`, so `/clear` / `/load` / `/resume` rotations are reflected on the next prompt without any extra plumbing.
- Four new read-only slash commands are wired through `_handle_slash`. It now accepts keyword-only `model_id` / `cwd` / `mode` parameters with safe defaults, so legacy call sites and prior-slice tests still type-check.
  - `/tools`: lists DeepAgents built-ins (`write_todos`, `ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`, `execute`, `task`) under one heading and `QUORIV_TOOLS` (`find_symbol`, `git_*`, `run_tests`) under another, each with a one-line description.
  - `/memory`: shows the status of `~/.quoriv/memory.md` (global) and `<cwd>/PROJECT.md` (project), i.e. which exist and their byte sizes, with a hint when neither is present. These are the files DeepAgents' `MemoryMiddleware` will load once `build_agent` wires `memory=[...]` in a later slice.
  - `/mode`: prints:
    - the active permission mode and its description
    - the current `interrupt_on_for_mode(mode)` tool list, so the user sees exactly which tools will prompt
    - the full available-modes table, with the current one marked
    - a hint that live switching requires `quoriv chat --mode <name>` for now
  - `/cost`: explicit stub pointing at Slice 9 (token tracking arrives with the local JSONL trace log).
- `_interactive_loop` now also takes `model_id` / `cwd` and threads them into both the toolbar closure and the slash dispatcher.
- `SLASH_COMMANDS` table extended so `/help` lists every new entry.
- Two new module-level tables provide the source-of-truth descriptions: `_DEEPAGENTS_BUILTIN_TOOLS` (9 entries) and `_MODE_DESCRIPTIONS` (4 entries, keyed by the `PermissionMode` `Literal`).
- 10 new tests in `tests/unit/test_app_slash.py`:
  - `TestSlice8SlashCommandsListed` (1): all four new commands appear in `SLASH_COMMANDS`.
  - `TestToolsCommand` (1): output names representative built-ins (`write_todos`) and Quoriv tools (`git_status`, `run_tests`) under their respective headings.
  - `TestMemoryCommand` (2): an empty cwd reports "No memory files found"; `PROJECT.md` is detected and its byte count is shown.
  - `TestModeCommand` (3): `ask` mode lists every gated tool (including the git writes from Slice 5b); `yolo` mode reports nothing gated; the available-modes table is always rendered.
  - `TestCostCommand` (1): output names the Slice 9 deferral plainly, so the user knows where token tracking lands.
  - `TestBuildStatusLine` (2): pure-function checks that every field appears, `thread_id` is truncated to 8 chars, and the delimiter shape is stable for edge-case paths.

Test count: 350 → 360 (+10). All gates green.
#### Phase 1 Slice 9 — Local JSONL trace log + token-aware /cost

- `quoriv.observability.trace`: new module with `TraceLogger`, an append-only JSONL writer per chat thread.
  - File creation is lazy: there is no on-disk artifact until the first write.
  - `_sanitize()` recursively coerces non-JSON-native values (`Path`, dataclasses, sets, arbitrary objects via `str()`), so unserializable values never raise.
  - Public API: `path` property, `log(event, **fields)`, `read_events()`, `token_totals()`.
  - `read_events()` tolerates corrupt lines and non-dict JSON values. They are skipped silently, so a single bad line never poisons the log.
  - `token_totals()` sums across every `model_complete` event, falling back to `input + output` when `total_tokens` is absent.
- `quoriv.core.persistence`: new `trace_path(cwd, thread_id)` and `traces_dir(cwd)` helpers, both re-exported from `quoriv.core`. Canonical location: `<cwd>/.quoriv/traces/<thread_id>.jsonl`. Mirrors the `db_path`/`registry_path` pattern from Slice 7.
- `quoriv.app`: `_interactive_loop` now owns a `TraceLogger`.
  - The logger rotates alongside `thread_id` when `/clear` / `/load` / `/resume` switch threads. The old log file remains on disk, so a future `/load` can return to it and see its history.
  - `_drive_turn` brackets the turn with `turn_start`/`turn_end` events.
  - `_stream_events` records three event kinds:
    - `model_complete`, with `input_tokens`/`output_tokens`/`total_tokens` extracted from the final `AIMessage.usage_metadata` when LangChain provides it
    - `tool_start`, with args
    - `tool_end`, with an output preview truncated at 500 chars
  - `/cost` is no longer a stub. It reads `tracer.token_totals()` for the active thread and prints aligned counts plus the trace file path. It still notes that per-provider dollar-cost calculation is deferred, pending a rate table.
  - `_handle_slash` gained a keyword-only `tracer: TraceLogger | None = None` parameter, defaulted so existing test calls without a tracer still type-check. When it is `None`, `/cost` falls back to a "no logger attached" message.
- 28 new tests (net):
  - `tests/unit/observability/test_trace.py` (25):
    - `TestSanitize` (7): primitives passthrough, dict recursion, `Path` coercion, dataclasses, sets/tuples, fallback to `str()`, non-string dict keys.
    - `TestTraceLoggerWrites` (7): `.path` property, lazy file creation, parent-dir auto-creation, JSONL append, ISO-8601 UTC timestamps, supplied fields preserved, unserializable values sanitized.
    - `TestTraceLoggerReads` (4): missing file returns an empty list, round trip, malformed-line resilience, non-dict JSON skipped.
    - `TestTokenTotals` (5): empty log zeros, multi-event sum, `total_tokens` fallback from `input + output`, ignores non-`model_complete` events, ignores non-int token fields.
    - `TestTracePathIntegration` (2): canonical filesystem location, round trip through a fresh logger instance.
  - `tests/unit/test_app_slash.py` (4): `TestCostCommand` rewritten, replacing the single Slice 8 stub test.
    - No tracer reports "No trace logger".
    - An empty log reports zero calls + the trace file path.
    - A populated log shows token totals (input/output/total/calls).
    - The trace file path is always surfaced.

Test count: 360 → 388 (+28). All gates green.
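
The `token_totals()` rules above can be sketched as a fold over JSONL lines:

- skip corrupt or non-dict lines
- count only `model_complete` events
- fall back to `input + output` when `total_tokens` is absent
- treat non-int fields as zero

The sketch takes an iterable of lines instead of reading `TraceLogger.path` directly. It assumes the event kind is stored under an `"event"` key and reports a `calls` count. Both are assumptions, not confirmed details of the trace format.

```python
import json
from typing import Any, Iterable


def _as_int(value: Any) -> int:
    # Non-int token fields count as 0; bool is excluded even though it subclasses int.
    return value if isinstance(value, int) and not isinstance(value, bool) else 0


def token_totals(lines: Iterable[str]) -> dict[str, int]:
    """Sum token usage over `model_complete` events in JSONL trace lines."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0, "calls": 0}
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # one corrupt line never poisons the whole log
        if not isinstance(event, dict) or event.get("event") != "model_complete":
            continue
        inp = _as_int(event.get("input_tokens"))
        out = _as_int(event.get("output_tokens"))
        raw_total = event.get("total_tokens")
        has_total = isinstance(raw_total, int) and not isinstance(raw_total, bool)
        totals["input_tokens"] += inp
        totals["output_tokens"] += out
        totals["total_tokens"] += raw_total if has_total else inp + out  # fallback
        totals["calls"] += 1
    return totals
```
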
#### Phase 1 Slice 6b — Parsed pytest counts in run_tests

- `quoriv.tools.tests._parse_pytest_summary`: new pure helper that extracts `{passed, failed, errors, skipped, duration_seconds}` from the terminal summary line emitted by pytest.
  - The regex anchors on the `"in <duration>s"` suffix, so unrelated `===` separator lines never match.
  - The last match in the output wins, so per-session header lines do not pollute the result.
  - It returns the all-`None` shape when no summary line is found. The caller can therefore tell "couldn't parse" (e.g., pytest crashed at collection) from "0 of everything".
- The `run_tests` return shape gains a `summary` block with the parsed fields when `framework == "pytest"`.
  - Other frameworks get the placeholder all-`None` summary until Slice 6c lands cargo / go / npm parsers.
  - Keeping the shape stable lets the LLM check `summary["passed"]` once instead of branching per framework.
- The parser reads `stdout + stderr` concatenated, so CI environments that redirect pytest output to stderr still get counts.
- 13 new tests in `tests/unit/tools/test_runner.py`:
  - `TestParsePytestSummary` (9):
    - passing only, failed only, mixed (passed/failed/errors), passed + skipped
    - `"no tests ran"` and plural `"errors"`
    - no summary line returns all-`None`; empty input returns all-`None`
    - last match wins when there are multiple `===` lines
  - `TestRunTests` (4 new):
    - the pytest summary surfaces counts on the result dict
    - a stderr summary is parsed
    - pytest with no summary line returns null counts
    - a non-pytest framework gets the all-`None` placeholder

Test count: 388 → 401 (+13). All gates green.
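
Below is a sketch of a parser following the contract above: anchor on `in <duration>s`, let the last match win, and return all-`None` when nothing matches.

- It accepts both the `=`-decorated summary line and the undecorated line printed under `-q`.
- It reports zero, not `None`, for categories absent from a matched line. That is an assumption of this sketch.
- `parse_pytest_summary` is a hypothetical name, and the regexes are not Quoriv's.

```python
import re
from typing import Optional

# Matches "==== 1 failed, 2 passed in 0.34s ====" and the undecorated "-q" form "3 passed in 0.12s".
_SUMMARY_RE = re.compile(
    r"^=*[ \t]*(?P<body>[^=\n]*?)[ \t]+in[ \t]+(?P<secs>\d+(?:\.\d+)?)s\b", re.MULTILINE
)
_COUNT_RE = re.compile(r"(\d+) (passed|failed|errors?|skipped)\b")
_FIELD = {"passed": "passed", "failed": "failed", "error": "errors", "errors": "errors", "skipped": "skipped"}
_EMPTY: dict[str, Optional[float]] = {
    "passed": None, "failed": None, "errors": None, "skipped": None, "duration_seconds": None,
}


def parse_pytest_summary(output: str) -> dict[str, Optional[float]]:
    """Parse the last pytest summary line; all-None when none is found."""
    result: dict[str, Optional[float]] = dict(_EMPTY)
    for match in _SUMMARY_RE.finditer(output):  # later matches overwrite earlier ones
        body = match["body"]
        counts = _COUNT_RE.findall(body)
        if not counts and "no tests ran" not in body:
            continue  # an unrelated line that merely ends in "in <n>s"
        tallies = {"passed": 0, "failed": 0, "errors": 0, "skipped": 0}
        for number, label in counts:
            tallies[_FIELD[label]] += int(number)
        result = {**tallies, "duration_seconds": float(match["secs"])}
    return result
```

Requiring at least one recognised count (or `no tests ran`) keeps stray log lines such as `Finished in 2s` from overwriting a real summary.
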
#### Phase 1 Slice 9c — Per-provider dollar-cost estimates in /cost

- `quoriv.observability.cost`: new module with the shipping rate table for `/cost`:
  - `ProviderRate`: frozen dataclass `{input_per_1k: float, output_per_1k: float}` (USD).
  - `RATES: dict[str, ProviderRate]`: 17 entries keyed by `provider:model` prefix:
    - OpenAI: `gpt-5`, `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`
    - Anthropic: `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`, `claude-3-5-sonnet`, `claude-3-5-haiku`, `claude-3-opus`, `claude-3-haiku`
    - Gemini: `gemini-1.5-pro`, `gemini-1.5-flash`
    - local sentinels: `ollama:`, `vllm:` → free
  - `lookup_rate(model_id)`: longest-prefix match, so `"openai:gpt-4o-mini"` resolves to its own entry rather than the broader `"openai:gpt-4o"`. Versioned ids like `"openai:gpt-4o-2024-08-06"` fall back to the prefix.
  - `estimate_cost(rate, input_tokens, output_tokens)`: returns `{input_cost_usd, output_cost_usd, total_cost_usd}` from the per-1k rate.
- `quoriv.observability.__init__` re-exports `RATES`, `ProviderRate`, `lookup_rate`, `estimate_cost`.
- `/cost` is no longer dollar-blind. `_handle_cost` gained a keyword-only `model_id` parameter, threaded through `_handle_slash`.
  - When the model has a rate, output now includes an "Estimated cost" block with input/output/total dollar amounts to 4-decimal precision.
  - When no rate is configured, a friendly "update `quoriv.observability.cost.RATES`" hint is printed alongside the token totals. The user still gets actionable info without a stale or fabricated dollar figure.
- 20 new tests:
  - `tests/unit/observability/test_cost.py` (17):
    - `TestProviderRate` (2): frozen + value equality.
    - `TestRatesTable` (5): non-empty, every entry is a `ProviderRate`, no negative rates, every known provider (openai/anthropic/gemini/ollama) has at least one row, every key contains a colon.
    - `TestLookupRate` (6): exact match, longest prefix wins, versioned suffix falls back to the prefix, ollama sentinel matches every model, unknown provider returns `None`, empty id returns `None`.
    - `TestEstimateCost` (4): zero tokens, basic math, sub-thousand tokens, free rate zeros out.
  - `tests/unit/test_app_slash.py` (3 new in `TestCostCommand`):
    - a known model shows a dollar estimate with the provider id
    - an unknown model shows "No rate configured for ... — update RATES"
    - ollama renders `$0.0000`

Test count: 401 → 421 (+20). All gates green.
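The longest-prefix lookup and the per-1k arithmetic can be sketched as follows. The table is a three-entry illustration with made-up prices, not the shipped `RATES`; only the function names mirror the entry above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRate:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens


# Three illustrative entries with made-up prices; not the shipped RATES table.
RATES: dict[str, ProviderRate] = {
    "openai:gpt-4o": ProviderRate(2.0, 8.0),
    "openai:gpt-4o-mini": ProviderRate(0.5, 1.5),
    "ollama:": ProviderRate(0.0, 0.0),  # local sentinel: matches every ollama model
}


def lookup_rate(model_id: str) -> ProviderRate | None:
    if not model_id:
        return None
    # Longest-prefix match: "openai:gpt-4o-mini" beats the broader "openai:gpt-4o".
    hits = [key for key in RATES if model_id.startswith(key)]
    return RATES[max(hits, key=len)] if hits else None


def estimate_cost(rate: ProviderRate, input_tokens: int, output_tokens: int) -> dict[str, float]:
    input_cost = input_tokens / 1000 * rate.input_per_1k
    output_cost = output_tokens / 1000 * rate.output_per_1k
    return {
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "total_cost_usd": input_cost + output_cost,
    }
```

A versioned id such as `openai:gpt-4o-2024-08-06` has no exact key, so the `openai:gpt-4o` prefix is the longest hit; formatting a total with `f"${value:.4f}"` would give the 4-decimal style described above.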
Phase 1 Slice 6c — cargo / go / npm output parsers
- `quoriv.tools.tests` extended with three pure helpers that mirror the contract of `_parse_pytest_summary`: input is the combined stdout+stderr, output is `{passed, failed, errors, skipped, duration_seconds}`, with the all-`None` fallback when nothing matches.
  - `_parse_cargo_summary`: matches `test result: ok. N passed; M failed; K ignored; ...; finished in Xs` lines (one per test binary). Multi-crate workspaces produce multiple summary lines; counts and durations are summed across all of them. `ignored` maps to `skipped`. cargo doesn't distinguish errors from failures, so `errors` is always 0.
  - `_parse_go_summary`: counts per-test status lines (`--- PASS:` / `--- FAIL:` / `--- SKIP:`) for the count fields, and sums per-package summary durations (`ok pkg X.XXs` / `FAIL pkg X.XXs`). When only package summaries appear, the counts are 0 and the duration is still set; when only per-test lines appear, `duration_seconds` stays `None`.
  - `_parse_npm_summary`: parses jest / vitest-style summary blocks (`Tests: N passed, M failed, K total` + `Time: X.XX s`). vitest's `todo`/`pending` categories collapse into `skipped` to keep the cross-runner shape stable. Other npm runners (mocha, ava, …) fall through to the all-`None` shape; they emit summaries in different formats and aren't covered by this slice.
- A `_FRAMEWORK_PARSERS` dispatch dict replaces the `if chosen == "pytest"` branch in `run_tests`. Adding a new framework now means adding a marker file, a command builder, and a parser entry; all three concerns sit next to each other in the module.
- The all-`None` fallback still applies when a framework's runner emits output that doesn't match the expected summary shape (e.g., a cargo compile error before tests run), so the caller can still tell "couldn't parse" from real zero counts. The Slice 6b test that exercised the placeholder path was renamed and rewritten to assert this contract under the new parsers.
- 16 new tests in `tests/unit/tools/test_runner.py`:
  - `TestParseCargoSummary` (4): single-package success, single-package failure, multi-package counts + duration sum, no-summary returns all-`None`.
  - `TestParseGoSummary` (4): `--- PASS / FAIL / SKIP` counts with package-duration sum, package-summary-only (zero counts but non-`None` duration), multiple passes only (no package summary → duration stays `None`), no-recognisable-output returns all-`None`.
  - `TestParseNpmSummary` (5): jest-style full summary, passing-only, `todo` + `pending` collapse to `skipped`, no-summary returns all-`None`, no `Time:` line keeps duration `None`.
  - `TestRunTests` (3 new): each new framework dispatches to its own parser end-to-end through `run_tests`.
Test count: 421 → 437 (+16). All gates green.
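The cargo rule (one `test result:` line per test binary, summed across the workspace) might look like this in Python. It assumes the line shape quoted above; `parse_cargo_summary` is an illustrative name, not the module's private helper.

```python
from __future__ import annotations

import re

# One line per test binary, e.g.
# "test result: ok. 3 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.25s"
_CARGO_RESULT = re.compile(
    r"test result: (?:ok|FAILED)\. (?P<passed>\d+) passed; (?P<failed>\d+) failed; "
    r"(?P<ignored>\d+) ignored;.*?finished in (?P<secs>[\d.]+)s"
)


def parse_cargo_summary(output: str) -> dict[str, float | int | None]:
    matches = list(_CARGO_RESULT.finditer(output))
    if not matches:  # e.g. a compile error before any test binary ran
        return {"passed": None, "failed": None, "errors": None,
                "skipped": None, "duration_seconds": None}
    # Multi-crate workspaces print several lines: sum counts and durations across them.
    return {
        "passed": sum(int(m["passed"]) for m in matches),
        "failed": sum(int(m["failed"]) for m in matches),
        "errors": 0,  # cargo does not separate errors from failures
        "skipped": sum(int(m["ignored"]) for m in matches),  # "ignored" maps to skipped
        "duration_seconds": sum(float(m["secs"]) for m in matches),
    }
```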
Phase 1 Slice 4b — Tree-sitter multi-language symbol intelligence
- Migrated the `ast` extra from the abandoned `tree-sitter-languages` (no Python 3.13 wheels) to the maintained `tree-sitter-language-pack>=0.6.0`, and bumped `tree-sitter>=0.24.0`. The new pack ships bundled C wheels for Python 3.10-3.13 on Linux / macOS / Windows and covers ~80 languages.
- `quoriv.repo.ast`: new module with `LANGUAGE_BY_EXTENSION` (40+ entries), `detect_language(path)` by suffix, `get_parser(language)` lazy-loaded from the pack, and `is_available()` so callers without the extra installed degrade gracefully. The lazy imports keep the rest of Quoriv working when someone installs without `[ast]`.
- `quoriv.repo.symbols`: new module with a `Symbol` frozen dataclass and two public functions, `extract_definitions(source, language, *, target=None)` and `find_references(source, language, target)`. Per-language `DEFINITION_KINDS` maps tree-sitter node kinds to Quoriv symbol kinds for python / javascript / typescript / tsx / go / rust / java / kotlin / c / cpp / csharp / ruby / php / lua / elixir / swift. `CONTAINER_KINDS` tracks scopes (class / struct / trait / impl / module / namespace / protocol / interface / enum) so nested method definitions record their `parent`. Uses direct tree walks (not `QueryCursor`) because the language pack ships its own `Node` class that's binary-incompatible with the public `tree_sitter` Query API.
- `quoriv.tools.ast_tools` expanded:
  - `find_symbol` is now multi-language. Python (`.py`/`.pyi`) keeps the stdlib `ast` path (no extra needed); every other extension routes through tree-sitter via `quoriv.repo.symbols`.
  - New `@tool` callables: `go_to_definition(name, path=".")`, a strict alias of `find_symbol` named for the agent's "jump-to-def" intent; and `find_references(name, path=".")`, which returns every identifier-like node whose text equals `name` (definition + callers + type uses + field accesses).
  - `_iter_source_files` walks the path, skipping common build / vendor dirs (`.venv`, `venv`, `__pycache__`, `.git`, `build`, `dist`, `node_modules`, `target`).
  - `QUORIV_TOOLS` now exposes 11 tools: `find_symbol`, `go_to_definition`, `find_references`, the 7 git tools, and `run_tests`.
- 63 new tests across three files:
  - `tests/unit/repo/test_ast.py` (26): extension → language for 21 file types, table sanity, case insensitivity, parser smoke tests for python/go, unknown language → `LookupError`.
  - `tests/unit/repo/test_symbols.py` (14): `Symbol` frozenness, table coverage, Python def/class/method extraction with parent, Python reference search hits both def and call sites, Go type/method/function extraction, Go reference search across declarations and uses, TypeScript interface/type/class/method/function, Rust struct/trait/impl/function, graceful empty list for unsupported languages.
  - `tests/unit/tools/test_ast_tools.py` (23 new): `TestFindSymbolMultiLanguage` (5) covers go/ts/rust + `node_modules`/`target` skip + a mixed-language file tree. `TestGoToDefinition` (3) verifies alias semantics + registration. `TestFindReferences` (6) covers Go call sites, TS field access, empty/missing cases, and registration.
Test count: 437 → 500 (+63). All gates green.
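The suffix detection and graceful degradation in `quoriv.repo.ast` can be sketched with the standard library alone. The extension table is a small illustrative subset, and the sketch assumes the pack's import name is `tree_sitter_language_pack` and that it exposes `get_parser(name)`; only `get_parser` touches the optional dependency, and only when called.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from typing import Any

# Illustrative subset only; the real LANGUAGE_BY_EXTENSION has 40+ entries.
LANGUAGE_BY_EXTENSION: dict[str, str] = {
    ".py": "python", ".pyi": "python", ".go": "go", ".rs": "rust",
    ".ts": "typescript", ".tsx": "tsx", ".js": "javascript", ".java": "java",
}


def detect_language(path: str | Path) -> str | None:
    """Map a file suffix to a language name, case-insensitively."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())


def is_available() -> bool:
    """True when the optional [ast] extra (tree-sitter-language-pack) is importable."""
    return importlib.util.find_spec("tree_sitter_language_pack") is not None


def get_parser(language: str) -> Any:
    # Lazy import: the rest of the module works without the [ast] extra installed.
    from tree_sitter_language_pack import get_parser as pack_get_parser

    return pack_get_parser(language)
```

Callers check `is_available()` before reaching for a parser, which is how Python files can keep the stdlib `ast` path when the extra is missing.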
Phase 1 Slice 9d — Config-driven cost rates
- `quoriv.config.schema`: two new Pydantic v2 models, `CostRate` (USD per 1,000 tokens, with `Field(..., ge=0.0)` on both `input_per_1k` and `output_per_1k`) and `CostConfig` (`rates: dict[str, CostRate]`, defaults to empty). Both carry `extra="forbid"`, so a typo in `~/.quoriv/config.toml` fails loudly at validation time. `QuorivConfig` gains a `cost: CostConfig = Field(default_factory=CostConfig)` section.
- `quoriv.observability.cost.effective_rates(config) -> dict[str, ProviderRate]`: merges the user's `cost.rates` over a fresh copy of the built-in `RATES`. The built-in table is never mutated; the returned dict is fresh on each call. `effective_rates(None)` returns a copy of `RATES`, so callers without a config object still get the standard table.
- `quoriv.observability.cost.lookup_rate` gained an optional second `rates` argument. Passing `None` falls back to the built-in `RATES` (legacy behaviour preserved). Passing the result of `effective_rates(config)` makes the longest-prefix lookup operate over the merged table: a user's fine-grained `anthropic:claude-opus-4-7` entry naturally wins over the broader built-in `anthropic:claude-opus-4` prefix.
- `quoriv.observability.__init__` re-exports `effective_rates`.
- `quoriv.app.run_chat` precomputes `cost_rates = effective_rates(config)` once per session and threads it through `_interactive_loop` → `_handle_slash` → `_handle_cost`. The new keyword-only `cost_rates` parameter defaults to `None` on every layer, so older test entry points keep working.
- The "no rate configured" hint in `/cost` now points the user at `[cost.rates."{provider}:{model}"]` in `~/.quoriv/config.toml` rather than at the in-source `RATES` dict, matching the new override path.
- `config.example.toml` documents the `[cost.rates."provider:model"]` block with example overrides, the non-negative-float constraint, and the longest-prefix lookup rule.
- 20 new tests:
  - `tests/unit/config/test_schema.py` `TestCostConfig` (8): empty defaults, `QuorivConfig.cost.rates == {}`, rate accepts 0.0, rate rejects negative input / output, missing fields rejected, extra fields rejected on both `CostRate` and `CostConfig`, full round-trip through `QuorivConfig.model_validate`.
  - `tests/unit/observability/test_cost.py` `TestLookupRateCustomTable` (3): uses the supplied table, not the built-in one; longest prefix within a custom table; explicit `None` falls back to the built-in table.
  - `tests/unit/observability/test_cost.py` `TestEffectiveRates` (6): `None` returns a built-ins copy that is safe to mutate, empty config matches built-ins, a user override replaces a built-in by key (other entries survive), a user can add a new provider, calling `effective_rates` does not mutate `RATES`, a more specific user key wins over a broader built-in prefix via merged-table longest-prefix.
  - `tests/unit/test_app_slash.py` `TestCostCommand` (3 new): a user rate override shadows the built-in (1k input @ $0.05 + 1k output @ $0.20 → totals appear in `/cost`); a user rate can add an unknown model, so a previously rateless id now renders an estimate; the legacy "No rate configured" hint is absent when an override exists.
Test count: 500 → 520 (+20). All gates green.
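The merge-then-lookup behaviour can be sketched as below. To stay self-contained, this version of `effective_rates` takes a plain dict of `ProviderRate`s instead of the `QuorivConfig` object (the real function converts validated `CostRate` entries), and the built-in table holds made-up placeholder prices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRate:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens


# Placeholder built-in table with made-up prices, for illustration only.
RATES: dict[str, ProviderRate] = {
    "anthropic:claude-opus-4": ProviderRate(1.0, 2.0),
    "ollama:": ProviderRate(0.0, 0.0),
}


def effective_rates(user_rates: dict[str, ProviderRate] | None) -> dict[str, ProviderRate]:
    merged = dict(RATES)             # fresh copy each call; RATES is never mutated
    merged.update(user_rates or {})  # user keys replace or extend built-in keys
    return merged


def lookup_rate(
    model_id: str, rates: dict[str, ProviderRate] | None = None
) -> ProviderRate | None:
    table = RATES if rates is None else rates  # None keeps the legacy built-in lookup
    hits = [key for key in table if model_id and model_id.startswith(key)]
    return table[max(hits, key=len)] if hits else None  # longest prefix wins
```

Because the user's key is simply another entry in the merged table, a more specific override beats a broader built-in prefix without any special-case code.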
Phase 1 Slice 8b — Live /mode switch
- `/mode <name>` now rebuilds the compiled DeepAgent in place against the same `AsyncSqliteSaver` checkpointer. The running thread's conversational state survives the switch; only the `interrupt_on=` dict changes (via `interrupt_on_for_mode(new_mode)`). No restart, no new thread id.
- `quoriv.app._SlashResult` gained a third slot, `new_mode: PermissionMode | None`. The interactive loop branches on it after a slash dispatch: when set, it calls `build_agent(config, model_override=..., cwd=..., mode=new_mode, checkpointer=saver)` and reassigns the local `agent`. The `_toolbar` closure reads the latest `permission_mode` at call time, so the status line reflects the new mode on the next prompt redraw without an explicit refresh.
- `_handle_mode` is now mode-aware. With no argument it preserves the Slice 8 display (current mode + gated tools + menu); with an argument it normalises to lowercase, validates against `ALLOWED_MODES`, short-circuits same-mode requests with a friendly `"Already in <mode>"` note, and reports unknown values with the valid set listed inline. The closing line of the display variant now reads `Switch live with /mode <name>` instead of the stale `Live-switch lands in a later slice.`
- `quoriv.app.run_chat` and `_interactive_loop` thread `config`, `model_override`, and the open `AsyncSqliteSaver` through as keyword-only args. All three default to `None`, so legacy single-mode test entry points keep working without modification.
- The `SLASH_COMMANDS["/mode"]` description changed from `"Show the current permission mode and what each mode gates"` to `"Show permission mode (no arg) or live-switch (/mode <name>)"`, so `/help` advertises the new form.
- 6 new tests in `tests/unit/test_app_slash.py::TestModeCommand`:
  - `test_no_arg_does_not_switch`: the display-only path returns `_SlashResult(new_mode=None)`.
  - `test_valid_arg_returns_new_mode`: `/mode yolo` from `ask` returns `_SlashResult(new_mode="yolo")` silently (the loop, not the handler, prints the confirmation).
  - `test_valid_arg_with_uppercase_normalized`: `/mode YOLO` normalises to `"yolo"`.
  - `test_same_mode_does_not_switch`: `/mode ask` while in `ask` prints `"Already in"` and returns no switch.
  - `test_invalid_arg_reports_error`: `/mode banana` prints the unknown-mode error with the offending input and the full valid set; returns no switch.
  - `test_all_modes_can_be_targets`: round-trips every valid mode as a target from a different starting mode, to catch any `PermissionMode` literal-narrowing regression in the dispatch path.
Test count: 520 → 526 (+6). All gates green.
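The argument handling in `_handle_mode` reduces to a small decision function. In this sketch the mode names in `ALLOWED_MODES` are assumptions for illustration, messages are collected into a list instead of printed to the console, and `handle_mode` is a hypothetical name.

```python
from __future__ import annotations

# Assumed mode names for illustration; Quoriv's real set is ALLOWED_MODES.
ALLOWED_MODES = ("read-only", "ask", "yolo")


def handle_mode(arg: str, current: str, out: list[str]) -> str | None:
    """Return the mode to switch to, or None when the agent should not be rebuilt."""
    requested = arg.strip().lower()  # "/mode YOLO" behaves like "/mode yolo"
    if not requested:
        out.append(f"Current mode: {current}")  # display-only path
        return None
    if requested not in ALLOWED_MODES:
        out.append(f"Unknown mode {arg.strip()!r}; valid modes: {', '.join(ALLOWED_MODES)}")
        return None
    if requested == current:
        out.append(f"Already in {current}")
        return None
    return requested  # the caller rebuilds the agent and prints the confirmation
```

A non-`None` return is the point where the loop would call `build_agent(..., mode=new_mode, checkpointer=saver)` with the same checkpointer, so the thread's state carries over.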
Phase 1 Slice 9b — End-to-end stubbed-LLM turn test
- `tests/integration/test_e2e_stubbed_chat.py`: the first integration test. It drives a full user turn through `quoriv.app._drive_turn` against a real `build_agent`-compiled DeepAgent whose model is a `_StubChatModel` (a `langchain_core.language_models.fake_chat_models.GenericFakeChatModel` subclass with `bind_tools` short-circuited to `self`, since the base class raises `NotImplementedError` and DeepAgents binds tools at compile time). `quoriv.core.agent.get_model` is monkeypatched to return the stub, so the test exercises the same code path the CLI uses, including the LangGraph event stream, the `StreamRenderer` `Live`, the `TraceLogger` writes, the `MemorySaver` checkpointer, and the `PathProtectionMiddleware`. Mode is `yolo` to skip HITL interrupts, so the agent finishes in one model turn.
- 4 tests in `TestEndToEndTurn`:
  - `test_drive_turn_writes_turn_start_and_end`: verifies the trace bracket. The first record is `turn_start` with the original `thread_id` / `user_input` / `mode`; the last record is `turn_end` with the matching `thread_id`. That bracket is the contract `/cost` and any future observability tooling depend on.
  - `test_drive_turn_records_model_complete`: verifies that at least one `model_complete` record lands per turn (one stub `AIMessage` → one `on_chat_model_end` → one trace entry).
  - `test_drive_turn_renders_stub_response`: sanity check that the LLM payload is actually rendered to the console buffer through `StreamRenderer`, not just traced.
  - `test_status_line_built_from_session_context`: `_build_status_line(model_id, mode, cwd, thread_id)` is unaffected by a completed turn and still returns a well-formed string with the expected fields, mode marker, truncated thread id, and three separators.
- Catches future regressions where `_drive_turn` / `_stream_events` / `TraceLogger` drift out of sync: a renamed LangGraph event key, a missing tracer call, or a status-line format change all surface here instead of in production.
Test count: 526 → 530 (+4). All gates green. With Slice 9b done, Phase 1 is complete.
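The trace contract the first two tests pin down can be expressed as a plain predicate over trace records. This sketch assumes each record is a dict with `event` and `thread_id` keys; the real `TraceLogger` record schema may differ, and `check_turn_trace` is a hypothetical helper, not part of Quoriv.

```python
from __future__ import annotations

from typing import Any


def check_turn_trace(records: list[dict[str, Any]]) -> bool:
    """Check the per-turn contract: start/end bracket plus at least one model_complete."""
    if len(records) < 2:
        return False
    first, last = records[0], records[-1]
    bracketed = (
        first.get("event") == "turn_start"
        and last.get("event") == "turn_end"
        and first.get("thread_id") == last.get("thread_id")  # same thread opens and closes
    )
    saw_model = any(r.get("event") == "model_complete" for r in records[1:-1])
    return bracketed and saw_model
```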
Phase 2 Slice 1 — Memory wiring
- `quoriv.core.memory`: new module with a `MemoryCandidate` NamedTuple (`label`, `path`); `memory_candidates(cwd)`, returning the ordered list of two candidates (global first as `~/.quoriv/memory.md`, project second as `<cwd>/PROJECT.md`); and `resolve_memory_files(cwd)`, filtering to the ones that exist via `Path.is_file()` (so a directory named `PROJECT.md` is correctly rejected). The order matters: DeepAgents concatenates the list in load order under `<agent_memory>` in the system prompt, so global-then-project lets a project file refine a global note, the same precedence rule the TOML loader uses.
- `quoriv.core.agent.build_agent` now calls `resolve_memory_files(root)`, converts the result to strings, and passes it to `create_deep_agent(memory=...)`. When the resolved list is empty, the argument is `None`: DeepAgents documents `None` (not `[]`) as the contract for "don't attach the middleware", so we honor that.
- `quoriv.core.__init__` re-exports `MemoryCandidate`, `memory_candidates`, `resolve_memory_files`.
- `quoriv.app._handle_memory` rewritten to source its candidate list from `memory_candidates(cwd)` rather than hardcoded paths. Each present file now gains a `(loaded)` tag, making it clear the agent's `MemoryMiddleware` has actually seen the file, not just that it exists on disk.
- `quoriv.app._render_welcome` adds a `Memory: PROJECT.md, memory.md` line to the welcome panel when at least one file is loaded; it stays silent when neither exists, so first-time users don't see clutter.
- 19 new tests:
  - `tests/unit/core/test_memory.py` (10):
    - `TestMemoryCandidates` (5): load order, global path uses `fake_home/.quoriv`, project path tracks the supplied cwd, NamedTuple round-trip, different cwds yield different project paths.
    - `TestResolveMemoryFiles` (5): neither / project-only / global-only / both / directory-with-the-name rejected.
  - `tests/unit/core/test_agent.py::TestBuildAgentMemoryWiring` (4): a monkeypatched `create_deep_agent` captures kwargs and verifies `memory=None` when there are no files, `memory=[...PROJECT.md]` when only the project file is present, `memory=[...memory.md]` when only the global file is present, and global-then-project ordering when both exist.
  - `tests/unit/test_app_slash.py::TestMemoryCommand` (2 new): the `(loaded)` tag appears next to present files; absent files do not show the tag. The two pre-existing `/memory` tests gained the `fake_home` fixture so a developer's real `~/.quoriv/memory.md` doesn't leak into the assertion.
  - `tests/unit/test_app_slash.py::TestWelcomePanel` (3): no memory line when neither file is present; `PROJECT.md` surfaces in the panel; global `memory.md` surfaces.
Test count: 530 → 549 (+19). All gates green.
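The candidate ordering, the `is_file()` filter, and the `None`-not-`[]` rule fit in a few lines of `pathlib`. The explicit `home` parameter, the `"global"`/`"project"` labels, and the `memory_kwarg` helper are additions for this self-contained sketch (the changelog's tests use a `fake_home` fixture instead).

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class MemoryCandidate(NamedTuple):
    label: str
    path: Path


def memory_candidates(cwd: Path, home: Path | None = None) -> list[MemoryCandidate]:
    home = Path.home() if home is None else home
    # Global first, project second, so the project file can refine a global note.
    return [
        MemoryCandidate("global", home / ".quoriv" / "memory.md"),
        MemoryCandidate("project", cwd / "PROJECT.md"),
    ]


def resolve_memory_files(cwd: Path, home: Path | None = None) -> list[Path]:
    # is_file() also rejects a directory that happens to be named PROJECT.md.
    return [c.path for c in memory_candidates(cwd, home) if c.path.is_file()]


def memory_kwarg(cwd: Path, home: Path | None = None) -> list[str] | None:
    files = [str(p) for p in resolve_memory_files(cwd, home)]
    return files or None  # None, not [], tells DeepAgents to skip the memory middleware
```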
Phase 2 Slice 2 — quoriv init
- New `quoriv init [PATH]` command in `quoriv.cli` scaffolds a starter `PROJECT.md` in the target directory (defaults to `cwd`). It refuses to overwrite an existing file by default, exiting non-zero with a "pass `--force` to overwrite" hint so a CI script can't silently nuke a hand-edited `PROJECT.md`. `--force` / `-f` overrides.
- `quoriv.core.memory.PROJECT_MEMORY_TEMPLATE`: the starter content. A top-level note explaining what Quoriv does with the file, then sections for `Project overview`, `Architecture`, `Conventions`, `Useful commands`, and `Things to avoid`. Designed to fit on one screen so users can scan the shape before editing.
- Removed an outdated `# noqa: TC003` on `from pathlib import Path` in `cli.py`: the import is now used at runtime by the new `init` command's `Path` arguments and `Path.cwd()` fallback, not just in type annotations.
- 6 new tests in `tests/unit/test_cli.py::TestInit`:
  - `test_creates_project_md_in_target_dir`: happy path with an explicit PATH; "Created" message; file lands at `<dir>/PROJECT.md`.
  - `test_refuses_to_overwrite_by_default`: a pre-existing `PROJECT.md` survives untouched; the exit code is non-zero; output mentions `--force`.
  - `test_force_overwrites_existing`: `--force` replaces the file; "Overwrote" message; old content gone; new content carries the template header.
  - `test_force_short_flag`: the `-f` short form works the same as `--force`.
  - `test_template_covers_expected_sections`: guards the UX of the starter; all six headings must be present (regression catcher for an accidental template gut).
  - `test_no_path_writes_to_cwd`: without an argument, writes to `Path.cwd()` (monkeypatched to `tmp_path` because Typer's `CliRunner` doesn't chdir).
- `TestTopLevel::test_help_lists_commands` updated to include `init` in the expected command list.
Test count: 549 → 555 (+6). All gates green.
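The refuse-unless-forced behaviour of `quoriv init` is sketched below as a plain function rather than a Typer command, so it runs without the CLI; `write_project_md` and the two-section `TEMPLATE` are placeholders, not Quoriv's code. A CLI wrapper would turn the `FileExistsError` into the non-zero exit with the `--force` hint.

```python
from __future__ import annotations

from pathlib import Path

# Placeholder; the real starter text is quoriv.core.memory.PROJECT_MEMORY_TEMPLATE.
TEMPLATE = "# Project memory\n\n## Project overview\n\n## Architecture\n"


def write_project_md(target: Path | None = None, *, force: bool = False) -> str:
    """Write PROJECT.md into `target` (default: cwd) and return the status verb."""
    directory = Path.cwd() if target is None else target
    path = directory / "PROJECT.md"
    existed = path.exists()
    if existed and not force:
        # A CLI wrapper would print this hint and exit non-zero.
        raise FileExistsError(f"{path} already exists; pass --force to overwrite")
    path.write_text(TEMPLATE, encoding="utf-8")
    return "Overwrote" if existed else "Created"
```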
Phase 2 Slice 3 — "Always allow" session allowlist
- `quoriv.permissions.allowlist`: new module with `SessionAllowlist`, an in-memory set of tool names the user has promoted from a one-time HITL approval to a session-persistent one. Provides `__contains__`, `allow`, `clear`, `__len__`, and `tools()` (returning an immutable `frozenset` snapshot). Keyed by tool name only, matching the granularity `interrupt_on=` itself uses.
- `quoriv.ui.prompts.DecisionType` gained `"approve_always"`. The interactive prompt now reads `approve / reject / always [a/r/A]`. `parse_choice` accepts `A`, `aa`, and the spelled-out `always` (case-insensitive) for the new decision; bare lowercase `a` still means "approve once", so a user can't accidentally promote a tool by typing the same key as before.
- `quoriv.app._collect_decisions` consults the allowlist before calling `prompt_approval`: matching tools auto-resolve to `approve` with a `[dim]auto-approved <tool> (allowlisted this session)[/dim]` note. When the user picks `approve_always`, the tool name is added to the allowlist and a `[green]Will auto-approve …[/green]` confirmation is rendered. `auto_deny` (read-only mode) always wins over the allowlist: a remembered approval doesn't unlock read-only.
- `quoriv.app._decision_payload` maps `approve_always` → `{"type": "approve"}` on the wire. DeepAgents only speaks `approve` / `reject` / `edit` / `respond`; the allowlist promotion is a Quoriv UX layer.
- `quoriv.app._interactive_loop` creates one `SessionAllowlist` per `run_chat` invocation and threads it through `_drive_turn` → `_collect_decisions`. It survives `/clear` (the user promoted these tools deliberately; rotating the thread shouldn't silently un-promote them).
- 21 new tests:
  - `tests/unit/permissions/test_allowlist.py::TestSessionAllowlist` (7): empty default, `allow` adds, idempotency, `__contains__` tolerates non-strings, `tools()` returns an immutable snapshot, `clear`, independent multi-tool tracking.
  - `tests/unit/ui/test_prompts.py::TestParseChoice::test_approve_always_aliases` (1 parametrized × 6 inputs): covers `A`, `aa`, `always`, `Always`, `ALWAYS`, and whitespace-padded forms. The `test_approve_aliases` parametrize lost the bare `"A"` entry, since capital `A` is now reserved for `approve_always`; the in-place comment documents the split.
  - `tests/unit/test_app_decisions.py` (9):
    - `TestDecisionPayload` (3): approve passthrough, reject keeps message, approve_always → approve.
    - `TestCollectDecisionsAllowlist` (6): an allowlisted tool skips the prompt; a non-allowlisted tool still prompts; approve_always promotes; a second call uses the promoted entry (single prompt across two calls); `auto_deny` wins over the allowlist; a `None` allowlist preserves the legacy "always prompt" behavior.
Test count: 555 → 576 (+21). All gates green.
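The allowlist, the `a`/`A` split, and the precedence rules can be sketched together. `SessionAllowlist` follows the method list above; `parse_choice` only covers the keys named in this entry, and `decide` is a hypothetical condensation of the pre-prompt checks in `_collect_decisions`, not its real signature. Tool names in the usage are made up.

```python
from __future__ import annotations


class SessionAllowlist:
    """In-memory set of tool names promoted to 'always approve' for this session."""

    def __init__(self) -> None:
        self._tools: set[str] = set()

    def allow(self, tool: str) -> None:
        self._tools.add(tool)  # idempotent

    def __contains__(self, tool: object) -> bool:
        return isinstance(tool, str) and tool in self._tools  # tolerates non-strings

    def __len__(self) -> int:
        return len(self._tools)

    def clear(self) -> None:
        self._tools.clear()

    def tools(self) -> frozenset[str]:
        return frozenset(self._tools)  # immutable snapshot


def parse_choice(raw: str) -> str | None:
    text = raw.strip()
    if text == "A" or text.lower() in {"aa", "always"}:
        return "approve_always"
    if text == "a":  # bare lowercase "a" still means approve once
        return "approve"
    if text.lower() == "r":
        return "reject"
    return None  # unrecognised input: the caller re-prompts


def decide(tool: str, allowlist: SessionAllowlist, *, auto_deny: bool) -> str | None:
    if auto_deny:
        return "reject"   # read-only wins over a remembered approval
    if tool in allowlist:
        return "approve"  # auto-approved, no prompt
    return None           # caller must prompt the user


def decision_payload(decision: str) -> dict[str, str]:
    # DeepAgents only speaks approve/reject/edit/respond; approve_always stays a Quoriv UX layer.
    return {"type": "approve" if decision == "approve_always" else decision}
```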
Phase 2 Slice 4 — Per-task model routing (built-in subagents)
- `quoriv.core.subagents`: new module with three built-in `SubAgent` specs (`researcher`, `debugger`, `reviewer`) with fixed system prompts, plus a `build_subagents(config)` helper that resolves each role's configured model token (`"default"` / `"fast"` / `"strong"` / a literal `"provider:name"`) into a model instance via `quoriv.models.factory.get_model`. Going through the Quoriv factory, rather than letting DeepAgents call `init_chat_model` directly, keeps the keychain-aware key lookup consistent across the main agent and every subagent.
- `quoriv.config.schema`: new `SubAgentRoleConfig` (`model: str = "default"`) and `SubAgentsConfig` (`researcher` / `debugger` / `reviewer`, with defaults `"fast"` / `"strong"` / `"strong"`). Both use `extra="forbid"`, so invented role names and stray fields fail fast at validation. Added to `QuorivConfig` as the `subagents` section.
- `quoriv.core.agent.build_agent` now calls `build_subagents(config)` and passes the result to `create_deep_agent(subagents=...)`. Returned shapes are typed as DeepAgents' `SubAgent` TypedDict via `cast` (imported under `TYPE_CHECKING` so the runtime import surface stays small).
- `config.example.toml` documents the new `[subagents.*]` blocks with the token taxonomy and per-role defaults.
- 16 new tests (and 1 integration-test fix):
  - `tests/unit/core/test_subagents.py::TestResolveModelToken` (5): `default` / `fast` / `strong` resolve through `[model]`; a literal `provider:name` passes through; an overridden `[model]` section flows to all tokens.
  - `tests/unit/core/test_subagents.py::TestBuildSubagents` (6): three roles in fixed order; every role carries name/description/system_prompt/model; researcher uses `model.fast` by default and debugger/reviewer use `model.strong` (verified by monkeypatching `quoriv.core.subagents.get_model` and capturing the requested ids); a user can redirect a role to a literal model; a user can redirect a role to `"default"`; role descriptions mention their job (the only signal the main agent has when routing).
  - `tests/unit/core/test_subagents.py::TestSubAgentsConfigSchema` (4): default routing, partial override preserves other roles, unknown role rejected, extra field within a role rejected.
  - `tests/unit/core/test_subagents.py::TestBuildAgentSubagentsWiring` (1): `build_agent` passes a list of three subagents named `[researcher, debugger, reviewer]` into `create_deep_agent` (using the same `create_deep_agent` capture pattern as the memory-wiring test).
  - `tests/integration/test_e2e_stubbed_chat.py` updated to monkeypatch both `quoriv.core.agent.get_model` and `quoriv.core.subagents.get_model`, since each subagent now resolves its own model. The existing 4 integration tests still pass unchanged.
Test count: 576 → 592 (+16). All gates green.
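The token taxonomy boils down to an alias table with a literal passthrough. In this sketch `ModelSection` stands in for the `[model]` config section (its field names are assumptions), and the resolved id is returned as a string; in Quoriv it would then be handed to `quoriv.models.factory.get_model`. `role_models` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelSection:
    """Stand-in for the [model] config section; field names are assumptions."""
    default: str
    fast: str
    strong: str


# Per-role default tokens, as listed for SubAgentsConfig.
ROLE_DEFAULTS = {"researcher": "fast", "debugger": "strong", "reviewer": "strong"}


def resolve_model_token(token: str, model: ModelSection) -> str:
    """Map "default"/"fast"/"strong" through [model]; pass literal "provider:name" ids through."""
    aliases = {"default": model.default, "fast": model.fast, "strong": model.strong}
    return aliases.get(token, token)


def role_models(overrides: dict[str, str], model: ModelSection) -> dict[str, str]:
    unknown = set(overrides) - set(ROLE_DEFAULTS)
    if unknown:  # mirrors extra="forbid": invented role names fail fast
        raise ValueError(f"unknown subagent role(s): {sorted(unknown)}")
    tokens = {**ROLE_DEFAULTS, **overrides}  # partial overrides keep the other roles
    return {role: resolve_model_token(tok, model) for role, tok in tokens.items()}
```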
Phase 2 Slice 5 — Python plugin API (entry-point loader)
- `quoriv.plugins.loader`: new module. Third-party packages register tools by declaring an entry point under the `quoriv.plugins` group (`[project.entry-points."quoriv.plugins"]` in their `pyproject.toml`). The named callable takes no arguments and returns an iterable of LangChain `BaseTool` instances. `discover_plugin_tools(disabled=...)` is called from `build_agent` at session start. `list_plugins()` exposes a `PluginRecord` per entry point (name, target, captured tool names, load error) for introspection.
- Defensive on purpose: an entry point that fails to import, a factory that raises, or a non-iterable return logs a `loguru` warning and that plugin is dropped; in a mixed list, only the non-tool items are dropped. None of these fail the session: users should be able to start a chat even if one plugin's tree is currently broken.
- `quoriv.config.schema.PluginsConfig` (`disabled: list[str]`, `extra="forbid"`) added to `QuorivConfig`. It lets a user opt a specific plugin out without uninstalling its package, matched by entry-point name.
- `quoriv.core.agent.build_agent` calls `discover_plugin_tools(disabled=config.plugins.disabled)` and merges the result after `QUORIV_TOOLS` before handing the list to `create_deep_agent(tools=...)`.
- `config.example.toml` documents the new `[plugins]` block with a worked example showing the `pyproject.toml` snippet a plugin author writes.
- 19 new tests:
  - `tests/unit/plugins/test_loader.py::TestDiscoverPluginTools` (4): empty registry → empty list; one plugin → one tool; tools from multiple plugins merge; a generator-returning factory normalises to a list.
  - `tests/unit/plugins/test_loader.py::TestDisabledFiltering` (3): the disabled list filters by entry-point name; `disabled` accepts any iterable (a set works); an empty disabled list loads everything.
  - `tests/unit/plugins/test_loader.py::TestDefensiveLoading` (4): import error → the working plugin still loads; factory raising → the working plugin still loads; non-iterable return → dropped; mixed valid+invalid items → valid ones kept.
  - `tests/unit/plugins/test_loader.py::TestListPlugins` (3): one record per entry point; load failure captured with an `error` string; factory failure captured with an `error` string.
  - `tests/unit/config/test_schema.py::TestPluginsConfig` (5): empty default; `QuorivConfig.plugins.disabled == []` on empty input; an explicit disabled list round-trips; `extra="forbid"`; full round-trip through `QuorivConfig.model_validate`.
Test count: 592 → 611 (+19). All gates green.
Phase 2 Slice 6 — MCP client (Model Context Protocol)
- `quoriv.config.schema.MCPServerConfig` (+ `MCPConfig`): new sections. Each server picks a transport (`stdio` or `sse`, default `stdio`) and the per-transport fields: `command` / `args` / `env` for stdio, `url` / `headers` for sse. A `model_validator` enforces both the "required field per transport" rule (`stdio` needs `command`; `sse` needs `url`) and the "no cross-transport fields" rule: a stdio server with a `url` is clearly a typo and fails loudly. `extra="forbid"` throughout. Added to `QuorivConfig` as the `mcp` section, with `mcp.servers: dict[str, MCPServerConfig]` keyed by user-supplied server name.
- `quoriv.plugins.mcp.client`: new module. `load_mcp_tools(servers)` is an async loader: it translates each `MCPServerConfig` into the connection dict shape `langchain_mcp_adapters.client.MultiServerMCPClient` expects, instantiates the client, and calls `await client.get_tools()`. The lazy import of `langchain_mcp_adapters` is wrapped in `try/except ImportError`, so users without the `[mcp]` install extra get a logged warning and an empty list rather than a hard crash. Every other failure path (client construction, `get_tools()`) is also caught, logged, and returns `[]`: defensive throughout, the same pattern as the entry-point plugin loader.
- `quoriv.plugins.mcp.__init__` re-exports `load_mcp_tools`.
- `quoriv.core.agent.build_agent` gained a keyword-only `extra_tools: list[Any] | None = None` parameter. The tools list passed to `create_deep_agent` is now `[*QUORIV_TOOLS, *plugin_tools, *extra_tools]`; MCP-discovered tools land last so any future tool-shadow rule (a user override) stays explicit by position.
- `quoriv.app.run_chat` calls `await load_mcp_tools(config.mcp.servers)` before constructing the agent and threads the result through `_interactive_loop` → `build_agent` and the live `/mode`-rebuild call path. The async load happens in `run_chat` (the only async point in the chat startup) precisely because `build_agent` stays sync.
- `pyproject.toml`: the `[mcp]` install extra adds `langchain-mcp-adapters>=0.2.0` alongside the existing `mcp>=1.0.0`.
- `config.example.toml` documents the new `[mcp.servers.NAME]` blocks with both stdio and sse worked examples (the reference `mcp-server-fetch` stdio server and a hypothetical GitHub SSE endpoint with a bearer token).
- 27 new tests:
  - `tests/unit/plugins/test_mcp_schema.py::TestMCPServerConfigStdio` (5): minimal stdio, args + env, default transport, missing `command` rejected, cross-transport `url` rejected.
  - `tests/unit/plugins/test_mcp_schema.py::TestMCPServerConfigSse` (4): minimal sse, headers, missing `url` rejected, cross-transport `command` rejected.
  - `tests/unit/plugins/test_mcp_schema.py::TestMCPServerConfigStrict` (2): `extra="forbid"` and unknown-transport rejection.
  - `tests/unit/plugins/test_mcp_schema.py::TestMCPConfig` (4): empty default; `QuorivConfig.mcp.servers == {}` round-trip; mixed stdio + sse servers map round-trip; `extra="forbid"` at the `MCPConfig` level.
  - `tests/unit/plugins/test_mcp_client.py::TestConnectionDict` (5): stdio minimal, env included when set, env omitted (not `None`) when not set, sse minimal, headers included when set.
  - `tests/unit/plugins/test_mcp_client.py::TestLoadMcpToolsEdgeCases` (2): empty servers map → empty list (fast path, no import); missing `langchain-mcp-adapters` (simulated by monkeypatching `builtins.__import__`) → logged warning + empty list.
  - `tests/unit/plugins/test_mcp_client.py::TestLoadMcpToolsHappyPath` (2): a monkeypatched `MultiServerMCPClient` captures the translated connection dict (verifying that the fetch stdio and github sse configs round-trip correctly); discovered tools flow back through.
  - `tests/unit/plugins/test_mcp_client.py::TestLoadMcpToolsFailures` (2): `MultiServerMCPClient.__init__` raising → empty list; `await client.get_tools()` raising → empty list.
  - `tests/unit/core/test_agent.py::TestBuildAgentMemoryWiring::test_extra_tools_appended_to_tool_list` (1): `extra_tools` lands at the end of the tools list passed to `create_deep_agent` (after `QUORIV_TOOLS` and plugin tools), using the same `create_deep_agent` capture pattern as the memory tests.
Test count: 611 → 638 (+27). All gates green. Phase 2 is complete.
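The validation rules and the config-to-connection translation can be sketched with a stdlib dataclass standing in for the Pydantic model (the real class uses a `model_validator`). The output keys (`transport`, `command`, `args`, `env`, `url`, `headers`) are an assumption about the shape `MultiServerMCPClient` accepts; check the `langchain-mcp-adapters` documentation before relying on them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class MCPServerConfig:
    transport: str = "stdio"
    command: str | None = None
    args: list[str] = field(default_factory=list)
    env: dict[str, str] | None = None
    url: str | None = None
    headers: dict[str, str] | None = None

    def __post_init__(self) -> None:
        # Required field per transport, and no cross-transport fields.
        if self.transport == "stdio":
            if not self.command:
                raise ValueError("stdio server needs 'command'")
            if self.url or self.headers:
                raise ValueError("stdio server cannot set 'url'/'headers'")
        elif self.transport == "sse":
            if not self.url:
                raise ValueError("sse server needs 'url'")
            if self.command or self.args or self.env:
                raise ValueError("sse server cannot set 'command'/'args'/'env'")
        else:
            raise ValueError(f"unknown transport {self.transport!r}")


def connection_dict(cfg: MCPServerConfig) -> dict[str, Any]:
    if cfg.transport == "stdio":
        conn: dict[str, Any] = {"transport": "stdio", "command": cfg.command, "args": list(cfg.args)}
        if cfg.env:  # omitted entirely (not None) when unset
            conn["env"] = dict(cfg.env)
        return conn
    conn = {"transport": "sse", "url": cfg.url}
    if cfg.headers:
        conn["headers"] = dict(cfg.headers)
    return conn
```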
Phase 3 Slice 1 — Anthropic provider
- `quoriv.models.anthropic`: new provider module mirroring `quoriv.models.openai`. It builds a `langchain_anthropic.ChatAnthropic` instance with `model=spec.name`, resolves the API key through `quoriv.config.keychain.get_api_key("anthropic")` (env var `ANTHROPIC_API_KEY` first, then the OS keychain), and raises `MissingAPIKeyError` with the correct env-var name when both are absent. Requires the `[anthropic]` install extra.
- `quoriv.models.factory._PROVIDERS` registers `anthropic`. `list_providers()` now returns `["anthropic", "openai"]` (sorted). The four remaining Phase 3 providers (Gemini / Ollama / vLLM / OpenRouter) stay commented out; they ship in later slices.
- The CI install line in `.github/workflows/test.yml` and the mypy job in `.github/workflows/lint.yml` add `anthropic` to the extras: `pip install -e ".[dev,ast,mcp,anthropic]"`. Same pattern as the Slice 6 fix that pulled in `[mcp]`; it keeps the test suite and mypy able to import the new provider's deps.
- 7 new tests:
  - `tests/unit/models/test_anthropic.py::TestMissingKey` (1): provider + env-var name on the raised `MissingAPIKeyError`.
  - `tests/unit/models/test_anthropic.py::TestBuildFromEnv` (1): `ANTHROPIC_API_KEY` env var → `ChatAnthropic` with the model name preserved.
  - `tests/unit/models/test_anthropic.py::TestBuildFromKeyring` (1): the keychain fallback works when the env var is unset.
  - `tests/unit/models/test_anthropic.py::TestIdentifierShapes` (1): modern dash-versioned ids like `anthropic:claude-opus-4-7` flow through `ModelSpec.parse` unchanged.
  - `tests/unit/models/test_anthropic.py::TestKwargsForwarded` (2): `temperature` and `max_tokens` forwarded to the underlying `ChatAnthropic` (with a `max_tokens_to_sample` fallback for older lib versions).
  - `tests/unit/models/test_factory.py::TestListProviders::test_anthropic_registered_in_phase_3` (1): sanity check that the factory dispatch picks it up.
Test count: 638 → 645 (+7). All gates green.
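The env-var-first, keychain-second precedence can be sketched without touching a real keychain. `require_api_key` and its `keychain_lookup` callback are hypothetical stand-ins (the real code goes through `quoriv.config.keychain.get_api_key` and the OS keychain), and `PROVIDER_ENV_VARS` is an illustrative subset; the `MissingAPIKeyError(provider, env_var)` argument order follows the Gemini entry below.

```python
from __future__ import annotations

import os
from collections.abc import Callable

# Illustrative subset; ollama is deliberately absent because it needs no key.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "google": "GOOGLE_API_KEY",
}


class MissingAPIKeyError(RuntimeError):
    def __init__(self, provider: str, env_var: str) -> None:
        super().__init__(f"no API key for {provider}: set {env_var} or store it in the OS keychain")
        self.provider = provider
        self.env_var = env_var


def require_api_key(
    provider: str, keychain_lookup: Callable[[str], str | None] = lambda _provider: None
) -> str:
    env_var = PROVIDER_ENV_VARS[provider]
    # Precedence: environment variable first, then the keychain.
    key = os.environ.get(env_var) or keychain_lookup(provider)
    if not key:
        raise MissingAPIKeyError(provider, env_var)
    return key
```

With the `[anthropic]` extra installed, the provider would then construct roughly `ChatAnthropic(model=spec.name, api_key=key)`.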
Phase 3 Slice 2 — Ollama provider (local, no API key)
- `quoriv.models.ollama`: new provider module. It builds `langchain_ollama.ChatOllama` with `model=spec.name`. Unlike the OpenAI / Anthropic providers, Ollama runs locally (or on a user-controlled host), so no API key is required; the provider is intentionally absent from `PROVIDER_ENV_VARS`. There is no network call at construction time, so `build_agent` can succeed even when the Ollama server isn't running yet (connection errors surface at first invocation, not at startup). Requires the `[ollama]` install extra.
- `quoriv.models.factory._PROVIDERS` registers `ollama`. `list_providers()` now returns `["anthropic", "ollama", "openai"]` (sorted).
- The CI install line in `.github/workflows/test.yml` and the mypy job in `.github/workflows/lint.yml` add `ollama` to the extras: `pip install -e ".[dev,ast,mcp,anthropic,ollama]"`. Same pattern as Phase 3 Slice 1.
- Identifier shape preserved through `ModelSpec.parse`: Ollama tags carry an embedded colon (`qwen2.5-coder:32b`), and the parser splits on the first colon only. This was already documented and tested in `test_base.py`, and is now re-asserted at the provider level for `qwen2.5-coder:32b` and `llama3.1:70b-instruct-q4_0`.
- 7 new tests:
  - `tests/unit/models/test_ollama.py::TestBuild` (2): returns `ChatOllama` with the model name preserved; no network call at construction (the build succeeds for an unpulled model).
  - `tests/unit/models/test_ollama.py::TestModelNameWithTag` (2): the second colon (the tag) stays attached to `spec.name`; the real-world `llama3.1:70b-instruct-q4_0` shape round-trips.
  - `tests/unit/models/test_ollama.py::TestKwargsForwarded` (2): `base_url` forwarded (for non-default hosts like Docker / remote boxes), `temperature` forwarded.
  - `tests/unit/models/test_factory.py::TestListProviders::test_ollama_registered_in_phase_3` (1): factory dispatch picks it up.
Test count: 645 → 652 (+7). All gates green.
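The first-colon rule that keeps Ollama tags intact can be illustrated with str.partition, which splits only on the first occurrence of the separator. parse_model_id and ParsedModelId are hypothetical names for this sketch; the actual ModelSpec.parse may validate differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedModelId:
    provider: str
    name: str


def parse_model_id(identifier: str) -> ParsedModelId:
    """Split 'provider:model' on the FIRST colon only (hypothetical helper)."""
    provider, sep, name = identifier.partition(":")
    # Reject ids with no colon or with an empty half.
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model', got {identifier!r}")
    # Any later colons (Ollama tags) or slashes (OpenRouter vendors) stay in name.
    return ParsedModelId(provider=provider, name=name)
```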
Phase 3 Slice 3 — Google Gemini provider
- quoriv.models.gemini — new provider module. Builds langchain_google_genai.ChatGoogleGenerativeAI with model=spec.name and google_api_key=... (the LangChain wrapper's chosen kwarg name, which differs from the api_key used by OpenAI / Anthropic). Resolves the API key through get_api_key("gemini"); quoriv.config.keychain already routes both the gemini and google provider names to the GOOGLE_API_KEY env var, so the precedence rules match the other cloud providers. Raises MissingAPIKeyError("gemini", "GOOGLE_API_KEY") when both env and keychain are absent. Requires the [gemini] install extra.
- quoriv.models.factory._PROVIDERS registers gemini. list_providers() now returns ["anthropic", "gemini", "ollama", "openai"] (sorted).
- The CI install line in .github/workflows/test.yml and the mypy job in .github/workflows/lint.yml add gemini to the extras: pip install -e ".[dev,ast,mcp,anthropic,ollama,gemini]".
- 7 new tests:
  - tests/unit/models/test_gemini.py::TestMissingKey (1) — MissingAPIKeyError("gemini", "GOOGLE_API_KEY") shape.
  - tests/unit/models/test_gemini.py::TestBuildFromEnv (2) — env-var key resolves to a ChatGoogleGenerativeAI with the model preserved; gemini-1.5-pro round-trips alongside gemini-1.5-flash.
  - tests/unit/models/test_gemini.py::TestBuildFromKeyring (1) — keychain fallback when env is unset.
  - tests/unit/models/test_gemini.py::TestKwargsForwarded (2) — temperature forwarded; max_output_tokens forwarded (note: Gemini uses its own parameter name, unlike OpenAI's max_tokens or Anthropic's max_tokens_to_sample).
  - tests/unit/models/test_factory.py::TestListProviders::test_gemini_registered_in_phase_3 (1) — factory dispatch picks it up.
Test count: 652 → 659 (+7). All gates green.
Phase 3 Slice 4 — vLLM provider (OpenAI-compatible)
- quoriv.models.vllm — new provider module. vLLM serves an OpenAI-compatible HTTP API, so under the hood the provider builds a langchain_openai.ChatOpenAI instance pointed at the user's vLLM endpoint via base_url. No extra SDK is needed (it uses the OpenAI SDK already in core deps).
- Defaults that match local-first deployment:
  - base_url: explicit kwarg > $VLLM_BASE_URL env var > http://localhost:8000/v1 (the vLLM server's default OpenAI-compatible endpoint).
  - api_key: explicit kwarg > $VLLM_API_KEY env (already in PROVIDER_ENV_VARS) > keychain > placeholder "EMPTY". Unlike OpenAI / Anthropic / Gemini, vLLM never raises MissingAPIKeyError: most local vLLM deployments don't enforce auth and ChatOpenAI accepts any string as api_key, so falling through to a placeholder keeps the provider building cleanly. The keychain map already lists vllm.
- quoriv.models.factory._PROVIDERS registers vllm. list_providers() now returns ["anthropic", "gemini", "ollama", "openai", "vllm"] (sorted).
- No CI extras change needed: vLLM uses the OpenAI SDK already in the core dependency list.
- 10 new tests:
  - tests/unit/models/test_vllm.py::TestDefaults (3) — returns ChatOpenAI with model preserved; default base_url applied when nothing is configured; placeholder "EMPTY" api_key when neither env nor keychain has one (read via SecretStr.get_secret_value(), since LangChain stores it as a secret).
  - tests/unit/models/test_vllm.py::TestEnvVarPrecedence (3) — $VLLM_BASE_URL overrides the built-in default; $VLLM_API_KEY is honored; keychain wins when env is absent.
  - tests/unit/models/test_vllm.py::TestKwargOverrides (2) — explicit base_url= kwarg beats $VLLM_BASE_URL; explicit api_key= kwarg beats $VLLM_API_KEY.
  - tests/unit/models/test_vllm.py::TestKwargsForwarded (1) — additional kwargs like temperature flow through to ChatOpenAI.
  - tests/unit/models/test_factory.py::TestListProviders::test_vllm_registered_in_phase_3 (1) — factory dispatch picks it up.
Test count: 659 → 669 (+10). All gates green.
Phase 3 Slice 5 — OpenRouter provider
- quoriv.models.openrouter — new provider module. OpenRouter routes hundreds of models from different vendors through a single OpenAI-compatible API, so the provider builds a langchain_openai.ChatOpenAI instance pointed at https://openrouter.ai/api/v1 (overridable via the base_url= kwarg for OpenRouter proxies / staging endpoints). The API key is resolved via get_api_key("openrouter") (env OPENROUTER_API_KEY first, then keychain; the provider is already listed in PROVIDER_ENV_VARS). Unlike vLLM, the key is required because OpenRouter is a paid service, so MissingAPIKeyError is raised when both sources are empty.
- Identifier shape: openrouter:<vendor>/<model> (e.g. openrouter:anthropic/claude-3.5-sonnet, openrouter:meta-llama/llama-3.1-405b-instruct). The vendor/model slash is part of the name half; ModelSpec.parse preserves it because the split is on the first colon only.
- quoriv.models.factory._PROVIDERS registers openrouter. list_providers() now returns ["anthropic", "gemini", "ollama", "openai", "openrouter", "vllm"] (sorted), so all 6 Phase 3 providers are wired.
- No CI extras change: OpenRouter uses the OpenAI SDK already in core dependencies.
- 8 new tests:
  - tests/unit/models/test_openrouter.py::TestMissingKey (1) — MissingAPIKeyError("openrouter", "OPENROUTER_API_KEY") shape.
  - tests/unit/models/test_openrouter.py::TestBuildFromEnv (2) — env-var key resolves to ChatOpenAI with model preserved; default base_url points at OpenRouter's cloud endpoint.
  - tests/unit/models/test_openrouter.py::TestBuildFromKeyring (1) — keychain fallback when env is unset.
  - tests/unit/models/test_openrouter.py::TestIdentifierShapes (1) — the vendor/model slash survives ModelSpec.parse and reaches ChatOpenAI.model_name.
  - tests/unit/models/test_openrouter.py::TestKwargOverrides (2) — explicit base_url= overrides the default; temperature forwards through.
  - tests/unit/models/test_factory.py::TestListProviders::test_openrouter_registered_in_phase_3 (1) — factory dispatch picks it up.
Test count: 669 → 677 (+8). All gates green. All 6 Phase 3 providers (Anthropic, Ollama, Gemini, vLLM, OpenRouter, plus the pre-existing OpenAI) are now wired.
Phase 3 Slice 6 — web_fetch tool
- quoriv.tools.web — new module shipping web_fetch(url, max_chars): a small @tool that wraps httpx.Client.get so the agent can pull text from a URL during a turn. Returns a structured dict (status_code, content_type, text, truncated, url) on success or {"error": ..., "url": ...} on network failure, mirroring the dict[str, Any] + "error" key contract every other Quoriv tool uses.
- Bounded output: max_chars defaults to 10_000 (~2.5-3k tokens), enough for most pages without blowing the model's context window. Larger bodies get truncated with an explicit "… (truncated, +N chars)" marker so the agent knows more content exists. Pages at the exact cap are not marked truncated.
- Defensive: httpx.HTTPError (covers connect failures, DNS, timeouts, redirect-cap exceeded) is caught and turned into the error dict, so a bad URL never propagates an exception up through the tool call. follow_redirects=True and timeout=30s are baked in.
- Sync, not async, matching every other Quoriv tool. The agent's ToolExecutor runs it in a thread pool if needed.
- Registered as the 12th entry in QUORIV_TOOLS. quoriv.tools.__init__ re-exports web_fetch.
- web_search is intentionally deferred to a later slice: picking the search backend (Tavily, Brave, SerpAPI, …) deserves its own scope, and shipping the fetch tool first keeps this slice self-contained and free of new API-key dependencies.
- 11 new tests in tests/unit/tools/test_web.py:
  - TestWebFetchHappyPath (4) — expected dict keys, short body untouched, status_code + content_type round-trip, URL forwarded to the client.
  - TestTruncation (3) — body past the cap truncated with a +N chars marker, body at the exact cap not truncated, default cap applies when not specified.
  - TestNetworkFailure (2) — httpx.ConnectError and httpx.ReadTimeout both surface as error dicts with the URL preserved and no text / status_code fields.
  - TestToolSurface (2) — web_fetch lives in QUORIV_TOOLS so the agent sees it; the description mentions "fetch" + "url" so the model can route to it.
Test count: 677 → 688 (+11). All gates green.
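The truncation rule for web_fetch (cap at max_chars, flag only bodies strictly longer than the cap) might look like the following. truncate_body is a hypothetical helper, and the newline between the kept text and the marker is an assumption; the changelog only specifies the marker text.

```python
def truncate_body(text: str, max_chars: int = 10_000) -> tuple[str, bool]:
    """Cap text at max_chars; return (text, truncated) (hypothetical helper)."""
    if len(text) <= max_chars:
        # A body exactly at the cap is returned untouched and not flagged.
        return text, False
    overflow = len(text) - max_chars
    # Tell the agent how much was cut so it knows more content exists.
    return f"{text[:max_chars]}\n… (truncated, +{overflow} chars)", True
```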
Phase 3 Slice 7 — web_search tool (Tavily backend)
- quoriv.tools.web.web_search(query, max_results, include_domains, exclude_domains) — new @tool that hits Tavily's LLM-friendly search API and returns ranked result snippets. Each result row is normalised to {title, url, content, score}; Tavily's heavier raw_content field is intentionally dropped so the list stays compact in the agent's context.
- Defensive everywhere: the tavily-python SDK lives in a new [search] install extra. A user without the extra still gets a working session; the tool just returns {"error": "Tavily SDK not installed (...)..", "query": ...}. The same shape comes back if TAVILY_API_KEY is missing or if the upstream API raises, so the agent can recover instead of dying mid-turn.
- quoriv.config.keychain.PROVIDER_ENV_VARS gains a tavily → TAVILY_API_KEY entry. Key resolution flows through the existing get_api_key("tavily") helper (env var first, then keychain), the same precedence as the model providers.
- pyproject.toml adds a new [search] install extra: search = ["tavily-python>=0.7.0"]. CI install lines in test.yml and the mypy job in lint.yml add search to the extras: pip install -e ".[dev,ast,mcp,anthropic,ollama,gemini,search]". The tavily.* module pattern is added to mypy's ignore_missing_imports overrides since the SDK ships without py.typed.
- quoriv.tools.__init__ re-exports web_search and registers it as the 13th entry in QUORIV_TOOLS.
- Annotation note: the tool's include_domains / exclude_domains use list[str] | None rather than Sequence[str] | None. LangChain's @tool decorator evaluates annotations at runtime via get_type_hints(), so Sequence (imported from collections.abc rather than the deprecated typing alias) would either need to live at module top level and draw a ruff TC003 warning, or sit under TYPE_CHECKING and break tool registration. list is a builtin, which sidesteps both problems.
- 7 new tests in tests/unit/tools/test_web.py:
  - TestWebSearchMissingKey (1) — empty env + empty keychain produces a structured error dict (not an exception), and the error string names TAVILY_API_KEY.
  - TestWebSearchHappyPath (4) — env-var key resolves and returns normalised results; keychain fallback when env is unset (fake_keyring captures the key the fake client was constructed with); max_results forwards; include_domains + exclude_domains forward through to the SDK call.
  - TestWebSearchFailure (2) — a Tavily client raising at search time becomes {"error": "Search failed: ..."}; non-dict responses from the SDK fall through to an empty results list (a defensive guard against oddly shaped upstream returns).
- The existing TestToolSurface::test_registered_in_quoriv_tools is extended to also assert that web_search is in QUORIV_TOOLS.
Test count: 688 → 695 (+7). All gates green.
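Normalising Tavily results down to four keys, with the non-dict guard covered by TestWebSearchFailure, could be sketched as below. It assumes the SDK returns a dict whose results list holds per-hit dicts with title, url, content, score and optionally raw_content; normalise_results is a hypothetical name.

```python
from typing import Any


def normalise_results(response: Any) -> list[dict[str, Any]]:
    """Reduce a Tavily-style response to compact {title, url, content, score} rows."""
    if not isinstance(response, dict):
        # Oddly shaped upstream returns yield no results rather than an exception.
        return []
    rows: list[dict[str, Any]] = []
    for item in response.get("results") or []:
        if not isinstance(item, dict):
            continue
        # Copy only the four compact keys; raw_content is deliberately left out.
        rows.append(
            {
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                "content": item.get("content", ""),
                "score": item.get("score"),
            }
        )
    return rows
```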
Phase 3 Slice 8 — Themes (light / dark / auto)
- quoriv.ui.themes — new module exposing three palettes for the config.ui.theme setting, which has been on the schema since Phase 0 Day 2 but until now never propagated past the doctor table. make_console(theme) returns a configured rich.console.Console; resolve_theme(name) collapses auto to a concrete dark / light based on the terminal's $COLORFGBG background hint (background indexes 7 and 15 = light); RICH_THEMES["dark"] is explicitly None so the factory has no special-case branch for "use Rich defaults".
- Palette content is kept small on purpose: the only Rich styles overridden by light are dim (Rich's default low-contrast grey on dark becomes near-invisible on white, so it is bumped to grey39) and panel.border (cyan washes out on light, so it is switched to blue). Inline color tags like [green]ok[/green] and [red]err[/red] keep their Rich-default rendering because their contrast holds on both backgrounds.
- quoriv.app.run_chat and quoriv.cli._console now call make_console(config.ui.theme) instead of bare Console(). The CLI helper wraps the lazy load_config call in try/except so a malformed config still gets a working console (falling back to auto).
- The plain from rich.console import Console imports in app.py and cli.py move into TYPE_CHECKING blocks now that the runtime entry point is make_console. Console is only used as a type annotation in both files, so the runtime import is dead weight (ruff TC002).
- 19 new tests in tests/unit/ui/test_themes.py:
  - TestPaletteMap (2) — dark is explicitly None, light is a real rich.theme.Theme instance.
  - TestLightBackgroundDetection (5) — no $COLORFGBG → no light detection; 0;7 and 0;15 → light; 15;0 / 15;1 / 15;8 → not light; malformed values fall through; a single field without the fg;bg shape isn't enough signal.
  - TestResolveTheme (4) — dark and light round-trip; auto resolves to light with the light-background env var and to dark without it.
  - TestMakeConsole (5) — dark returns a plain Console; light carries the grey dim override (verified by reading the style off the resolved Console); auto without a signal falls through to dark; the force_terminal=False kwarg forwards; an unknown theme name silently falls back to defaults rather than crashing.
Test count: 695 → 714 (+19). All gates green.
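The $COLORFGBG heuristic can be sketched with the stdlib alone. The variable holds foreground and background palette indexes separated by semicolons, and this version treats background 7 or 15 as light, matching the cases listed for TestLightBackgroundDetection. Reading the last field (so three-field values also work) is an assumption, and is_light_background plus this environ-taking variant of resolve_theme are illustrative stand-ins rather than the module's real code.

```python
import os
from collections.abc import Mapping

# ANSI palette indexes for white / bright white backgrounds.
LIGHT_BACKGROUND_INDEXES = {7, 15}


def is_light_background(environ: Mapping[str, str] = os.environ) -> bool:
    raw = environ.get("COLORFGBG")
    if not raw:
        return False
    parts = raw.split(";")
    if len(parts) < 2:
        # A single field carries no separate background value.
        return False
    try:
        background = int(parts[-1])
    except ValueError:
        # Malformed values fall through to "not light".
        return False
    return background in LIGHT_BACKGROUND_INDEXES


def resolve_theme(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Collapse 'auto' to a concrete theme; pass explicit names through."""
    if name == "auto":
        return "light" if is_light_background(environ) else "dark"
    return name
```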
Phase 3 Slice 9 — Fallback chains
- quoriv.config.schema.ModelConfig.fallbacks: list[str] — new ordered list of provider:model identifiers to try when the primary model raises a transient error (rate limit, 5xx, network failure). The empty default disables fallbacks.
- quoriv.models.factory.with_fallbacks(primary, fallback_ids) — builds each fallback via get_model and wraps the primary with LangChain's Runnable.with_fallbacks(...). Defensive: a fallback id that fails to build (missing key, unknown provider, malformed id) is logged and skipped rather than aborting agent startup. If every fallback fails, the primary is returned unwrapped (no chain at all) so the user still gets a working session.
- quoriv.core.agent.build_agent wraps the primary model via with_fallbacks(primary_model, config.model.fallbacks) before passing it to create_deep_agent. The return type can be either BaseChatModel (no fallbacks) or RunnableWithFallbacks (chain assembled); a cast at the call site keeps mypy happy without weakening the public typing of with_fallbacks.
- 6 new tests in tests/unit/models/test_fallbacks.py:
  - TestEmptyFallbacks (2) — an empty iterable returns the primary unchanged with no get_model calls; an all-failing fallback list (every fallback raises at build time) also returns the primary unchanged.
  - TestChainAssembly (2) — a single fallback wraps the primary via .with_fallbacks(...) and get_model is called with the expected id; multiple fallbacks preserve user-supplied order all the way through.
  - TestPartialFailure (1) — the middle fallback raises at build time; the first and third still land in the chain (skip + continue, not abort).
  - TestIntegrationSmoke (1) — with_fallbacks against a real RunnableLambda primary returns a real RunnableWithFallbacks, catching future API drift in LangChain.
Test count: 714 → 720 (+6). All gates green.
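The skip-and-continue assembly behind with_fallbacks can be sketched without LangChain by injecting the model builder. collect_fallbacks and wrap_with_fallbacks are hypothetical names; the only LangChain call assumed is Runnable.with_fallbacks, which accepts a sequence of fallback runnables.

```python
import logging
from collections.abc import Callable, Iterable
from typing import Any

logger = logging.getLogger(__name__)


def collect_fallbacks(fallback_ids: Iterable[str], build_model: Callable[[str], Any]) -> list[Any]:
    """Build each fallback in user order; log and skip any that fail to build."""
    built: list[Any] = []
    for model_id in fallback_ids:
        try:
            built.append(build_model(model_id))
        except Exception:  # missing key, unknown provider, malformed id, ...
            logger.warning("Skipping fallback %r: it failed to build", model_id, exc_info=True)
    return built


def wrap_with_fallbacks(primary: Any, fallback_ids: Iterable[str], build_model: Callable[[str], Any]) -> Any:
    fallbacks = collect_fallbacks(fallback_ids, build_model)
    if not fallbacks:
        # Nothing usable: hand back the primary unwrapped instead of an empty chain.
        return primary
    # LangChain runnables expose with_fallbacks(), which tries each fallback in order.
    return primary.with_fallbacks(fallbacks)
```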
Phase 3 Slice 10 — Hook registry (foundational)
- quoriv.hooks — new module shipping HookRegistry, a tiny in-memory event bus for pre_tool / post_tool / on_message events. Callbacks are stored per event in registration order and fire sequentially; a callback that raises has its exception logged via loguru and dropped, so one broken hook never breaks a turn. The registry is intentionally per-session (constructed once per run_chat), not a module-level singleton, so tests get clean isolation.
- API surface: register(event, callback) (rejects unknown event names early so a typo doesn't silently disable instrumentation), fire(event, **kwargs) (forwards kwargs to every registered callback), handlers(event) (immutable snapshot tuple for /hooks-style introspection), clear() (test isolation).
- Scope note: this slice ships the registry only. Wiring _stream_events to actually fire pre_tool / post_tool on on_tool_start / on_tool_end LangGraph events lands in a follow-up. The registry is a complete unit on its own (full test coverage, foundational shape locked in), and the integration is a separable concern that touches the chat-loop plumbing.
- 10 new tests in tests/unit/test_hooks.py:
  - TestRegister (4) — an empty registry has no handlers; register adds; multiple handlers preserve registration order when fired; unknown event names are rejected with a ValueError naming the valid set.
  - TestFire (4) — kwargs forward verbatim; firing with no handlers is a silent no-op; firing an unknown event name is silent (a programming error in the emitter, not user input); a callback that raises is logged and the remaining handlers still run.
  - TestClear (1) — clear() drops handlers across all three events.
  - TestHandlersSnapshot (1) — the returned tuple is a true snapshot; later register calls don't retroactively appear in held references.
Test count: 720 → 730 (+10). All gates green.
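A stdlib-only sketch of a registry with the API surface listed for Slice 10. It uses the logging module where Quoriv uses loguru, and its internals are assumptions; only the event names and method names come from this changelog.

```python
import logging
from collections.abc import Callable
from typing import Any

logger = logging.getLogger(__name__)

HookCallback = Callable[..., Any]
VALID_EVENTS = ("pre_tool", "post_tool", "on_message")


class HookRegistry:
    """Per-session, in-memory event bus (sketch)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[HookCallback]] = {event: [] for event in VALID_EVENTS}

    def register(self, event: str, callback: HookCallback) -> None:
        # Reject typos early so instrumentation is never silently disabled.
        if event not in self._handlers:
            raise ValueError(f"Unknown hook event {event!r}; expected one of {VALID_EVENTS}")
        self._handlers[event].append(callback)

    def fire(self, event: str, **kwargs: Any) -> None:
        # Unknown events are a silent no-op (an emitter bug, not user input).
        for callback in tuple(self._handlers.get(event, ())):
            try:
                callback(**kwargs)
            except Exception:
                # Log and keep going so one broken hook never breaks the others.
                logger.exception("Hook %r failed on %s", callback, event)

    def handlers(self, event: str) -> tuple[HookCallback, ...]:
        # An immutable snapshot: later register() calls don't leak into it.
        return tuple(self._handlers.get(event, ()))

    def clear(self) -> None:
        for callbacks in self._handlers.values():
            callbacks.clear()
```

Iterating over a tuple copy inside fire() also means a callback that registers another hook mid-dispatch does not change the current round of calls.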
Phase 3 Slice 11 — Hooks integration (consumer side)
- quoriv.app._stream_events now fires three events against the per-session HookRegistry shipped in Slice 10:
  - pre_tool on on_tool_start — kwargs: tool_name, args.
  - post_tool on on_tool_end — kwargs: tool_name, output.
  - on_message on on_chat_model_end — kwarg: message (the final AIMessage from LangChain).
- run_chat constructs one HookRegistry() per session and threads it through _interactive_loop → _drive_turn → _stream_events as a new keyword-only hooks parameter. All three layers default hooks to None so existing test entry points (including the Phase 1 Slice 9b stubbed-LLM integration test) keep working unchanged.
- The registry's defensive fire() swallows callback exceptions and logs them; combined with this integration, one broken hook still can't break a turn. The integration adds no failure modes that the registry doesn't already handle.
- _stream_events picked up a # noqa: PLR0912 because adding three optional hook-fire branches pushed the flat event-kind dispatch from 12 to 15 branches. The function is intentionally a flat switch with one branch per LangGraph event kind, so splitting it would just hide the dispatch.
- No new tests this slice: the registry has full unit coverage from Slice 10 (10 tests), and the wiring is three mechanical if hooks is not None: hooks.fire(...) lines. The existing 730-test suite catches regressions on every other code path that touches _stream_events.
Test count: 730 → 730 (no new tests; wiring-only slice). All gates green.
Phase 3 Slice 12 — Replay mode (post-mortem viewer)
- quoriv.replay — new module shipping replay_thread(console, trace_path). It reads a per-thread JSONL trace via TraceLogger.read_events and renders each event with a Rich-formatted prefix and short label; unknown event kinds fall through to a compact JSON dump so nothing is silently dropped:
  - ▶ turn_start (with user_input + mode)
  - · model_complete (with model + input_tokens + output_tokens)
  - ↳ tool_start (with tool_name + truncated args)
  - ◀ tool_end (with tool_name + truncated output_preview)
  - ■ turn_end
- Read-only by construction: no model is invoked, no tool is executed, no permissions are checked. The viewer purely re-renders the JSONL log, so it is safe to point at any past session, including ones that ran in yolo mode against a paid provider.
- quoriv replay <name-or-id> CLI command added. It resolves <name> through the per-cwd SessionRegistry (the same one /save and /load use) and falls back to treating the argument as a raw thread id when no name matches. A --cwd override mirrors the chat command. A missing trace path raises typer.Exit(code=1) with a friendly nudge so a CI script catches the failure.
- 9 new tests in tests/unit/test_replay.py:
  - TestFormatRecord (6) — one assertion per event kind (turn_start carries user_input, model_complete shows token counts, tool_start shows args, tool_end shows preview, turn_end renders minimally, unknown kinds fall back to JSON).
  - TestReplayThread (3) — a missing file prints "No events" and returns 0; a full trace renders every event with its fingerprint and reports the count in the header; malformed lines mid-file are silently dropped (the TraceLogger.read_events contract from Slice 9).
Test count: 730 → 739 (+9). All gates green.
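One way to map trace records to the one-line labels described for Slice 12 is a flat dispatch on the event kind with a JSON fallback. The record key event, the exact label layout and the 80-character truncation are assumptions for illustration; the real viewer adds Rich markup on top.

```python
import json
from typing import Any


def truncate(value: Any, limit: int = 80) -> str:
    """Render value as text and shorten it to at most limit characters."""
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return text if len(text) <= limit else text[: limit - 1] + "…"


def format_record(record: dict[str, Any]) -> str:
    kind = record.get("event")  # hypothetical key name for the event kind
    if kind == "turn_start":
        return f"▶ turn_start [{record.get('mode')}] {truncate(record.get('user_input', ''))}"
    if kind == "model_complete":
        return (
            f"· model_complete {record.get('model')} "
            f"in={record.get('input_tokens')} out={record.get('output_tokens')}"
        )
    if kind == "tool_start":
        return f"↳ tool_start {record.get('tool_name')} {truncate(record.get('args', {}))}"
    if kind == "tool_end":
        return f"◀ tool_end {record.get('tool_name')} {truncate(record.get('output_preview', ''))}"
    if kind == "turn_end":
        return "■ turn_end"
    # Unknown kinds fall back to compact JSON so nothing is silently dropped.
    return json.dumps(record, separators=(",", ":"), default=str)
```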
Phase 3 Slice 13 — Eval harness foundation
- quoriv.eval — new module shipping the pure scoring layer for a regression eval suite:
  - EvalCase(name, prompt, expected_substrings) — one prompt plus the substrings the agent's final response must contain.
  - EvalResult(case_name, passed, failed_substrings, output) — what scoring produces.
  - score_case(case, output) -> EvalResult — pure function. Each expected substring must appear in the output (case-sensitive, unanchored). A case with no expectations always passes (smoke-test mode).
  - summarize(results) -> {"total", "passed", "failed"} and passed_fraction(results) -> float for CI thresholds.
  - SAMPLE_CASES — three minimal sanity cases (echo a distinctive token, basic arithmetic, file:line citation format) whose assertions are distinctive enough that random LLM drift is very unlikely to satisfy them by accident.
- Scope note: this slice ships the scoring + bundled cases only; the scoring layer is a complete, foundational unit on its own. The runner that drives each case through _drive_turn against a chosen model is a follow-up and will build on the stubbed-LLM integration pattern shipped in Phase 1 Slice 9b.
- 16 new tests in tests/unit/test_eval.py:
  - TestScoreCase (5) — all substrings present passes; one missing fails with that substring listed; case-sensitive matching ("Hello" vs "hello"); empty expectations always pass (smoke mode); unanchored substring match.
  - TestSummarize (3) — empty / all-passing / mixed result lists.
  - TestPassedFraction (4) — empty is 1.0 (vacuous), all-pass / all-fail / half-pass.
  - TestSampleCases (4) — bundle non-empty; every case carries name + prompt + expectations; case names are unique (lookups don't collide); the assertions are satisfiable (synthetic outputs hitting each case's substrings pass).
Test count: 739 → 755 (+16). All gates green. Phase 3 is feature-complete — runner / CLI for the eval harness can ship as a follow-up enhancement.
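The pure scoring layer is small enough to sketch in full. The field names follow the changelog; the tuple types and function bodies are assumptions, not the shipped quoriv.eval code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    expected_substrings: tuple[str, ...] = ()


@dataclass(frozen=True)
class EvalResult:
    case_name: str
    passed: bool
    failed_substrings: tuple[str, ...]
    output: str


def score_case(case: EvalCase, output: str) -> EvalResult:
    # Case-sensitive, unanchored containment check for every expected substring.
    missing = tuple(s for s in case.expected_substrings if s not in output)
    # No expectations means nothing can be missing, so smoke cases always pass.
    return EvalResult(case.name, passed=not missing, failed_substrings=missing, output=output)


def summarize(results: list[EvalResult]) -> dict[str, int]:
    passed = sum(1 for r in results if r.passed)
    return {"total": len(results), "passed": passed, "failed": len(results) - passed}


def passed_fraction(results: list[EvalResult]) -> float:
    # An empty run counts as fully passing (vacuous truth).
    if not results:
        return 1.0
    return summarize(results)["passed"] / len(results)
```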
Phase 4 Slice 1 — Telemetry opt-in scaffold
- quoriv.config.schema.TelemetryConfig(enabled: bool = False, endpoint: str | None = None) — new top-level config section, off by default. extra="forbid" rejects misspelled keys instead of silently ignoring them. Added to QuorivConfig as the telemetry section.
- quoriv.observability.telemetry — new module shipping the gating surface for Phase 4 telemetry:
  - is_enabled(config) — returns True only when the user explicitly opted in. Accepts a full QuorivConfig, a bare TelemetryConfig, or None (always disabled).
  - report(event_name, config=None, **fields) — stub that short-circuits early when telemetry is disabled. When enabled, it logs to loguru at debug level for now; the network sink ships in a follow-up once a backend is chosen (Sentry / PostHog / OpenTelemetry / self-hosted).
- No backend, no transmission. Users who flip the flag in config.toml won't see traffic; the contract is "your opt-in is captured and respected; nothing transmits yet". When the backend lands, the gating check is already in place at every future call site.
- quoriv.observability.__init__ re-exports telemetry_enabled and telemetry_report, so emitters can write from quoriv.observability import telemetry_report without reaching into the submodule.
- 13 new tests in tests/unit/observability/test_telemetry.py:
  - TestTelemetryConfigDefaults (5) — enabled=False by default, endpoint=None by default, QuorivConfig.telemetry exposes the section with safe defaults, explicit opt-in round-trips through model_validate, extra fields rejected.
  - TestIsEnabled (4) — None config returns False, default config returns False, explicit enabled=True returns True, a bare TelemetryConfig is accepted (for callers that drilled down).
  - TestReport (4) — disabled config emits no log; None config no-ops without raising; enabled config doesn't raise (the loguru sink is an implementation detail); arbitrary structured kwargs are accepted (the future backend validates at the sink).
Test count: 755 → 768 (+13). All gates green.
Phase 4 Slice 2 — Eval-harness runner
- quoriv.eval.run_case(case, *, config=None, cwd=None, mode="yolo", model_override=None, agent=None) — drives one EvalCase through quoriv.app._drive_turn against a fresh agent (or a caller-supplied one for tests), then extracts the final assistant text from LangGraph state via agent.aget_state(...) and feeds it to score_case. A per-case thread_id = f"eval-{case.name}" keeps cases isolated in the checkpointer. Rendered output is discarded to an in-memory Console so eval runs stay quiet.
- quoriv.eval.run_suite(cases, ...) — sequential runner over a tuple/list of cases. A per-case exception is captured as a failed EvalResult (output = <error: ...>, every expected substring marked missing) so one bad case can't poison the rest of the run. Case order is preserved in the returned list.
- quoriv.eval._final_ai_text(messages) — walks a LangGraph message list in reverse looking for the latest non-empty AIMessage. Handles both string content and the list-of-dict-chunks shape some providers emit. Returns "" when no suitable message is found (scoring then reports every expected substring as missing).
- quoriv eval [--model] [--cwd] — new Typer command. Resolves the model from config (with a --model override), runs SAMPLE_CASES via run_suite, prints a Rich table summarising pass / fail per case + missing substrings, and exits non-zero if any case failed.
- 18 new tests in tests/unit/test_eval_runner.py:
  - TestFinalAIText (5) — string content, list-of-dict-chunks, skips empty AIMessages, no-AIMessage list, empty list.
  - TestRunCase (5) — passing case, failing case (missing substrings reported), per-case thread id isolation, requires config or agent, smoke case with empty expectations.
  - TestRunSuite (3) — one result per case in order, per-case exception isolation (a stubbed _drive_turn raises), empty suite returns an empty list.
  - TestEvalCLI (5) — zero exit on all-pass, non-zero exit on any-fail, case names rendered in the table, --model forwarded as model_override, --cwd forwarded as a resolved Path.
- Uses the same GenericFakeChatModel + bind_tools override pattern as tests/integration/test_e2e_stubbed_chat.py, so the test suite makes no real LLM calls.
Test count: 768 → 786 (+18). All gates green.
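A sketch of the reverse walk performed by _final_ai_text. FakeMessage is a hypothetical stand-in that only carries the two attributes used here (LangChain's AIMessage reports type == "ai"), and the chunk handling assumes text chunks look like {"type": "text", "text": ...}.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeMessage:
    """Stand-in for a LangChain message: only .type and .content are used."""

    type: str
    content: Any


def content_to_text(content: Any) -> str:
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Some providers emit a list of chunks instead of a plain string.
        parts: list[str] = []
        for chunk in content:
            if isinstance(chunk, str):
                parts.append(chunk)
            elif isinstance(chunk, dict) and isinstance(chunk.get("text"), str):
                parts.append(chunk["text"])
        return "".join(parts)
    return ""


def final_ai_text(messages: list[Any]) -> str:
    # Walk backwards so the most recent non-empty assistant reply wins.
    for message in reversed(messages):
        if getattr(message, "type", None) != "ai":
            continue
        text = content_to_text(message.content)
        if text.strip():
            return text
    return ""
```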
Phase 4 Slice 3 — Release CI workflow
- .github/workflows/release.yml — tag-triggered release pipeline:
  - build job — checks out the repo, installs build + twine, verifies the pushed tag matches project.version in pyproject.toml (a stale tag now fails loudly instead of publishing the wrong artifact), builds wheel + sdist via python -m build (hatchling backend), validates the dists with twine check, and uploads them as a dist artifact.
  - publish job — needs: build, gated to push events on refs/tags/v* so a workflow_dispatch build doesn't accidentally publish. Uses pypa/gh-action-pypi-publish@release/v1 with OIDC trusted publishing (permissions: id-token: write), so no PyPI API token is stored in repo secrets. Bound to the pypi deployment environment so manual approval can be required before each release.
  - github-release job — downloads the dist artifact and attaches the wheel + sdist to the GitHub release page via softprops/action-gh-release@v2, auto-generating release notes from the changelog and commit history.
- Trigger: push on tags matching v*.*.* (semver only). workflow_dispatch is also wired so the build job can be smoke-tested without cutting a tag.
- 18 new tests in tests/unit/test_release_workflow.py:
  - TestWorkflowFile (2) — file exists, parses as YAML with the right name.
  - TestTriggers (2) — triggers on v*.*.* tags, supports workflow_dispatch. Handles PyYAML's quirk of mapping the bare on: key to Python True.
  - TestBuildJob (5) — build job exists, uses python -m build, validates with twine check, verifies the tag matches the pyproject.toml version, uploads the dist artifact.
  - TestPublishJob (6) — publish job exists, has id-token: write for OIDC, uses pypa/gh-action-pypi-publish, needs: build, gated to tag pushes (won't publish from workflow_dispatch), targets the pypi environment.
  - TestGithubReleaseJob (3) — exists, uses softprops/action-gh-release, has contents: write.
- One-time setup before the first release: configure a trusted publisher on PyPI at https://pypi.org/manage/account/publishing/, binding the (repository, workflow name, environment) triple. After that, every git tag v… && git push --tags ships a release.
Test count: 786 → 804 (+18). All gates green.
Phase 4 Slice 4 — MkDocs documentation site
- mkdocs.yml — mkdocs-material config with a light/dark palette toggle, navigation features (instant, tracking, top-button, sections), search, code-copy buttons, and "Edit this page" → GitHub. Plugins: search, include-markdown, mkdocstrings[python] (pointed at src/). Markdown extensions cover admonitions, code highlighting, tables, tabbed blocks, task lists, footnotes, and pymdownx emoji.
- docs/index.md, docs/changelog.md, docs/contributing.md, docs/security.md, docs/project-plan.md — thin shims that pull README.md / CHANGELOG.md / CONTRIBUTING.md / SECURITY.md / PROJECT_PLAN.md from the repo root via the include-markdown directive. This avoids duplicating content; root files stay canonical for GitHub viewers.
- docs/architecture.md — short intro + includes the existing docs/DEEPAGENTS_REFERENCE.md so the architecture reference is available in the rendered site.
- not_in_nav: DEEPAGENTS_REFERENCE.md — silences the orphaned-file warning for the reference doc, which is consumed via include-markdown rather than linked directly.
- .github/workflows/docs.yml — GitHub Pages deploy workflow:
  - Triggers on push to main (path-filtered to docs / source / root markdown) + workflow_dispatch.
  - build job — installs the docs extras + the include-markdown plugin, runs mkdocs build, and uploads the rendered site/ directory as a Pages artifact via actions/upload-pages-artifact@v3.
  - deploy job — gated to refs/heads/main, uses actions/deploy-pages@v4 with OIDC (pages: write + id-token: write). Bound to the github-pages environment so the URL is surfaced on the workflow page.
- Non-strict build for now: included root files carry pre-existing relative links (../PROJECT_PLAN.md, ../LICENSE) that MkDocs flags as warnings. The site renders correctly; tightening to --strict happens once those links are rewritten to be site-aware.
- pyproject.toml [project.optional-dependencies.docs] — added mkdocs-include-markdown-plugin>=6.2.0 so pip install -e ".[docs]" produces a working mkdocs build.
- 24 new tests in tests/unit/test_docs_setup.py:
  - TestMkdocsConfig (6) — file exists, site_name, repo_url / edit_uri, material theme, include-markdown plugin enabled, search plugin enabled. A custom SafeLoader subclass stubs out !!python/name: tags so the test runs without mkdocs-material being installed.
  - TestNavStructure (2 + 6 parametrized) — nav present, core pages mapped, every nav target resolves to an actual file under docs/.
  - TestDocsWorkflow (9) — workflow file present, triggers on main push + workflow_dispatch, has pages: write + id-token: write + contents: read, runs mkdocs build, uploads the Pages artifact via actions/upload-pages-artifact, deploys via actions/deploy-pages, deploy gated to main, targets the github-pages environment.
  - TestPyprojectDocsGroup (1) — mkdocs-include-markdown-plugin and mkdocs-material are listed in the docs extra.
- One-time setup before the first deploy: repo Settings → Pages → Source: "GitHub Actions". After that, every push to main updates https://burhanhussain1.github.io/quoriv/.
Test count: 804 → 828 (+24). All gates green.
Phase 4 Slice 5 — PyInstaller binaries¶
- `pyinstaller.spec`: spec file used by `pyinstaller pyinstaller.spec` (CI invokes this on each matrix runner):
  - Entry point: `src/quoriv/__main__.py` with `pathex=["src"]` so the `quoriv` package resolves under PyInstaller's bootloader.
  - Explicit `hiddenimports` for every provider module under `quoriv.models.*`. They're loaded via `importlib.import_module` in `quoriv.models.factory`, so the static analyser doesn't see them. Also lists the LangChain provider packages each one imports from (`langchain_openai`, `langchain_anthropic`, `langchain_google_genai`, `langchain_ollama`) plus `langchain.agents.factory` and `langchain_anthropic.middleware.prompt_caching` (DeepAgents pulls these in dynamically).
  - `collect_submodules` covers `quoriv` + `langchain` + `langchain_core` + `langgraph` + `deepagents`, so chains / tracers / hub modules that evade analysis still ship.
  - `excludes`: pytest / mypy / ruff / mkdocs, dev-only packages that shouldn't bloat the binary.
  - `upx=False` deliberately: UPX-packed Windows binaries get flagged by Defender SmartScreen.
- `.github/workflows/binaries.yml`: cross-platform release pipeline:
  - Triggers on `v*.*.*` tag pushes (release path) and `workflow_dispatch` (manual smoke build).
  - `build` job: 3-row matrix (`ubuntu-latest`, `macos-latest`, `windows-latest`) with `artifact` + `binary` columns so the smoke / upload steps stay declarative. Installs the `binary` extra plus the runtime provider extras and runs `pyinstaller pyinstaller.spec --clean --noconfirm`. It then renames the produced binary to a per-OS name (`quoriv-linux-x86_64`, `quoriv-macos-arm64`, `quoriv-windows-x86_64.exe`), smoke-tests it by invoking `version`, and uploads it as a workflow artifact.
  - `attach-to-release` job: `needs: build`, gated to `push` events on `refs/tags/v*`. Downloads all three matrix artifacts and attaches them to the GitHub release via `softprops/action-gh-release@v2`. Manual dispatches stop at the workflow-artifact stage, so a smoke build never accidentally publishes.
- `pyproject.toml [project.optional-dependencies.binary]`: new extra installing `pyinstaller>=6.10.0`. `pip install -e ".[binary,…]"` is now enough to build a binary locally.
- 23 new tests in `tests/unit/test_binaries_workflow.py`:
  - `TestSpecFile` (6): exists, entrypoint, `collect_submodules("quoriv")`, every provider listed in `hiddenimports`, `console=True`, `upx=False`.
  - `TestWorkflowTriggers` (2): `v*.*.*` tag push + `workflow_dispatch`.
  - `TestBuildMatrix` (3 + 3 parametrized): every OS present, each matrix row names `artifact` + `binary`, Windows row carries the `.exe` suffix.
  - `TestBuildSteps` (4): installs the `binary` extra, invokes `pyinstaller pyinstaller.spec`, smoke-tests the binary with `version`, uploads the per-OS artifact.
  - `TestAttachToRelease` (5): job exists, `needs: build`, gated to tag pushes, `contents: write`, uses `softprops/action-gh-release`.
  - `TestPyprojectBinaryExtra` (1): pyinstaller appears in the `binary` extra.
- First release run will likely surface platform-specific missing hidden imports; expect a short iteration loop after tagging `v0.1.0-rc1` to refine the spec before `v1.0.0`.
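The explicit `hiddenimports` are needed because PyInstaller's analysis follows literal `import` statements. A module path held in a string and passed to `importlib.import_module` at runtime is invisible to it.

The sketch below uses a hypothetical `PROVIDER_MODULES` registry; the real `quoriv.models.factory` layout isn't shown here, and the module names are illustrative. It shows how a factory and a spec could share one list so the two can't drift.

```python
import importlib
from collections.abc import Mapping
from types import ModuleType

# Hypothetical registry: provider name -> module path (illustrative names only).
PROVIDER_MODULES: dict[str, str] = {
    "openai": "quoriv.models.openai",
    "anthropic": "quoriv.models.anthropic",
    "gemini": "quoriv.models.gemini",
    "ollama": "quoriv.models.ollama",
}


def load_provider(name: str, registry: Mapping[str, str] = PROVIDER_MODULES) -> ModuleType:
    """Import a provider module chosen at runtime.

    The module path is a string looked up at call time, so a static
    analyser such as PyInstaller's cannot follow this import.
    """
    try:
        module_path = registry[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
    return importlib.import_module(module_path)


def hidden_imports(registry: Mapping[str, str] = PROVIDER_MODULES) -> list[str]:
    """Every module the factory may import dynamically, for a spec's hiddenimports."""
    return sorted(set(registry.values()))
```

With a shared registry like this, the spec's `hiddenimports` could be derived from the same mapping the factory reads instead of being maintained by hand. Whether Quoriv does this is not stated in this entry.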
Test count: 828 → 851 (+23). All gates green.
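The trigger and release gating described for `binaries.yml` reduce to two predicates. The Python sketch below is not the workflow's YAML. It uses `fnmatch.fnmatchcase` to approximate GitHub's tag-filter globbing, with one known difference: GitHub's `*` does not match `/`, while fnmatch's does. That doesn't matter for tags like `v1.0.0`. The runner-to-name mapping is an assumption built from the per-OS names listed above.

```python
from fnmatch import fnmatchcase

# Tag filter from the workflow trigger.
RELEASE_TAG_PATTERN = "v*.*.*"

# Assumed mapping from matrix runner to the renamed per-OS binary.
ARTIFACT_NAMES = {
    "ubuntu-latest": "quoriv-linux-x86_64",
    "macos-latest": "quoriv-macos-arm64",
    "windows-latest": "quoriv-windows-x86_64.exe",
}


def triggers_build(tag: str) -> bool:
    """True if pushing this tag would start the binaries workflow.

    Approximation: fnmatch's "*" also matches "/", GitHub's does not.
    """
    return fnmatchcase(tag, RELEASE_TAG_PATTERN)


def attaches_to_release(event_name: str, ref: str) -> bool:
    """True only for tag pushes; workflow_dispatch runs stop at the artifact stage."""
    return event_name == "push" and ref.startswith("refs/tags/v")
```

Note that `v0.1.0-rc1` also matches `v*.*.*`, because the last `*` absorbs `0-rc1`. That is why the release-candidate tag exercises the full pipeline.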
Phase 4 Slice 6: Telemetry backend
- `quoriv.observability.telemetry.report` now actually transmits when the user has opted in and configured an `endpoint`. The Slice 1 stub (debug-log only) is replaced by a synchronous HTTP POST via `httpx`:
  - Envelope shape: `{"event": <name>, "fields": {<kwargs>}, "client": {"name": "quoriv", "version": __version__, "platform": sys.platform, "python": "<maj.min>"}, "timestamp": <ISO-8601 UTC>}`.
  - When `api_key` is set on the config, it is sent as `Authorization: Bearer <api_key>`, which lets self-hosted sinks distinguish clients.
  - Timeout is 2 seconds (exported as `_DEFAULT_TIMEOUT`), so a misbehaving sink can only stall a chat turn for a bounded time. httpx applies the timeout to each connect/read/write phase rather than as one overall deadline.
  - Every transport error is swallowed. `httpx.ConnectError`, DNS failures, 5xx responses, and broken TLS are all caught, logged at debug, and never re-raised. Telemetry must never break the agent.
- `quoriv.observability.telemetry._build_envelope`: new pure helper exposing the envelope shape so tests (and any future backend swap) can assert on it without touching the network.
- `quoriv.observability.telemetry._resolve_telemetry`: internal helper that pulls the bare `TelemetryConfig` out of either a `QuorivConfig` or a leaf `TelemetryConfig`. Keeps `report` symmetrical with `is_enabled` on accepted inputs.
- `quoriv.config.schema.TelemetryConfig.api_key: str | None = None`: new field for the optional bearer token. `extra="forbid"` is unchanged, so a typo (`enabled` → `enable`) still fails validation loudly.
- Endpoint docstring rewritten: it's no longer "ignored until a backend ships"; it now drives the HTTP transport.
- No new dependencies. `httpx` is already a runtime dep for tool-side use; we reuse it here.
- 14 new tests in `tests/unit/observability/test_telemetry.py`:
  - `TestApiKeyField` (2): default `None`, round-trips through `model_validate`.
  - `TestBuildEnvelope` (4): event name + fields preserved, client metadata present, timestamp is ISO-8601 UTC (`+00:00` suffix), envelope is JSON-serialisable.
  - `TestReportTransport` (8): uses a `_PostRecorder` callable that monkeypatches `httpx.post` to capture call args without hitting the network:
    - No endpoint configured → no POST (even when enabled).
    - Disabled config → no POST (even when endpoint is set).
    - Enabled + endpoint → POST with correct URL, JSON envelope, `Content-Type` header.
    - `api_key` set → `Authorization: Bearer <key>` header added.
    - Timeout is bounded to `_DEFAULT_TIMEOUT` (≤ 5s safety check).
    - `httpx.ConnectError` swallowed (no re-raise).
    - 5xx response swallowed (no re-raise).
    - Bare `TelemetryConfig` accepted directly (no `QuorivConfig` wrapper required).
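Here is the envelope shape above, built with only the standard library. `build_envelope` is a hypothetical stand-in for `_build_envelope`; the version is passed in rather than read from `__version__`, and the event name in the usage lines is made up for illustration.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Any


def build_envelope(event: str, fields: dict[str, Any], version: str) -> dict[str, Any]:
    """Wrap one telemetry event in Quoriv's documented envelope shape."""
    return {
        "event": event,
        "fields": dict(fields),  # shallow copy so later caller mutations don't leak in
        "client": {
            "name": "quoriv",
            "version": version,
            "platform": sys.platform,
            "python": f"{sys.version_info.major}.{sys.version_info.minor}",  # "<maj.min>"
        },
        # A timezone-aware UTC datetime, so isoformat() ends in "+00:00".
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical event name and field, for illustration only.
envelope = build_envelope("session_start", {"provider": "openai"}, version="1.0.1")
payload = json.dumps(envelope)  # the envelope must be JSON-serialisable
```

Keeping the builder pure (no I/O) is what lets the envelope tests assert on its shape without any network setup.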
Test count: 851 → 865 (+14). All gates green.
Changed
Architecture revision (post-DeepAgents audit)
- Adopted the DeepAgents-reuse model after a deep read of the installed `deepagents` 0.6.1 SDK
- Added `docs/DEEPAGENTS_REFERENCE.md`: internal working reference for every DeepAgents feature Quoriv builds on
- Rewrote `PROJECT_PLAN.md`: updated folder tree, architecture diagram, Phase 1 / Phase 2 scope, and the permission-mode → DeepAgents-config mapping
- Narrowed scope of `src/quoriv/core/__init__.py` (DeepAgents IS the runtime; we just wrap `create_deep_agent`)
- Narrowed scope of `src/quoriv/tools/__init__.py` (only Quoriv-specific tools: AST, git, tests, web; DeepAgents owns files/shell/grep/todo)
- Narrowed scope of `src/quoriv/permissions/__init__.py` (mode translation only; `FilesystemMiddleware` enforces)
- Narrowed scope of `src/quoriv/repo/__init__.py` (tree-sitter symbol layer powering AST tools)
Removed
- `src/quoriv/memory/` subpackage: DeepAgents' `MemoryMiddleware` loads `PROJECT.md` / `~/.quoriv/memory.md` directly via the `memory=[...]` parameter. No custom loader needed.
Coming next (Phase 4 kickoff)
- Phase 3 is feature-complete. Phase 4 (release polish: MkDocs site, PyPI publish, PyInstaller binaries, CI matrix release pipeline, security policy, telemetry opt-in, `v1.0.0` tag) starts in the next milestone; see `PROJECT_PLAN.md`.
- `v1.0.0` tag + announcement: once the release pipeline, docs, and binaries are in place.