553 Commits

Author SHA1 Message Date
Alexandre Teixeira
65231f2ba1 refactor(tests): reuse import-state helper in auth manager tests
Test-only refactor continuing #2523. Replaces inline core.auth cache eviction in two _fresh_auth_manager tests with clear_module, preserving behavior.
2026-06-05 11:24:55 +01:00
Alexandre Teixeira
4f0133b8c3 refactor(tests): reuse import-state helper in auth tests
Test-only refactor continuing #2523. Replaces a repeated core.auth cache eviction pattern in three auth tests with the shared clear_module helper, preserving behavior.
2026-06-05 11:10:41 +01:00
spooky
f9e1d38cc2 fix: diagnose vllm serve runtime issues (#1198) 2026-06-05 11:03:04 +01:00
Kenny Van de Maele
0a2adc9c96 Add ask_user tool: agent-posed multiple-choice questions (#2111)
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.

Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.

- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
  Validates a non-empty question + 2..6 options, normalizes string/object
  options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
  the turn ends and waits for the user's selection. Stream the question as
  assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
  FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
  user can be asked); the structured args serialize via the default
  json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
  free-text Other + dismiss x; removed once answered). Reuses CSS vars and
  the .modal-close button; emoji go through the monochrome-SVG pipeline.
  Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
  cap, validation errors, serializer round-trip, registration.
2026-06-05 11:49:11 +02:00
Alexandre Teixeira
621885ac06 fix(tests): restore Python CI baseline regressions
Test-only fix continuing #2523. Updates two stale regression tests so the current broad Python pytest baseline is restored without changing production code.
2026-06-05 10:31:38 +01:00
Alexandre Teixeira
30173f3909 fix(tests): make archived session filter test multipart-independent
Test-only fix continuing #2523. Makes the archived-session model-filter test independent of optional multipart packages. The red broad pytest status was classified as unrelated current dev baseline drift before merge.
2026-06-05 10:12:47 +01:00
Lucas Daniel
f5d834b0c5 fix(cookbook): surface backend diagnosis when serve fails in background (#1636)
* refactor(cookbook): move _diagnose_serve_output to module level in cookbook_helpers

Extracts the nested _diagnose_serve_output function from inside
setup_cookbook_routes() and moves it to module level in cookbook_helpers.py,
alongside the other helper functions it logically belongs with.

No behaviour change — the function is now importable directly for testing
and by other callers without going through the route factory closure.

* fix(cookbook): surface backend diagnosis when serve fails in background

The background poll (_pollBackgroundStatus) already received `diagnosis`
and `cmd` from /api/cookbook/tasks/status but discarded both. When a serve
job died while the Cookbook modal was closed, reopening it showed only a
red error badge with no context.

- Persist live.diagnosis into task._backendDiagnosis in localStorage so it
  survives modal close/reopen and page refresh
- Persist live.cmd into task.payload._cmd for agent-spawned tasks so the
  crash report includes the actual command
- After _renderRunningTab(), walk rendered cards and call _showDiagnosis()
  for any that have a stored _backendDiagnosis but no panel yet
- In _renderTaskCard(), use _backendDiagnosis as a fallback when the
  client-side _terminalServeDiagnosis() finds nothing

* test(cookbook): add coverage for _diagnose_serve_output error patterns

10 tests verifying the 16 serve-failure patterns:
- CUDA OOM, port-in-use, vLLM missing, gated model
- Traceback fallback fires without startup success marker
- Traceback suppressed when server actually started
- Clean/empty output returns None
- trust-remote-code and no-GGUF patterns
2026-06-05 09:52:07 +01:00
Vykos
b19e5693af Constrain embedding model cache paths (#2849) 2026-06-05 10:46:48 +02:00
Vykos
11ba46505b Constrain generated-image paths to image root (#2837) 2026-06-05 10:33:47 +02:00
Vykos
d4d168f972 Harden emoji SVG proxy responses (#2842) 2026-06-05 10:31:58 +02:00
Vykos
194985b5e1 Constrain gallery filenames to image root (#2828) 2026-06-05 10:29:11 +02:00
Alexandre Teixeira
0dc051dea3 refactor(tests): reuse import-state helper in session tests
Test-only refactor continuing #2523. Reuses the shared import-state helper in session-related tests, removes duplicated local save/restore logic, and preserves existing test behavior.
2026-06-05 09:25:52 +01:00
nubs
8b386a172e fix(calendar): route read requests to agent (#2452) 2026-06-05 09:24:04 +01:00
Vykos
2cae5a681d Sanitize calendar export filenames (#2840) 2026-06-05 10:18:09 +02:00
Alexandre Teixeira
46f128b9df fix(tests): make conftest DB import clean-worktree safe
Test-only fix continuing #2523. Sets an in-memory DATABASE_URL default before tests/conftest.py imports core.database, preserving explicit DATABASE_URL values and avoiding ./data artifacts in clean worktrees.
2026-06-05 09:14:51 +01:00
Isak
ec7691956b fix: add threading lock to AuthManager config mutations (#1226) 2026-06-05 10:04:37 +02:00
1jsjs
3ef73013eb Fix session cleanup cutoff timezone (#2488) 2026-06-05 09:52:34 +02:00
tanmayraut45
17b62a3dba Research CLI: alias --status complete to the stored done value (#2515)
`odysseus-research list --status complete` returns an empty result on
any real corpus. The CLI accepts `complete` as a `--status` choice (the
user-facing label), but the writer in
`services/research/research_handler.py` stores `status="done"` when a
run finishes (and the legacy `src/research_handler.py` copy does the
same). The list filter at `scripts/odysseus-research` was a literal
string compare:

    if args.status and (data.get("status") or "") != args.status:
        continue

so `--status complete` filtered every finished record out, and the user
saw nothing — even though `odysseus-research list` (no filter) listed
them fine and `show RP_ID` worked on the same files. The other
documented choices — `running`, `cancelled`, `error` — are stored
verbatim by the writer, so the surface mismatch is just on `complete`.

Add a small `_STATUS_CLI_TO_STORED = {"complete": "done"}` map and run
`data.get("status")` through `_status_matches(...)` before comparing.
The other CLI choices fall through unchanged, so the filter still
matches them verbatim. A `None` or non-string `status` (corrupt JSON)
is coerced to `""` and never matches `complete`, so a half-written
record can't sneak past the filter.

`tests/test_research_cli_status_filter.py` covers all four documented
choices, the non-string / missing status case, and pins that the
verbatim choices are NOT rewritten — a blanket mapping that turned
every CLI choice into a stored variant would just re-introduce the
empty-result bug on the running/cancelled/error paths.

Part of #2122.
2026-06-05 08:50:33 +01:00
ghreprimand
e0097c9c48 Strip tz in _parse_dt dateutil fallback (naive-datetime contract) (#2557)
_parse_dt documents that it returns naive datetimes (CalendarEvent.dtstart is
naive) and every return path strips tz — except the last-resort dateutil
fallback, which returned dateutil's value verbatim. An offset-bearing non-ISO
input (e.g. RFC-2822 'Mon, 05 Jan 2026 14:00:00 +0900', which fromisoformat
rejects but dateutil parses) leaked a tz-aware datetime into the naive dtstart
column via create_event/update_event -> _parse_dt_pair. On read-back,
_expand_rrule compares ev.dtstart against naive window bounds and raised
'can't compare offset-naive and offset-aware datetimes' (500 / no events).

Normalize the fallback to UTC-naive, mirroring the fromisoformat branch. Naive
inputs are unchanged.

(cherry picked from commit b03b6b91df21c1a3ad3c447f23f35b8b19e6d1b1)

Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-05 08:18:26 +01:00
Alexandre Teixeira
9ffa87e394 fix(tests): make webhook SSRF test clean-worktree deterministic
Test-only fix continuing #2523. Makes the webhook SSRF test deterministic in clean worktrees without creating ./data or repo-local DB artifacts.
2026-06-05 08:16:28 +01:00
ghreprimand
cfb2d17a2d Word-boundary match for snippet and subject-term ranking (#1473 follow-up) (#2556)
#1473 converted the title and sports-hint matches in services/search/ranking.py
to word boundaries but left two raw substring tests:

  - snippet_score: 'term in snippet.lower()' — query term 'port' hits
    'transport'/'support', inflating a result's relevance.
  - news_quality_adjustment: 't in text or t in netloc' for the subject term —
    query 'us' substring-matches 'business'/'music', so an off-topic page
    wrongly escapes the off-topic penalty on a country/subject news query.

Add a _has_word helper (the same \b...\b pattern title_score already used) and
route all three word checks (title, snippet, subject) through it, so the file
stays consistent and a future partial fix can't reintroduce the same bug class.
Pure ranking refinement: scores change only for spurious substring matches; no
API or schema change.

(cherry picked from commit 22bd23f044f191bb30e43f6b68386552817f4cc3)

Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-05 08:04:31 +01:00
nubs
5271d529d6 fix(tool-schemas): preserve web_search time_filter through native tool-call conversion (#2757) 2026-06-05 08:00:59 +01:00
Alexandre Teixeira
a9c1c698b0 refactor(tests): add import-state isolation helper
Test-only refactor continuing #2523. Adds a shared import-state isolation helper with focused coverage and migrates two pilot tests that manually preserved sys.modules and parent package attributes.
2026-06-05 07:30:14 +01:00
Alexandre Teixeira
43a101d305 refactor(tests): finish shared CLI loader adoption
Test-only refactor continuing #2523. Replaces remaining obvious CLI/script loader boilerplate with tests.helpers.cli_loader.load_script while preserving existing stubs and assertions.
2026-06-05 06:00:05 +01:00
Nicholai
1f40fbe140 Fix auto-memory vector dedup across tenants
Ensure vector dedup only suppresses a memory when the matched JSON memory belongs to the same owner or is legacy unowned.

Cross-owner vector hits now fall through to the existing owner-scoped text/fuzzy dedup path, preventing one user's memory from blocking another user's similar fact.

Fixes #2114.
2026-06-04 20:26:02 -06:00
Alexandre Teixeira
51e668ce60 refactor(tests): reuse CLI loader in more tests (#2571) 2026-06-05 02:42:10 +01:00
nubs
ae48ea7064 fix(mcp): sanitize and cap rendered MCP tool param hints (#2682) 2026-06-05 03:00:22 +02:00
nubs
b9a0586edc fix(markdown): avoid autolinking dotted imports (#2295) 2026-06-05 02:57:20 +02:00
nubs
19a3fc59c9 fix(model-context): key context-window cache by (endpoint, model) (#2614)
get_context_length() cached the resolved context window by model id alone,
so two different remote endpoints serving the same model id (e.g. a capped
proxy at 8k vs. the full provider at 200k) collided: the first to resolve
won process-wide and the other endpoint was served the wrong window. That
silently over-trims conversations on the larger-window endpoint (it feeds
context_compactor) or overflows the smaller one (provider 400s).

Key the cache on (endpoint_url, model). Local endpoints already always
re-query, so they are unaffected.

Fixes #2603
2026-06-05 02:50:56 +02:00
L1
f8cf791491 fix(caldav): don't prune locally-created events on sync (#2706)
The CalDAV pull prunes events in the synced calendar+window whose UID the
server didn't just return, to propagate upstream deletions. But CalendarEvent
had no field distinguishing a server-pulled row from a locally-created one, so
the prune also deleted events that were never on the server: events created by
the agent / email triage (which never write back to the server) and UI events
whose best-effort write-back failed. Result: silent, unrecoverable loss of the
user's appointments (hard db.delete, no soft-delete).

Add an 'origin' column to calendar_events (lightweight idempotent migration,
mirroring _migrate_add_calendar_is_utc), set origin='caldav' on rows the sync
inserts/updates, and gate the prune on origin == 'caldav'. Locally-created
events carry origin NULL and are never pruned. On the first sync after the
migration nothing is pruned (all rows NULL until re-marked), erring toward
keeping data.

Fixes #2704
2026-06-05 02:48:03 +02:00
Abylaikhan Zulbukharov
1d80bf5e65 feat(mcp): add Streamable HTTP transport with OAuth 2.0 (#1033)
* feat(mcp): add Streamable HTTP transport with OAuth 2.0

  Odysseus could only reach MCP servers over stdio and SSE, so modern
  remote servers like https://mcp.higgsfield.ai/mcp (Streamable HTTP,
  gated behind OAuth) could not be connected.

  Add an `http` transport that connects via the SDK's
  streamablehttp_client and authenticates with the SDK's
  OAuthClientProvider: RFC 9728 protected-resource discovery, RFC 8414
  authorization-server metadata, Dynamic Client Registration,
  authorization-code + PKCE, and token refresh. A small bridge
  (src/mcp_oauth.py) connects the SDK's blocking callback to the existing
  web callback route via an asyncio.Future keyed by the OAuth `state`,
  and the dynamic client registration plus tokens persist per-server in a
  new encrypted `oauth_tokens` column.

  The connect runs as a bounded background task so the "Add server"
  request returns immediately; redirect_handler publishes needs_auth +
  auth_url to connection state as soon as discovery/DCR completes (which
  can exceed the bounded wait), and the UI polls until connected. Remote
  users finish via the existing paste-back flow. The Google OAuth path is
  left unchanged.

  - core/database.py: encrypted oauth_tokens column + migration
  - src/mcp_oauth.py: OAuth provider, DB-backed TokenStorage, state registry
  - src/mcp_manager.py: http dispatch, background connect, _connect_http
  - routes/mcp_routes.py: http validation, needs_auth/auth_url, callback bridge
  - static/js/settings.js: Streamable HTTP option + OAuth flow with polling
  - tests: 5 new unit tests (transport dispatch, registry, token storage)

  Verified against the live Higgsfield server: discovery, DCR (client_id
  issued), loopback redirect accepted, and a PKCE authorization URL with
  needs_auth status. No regressions (full suite delta is only the 5 added
  passing tests).

* fix(mcp): address PR #1033 review feedback

  - mcp_oauth: derive redirect URI from OAUTH_REDIRECT_BASE_URL/APP_PUBLIC_URL
    (default http://localhost:7000) instead of hardcoding the port
  - mcp_oauth: leave OAuth scope unset so the SDK derives it from the server's
    WWW-Authenticate/protected-resource metadata; hardcoding an OIDC scope broke
    non-OpenID MCP servers (verified: Higgsfield still gets its server-derived
    scope)
  - mcp_oauth: prune abandoned OAuth flows (_prune_stale + _pending_ts) so the
    module-level registries can't grow unbounded
  - mcp_oauth: persist tokens/client-info in a single DB session/commit
    (_update) instead of a load+save double round-trip
  - mcp_manager: cancel and drop the background connect task in
    disconnect_server so a deleted server stops publishing status
  - database: document why the oauth_tokens migration uses TEXT while the model
    declares EncryptedText (encryption is applied at the Python layer)
  - settings.js: surface persistent OAuth-poll failures and an explicit timeout
    message instead of silently swallowing errors
  - tests: cover the stale-flow pruning

* static/js/settings.js now shows an in-flight loading state on the buttons that fire requests:
2026-06-05 02:40:52 +02:00
Zeus-Deus
85334e8f3d Render emoji shortcodes as icons in chat (#345) (#629)
Chat models often emit GitHub/Slack-style :shortcode: text (e.g. 😊,
🎤) instead of the actual emoji. The renderer only converted real
Unicode emoji to the monochrome line icons, so shortcodes rendered as literal
text.

Add a pure, browser-free shortcode->Unicode map (emojiShortcodes.js) and run it
inside svgifyEmoji ahead of the existing Unicode->SVG pass, skipping <code>/<pre>
so code stays literal. Covers ~430 common shortcodes plus common aliases
(+1/thumbsup, etc.).

Keep the conversion from touching anything it shouldn't:
* Scope it to chat. mdToHtml/svgifyEmoji take a { shortcodes } option (default
  on); document and email body rendering (compose, export, preview) pass it as
  false so author-typed :shortcode: text stays literal. The Unicode->SVG pass
  still runs there exactly as before.
* Only convert a :shortcode: that stands on its own. A word-boundary guard
  leaves embedded colon runs alone, so "1:100:2", "10:30:45", "16:9" and
  host:fire:port are never rewritten.

Tests: extend the node-driven unit test with the boundary/false-positive cases,
and fix the markdown-rendering test loader to resolve the new emojiShortcodes
import.
2026-06-05 02:28:42 +02:00
anduimagui
f9c81f3c8d fix(email): scope AI caches by owner (#2695) 2026-06-05 02:21:50 +02:00
afonsopc
9be2862e4e Stub llm_core via monkeypatch.setitem so the cross-tenant test does not leak its fake into later test modules 2026-06-05 00:04:15 +01:00
afonsopc
1801ba9a0d Update degraded-vector dedup test for owner-scoped vector match 2026-06-04 23:45:13 +01:00
afonsopc
28b296a712 Fix auto-memory vector dedup dropping a user's fact on cross-tenant match
extract_and_store dedups each extracted fact against the vector store
before the (owner-scoped) text fallback. The vector store is a single
shared ChromaDB collection storing only {"source": "memory"} — no
owner — and find_similar queries it with no owner filter, so it can
return a memory_id belonging to a different tenant. The old code
continue'd (skipped storing) on any vector hit without checking
ownership, so when ChromaDB is healthy (the common path) a user's
freshly-extracted fact was silently dropped because it was merely
semantically similar to another user's memory — the text fallback that
IS owner-scoped never ran. Gate the skip on the matched memory being
this user's own (or legacy unowned), mirroring the text dedup predicate;
cross-tenant or stale matches fall through. Same bug class as #1743.
2026-06-04 23:45:13 +01:00
Alexandre Teixeira
23fb5e169a fix(tests): make cookbook venv fallback test deterministic
Makes the cookbook venv fallback-chain test deterministic by simulating the inside-venv shell state directly instead of depending on the GitHub runner Python environment. Final focused #2580 CI-baseline cleanup.
2026-06-04 23:35:34 +01:00
Alexandre Teixeira
795782917f fix(tests): call live tool_execution module in edit-file gate test
Calls execute_tool_block through the live src.tool_execution module in the edit-file admin-gate test so the monkeypatched _owner_is_admin seam and the called function belong to the same module object. Fixes the scoped #2580 CI-order edit-file failure. Remaining Python failure is the unrelated cookbook fallback-chain environment test.
2026-06-04 23:22:02 +01:00
Isaiah Gardner
134c608466 fix: degrade missing/None content key in system messages to empty string (#2570) 2026-06-05 00:10:11 +02:00
Kenny Van de Maele
2be3779e6e feat: Add workspace: confine agent tools to a folder (#1103)
* feat: Add workspace: confine agent tools to a folder

Pick a server folder as the agent's workspace so its file/shell tools work
there and don't touch files outside it. File tools are hard-confined; bash/
python run with cwd set to the folder.

Includes a slash command: `/workspace` (alias `/ws`) — show / `set <path>` /
`clear` / `pick` (open the directory browser).

- routes/workspace_routes.py: GET /api/workspace/browse (admin-only).
- src/tool_execution.py: hard path confinement for read_file/write_file;
  bash/python cwd. Threaded route → stream_agent_loop → execute_tool_block.
- src/agent_loop.py: workspace note prepended to the system prompt.
- static/: overflow menu item, input-bar pill, directory-browser modal, and
  the /workspace slash command.
- tests/test_workspace_confine.py.

* Wire workspace confinement into tools that landed after this PR

edit_file (#1239) and grep/glob/ls (#1670) merged after workspace-confine was
written, so they bypassed the workspace boundary. Thread the workspace through:
  - edit_file: _do_edit_file resolves via _resolve_tool_path_in_workspace
  - grep/glob/ls: _resolve_search_root confines to the workspace (root + paths)
  - bash/python/bg cwd: workspace or _AGENT_WORKDIR (keep the #2586 data-dir
    default when no workspace is set)
Tests cover edit_file + grep/ls confinement (inside ok, outside rejected).

* Workspace picker: editable path bar + modal style cohesion + cross-platform hardening

- Make the current-folder strip an editable address bar: type/paste a full
  path and press Enter to navigate (also reaches other Windows drives and
  hidden dirs the up-only browser cannot).
- Reuse shared modal CSS: drop bespoke .workspace-modal-content/.workspace-btn*
  in favour of base .modal-content/.modal-body and the .confirm-btn button
  family; separators/hover use var(--border). Net -31 CSS lines.
- Fix the path field overflowing the modal right edge (flex stretch + margin
  vs an overflow:auto scrollbar-feedback loop): full-bleed, no h-margin.
- Cross-platform confinement: normcase the workspace commonpath check so
  containment holds on case-insensitive filesystems (Windows/macOS).
- Make tests OS-portable: sibling temp dirs instead of /etc, python os.getcwd()
  instead of pwd. 5 pass.
2026-06-05 00:06:37 +02:00
Alexandre Teixeira
fb852bd62e fix(tests): restore webhook manager after review test import
Restores src.webhook_manager after a review-regression test imports it against a fake src.database. Fixes one focused #2580 CI-baseline pollution bucket.
2026-06-04 22:28:00 +01:00
Michiel Van de Velde
7ddc5eaef4 Merge pull request #2529 from NubsCarson/codex/2509-mcp-tool-input-params
fix(mcp): expose MCP tool input parameters to the agent
2026-06-04 23:07:42 +02:00
Alexandre Teixeira
70812955d1 fix(tests): restore core module attrs in session owner test
Restores core.database/core.models/core.session_manager parent package attributes after session-owner test import stubs. Fixes one focused #2580 CI-baseline pollution bucket.
2026-06-04 21:43:25 +01:00
Kenny Van de Maele
64d65b73c1 feat: round-limit handling — Continue affordance at the cap + configurable cap (#1999)
* feat: round-limit handling — Continue affordance at the cap + configurable cap

When the agent loop runs out of rounds (per-message step cap, default 20)
while still actively using tools, it stopped silently mid-task. Now:

1. The loop emits a `rounds_exhausted` SSE event at the cap, and the UI shows
   a "Continue" pill at the bottom of the chat that resumes the task from where
   it left off. Repeated cap-hits each get a fresh Continue (multiple continues
   in a row).
2. The cap is configurable in Settings → Agent ("Max steps per message"),
   validated on the client, at the save endpoint, and at the read site.

- src/agent_loop.py: track `_exhausted_rounds` (set only when a full
  tool-executing round completes on the last allowed round — i.e. the agent
  wanted to keep going); emit `{"type":"rounds_exhausted","rounds":N}` (logged).
- routes/chat_routes.py: read `agent_max_rounds` (clamped 1..200), pass as
  `max_rounds`; forward the new event through the SSE relay.
- routes/auth_routes.py: validate numeric settings on save (int + clamp;
  agent_max_rounds 1..200, agent_max_tool_calls 0..1000; 400 on non-int).
- src/settings.py: default `agent_max_rounds = 20`.
- static/: Settings input + client-side clamp; the Continue pill (reuses the
  existing .stopped-indicator / .continue-btn classes and theme vars
  --border/--fg/--bg/--accent); appended to the chat container so it survives
  the message re-render at stream finalize. chat.js cache version bumped.

* test: cover rounds_exhausted emission (cap-hit vs normal finish)

Drives the real stream_agent_loop with mocked LLM stream / tool exec / settings:
a tool block every round exhausts the cap and must emit rounds_exhausted; a
plain answer hits the done-break and must not. Guards the for/else logic.
2026-06-04 22:36:05 +02:00
Alexandre Teixeira
a54f41037d fix(tests): restore src.database after webhook import
Restores both sys.modules and parent src.database package state after the webhook SSRF tests import src.webhook_manager against the real database module. Fixes one focused #2580 CI-baseline pollution bucket.
2026-06-04 21:21:51 +01:00
Alexandre Teixeira
3426e0cb5e fix(tests): isolate session route import stubs
Keeps src.request_models real and restores both sys.modules and parent routes.session_routes package attributes after temporary test stubs. Restores one focused part of the Python CI baseline tracked in #2580.
2026-06-04 21:05:52 +01:00
Ocean Bennett
e69298888b fix(history): block compact during active runs (#2635) 2026-06-04 21:50:16 +02:00
Kenny Van de Maele
67782e684e fix: exclude slash-command/setup messages from LLM context (#2634) (#2640)
Slash-command replies and the echoed /setup command are persisted to session
history so they render in the transcript, but they are UI chatter the user
never meant as conversation. They were sent to the model on the next turn,
which then commented on '/setup ...' and exposed transient values (e.g. the
Copilot device user_code) to the LLM.

- get_context_messages() (the LLM-API view) now skips messages tagged
  metadata.source == 'slash'. Display/history-load paths use raw history and
  are unaffected.
- slashCommands.js tags the echoed user command with source:'slash' too (the
  assistant replies already carried it); the user line was the one untagged
  path that still reached context.

Fixes #2634.
2026-06-04 21:42:23 +02:00
Afonso Coutinho
baf9179d94 Fix truncate_messages persisting an inflated message_count (#2052)
truncate_messages deletes db_messages[keep_count:] (a no-op when
keep_count >= the real message total) then unconditionally wrote
db_session.message_count = keep_count. When keep_count exceeds the
number of messages that actually exist — e.g. the manage_session AI
tool defaults keep_count to 10, and the HTTP truncate endpoint passes
any client value — the persisted count is set too high (10 on a
3-message session), diverging from the real row count. That column
gates lazy DB-hydration in get_session (message_count > 0) and is
surfaced to the history UI, so it is correctness-relevant. Clamp to
min(keep_count, len(db_messages)); the in-memory slice already caps
naturally.
2026-06-04 21:19:16 +02:00
Kenny Van de Maele
1cd0aa2b8c feat(provider): add GitHub Copilot provider with device-flow auth (#1480)
* feat(provider): add GitHub Copilot provider with device-flow auth

Adds GitHub Copilot as a model provider, so Copilot models (gpt-4o/4.1/5,
Claude, Gemini, …) work through the normal chat + agent loop, incl. native
tool calling and vision.

Auth is one-click via the GitHub OAuth device flow; the access token is stored
as the endpoint's (encrypted) api_key and sent directly as `Authorization:
Bearer` (no Copilot-token exchange, no refresh — matching how editors talk to
the Copilot API). Copilot is a normal ModelEndpoint detected by host; the only
provider-specific behaviour is a small set of required request headers,
injected centrally.

Sign-in is available from Settings → model endpoints ("Connect GitHub
Copilot") and from chat via `/setup copilot`.

- src/copilot.py (new), routes/copilot_routes.py (new): constants, header
  builders, device-flow start/poll, model discovery, owner-scoped endpoint
  provisioning.
- src/llm_core.py, src/endpoint_resolver.py: detect `copilot`, inject headers,
  per-request x-initiator/vision.
- src/agent_loop.py: allowlist api.githubcopilot.com for native tool schemas.
- src/model_context.py: known context windows for Copilot (no unauthenticated
  /models probe).
- static/, README, tests/test_copilot*.py.

* Tidy copilot_routes: clarify supports_tools, note _PENDING is per-process
2026-06-04 21:13:14 +02:00