22 Commits

Author SHA1 Message Date
Kenny Van de Maele
0a2adc9c96 Add ask_user tool: agent-posed multiple-choice questions (#2111)
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.

Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.

- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
  Validates a non-empty question + 2..6 options, normalizes string/object
  options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
  the turn ends and waits for the user's selection. Stream the question as
  assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
  FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
  user can be asked); the structured args serialize via the default
  json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
  free-text Other + dismiss x; removed once answered). Reuses CSS vars and
  the .modal-close button; emoji go through the monochrome-SVG pipeline.
  Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
  cap, validation errors, serializer round-trip, registration.
2026-06-05 11:49:11 +02:00
Kenny Van de Maele
2be3779e6e feat: Add workspace: confine agent tools to a folder (#1103)
* feat: Add workspace: confine agent tools to a folder

Pick a server folder as the agent's workspace so its file/shell tools work
there and don't touch files outside it. File tools are hard-confined; bash/
python run with cwd set to the folder.

Includes a slash command: `/workspace` (alias `/ws`) — show / `set <path>` /
`clear` / `pick` (open the directory browser).

- routes/workspace_routes.py: GET /api/workspace/browse (admin-only).
- src/tool_execution.py: hard path confinement for read_file/write_file;
  bash/python cwd. Threaded route → stream_agent_loop → execute_tool_block.
- src/agent_loop.py: workspace note prepended to the system prompt.
- static/: overflow menu item, input-bar pill, directory-browser modal, and
  the /workspace slash command.
- tests/test_workspace_confine.py.

* Wire workspace confinement into tools that landed after this PR

edit_file (#1239) and grep/glob/ls (#1670) merged after workspace-confine was
written, so they bypassed the workspace boundary. Thread the workspace through:
  - edit_file: _do_edit_file resolves via _resolve_tool_path_in_workspace
  - grep/glob/ls: _resolve_search_root confines to the workspace (root + paths)
  - bash/python/bg cwd: workspace or _AGENT_WORKDIR (keep the #2586 data-dir
    default when no workspace is set)
Tests cover edit_file + grep/ls confinement (inside ok, outside rejected).

* Workspace picker: editable path bar + modal style cohesion + cross-platform hardening

- Make the current-folder strip an editable address bar: type/paste a full
  path and press Enter to navigate (also reaches other Windows drives and
  hidden dirs the up-only browser cannot).
- Reuse shared modal CSS: drop bespoke .workspace-modal-content/.workspace-btn*
  in favour of base .modal-content/.modal-body and the .confirm-btn button
  family; separators/hover use var(--border). Net -31 CSS lines.
- Fix the path field overflowing the modal right edge (flex stretch + margin
  vs an overflow:auto scrollbar-feedback loop): full-bleed, no h-margin.
- Cross-platform confinement: normcase the workspace commonpath check so
  containment holds on case-insensitive filesystems (Windows/macOS).
- Make tests OS-portable: sibling temp dirs instead of /etc, python os.getcwd()
  instead of pwd. 5 pass.
2026-06-05 00:06:37 +02:00
Kenny Van de Maele
64d65b73c1 feat: round-limit handling — Continue affordance at the cap + configurable cap (#1999)
* feat: round-limit handling — Continue affordance at the cap + configurable cap

When the agent loop runs out of rounds (per-message step cap, default 20)
while still actively using tools, it stopped silently mid-task. Now:

1. The loop emits a `rounds_exhausted` SSE event at the cap, and the UI shows
   a "Continue" pill at the bottom of the chat that resumes the task from where
   it left off. Repeated cap-hits each get a fresh Continue (multiple continues
   in a row).
2. The cap is configurable in Settings → Agent ("Max steps per message"),
   validated on the client, at the save endpoint, and at the read site.

- src/agent_loop.py: track `_exhausted_rounds` (set only when a full
  tool-executing round completes on the last allowed round — i.e. the agent
  wanted to keep going); emit `{"type":"rounds_exhausted","rounds":N}` (logged).
- routes/chat_routes.py: read `agent_max_rounds` (clamped 1..200), pass as
  `max_rounds`; forward the new event through the SSE relay.
- routes/auth_routes.py: validate numeric settings on save (int + clamp;
  agent_max_rounds 1..200, agent_max_tool_calls 0..1000; 400 on non-int).
- src/settings.py: default `agent_max_rounds = 20`.
- static/: Settings input + client-side clamp; the Continue pill (reuses the
  existing .stopped-indicator / .continue-btn classes and theme vars
  --border/--fg/--bg/--accent); appended to the chat container so it survives
  the message re-render at stream finalize. chat.js cache version bumped.

* test: cover rounds_exhausted emission (cap-hit vs normal finish)

Drives the real stream_agent_loop with mocked LLM stream / tool exec / settings:
a tool block every round exhausts the cap and must emit rounds_exhausted; a
plain answer hits the done-break and must not. Guards the for/else logic.
2026-06-04 22:36:05 +02:00
Massab K.
594775dc4b Fix issue 135 chat context bleed (#281)
* Fix issue 135 chat context bleed

* Guard task delivery metadata access
2026-06-04 13:27:46 +01:00
Alexander Kenley
7b45a94b6d Fix calendar routing and user-local time context (#408)
* fix(chat): add user-local time context

* fix(chat): route calendar follow-up phrasing

* refactor(chat): log tool intent routing reasons

* test(chat): align user time prompt shim

---------

Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-04 13:20:04 +01:00
Afonso Coutinho
290d398900 fix: rewriting a message is lost on reload due to a non-existent DB column (#1729) 2026-06-03 13:31:19 +09:00
Vykos
4771d80eb2 Harden session endpoint owner scope (#1308) 2026-06-03 02:40:22 +09:00
Paulo Victor Cordeiro
97f855b40d fix: pass owner to start_research in chat stream path (#1265)
* fix: pass owner to start_research in chat stream path

Research launched from the chat stream omits the owner parameter,
causing those research sessions to never appear in the user's
research library (which filters by owner). All other start_research
call sites in this file already pass owner=_user.

* test: assert all start_research calls in chat_routes pass owner

Uses AST inspection to verify every start_research() call site
includes the owner= keyword argument, preventing regressions where
new call sites forget to scope research by user.
2026-06-03 02:32:38 +09:00
Leo
6c15dc7d33 Chat metrics: surface backend generation speed
* Chat metrics: show backend's true generation t/s, not tokens÷wall-clock

The per-message tokens/sec read low and felt wrong because it was computed as
output_tokens / total_duration, where total_duration is wall-clock including
prefill, tool calls, and network — not pure decode time. llama.cpp already
reports the correct gen speed in its stream (timings.predicted_per_second), but
it was being dropped.

- llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the
  sibling `timings` block llama.cpp includes — pass predicted_per_second through
  as gen_tps and prompt_per_second as prefill_tps on the usage event.
- agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events;
  in _compute_final_metrics prefer backend_gen_tps over the wall-clock division
  when present (fall back to computed for cloud APIs that omit timings). Tag the
  result with tps_source ("backend" vs "computed") and surface prefill_tps.

Result: the displayed t/s now matches the model's real decode speed and is
stable regardless of prompt length (a long prefill no longer deflates it).

Checks: py_compile passes; verified extraction against a real llama.cpp final
chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Chat metrics: surface true t/s on the direct-chat path too

Follow-up to the gen-tps work: the non-agent direct-chat stream path in
chat_routes turned the raw `usage` event straight into a metrics event but only
copied token counts — it never set tokens_per_second or response_time. So simple
(non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell
back to a bare token count ("27 tok") instead of t/s.

Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in
the prior commit) into tokens_per_second here too, tag tps_source=backend, and
set response_time from wall-clock for the stats popup.

Checks: py_compile passes; verified llama.cpp emits usage+timings on the final
stream chunk (gen ~90 t/s) that this path consumes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: backend gen/prefill t/s passthrough and preference

Cover the two pieces of the true-t/s metric so it can be reviewed on its own:
- stream_llm surfaces llama.cpp's timings.predicted_per_second /
  prompt_per_second as gen_tps / prefill_tps on the usage event (captured
  llama.cpp final-chunk fixture), and omits them when the backend reports no
  timings.
- _compute_final_metrics prefers backend_gen_tps over output/wall-clock,
  tags tps_source ("backend" vs "computed"), and surfaces prefill_tps.

Reuses the fake-client stream harness from test_llm_core_streaming.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 20:52:08 +09:00
ghreprimand
4cec31d988 Chat: route image sessions only to matching image endpoints
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:52:03 +09:00
Yavor Ivanov
7cc8fdb2f5 Models: avoid hidden models in default fallback
Both get_default_chat and _recover_empty_session_model picked the
first model from cached_models[0] without checking hidden_models.
If the first cached model was hidden (e.g. minimax-m3), it was
returned as the default or used to repair empty session models,
even though the model list endpoints already filter hidden_models.

- Add _visible_models() helper that filters cached_models by
  hidden_models (mirrors the filtering in list_model_endpoints)
- Use _visible_models() in get_default_chat fallback (when no
  explicit default_model is saved)
- Use _visible_models() in _recover_empty_session_model (when
  repairing a session whose model field is empty before chat send)
- Add regression tests for hidden-model filtering in default chat
  resolution, and unit tests for _visible_models helper
2026-06-02 20:37:14 +09:00
mechramc
493c815371 Chat: scope active document fallbacks by owner 2026-06-02 20:29:27 +09:00
Rasmus
e73f3edc06 fix: scope chat active-document lookup to the session owner (#569) 2026-06-02 11:46:40 +09:00
James Arslan
6776c7d691 Surface silent model fallback instead of masking it (#868)
When the selected model fails before producing output, stream_llm_with_fallback
quietly switches to the next candidate and the reply is shown under the
originally selected model's name, so a misconfigured provider looks like it
works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request
appears fine because another model silently answers under the Claude label.)

Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first
time a non-primary candidate produces output, forward it through the agent loop
and both chat-route paths, stamp the response metrics with the model that
actually answered, and show a notice + relabel the reply in the UI.

Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass);
python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py;
node --check static/js/chat.js.
2026-06-02 11:37:25 +09:00
tanmayraut45
0e31c38be0 Support in-place endpoint updates and recover empty-model sessions (#786)
The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed
in 6a78b02 (Fix endpoint model preservation for tasks). The three remaining
follow-up pieces from the original PR — flagged in the review on #786 —
are:

- routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts
  api_key and base_url, so the admin UI can rotate a key or fix a typo'd
  URL without going through delete+recreate. base_url is normalized the
  same way the POST handler does (strip /models, /chat/completions,
  /completions, /v1/messages, then _normalize_base). Cache invalidation
  matches the POST/DELETE paths and the response includes base_url so the
  frontend can confirm what was saved.

- routes/chat_routes.py: new _recover_empty_session_model picks
  cached_models[0] from the endpoint that matches sess.endpoint_url and
  persists it onto the Session row before the LLM call goes out. Wired
  into both /api/chat and /api/chat_stream after the existing
  _clear_orphaned_session_endpoint guard, so the order is: drop
  truly-orphaned sessions first, then heal the "picker showed it, session
  never knew" case.

- routes/chat_routes.py: when recovery fails (no endpoint, no cached
  models) raise HTTP 400 with a clear message instead of letting
  model="" reach the upstream as 401/503.

Closes #587.
2026-06-02 11:26:38 +09:00
Mahdi Salmanzade
4a84a895a0 Keep reasoning (thinking) tokens out of the saved chat reply (#856)
Streamed deltas flagged thinking:true (reasoning-model traces) were being folded
into full_response and persisted as part of the assistant message, so saved
replies were polluted with the model's chain-of-thought. Forward those deltas to
the client (for a live thinking indicator) but exclude them from the accumulated
saved reply, in both chat and research-stream paths. Mirrors the existing rewrite
path's handling.
2026-06-02 11:17:41 +09:00
tanmayraut45
2d7d7b2412 Fix TOCTOU race in chat stream status endpoint
The /api/chat/stream_status handler did a membership test against
_active_streams followed by an indexed read of the same key. Between
those two ops, a sibling stream's finally block (or a stop / cleanup
path) can pop the entry, turning the indexed read into a KeyError that
bubbles up as a 500. The race is the exact one _stream_set was already
written to avoid; the comment on the helper at the top of the module
spells out why a single .get() is the right pattern here too.

Collapse the two-step into a single .get() call so the lookup either
returns the live record or None, and report 'detached' / 404 based on
that single read. No behavior change on the happy path; the failure
mode under concurrent stream cleanup is now handled deterministically.

Closes #658.
2026-06-02 03:02:30 +05:30
Rifqi Akram
5b1e56407b Add SSRF-guarded web fetch agent tool
* feat(web-fetch): add web_fetch tool to read a specific URL's content

* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution

Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
2026-06-01 16:57:28 +09:00
Alexander Kenley
cb8a0b268d Route calendar action requests to tools
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 14:32:41 +09:00
Collin
0a7de1fdf4 fix: stop leaking DB connections when persisting session mode (#64)
chat_routes.py persisted a session's "mode" in three best-effort spots —
reading the current mode, writing the effective mode, and setting
research_pending on the stream path. Each opened a session with SessionLocal()
and called .close() as the LAST statement inside a try/except, so if anything
before close() raised (e.g. a SQLite "database is locked" under concurrent chat
streams) the except only logged and the connection was never returned to the
pool.

DATABASE_URL defaults to file-backed SQLite, whose engine uses SQLAlchemy's
default QueuePool (5 connections + 10 overflow). Repeated leaks on these hot
paths exhaust the pool; later requests then block for pool_timeout and fail
with "QueuePool limit ... reached", taking the app down until restart.

Move the logic into two best-effort helpers in core.database, next to the
existing session helpers (update_session_last_accessed, get_session_by_id):

  - get_session_mode(session_id) -> Optional[str]
  - set_session_mode(session_id, mode) -> bool

Both route through the existing get_db_session() context manager, which commits
on success, rolls back on error, and always closes in a finally, so the
connection is returned to the pool on every path. chat_routes.py now calls
these instead of hand-rolling sessions, also removing three copies of the same
try/except.

Add tests/test_session_mode_helpers.py: the helpers commit+close on success
and, on a mid-operation DB error, swallow + roll back + close (no leak). The
error-path tests fail against the old close()-inside-try pattern.
2026-06-01 13:57:48 +09:00
pewdiepie-archdaemon
fc7f107b22 Improve Ollama setup and model endpoint handling 2026-06-01 10:00:15 +09:00
pewdiepie-archdaemon
e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00