odysseus

Author	SHA1	Message	Date
SurprisedDuck	62f06ab740	Docs: respect path boundary when clearing exclusions add_directory cleared exclusions with a raw path.startswith(directory) test, which also matched sibling directories sharing a name prefix — adding /docs would silently un-exclude files under /docs2. Match the directory itself or paths under it (directory + os.sep) instead.	2026-06-02 20:35:44 +09:00
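A minimal sketch of the boundary-aware test described in the commit above. The function name `is_within` is hypothetical and the real `add_directory` code may be structured differently; only the `directory + os.sep` comparison comes from the message.

```python
import os


def is_within(path: str, directory: str) -> bool:
    """Return True if `path` is `directory` itself or lies underneath it."""
    # Appending the separator prevents "/docs" from matching "/docs2/...".
    prefix = directory if directory.endswith(os.sep) else directory + os.sep
    return path == directory or path.startswith(prefix)


docs = os.sep + "docs"
sibling_file = os.sep + "docs2" + os.sep + "notes.md"

# The raw prefix test wrongly treats the sibling directory as a match.
raw_match = sibling_file.startswith(docs)
# The boundary-aware test does not.
safe_match = is_within(sibling_file, docs)
```

With these inputs `raw_match` is true while `safe_match` is false, which is the `/docs` versus `/docs2` case the commit calls out.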
SurprisedDuck	78747b56ca	Documents: strip PDF marker without corrupting text _process_pdf prepends "\n\n[PDF content]:" to extracted text, and two call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:"). str.lstrip(chars) treats its argument as a set of characters, so it keeps eating into the page text that follows the marker — e.g. a body starting with "to the board" loses its leading "to" because 't'/'o' are in the marker's character set. Replace both sites with a shared strip_pdf_content_marker() helper that uses str.removeprefix.	2026-06-02 20:35:27 +09:00
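The difference between the two string methods can be reproduced in isolation. This is a sketch only: the marker string and the helper name come from the commit message, but the helper body is an assumption about how it might look (`str.removeprefix` requires Python 3.9+).

```python
PDF_MARKER = "\n\n[PDF content]:"


def strip_pdf_content_marker(text: str) -> str:
    """Remove the marker only when it appears as an exact prefix."""
    return text.removeprefix(PDF_MARKER)


body = PDF_MARKER + "to the board"

# lstrip treats its argument as a set of characters, so it keeps going
# through 't', 'o', ' ' and 't' before stopping at 'h'.
corrupted = body.lstrip("\n[PDF content]:")   # 'he board'
clean = strip_pdf_content_marker(body)        # 'to the board'
```

Text that does not start with the marker passes through `strip_pdf_content_marker` unchanged.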
Ernest Hysa	996a2027dd	Cookbook: surface pip install failures in logs _pip_install_fallback_chain silently discarded pip stderr via 2>/dev/null on every attempt. When pip failed (network error, venv mismatch, disk full), the wrapper exited 0 and the Cookbook UI showed the download as running — the silent-failure mode from #354. Extract _pip_install_attempt() which wraps each pip invocation in a bash -c subshell that captures output to a temp file, prints tail -5 on failure, cleans up, and exits with pip's real exit code. This avoids the | tail pipefail masking (the first blocker on #363) while surfacing the last 5 lines of pip output in the tmux log so users can see what went wrong. Both local wrapper and remote SSH runner use the same helper through _pip_install_fallback_chain, so the fix is symmetric.	2026-06-02 20:34:52 +09:00
Hayk Arzumanyan	514050d098	Models: rewrite Docker loopback endpoints to host gateway In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio default http://localhost:1234/v1) targets the Odysseus container itself, not the host running the server, so the probe gets a connection error and the endpoint is rejected with a misleading 'No models found for that provider/key'. Rewrite loopback to host.docker.internal (which compose already maps to host-gateway) for the probe and the saved URL, mirroring the existing Ollama handling. Gated on actually being in a container with the gateway reachable, so native installs and gateway-less deploys are untouched. Fixes #25 Co-authored-by: Claude <noreply@anthropic.com>	2026-06-02 20:34:40 +09:00
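A rough sketch of the loopback rewrite, assuming the container and gateway check has already been reduced to a boolean. `rewrite_loopback`, `LOOPBACK_HOSTS` and the `in_container` flag are hypothetical names; only the `host.docker.internal` target and the LM Studio example URL come from the commit message.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def rewrite_loopback(url: str, in_container: bool,
                     gateway: str = "host.docker.internal") -> str:
    """Point a loopback endpoint URL at the Docker host gateway."""
    if not in_container:
        return url  # native installs are left untouched
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    # Keep the port; this sketch drops any userinfo in the URL.
    netloc = gateway if parts.port is None else f"{gateway}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


rewritten = rewrite_loopback("http://localhost:1234/v1", in_container=True)
untouched = rewrite_loopback("http://localhost:1234/v1", in_container=False)
```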
SurprisedDuck	4307cac966	Research: report empty search provider results clearly Deep Research surfaced 'Error: unknown error' whenever every search provider returned an empty result set without raising (e.g. SearXNG is reachable but all its engines fail internally). _last_search_error was only set on exceptions, so the empty-but-no-exception path left it unset and the caller fell back to 'unknown error'. Record an actionable reason on that path naming the providers that were tried, so users can tell it's a search-backend problem rather than a model problem. The provider-raised path is unchanged. Re: #344. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 20:34:25 +09:00
Tatlatat	67517eaed1	Gallery: match image endpoint URLs with exact v1 suffix The image-edit endpoint lookup compared stored vs incoming base URLs with `.rstrip("/v1")`. `str.rstrip(chars)` treats its argument as a character set, not a suffix, so any URL ending in '/', 'v', or '1' is over-stripped (e.g. `http://host1/v1` -> `http://host`). Two endpoints that are not the same can then compare equal, or the real endpoint fails to match its own stored record, leaving `api_key` unset and sending the upstream image call unauthenticated. Use `.removesuffix("/v1")` (exact-suffix removal) with surrounding `.rstrip("/")` on both sides so only a genuine trailing `/v1` is dropped. Adds a focused test that parses the actual comparison expression out of gallery_routes.py via AST and evaluates it — it fails if the fix is reverted and uses no mocking.	2026-06-02 20:34:05 +09:00
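The over-stripping is easy to reproduce. In the sketch below the helper name `strip_v1` is hypothetical; the `.rstrip("/")` / `.removesuffix("/v1")` combination follows the commit message, though the exact expression in gallery_routes.py may differ.

```python
def strip_v1(url: str) -> str:
    """Drop a genuine trailing '/v1' (and trailing slashes), nothing more."""
    return url.rstrip("/").removesuffix("/v1").rstrip("/")


# rstrip("/v1") removes any trailing run of '/', 'v' and '1' characters,
# so the '1' that belongs to the hostname is eaten as well.
over_stripped = "http://host1/v1".rstrip("/v1")   # 'http://host'
exact = strip_v1("http://host1/v1")               # 'http://host1'
```

Comparing `strip_v1(stored)` with `strip_v1(incoming)` then only equates URLs that differ by a trailing `/v1` or slash.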
tanmayraut45	4e440a9fd5	Hwfit: estimate params from config.json fallback `add_hwfit_models.py` infers `parameter_count` and `parameters_raw` by regexing the HF repo name for a `<num>B` token, optionally with an `-A<num>B` MoE active-param suffix. Repos that don't encode a size in their name at all (e.g. `zai-org/GLM-4.5`, where the "4.5" is a version not a parameter count) fall through to the safetensors element-count path. That path works for unquantized FP16 / BF16 repos but is brittle in two cases the catalog hits often: 1. Author-bulk runs (`AUTHORS = ["cyankiwi"]`) pull pre-quantized AWQ / GPTQ / MLX repos. The safetensors metadata stores the packed I32 tensors and a per-dtype `parameters` map, which the script unpacks via a per-quant pack factor. When the upload doesn't populate that map (older repos, custom shards), `st.total` is used raw and the parameter count is off by 4-8x. 2. Repos where the safetensors block is absent from `model_info()` entirely. The current code returns `None` and silently drops the model, which then has to be added to `EXTRA_REPOS` by hand with a literal `parameter_count` string. Both are exactly what the issue calls out — the regex / safetensors combo can't size GLM-4.5 by itself because the name has no `<num>B` and the upstream repo's safetensors block doesn't carry a usable param total either. Add a config.json fallback in front of the safetensors path: - `_fetch_config_json(repo_id)` downloads `config.json` via `hf_hub_download` (so the standard HF on-disk cache handles deduplication across runs, no extra cache layer needed). Network / 404 / gated-repo errors return `None` and the caller proceeds to the safetensors fallback. An in-process `_CONFIG_CACHE` dedupes the base-model vs. source-repo lookups within a single run. - `_params_from_config(cfg)` first honours explicit `num_parameters` / `n_params` / `total_params` fields when present. 
Otherwise it sums embeddings + attention (GQA-aware via `num_key_value_heads` and `head_dim`) + dense MLP (`3 * hidden_size * intermediate_size`, covering SwiGLU / GeGLU). For MoE configs it picks up both naming conventions in the wild — `num_experts` / `num_experts_per_tok` (Qwen3-MoE) and `n_routed_experts` / `n_shared_experts` (GLM-4-MoE, DeepSeek-V3) — uses `moe_intermediate_size`, and respects `first_k_dense_replace` so the first N layers stay dense. Active parameters come out as `num_experts_per_tok + n_shared_experts` of the routed experts, which matches how each architecture reports its active count. - In `_entry_from_modelinfo`, try config.json on the source repo first (works for unquantized models) and then on the `base_model:` parent (covers AWQ / GPTQ children whose own config is just a quantization manifest). Both lookups run only when regex + override + base_model tag all failed, so the normal author-bulk run still resolves sizes from names without touching the Hub. Spot-checks against the three architecture families this script actually pulls — within ~5% of the documented param counts, which is well inside the `parameter_count` rounding (one decimal of "B") and the `min_vram_gb` downstream bucket: Qwen2.5-7B-Instruct 7.62B (HF card: 7.6B) Qwen3-30B-A3B 30.5B / 3.34B active (card: 30.5B / 3.3B) GLM-4.5 352.7B / 33.6B active (card: 355B / 32B) The safetensors path is unchanged and remains the last resort, so repos with neither a parsable name nor a fetchable config.json behave exactly as before. Closes #955.	2026-06-02 20:33:25 +09:00
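The dense part of the estimate can be sketched as follows. This is an assumption-laden illustration, not the script's `_params_from_config`: it ignores MoE, norms and biases, the function name is hypothetical, and the sample config values are illustrative numbers for a 7B-class model rather than values read from any repo.

```python
def estimate_dense_params(cfg: dict) -> int:
    """Approximate parameter count for a dense, GQA-aware decoder config."""
    hidden = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = cfg.get("head_dim", hidden // heads)

    q_and_o = 2 * hidden * heads * head_dim       # query and output projections
    k_and_v = 2 * hidden * kv_heads * head_dim    # smaller under GQA
    mlp = 3 * hidden * cfg["intermediate_size"]   # gate, up, down (SwiGLU / GeGLU)
    embed = cfg["vocab_size"] * hidden
    if not cfg.get("tie_word_embeddings", False):
        embed *= 2                                # separate output head
    return layers * (q_and_o + k_and_v + mlp) + embed


sample_cfg = {
    "hidden_size": 3584, "num_hidden_layers": 28,
    "num_attention_heads": 28, "num_key_value_heads": 4,
    "intermediate_size": 18944, "vocab_size": 152064,
    "tie_word_embeddings": False,
}
total = estimate_dense_params(sample_cfg)
```

For the sample values the total lands near 7.6 billion, in the same range as the first spot-check quoted in the commit message.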
SurprisedDuck	d06b6d87d3	Models: prefer longest known context match KNOWN_CONTEXT_WINDOWS lists 'o1' (200k) before 'o1-mini' (128k), and _lookup_known returned on the first substring hit — so "o1-mini" matched 'o1' and reported 200000 instead of 128000. Track the longest matching key instead, so the most specific entry wins regardless of table order.	2026-06-02 20:33:09 +09:00
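A small sketch of longest-match lookup. The two table entries are the ones named in the commit; the function name and signature are hypothetical stand-ins for `_lookup_known`.

```python
from typing import Optional

KNOWN_CONTEXT_WINDOWS = {"o1": 200_000, "o1-mini": 128_000}


def lookup_known(model: str, table: dict) -> Optional[int]:
    """Return the window for the longest key that is a substring of `model`."""
    best = None
    for key in table:
        if key in model and (best is None or len(key) > len(best)):
            best = key
    return table[best] if best is not None else None


mini = lookup_known("o1-mini", KNOWN_CONTEXT_WINDOWS)     # 128000
preview = lookup_known("o1-preview", KNOWN_CONTEXT_WINDOWS)  # 200000
```

Because the longest key wins, the result no longer depends on the order of entries in the table.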
mist	0b0be3c339	Email: recognize forwarded message dividers `_ORIG_RE` (and its JS mirror `_TALON_ORIG_RE`) already recognised the Japanese forward marker `転送` alongside the "Original Message" delimiters, but not the English "Forwarded message" one. So Gmail-style forwards — including the ones Odysseus itself emits (`---------- Forwarded message ----------`, static/js/emailInbox.js) — were not treated as a quote boundary: - with a following Outlook From:/Date: header block, the divider line leaked into the level-0 reply bubble as noise; - with only the divider marking the forward (no header block), the body was not split into turns at all. Add `Forwarded\s+message` to the same `[-_=]{3,}`-delimited alternation in both the server-side parser and the JS mirror, so forward dividers are consumed as an attribution boundary like "----- Original Message -----". Locale variants of "Forwarded message" can follow the existing pattern. Tests cover both manifestations plus a negative control (the bare words "forwarded message" without `[-_=]{3,}` delimiters must not split). Checks: python -m pytest tests/test_forwarded_message_divider.py (3 passed), python -m py_compile src/email_thread_parser.py, node --check static/js/emailLibrary/utils.js, git diff --check.	2026-06-02 20:32:56 +09:00
ghidras	6ea8fec896	Cookbook: fix Windows NVIDIA VRAM detection Co-authored-by: ghidras <ghidras@users.noreply.github.com>	2026-06-02 20:32:53 +09:00
mist	e249fa4557	Tools: match keyword hints on word boundaries `get_tools_for_query` force-includes whole tool families when the query mentions an intent keyword, but matched with a raw substring test (`kw in ql`). Short hints therefore fired inside unrelated words, bloating the tool set with irrelevant tools: - "fix" matched "prefix" -> document tools - "line" matched "deadline"/"online" -> document tools - "serve" matched "observe"/"reserve" -> cookbook serve tools - "reply" matched "replying" -> all email tools - "unread" matched "unreadable" -> all email tools Match each keyword on word boundaries instead (`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to the keyword matcher in topic_analyzer.py. Genuine intent keywords ("reply to this email", "edit the document", "serve the model") still match. This only removes substring-inside-a-word matches; it does not change whole-word matches (so e.g. an unrelated whole word like "tell" is a separate keyword-choice question, left untouched here). Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed; 3 of them fail on the pre-fix substring code), python -m py_compile src/tool_index.py, git diff --check.	2026-06-02 20:32:20 +09:00
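The substring and word-boundary behaviours can be compared directly. The regex is the one quoted in the commit; the two helper functions are hypothetical wrappers written for this illustration.

```python
import re


def substring_hits(query: str, keywords: list) -> list:
    """Pre-fix behaviour: raw substring test."""
    ql = query.lower()
    return [kw for kw in keywords if kw in ql]


def boundary_hits(query: str, keywords: list) -> list:
    """Post-fix behaviour: keyword must stand alone as a whole word."""
    ql = query.lower()
    return [kw for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", ql)]


hints = ["fix", "line", "reply"]
noisy = substring_hits("Add a prefix before the deadline", hints)   # ['fix', 'line']
quiet = boundary_hits("Add a prefix before the deadline", hints)    # []
genuine = boundary_hits("Reply to this email", hints)               # ['reply']
```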
mist	8f0518c0ae	Presets: fill missing built-in defaults on load PresetManager.load already heals a forward-incompatible presets.json: the block just above repairs the legacy `custom` shape and re-saves the file. But if the file exists and is missing a whole built-in preset (e.g. an older install written before `reason` existed), load returned it as-is, so that built-in stayed permanently absent — silently missing from the picker that GET /api/presets feeds, with no way for the user to get it back. Extend the same self-heal: after the legacy migration, fill in any built-in presets the loaded file is missing, defaults-first so user edits win, and persist the result. This never clobbers an intentional removal — there is no delete path for the built-in keys (only user_templates entries can be deleted), and presets are hidden via an `enabled: False` flag, not removal. Checks: python -m pytest tests/test_preset_fill_missing_defaults.py (3 passed; 2 fail on the pre-fix code), the existing preset cases in tests/test_review_regressions.py still pass, python -m py_compile src/preset_manager.py, git diff --check.	2026-06-02 20:32:08 +09:00
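One way to read "defaults-first so user edits win" is a dictionary merge in which the loaded file overrides the built-ins key by key. The sketch below assumes presets are keyed by name at the top level; the function name and the sample preset values are hypothetical.

```python
import copy


def fill_missing_builtins(loaded: dict, defaults: dict):
    """Return (merged presets, whether the file needs to be re-saved)."""
    # Defaults go first, so any key present in `loaded` replaces the default.
    merged = {**copy.deepcopy(defaults), **loaded}
    needs_save = merged.keys() != loaded.keys()
    return merged, needs_save


builtin = {"code": {"temperature": 0.5}, "reason": {"temperature": 0.7}}
on_disk = {"code": {"temperature": 0.1}}   # older file without `reason`

merged, needs_save = fill_missing_builtins(on_disk, builtin)
```

Here `merged` keeps the user's edited `code` preset, gains the missing `reason` preset, and `needs_save` signals that the healed file should be persisted.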
Mahdi Salmanzade	280c29d572	Security: owner-scope v1 chat endpoint fallback The sync-chat endpoint's Case 3 fallback selected a ModelEndpoint with an unscoped `query(ModelEndpoint).filter(is_enabled == True).first()` and then used that row's decrypted `api_key` for the LLM call. ModelEndpoint is a per-user resource (owner non-null = private to that user), so a chat-scoped API token for user A that sent no session and no api_key could fall back onto user B's PRIVATE endpoint — spending B's API key/quota and reaching whatever internal base_url B configured. This is the same multi-tenant owner-scoping class already fixed for the session gate on this very endpoint (_caller_owns_session) and for companion/models. Scope the fallback to the token owner's own rows plus legacy null-owner (shared) rows via the existing owner_filter helper, matching routes/model_routes.py and companion/routes.py. A null/empty owner stays a no-op, preserving single-user/legacy behaviour. Add regression tests pinning the scoped fallback (cross-owner, shared-only, no-visible-row, disabled-owned, and the legacy null-owner no-op).	2026-06-02 20:31:35 +09:00
tanmayraut45	b5747e3979	Sessions: ignore list keydown while typing The list keyboard handler (_onSessionListKeydown) treats Backspace and Delete as "delete the focused session". When the user double-clicks a chat to rename it, an <input class="session-rename-input"> is mounted inside the .list-item row. Backspace on the input bubbles up to the list container, the handler walks closest('.list-item[data-session-id]') from e.target, finds the parent row and DELETEs the session via the API — so a single typo correction nukes the whole conversation. Bail out at the top of the handler when e.target is an INPUT, TEXTAREA, or contentEditable element. Arrow / Enter / Delete navigation still works for rows themselves (the row is the focused element then, not the input). Mirrors the guard pattern already used in ui.js, notes.js, tasks.js, calendar.js, emailLibrary.js and galleryEditor.js. Closes #1007.	2026-06-02 20:30:16 +09:00
Refuse	323f027865	Security: sanitize export and gallery filenames Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>	2026-06-02 20:29:56 +09:00
Refuse	4218bfe71e	Tools: restrict app_api and serve_preset to admins Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>	2026-06-02 20:29:47 +09:00
Lohinth	12ba535c7d	Companion: fix pairing admin guard import Co-authored-by: Lohinth <lohinth25@proton.me>	2026-06-02 20:29:37 +09:00
mechramc	493c815371	Chat: scope active document fallbacks by owner	2026-06-02 20:29:27 +09:00
Tatlatat	cd247ed107	Skills: delete owner-scoped skills with owner The DELETE /api/skills/{skill_id} handler resolves the caller, loads the skill with skills_manager.load(owner=user), and verifies ownership with _verify_owner(match, user) — but then calls skills_manager.delete_skill(match.get("name")) without the owner. SkillsManager.delete_skill filters candidates with `(sk.owner or "") != (owner or "")`, so when owner is None an owner-scoped skill is skipped and the method returns False. The route then raises a spurious 404 "Skill not found" — meaning a logged-in user can never delete their own skills through the API. Pass the resolved owner through to delete_skill so the skill is matched and removed. tests/test_skills_delete_owner.py drops a real owner-scoped SKILL.md on disk and (1) checks the manager directly: delete_skill without owner returns False (regression lock) while delete_skill(owner="alice") returns True and removes the dir; (2) drives the real DELETE route handler and asserts it returns {"ok": True} and deletes the file. The route test fails before this change (404). Real SkillsManager + real filesystem, no mocking.	2026-06-02 20:28:36 +09:00
Tatlatat	9389cabed0	API keys: skip undecryptable entries on load APIKeyManager.load() decrypts every stored key with a dict comprehension and no error handling. If the .key file no longer matches the ciphertext in api_keys.json — key rotated, a partial/mismatched data restore, or a corrupted .key — Fernet.decrypt raises cryptography.fernet.InvalidToken. app_initializer.py calls api_key_manager.load() during startup, so a single undecryptable entry takes down the whole app at boot, and the user can't reach the UI to fix it. Decrypt each key in a loop and, on InvalidToken/ValueError, log a warning and skip that one entry while still returning every key that decrypts cleanly. One bad/stale key no longer blocks startup. tests/test_api_key_manager_resilience.py saves a valid key, then injects an entry encrypted under a different Fernet key (InvalidToken) and a malformed token (ValueError), and asserts load() returns the good key and skips the bad ones without raising. Fails before this change.	2026-06-02 20:28:26 +09:00
Tatlatat	da3876c168	Webhook: block IPv6 SSRF bypasses The webhook URL guard's _ip_is_private() only checks a hardcoded _PRIVATE_NETWORKS list, which misses several addresses that route internally. validate_webhook_url() therefore ALLOWED: - http://[::]/ (IPv6 unspecified, reaches localhost) - http://[::ffff:127.0.0.1]/ (IPv4-mapped IPv6 loopback = 127.0.0.1) - http://[::ffff:169.254.169.254]/ (IPv4-mapped cloud metadata endpoint) The last one is the dangerous case: a webhook pointed at the mapped 169.254.169.254 can pull cloud instance credentials (SSRF -> credential theft). Harden _ip_is_private(): first unwrap IPv4-mapped IPv6 to its embedded IPv4 (addr.ipv4_mapped), then reject via the stdlib address properties (is_private, is_loopback, is_link_local, is_reserved, is_multicast, is_unspecified) in addition to the existing network list. Public addresses still pass. tests/test_webhook_ssrf_resilience.py asserts validate_webhook_url raises for the three IPv6 bypasses plus 127.0.0.1 and 0.0.0.0, and still accepts a public IP literal. The IPv6 cases fail before this change.	2026-06-02 20:28:12 +09:00
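The hardened check can be approximated with the standard `ipaddress` module alone. This sketch leaves out the project's existing `_PRIVATE_NETWORKS` list and uses a hypothetical function name; the unwrapping step and the property list follow the commit message.

```python
import ipaddress


def ip_is_internal(host: str) -> bool:
    """Return True for addresses a webhook should never be allowed to reach."""
    addr = ipaddress.ip_address(host)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) to the embedded IPv4 address.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return any((
        addr.is_private, addr.is_loopback, addr.is_link_local,
        addr.is_reserved, addr.is_multicast, addr.is_unspecified,
    ))


blocked = [ip_is_internal(h) for h in
           ("::", "::ffff:127.0.0.1", "::ffff:169.254.169.254")]
public_ok = not ip_is_internal("8.8.8.8")
```

All three IPv6 bypass literals from the commit are rejected, while a public IPv4 literal still passes.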
ghreprimand	431b98525b	Email: persist bulk read state to provider Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 20:28:01 +09:00
tanmayraut45	6c654fb0ef	Models: detect bare Ollama URLs as online _ping_endpoint() is the reachability fallback the model-endpoint POST handler invokes when _probe_endpoint() returns no model ids. It GETs base + "/models" and, on any sub-500 response, returns immediately with `reachable = (status < 400)`. That early return runs before the Ollama-native /api/version / /api/tags fallback below it. For an Ollama URL without /v1 (the quickstart accepts both http://localhost:11434 and http://127.0.0.1:11434, and the reporter on #1025 explicitly tried both), the OpenAI-style probe target is http://127.0.0.1:11434/models. Ollama returns 404 there because /models only lives under /v1. _ping_endpoint then returned reachable=False and the picker showed "Added (offline — will retry on next load)" on an install that was running fine. /api/version was never tried. Same shape for http://127.0.0.1:11434/api (the native Ollama root): /api/models is also 404, same premature offline verdict. _probe_endpoint() does fall through to /api/tags on a 4xx (the response raises via raise_for_status), so the endpoint quietly recovers once cached_models becomes non-empty on the next background refresh — matching the second commenter's "had to disconnect manually then reconnect for it to be detected" note. The bug is most visible while no models are pulled yet (cached_models stays empty, _ping_endpoint keeps voting offline). Fix: - Hoist the Ollama-shaped-URL test (port == 11434 or "ollama" in hostname — the same condition _probe_endpoint already uses) to the top of the function so both code paths share it. - Stop short-circuiting on 4xx when the URL looks like Ollama: fall through to the existing /api/version + /api/tags reachability loop so an alive Ollama gets recognised even when its OpenAI surface has the wrong prefix for the user's input. 
- Fix the `root` computation in that loop to strip a trailing /api as well as /v1, so http://127.0.0.1:11434/api no longer gets probed at /api/api/version. - 4xx on non-Ollama hosts keeps the current semantics: a 401 from api.openai.com/v1/models is still a definitive offline verdict, not a reason to GET /api/version on OpenAI. Closes #1025.	2026-06-02 20:27:41 +09:00
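The URL handling in the fix can be sketched without any network calls. Both function names are hypothetical; the detection condition (port 11434 or "ollama" in the hostname) and the `/v1` and `/api` suffix stripping come from the commit message.

```python
from urllib.parse import urlsplit


def looks_like_ollama(url: str) -> bool:
    """Same heuristic the commit describes: default port or hostname hint."""
    parts = urlsplit(url)
    return parts.port == 11434 or "ollama" in (parts.hostname or "")


def ollama_root(url: str) -> str:
    """Strip one trailing '/v1' or '/api' so native paths can be appended."""
    root = url.rstrip("/")
    for suffix in ("/v1", "/api"):
        if root.endswith(suffix):
            return root[: -len(suffix)]
    return root


version_url = ollama_root("http://127.0.0.1:11434/api") + "/api/version"
```

With the `/api` suffix stripped first, `version_url` no longer contains the doubled `/api/api/` segment.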
Ernest Hysa	a8a34bd22a	Ollama: pass discovered num_ctx in chat requests _build_ollama_payload sends options.temperature and options.num_predict to /api/chat, but never options.num_ctx. Ollama defaults num_ctx to 2048 when the option is omitted, so prompts going to any Ollama backend are silently truncated there regardless of the model's actual capability. Thread the discovered context length through the three call sites (llm_call, llm_call_async, stream_llm) and emit options.num_ctx when it is known and positive. The builder filters out the DEFAULT_CONTEXT fallback (128000) so we don't lie to Ollama about models whose window we couldn't actually discover. The issue's literal 'when > 2048' heuristic is dropped: a model with a real context smaller than 2048 would OOM if Ollama used its default, so we pass the real value regardless of size. Matches how src/context_compactor.py uses the same helper. Sister fix to PR #753 — that PR teaches the compactor the right budget, this one tells Ollama to actually use that budget on the way in.	2026-06-02 20:27:24 +09:00
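A simplified view of the options builder, assuming the discovered context length arrives as an optional integer. The function name and signature are hypothetical; the three option keys and the 128000 `DEFAULT_CONTEXT` fallback are taken from the commit message.

```python
from typing import Optional

DEFAULT_CONTEXT = 128_000  # fallback value used when discovery failed


def build_ollama_options(temperature: float, max_tokens: int,
                         context_length: Optional[int] = None) -> dict:
    """Build the `options` object for an Ollama /api/chat request."""
    options = {"temperature": temperature, "num_predict": max_tokens}
    # Only send num_ctx when it was actually discovered: positive and not
    # the fallback constant. Small real values are passed through as-is.
    if context_length and context_length > 0 and context_length != DEFAULT_CONTEXT:
        options["num_ctx"] = context_length
    return options


known = build_ollama_options(0.7, 512, context_length=32_768)
unknown = build_ollama_options(0.7, 512, context_length=DEFAULT_CONTEXT)
```

`known` carries `num_ctx`, while `unknown` omits it so Ollama falls back to its own default.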
Alexandre Teixeira	f6b0dcbe58	Tests: companion model JSON resilience	2026-06-02 13:15:22 +09:00
mechramc	9d0a18a5b5	Email: add explicit SMTP security mode	2026-06-02 13:15:06 +09:00
Wes Huber	ccc0b9ab0c	Setup: prompt for first-run admin credentials * feat(setup): prompt for admin credentials interactively on first run When setup.py runs in a terminal (TTY) without env vars set, it now asks the user to choose a username and password instead of generating a random one that scrolls off-screen. Includes confirmation prompt to catch typos. Existing behavior is preserved: - ODYSSEUS_ADMIN_USER + ODYSSEUS_ADMIN_PASSWORD env vars take priority - Non-interactive contexts (Docker, CI) still get a random password - ODYSSEUS_SKIP_ADMIN_PROMPT=1 opts out of the interactive prompt - Re-runs still skip if auth.json already exists Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(macos): use venv Python for pip install and uvicorn launch On PEP 668 systems (newer Homebrew Python), pip install outside a venv is rejected. The script creates a venv but then called the system $PY for pip and uvicorn. Switch to ./venv/bin/python for both. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Revert "fix(macos): use venv Python for pip install and uvicorn launch" This reverts commit 7a1be956659d86183da2edcde2114eb363efd3e4. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-02 13:14:37 +09:00
danielxb	5268a546bc	Model picker: group models by provider Rebased on current main. Integrates with the new Recent/Favorites system — provider groups appear below Recent and Favorites in browse mode for large catalogs (>12 models). Changes: - Models grouped by canonical provider with collapsible sections - Chevron animation consistent with sidebar sections - Domino cascade on expand (only on just-opened group) - Provider display names (deepseek-ai -> DeepSeek, meta -> Llama, etc.) - Alias merging (meta + meta-llama -> one Llama group) - Search includes provider display names for filtering - Collapsed state persists in localStorage - No screenshot binary committed Co-authored-by: danielxb <5981902+danielxb@users.noreply.github.com>	2026-06-02 13:14:22 +09:00
spooky	cd4f496cb4	Fix native Cookbook quant classification	2026-06-02 13:07:20 +09:00
MohammadYusif	65b5d65059	fix(agent): extract web search sources from output key tool_execution.py returns web search results as {"output": ..., "exit_code": 0}. The sources-extraction block in stream_agent_loop only checked result.get("results") and result.get("stdout"), so _src_text was always "" for every tool-call-mode web search. Two consequences: 1. The SOURCES marker was never parsed and the web_sources SSE event was never emitted -- the sources panel never appeared after agent-mode searches. 2. The marker (a large JSON blob) was left in result["output"] and forwarded verbatim to the LLM in round 2 via format_tool_result, confusing some local models into producing no tokens. Fix: prepend result.get("output") to the lookup chain, and update the cleanup assignment so result["output"] is overwritten with the stripped text. Adds six regression tests in tests/test_agent_loop.py documenting the before/after behaviour and verifying backward compat with the legacy results/stdout paths. Co-authored-by: MohammadYusif <MohammadYusif@users.noreply.github.com>	2026-06-02 13:06:09 +09:00
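The lookup chain amounts to checking `output` before the two legacy keys. A minimal sketch, assuming each key holds a string when present; the function name is hypothetical and the real block in `stream_agent_loop` also strips the SOURCES marker, which is not shown here.

```python
def source_text(result: dict) -> str:
    """Pick the first non-empty text field, preferring the `output` key."""
    return result.get("output") or result.get("results") or result.get("stdout") or ""


tool_call_mode = source_text({"output": "search text", "exit_code": 0})
legacy = source_text({"stdout": "older shape"})
```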
Stephen Yue	d46c406bd8	Fix Cookbook fit column sorting The Fit column shared the Score column's sort key, so clicking the Fit header sorted by Score instead of by hardware fit. There was also no fit option in the hidden sort <select> and no fit branch in the client-side comparator. - Give the Fit column its own sort key (fit). - Add a fit option to the sort select (kept Score as the default so first-load ordering is unchanged). - Sort by the categorical fit_level rank (perfect > good > marginal > too_tight), tie-broken by score, honoring the ascending/descending toggle. Fixes #842 Co-authored-by: SabixMaru <285860855+SabixMaru@users.noreply.github.com>	2026-06-02 13:05:53 +09:00
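The actual comparator is client-side, but the ordering rule can be illustrated in Python. The rank order comes from the commit message; the function name, the record shape and the sample rows are assumptions.

```python
FIT_RANK = {"too_tight": 0, "marginal": 1, "good": 2, "perfect": 3}


def sort_by_fit(models: list, descending: bool = True) -> list:
    """Order by categorical fit level, breaking ties by score."""
    return sorted(
        models,
        key=lambda m: (FIT_RANK.get(m["fit_level"], -1), m["score"]),
        reverse=descending,
    )


rows = [
    {"name": "a", "fit_level": "good", "score": 90},
    {"name": "b", "fit_level": "perfect", "score": 40},
    {"name": "c", "fit_level": "good", "score": 95},
    {"name": "d", "fit_level": "too_tight", "score": 99},
]
order = [m["name"] for m in sort_by_fit(rows)]   # ['b', 'c', 'a', 'd']
```

A high score no longer lifts a `too_tight` model above one that fits, which is the behaviour the Fit header was expected to have.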
Alexandre Teixeira	e129378014	Clarify private deployment hardening docs Document safer defaults and deployment guidance for network-accessible Odysseus installs. The guidance emphasizes keeping auth enabled, disabling localhost bypass outside development, using secure cookies for HTTPS/reverse-proxy deployments, and exposing only the authenticated Odysseus entrypoint through a trusted proxy or private access layer. Also clarify that bundled services, databases, vector stores, notification services, and raw model/provider APIs should remain internal-only. This is documentation and config-example only. It does not change runtime behavior.	2026-06-02 13:01:12 +09:00
Juan Pablo Jiménez	eda99360d1	Fix Cookbook dependency install completion state * Fix Cookbook dependency install completion state Mark Cookbook dependency installs as complete when the background runner exits successfully, even when HuggingFace-specific download markers are absent. * Add focused regression coverage for cookbook dependency completion. Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.	2026-06-02 12:59:29 +09:00
Tatlatat	acfdcf346c	fix(agent): map native google_search and surface empty rounds Models (notably Gemini) emit a native 'google_search' function call, but the agent loop had no mapping for it, so the call failed to convert, the round produced 0 chars and 0 tool blocks, and generation died silently — the web client hung on 'waiting for first token' with no error (also #443). - Map google_search / google_search_retrieval / google_search_grounding to the web_search tool, and read Gemini's 'queries' array (falling back to 'query'). - In stream_agent_loop, when a round yields no response text and no tool events, emit a visible fallback message instead of leaving the user hanging. - Give the unknown-tool execution branch an explicit exit_code=1 so the failure is logged as an error rather than 'n/a'. Unknown/unconvertible tool names still return None (unchanged) so they are dropped safely rather than executed. Added tests covering the google_search mapping, the queries array, and unknown/invalid-JSON returning None.	2026-06-02 12:57:45 +09:00
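A sketch of the name mapping and argument handling, under stated assumptions: the function name and return shape are hypothetical, and taking the first element of `queries` is a guess at how the array is consumed. The three alias names, the `queries` / `query` fallback and the `None` result for unknown or invalid input follow the commit message.

```python
import json
from typing import Optional

NATIVE_SEARCH_ALIASES = {
    "google_search", "google_search_retrieval", "google_search_grounding",
}


def convert_native_call(name: str, raw_args: str) -> Optional[dict]:
    """Map a native search call onto web_search, or return None to drop it."""
    if name not in NATIVE_SEARCH_ALIASES:
        return None
    try:
        args = json.loads(raw_args)
    except (TypeError, ValueError):   # JSONDecodeError is a ValueError
        return None
    if not isinstance(args, dict):
        return None
    queries = args.get("queries")
    if isinstance(queries, list) and queries:
        query = str(queries[0])
    else:
        query = args.get("query")
    return {"tool": "web_search", "query": query} if query else None


mapped = convert_native_call("google_search", '{"queries": ["odysseus docs"]}')
dropped = convert_native_call("google_search", "{not json")
```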
Alexandre Teixeira	5607db85d4	tests: cover companion models route filtering	2026-06-02 12:57:32 +09:00
Boody	97528be0f4	Add custom web search result count * fixed confusing credentials prompt * fix(setup): return status from create_default_admin function * fix(setup): initialize admin creation status in main function * fix(setup): enhance admin creation feedback and status handling * Enhance admin user login messages with conditional feedback based on creation status * Refine admin user creation feedback messages for clarity and actionability and formatted code * Add fallback error message for admin creation failure in setup script * Add run script for Uvicorn with dotenv integration * Refactor server runner to use argparse for host and port configuration * Remove captured output print statement from server runner * Fix server runner to ensure cross-platform compatibility and improve log handling * Remove run.py script to match main repo * feat: add custom option for search result count in settings * fix: enforce minimum and maximum values for custom search result count	2026-06-02 12:55:15 +09:00
Sheikh Rahat Mahmud	e2ba068cbc	Add provider endpoint resolver tests The existing test_endpoint_resolver.py copies the pure functions to avoid import side effects, so its assertions can silently drift from the shipped src/endpoint_resolver.py (the copies already lag: no OpenRouter headers, no anthropic.com host matching). This adds a sibling module that imports the REAL resolver and locks in behavior for every provider named in ROADMAP.md's "Provider setup/probing audit" — Anthropic, Gemini, Groq, xAI, OpenRouter, OpenAI, DeepSeek — plus Ollama (local + cloud) and the Tailscale self-host fallback in resolve_url. Covers build_chat_url, build_models_url, build_headers, normalize_base, _first_chat_model, _anthropic_api_root, _ollama_api_root, and resolve_url. conftest.py already stubs the heavy deps, so the import is side-effect free. Test-only; no behavior change. 55 new tests, all passing. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 12:53:50 +09:00
ooovenenoso	1a7b90623c	Prefer Python 3.11+ in Windows launcher	2026-06-02 12:50:58 +09:00
spooky	0f3280ee05	Expose advanced llama.cpp serve controls	2026-06-02 12:46:16 +09:00
Mahdi Salmanzade	05fb48e9d5	Add admin-only companion pairing Split 3/4 of the companion bridge (#863, #871 landed 1/4 and 2/4). Adds admin-only device pairing to the companion router. - GET /api/companion/pair -- renders a form; never mints (a GET must not mint a credential: SameSite=Lax session cookies ride top-level GET navigations, so GET-minting would be CSRF-triggerable via a link/<img>) - POST /api/companion/pair -- mints a one-time chat-scoped token. Admin-cookie only; CSRF-safe because a SameSite=Lax cookie is not sent on a cross-site POST, the same protection POST /api/tokens relies on. ?format=json returns the pairing payload for an in-app screen. Minting invalidates the auth middleware's token cache so the code works on the next request with no restart. companion/pairing.py holds the mint/LAN/QR helpers; the token is shown once and stored only as a bcrypt hash + prefix (mirrors routes/api_token_routes.py). Tests (tests/test_companion_pairing.py): - a bearer/'api' caller and a non-admin user are rejected by require_admin (403); an admin passes - the token is returned once and persisted only as a hash - minting invalidates the cache (works without restart) - minting is exposed on POST, never GET (CSRF)	2026-06-02 12:43:50 +09:00
Zeus-Deus	19a4f823a4	Rename Character copy to Persona Issue #234: the "Character" tab and its "Style of response" label made it unclear that this is where a system prompt is set. Rename the user-facing labels for clarity: - "Character" tab + section heading -> "Persona" - "Style of response" -> "System prompt" - supporting strings: select placeholder, name placeholder, button/title text, toasts, confirm/notice text, the chat-bar indicator tooltip, the settings visibility toggle, and the assistant personality picker ("Characters" optgroup -> "Personas"). Used "Persona" rather than the issue's suggested "Preset" because the app already has a distinct, user-facing "Presets" concept (built-in presets like Code Analyze/Brainstorm/Reason, shown as their own group in the assistant picker). "Persona" matches what this tab actually creates -- a named persona with its own memories -- without colliding with that term. Internal identifiers (element IDs, data-chartab attributes, function names) and the character_name backend field are intentionally left unchanged so existing saved presets and JS wiring keep working.	2026-06-02 12:42:15 +09:00
Collin	c90a7a19a5	Add dialog accessibility semantics Screen readers got no signal that a dialog opened — not one modal carried role="dialog" — and several close buttons had no accessible name. - The 6 static tool windows (Brain, Theme, Prompt, Rename session, Cookbook, Settings) now carry role="dialog" + an accessible name. They are dockable, tiling windows, so they are non-modal dialogs (intentionally no aria-modal). - The four unlabelled close buttons (theme, prompt, cookbook, settings) get an aria-label so they no longer read as just "heavy multiplication x". - styledConfirm / styledPrompt ARE blocking modals: they get role="dialog" + aria-modal="true" + aria-labelledby/aria-describedby, and now manage focus — restore focus to the triggering element on close and trap Tab within the dialog (they already moved focus in on open). tests/test_dialog_aria.py pins the roles, labels, and focus management.	2026-06-02 12:41:25 +09:00
ghreprimand	77611f0491	Scope memory consolidation by owner group Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 12:40:28 +09:00
Mihail Filippov	3d109cbaca	Add explicit open-signup state endpoint * Refactor open registration state switching * Rename endpoint to open-signup	2026-06-02 12:35:54 +09:00
Leo	6fca7e86b7	Cookbook serve profiles and engine filter * Cookbook: Engine filter + intelligent hardware-computed serve profiles Two related Cookbook serving improvements for accurate, hardware-aware model serving (especially on consumer GPUs that can only run GGUF/llama.cpp). Engine filter - New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant picker. Pure client-side view filter over the fetched list via the same _detectBackend() the serve commands use, so what you filter to is exactly what would launch. Re-renders from cache (no refetch). Empty-state message + the instant-cache-paint path account for it too. Intelligent serve profiles (Quality / Balanced / Speed) - services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM + model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type, context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU instead of failing; a model that fits stays fully on GPU; quant tracks profile intent; vision models keep image-encoder headroom. Reuses models.py VRAM math so filtering and serving agree on what fits. Pure/deterministic (no t/s claims — partial-offload speed isn't reliably predictable; fit is what's computed). - /api/hwfit/profiles endpoint returns the profiles + the model's trained context limit, with loose name matching (strips org/ prefix, -GGUF suffix, quant tag) so a local GGUF folder name resolves to its catalog entry. - _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn / --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It previously only set -ngl/-c, which is why it OOM'd or ran slow. - Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV Cache / Flash Attn fields. Context is clamped to the model's trained limit (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch — fixes a crash where a stale 256k/16M preset + quantized KV cache caused an amdgpu ErrorDeviceLost. 
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed VRAM, context cap, launchable flags, vision headroom, no-GPU empty. Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k, matching hand-tuning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook: make column-header sorting discoverable (incl. Newest) Sorting in Cookbook is via clickable column headers (pewds' design), but the headers had no visual cue that they're interactive — so sorting in general, and the Newest sort on the Model header specifically, was undiscoverable. - Style sortable headers as interactive: pointer cursor, hover underline, and the active sort column bolded/highlighted. There was no CSS for .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort, not just Newest. - The Model column header sorts by release_date (newest first), reusing the existing header-click sort wiring and the "newest" SORT_KEY. No new sort control — uses the existing column-header paradigm. Checks: node --check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2) In the Serve tab the model is a specific GGUF file already on disk, so its quant can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K" as if you could re-quantize it. That's meaningless when serving a fixed file. - compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE mode), the quant is locked to the file's and profiles differ only in the real serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode (no override) still varies the quant to show download options. 
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant. - The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from the repo/file name) and passes them, so profiles match what's actually served. Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k ncm15) — no nonsensical quant changes. Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor Two serve-panel additions: 1. Vision toggle. A "Vision" checkbox that serves the model with its multimodal projector so it can read images. The mmproj path is resolved at runtime (find mmproj-.gguf next to the model), so dropping an mmproj file in the model folder makes the toggle just work; `--mmproj … --image-max-tokens 1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found. 2. Live GPU-memory monitor. A readout that polls /api/cookbook/gpus every 4s while the panel is open and shows VRAM used/total/%, free, and — crucially on a discrete card — RAM spillover (AMD gtt_used_mb), with a plain-language health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint (previously read for total only and discarded for 'used'). Lets you see at a glance whether a config fits VRAM (fast) or is paging to system RAM over PCIe (slow) instead of guessing. Checks: node --check + py_compile pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:34:42 +09:00
spooky	8b3c0d8ad4	feat: select cached gguf artifacts for serve (#891)	2026-06-02 12:32:40 +09:00
Alexandre Teixeira	8455b88643	Improve Docker GPU setup diagnostics (#705) * Improve Docker GPU setup diagnostics Add a Docker GPU preflight script for NVIDIA users. The script is read-only by default, checks host NVIDIA drivers, Docker availability, and container GPU passthrough, and prints actionable next steps. Add explicit opt-in modes to print install commands, install NVIDIA Container Toolkit on Ubuntu/Debian, and enable the NVIDIA Compose overlay in .env after passthrough is verified. Document common NVIDIA Docker failure modes, ignore generated .env backups, and clarify that Cookbook can only detect GPUs exposed to the Odysseus container. * Clarify Docker GPU diagnostic limits	2026-06-02 12:30:40 +09:00
Sirsyorrz	517aa593e0	Cookbook: clearer tooltips on saved-config badge and GPU chip (#850) Two small polish items in the Cookbook Serve panel. Saved-config badge The little count badge next to the Save button ("3 ▾" etc.) had a generic "Saved launch configs" tooltip, so the number reads like a notification dot. Make it spell out what it is and what clicking does: "3 saved launch configs for <model> — click ▾ to load or delete" (and "No saved launch configs for <model> yet — click Save to add one" when empty). Tooltip stays in sync via _updateSavedToggleLabel so save/delete updates both the count and the hint. GPU chip on mixed-GPU boxes (#711) The chip label was `${gpuCount}x ${gpu_name}`, where gpu_name is just gpus[0].name — so a 4090 + 3060 reads as "2x RTX 4090". The backend already emits gpu_groups (identical cards grouped, used by the serve flow to pin CUDA_VISIBLE_DEVICES) and a per-card gpus[] array, so use them: - Label renders each homogeneous pool: "1× RTX 4090 + 1× RTX 3060". Homogeneous setups keep the existing "2× RTX 4090" form. - Tooltip lists each GPU with its index + VRAM, useful for picking the right device when launching. Refs #711.	2026-06-02 12:30:24 +09:00
Dustin	bd3204fe96	Diagnose vLLM device detection failure with actionable suggestion (#778) Adds a diagnosis pattern for the 'Failed to infer device type' error vLLM raises when no CUDA or ROCm GPU is found (e.g. systems with only integrated or Intel Xe graphics). The existing pattern only caught 'No CUDA GPUs are available' which fires later in startup; this new entry catches the earlier device-probe failure and the NVML/amdsmi library-not-found messages that precede it. Surfaces in the Cookbook serve card as: "vLLM could not find a supported GPU — switch to llama.cpp or Ollama" instead of a raw Python traceback. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-02 12:30:07 +09:00
IBR-41379	385c3c3cf3	fix: use sys.executable for Cookbook model cache scan on Windows (#627) Windows has 'App Execution Aliases' that can make shutil.which('python3') and shutil.which('python') resolve to a Microsoft Store stub instead of real Python -- even when Python is properly installed. The stub outputs: 'Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Apps > Advanced app settings > App execution aliases.' and exits 9009, producing empty stdout. The JSON parse of the local model cache scan then fails with 'Expecting value: line 1 column 1 (char 0)', and the Cookbook model list shows nothing. Fix: prefer sys.executable as the interpreter for the local scan. Odysseus already runs inside its own venv, so sys.executable always points to the real venv Python and bypasses PATH / Store alias lookup entirely. which_tool() is kept as a fallback. Cross-platform: sys.executable works identically on Linux and macOS (returns the real interpreter path), so this change is safe everywhere.	2026-06-02 12:29:40 +09:00
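
The interpreter choice described in commit 385c3c3cf3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the names `pick_scan_interpreter` and `parse_scan_output` are hypothetical, and `shutil.which` stands in for the `which_tool()` fallback, whose real signature is not shown in the log.

```python
import json
import shutil
import sys
from typing import Optional


def pick_scan_interpreter() -> Optional[str]:
    """Choose the Python used for the local model cache scan (hypothetical helper)."""
    # Prefer the interpreter running this process: it is the real venv
    # Python and needs no PATH lookup, so Windows Store aliases are bypassed.
    if sys.executable:
        return sys.executable
    # Fallback: PATH lookup (assumed stand-in for the project's which_tool()).
    for name in ("python3", "python"):
        found = shutil.which(name)
        if found:
            return found
    return None


def parse_scan_output(stdout: str) -> list:
    """Parse the scan's JSON output, turning empty output into a clearer error."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Empty stdout (e.g. from the Store stub) ends up here.
        raise RuntimeError(f"model cache scan produced no JSON: {exc}") from exc
```

Wrapping the parse step is an extra design choice in this sketch, not something the commit claims to do; it just makes the empty-stdout failure easier to recognise than a bare JSON error.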

328 Commits