Commit Graph

53 Commits

Author SHA1 Message Date
Afonso Coutinho
3d00c85636 fix: hwfit native quant labels miss the cost maps and over-estimate VRAM (#1690) 2026-06-03 14:22:42 +09:00
Stephen Purdue
85bc18b7d8 fix: fixed minor consistency issues within MemoryManager (#1353) 2026-06-03 14:12:24 +09:00
Shaw
d38fb4bc46 fix(tts): tolerate a malformed tts_speed instead of 500-ing (#1450)
synthesize() and get_stats() parsed the stored tts_speed with a bare
float(settings.get("tts_speed", "1")). The manage_settings agent tool maps
"speech speed"/"voice speed" to tts_speed and, because the setting's default is
a string, writes the value through unvalidated — so an agent (or a hand-edited
settings.json) can store "fast" or "". After that, GET /api/tts/stats and POST
/api/tts/synthesize both 500 with ValueError until the JSON is corrected by hand.

Parse defensively via a _safe_speed() helper (non-numeric/empty/<=0 -> 1.0),
mirroring the settings layer's tolerance of corrupt config.

Adds tests/test_tts_speed_malformed.py (stats + synthesize) — both raise
ValueError before this change and pass after.
2026-06-03 14:12:03 +09:00
Afonso Coutinho
04f8aa1833 fix: _lookup_bandwidth crashes on a truthy non-string gpu_name (#1641) 2026-06-03 14:11:10 +09:00
red person
d7a6cadbe2 Skip invalid memory extractor rows (#1535) 2026-06-03 14:07:00 +09:00
red person
ee8c049f9e Skip invalid skill extractor rows (#1546) 2026-06-03 14:06:53 +09:00
Afonso Coutinho
f29c827e6e Merge search analytics defaults in services copy
Make services.search.analytics tolerate missing counters in older or partial analytics files by merging loaded data over defaults, with regression coverage.
2026-06-03 13:45:07 +09:00
Mubashir R
61d62a3cb8 Fix memory bullet extraction in service copy
Fix services.memory bullet-list extraction by grouping the bullet/number regex before the capture, and cover both memory manager copies in the regression test.
2026-06-03 13:41:46 +09:00
Afonso Coutinho
13f0171ce8 fix: extract_youtube_id crashes on a non-string url instead of returning None (#1689) 2026-06-03 13:38:11 +09:00
Rolly Calma
933c461f38 fix: use running loop for shell stream deadlines (#1694) 2026-06-03 13:37:46 +09:00
Afonso Coutinho
9dd9bb8a3f fix: memory recall crashes on a non-dict row from the vector store (#1705) 2026-06-03 13:35:09 +09:00
Afonso Coutinho
86d3af743a fix: docs RAG query crashes on a non-dict row from the index (#1706) 2026-06-03 13:35:01 +09:00
Mubashir R
535d05c142 fix: SearchService.search() calls comprehensive_web_search incorrectly (broken public API) (#1720)
SearchService.search() did:

    raw_results = await comprehensive_web_search(
        query, max_results=10 * depth, fetch_content=fetch_content)

comprehensive_web_search is a synchronous function whose count knob is
`max_pages` (not `max_results`) and which has no `fetch_content` parameter, so
the call raised TypeError on argument binding; `await` on its non-coroutine
return would also fail. It returns a context string, or a (context, sources)
tuple with return_sources=True — not the list of dicts the wrapper iterates.

The method is exported in services/search/__init__.py and services/__init__.py
with a usage example in its docstring, so any caller of the documented public
API hit an immediate crash. Call it correctly via asyncio.to_thread with
max_pages + return_sources=True and use the returned source list as the rows.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:33:56 +09:00
Afonso Coutinho
c5bc39de88 fix: _extract_entities crashes on a non-string query (#1724) 2026-06-03 13:30:28 +09:00
Afonso Coutinho
0c37943267 fix: search service crashes on a non-dict result row (#1725) 2026-06-03 13:30:19 +09:00
Afonso Coutinho
6e38d3f2ef fix: youtube (services) comment formatter crashes on a non-dict comment (#1746) 2026-06-03 13:29:01 +09:00
Afonso Coutinho
d2f6e8068d fix: is_youtube_url (services) crashes on a non-string url (#1753) 2026-06-03 13:24:24 +09:00
Ethan
33bf975597 Stop GET /api/search/config from leaking the Brave API key (#1661) (#1750)
get_search_config returned SEARCH_CONFIG.copy(), and update_search_config
cached the decrypted Brave key into that shared global at startup
(app_initializer), so the unauthenticated /api/search/config route exposed
the operator's key. The cache was dead weight: brave_search reads its key
via _get_provider_key (settings/env), never SEARCH_CONFIG.

- update_search_config: no longer stores the api_key in the shared global
  (accepted for backward compat; provider keys are read on demand).
- get_search_config: scrub any string-valued credential field before
  returning, preserving the has_api_key presence flag.

No schema change; brave_search/_get_provider_key untouched. Adds regression
tests.

Fixes #1661

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
2026-06-03 13:24:17 +09:00
pewdiepie-archdaemon
8e2b9baf19 Rebuild memory vector index from the full saved set, not just the audited owner (#1747)
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.

Caught by #1747.
2026-06-03 11:36:24 +09:00
red person
0ad5cd783b Skip invalid research service sources (#1583) 2026-06-03 08:57:09 +09:00
Afonso Coutinho
77313170c6 fix: search query helpers crash on a non-string query (#1604) 2026-06-03 08:36:01 +09:00
Shaw
b54468291e fix(hwfit): detect unified-memory NVIDIA (Grace Blackwell GB10 / DGX Spark) instead of 'No GPU' (#1340) (#1372)
_detect_nvidia parsed nvidia-smi --query-gpu=memory.total,name and did
float(memory.total) per row, dropping the row on ValueError. Grace Blackwell
GB10 (DGX Spark, sm_121) reports memory.total as '[N/A]'/'Not Supported'
because the GPU shares the system LPDDR pool rather than carrying discrete VRAM
— so the only GPU row was dropped and a real GB10 (even with vLLM running on it)
was reported as 'No GPU', breaking Cookbook recommendations and model switching.

Keep a named device whose memory.total is non-numeric: when there are no
discrete-VRAM rows but such unified devices exist, report a unified-memory CUDA
GPU backed by the system RAM pool (has_gpu, name, backend=cuda, count,
unified_memory=True) — mirroring how Apple Silicon and AMD APUs are already
handled. Discrete GPUs are unchanged, and a box with a real discrete GPU keeps
the discrete path.

Adds tests/test_hwfit_unified_nvidia.py with a GB10 nvidia-smi fixture: the
device is detected (not dropped), surfaces through detect_system with
unified_memory propagated, discrete GPUs stay non-unified, and a discrete GPU
takes precedence over an N/A-memory row.

Co-authored-by: NubsCarson <nubs@nubs.site>
2026-06-03 03:19:39 +09:00
Vykos
5ee30cc144 Scope skills usage by owner (#1312) 2026-06-03 02:27:43 +09:00
Afonso Coutinho
f62d6ea3d7 fix: research query misclassifies 'whatsapp'/'however' as questions (#1247)
* fix: detect question words as whole words, not prefixes

* fix: same question-word prefix bug in the services search copy

* test: question-word detection rejects prefix lookalikes
2026-06-03 01:10:06 +09:00
red person
cc6e43da44 Report provider-specific search API keys correctly (#1202)
* fix(search): report provider-specific API keys

* fix(search): include provider env keys in status
2026-06-02 23:37:15 +09:00
pewdiepie-archdaemon
ff93a6c63b Polish email and cookbook flows 2026-06-02 22:42:07 +09:00
Afonso Coutinho
2e2da2aefe fix: extract_statistics drops large numbers and trailing % signs (#1153)
* fix: extract_statistics misses comma-less numbers and drops trailing %

* fix: same extract_statistics number/percent bug in services copy

* test: extract_statistics captures full numbers and percent signs
2026-06-02 22:35:30 +09:00
Afonso Coutinho
2b2943a7b7 fix: extract_quotes accepts mismatched opening/closing quotes (#1113)
* fix: only extract quotes whose closing quote matches the opening one

* fix: same mismatched-quote bug in the services search copy

* test: extract_quotes requires matching open/close quotes
2026-06-02 22:34:52 +09:00
Leo
de92bbe47a Cookbook fit: steer consumer AMD to GGUF recommendations
* Cookbook fit: consumer-AMD GGUF recommendations + accurate estimates (core logic)

Split of #746 — the estimate/ranking MATH only, so it can be reviewed with tests
first (UI changes follow separately). Backend files only: no static/js here.

services/hwfit/fit.py, services/hwfit/hardware.py:
- Recommend GGUF/llama.cpp on consumer AMD (RDNA, gfx10/11/12) instead of
  formats that don't run on consumer Radeon — vLLM-only AWQ/GPTQ/FP8 AND
  vendor-specific NVFP4 (NVIDIA) / MLX (Apple). Datacenter Instinct (CDNA) and
  CUDA are left untouched.
- More accurate speed estimates across more GPUs (adds RDNA bandwidth data).
- Detect AMD/RDNA GPUs (gpu_family from rocminfo) so fit/serve can branch on it.

tests/test_hwfit_amd.py: AMD recommendation path, quant/bit matching, estimate
realism, gfx RDNA-vs-CDNA classification.

Rebased onto current main (analyze_model gained a scoring_use_case param there;
kept it). Vision detection intentionally NOT added here — main already ships a
"Vision" type filter + multimodal use-case handling; duplicating it was dropped.

Checks: py_compile clean; pytest tests/test_hwfit_amd.py + hwfit/serve suites
= 28 passed; full suite 0 new failures vs main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: assert NVFP4/MLX/FP8 formats are filtered on consumer RDNA

Backs the #972 claim with an explicit regression: no NVIDIA NVFP4, Apple MLX,
or vLLM-only FP8/AWQ/GPTQ repos are recommended on a consumer Radeon, and guards
against vacuity by asserting such repos exist in the catalog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:01:42 +09:00
Tushar-Projects
c3228f8b59 Background tasks: respect active session model fallback 2026-06-02 20:57:42 +09:00
ghreprimand
aa0a9e8b5a Search: align service content extraction
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:53:07 +09:00
ghreprimand
eddb9ce6db Search: align service provider guards
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:52:13 +09:00
Tatlatat
51cf63009e TTS: include mp3 files in cache stats
TTSService._put_cache writes .mp3 for MP3 audio (ID3/MPEG-framed bytes) and
.wav otherwise, and the rest of the class treats both as cache entries
(_get_cache iterates (".mp3", ".wav"); eviction globs "*.*"). But
get_stats() enumerated the cache with `glob("*.wav")` only, so both
cache_entries and cache_size_mb undercounted — reporting 0 whenever the
cache held MP3 files, which is the common case for most TTS providers.

Glob both extensions so the reported stats match what's actually cached.

tests/test_tts_cache_stats.py writes an MP3-headed blob via _put_cache and
asserts get_stats() reports one entry with non-zero size. Fails before this
change.
2026-06-02 20:43:29 +09:00
Tatlatat
3885f9fa90 STT: clean temp audio files on transcription failure
STTService._transcribe_local writes the audio to a NamedTemporaryFile
(delete=False) and only unlinks it on the success path, before the except.
If model.transcribe() raises (corrupt audio, model/runtime error, etc.) the
function logs, returns None, and leaves the .webm temp file behind — so
every failed local transcription leaks a file in the system temp dir.

Initialize tmp_path = None up front and move the unlink into a finally
block so the temp file is cleaned up whether transcription succeeds or
raises.

tests/test_stt_leak.py stubs the whisper model to raise during transcribe,
runs _transcribe_local, and asserts it returns None and leaves no new .webm
file in the temp dir. Fails before this change.
2026-06-02 20:43:24 +09:00
ghidras
6ea8fec896 Cookbook: fix Windows NVIDIA VRAM detection
Co-authored-by: ghidras <ghidras@users.noreply.github.com>
2026-06-02 20:32:53 +09:00
spooky
cd4f496cb4 Fix native Cookbook quant classification 2026-06-02 13:07:20 +09:00
Leo
6fca7e86b7 Cookbook serve profiles and engine filter
* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). Empty-state message + the
  instant-cache-paint path account for it too.

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
  — partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work; `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:34:42 +09:00
pewdiepie-archdaemon
966b53df77 Improve Cookbook serve diagnostics and recommendations 2026-06-02 12:15:47 +09:00
elijaheck
c303a29670 Fix native macOS tailnet launch and Metal GPU probe (#756)
* macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint

- hardware.py: add _detect_macos() via sysctl/system_profiler; report
  backend=metal + unified_memory on Apple Silicon instead of cpu_arm
- fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal
  FALLBACK_K so throughput estimates use the real bandwidth formula
- setup.py: Mac-specific 'brew install tmux' hint

Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now
visible (were hidden), cuda still hides MLX, no new test failures.

* Fix native macOS tailnet launch and Metal GPU probe

---------

Co-authored-by: Elijah (Hermes) <hermes@local>
2026-06-02 11:41:04 +09:00
David Anderson
610968f91e fix: data integrity — deep-research result parsing + memory-extraction durability (#808)
Two independent data-integrity bugs:

- services/research/service.py: ResearchService.research() (the public deep-research
  API, re-exported from services/__init__) treated the handler return value as a
  dict (result.get("sources"/"summary"/...)), but call_research_service() returns a
  formatted markdown STRING -> AttributeError: str has no attribute get on EVERY
  successful call, making the API unusable for any non-error result. Now uses the
  string report as the summary and parses sources from the "### Sources" markdown
  section (section-bounded, URL-deduped), with a defensive dict branch for back-compat.

- services/memory/memory_extractor.py: extract_and_store guarded the vector-store
  find_similar/add calls only with the .healthy flag set ONCE at init. If the
  embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint
  down), those calls raised, the exception escaped the dedup loop, skipped
  memory_manager.save(), and was swallowed by the outer try/except -> EVERY
  validated fact from the session was silently lost (the function docstring
  promises "never raised"). Now falls back to the existing text/fuzzy dedup so
  facts are still saved when the vector index is unavailable at runtime.

Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.
2026-06-02 11:27:31 +09:00
Ernest Hysa
f4aef0dcf7 fix(skills): scope skill reads to caller owner (#777)
read_skill_md and read_skill_reference walk all skill files via
_iter_skill_files and return the first match by slug, regardless
of owner. In a multi-user deployment where two users have skills
with the same slug under different categories, a caller scoped
to owner='alice' can read Bob's skill content.

This is the same cross-tenant leak class as the update_skill /
delete_skill fix (PR #755, merged), but on the read path.

Changes:
- read_skill_md / read_skill_reference accept owner= param (default
  None = match ownerless only, matching the write-path convention).
- 7 callers updated: tool_implementations.py (view, view_ref, patch),
  builtin_actions.py (test_skills), skills_routes.py (audit, source,
  test routes).
- Tests: read scoping (alice reads hers, not bob's), positive update
  scoping (alice can mutate her own), ownerless-match default.
2026-06-02 11:21:27 +09:00
Abeelha
290cd7f1cd fix(stt): make local microphone transcription work without torch (#801)
faster-whisper runs on CTranslate2, not torch, but _get_whisper()
imported torch (only to check cuda availability) inside the same try as
the faster-whisper import. on a torch-less machine that raised
ImportError and reported the misleading 'faster-whisper not installed'
even when it was installed, so local mic transcription silently failed.

probe torch separately and optionally: present -> cuda, absent -> cpu.
also declare faster-whisper in requirements-optional.txt (torch stays an
optional extra for gpu).
2026-06-02 11:16:54 +09:00
mist
5ebe9ee67a Fix invalidate_search_cache using a key that never matches stored entries (#852)
invalidate_search_cache(query) built its cache key as
generate_cache_key(f"{query}|10|None"), but the write path
(searxng_search_results) replaces the caller's default count of 10 with the
admin-configured _get_result_count() (default 5) before building the key.

So a default search for "X" is cached under "X|5|None", while invalidation
looked for "X|10|None" — they never match, and invalidate_search_cache
silently failed to remove anything in the default configuration, violating
its docstring ("invalidate ... just the given query").

Derive the count from _get_result_count() so invalidation matches the
default-search entry the write path actually stores. The same bug (and fix)
applies to both the src/search and services/search copies.

Note: time-filtered variants (e.g. "X|5|day") still aren't reachable from a
query-only signature, since cache keys are opaque SHA-256 hashes with no
stored query; clearing those would need a broader cache-index redesign and is
out of scope here.

Adds tests/test_search_cache_invalidation.py covering the default-count case.
2026-06-02 10:53:33 +09:00
ghreprimand
d44f40b724 Honor disabled speech service toggles (#814)
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 10:44:39 +09:00
BSG-Walter
c0466274ed fix: resolve DuckDuckGo redirect URLs in HTML fallback search
The DuckDuckGo HTML fallback returns redirect URLs (//duckduckgo.com/l/?uddg=...)
instead of actual page URLs. This caused fetch_webpage_content() to reject them
instantly because _public_http_url() requires an http/https scheme, making search
results unfetchable in deep research mode.
Added _resolve_url() to:
- Convert protocol-relative URLs to absolute (https:)
- Convert path-relative URLs to absolute
- Extract the real URL from DuckDuckGo's /l/?uddg= redirect parameters
2026-06-01 19:42:01 -03:00
Ernest Hysa
d42e6a7acc Scope skill mutations to caller owner
SkillsManager.update_skill walks every SKILL.md on disk and matches by
slug only; the 'owner' key in its scalar_keys whitelist meant a caller
could pass updates={'owner': 'attacker', 'description': 'pwned'} and the
first matching file on disk got silently re-owned. Two users with the
same slug under different category directories (which is supported by
the on-disk layout <category>/<name>/SKILL.md) could each stomp the
other's skill via the manage_skills tool or the in-process callers in
tool_implementations.py (edit, patch, publish, delete).

update_skill and delete_skill now require the caller's owner and only
match a file whose parsed owner field matches. The default of None
means 'no scope' and only matches ownerless skills, so an unsafe call
without an explicit owner is now a no-op. 'owner' is also removed from
scalar_keys so the updates dict cannot be used to reassign ownership
even when the manager is called from an in-process path that didn't
supply the owner argument.

The in-process callers in tool_implementations.py are updated to pass
owner=owner (which was already in scope at every call site) so the
HTTP and agent paths both go through the scoped check. The HTTP route
at routes/skills_routes.py:1499 was already owner-scoped via
sm.load(owner=user); the fix brings the in-process path up to the
same standard.
2026-06-02 05:59:43 +09:00
Afonso Coutinho
9b1acf6612 Fix year extraction in research queries
* fix: extract full year in research query entities, not just the century

* fix: same year capture-group bug in the services search copy

* test: research query extracts the full year
2026-06-01 23:09:41 +09:00
spooky
033852ab14 fix: require GGUF sources for llama downloads (#368) 2026-06-01 22:47:47 +09:00
Sirsyorrz
9955f5bc95 Fix VRAM estimates for pre-quantized HF repos
The Cookbook fit scanner was reporting impossibly low VRAM requirements
for some pre-quantized models — e.g. cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit
shown as 7.1 GB ('perfect' on a 12 GB card) when the real load is ~40 GB.

Root cause is in the catalog builder. When _entry_from_modelinfo falls
back to safetensors metadata for the parameter count, it stored
safetensors.total directly. For pre-quantized repos that figure reflects
*packed* element counts: AWQ/GPTQ-Int4 pack 8x 4-bit weights into one
I32, AWQ-8bit/GPTQ-Int8/FP8 pack 4x. The catalog therefore recorded
~1/8 of the real parameter count, and min_vram_gb = packed * bpp
double-applied the quantization.

Fix the safetensors fallback:

* prefer the per-dtype parameters dict when available and unpack only the
  I32/I64 entries (the F16/BF16 scale/zero tensors and embeddings are
  already at their real element counts)
* fall back to total * pack_factor when only total is exposed

Patch the catalog entries that were affected by the old fallback so the
fit ratings reflect reality without waiting for a full catalog rebuild:

* cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit  11.4B -> 79.7B (40.8 GB VRAM)
* stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ  4.6B -> 30.5B
* stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ  5.1B -> 30.5B
* warshanks/Qwen3-8B-abliterated-AWQ  2.2B -> 8.2B
* QuantTrio/sarvam-30b-AWQ  7B -> 30B
* QuantTrio/sarvam-105b-AWQ  19B -> 105B

Closes #377.
2026-06-01 18:32:58 +09:00
Fernando Lazzarin
14e8cffa41 Fail closed on untrusted teacher draft confidence
Follow-up to #275. get_relevant_skills() treats a missing/unparseable
confidence as 1.0, so it always clears the injection threshold. For
teacher-escalation drafts -- auto-written from a possibly untrusted trace
and then injected as authoritative guidance -- that means a draft can be
auto-injected regardless of the configured confidence bar.

Require teacher-escalation drafts to carry an explicit, parseable
confidence that meets min_confidence; fail closed otherwise. Hand-authored
legacy drafts keep the lenient "unset -> keep" behavior so they don't
silently vanish, and published skills are unaffected.

Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills
unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9
included; legacy + published unaffected; gate-off control unchanged).

Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:20:29 +09:00