Fail closed on untrusted teacher draft confidence

Follow-up to #275. get_relevant_skills() treats a missing/unparseable
confidence as 1.0, so it always clears the injection threshold. For
teacher-escalation drafts -- auto-written from a possibly untrusted trace
and then injected as authoritative guidance -- that means a draft can be
auto-injected regardless of the configured confidence bar.

Require teacher-escalation drafts to carry an explicit, parseable
confidence that meets min_confidence; fail closed otherwise. Hand-authored
legacy drafts keep the lenient "unset -> keep" behavior so they don't
silently vanish, and published skills are unaffected.

Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills
unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9
included; legacy + published unaffected; gate-off control unchanged).

Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Fernando Lazzarin
2026-06-01 03:20:29 -03:00
committed by GitHub
parent aad43a050b
commit 14e8cffa41

View File

@@ -577,6 +577,17 @@ class SkillsManager:
def _passes(s):
if s.get("status") == "published":
return True
# Teacher-escalation drafts are auto-written from a (possibly
# untrusted) trace and injected as authoritative guidance, so they
# must EARN injection with an explicit, parseable confidence that
# clears the bar — fail closed on a missing/garbage value instead
# of treating it as 1.0. Hand-authored legacy drafts keep the
# lenient "unset → keep" behavior so they don't silently vanish.
if s.get("source") == "teacher-escalation":
c = s.get("confidence")
if c is None:
return False
return _to_float(c, 0.0) >= min_confidence # unparseable → fail closed
c = s.get("confidence")
if c is None:
return True # unset → don't filter (legacy)