Fail closed on untrusted teacher draft confidence
Follow-up to #275.

get_relevant_skills() treats a missing/unparseable confidence as 1.0, so it always clears the injection threshold. For teacher-escalation drafts -- auto-written from a possibly untrusted trace and then injected as authoritative guidance -- that means a draft can be auto-injected regardless of the configured confidence bar.

Require teacher-escalation drafts to carry an explicit, parseable confidence that meets min_confidence; fail closed otherwise. Hand-authored legacy drafts keep the lenient "unset -> keep" behavior so they don't silently vanish, and published skills are unaffected.

Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9 included; legacy + published unaffected; gate-off control unchanged).

Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
committed by GitHub
parent aad43a050b
commit 14e8cffa41
@@ -577,6 +577,17 @@ class SkillsManager:
         def _passes(s):
             if s.get("status") == "published":
                 return True
+            # Teacher-escalation drafts are auto-written from a (possibly
+            # untrusted) trace and injected as authoritative guidance, so they
+            # must EARN injection with an explicit, parseable confidence that
+            # clears the bar — fail closed on a missing/garbage value instead
+            # of treating it as 1.0. Hand-authored legacy drafts keep the
+            # lenient "unset → keep" behavior so they don't silently vanish.
+            if s.get("source") == "teacher-escalation":
+                c = s.get("confidence")
+                if c is None:
+                    return False
+                return _to_float(c, 0.0) >= min_confidence  # unparseable → fail closed
             c = s.get("confidence")
             if c is None:
                 return True  # unset → don't filter (legacy)