Chat models often emit GitHub/Slack-style :shortcode: text (e.g. 😊,
🎤) instead of the actual emoji. The renderer only converted real
Unicode emoji to the monochrome line icons, so shortcodes rendered as literal
text.
Add a pure, browser-free shortcode->Unicode map (emojiShortcodes.js) and run it
inside svgifyEmoji ahead of the existing Unicode->SVG pass, skipping <code>/<pre>
so code stays literal. Covers ~430 common shortcodes plus common aliases
(+1/thumbsup, etc.).
Keep the conversion from touching anything it shouldn't:
* Scope it to chat. mdToHtml/svgifyEmoji take a { shortcodes } option (default
on); document and email body rendering (compose, export, preview) pass it as
false so author-typed :shortcode: text stays literal. The Unicode->SVG pass
still runs there exactly as before.
* Only convert a :shortcode: that stands on its own. A word-boundary guard
leaves embedded colon runs alone, so "1:100:2", "10:30:45", "16:9" and
host:fire:port are never rewritten.
Tests: extend the node-driven unit test with the boundary/false-positive cases,
and fix the markdown-rendering test loader to resolve the new emojiShortcodes
import.
* fix: markdown table renders separator row as visible data
The alignment separator (|---|---|) at row index 1 was rendered as a
<td> row with dashes as cell content. Skip it and only open <tbody>
at that point, so tables render as header + data without the garbage
separator row in between.
* test: add regression test for table separator row rendering
Verifies that the markdown table renderer skips the separator row
(|---|---|) instead of rendering it as a visible data row. Also
updates the test harness to handle the splitTableRow import.
`mdToHtml` deliberately stashes literal <details> blocks and <a> tags from
the source text *before* the global HTML-escape pass and restores them
verbatim into the string callers assign to `innerHTML` (e.g. chatRenderer's
`b.innerHTML = ...processWithThinking(text)`). Nothing scrubbed those
fragments, so message/agent content containing
`<details><img src=x onerror=...></details>` or
`<a href="javascript:..." onmouseover=...>` executed arbitrary script in
the authenticated page.
Route both stashed fragments through `sanitizeAllowedHtml()`, which parses
them in an inert <template> (no resource loads, no script execution),
removes script-capable elements, and strips event-handler attributes plus
javascript:/vbscript:/data: URL schemes. Hardening details:
- Compare tag names case-insensitively and drop the SVG/MathML foreign-
content roots. An SVG-namespaced <script> has the lower-case tagName
'script', so an HTML-only upper-case check would miss it — a real bypass.
- Sanitize to a fixpoint (re-parse + re-clean until stable) to blunt
mutation-XSS, where re-serializing/re-parsing reshapes the tree.
Benign anchors and <details> blocks are preserved unchanged.
Verified under jsdom against the obvious vectors plus mutation-XSS probes
(svg/math-namespaced <script>, foreignObject, ns-confusion, comment
breakout, template smuggling): no script/iframe element, event handler, or
javascript:/data: URL survives, and benign markup is kept.
Co-authored-by: Claude <noreply@anthropic.com>
Follow-up to #271. Skip svgifyEmoji when body.text-emojis is set so
deEmojify can strip Unicode from replies; also unwrap existing .emoji
spans from messages rendered before the setting was applied.
Related to #270