Merge remote-tracking branch 'origin/codex/production-intelligence-terminal' into codex/issue-22-source-fetch-instrumentation

# Conflicts:
#	test/fetch-utils.test.mjs
2026-05-17 20:52:27 +02:00
parent e4834cd3cd 6a9918bc98
commit a590bf62c2
7 changed files with 623 additions and 46 deletions
--- a/README.md
+++ b/README.md
@@ -134,6 +134,9 @@ STALE_ALERT_COOLDOWN_MINUTES=60
 DASHBOARD_URL=https://intelligence.example.internal
 TERMINAL_ACTIONS_ENABLED=true
 SWEEP_TOKEN=
+SSE_HEARTBEAT_INTERVAL_MS=25000
+TERMINAL_ACTION_RATE_LIMIT_WINDOW_MS=60000
+TERMINAL_ACTION_RATE_LIMIT_MAX=10
 BRIEF_VERBOSITY=standard

 LLM_PROVIDER=openrouter
@@ -185,9 +188,66 @@ LLM_MODEL=your-model

 For Pangolin or another reverse proxy, forward HTTP traffic to `intelligence-terminal:3117` (or the `PORT` you set). Missing API keys do not crash sweeps; affected sources are reported as degraded in `/api/health`.

+#### Terminal Action Exposure
+
+`POST /api/action` and `POST /api/sweep` can trigger operational actions such as manual sweeps. The dashboard has a **SET TOKEN** control that stores your `SWEEP_TOKEN` in browser local storage and sends it as the `x-crucix-token` header; do not put action tokens in URLs.
+
+Recommended settings:
+
+| Deployment | Settings |
+| --- | --- |
+| Private local machine | `NODE_ENV=development`, optional `SWEEP_TOKEN`, optional `TERMINAL_ACTIONS_ENABLED=true`. Localhost can run actions without a token for development. |
+| Private LAN / Dockge | Set a strong `SWEEP_TOKEN`, keep `TERMINAL_ACTIONS_ENABLED=true`, expose only to trusted clients. |
+| Pangolin-authenticated reverse proxy | Set a strong `SWEEP_TOKEN`, keep Pangolin auth in front, use the dashboard **SET TOKEN** flow once per browser. |
+| Public internet | Do not expose Terminal Actions directly. If exposure is unavoidable, require `SWEEP_TOKEN`, keep proxy authentication enabled, lower `TERMINAL_ACTION_RATE_LIMIT_MAX`, and monitor server audit logs. |
+
+Action endpoints reject cross-origin POST requests, apply a small in-memory per-IP rate limit (`TERMINAL_ACTION_RATE_LIMIT_MAX` requests per `TERMINAL_ACTION_RATE_LIMIT_WINDOW_MS` window), and write sanitized audit lines that never include the token.
+
 When data remains stale past `STALE_DATA_MAX_AGE_MINUTES`, the server sends an operator alert through configured Telegram/Discord channels after failed or degraded sweep attempts. `STALE_ALERT_COOLDOWN_MINUTES` prevents repeated stale alerts from spamming every refresh interval. Set `DASHBOARD_URL` to the Pangolin/public URL you want included in those alerts.

-The dashboard Terminal Actions panel can trigger `status`, `sweep`, and `brief` through `/api/action`. Leave `TERMINAL_ACTIONS_ENABLED=true` for a private home-server deployment. For an internet-exposed deployment, set `SWEEP_TOKEN` and pass it through trusted automation, or set `TERMINAL_ACTIONS_ENABLED=false` to disable browser-triggered actions. If you protect actions with `SWEEP_TOKEN`, the browser can send it from `localStorage.crucix_sweep_token`.
+#### Memory And Prediction Loop
+
+Crucix stores longitudinal memory in `runs/intelligence.db` when the current Node.js build exposes `node:sqlite`. If SQLite is unavailable, the file is created as a harmless placeholder and `/api/health` reports the memory store as unavailable instead of failing the sweep.
+
+The memory layer persists:
+
+| Table | Purpose |
+| --- | --- |
+| `runs` | Sweep timestamps, source health counts, and delta direction summaries. |
+| `entities` | Stable entity IDs for recurring countries, regions, and locations. |
+| `events` | Stable event IDs for conflict, OSINT, urgent news, and new delta signals across sweeps. |
+| `predictions` | Trade/intelligence hypotheses with evidence, confidence, horizon, outcome state, and latest grading. |
+
+Query endpoints:
+
+```text
+GET /api/memory/search?q=iran&limit=25
+GET /api/memory/predictions?state=open&limit=25
+```
+
+Memory endpoints use the same operator authorization gate as Terminal Actions. The dashboard Terminal Actions panel includes a `Memory` action for a quick operator-facing view of recent events and prediction states.
+
+Retention, backup, and privacy expectations:
+
+- Treat `runs/intelligence.db` as operator data. It can contain source excerpts, headlines, generated hypotheses, and URLs from your configured feeds.
+- Back up `runs/` with the rest of your Dockge volume if you want longitudinal learning to survive container replacement.
+- Delete `runs/intelligence.db` to reset SQLite memory; the next sweep recreates the schema.
+- Do not commit `runs/` or `.env`. API credentials stay in `.env`; memory stores derived observations, not secrets.
+- If you expose the dashboard through a reverse proxy, protect Terminal Actions and memory queries behind your normal authentication boundary.
+
+#### Reverse Proxy SSE
+
+The dashboard receives live sweep updates from `GET /events` using Server-Sent Events. The server sends `retry: 10000` reconnect guidance and lightweight heartbeat comments every `SSE_HEARTBEAT_INTERVAL_MS` milliseconds so reverse proxies do not close an otherwise idle stream between 15-minute sweeps.
+
+Recommended proxy settings:
+
+| Proxy | Setting |
+| --- | --- |
+| Pangolin / Traefik-style frontends | Keep response streaming enabled and set idle timeouts above `SSE_HEARTBEAT_INTERVAL_MS`. |
+| Nginx | Disable proxy buffering for `/events`, keep `proxy_read_timeout` above the heartbeat interval, and preserve `Connection: keep-alive`. |
+| Cloudflare-style proxies | Keep the heartbeat below common idle cutoffs; the default 25s is intentionally conservative. |
+
+If you raise the heartbeat interval, keep it shorter than the lowest idle timeout in the proxy chain.
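+
+As an illustration of the Nginx row above, a minimal sketch of a `location` block for the SSE endpoint might look like the following. It assumes the upstream address `intelligence-terminal:3117` used elsewhere in this README and an illustrative 60-second read timeout; neither value is prescribed by Crucix, so adjust both to your deployment.
+
+```nginx
+# Hypothetical example: adapt the upstream host/port and timeout to your setup.
+location /events {
+    proxy_pass http://intelligence-terminal:3117;
+
+    # Use HTTP/1.1 and clear the Connection header so the upstream
+    # connection stays open instead of being closed after each response.
+    proxy_http_version 1.1;
+    proxy_set_header Connection "";
+
+    # Deliver each SSE event and heartbeat comment immediately.
+    proxy_buffering off;
+
+    # Must stay above SSE_HEARTBEAT_INTERVAL_MS (25000 ms in the example .env).
+    proxy_read_timeout 60s;
+}
+```
+
+With the heartbeat at 25 seconds, a 60-second read timeout leaves room for one missed heartbeat before Nginx closes the stream.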

 `/api/metrics` includes network health grouped by host and source/provider. Source modules should use `safeFetch(url, { source: 'SourceName' })`; when omitted, the shared helper infers a stable provider bucket from the URL host instead of grouping normal source traffic under `unknown`. Raw fetch exceptions are documented in [Source Fetch Instrumentation](docs/source-fetch-instrumentation.md).