`add_hwfit_models.py` infers `parameter_count` and `parameters_raw` by regexing the HF repo name for a `<num>B` token, optionally followed by an `-A<num>B` MoE active-parameter suffix. Repos whose names don't encode a size at all (e.g. `zai-org/GLM-4.5`, where "4.5" is a version, not a parameter count) fall through to the safetensors element-count path. That path works for unquantized FP16 / BF16 repos but is brittle in two cases the catalog hits often:

1. Author-bulk runs (`AUTHORS = ["cyankiwi"]`) pull pre-quantized AWQ / GPTQ / MLX repos. Their safetensors metadata stores packed I32 tensors plus a per-dtype `parameters` map, which the script unpacks with a per-quant pack factor. When the upload doesn't populate that map (older repos, custom shards), `st.total` is used raw and the parameter count is off by 4-8x.
2. Repos where the safetensors block is absent from `model_info()` entirely. The current code returns `None` and silently drops the model, which then has to be added to `EXTRA_REPOS` by hand with a literal `parameter_count` string.

Both are what the issue calls out: the regex / safetensors combination can't size GLM-4.5 on its own, because the name has no `<num>B` token and the upstream repo's safetensors block doesn't carry a usable parameter total either.

Add a `config.json` fallback in front of the safetensors path:

- `_fetch_config_json(repo_id)` downloads `config.json` via `hf_hub_download`, so the standard HF on-disk cache handles deduplication across runs and no extra cache layer is needed. Network, 404, and gated-repo errors return `None`, and the caller proceeds to the safetensors fallback. An in-process `_CONFIG_CACHE` dedupes the base-model and source-repo lookups within a single run.
- `_params_from_config(cfg)` first honours explicit `num_parameters` / `n_params` / `total_params` fields when present.
  Otherwise it sums embeddings + attention (GQA-aware via `num_key_value_heads` and `head_dim`) + dense MLP (`3 * hidden_size * intermediate_size` per layer, covering SwiGLU / GeGLU). For MoE configs it recognizes both naming conventions in the wild, `num_experts` / `num_experts_per_tok` (Qwen3-MoE) and `n_routed_experts` / `n_shared_experts` (GLM-4-MoE, DeepSeek-V3), sizes the experts with `moe_intermediate_size`, and respects `first_k_dense_replace` so the first N layers stay dense. Active parameters count `num_experts_per_tok` routed experts plus `n_shared_experts` shared experts per MoE layer, on top of the non-expert weights, which matches how each architecture reports its active count.
- In `_entry_from_modelinfo`, try `config.json` on the source repo first (works for unquantized models), then on the `base_model:` parent (covers AWQ / GPTQ children whose own config is just a quantization manifest). Both lookups run only when the regex, the override, and the `base_model` tag have all failed, so the normal author-bulk run still resolves sizes from names without touching the Hub.

Spot-checks against the three architecture families this script actually pulls land within ~5% of the documented parameter counts, which is small enough for the `parameter_count` rounding (one decimal of "B") and the downstream `min_vram_gb` bucket:

| Model | Estimate from config.json | HF model card |
|---|---|---|
| Qwen2.5-7B-Instruct | 7.62B | 7.6B |
| Qwen3-30B-A3B | 30.5B / 3.34B active | 30.5B / 3.3B active |
| GLM-4.5 | 352.7B / 33.6B active | 355B / 32B active |

The safetensors path is unchanged and remains the last resort, so repos with neither a parsable name nor a fetchable `config.json` behave exactly as before.

Closes #955.
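The name-based sizing that this fallback sits behind can be approximated with a single pattern. `_SIZE_RE` and `params_from_name` are hypothetical illustrations, not the script's actual regex; the sketch only shows why `Qwen3-30B-A3B` resolves directly while `GLM-4.5` yields no match and has to fall through.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a size token such as "7B" or "0.5B", optionally followed
# by an MoE active-parameter suffix such as "-A3B". The lookarounds stop it from
# matching inside version strings like "Qwen2.5" or words like "7Bit".
_SIZE_RE = re.compile(
    r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def params_from_name(repo_id: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (total_B, active_B or None) parsed from the repo name, or None."""
    name = repo_id.split("/")[-1]  # drop the "org/" prefix
    m = _SIZE_RE.search(name)
    if m is None:
        return None  # no size token: caller falls through to other paths
    total = float(m.group(1))
    active = float(m.group(2)) if m.group(2) else None
    return total, active
```

For example, `params_from_name("Qwen/Qwen3-30B-A3B")` gives `(30.0, 3.0)`, while `params_from_name("zai-org/GLM-4.5")` gives `None`.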
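Why a raw `st.total` lands 4-8x low on packed uploads: each 32-bit packed element in a quantized checkpoint carries `32 // bits` weights, so 4-bit packing hides 8 weights per element and 8-bit hides 4. `unpacked_param_count` is a hypothetical sketch of that correction, not the script's code. Treating `U32` as packed (for MLX) is an assumption, and packed auxiliary tensors such as zero-points get multiplied too, so the result slightly overcounts.

```python
from typing import Dict

# Dtypes treated as packed integer words: I32 for AWQ / GPTQ, and U32 as an
# assumption for MLX-style quantized weights.
PACKED_DTYPES = ("I32", "U32")


def unpacked_param_count(dtype_counts: Dict[str, int], bits: int) -> int:
    """Estimate logical parameters from per-dtype safetensors element counts.

    Each packed 32-bit element holds 32 // bits quantized weights; other dtypes
    (scales, norms, unquantized layers) are counted one-to-one.
    """
    if bits <= 0 or 32 % bits:
        raise ValueError(f"bit width {bits} does not pack evenly into 32 bits")
    pack_factor = 32 // bits  # 8 at 4-bit, 4 at 8-bit
    return sum(
        count * pack_factor if dtype in PACKED_DTYPES else count
        for dtype, count in dtype_counts.items()
    )
```

With illustrative counts `{"I32": 812_500_000, "F16": 500_000_000}`, the raw sum is about 1.31B, but at 4 bits the helper returns 7.0B.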
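The `config.json` lookup with its in-process cache might look roughly like this. `fetch_config_json`, `CONFIG_CACHE`, and the injectable `download` argument (handy for offline testing) are hypothetical stand-ins for the PR's `_fetch_config_json` / `_CONFIG_CACHE`. The only real API used is `huggingface_hub.hf_hub_download(repo_id=..., filename=...)`, which returns a path inside the local HF cache. Catching every `Exception`, and caching failures as `None`, are simplifications of the network / 404 / gated-repo handling.

```python
import json
from typing import Any, Callable, Dict, Optional

# In-process cache keyed by repo id. Failed lookups are cached as None too,
# so each repo is fetched at most once per run (an assumption of this sketch).
CONFIG_CACHE: Dict[str, Optional[Dict[str, Any]]] = {}


def fetch_config_json(
    repo_id: str,
    download: Optional[Callable[..., str]] = None,
) -> Optional[Dict[str, Any]]:
    """Return the parsed config.json of `repo_id`, or None if it can't be fetched."""
    if repo_id in CONFIG_CACHE:
        return CONFIG_CACHE[repo_id]
    if download is None:
        # Lazy import; hf_hub_download stores files in the standard HF cache.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    cfg: Optional[Dict[str, Any]]
    try:
        path = download(repo_id=repo_id, filename="config.json")
        with open(path, encoding="utf-8") as f:
            cfg = json.load(f)
    except Exception:  # missing file, gated repo, network error, bad JSON
        cfg = None
    CONFIG_CACHE[repo_id] = cfg
    return cfg
```

Caching the `None` result means a source repo without a `config.json` is not re-requested when the same run later reaches it again through another path.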