AI Alert

CVE-2026-7669: Deserialization flaw in SGLang's HuggingFace tokenizer loader

A medium-severity deserialization bug in SGLang's get_tokenizer routine affects all releases up to 0.5.9. The vendor has not responded to the disclosure, and no fixed version is listed.

By AI Alert Desk

A deserialization-of-untrusted-data flaw tracked as CVE-2026-7669 affects every release of the SGLang LLM serving framework up to and including 0.5.9. The bug sits in the get_tokenizer function in python/sglang/srt/utils/hf_transformers_utils.py, which is part of SGLang’s HuggingFace Transformer handler. VulDB published the entry on May 2, 2026, and notes that the maintainers were contacted before disclosure but did not respond.

Affected

  • Product: sgl-project SGLang
  • Versions: up to 0.5.9 (the CIRCL Vulnerability-Lookup record lists the 0.5.0 through 0.5.9 line as impacted)
  • Component: HuggingFace Transformer handler — python/sglang/srt/utils/hf_transformers_utils.py, function get_tokenizer
  • CWE: CWE-502 (Deserialization of Untrusted Data), CWE-20 (Improper Input Validation)
  • CVSS v3.1: 5.6 Medium — AV:N/AC:H/PR:N/UI:N/S:U/C:L/I:L/A:L
  • CVSS v4.0: 6.3 Medium — AV:N/AC:H/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L
  • Fixed version: none published at time of writing

The vulnerability

SGLang is a high-throughput inference and serving stack for large language and multimodal models, used as an alternative to vLLM and TGI in many production setups. Its get_tokenizer helper is the canonical entry point for resolving the tokenizer associated with a model identifier or local path, and it is invoked on the request path as well as during server startup.

According to the NVD entry, the function performs deserialization of attacker-controllable input. The classification under CWE-502 is consistent with a pattern that has surfaced repeatedly in ML serving stacks: tokenizer and model-config loaders that ultimately invoke pickle.load, torch.load, or HuggingFace’s from_pretrained flows with trust_remote_code semantics on data that originated from an untrusted directory or network source. Any of those paths can execute arbitrary Python at load time.
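The danger of that pattern is easy to demonstrate in isolation. The sketch below shows the generic CWE-502 mechanism, not SGLang's actual code path: a pickled object can name any callable to be invoked at load time, so deserializing attacker-supplied bytes is equivalent to running attacker-supplied code. The `Payload` class here is purely illustrative.

```python
import pickle

# Illustrative only: the generic CWE-502 pattern, not SGLang's actual
# code path. __reduce__ lets a pickled object name a callable that the
# deserializer will invoke on load.
class Payload:
    def __reduce__(self):
        # On deserialization, pickle calls eval("6 * 7"); any callable
        # (os.system, subprocess.run, ...) could be substituted here.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# The "loader" side never imports Payload; merely calling pickle.loads
# on the attacker-supplied bytes executes the embedded callable.
result = pickle.loads(blob)
print(result)  # 42
```

This is why tokenizer and model artifacts fetched from untrusted directories must be treated as executable input, not data.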

The CVSS vector is the most useful piece of operational signal here. The attack is network-reachable, requires no authentication, and needs no user interaction, but it is rated High complexity with Low impact across confidentiality, integrity, and availability, with unchanged scope. Practically, that profile is consistent with an exploit that requires the attacker to influence the tokenizer identifier or the contents of a model directory that SGLang will subsequently load. It is not a drive-by RCE on a default install: the attacker needs a way to steer the loader toward a malicious source. In deployments where operators expose model-loading parameters to untrusted users, or where SGLang pulls artifacts from a writable shared cache, the bar is lower.

The “high complexity, low impact, difficult exploitability” framing in the original disclosure should not be read as low risk if your deployment lets any caller pass arbitrary model strings to the SGLang server. In that configuration the path from external request to deserialization is short.

This is not the first deserialization issue reported in SGLang. Earlier 2026 advisories, including CVE-2026-3059 and CVE-2026-3060 catalogued by Snyk and the high-impact GGUF-loader RCE tracked as CVE-2026-5760, point to a recurring pattern across the project's model and tokenizer loading surface.

Mitigation

No patched release of SGLang is referenced in the NVD or VulDB records as of publication. The project’s GitHub releases page lists 0.5.10.post1 from April 2026 as the most recent tag, but no advisory ties that release to CVE-2026-7669, and the original disclosure explicitly notes that the vendor did not respond.

Until a fixed version ships, operators should:

  1. Restrict model and tokenizer inputs. Do not allow untrusted callers to supply arbitrary model or tokenizer paths to SGLang. Whitelist the set of model identifiers your serving layer is willing to load and reject anything outside it at the gateway.
  2. Lock down the model cache. Treat the HuggingFace cache directory and any shared model store as a trust boundary. Make it read-only to the SGLang process where possible, and audit who can write tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json, custom tokenization_*.py) into that store.
  3. Disable trust_remote_code paths. If your deployment configures SGLang to load tokenizers or models with remote code execution enabled, disable that and pin to vetted, hashed model snapshots.
  4. Network-segment the inference tier. The CVSS vector is AV:N. Reachability from untrusted networks is part of the exploit precondition; firewalling the serving port to known clients reduces blast radius.
  5. Monitor for new releases. Watch the sgl-project/sglang repository for a release that explicitly references CVE-2026-7669, and pin SGLang versions in your dependency manifests rather than tracking latest.
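Steps 1 and 3 can be sketched as a gateway-side check. Everything here is an assumption for illustration: `ALLOWED_MODELS`, `PINNED_REVISIONS`, and `validate_model_id` are hypothetical names, not part of SGLang's configuration surface. The idea is that any model identifier is rejected before it ever reaches the loader unless it is explicitly allowlisted and pinned to a vetted snapshot.

```python
# Hypothetical gateway-side sketch of mitigations 1 and 3. ALLOWED_MODELS
# and PINNED_REVISIONS are illustrative assumptions, not SGLang settings.
ALLOWED_MODELS = {
    "meta-llama/Llama-3.1-8B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
}

# Pin each allowlisted model to a vetted commit hash so a mutated upstream
# repo (or a poisoned shared cache entry) cannot swap in new tokenizer code.
# When your serving layer ultimately calls HuggingFace's from_pretrained,
# pass the pinned hash as revision=... and keep trust_remote_code=False.
PINNED_REVISIONS = {
    "meta-llama/Llama-3.1-8B-Instruct": "<vetted-commit-hash>",  # placeholder
    "Qwen/Qwen2.5-7B-Instruct": "<vetted-commit-hash>",          # placeholder
}

def validate_model_id(model_id: str) -> str:
    """Reject anything not explicitly allowlisted, before it reaches the loader."""
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"model {model_id!r} is not allowlisted")
    return model_id
```

Rejecting at the gateway, rather than inside the serving process, keeps path-style or hub-style identifiers supplied by untrusted callers from ever touching the vulnerable `get_tokenizer` path.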

The combination of an unresponsive maintainer, a recurring class of bug across this project’s loaders, and the central role SGLang plays in production LLM serving makes this advisory worth tracking even at a Medium CVSS score.

Sources

  1. NVD — CVE-2026-7669
  2. Vulnerability-Lookup (CIRCL) — CVE-2026-7669
  3. sgl-project/sglang on GitHub
