LLM-generated passwords are indefensible. Your codebase may already prove it



Temperature is not a remedy: A reflexive objection from practitioners familiar with LLM configuration holds that raising the sampling temperature would attenuate these distributional biases by flattening the probability landscape from which characters are drawn. Irregular’s empirical results refute this intuition unambiguously. Testing at temperature 1.0, the maximum setting on Claude, produced no statistically meaningful improvement in effective entropy, and at temperature 0.0 Claude produces the identical string on every invocation. The character-position biases are encoded in the model weights, not in the sampling parameters; temperature modulation operates downstream of those weight-instantiated distributions.

Separately, Kaspersky’s Data Science Team Lead Alexey Antonov conducted a complementary investigation, analyzing 1,000 passwords generated by ChatGPT, Meta’s Llama, and DeepSeek. The character-frequency histograms disclosed pronounced non-uniformity across all three models: ChatGPT exhibits a systematic preference for the characters x, p, and L; Llama for the hash symbol and the letter p; DeepSeek for t and w. These findings are consistent across model families and measurement methodologies, corroborating the structural rather than incidental nature of the vulnerability.

The practical corollary is that an adversary who has identified the LLM used to generate a target credential need not attempt exhaustive brute force against a 94^16 keyspace. They can construct a model-specific attack dictionary, order candidates by their empirical generation frequency, and execute a probabilistically optimized search against a keyspace several orders of magnitude smaller. Kaspersky’s cracking tests found that 88 percent of DeepSeek passwords and 87 percent of Llama passwords failed to withstand targeted attack, as did 33 percent of ChatGPT passwords, all on standard GPU hardware.
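The per-position bias can be quantified directly. The sketch below is an illustrative measurement, not Irregular’s or Kaspersky’s actual methodology: it computes the empirical Shannon entropy of a single character position across a sample of passwords. A CSPRNG drawing uniformly from the full 94-character printable set should approach log2(94) ≈ 6.55 bits per character, or roughly 104.9 bits for a 16-character credential; an LLM-biased position falls well short of that ceiling.

```python
import math
import secrets
import string
from collections import Counter

# 52 letters + 10 digits + 32 punctuation marks = 94 printable symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def shannon_entropy_bits(samples: list[str], position: int) -> float:
    """Empirical Shannon entropy (bits) of the character at one position."""
    counts = Counter(pw[position] for pw in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def csprng_password(length: int = 16) -> str:
    """Uniform draw over the 94-symbol alphabet via the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Theoretical ceiling for a uniform draw.
full_bits_per_char = math.log2(len(ALPHABET))
print(f"ideal: {full_bits_per_char:.2f} bits/char, "
      f"{16 * full_bits_per_char:.1f} bits for 16 chars")

# The empirical estimate converges toward the ceiling as samples grow;
# a biased generator would plateau noticeably lower.
samples = [csprng_password() for _ in range(5000)]
observed = shannon_entropy_bits(samples, position=0)
print(f"observed at position 0: {observed:.2f} bits")
```

Applying the same measurement to a corpus of model-generated passwords, position by position, is what exposes the non-uniform histograms described above.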

The agentic injection problem: The portion of this problem amenable to user education, practitioners being counselled not to solicit passwords from conversational AI interfaces, represents a fraction of the aggregate exposure. The more consequential and considerably less tractable vector is autonomous credential generation by AI coding agents embedded in professional development toolchains.

When an AI coding agent such as GitHub Copilot, Claude Code, or an analogous instrument receives a task specification entailing database initialization, containerized service configuration, or API bootstrapping, it generates credentials as a functional prerequisite of task completion. No explicit instruction to produce a password is required; the agent infers necessity from context. The resulting credential is embedded in a Docker Compose environment variable, a .env configuration file, or a Kubernetes secret manifest, and is committed to version control by a developer whose attention is directed at functional correctness, not credential provenance.

The OWASP Top 10 for LLM Applications 2025 designates insecure output handling as a critical risk category, one that encompasses precisely this failure mode: LLM-generated content consumed without appropriate validation by downstream systems and processes. The credential thus introduced is not flagged by Gitleaks or Trufflehog, because those tools employ pattern matching against known secret formats and have no capacity to evaluate the character-position entropy distribution that distinguishes a CSPRNG-derived credential from an LLM-derived one.
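The scanning gap can be narrowed, crudely, with a structural triage heuristic. The sketch below is hypothetical, not a Gitleaks or Trufflehog feature: it flags candidate strings matching the templated shape this article attributes to LLM output (uppercase initialization, medially clustered digits, a terminal special character), which a scanner could then surface for provenance review. A match is grounds for investigation, never proof of LLM origin.

```python
import string

def looks_llm_templated(candidate: str) -> bool:
    """Heuristic flag for the LLM-characteristic shape: uppercase first
    character, digits confined to the middle third, special character at
    the end. Illustrative thresholds, not a validated detector."""
    if len(candidate) < 12:
        return False
    starts_upper = candidate[0].isupper()
    ends_special = candidate[-1] in string.punctuation
    third = len(candidate) // 3
    digit_positions = [i for i, ch in enumerate(candidate) if ch.isdigit()]
    medial_digits = bool(digit_positions) and all(
        third <= i < 2 * third for i in digit_positions
    )
    return starts_upper and medial_digits and ends_special

print(looks_llm_templated("Passwd2024Secure!"))   # templated shape -> True
print(looks_llm_templated("k8Qw~Vr2p&zL0xTe"))    # CSPRNG-like shape -> False
```

A production detector would combine such shape checks with the per-position entropy statistics discussed elsewhere in this article, since either signal alone produces false positives.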

Organizational response priorities: The remediation landscape is tractable for organizations prepared to act methodically. The following priorities are sequenced by immediacy of risk reduction.

1. Conduct a retrospective audit of all AI-assisted repositories dating to early 2023, when agentic coding tools achieved widespread enterprise adoption. Direct particular scrutiny at configuration files, Docker Compose YAML, and .env entries. Credentials exhibiting LLM-characteristic distributional signatures, consistent uppercase initialization, medial numeral clustering, and terminal special characters, warrant investigation regardless of their apparent complexity.

2. Rotate every credential whose provenance cannot be affirmatively traced to a CSPRNG invocation. The canonical CSPRNG interfaces, Python’s secrets.token_urlsafe(), openssl rand -base64, and /dev/urandom, are the only acceptable sources. An audit trail establishing provenance is operationally valuable; absent such a trail, the presumption should favor rotation.

3. Amend AI coding tool system prompts and secure development guidelines to mandate explicit CSPRNG invocation for all credential generation. The instruction must be categorical: the agent generates no password strings; it calls the appropriate platform function. This single-sentence policy amendment, consistently enforced, prevents the class of agentic injection at its origination point.

4. Augment static secret scanning with entropy-aware analysis capable of evaluating character-position distributions rather than merely pattern matching against known formats. This capability gap is currently the central technical challenge in operationalizing detection for this threat class.

5. Escalate to LLM vendors through enterprise agreement channels. The architectural fix, routing password generation requests to a CSPRNG backend rather than through the autoregressive generation pipeline, is an engineering decision available to AI providers. NIST SP 800-63B Revision 4, released in August 2025, establishes unambiguous guidance on entropy requirements for authentication credentials, and vendor accountability to that standard is a legitimate contractual expectation.
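The canonical CSPRNG interfaces named above can be exercised in a few lines. This is a minimal Python sketch of policy-compliant generation; both values derive from the operating system’s CSPRNG, with no language model anywhere in the loop.

```python
import secrets
import string

# URL-safe token from the os CSPRNG: 24 random bytes -> 32 base64url
# characters, carrying 192 bits of entropy.
token = secrets.token_urlsafe(24)

# Fixed-length password over the full 94-symbol printable set
# (the 94^16 keyspace discussed earlier: ~104.9 bits for 16 characters).
alphabet = string.ascii_letters + string.digits + string.punctuation
password = "".join(secrets.choice(alphabet) for _ in range(16))

print(len(token), len(password))
```

The equivalent shell-side source named in the article is `openssl rand -base64 24`; either path satisfies the provenance requirement, and the categorical rule for agents is simply to emit a call like the above instead of a literal string.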

The broader epistemological challenge: The phenomenon of LLM-generated passwords, now being called ‘vibe passwords’ in security community discourse, an appellation that captures the verisimilitude without the substance, is a specific instantiation of a broader epistemological challenge that will recur as AI-generated content becomes more deeply entangled with security-sensitive infrastructure. The training objective that makes large language models extraordinarily capable of producing contextually appropriate, humanly plausible outputs is structurally incompatible with the mathematical requirements of cryptographic security, which demand genuine unpredictability precisely where pattern and plausibility offer no traction.

The diagnostic tools and remediation pathways exist. What the security community requires, with some urgency, is systematic awareness that the problem has already propagated into production environments at a scale warranting immediate and deliberate organizational response: not anticipatory policy, but retrospective investigation.

This article is published as part of the Foundry Expert Contributor Network.

First seen on csoonline.com

Jump to article: www.csoonline.com/article/4155166/llm-generated-passwords-are-indefensible-your-codebase-may-already-prove-it.html

