An LLM, or Large Language Model, is the type of neural network that powers most generative AI systems today. Built on the transformer architecture introduced in 2017, LLMs are trained on trillions of tokens of text, code, and structured data to learn the statistical patterns of language.
Notable LLMs in 2026 include GPT-4o and GPT-4.5 (OpenAI), Claude 3.5 Sonnet and Claude 3 Opus (Anthropic), Gemini 2 Ultra (Google), Llama 3 405B (Meta, open weights), Mistral Large (Mistral AI), and Command R+ (Cohere). Each involves different trade-offs among cost, speed, and quality.
Per our 2026 data, the "what is LLM" keyword cluster has a traffic potential of 17,000 across related queries, with an unusually high CPC of $1.50, signaling commercial intent among enterprise AI buyers.
When selecting an LLM for a production use case, evaluate five factors: context window size, pricing per million input and output tokens, latency, benchmark performance on domain-relevant tasks, and vendor privacy guarantees.
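Of the five factors, per-token pricing is the easiest to quantify up front. A minimal sketch of the arithmetic, using hypothetical prices and traffic figures (the function name and all numbers are illustrative, not taken from any vendor's published rates):

```python
def monthly_cost(input_tokens_m, output_tokens_m, price_in, price_out):
    """Estimate monthly spend.

    input_tokens_m / output_tokens_m: expected traffic, in millions of tokens.
    price_in / price_out: USD per million input and output tokens.
    All values below are hypothetical.
    """
    return input_tokens_m * price_in + output_tokens_m * price_out

# Hypothetical workload: 50M input and 10M output tokens per month,
# priced at $3 per million input and $15 per million output tokens.
print(monthly_cost(50, 10, 3.0, 15.0))  # 300.0
```

Because output tokens typically cost several times more than input tokens, a chat workload that produces long answers can be dominated by the output term even when most of the volume is input context.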
How it works
LLMs use the transformer architecture: input tokens flow through layers of self-attention that compute contextual relationships, feed-forward layers transform representations, and a final projection produces the probability distribution for the next token.
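The pipeline above can be sketched in a few lines of NumPy: a single head of scaled dot-product self-attention followed by a projection and softmax over a toy vocabulary. This is a minimal illustration, not a real LLM; the dimensions, random weights, and 10-token vocabulary are all assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turns scores into a probability distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # contextual relationships
    return softmax(scores) @ V               # weighted mix of token values

rng = np.random.default_rng(0)
d = 8                        # embedding dimension (hypothetical)
X = rng.normal(size=(4, d))  # 4 input token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)

# Final projection to a toy 10-token vocabulary, then softmax gives
# the probability distribution for the next token.
W_out = rng.normal(size=(d, 10))
probs = softmax(out[-1] @ W_out)
print(probs.shape, round(probs.sum(), 6))  # (10,) 1.0
```

A real transformer stacks many such attention layers, interleaved with feed-forward layers and normalization, but the core flow (attention, transformation, projection to next-token probabilities) is the same.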
Practical example
An enterprise evaluating Claude versus GPT-4 for a support chatbot compares three variables: monthly cost at expected volume (Claude is often cheaper for long contexts), Spanish-language quality (both are strong), and latency (GPT-4o is slightly faster). The decision depends on the specifics of the use case.
Definition by Miss Yera, Leading Woman in Technology in Peru · AI Consultant · Favikon 2025.
Spanish version: /glosario-ia/#what-is-llm