A foundation model is a large-scale AI model, typically trained on broad and diverse datasets, that serves as a base for downstream tasks across domains. The term was coined in 2021 by researchers at Stanford's Institute for Human-Centered Artificial Intelligence (HAI) and has since become standard in AI strategy discussions.
Foundation models differ from traditional ML models in three ways: scale of training data (trillions of tokens versus thousands of labeled examples), generality (handling many tasks rather than one), and emergent capabilities (skills that appear without explicit training as the model grows).
In 2026, the foundation model landscape is dominated by seven organizations: OpenAI (GPT-4o, GPT-4.5), Anthropic (Claude 3), Google DeepMind (Gemini 2), Meta (Llama 3, released with open weights), Mistral AI (Mistral Large), Cohere (Command R+), and xAI (Grok).
Enterprise strategy around foundation models centers on three questions: which model fits which use case, whether to build, buy, or fine-tune, and how to govern risk when model behavior can change with each vendor update.
How it works
Foundation models are trained once at massive cost (tens to hundreds of millions of USD) and then adapted to specific tasks via fine-tuning, in-context learning, or retrieval-augmented generation (RAG). Customers rarely train foundation models from scratch.
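To make the adaptation step concrete, the sketch below uses in-context learning: the model is steered toward a task purely through examples in the prompt, with no weight updates. It assumes the OpenAI Python SDK and a gpt-4o endpoint; the ticket-classification task and its labels are hypothetical.

```python
# Minimal sketch: in-context learning (few-shot prompting) against a hosted
# foundation model. Assumes the OpenAI Python SDK and an OPENAI_API_KEY in
# the environment; the task, examples, and labels are illustrative.
from openai import OpenAI

client = OpenAI()

# Few-shot examples embedded in the prompt adapt the model to the task
# without any fine-tuning.
few_shot = (
    "Classify each support ticket as BILLING, TECHNICAL, or OTHER.\n"
    "Ticket: 'I was charged twice this month.' -> BILLING\n"
    "Ticket: 'The app crashes on startup.' -> TECHNICAL\n"
    "Ticket: 'Do you have a student discount?' -> OTHER\n"
)

def classify(ticket: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # hosted foundation model; never trained by the customer
        messages=[
            {"role": "system", "content": "You are a ticket classifier."},
            {"role": "user", "content": few_shot + f"Ticket: '{ticket}' ->"},
        ],
        max_tokens=5,
        temperature=0,  # keep labels as deterministic as possible
    )
    return response.choices[0].message.content.strip()

print(classify("My invoice shows the wrong VAT number."))
```

Fine-tuning and RAG follow the same logic of adapting a pre-trained model rather than training a new one; in-context learning is simply the cheapest and fastest of the three paths.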
Practical example
A consulting firm chooses GPT-4o as its primary foundation model for client work, Claude for long-context analysis, and an open-weight Llama 3 variant for sensitive internal data that cannot leave the firm's infrastructure.
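One way to operationalize such a split is a simple routing layer that picks a model per request. The sketch below is a minimal, hypothetical illustration; the model identifiers, the token threshold, and the Request structure are assumptions, not the firm's actual configuration.

```python
# Hypothetical sketch of a multi-model routing policy like the one described
# above: commercial APIs for general and long-context work, a self-hosted
# open-weight model when data must stay on-premises. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Request:
    task: str            # e.g. "contract_review", "hr_policy_summary"
    context_tokens: int  # rough size of the input
    sensitive: bool      # True if data must not leave internal infrastructure

def route(request: Request) -> str:
    """Return the model endpoint a request should be sent to."""
    if request.sensitive:
        # Sensitive data stays on self-hosted, open-weight infrastructure.
        return "llama-3-70b-internal"
    if request.context_tokens > 100_000:
        # Very long documents go to the long-context model.
        return "claude-long-context"
    # Default choice for general client work.
    return "gpt-4o"

assert route(Request("contract_review", 250_000, sensitive=False)) == "claude-long-context"
assert route(Request("hr_policy_summary", 2_000, sensitive=True)) == "llama-3-70b-internal"
```

Keeping the policy in one place makes it auditable, which matters for the governance question raised above: when a vendor update changes a model's behavior, the routing table shows exactly which workloads are affected.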
Definition by Miss Yera, Leading Woman in Technology in Peru · AI Consultant · Favikon 2025.
Spanish version: /glosario-ia/#foundation-model