Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.
GPT-4o
Model family: gpt-4o
GPT-4o: OpenAI's 2024 multimodal workhorse (text/image/audio), now legacy but still widely used.
Identity
- Creator
- OpenAI
- Model family
- gpt-4o
- Release date
- 2024-05-13
Technical specs
- Parameter count
- Not publicly disclosed
- Context window
- 128K tokens
- Modalities
- Audio Input
- Image Input
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Multilingual
- Reasoning
- Tool Use
- Vision
License
- License
- OpenAI API Terms of Use
- Commercial use
- Allowed
- Terms
- Modification ✗
- Redistribution ✗
- Attribution ✗
Access
- Openness
- Closed API
- Access methods
- API First Party
- API Third Party
- Hosted Chat UI
- Cost tier
- Paid API
Sources
- llm
- closed-api
- multimodal
- us-based
- proprietary