Cerebras Systems Inc., a startup providing ultra-fast artificial intelligence inference, today announced support for OpenAI’s newly released 120 billion-parameter open-weight reasoning model, gpt-oss-120B, which the company can run at record-breaking speeds on its inference service.
According to the company, Cerebras can deploy OpenAI’s new model at around 3,000 tokens per second, calling it a major advance in responsiveness for high-intelligence AI.
A token is how an AI model breaks down text into manageable pieces for analysis and generation. Tokens can be words, parts of words or even punctuation. The rate at which a model reads or writes tokens is a common measure of how fast it operates.
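To make the idea concrete, here is a toy illustration. Production models use learned subword tokenizers (such as byte-pair encoding), not a simple split like this, but the principle is the same: text becomes a sequence of discrete units that the model reads and writes one at a time.

```python
import re

# Toy tokenizer for illustration only: splits text into word-like chunks
# and individual punctuation marks. Real tokenizers (e.g. BPE) learn
# subword units from data and behave differently.
def toy_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("AI models read tokens, fast!"))
# ['AI', 'models', 'read', 'tokens', ',', 'fast', '!']
```

At 3,000 tokens per second, a response of that length would be generated in a small fraction of a millisecond.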
“OpenAI’s open-weight reasoning model release is a defining moment for the AI community,” said Cerebras co-founder and Chief Executive Andrew Feldman. “With gpt-oss-120B, we’re not just breaking speed records — we’re redefining what’s possible.”
Released today, OpenAI’s new model marks its first open-weight release since GPT-2 in 2019, and the first “thinking” model the company has released under an open-weight license. OpenAI provided two variants: a 120 billion-parameter version and a smaller 20 billion-parameter version, with the latter optimized for less powerful hardware.
According to OpenAI, the 120B model achieves near-parity on core reasoning benchmarks with o4-mini, a popular go-to model across much of the industry. It also performs similarly on intelligence benchmarks to proprietary models such as Google LLC’s Gemini 2.5 Flash and Anthropic PBC’s Claude 4 Opus, Cerebras said.
In a press conference, Feldman noted that running the model at over 3,000 tokens per second enables organizations to unlock super-fast use cases at significantly lower prices than competing closed-source systems. For comparison, Anthropic’s Claude 4 Opus runs at roughly 56 tokens per second. Cerebras offers the new OpenAI model at 25 cents per million input tokens and 69 cents per million output tokens, compared with Claude 4 Opus’ $15 per million input and $75 per million output tokens.
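The pricing gap is easy to work out from the per-million-token rates quoted above. The workload mix below (2 million input tokens, 1 million output tokens) is a hypothetical example, not a figure from either company:

```python
# Per-million-token prices in dollars, as quoted in the article.
cerebras = {"input": 0.25, "output": 0.69}
claude_opus = {"input": 15.00, "output": 75.00}

def cost(prices, millions_in, millions_out):
    """Total dollar cost for a workload measured in millions of tokens."""
    return prices["input"] * millions_in + prices["output"] * millions_out

# Hypothetical workload: 2M input tokens, 1M output tokens.
c = cost(cerebras, 2, 1)      # 0.25*2 + 0.69*1 = $1.19
o = cost(claude_opus, 2, 1)   # 15*2 + 75*1 = $105.00
print(f"${c:.2f} vs ${o:.2f} ({o / c:.0f}x)")
```

The exact multiple depends on the input/output mix; input-heavy workloads land near the 60x figure Feldman cites, while output-heavy workloads favor Cerebras even more.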
These kinds of “thinking” models typically suffer long wait times when running on traditional GPU infrastructure. Before they start writing, they need time to “think,” processing queries with multistep reasoning, which can take several seconds or longer depending on query complexity, model size and hardware.
“This model beats Claude 4 Opus. It is on the order of 55 times faster when served on Cerebras hardware than when served by Anthropic — and it’s 60 times less expensive,” Feldman said. “That’s how markets move. When you bring that sort of advantage to the table, use cases that were previously impossible become suddenly possible.”
Cerebras is best known for its dinner plate-sized silicon wafer chips purpose-built for AI, but it also offers complete systems — software, application programming interfaces and cloud or on-premises options — allowing organizations to deploy open-weight models with minimal effort. On-premises deployment can be especially enticing for companies that work with sensitive or regulated data; with access to the new open-weight model, they now have an opportunity to experiment with it at extremely high speeds.
“We’re not a chip company; we’re a system company,” Feldman said.
Thanks to industry-standard APIs, developers can switch from OpenAI’s endpoints to Cerebras’s infrastructure in seconds, without rewriting code. “It will take you about 15 seconds to connect to our API,” Feldman said. “You type in api.cerebras.ai, then add your Cerebras API key, and finally gpt-oss-120b. That’s it.”
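Because the endpoint is OpenAI-compatible, the switch Feldman describes amounts to changing a URL and a model name. A minimal sketch using only the Python standard library follows; the exact endpoint path and payload shape are assumptions based on the standard OpenAI chat-completions format and the api.cerebras.ai host he mentions:

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint on the host Feldman names.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
API_KEY = "YOUR_CEREBRAS_API_KEY"  # free key from cerebras.ai/openai

def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completion request in the standard OpenAI format."""
    payload = {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize the benefits of open-weight models.")
print(req.full_url)
# Send with urllib.request.urlopen(req) once a real API key is in place.
```

Code already written against OpenAI's own endpoints would need only the base URL, API key and model name changed, which is the roughly 15-second switch Feldman describes.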
Cerebras partnered with Vercel Inc. as the default model provider for the new release, and also supports deployment through Hugging Face, OpenRouter and other providers, part of a broader effort to meet developers wherever they are.
“This is the first time we’ve partnered [with Hugging Face] at launch,” Feldman said. “You get millions of developers banging on the model in a way even the original developers couldn’t imagine.”
Developers interested in running the model can sign up for a free API key at cerebras.ai/openai and begin experimenting today.