Microsoft Wants to Power Copilot With Chinese AI – Here Is What That Actually Means

Microsoft’s enterprise AI agent just got a lot more complicated. On June 16, 2026, the company officially launched Copilot Cowork – its flagship AI agent for enterprises – and on the same day disclosed it is actively evaluating DeepSeek V4, a Chinese-origin model, as a cheaper engine to run it. The announcement is a rare admission that even one of the world’s most valuable companies struggles to afford the AI agents it is selling to its own customers.

The reason is math. And the math is brutal.

The AI Cost Crisis Nobody Wants to Talk About

A standard chatbot answers one prompt and stops. An AI agent executing a real business task – drafting, reviewing, revising, calling tools, delegating to other agents – can hit a language model dozens of times per request. According to a Microsoft Research study from April 2026, agentic coding tasks consume roughly 1,000x more tokens than standard code-chat interactions, with the same task varying by as much as 30x in total token usage across runs.
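To make the multiplication concrete, here is a minimal back-of-the-envelope sketch. The call counts, tokens per call, and price are hypothetical values chosen only so that the ratio lands on the roughly 1,000x figure cited above; they are not measurements from the Microsoft Research study. The sketch also assumes a constant token count per call, which understates real agents whose context grows with every step.

```python
def total_tokens(tokens_per_call: int, calls: int) -> int:
    """Total tokens for a task, assuming every call is the same size."""
    return tokens_per_call * calls


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Convert a token count into dollars at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# Hypothetical workloads (assumptions for illustration only).
chat_tokens = total_tokens(tokens_per_call=2_000, calls=1)
agent_tokens = total_tokens(tokens_per_call=50_000, calls=40)

# Hypothetical blended price in USD per 1M tokens.
PRICE_PER_MILLION_USD = 10.0

ratio = agent_tokens // chat_tokens
print(f"agent uses {ratio}x the tokens of a single chat turn")
print(f"chat turn:  ${cost_usd(chat_tokens, PRICE_PER_MILLION_USD):.2f}")
print(f"agent task: ${cost_usd(agent_tokens, PRICE_PER_MILLION_USD):.2f}")
```

Under these assumptions, the per-task bill scales with calls multiplied by context size, which is why a small change in either one moves the total so sharply.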

The consequences are visible across the industry:

  • Uber burned through its entire 2026 AI budget in four months after rolling out agentic coding tools to thousands of engineers.
  • Microsoft itself recently canceled most direct Claude Code licenses in its Experiences and Devices division, redirecting engineers to GitHub Copilot CLI by a June 30, 2026 deadline.
  • EY analysis puts the cost of a complex agentic interaction in 2026 at approximately $30 per task when running on frontier closed models.

Charles Lamanna, Microsoft’s EVP for Copilot, agents, and platform, was unusually candid: “We have users who do hundreds of tasks a week, which is great – they’re way productive – but the consequence is the costs can go very high.”

Enter DeepSeek V4.

What DeepSeek V4 Actually Is

DeepSeek released V4 on April 24, 2026 as two open-weight models under the MIT license:

| Model | Total Parameters | Active Per Token | Context Window | API Price (per 1M tokens) |
|---|---|---|---|---|
| DeepSeek-V4-Pro | 1.6 trillion | 49 billion | 1 million tokens | ~$0.27 |
| DeepSeek-V4-Flash | 284 billion | 13 billion | 1 million tokens | $0.09 |

The 1.6 trillion parameter headline figure reflects a Mixture-of-Experts (MoE) architecture: only 49 billion parameters are actually active for any given token. Several closed frontier models are widely believed, though not confirmed, to use a similar sparse design. The real innovation is in how V4 handles long contexts.

At one million tokens, V4-Pro uses only 27% of the inference FLOPs and 10% of the KV cache size of DeepSeek V3.2 through a sparse attention architecture. V4-Flash pushes that to 10% of FLOPs and 7% of cache. In plain terms: processing a very long document or large codebase costs dramatically less than it did on the previous generation. TechTimes reports this translates to a 73% reduction in inference cost at 1M-token contexts compared to standard attention mechanisms, a figure consistent with V4-Pro’s 27% FLOPs share.
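The ratios quoted in this section can be checked with a few lines of arithmetic. This is a rough sketch that uses only the figures stated above. It treats FLOPs as a stand-in for cost, which is a simplifying assumption; real serving cost also depends on memory, batching, and hardware.

```python
def share_pct(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return part / whole * 100


# Parameter counts in billions, as listed for the two V4 models.
pro_active_pct = share_pct(49, 1_600)
flash_active_pct = share_pct(13, 284)

# Inference FLOPs at 1M tokens relative to DeepSeek V3.2, as quoted above.
PRO_FLOPS_SHARE = 0.27
FLASH_FLOPS_SHARE = 0.10

# Assumption: cost scales with FLOPs, so the saving is 1 minus the share.
pro_saving_pct = (1 - PRO_FLOPS_SHARE) * 100
flash_saving_pct = (1 - FLASH_FLOPS_SHARE) * 100

print(f"V4-Pro active parameters:   {pro_active_pct:.1f}% of total")
print(f"V4-Flash active parameters: {flash_active_pct:.1f}% of total")
print(f"V4-Pro FLOPs saving at 1M tokens:   {pro_saving_pct:.0f}%")
print(f"V4-Flash FLOPs saving at 1M tokens: {flash_saving_pct:.0f}%")
```

Two separate effects are at work: the MoE design keeps the per-token compute to a few percent of the full parameter count, and the sparse attention cuts the long-context overhead on top of that.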

Performance: Close Enough to Matter, Not Close Enough to Dominate

DeepSeek V4-Pro’s benchmarks are genuinely impressive for an open-weight model – but they come with a caveat.

| Benchmark | DeepSeek V4-Pro | Claude Opus 4.6 | Gap |
|---|---|---|---|
| SWE-bench Verified (coding) | 80.6% | 80.8% | -0.2% |
| Codeforces Rating | 3,206 (top 23 humans) | N/A | |
| BenchLM Overall Score | 68/100 (#39 of 124) | Higher | Behind frontier |

The honest picture: on coding, V4-Pro is within rounding error of Claude Opus 4.6. On general cross-domain tasks, it’s a different story. An independent evaluation by the U.S. Center for AI Safety found V4-Pro trails US frontier models by approximately 8 months across five domains. On price, it’s 53% cheaper than GPT-5.4 mini on 5 of 7 benchmarks. That’s a meaningful spread in an agentic workflow where every task calls the model repeatedly.

The Problem Microsoft Cannot Fix by Hosting It on Azure

Here is where this story gets genuinely complicated. Microsoft is considering running DeepSeek V4 on Azure infrastructure – its own servers, its own data centers. The argument is that hosting it domestically eliminates the data transfer concern. Security teams disagree, and so does the U.S. government.

The issue is not where the model runs. The issue is where it was built. DeepSeek is a Chinese company subject to China’s National Intelligence Law, which requires Chinese organizations and citizens to cooperate with state intelligence operations. The weights of V4-Pro were trained in China, on Chinese infrastructure, by a Chinese team. Running those weights on Azure does not change what is embedded in the model or what obligations apply to the organization that created it.

NIST’s recent evaluation of V4 flagged specific concerns beyond the headline benchmark gaps – particularly around how the model handles politically sensitive topics and whether its post-training pipeline introduced any systematic biases. The Center for AI Safety evaluation was independent and the results were not flattering on safety dimensions.

Meanwhile, the White House has spent months trying to limit Chinese AI penetration into American enterprise infrastructure. Microsoft evaluating DeepSeek for a product sold to Fortune 500 companies – including government contractors – is a direct collision with that policy direction.

Why Microsoft Is Doing It Anyway

Because the cost math is forcing the decision. Copilot Cowork launched with usage-based pricing – precisely because flat-rate pricing at frontier model costs would either price the product out of the market or destroy Microsoft’s margins on heavy users. The usage model transfers cost risk to customers, but customers facing $30-per-task costs will cancel rather than absorb them.

The DeepSeek disclosure is not a leak. Microsoft said it publicly. That’s a signal: the company is floating the idea deliberately, probably to pressure Anthropic and OpenAI on pricing, or to signal to regulators that Chinese open-source models are so cost-competitive that US companies are being forced to choose between margins and national security.

That is a genuinely uncomfortable argument for US AI labs to rebut.

What to Do About It

If you are an enterprise evaluating AI agent platforms right now, the DeepSeek-Microsoft development changes the landscape in three ways:

  1. Open-weight models are now enterprise-competitive on coding and long-context tasks. DeepSeek V4-Pro at 80.6% SWE-bench is not a research curiosity – it’s a production-grade coding model. If your use case is code review, refactoring, or document processing at scale, the performance gap to closed frontier models has narrowed dramatically.
  2. Agentic AI budgets need a hard cap before deployment. The Uber example – burning through an annual AI budget in four months – is not a one-off. Agentic systems are non-linear in token consumption. Set hard limits per user, per task, per week before you scale.
  3. The China law risk is real and not solved by Azure hosting. If you are in financial services, defense, healthcare, or any sector with data residency requirements or government contracts, running DeepSeek V4 – even on US infrastructure – will require a legal opinion, not just a technical one. The MIT license does not override national intelligence obligations on DeepSeek’s side.
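The hard caps described in point 2 can be sketched as a small guard that every model call passes through before it is billed. This is a minimal illustration, not a production design: `BudgetGuard`, `BudgetExceeded`, and the cap values are hypothetical names and numbers, and a real deployment would persist the totals and derive the week key from a clock.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past a configured cap."""


@dataclass
class BudgetGuard:
    per_task_cap_usd: float
    per_user_weekly_cap_usd: float
    # Running totals keyed by task id and by (user, week).
    task_spend: dict = field(default_factory=lambda: defaultdict(float))
    weekly_spend: dict = field(default_factory=lambda: defaultdict(float))

    def charge(self, user: str, task_id: str, week: str, cost_usd: float) -> None:
        """Record a call's cost, or refuse it if either cap would be exceeded."""
        new_task_total = self.task_spend[task_id] + cost_usd
        new_week_total = self.weekly_spend[(user, week)] + cost_usd
        if new_task_total > self.per_task_cap_usd:
            raise BudgetExceeded(f"task {task_id} would exceed the per-task cap")
        if new_week_total > self.per_user_weekly_cap_usd:
            raise BudgetExceeded(f"user {user} would exceed the weekly cap")
        # Only commit the spend once both checks have passed.
        self.task_spend[task_id] = new_task_total
        self.weekly_spend[(user, week)] = new_week_total


# Hypothetical caps and charges, in USD.
guard = BudgetGuard(per_task_cap_usd=5.0, per_user_weekly_cap_usd=8.0)
outcomes = []
for task_id, cost in [("t1", 3.0), ("t1", 1.5), ("t1", 1.0), ("t2", 3.0), ("t3", 1.0)]:
    try:
        guard.charge("alice", task_id, "2026-W25", cost)
        outcomes.append("ok")
    except BudgetExceeded:
        outcomes.append("blocked")

print(outcomes)
```

Checking before committing matters here: a refused call leaves the totals untouched, so one oversized request does not lock a user out of smaller ones that still fit.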

For individual developers and small teams without compliance constraints: DeepSeek V4-Flash at $0.09/1M tokens with a 1M context window and open weights is a serious tool. The coding benchmark performance is legitimate. Use it where it makes sense.

BetOnAI Verdict

This is the most consequential AI business story of the month, and it is not really about DeepSeek. It is about the unsustainable economics of agentic AI at frontier pricing. DeepSeek V4 is the symptom – the real story is that even Microsoft cannot afford to run AI agents at the prices its own ecosystem charges.

DeepSeek V4-Pro is a legitimately strong model: open weights, MIT license, near-frontier on coding, 73% cheaper inference at long contexts, and 53% cheaper than GPT-5.4 mini on 5 of 7 benchmarks. The gaps are real too – roughly 8 months behind US frontier models across general domains, and a legal risk profile that cannot be hand-waved away by hosting location.

Microsoft disclosing the DeepSeek evaluation publicly was a calculated move. Whether it results in deployment or serves mainly as negotiating leverage with Anthropic and OpenAI, it signals one thing clearly: the era of US frontier models having no serious price competition is over.

The companies that adapt their AI stack to be model-agnostic – not locked to any single provider – will handle whatever pricing and geopolitical shocks come next better than those that don’t.
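One way to read "model-agnostic" in practice is a thin routing layer between application code and providers. The sketch below is an assumption about how that could look, not a description of any vendor's SDK: `ModelRouter`, the backend names, and the stub functions are all hypothetical, and the stubs stand in for real provider calls.

```python
from typing import Callable, Dict

# A backend is any callable that maps a prompt to a completion.
Backend = Callable[[str], str]


class ModelRouter:
    """Maps task kinds to named backends so a provider swap is a config change."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._routes: Dict[str, str] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def route(self, task_kind: str, backend_name: str) -> None:
        if backend_name not in self._backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self._routes[task_kind] = backend_name

    def complete(self, task_kind: str, prompt: str) -> str:
        return self._backends[self._routes[task_kind]](prompt)


# Stub backends (hypothetical) standing in for real provider SDK calls.
def closed_frontier_stub(prompt: str) -> str:
    return f"[closed-frontier] {prompt}"


def open_weight_stub(prompt: str) -> str:
    return f"[open-weight] {prompt}"


router = ModelRouter()
router.register("closed-frontier", closed_frontier_stub)
router.register("open-weight", open_weight_stub)
router.route("code_review", "open-weight")

first = router.complete("code_review", "Review this diff")

# A pricing or compliance shock becomes a one-line routing change.
router.route("code_review", "closed-frontier")
second = router.complete("code_review", "Review this diff")

print(first)
print(second)
```

Because callers only name a task kind, a compliance decision or a price change touches the routing table rather than every call site.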
