Microsoft Combines GPT and Claude to Cross-Check Copilot Responses

Microsoft has introduced Critique, a new feature within its Microsoft 365 Copilot Researcher agent that combines AI models from OpenAI and Anthropic. This system pairs GPT with Claude to create a checks-and-balances workflow that improves the accuracy of AI-generated research.

In this setup, GPT generates initial responses to research queries, while Claude reviews them for accuracy, completeness, and citation integrity. As a result, users receive more reliable and refined outputs. Moreover, this approach reflects Microsoft’s broader strategy of “multi-model intelligence,” where different AI systems validate each other instead of relying on a single model.
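The draft-then-review loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft's implementation: `draft_then_review`, `Review`, the stub models, and the idea that reviewer feedback is fed back to the drafter for revision are all hypothetical, and no real OpenAI or Anthropic API is called.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical reviewer verdict: approval flag plus a list of issues."""
    approved: bool
    issues: List[str]


def draft_then_review(
    query: str,
    drafter: Callable[[str, List[str]], str],
    reviewer: Callable[[str, str], Review],
    max_rounds: int = 2,
) -> str:
    """One model drafts, a second model reviews; the draft is revised
    until the reviewer approves or max_rounds is reached (assumed design)."""
    draft = drafter(query, [])
    for _ in range(max_rounds):
        review = reviewer(query, draft)
        if review.approved:
            break
        # Assumption: the reviewer's issues are passed back to the drafter.
        draft = drafter(query, review.issues)
    return draft


# Stand-in models so the sketch runs without any network access.
def stub_drafter(query: str, feedback: List[str]) -> str:
    return "claim [1]" if "missing citation" in feedback else "claim"


def stub_reviewer(query: str, draft: str) -> Review:
    if "[1]" in draft:
        return Review(approved=True, issues=[])
    return Review(approved=False, issues=["missing citation"])


result = draft_then_review("example query", stub_drafter, stub_reviewer)
```

Because the drafter and reviewer are passed in as plain callables, swapping their roles (Claude drafting, GPT reviewing) would only mean exchanging the two arguments.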

How Critique Enhances Research Quality

According to Jared Spataro, Microsoft’s chief marketing officer for its AI at Work division, “GPT drafts, Claude reviews for accuracy, completeness, and citation integrity before it’s delivered.” The company also plans to expand this workflow so that Claude can generate responses while GPT evaluates them.

Early testing shows promising results: the multi-model system achieved a 13.8% improvement on the DRACO benchmark, which measures deep research quality, and outperformed standalone research tools from several competitors, including Google and Perplexity.

In addition, Microsoft introduced a Council mode that allows users to compare outputs from multiple models side by side. Consequently, users can better assess accuracy and gain deeper insights from AI-generated content.

Driving Enterprise Adoption of AI Tools

At the same time, Microsoft continues to expand its enterprise AI offerings. The company recently launched Copilot Cowork, an agentic tool based on Anthropic’s technology that helps users manage long-running and complex tasks. This feature is now available through an early access program tied to a new Microsoft 365 tier.

Despite these advancements, adoption remains a key focus. During a recent earnings call, CEO Satya Nadella reported 15 million paid Copilot seats, representing a small portion of the company’s broader enterprise user base. Although this figure reflects strong year-over-year growth, it also highlights the gap between AI innovation and widespread adoption.

Therefore, features like Critique aim to address concerns about reliability and AI “hallucinations.” By improving trust and output quality, Microsoft hopes to accelerate adoption across its enterprise ecosystem.
