Microsoft introduces two new multi-model capabilities in Microsoft 365 Copilot’s Researcher

Researcher, Microsoft 365 Copilot’s deep research agent for work, now gets two new multi-model capabilities: Critique and Council.

Critique and Council: multi-model capabilities in Microsoft 365 Copilot’s Researcher

Critique

Critique is a new multi-model deep research system designed for complex research tasks. It separates generation from evaluation and utilises a combination of models from frontier labs, including Anthropic and OpenAI.

Critique divides responsibilities between two AI partners: one optimised for deep exploration and structured synthesis, and a second focused on validating claims, improving presentation and strengthening the structure. It delivers higher-quality results across factual accuracy, analytical breadth, and presentation. Critique will be the default experience in Researcher, available when Auto is selected in the model picker.

It is built around a rubric-based evaluation: a structured review that focuses on strengthening the report without turning the reviewer into a second author. The reviewer concentrates on several dimensions: Source Reliability Assessment, Report Completeness, and Strict Evidence Grounding Enforcement.
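To make the separation of generation from evaluation concrete, here is a minimal Python sketch of a generate-then-review loop. It is an illustration only, not Microsoft's implementation: the function names, the `Review` structure, the number of review rounds, and the idea that the draft is revised from the reviewer's feedback are all assumptions, and the toy callables stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Rubric dimensions named in the article; the identifiers are hypothetical.
RUBRIC = ("source_reliability", "report_completeness", "evidence_grounding")


@dataclass
class Review:
    """Issues the reviewer raised, grouped by rubric dimension."""
    issues: Dict[str, List[str]]

    def passed(self) -> bool:
        # The review passes when no dimension has any open issue.
        return not any(self.issues.values())


def critique_loop(
    task: str,
    generate: Callable[[str], str],
    review: Callable[[str], Review],
    revise: Callable[[str, Review], str],
    max_rounds: int = 2,  # assumption: the real number of rounds is not stated
) -> str:
    """One model drafts; a second only reviews against the rubric."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = review(draft)
        if feedback.passed():
            break
        # The reviewer returns issues rather than rewriting the report itself.
        draft = revise(draft, feedback)
    return draft


# Toy stand-ins for the two models (purely illustrative).
def toy_generate(task: str) -> str:
    return f"Report on {task}. Claim A."


def toy_review(draft: str) -> Review:
    issues: Dict[str, List[str]] = {dim: [] for dim in RUBRIC}
    if "[source]" not in draft:
        issues["evidence_grounding"].append("Claim A lacks a citation")
    return Review(issues)


def toy_revise(draft: str, feedback: Review) -> str:
    # Address the grounding issue by attaching a citation marker.
    return draft + " [source]"


final_report = critique_loop("topic", toy_generate, toy_review, toy_revise)
```

In this sketch the reviewer never edits the draft directly; it only reports issues per rubric dimension, which mirrors the idea of strengthening the report without becoming a second author.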

On the DRACO benchmark, Critique showed the largest improvement in Breadth and Depth of Analysis, followed by Presentation quality and Factual accuracy. Researcher with Critique achieved higher scores than the single-model approach across 10 domains.

Council

Council brings model responses side-by-side in the Researcher experience. Additionally, a cover letter summarises where the models agree, where they diverge, and what each uniquely contributes to the topic.

This is an alternative approach designed for side-by-side comparison across multiple models. It is available when Model Council is selected in the model picker in Researcher. Council runs an Anthropic model and an OpenAI model simultaneously, with each producing a complete, standalone report that surfaces facts, citations, and analytical framings the other may overlook or weigh differently.

Once both reports are generated, a dedicated judge model evaluates the reports to create a distilled summary of key findings and highlights where the models meaningfully agree or diverge.
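The Council flow described above can be sketched in Python as two models run in parallel, followed by a judge step. This is a rough illustration under stated assumptions, not the actual Researcher pipeline: `run_council`, `CoverLetter`, and `toy_judge` are hypothetical names, each report is simplified to a set of findings, and a set comparison stands in for the judge model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Report = Set[str]  # simplification: a report is just a set of findings


@dataclass
class CoverLetter:
    agree: Set[str]               # findings present in every report
    unique: Dict[str, Set[str]]   # per model: findings not shared by all


def run_council(
    task: str,
    models: Dict[str, Callable[[str], Report]],
    judge: Callable[[Dict[str, Report]], CoverLetter],
) -> Tuple[Dict[str, Report], CoverLetter]:
    """Run every model on the same task concurrently, then judge the results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        # Each model produces its own complete, standalone report.
        reports = {name: future.result() for name, future in futures.items()}
    return reports, judge(reports)


def toy_judge(reports: Dict[str, Report]) -> CoverLetter:
    # Stand-in for the judge model: plain set arithmetic on the findings.
    agree = set.intersection(*reports.values())
    unique = {name: findings - agree for name, findings in reports.items()}
    return CoverLetter(agree=agree, unique=unique)


# Toy stand-ins for the two models (purely illustrative).
toy_models = {
    "model_a": lambda task: {"finding 1", "finding 2"},
    "model_b": lambda task: {"finding 2", "finding 3"},
}

reports, letter = run_council("topic", toy_models, toy_judge)
```

Both full reports are returned alongside the cover letter, so the side-by-side view and the summary of agreement and divergence come from the same run.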

Availability

Critique and Council are broadly available in the Frontier program.
