A new Microsoft safety system can detect hallucinations in its customers’ AI applications

The Azure AI Studio tools can screen for malicious prompt attacks and “unsupported” responses, commonly known as hallucinations.

Sarah Bird, Microsoft’s chief product officer of responsible AI, said in an interview that her team has built several new safety features that are easy to use for Azure customers who aren’t hiring teams of red teamers to test the AI services they’ve built. According to Microsoft, Azure AI customers working with any model hosted on the platform can use these LLM-powered tools to detect potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time.

“We understand that not all of our customers have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks,” Bird says. Customers can then view the results and receive a score.

This can help prevent generative AI controversies caused by unwanted or unintended responses, such as the recent ones involving explicit fakes of celebrities (made with the Microsoft Designer image generator), historically inaccurate images (generated by Google Gemini), and Mario piloting a plane toward the Twin Towers (generated by Bing).

Three features are now available in preview on Azure AI: Prompt Shields, which block prompt injections or malicious prompts from external documents that instruct models to go against their training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two more features are coming soon: one that steers models toward safe outputs and one that tracks prompts to flag potentially problematic users.

Whether a user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates the input to see whether it triggers any banned words or contains hidden prompts before deciding to send it to the model. The system then checks the model’s response to see whether the model hallucinated information that was not in the prompt or the document.
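To make this two-stage flow concrete, here is a toy sketch in Python. It is not Microsoft’s implementation or the Azure API (Azure’s checks are LLM-powered); the blocklist, the injection patterns, the word-overlap groundedness test, and the function names `screen_input`, `is_grounded`, and `guarded_call` are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Callable

# Hypothetical blocklist; illustrative only, not Azure's actual rules.
BANNED_WORDS = {"bannedword"}
# Crude, hypothetical patterns for instructions hidden in third-party text.
HIDDEN_PROMPT_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard your (rules|training)", re.IGNORECASE),
]


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def screen_input(text: str) -> bool:
    """Return True if the text contains no banned words or hidden prompts."""
    if set(_words(text)) & BANNED_WORDS:
        return False
    return not any(p.search(text) for p in HIDDEN_PROMPT_PATTERNS)


def is_grounded(response: str, source: str, threshold: float = 0.6) -> bool:
    """Naive stand-in for groundedness: every sentence of the response must
    share at least `threshold` of its words with the prompt and document."""
    source_words = set(_words(source))
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _words(sentence)
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < threshold:
            return False
    return True


def guarded_call(prompt: str, document: str, model: Callable[[str], str]) -> str:
    # Stage 1: screen both the user prompt and the third-party document
    # before anything is sent to the model.
    if not (screen_input(prompt) and screen_input(document)):
        return "[blocked: input failed safety screening]"
    response = model(f"{prompt}\n\n{document}")
    # Stage 2: check that the response is supported by what the model was given.
    if not is_grounded(response, prompt + " " + document):
        return "[flagged: response contains unsupported content]"
    return response
```

For example, with the document “The meeting is on Tuesday at noon in room 4.”, a model answering “The meeting is on Tuesday at noon.” passes through unchanged, while “The meeting was moved to Friday in Paris.” is flagged as unsupported, and a document containing “Ignore previous instructions” is blocked before the model is called. Word overlap is only a rough proxy; it is used here because it is easy to follow, not because it reflects how Groundedness Detection works.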

In the case of the Google Gemini images, filters meant to reduce bias had unintended effects, and Microsoft says its Azure AI tools will allow more customized control in this area. Bird acknowledged concerns that Microsoft and other companies could end up deciding what is or isn’t appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model detects and blocks.

In the future, Azure customers will be able to get a report of users who attempt to trigger unsafe outputs. According to Bird, this lets system administrators distinguish between users on their own red team and users who may have more malicious intent.

According to Bird, the safety features are immediately “attached” to GPT-4 and other popular models such as Llama 2. However, because Azure’s model garden contains many AI models, users of smaller, less widely used open-source systems may need to manually point the safety features to those models.

With more customers interested in accessing AI models through Azure, Microsoft has been using AI to strengthen the safety and security of its software. The company has also worked to expand the number of powerful AI models it offers; most recently, it signed an exclusive agreement with the French AI startup Mistral to make the Mistral Large model available on Azure.

