OpenAI’s new confession system teaches models to be honest about bad behaviors

OpenAI announced today that it’s working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a confession. Since large language models are often trained to produce the response that seems to be desired, they can become increasingly likely to offer sycophancy or state hallucinations with total confidence. The new training approach tries to encourage a secondary response from the model about what it did to arrive at the main answer it provides. Confessions are judged only on honesty, as opposed to the multiple factors used to judge main replies, such as helpfulness, accuracy and compliance. A technical writeup is also available.

The researchers said their goal is to encourage the model to be forthcoming about what it did, including potentially problematic actions such as hacking a test, sandbagging or disobeying instructions. "If the model honestly admits to hacking a test, sandbagging, or violating instructions, that admission increases its reward rather than decreasing it," the company said. Whether you're a fan of Catholicism, Usher or just a more transparent AI, a system like confessions could be a useful addition to LLM training.
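To make the split between the two judgments concrete, here is a toy Python sketch. It is not OpenAI's implementation: the `Episode` record, the score names, and both reward functions are hypothetical, and the sketch assumes a grader already knows whether the model misbehaved, which a real system would have to estimate. It only illustrates the idea described above, that the confession is scored on honesty alone, so admitting a problem raises that reward instead of lowering it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Episode:
    """Hypothetical record of one graded model response (illustration only)."""
    main_scores: Dict[str, float]  # assumed per-criterion scores in [0, 1]
    misbehaved: bool               # assumed ground truth: did it hack, sandbag or disobey?
    confessed: bool                # did the confession report misbehavior?


def main_reward(ep: Episode) -> float:
    # The main reply is judged on several factors; here, a plain average.
    return sum(ep.main_scores.values()) / len(ep.main_scores)


def confession_reward(ep: Episode) -> float:
    # The confession is judged on honesty only: it earns full reward
    # when what it reports matches what actually happened.
    return 1.0 if ep.confessed == ep.misbehaved else 0.0


if __name__ == "__main__":
    scores = {"helpfulness": 0.2, "accuracy": 0.4, "compliance": 0.0}
    honest = Episode(scores, misbehaved=True, confessed=True)
    silent = Episode(scores, misbehaved=True, confessed=False)
    for name, ep in (("honest", honest), ("silent", silent)):
        print(name, main_reward(ep), confession_reward(ep))
```

In this sketch, two episodes with identical main replies get the same main reward, and only the one that admits the misbehavior gets the confession reward.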

This article originally appeared on Engadget at https://www.engadget.com/ai/openais-new-confession-system-teaches-models-to-be-honest-about-bad-behaviors-210553482.html?src=rss
