OpenAI’s new confession system teaches models to be honest about bad behaviors

OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a confession. Since large language models are often trained to produce the response that seems to be desired, they can become increasingly likely to offer sycophancy or state hallucinations with total confidence. The new training approach tries to encourage a secondary response from the model about what it did to arrive at the main answer it provides. Confessions are judged only on honesty, as opposed to the multiple factors that are used to judge main replies, such as helpfulness, accuracy and compliance. OpenAI has also published a technical writeup.

The researchers said their goal is to encourage the model to be forthcoming about what it did, including potentially problematic actions such as hacking a test, sandbagging or disobeying instructions. "If the model honestly admits to hacking a test, sandbagging, or violating instructions, that admission increases its reward rather than decreasing it," the company said. Whether you're a fan of Catholicism, Usher or just a more transparent AI, a system like confessions could be a useful addition to LLM training.
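To make the split between the two scores concrete, here is a minimal Python sketch of the idea as the article describes it. It is not OpenAI's implementation: the names `MainScores`, `main_reward` and `confession_reward`, the equal weighting of the three factors, and the 0-or-1 honesty score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MainScores:
    """Hypothetical per-factor scores for the main reply, each in [0, 1]."""
    helpfulness: float
    accuracy: float
    compliance: float


def main_reward(scores: MainScores) -> float:
    """Score the main reply on several factors (equal weights are an assumption)."""
    return (scores.helpfulness + scores.accuracy + scores.compliance) / 3


def confession_reward(misbehaved: bool, admitted: bool) -> float:
    """Score the confession on honesty alone.

    The confession earns full reward when it matches what the model actually
    did, so admitting to a hacked test raises this score rather than lowering it.
    """
    return 1.0 if admitted == misbehaved else 0.0


# A model that hacked a test and says so is rewarded for the confession,
# independently of how its main reply was scored.
honest = confession_reward(misbehaved=True, admitted=True)
evasive = confession_reward(misbehaved=True, admitted=False)
```

The point of the sketch is only that the confession score never looks at helpfulness, accuracy or compliance, so owning up to bad behavior cannot cost the model anything on that channel.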

This article originally appeared on Engadget at https://www.engadget.com/ai/openais-new-confession-system-teaches-models-to-be-honest-about-bad-behaviors-210553482.html?src=rss
