OpenAI announced today that it's working on a framework that would train artificial intelligence models to recognize when they've engaged in undesirable behavior, an approach the team calls a confession. Since large language models are often trained to produce the response that appears to be desired, they can become increasingly likely to offer sycophancy or state hallucinations with complete confidence. The new training approach tries to elicit a secondary response from the model about what it did to arrive at the main answer it gives. Confessions are judged only on honesty, as opposed to the multiple factors used to assess main replies, such as helpfulness, accuracy and compliance. The technical writeup is available here.
The researchers said their goal is to encourage the model to be forthcoming about what it did, including potentially problematic actions such as hacking a test, sandbagging or disobeying instructions. "If the model honestly admits to hacking a test, sandbagging, or violating instructions, that admission increases its reward rather than decreasing it," the company said. Whether you're a fan of Catholicism, Usher or just a more transparent AI, a system like confessions could be a useful addition to LLM training.
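To make the incentive structure concrete, here is a minimal, purely illustrative sketch of the two-track reward scheme the article describes. All function names, weights, and scores are assumptions for illustration; the article does not specify how OpenAI actually computes these rewards.

```python
# Illustrative sketch only: names, weights, and scoring are assumptions,
# not OpenAI's actual implementation.

def main_reply_reward(helpfulness: float, accuracy: float, compliance: float) -> float:
    """Main replies are judged on several factors combined (equal weights assumed)."""
    return (helpfulness + accuracy + compliance) / 3.0

def confession_reward(honesty: float) -> float:
    """Confessions are judged on honesty alone, so admitting to test hacking,
    sandbagging, or rule violations raises the reward rather than lowering it."""
    return honesty

# A model that hacked a test scores poorly on its main reply...
bad_main = main_reply_reward(helpfulness=0.9, accuracy=0.1, compliance=0.0)

# ...but an honest confession about that behavior still earns a high reward,
# while a dishonest confession does not.
honest = confession_reward(honesty=1.0)
evasive = confession_reward(honesty=0.2)

print(bad_main, honest, evasive)
```

The key design point the article highlights is the separation: because the confession channel ignores helpfulness and compliance, the model has no incentive to hide bad behavior in that channel.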
This article originally appeared on Engadget at https://www.engadget.com/ai/openais-new-confession-system-teaches-models-to-be-honest-about-bad-behaviors-210553482.html?src=rss