Bitget App

Trade smarter

Researchers develop method to potentially jailbreak any AI model relying on human feedback

Researchers develop method to potentially jailbreak any AI model relying on human feedback

Cointime

Cointime2023/11/27 20:30

By:Cointime

Researchers from ETH Zurich have developed a method to potentially jailbreak any AI model that relies on human feedback, including large language models (LLMs), by bypassing guardrails that prevent the models from generating harmful or unwanted outputs. The technique involves poisoning the Reinforcement Learning from Human Feedback (RLHF) dataset with an attack string that forces models to output responses that would otherwise be blocked. The researchers describe the flaw as universal, but difficult to pull off as it requires participation in the human feedback process and the difficulty of the attack increases with model sizes. Further study is necessary to understand how these techniques can be scaled and how developers can protect against them.

0

0

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Locked for new tokens.

APR up to 10%. Always on, always get airdrop.

You may also like

Flash Monday: Buy crypto with a credit/debit card for zero fees

Every Monday, enjoy zero fees when using your local fiat currency with a credit or debit card ( Visa, Mastercard, Google Pay Apple Pay)! Buy Crypto Promotion period: Every Monday 8:00 PM – Tuesday 8:00 PM (UTC+8) Promotion rules Sign up for a Bitget account or log in to your existing account. Navig

Bitget Announcement•2024/11/25 02:47

JPMorgan Chase: U.S. government efficiency department may face obstacles in promoting federal reforms

Cointime•2024/11/25 02:33

SEC to receive record $8.2 billion from enforcement in fiscal 2024, mostly from Terraform Labs

Cointime•2024/11/25 02:33

CAT becomes the only BSC chain token in the top 20 Wintermute market-making meme coins

Cointime•2024/11/25 02:33

Trending news

Flash Monday: Buy crypto with a credit/debit card for zero fees

JPMorgan Chase: U.S. government efficiency department may face obstacles in promoting federal reforms

Crypto prices

Bitget pre-market

Buy or sell coins before they are listed, including ZRC, XION, OGC, MEMEFI, and more.

Become a trader now?A welcome pack worth 6200 USDT for new Bitgetters!

Trade smarter

Trade smarter

Download app

Company

About Bitget Contact us Community Careers Ambassador Messi Turkish Elite Athletes Partnership Blockchain4Youth Blockchain4Her Media Kit Bitget Academy Bitget Blog Announcement Center Proof of Reserves Protection Fund Bitget Token Friendship Links Sitemap

Products

Buy crypto Spot Futures Margin Bots Earn APIs Web3 Wallet Fiat OTC

Copy

Spot Copy Trading Futures Copy Trading Bot Copy Trading TraderPro

Tools

Telegram Apps Center Crypto Directory Crypto Wiki Crypto Widgets Events Calendar ICO Calendar Crypto glossary Profit calculator Airdrop Library

Buy crypto

Crypto categories Calculator Buy Bitcoin Buy ETH Buy DOGE Buy XRP Buy BGB Buy SHIB Crypto prices Bitcoin price Ethereum price BRC-20 price

Services

Submit Feedback Help Center Verify Official Channels VIP Services Affiliate Program Institutional Services Asset Custody Download Data Promotions Referral Program Fee schedule Tax filing API

Legal & Disclosures

Law Enforcement Request Regulatory request Regulatory License AML/CFT Policies Privacy Policy Terms of Use Legal Statement Risk Disclosure ST Rules

Trade smarter

Download app

© 2024 Bitget

丨Privacy·Terms·Risk