#rlhf

Dimitri Coelho Mollo
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs. The paper is the fruit of the joint work of a great team of collaborators, among them <a href="https://social.accum.se/@pettter" class="u-url mention" rel="nofollow noopener" target="_blank">@pettter</a> and <a href="https://akademienl.social/@roeldobbe" class="u-url mention" rel="nofollow noopener" target="_blank">@roeldobbe</a>. <a href="https://link.springer.com/article/10.1007/s10676-025-09837-2" rel="nofollow noopener" translate="no" target="_blank">https://link.springer.com/article/10.1007/s10676-025-09837-2</a> 1/ <a href="https://social.sunet.se/tags/aiethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#aiethics</a> <a href="https://social.sunet.se/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#LLMs</a> <a href="https://social.sunet.se/tags/rlhf" class="mention hashtag" rel="nofollow noopener" target="_blank">#rlhf</a> <a href="https://social.sunet.se/tags/llmsafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#llmsafety</a>

Some Bits: Nelson's Linkblog
AI sycophancy: How reinforcement learning leads to AIs that act obsequious <a href="https://arstechnica.com/information-technology/2025/04/annoyed-chatgpt-users-complain-about-bots-relentlessly-positive-tone/" rel="nofollow noopener" translate="no" target="_blank">https://arstechnica.com/information-technology/2025/04/annoyed-chatgpt-users-complain-about-bots-relentlessly-positive-tone/</a> <a href="https://tech.lgbt/tags/training" class="mention hashtag" rel="nofollow noopener" target="_blank">#training</a> <a href="https://tech.lgbt/tags/chatgpt" class="mention hashtag" rel="nofollow noopener" target="_blank">#chatgpt</a> <a href="https://tech.lgbt/tags/rlhf" class="mention hashtag" rel="nofollow noopener" target="_blank">#rlhf</a> <a href="https://tech.lgbt/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#llm</a> <a href="https://tech.lgbt/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#ai</a>

M@
🤖 NEW: February 2025 Machine Intelligence Reading List! This month explores the concept of "gradual disempowerment": how incremental AI advances could silently erode human agency without requiring a dramatic "takeover" scenario. Also featuring: frame-dependent agency theory, RLHF advancements, and practical insights on integrating LLMs into professional workflows. Read more: <a href="https://quantumfaxmachine.com/blog/qfm053-machine-intelligence-reading-list-february-2025" rel="nofollow noopener" translate="no" target="_blank">https://quantumfaxmachine.com/blog/qfm053-machine-intelligence-reading-list-february-2025</a> <a href="https://masto.ai/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://masto.ai/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#MachineLearning</a> <a href="https://masto.ai/tags/RLHF" class="mention hashtag" rel="nofollow noopener" target="_blank">#RLHF</a> <a href="https://masto.ai/tags/TechTrends" class="mention hashtag" rel="nofollow noopener" target="_blank">#TechTrends</a> <a href="https://masto.ai/tags/QuantumFaxMachine" class="mention hashtag" rel="nofollow noopener" target="_blank">#QuantumFaxMachine</a>

Dr Rockstar ♫ ㉆
Ain't too proud to beg! sweet darlin' Please don't leave me baby! <a href="https://gofund.me/186ee140" rel="nofollow noopener" translate="no" target="_blank">https://gofund.me/186ee140</a> <a href="https://social.vivaldi.net/tags/airesearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#airesearch</a> <a href="https://social.vivaldi.net/tags/rlhf" class="mention hashtag" rel="nofollow noopener" target="_blank">#rlhf</a> <a href="https://social.vivaldi.net/tags/ml" class="mention hashtag" rel="nofollow noopener" target="_blank">#ml</a> <a href="https://social.vivaldi.net/tags/DeepLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#DeepLearning</a> <a href="https://social.vivaldi.net/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascience</a> <a href="https://social.vivaldi.net/tags/nlp" class="mention hashtag" rel="nofollow noopener" target="_blank">#nlp</a> <a href="https://social.vivaldi.net/tags/guitarGear" class="mention hashtag" rel="nofollow noopener" target="_blank">#guitarGear</a> <a href="https://social.vivaldi.net/tags/musictheory" class="mention hashtag" rel="nofollow noopener" target="_blank">#musictheory</a> <a href="https://social.vivaldi.net/tags/Research" class="mention hashtag" rel="nofollow noopener" target="_blank">#Research</a>

Thomas
As a reminder: don't let LLMs handle anything in the political sphere unless you have RLHF (Reinforcement Learning from Human Feedback) active before you show the result to anyone*. Also think of automation risks and human factors (HF). That's "Good Old Systems Safety". *) ... or unless your goal is to damage a 3rd party's reputation (fake news style). <a href="https://mas.to/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#llm</a> <a href="https://mas.to/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#ai</a> <a href="https://mas.to/tags/rlhf" class="mention hashtag" rel="nofollow noopener" target="_blank">#rlhf</a> <a href="https://mas.to/tags/automationrisks" class="mention hashtag" rel="nofollow noopener" target="_blank">#automationrisks</a> <a href="https://mas.to/tags/SystemsSafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#SystemsSafety</a> <a href="https://www.theregister.com/2024/12/20/apple_ai_headline_summaries/?td=rt-3a" rel="nofollow noopener" translate="no" target="_blank">https://www.theregister.com/2024/12/20/apple_ai_headline_summaries/?td=rt-3a</a>

Leshem Choshen
Human feedback is critical for aligning LLMs, so why don’t we collect it in the open ecosystem? 🧐 We (15 orgs) gathered the key issues and next steps. Envisioning a community-driven feedback platform, like Wikipedia. <a href="https://alphaxiv.org/abs/2408.16961" rel="nofollow noopener" translate="no" target="_blank">https://alphaxiv.org/abs/2408.16961</a> 🧵 <a href="https://sigmoid.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://sigmoid.social/tags/RLHF" class="mention hashtag" rel="nofollow noopener" target="_blank">#RLHF</a> <a href="https://sigmoid.social/tags/hci" class="mention hashtag" rel="nofollow noopener" target="_blank">#hci</a> <a href="https://sigmoid.social/tags/ethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#ethics</a> <a href="https://sigmoid.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#LLM</a> <a href="https://sigmoid.social/tags/ml" class="mention hashtag" rel="nofollow noopener" target="_blank">#ml</a> <a href="https://sigmoid.social/tags/NLP" class="mention hashtag" rel="nofollow noopener" target="_blank">#NLP</a> <a href="https://sigmoid.social/tags/NLProc" class="mention hashtag" rel="nofollow noopener" target="_blank">#NLProc</a>

Ulrich Junker
<a href="https://mastodon.online/@parismarx" class="u-url mention" rel="nofollow noopener" target="_blank">@parismarx</a> and well-known AI researchers are leaving OpenAI or have already left. Which of the authors of the original <a href="https://fediscience.org/tags/RLHF" class="mention hashtag" rel="nofollow noopener" target="_blank">#RLHF</a> paper are still there?

@pettter@social.accum.se
Do you have Thoughts(tm) on <a href="https://mastodon.acc.umu.se/tags/RLHF" class="mention hashtag" rel="nofollow noopener" target="_blank">#RLHF</a> and its use to finetune LLMs, or Opinions(tm) about how the effects of this are hyped up or dismissed? Perhaps you have a Cool Case Study of how it actually shakes out in practise? Come discuss and explore at our workshop in Malmö in June: RLHF (huh) What is it good for? <a href="https://rlhf-huh-wiigf.github.io/" rel="nofollow noopener" translate="no" target="_blank">https://rlhf-huh-wiigf.github.io/</a>
