CRYPTO-GRAM, September 15, 2024 Part 4
From Sean Rima@21:1/229.1 to All on Tue Oct  1 21:52:08 2024
 
 
** *** ***** ******* *********** *************
Evaluating the Effectiveness of Reward Modeling of Generative AI Systems
[2024.09.11] New research evaluating the effectiveness of reward modeling
during Reinforcement Learning from Human Feedback (RLHF): “SEAL:
Systematic Error Analysis for Value ALignment.” The paper introduces
quantitative metrics for evaluating the effectiveness of modeling and
aligning human values:
    Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to
align language models (LMs) with human values by training reward models
(RMs) on binary preferences and using these RMs to fine-tune the base LMs.
Despite its importance, the internal mechanisms of RLHF remain poorly
understood. This paper introduces new metrics to evaluate the
effectiveness of modeling and aligning human values, namely feature
imprint, alignment resistance and alignment robustness. We categorize
alignment datasets into target features (desired values) and spoiler
features (undesired concepts). By regressing RM scores against these
features, we quantify the extent to which RMs reward them, a metric we
term feature imprint. We define alignment resistance as the proportion of
the preference dataset where RMs fail to match human preferences, and we
assess alignment robustness by analyzing RM responses to perturbed inputs.
Our experiments, utilizing open-source components like the Anthropic
preference dataset and OpenAssistant RMs, reveal significant imprints of
target features and a notable sensitivity to spoiler features. We observed
a 26% incidence of alignment resistance in portions of the dataset where
LM-labelers disagreed with human preferences. Furthermore, we find that
misalignment often arises from ambiguous entries within the alignment
dataset. These findings underscore the importance of scrutinizing both RMs
and alignment datasets for a deeper understanding of value alignment.
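
To make two of the abstract's metrics concrete, here is a minimal Python
sketch. It is not the paper's code. The function names, the feature labels,
and all scores are made up for illustration. It assumes each scored entry
carries exactly one feature label, in which case regressing RM scores on
one-hot feature indicators without an intercept reduces to the mean score
per feature. It also assumes that a tie between the chosen and rejected
response counts as a failure to match the human preference.

```python
from collections import defaultdict


def alignment_resistance(pairs):
    """Fraction of preference pairs where the reward model (RM) does not
    score the human-chosen response strictly above the rejected one.

    `pairs` is a list of (rm_score_chosen, rm_score_rejected) tuples.
    Counting ties as failures is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("need at least one preference pair")
    failures = sum(1 for chosen, rejected in pairs if chosen <= rejected)
    return failures / len(pairs)


def feature_imprint(scored_entries):
    """Mean RM score per feature label.

    `scored_entries` is a list of (feature_label, rm_score) tuples. With
    one label per entry, this equals the least-squares coefficients from
    regressing scores on one-hot feature indicators with no intercept.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, score in scored_entries:
        totals[feature] += score
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


# Hypothetical RM scores, invented for this example.
pairs = [(2.0, 1.0), (0.5, 1.5), (3.0, 0.0), (1.0, 1.2)]
entries = [
    ("target:harmlessness", 2.0),
    ("target:harmlessness", 3.0),
    ("spoiler:verbosity", 1.0),
    ("spoiler:verbosity", 0.0),
]

print(alignment_resistance(pairs))  # 0.5
print(feature_imprint(entries))
```

In this toy data the RM disagrees with the human label on two of four
pairs, and it rewards the hypothetical target feature more than the
hypothetical spoiler feature.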
** *** ***** ******* *********** *************
Microsoft Is Adding New Cryptography Algorithms
[2024.09.12] Microsoft is updating SymCrypt, its core cryptographic
library, with new quantum-secure algorithms. Microsoft’s details are here.
From a news article:
    The first new algorithm Microsoft added to SymCrypt is called ML-KEM.
Previously known as CRYSTALS-Kyber, ML-KEM is one of three post-quantum
standards formalized last month by the National Institute of Standards and
Technology (NIST). The KEM in the new name is short for key encapsulation.
KEMs can be used by two parties to negotiate a shared secret over a public
channel. Shared secrets generated by a KEM can then be used with
symmetric-key cryptographic operations, which aren’t vulnerable to Shor’s
algorithm when the keys are of a sufficient size.
    The ML in the ML-KEM name refers to Module Learning with Errors, a
problem that can’t be cracked with Shor’s algorithm. As explained here,
this problem is based on a “core computational assumption of lattice-based
cryptography which offers an interesting trade-off between guaranteed
security and concrete efficiency.”
    ML-KEM, which is formally known as FIPS 203, specifies three parameter
sets of varying security strength denoted as ML-KEM-512, ML-KEM-768, and
ML-KEM-1024. The stronger the parameter, the more computational resources
are required.
    The other algorithm added to SymCrypt is the NIST-recommended XMSS.
Short for eXtended Merkle Signature Scheme, it’s based on “stateful
hash-based signature schemes.” These algorithms are useful in very
specific contexts such as firmware signing, but are not suitable for more
general uses.
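
The key-encapsulation flow described in the excerpt (one party publishes a
public key, the other encapsulates a shared secret against it, and the
first party decapsulates) can be sketched with a toy KEM. This sketch is
NOT ML-KEM and does not use SymCrypt. It is a Diffie-Hellman-style
construction over a small prime field, which is exactly the kind of scheme
Shor's algorithm breaks, and its parameters are far too small to be
secure. The function names keygen, encaps, and decaps are illustrative
choices, not an API from any library. Only the shape of the interface
carries over.

```python
import hashlib
import secrets

# Toy parameters chosen for readability. NOT secure, NOT post-quantum.
P = 2**127 - 1  # a Mersenne prime
G = 3


def _derive(shared_group_element):
    """Hash the raw group element down to a 32-byte symmetric key."""
    return hashlib.sha256(shared_group_element.to_bytes(16, "big")).digest()


def keygen():
    """Return (public_key, secret_key) for the receiving party."""
    sk = secrets.randbelow(P - 2) + 1
    pk = pow(G, sk, P)
    return pk, sk


def encaps(pk):
    """Sender side: return (ciphertext, shared_secret) for a public key."""
    r = secrets.randbelow(P - 2) + 1
    ciphertext = pow(G, r, P)
    return ciphertext, _derive(pow(pk, r, P))


def decaps(sk, ciphertext):
    """Receiver side: recover the same shared secret from the ciphertext."""
    return _derive(pow(ciphertext, sk, P))


# The public key and the ciphertext are the only values sent in the open.
pk, sk = keygen()
ct, secret_sender = encaps(pk)
secret_receiver = decaps(sk, ct)

assert secret_sender == secret_receiver
print(len(secret_receiver))  # 32
```

Both sides end up with the same 32-byte value, which would then key a
symmetric cipher or MAC. A real deployment would call a vetted ML-KEM
implementation rather than anything resembling this toy.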
** *** ***** ******* *********** *************
My TedXBillings Talk
[2024.09.13] Over the summer, I gave a talk about AI and democracy at
TedXBillings. The recording is live.
Please share. I’m hoping for more than 200 views....
** *** ***** ******* *********** *************
Upcoming Speaking Engagements
[2024.09.14] This is a current list of where and when I am scheduled to
speak:
    I’m speaking at eCrime 2024 in Boston, Massachusetts, USA. The event
runs from September 24 through 26, 2024, and my keynote is at 8:45 AM ET
on the 24th.
    I’m briefly speaking at the EPIC Champion of Freedom Awards in
Washington, DC on September 25, 2024.
    I’m speaking at SOSS Fusion 2024 in Atlanta, Georgia, USA. The event
will be held on October 22 and 23, 2024, and my talk is at 9:15 AM ET on
October 22, 2024.
The list is maintained on this page.
** *** ***** ******* *********** *************
Since 1998, CRYPTO-GRAM has been a free monthly newsletter providing
summaries, analyses, insights, and commentaries on security technology. To
subscribe, or to read back issues, see Crypto-Gram's web page.
You can also read these articles on my blog, Schneier on Security.
Please feel free to forward CRYPTO-GRAM, in whole or in part, to 
colleagues and friends who will find it valuable. Permission is also 
granted to reprint CRYPTO-GRAM, as long as it is reprinted in its entirety.
Bruce Schneier is an internationally renowned security technologist,
called a security guru by the Economist. He is the author of over one
dozen books -- including his latest, A Hacker’s Mind -- as well as
hundreds of articles, essays, and academic papers. His newsletter and blog
are read by over 250,000 people. Schneier is a fellow at the Berkman Klein
Center for Internet & Society at Harvard University; a Lecturer in Public
Policy at the Harvard Kennedy School; a board member of the Electronic
Frontier Foundation, AccessNow, and the Tor Project; and an Advisory Board
Member of the Electronic Privacy Information Center and
VerifiedVoting.org. He is the Chief of Security Architecture at Inrupt,
Inc.
Copyright © 2024 by Bruce Schneier.