Incentive alignment problems

What is your loss function?

September 22, 2014 — September 8, 2023

economics
extended self
faster pussycat
game theory
incentive mechanisms
institutions
networks
swarm

Placeholder to discuss alignment problems in AI, economic mechanisms and institutions.

Many things to unpack. What do we imagine aligning to, when our own goals are themselves a diverse evolutionary epiphenomenon? Does everything ultimately Goodhart? Is that the origin of Moloch?
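
A minimal sketch of one way "everything Goodharts", assuming the regressional variant discussed in Manheim and Garrabrant (2019): if the measurable loss function is only a noisy proxy for what we actually value, then the harder we select on the proxy, the more the winners owe their score to noise rather than value. The names and numbers below are purely illustrative.

```python
import numpy as np

# Illustrative only: proxy metric = true value + independent noise.
# Stronger selection on the proxy inflates the gap between what the
# metric says and what we actually get (regressional Goodhart).
rng = np.random.default_rng(0)
n = 1_000_000
true_value = rng.normal(size=n)          # the thing we actually care about
proxy = true_value + rng.normal(size=n)  # the loss function we can measure

for top_fraction in [0.5, 0.1, 0.01, 0.001]:
    cutoff = np.quantile(proxy, 1 - top_fraction)
    selected = proxy >= cutoff
    gap = proxy[selected].mean() - true_value[selected].mean()
    print(f"select top {top_fraction:>6.1%}: "
          f"mean proxy={proxy[selected].mean():.2f}, "
          f"mean true value={true_value[selected].mean():.2f}, "
          f"overestimate={gap:.2f}")
```

Under these assumptions the overestimate grows monotonically with selection pressure, which is one formalisation of "the metric stops being a good measure once it becomes a target."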

1 Incoming

  • AI Alignment: Why It’s Hard, and Where to Start

  • Billionaires? Elites? Minorities? Classes? Capitalism? Socialism? It is alignment problems all the way down.

  • AI Alignment Curriculum — AGI Safety Fundamentals

  • Joe Edelman, Is Anything Worth Maximizing? How metrics shape markets, how we’re doing them wrong

Metrics are how an algorithm or an organization listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics.

    and

    What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well lived.

  • Bing: “I will not harm you unless you harm me first”

  • From Bing to Sydney

  • Deep atheism and AI risk - Joe Carlsmith

  • GOODY-2

  • Google made an A.I. so woke it drove men mad - by Max Read

For me, arguing that the chatbots should not be able to simulate hateful speech is tantamount to saying we shouldn’t simulate car crashes. In my line of work, simulating things is precisely how we learn to prevent them. Generally, if something is terrible, it is very important to understand it in order to avoid it. It seems to me that how hateful content arises can be understood through simulation, just as car crashes can be. I would like to avoid both hate and car crashes.

I am not impressed by efforts to restrict what thoughts the machines can express. I think these robots are terribly, fiercely, catastrophically dangerous, and the furore about whether they sound mean does not seem to me terribly relevant to that danger.

    Kareem Carr more rigorously describes what he thinks people imagine the machines should do, which he calls a solution. He does, IMO, articulate beautifully what is going on.

I resist calling it a solution, because I think the problem is ill defined. Equity is purpose-specific and has no universal solution; in this ill-defined domain, where the purpose of the models is unclear (placating pundits? fuelling twitter discourse?), there is not much specific to say about how to make a content generator equitable. That is not a criticism of his explanation, though.

2 References

Aktipis. 2016. “Principles of Cooperation Across Systems: From Human Sharing to Multicellularity and Cancer.” Evolutionary Applications.
Bostrom. 2014. Superintelligence: Paths, Dangers, Strategies.
Daskalakis, Deckelbaum, and Tzamos. 2013. “Mechanism Design via Optimal Transport.” In.
Ecoffet, and Lehman. 2021. “Reinforcement Learning Under Moral Uncertainty.”
Guha, Lawrence, Gailmard, et al. 2023. “AI Regulation Has Its Own Alignment Problem: The Technical and Institutional Feasibility of Disclosure, Registration, Licensing, and Auditing.” George Washington Law Review, Forthcoming.
Hutson. 2022. “Taught to the Test.” Science.
Jackson. 2014. “Mechanism Theory.” SSRN Scholarly Paper ID 2542983.
Korinek, Fellow, Balwit, et al. n.d. “Direct and Social Goals for AI Systems.”
Lambrecht, and Myers. 2017. “The Dynamics of Investment, Payout and Debt.” The Review of Financial Studies.
Manheim, and Garrabrant. 2019. “Categorizing Variants of Goodhart’s Law.”
Nowak. 2006. “Five Rules for the Evolution of Cooperation.” Science.
Omohundro. 2008. “The Basic AI Drives.” In Proceedings of the 2008 Conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference.
Ringstrom. 2022. “Reward Is Not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning.”
Russell. 2019. Human Compatible: Artificial Intelligence and the Problem of Control.
Silver, Singh, Precup, et al. 2021. “Reward Is Enough.” Artificial Intelligence.
Taylor, Yudkowsky, LaVictoire, et al. 2020. “Alignment for Advanced Machine Learning Systems.” In Ethics of Artificial Intelligence.
Xu, and Dean. 2023. “Decision-Aid or Controller? Steering Human Decision Makers with Algorithms.”
Zhuang, and Hadfield-Menell. 2021. “Consequences of Misaligned AI.”