This is my post for day 20 of the Inkhaven writing retreat.
Every day, people all around the world wake up and get down to business trying to achieve their values: raising their children, producing goods, having positive experiences, and generally staying alive. Much of life has the sense of pushing forward toward something, toward your values, while being constrained or rate-limited somehow.
It’s easy to see how your actions are constrained by things like money, skills, or social support. These constraints are like physical walls delineating the room you can act within. Inside the room are all the actions you can take immediately and freely, whether or not they effectively achieve your values. Walls can be broken down, but doing so takes a lot more effort, planning, and hard trade-offs.
It’s less natural to look at your values themselves as constraints. You “could” swerve the car into the oncoming lane, but you overwhelmingly don’t want to.
I think it can be a useful perspective to view yourself as physically constrained by your values, just as much as you are constrained by money or skills, even if it’s a constraint that you’ll never try to overcome. (In this post I’m only referring to terminal values, or what philosopher Paul Tillich called ultimate concerns, which are the things you value in and of themselves. I am not including instrumental values, which are things you only value because they lead to other values.)
Values as constraints is a pretty funny way to look at things, like putting the cart before the horse. The primary relationship between your actions and your values is that your values are why you take the actions you do take. It’s not as if your values are “take as many actions as possible”.
It’s still kinda true, though. Scott Garrabrant once quipped that an agent is something whose type signature is (A → B) → A. That is, if the agent predicts that action A will lead to outcome B, and the agent values outcome B, it will take action A. Similarly, (A → ¬B) → ¬A: if it predicts that an action leads to an outcome it doesn’t value, it won’t take that action.
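To make the quip a bit more concrete, here is a minimal sketch in Haskell, reading the type literally. The lane example, the outcome strings, and the utility numbers are all made up for illustration; this is not Garrabrant’s formalization, just one way the signature can be instantiated.

```haskell
-- An "agent" over actions A and outcomes B is anything of type (A -> B) -> A,
-- i.e. something that turns a prediction of outcomes into a chosen action.
type Agent a b = (a -> b) -> a

-- Hypothetical example: lanes, outcome strings, and utilities are invented.
data Lane = StayInLane | SwerveIntoOncoming deriving (Show, Eq)

utility :: String -> Int
utility "safe arrival"  = 10
utility "head-on crash" = -1000
utility _               = 0

-- Given a prediction of what each action leads to,
-- pick the action whose predicted outcome is valued more.
chooseLane :: Agent Lane String
chooseLane predict =
  if utility (predict StayInLane) >= utility (predict SwerveIntoOncoming)
    then StayInLane
    else SwerveIntoOncoming

main :: IO ()
main = print (chooseLane predictOutcome)  -- prints StayInLane
  where
    predictOutcome StayInLane         = "safe arrival"
    predictOutcome SwerveIntoOncoming = "head-on crash"
```

The point of the toy is just that the value function is doing the constraining: with this utility, no prediction function will ever make the agent output SwerveIntoOncoming.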
This can also be a useful perspective for viewing other people.
I have sometimes been confused about why some of my friends seemed to struggle with certain things. It was easy to consider that they might have had fewer skills, or different life experiences, or sensory sensitivities, or different brain chemistry that produces more anxiety, or something. It took me longer to realize that they simply had values that I didn’t. Once I could internalize that they really did have those different values, it was obvious that their action space was more limited, and their struggles made sense.
Or, maybe other people are missing values that you have. This would give them more options for acting. This is one reason that powerful people are more likely to be sociopaths. I physically could not take the action of hurting people in the way that many politicians or businessmen do. But they can take that action, because (in part) they literally don’t care. All else equal, a larger action space implies a higher probability of achieving your goal. Perhaps it is tempting to think something like “curse my pro-social values, if only I didn’t have them, then I could gain great political power, and with it, do higher-leverage pro-social things”. But like. That doesn’t really make sense. As a disclaimer, I don’t mean to imply that all politicians and businessmen are sociopaths, or that society is doomed (by this particular selection effect). Just that it’s something you should have in your model of society.
This idea applies less cleanly to people whose values are less stable and coherent. A more coherent mind might value both apples and oranges, with some weighting between them. If it has to make a decision that trades off between apples and oranges, it will just apply the weights to decide. A less coherent mind might simply contain two subsystems, one which values only apples, and one which values only oranges. This mind would also have some kind of supervising system that controls when each subsystem runs. In this case, the conflict between the values would be a genuine conflict, and one of the subsystems might figure out how to destroy the other one. This is more like how I would describe becoming corrupted.
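As a toy illustration of the difference, here is a small sketch in the same style as before. The fruit values, the weights, and the “mood” supervisor are all invented; the point is only the contrast between one weighted scale and a supervisor that hands control to whichever single-value subsystem happens to be running.

```haskell
data Option = Option { apples :: Double, oranges :: Double } deriving Show

-- Coherent mind: one scale, explicit weights; every trade-off is arithmetic.
coherentScore :: Option -> Double
coherentScore o = 1.0 * apples o + 0.7 * oranges o

coherentChoose :: Option -> Option -> Option
coherentChoose a b = if coherentScore a >= coherentScore b then a else b

-- Less coherent mind: two subsystems that each see only one value,
-- plus a supervisor that decides which one is in charge right now.
appleSubsystem, orangeSubsystem :: Option -> Option -> Option
appleSubsystem  a b = if apples a  >= apples b  then a else b
orangeSubsystem a b = if oranges a >= oranges b then a else b

data Mood = CravingApples | CravingOranges  -- hypothetical supervisor state

incoherentChoose :: Mood -> Option -> Option -> Option
incoherentChoose CravingApples  = appleSubsystem
incoherentChoose CravingOranges = orangeSubsystem

main :: IO ()
main = do
  let a = Option 3 1
      b = Option 1 4
  print (coherentChoose a b)                   -- weighs both values at once
  print (incoherentChoose CravingOranges a b)  -- whichever subsystem runs, wins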
(also aysja). I like the reminder that values physically constrain us; I had a bit of a fish in water update from it, a bit more grokking as to why alignment is important. As you mention, value alignment problems already crop up even in pretty nearby mindspace (politicians who simply care less about other people are more likely to take unethical actions, the sort of munchkin-y behavior in shows like Breaking Bad where they win by taking the insane actions everyone else would have prematurely screened off)… but then you remember that even this is seriously constrained by convergent human stuff (we tend to aim at similar goals, most people care about their families, etc).
It’s easy to forget just how similar people are to one another; easy to kind of slip into imagining that all other minds will share whatever that thing is. And so when you step back and look at how much of that is actually our values, a priori, dramatically reducing the space of possible actions we can take, it becomes kind of mind-boggling to imagine just how different it could be. Not only how much more capable a superintelligence might be, but the actions it would be willing to take, absent solving alignment. Terrifying, really.