This is my post for day 8 of the Inkhaven writing retreat.
I’m trying to figure out what’s up with what I’m calling “being a policy”. I’d like to get better at it.
The classic way to decide what actions to take is to generate a bunch of options, evaluate the outcomes of taking each one, and pick the action corresponding to the best predicted outcome. In other words: to think about it.
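Here is a minimal sketch of that deliberative loop in Python, just to make its shape concrete; every name and number in it is made up.

```python
# Toy sketch of deliberation: enumerate options, predict the outcome of
# each with your best model, and pick the option whose predicted outcome
# you value most. All names and numbers here are hypothetical.

def deliberate(options, predict_outcome, value_of):
    """Return the option whose predicted outcome has the highest value."""
    return max(options, key=lambda option: value_of(predict_outcome(option)))

# Example: deciding what to do with a free evening.
options = ["write the day 8 post", "scroll feeds", "go for a run"]
predicted_outcome = {
    "write the day 8 post": "post exists, feel accomplished",
    "scroll feeds": "two hours gone, feel vaguely bad",
    "go for a run": "tired but energized",
}
value = {
    "post exists, feel accomplished": 10,
    "two hours gone, feel vaguely bad": 1,
    "tired but energized": 7,
}

print(deliberate(options, predicted_outcome.get, value.get))  # "write the day 8 post"
```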
Thinking, however, is expensive and slow. So we develop a ton of ways to shortcut the process.
One of these ways is to develop a habit. A habit is something that you had to practice a few times, but that eventually became automatic, and that you now do essentially unconsciously and involuntarily. You could stop doing it if you wanted, but first you’d have to notice, then you’d have to decide to stop, then you’d have to practice stopping. I think I would classify things like muscle memory or skills under this. Every time you use a pair of scissors, your brain does not have to recalculate the optimal way to move them. Habits can also be purely internal or cognitive: for example, whenever you find yourself feeling annoyed about something, you automatically generate something about it that you’re grateful for.
Another type of non-deliberative decision is what I would call a heuristic. Perhaps you have a heuristic that you don’t eat until you’re hungry. By using the hunger signal to fire off the behavior, you save yourself from having to constantly re-decide whether to eat or not. But you might sometimes decide to go against this heuristic. If you’re about to go on a long road trip, you might want to eat a big breakfast right away, so that you don’t have to stop for food for a while. Or perhaps you’re still in the office at 7pm and you’re starving, but you’ve just got a little bit of work left before you can send off an email, so you decide to keep working and eat once it’s sent. Heuristics can save you 95% of the cost of deciding, while still staying flexible to context.
There’s a third kind of decision procedure that I would call a policy. A policy is a well-defined rule that you always follow. I probably got this term partly from its use in reinforcement learning, but it also matches the connotation of a company policy. The reason you have a policy is that you deliberated for a while on what the best course of action would be in a recurring, well-defined context, and decided that you always want to take that action. Once you’ve decided, you install it as a policy. Like a habit, you will now take that action every time, but unlike a habit, it may be very conscious and difficult. Unlike a heuristic, you will not reconsider it as the context varies.
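To make the contrast concrete, here is a rough Python sketch, reusing the eating heuristic from above; the policy table is hypothetical, not a claim about how anyone actually implements this.

```python
# A heuristic consults the context and can be overridden; a policy is a
# fixed mapping from a well-defined situation to an action, decided once
# and never re-litigated in the moment. All names here are hypothetical.

def eating_heuristic(hungry, context):
    """Default to the hunger signal, but let the context override it."""
    if context.get("long road trip ahead"):
        return "eat a big breakfast now"
    if context.get("one email left to send"):
        return "finish the email, then eat"
    return "eat" if hungry else "don't eat"

# A policy: the deliberation happened once, up front; at decision time
# there is only a lookup.
POLICY = {
    "offered a drink": "decline",
    "tempted to check the feed": "don't",
}

def act_on_policy(situation):
    return POLICY[situation]

print(eating_heuristic(hungry=True, context={"one email left to send": True}))
print(act_on_policy("offered a drink"))
```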
Part of the purpose of a policy is to ensure that you take the action even when other circumstances would give you reason not to. Policies help ensure fair action and better long-term consequences, even at the cost of short-term ones.
A normal person might call a policy a “principle”, a “virtue”, or just “the right thing to do”.
In the field of advanced, mathematically founded decision theory, the right decision procedure is to maximize expected utility, according to your best predictive models and consistent values.
But according to doubly advanced, meta-mathematically founded decision theory, the right decision procedure is to first identify the decision procedure that maximizes expected utility, and then install that decision procedure.
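Spelled out in symbols (my own loose notation, only a sketch): the first level picks an action directly, while the second level picks the decision procedure you will run from now on.

```latex
% First level: pick the action with the best expected outcome.
a^* = \arg\max_{a} \; \mathbb{E}[\, U \mid \text{take } a \,]

% Second level: pick the policy whose adoption has the best expected
% outcome, then let it dictate the action in each situation it covers.
\pi^* = \arg\max_{\pi} \; \mathbb{E}[\, U \mid \text{you follow } \pi \,],
\qquad a = \pi^*(\text{situation})
```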
I am generally quite good at having predictive models, reasoning through the implications, and then taking the action with the best predicted outcome. But I sure do have some big flaws in that last part. I seem to do a form of hyperbolic discounting, according to which it always feels like a good idea to do the somewhat easier action right now.
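For reference, the textbook hyperbolic form (the standard shape from the literature, not something I’ve fit to my own behavior) says the felt value of a reward of size A that arrives after a delay D falls off roughly as

```latex
V(A, D) = \frac{A}{1 + kD}
```

where k is a per-person steepness parameter. Because that curve is steepest near D = 0, a small reward available right now can outrank a much larger one available later, and the ranking flips back once both are far away. That is exactly the pattern of the slightly easier action always looking like a good idea right now.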
I do have some policies. For example, I do not drink alcohol. I don’t remember ever deciding on this policy; drinking just has zero appeal. There are apparently some good things about alcohol, and I understand that drinking a tiny bit will have no observable effects. But that is not relevant, because I Do Not Drink Alcohol. Another example: when Musk bought Twitter, I stopped using it. It did not feel like a choice; it felt like “shoot, I can’t use that now, that’s too bad.” Would there still be benefits to using Twitter? Absolutely. It seems like the whole machine learning community uses Twitter as their primary social network. I’d be better informed about work in my field. And it really is quite a lot of fun. But that is not relevant, because I Cannot Use Twitter now.
Both of these examples (and others that I’ve thought of) have two things in common: 1) the rule is about not doing something rather than doing something, and 2) the condition is extremely well-defined. A change I really want to make is something like “choose productive activities more often, and consumptive activities less often”. But I don’t want to totally stop consumptive activities. How much is enough?
I think it’s actually extremely common for people to have what I’m calling policies, and to use them as tools for living better lives. I suspect that one reason I don’t seem to know how to pick up this tool is that I’m in love with “reason”: I’ve spent all my life refining my ability to make good predictive models and to assess which actions and outcomes would be valuable. Since installing a policy is partly meant to keep you from re-thinking what to do each time, it feels somewhat “anti-reason”. But I do in fact endorse and understand the doubly-advanced version of decision theory, and so I would endorse installing more policies.
I just need to find where this tool’s handle is first, before I can pick it up and start using it.






