Curiosity in minds and machine learners

In the “classic” machine learning paradigm of supervised learning, there’s no role for curiosity. The goal of a supervised learning algorithm is simply to match a set of labels for a provided dataset as closely as possible. The task is clearly defined, and even if the algorithm were capable of wondering about the meaning of the data pouring into it, this wouldn’t help with the learning task at all.

For humans, on the other hand, curiosity seems to be an integral part of how we learn. Even before starting school, children are hard at work figuring out the world around them. They crave novelty and surprise, and delight in finding solutions to new challenges. Their choice of things to investigate or focus on is internally motivated: they target objects and skills that present just the right amount of challenge (too little is boring, too much is frustrating). Although adults might provide toys and encouragement, children seem to find exploring and gaining knowledge intrinsically rewarding. (This isn’t just limited to humans, either – for example, mice will endure an electric shock to explore a new environment.1) “Curiosity” is difficult to define precisely, but surely choosing to learn about something just for the sake of it gets at the heart of what it means to be curious.

It shouldn’t come as a surprise that humans and machine learning algorithms learn in very different ways. However, unlike in supervised learning, there does seem to be a role for curiosity in reinforcement learning (RL). In RL, an algorithm (also called an agent) must choose actions in an attempt to reach some particular goal state, where it receives a reward that allows it to learn which actions are beneficial. For simple goals, an RL agent may be able to stumble upon a solution simply by choosing random actions until it receives a reward for the first time. However, this strategy doesn’t work so well if the task demands a precise sequence of actions. For example, when playing a video game, many hazards may need to be avoided before any points can be scored.

In these cases, the RL agent has no idea what to do to reach the goal, and it might try random actions for years without success. However, the failures provide important information about what kinds of states don’t lead to a reward. This suggests a new strategy: systematically trying as many new things as possible, while keeping track of what has been tried so far. Various algorithms have been proposed to do this, and when implemented, the RL agent purposefully seeks out new situations, looking very much as if it is curious about its environment. One of these new situations will eventually be the goal, and the RL agent will successfully learn to complete the task.
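As a rough illustration of "keeping track of what has been tried so far," here is a minimal sketch of a count-based novelty bonus. The class name `CountBonus`, the `1/sqrt(count)` formula, and the room labels are assumptions chosen for illustration, not a description of any particular published algorithm.

```python
from collections import Counter
import math


class CountBonus:
    """Hypothetical helper: states visited less often earn a larger bonus."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()  # how many times each state has been seen

    def bonus(self, state):
        # Record the visit, then pay a bonus that shrinks with familiarity.
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])


explorer = CountBonus()
first = explorer.bonus("room_A")      # first visit: full bonus
second = explorer.bonus("room_A")     # repeat visit: smaller bonus
new_room = explorer.bonus("room_B")   # unseen state: full bonus again
```

An agent that adds this bonus to its reward is nudged toward states it has rarely visited, which is the systematic "try new things" behavior described above. Exact counts only make sense when states can be enumerated, which is one reason other approaches exist.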

The simplest way to make a “curious” RL agent is to have it try to predict what is going to happen next, and give it an “exploration bonus” when this prediction fails. Intuitively, the agent can make good predictions about situations it has encountered before, but not about novel ones. Therefore, it receives the exploration bonus for learning how to achieve new kinds of states. How well does this behavior reflect human-like curiosity? Well, the exploration bonus causes the agent to seek out novelty, just like a curious human. However, this version of curiosity seems strangely passive: anything that’s unpredictable is equally rewarding to the RL agent, and it has no concept of some kinds of novelty being more interesting or relevant than others. This turns out to be a very practical concern, because the RL agent will seek out any source of unpredictability in its environment, such as random TV static.
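The prediction-based bonus and its weakness can be sketched in a few lines. This is a toy under stated assumptions: observations are single numbers, the "model" is just a running average per state and action, and names such as `ForwardModel`, `"open_door"`, and `"watch_tv"` are hypothetical.

```python
import random


class ForwardModel:
    """Toy predictor: a running average of the next observation per (state, action)."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # (state, action) -> predicted next observation

    def bonus(self, state, action, next_obs):
        key = (state, action)
        guess = self.pred.get(key, 0.0)   # assumed initial guess of 0.0
        error = (next_obs - guess) ** 2   # exploration bonus = squared prediction error
        self.pred[key] = guess + self.lr * (next_obs - guess)  # learn from the outcome
        return error


rng = random.Random(0)
model = ForwardModel()

# A predictable event: the same outcome every time, so the bonus fades.
door_bonuses = [model.bonus("hall", "open_door", 1.0) for _ in range(20)]

# "TV static": the outcome is random, so the prediction never becomes reliable.
tv_bonuses = [model.bonus("lounge", "watch_tv", rng.uniform(-1, 1)) for _ in range(20)]
```

After a handful of repetitions the door stops paying out, while the static keeps generating prediction errors indefinitely. An agent maximizing this bonus would therefore prefer to keep watching the TV.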

Several methods proposed in recent years, such as Random Network Distillation (RND) and episodic curiosity, have solved the “TV static problem.” In fact, these kinds of approaches to exploration are so successful that they enabled RL agents to achieve superhuman performance on difficult Atari video games like Montezuma’s Revenge. However, they achieve this through algorithmic tricks that rule out randomness in the environment when calculating exploration bonuses. An RND agent still has no basis for deciding that some kinds of novelty are more important than others.2
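To give a feel for the idea behind RND, the sketch below uses a fixed random "target" function and a "predictor" trained to imitate it, with the imitation error serving as the bonus. It is a heavily simplified assumption-laden toy: both functions are linear rather than neural networks, observations are four-element lists, and all names are hypothetical.

```python
import random

rng = random.Random(0)
DIM = 4

# Fixed, randomly initialised target weights; these are never trained.
target_w = [rng.uniform(-1, 1) for _ in range(DIM)]
# Predictor weights start at zero and are trained to imitate the target.
pred_w = [0.0] * DIM


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def rnd_bonus(obs, lr=0.1):
    """Return the squared predictor-vs-target error, then take one gradient step."""
    err = dot(pred_w, obs) - dot(target_w, obs)
    for i, xi in enumerate(obs):
        pred_w[i] -= lr * err * xi  # gradient of 0.5 * err**2 w.r.t. pred_w[i]
    return err ** 2


familiar = [1.0, 0.0, 0.0, 0.0]
novel = [0.0, 0.0, 1.0, 0.0]

familiar_bonuses = [rnd_bonus(familiar) for _ in range(50)]  # bonus decays with repetition
novel_bonus = rnd_bonus(novel)                               # unseen observation still pays
```

Because the target is a deterministic function of the observation, the predictor can in principle match it exactly on anything it has seen often enough. Note that nothing in the bonus says which novel observations matter, which is the limitation raised above.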

This is important because, unlike in video games, the real world is incredibly complex. Predicting the consequence of every action, especially in the presence of active humans and a dynamic environment, is intractable. Humans deal with this in part by having relatively narrow interests and things that matter to us, either because they are extrinsically rewarding (e.g. income, food), or because we simply get curious about specific things. Exploration and play, especially in children, are almost certainly tied to overall learning, development, and gaining mastery in interacting with the world. But children learn actively; they choose their own problems to solve, mysteries to explore, and questions to try to find answers to. They introduce their own structure into a world that resists simple rules and explanations.

A related idea in machine learning is that of automatic curricula, where algorithms successively choose learning tasks that are neither too easy nor too difficult in order to smooth the learning process. Similarly, in active learning,3 algorithms seek out more information when they’re uncertain. But the truth is that we have very little idea why children (and humans in general) get curious about some things rather than others, even though this is probably crucial to how we learn to make sense of our highly complicated environments. Human curiosity has already proved a fruitful source of inspiration for machine learning; future insights from cognitive scientists about how and why we get curious are likely to propel further developments and even better algorithms.

A note of caution

As AI agents become more competent and have more influence over our lives, ensuring that their behavior is truly beneficial becomes increasingly important. Even when AI agents only optimize for a single human-provided objective, it has been well established that there could be serious unintended consequences (for example, since the AI has to be operational to complete its task, it’s incentivized to resist attempts to turn it off for almost any objective). From this view, the idea of AI agents actively exploring and choosing their own goals sounds especially risky.4 If this turns out to be helpful or even necessary for robust learning in complex environments, how could we ensure that the things an AI decides to attempt aren’t harmful? Learning more about how young children play and explore may help, but children have societal scaffolds (like being physically weaker than adults) that may not apply to robots and AI algorithms. Ensuring safety while supporting learning is an important topic for research.

A common misconception about the Chinese Room Argument

The “Chinese Room Argument” is one of the most famous bits of philosophy among computer scientists. Until recently, I thought the argument went something like this: Imagine a room containing a person with no Chinese language proficiency and a very sophisticated book of rules. Occasionally, someone slides a piece of paper with a sentence in Chinese written on it under the door. The room’s inhabitant (let’s call them Clerk) uses the rulebook to look up each Chinese character and pick out the appropriate characters to form a response, which they slide back under the door. Clerk has no idea what the characters or the sentences they form mean, but with a sufficiently sophisticated rulebook, it would look to outside observers like Clerk was conversant in written Chinese. The conclusion of the argument was that even if we were to create an AI system that could pass a Turing test in Chinese (or any other language), that wouldn’t be sufficient to conclude that it actually understands Chinese. Understanding, here, means something like conscious awareness of what the Chinese characters mean and what is being said with them.

It turns out that this conclusion is quite different from what Searle, who originally proposed the Chinese Room thought experiment, intended.1 Searle wasn’t trying to argue that consciousness was difficult or impossible to detect in machines; he was arguing that it is impossible for a digital computer to be conscious at all.2 To understand why, consider this version of the thought experiment: someone sends me a program that they claim passes the Turing test. I take the assembly code for this program, print it into a giant manual, and shut myself up in an unused basement in Cory Hall. When a piece of paper with some writing on it is slid under the door, I use the manual to pick the responses, just as Clerk did before. In this fashion, I essentially become the computer running the program. But just like Clerk, I’m not conscious of the meaning of the sentences I receive as input or the reasoning behind selecting one output over another. This means (according to Searle) that a regular computer running this code also couldn’t be conscious of these things, even if it does pass the Turing test. Therefore, no matter how sophisticated a program is, the computer running it won’t achieve consciousness. Searle sums up this viewpoint by saying: “Symbol shuffling… does not give any access to the meanings of the symbols.”

Since humans are conscious and do have access to meanings, Searle believed that there is something special about the brain over and above any digital computer. He is commonly quoted as saying “brains cause minds” (i.e. the “software” running on the brain doesn’t create a mind, at least not by itself – something about the physical brain itself is critical). This stronger conclusion is unsurprisingly not widely accepted among AI researchers, who generally believe that a digital computer (perhaps a very powerful one) running the right kind of software could achieve understanding and consciousness. 

Most philosophers also seem to object to the Chinese Room Argument. One criticism of Searle’s argument is so well-known that it gets its own name: the “systems response.” This argument accepts Searle’s assumption that Clerk wouldn’t understand Chinese simply by manipulating symbols, but notes that we can’t logically conclude from this that the system that includes both Clerk and the rulebook doesn’t understand Chinese. Searle appears to struggle to take this objection seriously – how could a rulebook understand anything?3

The systems response seems a little less absurd when we consider just how sophisticated the rulebook would have to be to pass a serious Turing test. Imagine a human judge asks a computer to explain a bad joke. The computer might respond that explaining jokes ruins them, but when pressed, give an account of the cultural context of the joke and why the punchline is amusing.4 A rulebook that could exhibit this kind of behavior would have to be unimaginably complex! The problem with the Chinese Room Argument is that it invites us to imagine manipulating symbols according to a (say) dictionary-size rulebook, then extrapolate our intuition about this scenario to the wondrously complex software that would be needed to exhibit a human-like mastery of language. If you seriously consider just how far this extrapolation needs to go, it’s reasonable to entertain serious doubts as to whether the simple dictionary-rulebook case tells us anything at all about a program that passes the Turing test.

While the systems response and other criticisms make it hard to put much stock in Searle’s conclusion that brains must have a special “consciousness sauce” missing in digital computers, these criticisms also don’t establish the other extreme, namely that a Turing test-passing program really would understand language (or be conscious). Therefore, the conclusion I’m left with is quite similar to my original misunderstanding of the Chinese Room Argument: computers may or may not achieve consciousness someday, but knowing for sure whether a future computer thinks or understands may not be possible.