Summary

What is a habit? One problem with the concept of habit has been that virtually everyone has their own ideas of what is meant by such a term. Whilst not eschewing folk psychology, it is useful to re-examine dictionary definitions of ‘habit’. The Oxford Dictionary of English defines habit as “a settled or regular tendency or practice, especially one that is hard to give up” and also “an automatic reaction to a specific situation”. The latter, reassuringly, is not too far from what has come to be known as stimulus–response theory.

Main Text

The stimulus–response habit is a very old concept deriving from Thorndike’s Law of Effect as a fundamental mechanism underpinning instrumental or ‘purposeful’ behavior, by which animals appear to gain mastery over their environment. The fundamental idea is that, through repetition and learning, environmental stimuli come to automatically elicit responses that are initially made spontaneously by the animal, and that this associative learning between stimulus and response is mechanistically bound together by the ‘reinforcing’ action of the significant consequences of that response, such as food for a hungry animal (Figure 1).



Figure 1. Habits and goal-directed actions.

(A) Example of a sequence of stimulus, response or action, and outcome. Someone sees a light switch, flips the light switch, and the light comes on. (B) Goal-directed actions are performed in order to obtain the future outcome. If the outcome (light coming on) becomes less valuable — for example, if the environment is already well illuminated (outcome devaluation) — the action will not be performed. Similarly, if the contingency between the action (flipping the switch) and the outcome (light on) is reduced — for example, an automatic switch is installed and the light comes on in the absence of the action (contingency degradation) — the action will not be performed. (C) Habits are elicited by antecedent stimuli and not performed to obtain future outcomes. Therefore, the response (flipping the switch) will be performed upon sight of the switch (stimulus), even if the outcome has been devalued or the relation between action and outcome degraded.

Note that this learning is therefore somewhat distinct from Pavlovian conditioning, in which responses are not spontaneous, but are also automatically elicited by stimuli predictive of reinforcers, including the environmental context. It is also distinct from goal-directed learning, by which animals learn directly that their actions have valuable consequences, and hence have knowledge of instrumental contingencies of specific outcomes and their current motivational values for the animal. The latter is termed goal-directed behavior (or action–outcome learning). Behavioral output overall is presumably usually an optimal combination of these two controlling systems, which are coordinated in certain circumstances, but can be placed into competition in others.

In this Primer, we shall explore how this dual system for action control has been developed to explain behavior and how the habit construct itself is related to other commonly used notions of learned behavior, such as skills. We will also discuss where the concept of habit may lead us in future studies relevant for understanding learning systems in the brain and their relevance to healthy as well as pathological behavior.

Defining habits

Following their introduction via the behaviorism movement, habits came back into mainstream thinking upon the division of the memory systems into declarative and procedural forms by Squire and others. Habits were dumped into the procedural memory category along with skills, emotional (Pavlovian) conditioning and perceptual priming, although the polyglot nature of this waste-basket category was later recognised alternatively as ‘non-declarative memory’. Nevertheless, the notion of ‘what to do’, rather than ‘what is the fact’, is clearly relevant to such notions as habits and skills, and raises the question of their relationship. Are skills well-practiced habits, or are they different forms of instrumental behavior?

By definition, habits are representations of stimulus–response links that do not refer to goals, and are in a sense directly elicited by the environmental states or stimuli or contexts. On the other hand, tying one’s shoe-laces cannot be classified as a habit (unless it begins to occur when they are already well laced!) but clearly involves skillful components. Acquiring skills, or perfecting ‘how to do’ an action usually implies increasing accuracy and precision of a difficult movement; however, very precise movements can be elicited by antecedent stimuli or states, or motivated by the relation to future outcomes. This implies that extended training, which is necessary in the mastering of complex skills, does not necessarily render behavior habitual (which we will discuss further below).

The notion of habits refers not so much to ‘how the behavior is performed’ but to ‘which stimuli elicit the behavior’. Both notions call upon the concept of automaticity, but on different aspects of the behavior: how it is executed versus what triggers it or the motivational reasons why the behavior is performed. Skilled behavior, whilst automatic, can clearly be goal-directed, whereas habitual behavior is apparently much less so; thus it is also necessary to re-examine the dichotomy between habits and goal-directed behavior.

The psychological dichotomy between stimulus–response and goal-directed learning may suggest the existence of different controllers of action (or at least different neural circuits controlling the initiation of the action), rooted in distinct neural systems, and potentially relevant to a host of mental health disorders ranging from addiction (compulsive drug seeking) to depression (ruminative thoughts or mental acts), Parkinson’s disease (loss of the initiation of voluntary behavior) and Tourette’s syndrome (many tics are essentially motor stereotypies). Given that identical movements can be under more habitual or goal-directed control, however, another hypothesis would be that both control systems eventually converge into the same controller for execution of a specific action. The idea is that, although habits are thought, evolutionarily speaking, to be a useful mechanism for freeing up cortical processing time for novel and important situations, their top-down control and orchestration with the goal-directed system is an important task with serious implications when it fails.

Diagnosing habits

Dickinson and others operationalized the diagnosis of habits in terms of the relationship with the goal. Habit-controlled behavior should be impervious to goal devaluation (for example, in the case of food, by poisoning or satiety associated with specific flavors); in other words, it proceeds regardless. This works well for food across species, but has proven difficult for other goals, such as drugs or electric shock avoidance. For humans, many goals are in fact conditioned or secondary reinforcers, such as money and reward points, and their devaluation requires cognitive processing. Thus, goals can be devalued by instruction, with fewer, or even penalty, points for a goal replacing its previous high point status. The other major manipulation for detecting habits is contingency degradation, whereby the connection between an action and an outcome is degraded; again, if the behavior occurs regardless then it is presumably in habit mode (Figure 1).
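The logic of these two diagnostic tests can be sketched with a toy model, using the light-switch example of Figure 1. This is only an illustration under our own assumptions: contingency is summarized by the standard difference P(outcome | action) − P(outcome | no action), the goal-directed responder acts only when outcome value multiplied by contingency is positive, and the habitual responder acts whenever the stimulus is present and the stimulus–response link is strong. All function names and thresholds are hypothetical.

```python
def delta_p(p_outcome_given_action, p_outcome_given_no_action):
    """Action-outcome contingency as a difference of conditional probabilities."""
    return p_outcome_given_action - p_outcome_given_no_action


def goal_directed_responds(outcome_value, contingency):
    """Toy goal-directed rule: act only if the action still earns something of value."""
    return outcome_value * contingency > 0.0


def habitual_responds(sr_strength, stimulus_present, threshold=0.5):
    """Toy habit rule: act whenever the stimulus is present and the S-R link is strong."""
    return bool(stimulus_present and sr_strength > threshold)


if __name__ == "__main__":
    conditions = {
        # name: (outcome value, P(light | flip), P(light | no flip))
        "baseline": (1.0, 1.0, 0.0),
        "outcome devaluation": (0.0, 1.0, 0.0),     # room is already bright
        "contingency degradation": (1.0, 1.0, 1.0),  # automatic switch installed
    }
    for name, (value, p_act, p_no_act) in conditions.items():
        contingency = delta_p(p_act, p_no_act)
        print(
            name,
            "| goal-directed:", goal_directed_responds(value, contingency),
            "| habitual:", habitual_responds(sr_strength=0.9, stimulus_present=True),
        )
```

In this sketch the goal-directed responder stops flipping the switch under either manipulation, whereas the habitual responder continues in all three conditions, which is the behavioral signature the tests are designed to detect.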

One prominent brain imaging study has investigated habits trained extensively outside the scanner and then degraded in this way. It revealed two main systems: one encoding action–outcome relations, and the other encoding stimulus–response or context–response associations, which depend on anterior and posterior portions of the caudate nucleus, respectively. The net computation of instrumental contingency occurs in a neural network that includes the parietal cortex.

Other neuroimaging studies in humans have confirmed that the subjective appraisal of contingencies is associated with activity in the inferior and superior parietal lobule and medial frontal gyrus. Research in rodents and non-human primates also implicates such regions as the perigenual anterior cingulate cortex and prelimbic cortex in the detection of instrumental contingency. It is also important to realise that reward devaluation and contingency degradation may well exert their effects via different inputs to the goal-directed system; for example, a recent study by Balleine’s group has shown that medial orbital lesions in the rat impair reward devaluation but not contingency degradation.

These manipulations are clearly relevant to the dictionary definition of ‘automaticity’, although the behavior might be better described as autonomous from the goal (that is, independently controlled), with automaticity being reserved for skilled behaviors. The question then arises of whether habits are controlled by their own outcomes, and thus self-reinforcing. Note that a common assumption is that there is competition between the two systems: that if the behavior is not under goal control then it must be a habit, and that this is a zero-sum game. As mentioned above, however, these systems must frequently be coordinated in a more harmonious relationship to generate adaptive behavior and so we question this assumption.

Similar issues have arisen recently with respect to Kahneman’s ‘System 1’ versus ‘System 2’ behavior: the associated ‘explicit’ versus ‘implicit’ processing dichotomy, which bears on the issue of conscious perception of goal-directed versus habitual behavior, and the alternative neurocomputational approach to goal and habit learning through the model-based/model-free dichotomy introduced by Daw and Dayan. The latter paradigm depends on a two-stage decision process that reveals human choice to depend in part on a hypothetical ‘model’ (or ‘cognitive map’) of the task that allows for considerable flexibility of responding to achieve optimal outcomes. The alternative strategy is to respond simply on the basis of Thorndike’s Law of Effect, that is, to repeat responses that were successful on the last trial (‘win–stay’) and to switch away from responses that were unsuccessful (‘lose–shift’). Computational modeling reveals that humans adopt a mixture of these two strategies, the relative proportion of which can be represented by a single weighting parameter. The paradigm has been used to show imbalances towards habitual responding in several psychiatric disorders including obsessive-compulsive disorder, drug addiction and binge eating.
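The mixture of strategies described above is commonly written as a weighted sum of the action values computed by the two systems. The following is a minimal Python sketch under that assumption; the symbol `w`, the function names and the numeric values are illustrative choices of ours rather than details taken from the studies cited, and the win–stay/lose–shift rule is shown only for the two-choice case.

```python
import math


def hybrid_values(q_model_based, q_model_free, w):
    """Mix the two value estimates; w = 1 is purely model-based, w = 0 purely model-free."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_model_based, q_model_free)]


def softmax(values, beta=1.0):
    """Convert action values into choice probabilities (beta sets choice sharpness)."""
    highest = max(values)
    exps = [math.exp(beta * (v - highest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def win_stay_lose_shift(last_choice, last_rewarded):
    """Two-choice Law-of-Effect heuristic: repeat a rewarded choice, otherwise switch."""
    return last_choice if last_rewarded else 1 - last_choice


if __name__ == "__main__":
    q_mb = [0.8, 0.2]  # hypothetical model-based values for two actions
    q_mf = [0.3, 0.7]  # hypothetical model-free values for the same actions
    for w in (0.0, 0.5, 1.0):
        print(w, softmax(hybrid_values(q_mb, q_mf, w), beta=3.0))
```

Fitting a weight of this kind to each participant's choices gives a single number for the balance between the two strategies, which is what allows group comparisons of the sort mentioned above.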

Recently, the possible mapping of ‘model-based’ behavior and ‘model-free’ behavior to the dichotomy between goal-directed behavior and habits has been investigated by testing the effect of goal devaluation on this balance. Intriguingly, whilst goal devaluation reduces model-based behavior it does not enhance model-free responding. Thus, this is an obvious instance in which the reciprocity between the two systems of behavioral control is not found. Additionally, it poses the problem that lack of devaluation can hardly be used to diagnose ‘habits’. In other words, lack of evidence for goal-directed behavior should not be taken as evidence of habitual control. Moreover, there is a further controversy that ‘model-free’ behavior does not simply connote ‘habit-based’ behavior, but reinforcement learning (prediction error learning) in general.

Another behavioral task used for measuring goal-directed behavior and habits is the so-called Fabulous Fruit Game, in which human participants learn which button to press in the presence of pictures of fruits to obtain rewarding (points-scoring) pictures of other fruits. A number of such associations are trained concurrently before an ‘instructed’ devaluation is provided that makes some of the outcome fruits point-deducting penalties rather than point-scoring rewards. Goal-directed knowledge is then measured in a number of ways, either directly (using behavioral choice or questionnaires), or indirectly by presenting the stimuli very rapidly and measuring whether the volunteers respond appropriately or not, in a ‘Go-NoGo’ paradigm. This may lead to ‘slips of action’ as a consequence of impaired inhibition of responding for devalued fruit rewards. Such behavioral disinhibition in response to cues that lead to devalued rewarding fruits can be taken as reflecting an impairment in goal-directed behavior and a shift in balance to habitual responding, elicited by the specific stimuli, and under time-pressured, stressful circumstances that can be assumed to promote habitual behaviors through the weakening of executive control. Possible problems in working memory or response inhibition can be excluded by using a version of the test in which the eliciting stimuli rather than their associated outcomes are ‘devalued’.

This paradigm, or variants of it, has been used to quantify a presumed transition to habitual responding in several psychiatric disorders, including, for example, obsessive-compulsive disorder and stimulant drug addiction. In the case of obsessive-compulsive disorder, however, it was found that the deficit in the ‘slips of action’ task was related more to impairments in action–outcome knowledge than to altered stimulus–response memory (remembering which responses to make to which stimuli). Therefore, the demonstration of habit learning is again indirect, depending on a failure of value-updating rather than a strengthening of stimulus–response links. These studies again suggest that lack of evidence for goal-directed behavior should not necessarily be taken as evidence for habit.

A related issue is that the demonstration of an habitual tendency using a laboratory paradigm for one type of goal does not necessarily indicate that habits dominate over all forms of goal-directed behavior. A more conservative interpretation is that the number of goals to which the individual’s behavior is directed has been narrowed or restricted; in fact, this may be characteristic of some forms of compulsive behavior, as in addiction, where there may still be one overarching goal, or this could also be a stage in addiction before even drug-related behavior becomes habitual. This may also be true for appetitive versus aversive ‘goals’; it may be possible for avoidance behavior to be goal-directed in an individual, whereas their appetitive behavior is habitual. This has been observed to occur, for example, after dietary tryptophan depletion in humans, which results in transient reductions in 5-HT function. The precise balance between goal-directed behavior and habitual tendency probably has to be measured for each goal.

Other ways of distinguishing habits depend on those factors during training that promote the tendency to rely on stimulus–response rather than outcome associations to guide behavior.

Extended training, practice and schedules of reinforcement

One of the earliest demonstrations of habit learning by Adams showed that the duration of training significantly determined whether rats exhibited a devaluation of outcome effect (produced by lithium chloride poisoning) in a food-reinforced operant lever-pressing task. Longer training produced less of a devaluation effect and hence, by inference, greater expression of habit learning. This and other studies have assumed that action–outcome and habit learning proceed more or less in parallel, but that habit learning eventually comes to dominate behavioral expression. That is not to suggest that goal-directed behavior necessarily is completely absent, as a number of experiments using lesion or optogenetic interventions have shown; a rat expressing habitual tendencies can be rendered goal-directed, for example, by suppressing the output of the infralimbic cortex, or by lesioning the dorsolateral striatum after training.

It has proven considerably more difficult, however, to demonstrate this necessity for over-training in human volunteers. A recent, as yet unpublished study by Gillan, de Wit and others (Claire M. Gillan and Sanne de Wit, personal communication) has shown in five separate experiments, entailing three different learning procedures, including avoidance learning and the Fabulous Fruit Game, that extended training does not significantly enhance habit. This study includes two attempted replications of the report by Tricomi et al. (2009) that extended training of an operant appetitive response led to habit formation in humans. In the Tricomi et al. study, however, participants were tested in a magnetic resonance imaging setting that may have been stressful (stress appearing to tip the balance towards more autonomous behavior and habit learning in animals and humans). Moreover, in a shock avoidance study by Gillan et al., whilst extended training in healthy human volunteers did not lead to habit learning, this did appear to be the case in patients with obsessive-compulsive disorder, possibly because of their greater state of stress.

One potential implication could be that the human paradigms are not measuring habits, whereas the rodent ones are. Possible reasons for the discrepancy are not obvious. They seem unlikely to rest simply on the fact that the rat paradigm utilizes only one instrumental response and therefore involves hypothetically less decision-making or choice processing that is antithetical to habit learning. The experimental findings on habits also appear to run against the general principle that ‘practice makes perfect’ in skill learning, as supported by a plethora of evidence.

Another consideration is that it is the nature, rather than the extent, of training that underpins habit learning. Dickinson and colleagues addressed this issue by examining the patterning of reinforcing feedback during learning with respect to behavior, according to schedules of reinforcement. They discovered that schedules in which rewards were presented on the basis of the relationship of responding to elapsed time were more prone to lead to habitual responding than schedules where the sheer number of responses made was the deciding factor for reinforcement (fixed or random ratio schedules). Such time-based schedules include a fixed interval schedule, whereby the first response after some period such as one minute is rewarded, and a variable or random interval schedule, where the reward is unpredictable but becomes available, say, after one minute on average. This is a commonly used experimental maneuver for generating behavior characterized by habits or goal-directed behavior, as assessed using devaluation. Typically, in these schedules, the number of reinforcers obtained, and hence the amount of training, are matched.

Even so, which factor(s) distinguishing these schedules is responsible is not totally clear. One possibility is the ‘strength of conditioning’ in terms of matching rates of responding to rates of outcomes. Another might relate to the well-known advantage for skill learning of spaced over massed practice; spaced practice possibly being superior for habit learning. Yet another potential reason is the certainty versus uncertainty of the relation between actions and outcomes under different schedules of reinforcement. Interval schedules matched for the average spacing of responding produce rather different biases towards habits depending on the predictability of the outcome by the response.
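One way to see how ratio and interval schedules differ in the relation between response rate and outcome rate is a small calculation. The sketch below rests on simplifying assumptions of our own: responses occur at random at a constant average rate, a random ratio schedule rewards each response with a fixed probability, and a random interval schedule arms a reward at a constant average rate and holds it until the next response collects it. The function names and parameter values are hypothetical.

```python
def ratio_reward_rate(response_rate, reward_prob):
    """Random ratio: every response has the same chance of reward,
    so reward rate grows in proportion to response rate."""
    return response_rate * reward_prob


def interval_reward_rate(response_rate, arming_rate):
    """Random interval: each reward needs a wait for arming (mean 1/arming_rate)
    and then a wait for the next response (mean 1/response_rate)."""
    if response_rate <= 0.0:
        return 0.0
    return 1.0 / (1.0 / arming_rate + 1.0 / response_rate)


if __name__ == "__main__":
    # Rates are per minute: a 1-in-10 ratio schedule and a one-minute interval schedule.
    for responses_per_min in (5.0, 10.0, 20.0, 40.0):
        print(
            responses_per_min,
            round(ratio_reward_rate(responses_per_min, reward_prob=0.1), 3),
            round(interval_reward_rate(responses_per_min, arming_rate=1.0), 3),
        )
```

Under these assumptions, doubling the response rate from 20 to 40 per minute doubles the reward rate on the ratio schedule, but on the interval schedule it raises the reward rate only from about 0.95 to about 0.98 per minute. On interval schedules the outcome rate is therefore only weakly tied to how much the animal responds, which is one candidate for the weaker action–outcome relation discussed above.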

Habits, routines and skills

Although habits have often been measured as relatively simple responses such as operant lever pressing, it is also clear that long sequences of behavior can be regarded as habitual — for example, taking the customary route to work when one should be going to the airport. This leads us to the issue of how individual responses are ‘chunked’ together to form long sequences of actions — an important issue, not only for learning goal-directed sequences to obtain a specific goal (for example to get to the airport) but also for skills such as playing the piano. To what extent are such sequences or routines a blend of ‘goal-directed’, ‘habitual’ and ‘skilled’ learning? One hypothesis during training is that goal-directed responses are more likely to predominate at the beginning of the sequence, whereas habitual and skilled responses (for example, consummatory behavior) may perhaps come to terminate it.

In sequences of actions, or routines, it can be the performance of one action or response that facilitates the next (presumably via proprioceptive or kinaesthetic feedback in lieu of environmental stimuli), and not necessarily the future outcomes. The possibility then arises that this feedback itself acquires reinforcing properties by being associated with the original goal; the behavior then almost becomes an end in itself (Figure 3).

Both habits and skills are supposed to require extensive training, so that length of training per se need not relate solely to habits. Skills may take even more training than habits, but this does not mean that skills are simply ‘polished’ habits. Skills, of course, generally require the coordination and optimal performance of many responses in sequence. How are such sequences learned and performed? Studies from the Graybiel and Costa laboratories have described how behavioral units are clustered or chunked together into integrated behavioral sequences that are coded by striatal neuronal activity. Lashley originally pointed out that the problem posed by such serial order of behavior was that eventually the behavioral sequence is performed too rapidly to allow each response element in the sequence to be cued by the feedback of the preceding one, thus necessitating a central control or motor program for the sequence as a whole. The implications of such sequential habitual behavior are difficult for a simple reinforcement learning or ‘model free’ learning system to encompass. A motor sequence only receives reinforcement at its completion, although reinforcement learning presumably still operates on the basis of temporal credit assignment.
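The credit assignment problem for a sequence that is reinforced only at its completion can be illustrated with temporal-difference learning, one standard form of reinforcement learning. This is a sketch under our own assumptions, not a model proposed in the studies cited: the sequence is a fixed chain of steps, a reward of 1 arrives only after the last step, and the learning rate `alpha` and discount factor `gamma` are arbitrary illustrative values.

```python
def td_chain_values(n_steps, episodes, alpha=0.5, gamma=0.9):
    """Learn a value for each step of a fixed sequence with TD(0).

    Only the final step is followed by reward, so earlier steps gain value
    solely through the value already learned for the step that follows them.
    """
    values = [0.0] * n_steps
    for _ in range(episodes):
        for step in range(n_steps):
            is_last = step == n_steps - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[step + 1]
            prediction_error = reward + gamma * next_value - values[step]
            values[step] += alpha * prediction_error
    return values


if __name__ == "__main__":
    for episodes in (1, 2, 5, 50):
        print(episodes, [round(v, 3) for v in td_chain_values(4, episodes)])
```

After a single run through the sequence only the final step has acquired any value; with repetition, value spreads backwards one step at a time towards the start of the sequence. This slow, step-by-step propagation is one reason why long, rapidly executed sequences are awkward for a simple learning system of this kind.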

Ultimately, however, the sensory triggers or context for each element in the sequence, including its associated conditioned reinforcers, proprioceptive and kinaesthetic feedback, are incorporated with training into the motor control program. If the existence of chunking and the organization of different motor sequences so characteristic of skills is consistent with the general notion of hierarchical control in the motor system, it is also consistent with the more general case of executive control over behavior. This theory thus determines how behavioral output is selected among different motor programs and novel goal-directed sequences in everyday life. Useful advances have recently been made in this area, which may also have implications for behavioral interventional treatments for habit-dominated behavior, as occurs in compulsions linked to food, drugs or harms.

A relatively recent development has been to consider the role of habits in everyday life via self-reported questionnaires. Recently, Ersche et al. (2017) have produced a 27-item ‘Creature of Habit’ on-line questionnaire to quantify individual differences in habitual tendencies. Factor analysis revealed two separate aspects: automaticity, which captures some of the autonomous nature of habitual behavior described above; and routine, which refers to the regularity in scheduling of the behavior, such as always brushing your teeth before going to bed (Figure 2). Routines have previously been characterised in the context of a cognitive supervisory attentional system, the ‘attention to action’ concept of Shallice and Norman, in which schemas are subject to a contention scheduling process which again captures well the inherently hierarchical nature of motor control. Thus, a routine represents some of what is normally meant by habit but on the other hand can also be a component of an over-arching sequence of goal-directed behavior.



Figure 2. The Creature of Habit Questionnaire reveals habits in everyday life.

The nodes represent the individual items of the questionnaire. The thickness of lines connecting nodes is proportional to the size of the corresponding correlation. The colors represent groupings of items of a subscale, as identified by Mokken Scale Analysis. Examples of routines and automaticities are given (adapted from Ersche et al. 2017 and reproduced by permission of author and publisher).

An important aim for future research in this area will be to characterize the executive processes achieving this optimal control of behavioral output. So it is conceivable in this notion of hierarchical control that the initiation of a sequence or chunk be goal-directed (driving the implementation of the whole sequence), but the execution of the sequence once it starts is effected by habit — one element eliciting the next element even if the goal changes (Figure 3).



Figure 3. Sequences can become habits or routines.

(A) Example of a situation where a sequence of actions leads to the ultimate outcome. Upon sight of a dirty hand (stimulus), the faucet is turned on, the hands are washed and then dried, resulting in clean hands. (B) The sequence of actions can become automatized, with one action eliciting the next, and the entire sequence can become a response to the stimulus (represented hierarchically or not). This automatization makes it more difficult to stop the sequence in the middle, or to adapt to circumstances in which one of the elements in the sequence is not necessary — for example, if the faucet has a motion sensor. (C) The routine of washing hands can become reinforcing and a goal in itself. In this case, the completion of the routine — that is, achieving the last step in the sequence — may reinforce or drive the entire response.

Implications for neural systems of goal-directed and habitual control

There is a wealth of data both from animals and humans relevant to the neural substrates of goal-directed behavior and habits, which we will not be able to review in detail here. Many of the original discoveries related to the existence of the neural substrates of two relatively independent, albeit interacting, systems for goal-directed behavior and stimulus-response habits which depend on distinct cortical-striatal ‘loops’, the evidence for which came mainly from lesion studies in experimental animals and functional imaging studies in humans. Most evidence points to the notion that associative cortical-striatal ‘loops’ and their modulatory inputs are important for goal-directed behavior, while sensorimotor loops and their modulatory inputs are critical for habit formation.

A promising new direction may be to define distinctive neurochemical substrates, such as endocannabinoids, and mechanisms, such as long-term potentiation and long-term depression, in specific projection circuits, for these different types of learning and control. This may have important therapeutic implications for those disorders impinging on the goal-directed/habit learning dichotomy, as well as for rehabilitation.

Particular inputs to these loops, including from dopamine, the amygdala and infralimbic cortex, are known to play important roles in the control of this cortico-striatal circuitry. Neural and behavioral evidence has shown that the notion that goal-directed behavior ‘disappears’ with training into habitual control is almost certainly incorrect: goal-directed control can be reinstated by manipulations of the infralimbic cortex, or by lesions of the dorsolateral striatum, after habitual control has been established. Skill learning also implicates some of the same sensorimotor cortical-striatal ‘loops’ that underlie habit formation. But these two aspects can be dissociated, as mechanisms that are implicated in skill learning are not necessarily critical for habitual learning. Furthermore, skill learning traditionally implicates other systems, such as cortico-cerebellar circuits. The processes of skill learning and control have not been resolved with respect to the goal-directed/habit dichotomy, although being consciously goal-directed is well-known to have disastrous effects on skilled performance.

Future perspectives

The evidence for relatively independent, competitive neural systems controlling goal-directed behavior and habits may eventually have to be re-appraised in the light of the new findings and ideas described here and elsewhere. Thus, for example, completely separate neural system representations of ‘model-based’ and ‘model-free’ learning have not thus far been clearly revealed by human fMRI studies. Some of the main issues for future research on habits will be how executive control is devolved among structures during behavior and how flexible (or plastic) top-down control can avoid competition between the goal-directed and habit systems, to promote their optimal cooperation and integration in determining successful behavioral outputs.


Taken from: https://www.sciencedirect.com/science/article/pii/S0960982217312587#fig1

Last modified: Wednesday, 21 June 2023, 13:05