Evolution of Attention in Psychology

Attention in Psychology: I. Historical Background
• Attention was one of the first concepts to appear in psychology texts (ca. 1730), and it was treated in later texts as well, e.g., by Ebbinghaus, Titchener, …
• Early discussions (Hatfield, 1998) focused on properties such as:
  • Narrowing (Aristotle, 4th century BC)
  • Active directing (Lucretius, 1st century BC)
  • Involuntary shifts (Augustine of Hippo, ca. 400 AD)
  • Clarity (Buridan, 14th century)
  • Fixation over time (Descartes, 17th century)
  • Effector sensitivity (Descartes)
  • All of the above phenomena (William James, 1890)
• More recent studies have been concerned with:
  • The view of attention as selection
  • The analysis of attention as a process of resource allocation
  • The relation between voluntary and involuntary control of attention

Attention as Selection
We will concentrate on the selection or filtering aspects of attention. We will ask:
• Why do we need to select anyway?
• Because our processing capacity is limited?
• The Big Question: In what way is it limited? (Miller, 1957)
• We will return to this core question after some preliminaries on the early study of attention as selection and the filter theory.
• On what basis do we select? Some alternatives:
  • We select according to what is important to us (e.g., affordances)
  • We select what can be described physically (i.e., “channels”)
  • We select based on what can be encoded without accessing LTM
  • We “pick out” things to which we subsequently attach concepts, i.e., we pick out objects (or regions?)
• What happens to what we have not selected? This is largely an unsolved mystery, though in some cases there are plausible answers.

Big Question #1: Why do we need to select information?
Along which dimensions is human information-processing capacity limited?
• Channel capacity: the Shannon-Hartley theorem
• Capacity measured in some sort of “chunks” (Miller)
• Capacity measured as the number of arguments that can be simultaneously bound to cognitive routines (Newell)
• To what things in the world can the arguments of visual predicates be bound?
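The Shannon-Hartley theorem gives the capacity of a noisy analog channel as $C = B \log_2(1 + S/N)$ bits per second. Here $B$ is the bandwidth in hertz and $S/N$ is the linear signal-to-noise power ratio. The sketch below only illustrates that formula. The bandwidth and SNR values are arbitrary and are not measurements of any sensory channel.

```python
import math


def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity in bits per second: C = B * log2(1 + S/N)."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


# Arbitrary illustration: a 3000 Hz channel at 30 dB SNR (a power ratio of 1000).
capacity = shannon_hartley_capacity(3000.0, db_to_linear(30.0))
print(f"{capacity:.0f} bits/s")
```

Capacity grows only logarithmically with signal quality, so it is fixed by the physical channel. This is the sense of "limited" that the chunk-based and argument-binding measures above depart from.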

Early studies: Colin Cherry’s “Cocktail Party Problem”
• What determines how well you can select one conversation among several? Why are we so good at it?
• The more controlled version of this study used dichotic presentation, with one “channel” per ear.
• Cherry found that almost nothing is noticed in the “rejected” ear when attention is fully occupied in selecting information from the other ear (through the “shadowing” task). Subjects noticed little more than whether the rejected message was speech at all.
• More careful observations show that this is not quite true:
  • A change in spectral properties (pitch) is noticed
  • You are likely to notice your own name being spoken
  • Even meaning is extracted, as shown by involuntary ear switching and by the disambiguating effect of the rejected channel’s content

Visual analogues illustrating the two-channel selection problem
In these examples you are to read only the text in shadows and ignore the rest. Read as quickly as you can, and when you are finished, close your eyes or look away from the text.

Visual analogue #1 illustrating the two-channel selection problem
In performing an experiment like this one on man attention car it house is boy critically hat important she that candy the old material horse that tree is pen being phone read cow by book the hot subject tape for pin the stand relevant view task sky be read cohesive man and car grammatically house complete boy but hat without shoe either candy being horse so tree easy pen that phone full cow attention book is hot not tape required pin in stand order view to sky read red it nor too difficult.

Visual analogue #2 illustrating the two-channel selection problem
It is important that the subject man be car pushed slightly boy beyond hat his normal limits horse of tree competence pen for be only in phone this cow way book can hot one tape be pin certain stand that snaps he with is his paying teeth attention in to the the empty relevant air task and hat minimal shoe attention candy to horse the tree second or peripheral task.

Broadbent’s Filter Theory
[Figure: Broadbent’s filter model. Boxes are labeled Senses, Very Short Term Store, Filter, Limited Capacity Channel, Store of conditional probabilities of past events (in LTM), Motor planner, and Effectors, plus a Rehearsal loop.]
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
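Below is a toy sketch of the early-selection idea in Broadbent's diagram. Selection happens on a physical property (here, which ear a message arrives at) before any word is identified. As a result, nothing about the rejected message's meaning can reach later stages. The `Message` type and both function names are hypothetical illustrations, not part of Broadbent's model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    ear: str          # physical channel: "left" or "right"
    words: list[str]  # content, available only after identification


def selective_filter(messages: list[Message], attended_ear: str) -> list[Message]:
    """Early selection: keep only messages arriving on the attended physical channel."""
    return [m for m in messages if m.ear == attended_ear]


def identify(messages: list[Message]) -> list[str]:
    """Stand-in for the limited-capacity channel, where word identity is computed."""
    return [w for m in messages for w in m.words]


inputs = [
    Message("left", ["i", "will", "go", "down", "to", "the", "bank"]),
    Message("right", ["the", "river", "overflowed"]),
]
print(identify(selective_filter(inputs, attended_ear="left")))
# ['i', 'will', 'go', 'down', 'to', 'the', 'bank']
```

Under a strict filter like this, the rejected ear contributes nothing to identification. That is the prediction the cocktail-party findings above contradict, for example noticing your own name.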

Problems with the Filter Theory
• The filter “leaks.” Work by Treisman, Lackner, and many others shows that the filter could not be eliminating parts of the input on the basis of a physically defined channel. The properties on the basis of which the input is filtered require a high level of processing (e.g., determination of meaning). Consequently, such information must have gotten through the filter!
• Many solutions to this conundrum have been proposed. They range from replacing the filter with an attenuator to various complex (and highly incomplete) proposals such as those of Deutsch & Deutsch (1963), Norman (1968), Morton (1969), and Neisser (1967). None of them is satisfactory, but each embodies some ideas that may be part of the story.
• What all these alternatives do is assume that the filter is responsive to top-down expectancy and prediction effects. But the evidence is against this sort of knowledge-based selection as a general property of perception (Pylyshyn, 1999), although it is possible within such modular domains as language processing.
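The attenuator proposal mentioned above (associated with Treisman) can be sketched in a few lines. The gain and threshold numbers are invented purely to show the logic. An attenuated channel still delivers a weakened signal, so words with unusually low recognition thresholds, such as one's own name, can break through.

```python
def recognized_words(words, channel_gain, thresholds, default_threshold=0.8):
    """Return the words whose attenuated signal reaches their recognition threshold.

    Each word's signal strength is taken to be the channel gain: 1.0 on the attended
    channel, something below 1.0 on an attenuated (rejected) channel. All threshold
    values are hypothetical.
    """
    return [w for w in words if channel_gain >= thresholds.get(w, default_threshold)]


# Hypothetical thresholds: the listener's own name is assumed to need far less signal.
thresholds = {"alice": 0.2}
rejected_message = ["the", "meeting", "with", "alice", "moved"]

print(recognized_words(rejected_message, channel_gain=0.3, thresholds=thresholds))  # ['alice']
print(recognized_words(rejected_message, channel_gain=1.0, thresholds=thresholds))  # all five words
```

The thresholds depend on what the listener knows and expects. That is exactly the kind of knowledge-based selection the last bullet above questions as a general property of perception.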

Stroop Effect
Baseline: Name the colors of the ink

Stroop Effect in Portuguese
Name the colors of the ink
VERMELHO VERDE AZUL MARROM ROSA ALARANJADO VERDE ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM VERMELHO AZUL MARROM VERDE VERMELHO ALARANJADO VERMELHO AZUL AMARELO ROSA ALARANJADO
VERDE AZUL MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO ROSA ALARANJADO VERDE AZUL
MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM VERMELHO AZUL MARROM VERDE AMARELO VERDE AMARELO VERMELHO ROSA ALARANJADO VERDE VERMELHO AZUL MARROM VERDE VERMELHO ALARANJADO VERMELHO AZUL

Stroop Effect in English
Name the colors of the ink
RED GREEN BLUE PINK BROWN ORANGE GREEN PINK RED YELLOW GREEN YELLOW RED BROWN RED BLUE BROWN GREEN RED ORANGE RED BLUE YELLOW PINK ORANGE
GREEN BLUE BROWN PINK RED YELLOW GREEN YELLOW RED BROWN PINK RED YELLOW GREEN YELLOW RED PINK ORANGE GREEN BLUE
BROWN PINK RED YELLOW GREEN YELLOW RED BROWN RED BLUE GREEN BROWN YELLOW GREEN YELLOW RED PINK ORANGE GREEN RED BLUE BROWN GREEN RED ORANGE RED BLUE YELLOW YELLOW GREEN YELLOW RED BROWN PINK RED YELLOW GREEN PINK RED YELLOW

The degree of interference with the attended message, as well as its interpretation, shows that the rejected message was understood
• Moral: although the rejected channel appears to be rejected, it is processed enough for its words to be understood!
• The semantic interpretation of the attended message depends on the meaning of the rejected message. Subjects were asked to paraphrase the attended message, in which “bank” is ambiguous between a financial institution and a riverbank:
  • Channel 1 (attended): “I think I will go down to the bank but I will be back for dinner”
  • Channel 2 (rejected): “The election results will depend on the value of the dollar against the Euro and on the state of the domestic economy”
  • OR Channel 2 (rejected): “The rain has resulted in erosion by the overflowing river”
Lackner, J. R., & Garrett, M. F. (1972). Resolving ambiguity: Effects of biasing context in the unattended ear. Cognition, 1, 359-372.

Amount of information in terms of the information-theoretic measure (entropy)
• The amount of information in a signal depends on how much one’s estimate of the probability of events is changed by the signal: $H = -\sum_i p_i \log_2 p_i$, measured in bits.
• “One if by land, two if by sea” contains one bit of information if the two possibilities were equally likely, and less if they were not. For example, if one were twice as likely as the other, the information in the message would be $-(\tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{2}{3}\log_2\tfrac{2}{3}) \approx 0.92$ bits.
• The amount of information transmitted depends on the potential amount of information in the message and on the correlation between the message sent and the message received. So information transmitted is a type of correlation measure.
• The information measure assumes an “ideal receiver.” It is the maximum information that could be transmitted, given the statistical properties of the messages and assuming that sender and receiver know the code. This maximum depends on physical properties of the channel, i.e., its channel capacity.
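The entropy formula and the land/sea arithmetic above can be checked directly. This is a minimal sketch; `entropy_bits` is a hypothetical helper name.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p_i * log2 p_i) in bits, skipping zero-probability events."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(round(entropy_bits([0.5, 0.5]), 3))  # 1.0   (two equally likely signals: one bit)
print(round(entropy_bits([1/3, 2/3]), 3))  # 0.918 (one signal twice as likely as the other)
```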

Information transmitted in a typical absolute judgment experiment
• Subjects were presented with tones drawn from a known, practiced set, whose size determines the amount of input information. They had to name each tone from a learned set of names.
• The information transmitted was always around 2.5 bits, equivalent to about 5.7 (roughly 6) equiprobable alternatives, since $2^{2.5} \approx 5.66$!
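In an absolute judgment experiment, "information transmitted" is computed from the stimulus-response confusion matrix as $T = H(S) + H(R) - H(S,R)$, the mutual information between stimulus and response. This is the correlation-like measure described in the previous slide. The matrices below are idealized extremes, not experimental data.

```python
import math


def information_transmitted(counts):
    """Mutual information (bits) between stimulus and response.

    counts[i][j] = number of trials on which stimulus i received response j.
    T = H(stimulus) + H(response) - H(stimulus, response).
    """
    n = sum(sum(row) for row in counts)

    def h(freqs):
        return -sum((f / n) * math.log2(f / n) for f in freqs if f > 0)

    h_s = h([sum(row) for row in counts])          # stimulus entropy
    h_r = h([sum(col) for col in zip(*counts)])    # response entropy
    h_sr = h([f for row in counts for f in row])   # joint entropy
    return h_s + h_r - h_sr


perfect = [[10 if i == j else 0 for j in range(4)] for i in range(4)]  # every tone named correctly
chance = [[5] * 4 for _ in range(4)]                                  # responses unrelated to stimuli
print(information_transmitted(perfect))  # 2.0
print(information_transmitted(chance))   # 0.0
print(round(2 ** 2.5, 2))                # 5.66 equiprobable alternatives for 2.5 bits
```

With real data, $T$ rises with set size at first and then levels off. The plateau is the roughly 2.5-bit limit reported above.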

The channel capacity hypothesis implies that the amount of information retained in STM is constant and independent of the type of items
• But it turns out that much more information is retained when the items are drawn from a larger set. For example, more information can be retained when the input is decimal digits rather than binary digits, more for letters, more for words, etc.
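Some back-of-envelope arithmetic shows why this contradicts the channel-capacity hypothesis. If the span is roughly constant in items, the information retained scales with $\log_2$ of the vocabulary size. The span of 7 items and the assumption of equiprobable items are illustrative, not values taken from the studies cited.

```python
import math


def bits_retained(span_items: int, vocabulary_size: int) -> float:
    """Information held when span_items equiprobable items from a vocabulary are retained."""
    return span_items * math.log2(vocabulary_size)


# Assumed constant span of 7 items (illustrative) across different item vocabularies.
for name, size in [("binary digits", 2), ("decimal digits", 10), ("letters", 26)]:
    print(f"{name:15s} {bits_retained(7, size):5.1f} bits")
```

A fixed number of items means roughly 7 bits for binary digits but over 30 bits for letters. Under a fixed channel capacity in bits, those two figures would have to be equal.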

Why can we retain vastly different amounts of information just by using a different encoding vocabulary?
• Answer: the architecture of the cognitive system has the property that it can deal with a fixed maximum number of items, regardless of what the items are.
• This property can be exploited to get around the bottleneck of short-term memory. We do this by recoding the input into a smaller number of discrete units, called chunks.
• There is also evidence that it takes additional time to encode and decode chunks. The recoding technique is therefore a case of a time-capacity tradeoff, known in computer science as a compute-vs-store tradeoff.
• Newell has a model of the time taken in the Sternberg memory-scanning experiment that attributes the observed RT to encoding or chunking.

Example of the use of chunking
• Task: recall a string of binary digits, e.g., 00101110101110110101001
• People can recall a string of about 8 binary digits.
• If they learn a 2:1 recoding rule (00→0, 01→1, 10→2, 11→3), they can recall about 8 such chunks, or 16 binary digits.
• If they learn a 3:1 chunking rule (the octal number system), they can recall a 24-digit string, and so on.
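The recoding rules above can be written as a pair of encode/decode functions. The 18-digit string is an arbitrary example chosen so its length divides evenly by 2 and 3. The original 23-digit string does not, so it would need padding, which this sketch does not attempt. The function names are hypothetical.

```python
def recode(bits: str, group_size: int) -> list[str]:
    """Recode a binary string into chunks of group_size digits, each named by one symbol.

    group_size=2 gives the 00->0, 01->1, 10->2, 11->3 rule; group_size=3 gives octal.
    Assumes len(bits) is a multiple of group_size.
    """
    if len(bits) % group_size != 0:
        raise ValueError("string length must be a multiple of the group size")
    return [str(int(bits[i:i + group_size], 2)) for i in range(0, len(bits), group_size)]


def decode(chunks: list[str], group_size: int) -> str:
    """Invert recode(): expand each chunk symbol back into its group of binary digits."""
    return "".join(format(int(c), f"0{group_size}b") for c in chunks)


bits = "101000100111001110"   # 18 binary digits: too many to hold as single digits
print(recode(bits, 2))        # ['2', '2', '0', '2', '1', '3', '0', '3', '2']  (9 chunks)
print(recode(bits, 3))        # ['5', '0', '4', '7', '1', '6']                 (6 chunks)
```

The `decode` step is the extra work at recall time. It is the time side of the time-capacity tradeoff described in the previous slide.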

Does the evidence support this idea? Memory span can be greatly increased through chunking! Yet chunking has also been used to explain things it cannot explain. It is only explanatory if you have an account of how chunking occurs and what rules in LTM are being used (and what counts as a chunk).

What does visual attention select? (What are the bases for selection?)
• If attention is selection, what does visual attention select?
• An obvious answer is places. We can select places by moving our eyes so that our gaze lands on different places.
• When places are selected, are they selected automatically?
• Must we always move our eyes to change what we attend to?
• Studies of covert attention movement: Posner (1980)
• How does attention switch from one place to another?
• Is it always the case that we attend to places? Can we attend to any other property? Can we select on the basis of color, depth, spatial frequency, or affordances? Can we select the property a painting has of having been painted by Da Vinci, a property to which Bernard Berenson was able to attend extremely well? Cf. Gibson.

How else can visual attention select?
• Can we control the size and shape of the region that is selected, or is selection always punctate and data-driven?
• Zoom lens model of spatial attention (Eriksen & St. James, 1986)
• What controls where attention moves?
• Is this automatic or voluntary?
• How do we know where to direct our attention? How do we specify a location prior to attending to it?
• We need a way to specify where or what prior to attending to it! Keep this conundrum in mind; we will return to it later!
• How narrowly can we focus our attention? Can we make it pick out one out of several objects?
• Are there special conditions under which we are able to pick out individual things? We will return to “attentional resolution,” the minimum spacing for selecting individual things.

Covert movements of attention
Example of an experiment using a cue-validity paradigm to show that the locus of attention moves without eye movements, and to estimate its speed.
Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
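The logic of the cue-validity paradigm can be sketched as a trial generator. The cue predicts the target location only probabilistically. Comparing valid and invalid trials, while the eyes stay at fixation, therefore shows whether attention moved to the cued location. The 80% validity, the two-location layout, and the function name are assumptions for illustration, not parameters taken from Posner (1980). Eye-fixation monitoring is not modeled.

```python
import random


def make_cueing_trials(n_trials, p_valid=0.8, locations=("left", "right"), seed=None):
    """Generate trials for a hypothetical spatial cue-validity block.

    On each trial a cue appears at one location. The target then appears at the
    cued location with probability p_valid (a "valid" trial), otherwise at another
    location (an "invalid" trial).
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cue = rng.choice(locations)
        if rng.random() < p_valid:
            target = cue
        else:
            target = rng.choice([loc for loc in locations if loc != cue])
        trials.append({"cue": cue, "target": target, "valid": target == cue})
    return trials


block = make_cueing_trials(200, seed=1)
print(sum(t["valid"] for t in block) / len(block))  # close to the assumed p_valid of 0.8
```

Faster responses on valid than on invalid trials, with fixation maintained, indicate a covert shift of attention toward the cue.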

Extension of Posner’s demonstration of attention switch
Does the improved detection at intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

Exogenous vs. endogenous control of attention
• In the Posner paradigm illustrated above, attention was automatically grabbed by the onset of a spot (exogenous attention allocation). Other experiments showed that attention could also be moved under voluntary (endogenous) control, e.g., by presenting an arrow at fixation indicating which direction to move attention.
• Posner, Tsal, and others showed that when attention goes from A to B, intermediate locations are maximally sensitive to detecting a signal at intermediate times.
• Both exogenous and endogenous control produce movement of attention, but they differ in some of their effects:
  • Endogenously moved attention does not lead to Inhibition of Return (we turn to this shortly)
  • Endogenously controlled movement does not appear to affect detection sensitivity, but it does affect discrimination
  • Exogenously controlled effects are stronger and appear earlier
• The evidence suggests a continuously moving “spotlight” of attention. However, other models claim that this is a side effect of an attentional activation that fades at the starting place and grows at the target place, creating an overlap at intermediate locations (Sperling).

We can select a shape even when it is intertwined among other similar shapes
Are the green items the same? On a surprise test at the end, subjects were not able to recognize shapes that had been present but had not appeared in green.

The time course of attention: Inhibition of return
• Suppose we vary the time between the cue and the target in a modified Posner paradigm. When the cue-target onset asynchrony (CTOA) is around 300-900 ms, reaction time to targets at the cued location begins to exceed that at uncued locations. This is called inhibition of return (IOR; Klein, 2000).
• To get this effect we actually have to attract attention to the cued location and then attract it back to the origin. IOR is one of many examples of an inhibition effect produced by attention.
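The shift from facilitation to IOR is usually read off a cueing effect computed separately for each CTOA. The sketch below shows that computation: mean RT at uncued locations minus mean RT at cued locations. A positive value means facilitation and a negative value means inhibition of return. The function name and the trial format are hypothetical, and no data are included.

```python
from statistics import mean


def cueing_effect_by_ctoa(trials):
    """Mean RT(uncued) - mean RT(cued), in ms, for each cue-target onset asynchrony.

    trials: iterable of (ctoa_ms, cued, rt_ms) tuples, where cued is True when the
    target appeared at the cued location. Each CTOA must include both cued and
    uncued trials. Positive result = facilitation; negative = inhibition of return.
    """
    by_ctoa = {}
    for ctoa, cued, rt in trials:
        by_ctoa.setdefault(ctoa, {True: [], False: []})[cued].append(rt)
    return {ctoa: mean(rts[False]) - mean(rts[True]) for ctoa, rts in sorted(by_ctoa.items())}
```

Plotting the returned values against CTOA shows where the effect crosses zero. According to the slide above, that crossover is somewhere around 300 ms.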

Other examples of attentionally induced inhibition
• Negative priming (DeSchepper & Treisman, 1996)
• Is there a figure on the right that is the same as the figure on the left?
• When the figure on the left is one that had appeared as an ignored figure on the right, RT is long and accuracy poor.
• This “negative priming” effect persisted over 200 intervening trials and lasted for a month!

Another negative attention effect: Inattentional Blindness

Inattentional Blindness
• The background task is to report which of the two arms of the + is longer. There is one critical trial per subject, after 3 or 4 background trials, plus another “critical” trial presented as a divided-attention control.
• 25% of subjects failed to see the square when it was presented in the parafovea (2° from fixation).
• But 65% failed to see it when it was at fixation!
• When the background-task cross was made 10% as large, inattentional blindness increased from 25% to 66%.
• It is not known whether this IB is due to concentration of attention on the primary task or to inhibition of outside regions.

In what other ways might our information capacity be limited?
• We have limitations on the input side that depend on the acuity of the sensors and the range of physical properties to which they respond.
• But there is a limitation beyond that of acuity. The perceptual system is limited in what it can individuate and in how many of these individuals it can deal with at one time. The capacity to individuate is different from the capacity to discriminate.
• This notion of individuating and of individuals may be related to Miller’s “chunks,” but it has a special role in vision, which we will explore in the next lecture.
• First, some reasons for thinking that individuating is a distinct process.

Individuating is different from discriminating

Individuating as a distinct process
• Individuating has its own psychometric function: the minimum distance for individuating is much larger than for discriminating.
• It may be that in vision our attention is limited in the number of things we can individuate and simultaneously access (more on this later). But how do you determine what counts as a “thing”? See next lecture.
• Individuating is a prerequisite for recognizing patterns and other properties defined over a number of individual parts.
• Subitizing is an example of how easily we can detect patterns when they are defined over a small enough number of parts.
• The concept of an individual has also become important in cognitive development. There it is clear that babies are sensitive to the numerosity of individual things in a way that is distinct from their perceptual abilities but is limited in capacity.

Pick out 3 dots and keep track of them
• You can follow instructions to “move one up” or “move 2 right,” etc., so long as at no time do you have to hold on to more than 4 dots.
• You can pick out 4 dots and then search through those 4 locations if all dots change to search items (Burkell & Pylyshyn, 1997).
• You can count up to 4 dots without error (Trick & Pylyshyn, 1994).
• You can keep track of 4 dots through saccades (Irwin, 1996).
• You can detect such basic patterns as inside(dot, contour), Collinear(x1, x2, x3, x4), or Online(dot, contour), so long as only a small number of relevant arguments must be held at one time.
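The idea that relational predicates can only be evaluated over a few simultaneously held arguments can be sketched as a small fixed pool of pointers to objects. This connects to Newell's notion of arguments bound to routines in Big Question #1. The `IndexPool` class, its capacity of 4 (taken from the "about 4 dots" limit above), and the cross-product `collinear` test are hypothetical illustrations. None of them is a model from the cited papers.

```python
from typing import Optional

Point = tuple[float, float]


class IndexPool:
    """Hypothetical fixed-size pool of indexes, each pointing at one object (a dot)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.bound: dict[int, Point] = {}  # index number -> indexed dot position

    def bind(self, obj: Point) -> Optional[int]:
        """Assign a free index to an object; return None when every index is in use."""
        for i in range(self.capacity):
            if i not in self.bound:
                self.bound[i] = obj
                return i
        return None

    def release(self, index: int) -> None:
        self.bound.pop(index, None)

    def evaluate(self, predicate, *indexes):
        """Evaluate a relational predicate over currently indexed objects only."""
        return predicate(*(self.bound[i] for i in indexes))


def collinear(*points: Point, tol: float = 1e-9) -> bool:
    """True if all points lie on the line through the first two (assumed distinct)."""
    if len(points) < 3:
        return True
    (x0, y0), (x1, y1) = points[0], points[1]
    return all(abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) <= tol for x, y in points[2:])


pool = IndexPool()
idx = [pool.bind(p) for p in [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)]]
print(idx)                                  # [0, 1, 2, 3, None]: the fifth dot gets no index
print(pool.evaluate(collinear, *idx[:4]))   # True
```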

Next: Objects and Attention

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments
• The same is true for other relational judgments, like inside or on-the-same-contour. We must pick out the relevant individual objects first.

When items cannot be individuated, predicates over them cannot be evaluated
Do these figures contain one or two distinct curves? Individuating these curves requires a “curve tracing” operation, so Number_of_curves(C1, C2, …) takes time proportional to the length of the shortest curve.
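The claim that counting curves requires tracing can be made concrete with a flood-fill sketch over a pixel grid. This procedure learns that two pixels belong to the same curve only by stepping through the pixels that connect them. Its work therefore grows with the length of the curves it traces rather than being available at a glance. This is a standard connected-components algorithm used as an analogy, not a model of the visual curve-tracing operation. The grid encoding is an assumption.

```python
from collections import deque


def count_curves(grid):
    """Count separate curves in a binary image by tracing along connected pixels.

    grid: list of equal-length strings; '#' marks curve pixels. Pixels that touch
    (8-connectivity) belong to the same curve. Returns (number_of_curves, pixels_visited).
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    curves = visited = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or (r, c) in seen:
                continue
            curves += 1                      # an untraced pixel starts a new curve
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:                     # step along the curve, pixel by pixel
                y, x = queue.popleft()
                visited += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return curves, visited


print(count_curves(["###", "..#", "###"]))   # (1, 7): one curve, every pixel traced
print(count_curves(["##..", "....", "..##"]))  # (2, 4)
```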

The figure on the left is one continuous curve, the one on the right is two distinct curves – as shown in color.

Another example: Subitizing vs Counting. How many squares are there? Subitizing is fast, accurate and only slightly dependent on how many items there are. Only the squares on the right can be subitized. Concentric squares cannot be subitized because individuating them requires curve tracing, just as it did in the spiral example.

Signature subitizing phenomena only appear when objects are automatically individuated and indexed
[Figure: plot with the counting slope and the subitizing slope marked]
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

Example of subitizing with pop-out and non-pop-out features (Count Pink vs. Count Online)

What is attention for?
Treisman’s Attention-as-Glue Hypothesis
• The purpose of visual attention is to bind properties together in order to recognize objects.

How are conjunctions of features detected?
Read the vertical line of digits in the following display. Under these conditions, conjunction errors are very frequent.

Rapid visual search (Treisman) Find the following simple figure in the next slide:

Rapid visual search (conjunction) Find the following simple figure in the next slide:

Find the unique item in this slide

Serial vs. parallel search?
• Single-feature search means finding an object that differs from all others in a scene by a single feature. It is fast, error-free, and almost independent of how many nontargets there are.
• Conjunction search means finding an object that differs from all others by a conjunction of two or more features, while sharing at least one feature with each object in the scene. It is usually slow and error-prone, and it gets worse the more nontargets there are in the scene.*
• These results suggest that finding a conjunction requires solving the binding problem, so attention has to be scanned serially across the objects.
* This way of putting it simplifies things. Under certain conditions the serial-parallel distinction breaks down.
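The serial account in the third bullet predicts a specific set-size function, and the sketch below simulates it. It runs a serial self-terminating search through conjunction displays. When the target is present, the number of items inspected averages $(n+1)/2$, so search time should grow linearly with set size $n$. The slope should be about twice as steep when the target is absent, because all $n$ items must then be checked. The colors, shapes, and trial counts are invented. The sketch does not model the conditions, noted in the footnote, under which the serial-parallel distinction breaks down.

```python
import random
from statistics import mean


def serial_search_steps(display, target, rng):
    """Serial self-terminating search: inspect items in random order until the target is found.

    Returns the number of items inspected (all of them if the target is absent).
    """
    order = list(range(len(display)))
    rng.shuffle(order)
    for steps, i in enumerate(order, start=1):
        if display[i] == target:
            return steps
    return len(display)


def mean_steps(set_size, n_trials=2000, seed=0):
    """Average items inspected in target-present conjunction displays of a given size."""
    rng = random.Random(seed)
    target = ("red", "X")
    distractors = [("red", "O"), ("green", "X")]  # each shares one feature with the target
    results = []
    for _ in range(n_trials):
        display = [rng.choice(distractors) for _ in range(set_size - 1)] + [target]
        results.append(serial_search_steps(display, target, rng))
    return mean(results)


for n in (4, 8, 16):
    print(n, round(mean_steps(n), 2), "predicted:", (n + 1) / 2)
```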
