What is special about representation of space in perception and thought?

Spatial representation in the mind/brain: Do we need a global topographical map? Zenon Pylyshyn, Rutgers Center for Cognitive Science and Institute Jean Nicod. What is special about representation of space in perception and thought? Do we need a single global spatial representation? Do we need a topographical display in the brain? Workshop on Frames of Reference, Paris, November 17-19, 2005

What is special about spatial representation? • I have suggested (Pylyshyn, 1973) that no convincing reason has been given why a form of representation adequate for general knowledge (i.e., a Language of Thought) cannot also serve for encoding the content of spatial representations • The difference between representing spatial relations and representing other contents may lie in their being different topics requiring a different conceptual vocabulary, but they may not require a different format or medium of representation. Why can’t spatial content be encoded in a first-order calculus using Cartesian coordinates? • Is it just that it conflicts with our conscious experience? • The problem with the general-LOT proposal is that it fails to account for certain psychophysical phenomena that are observed when vision and spatial reasoning are actively engaged in solving problems or in planning actions – i.e., when spatial representations are constructed in working memory.

Spatial representation during perception and reasoning • The impression that spatial representations are different from other kinds of representations is usually associated with examples from perception and spatial reasoning. In these contexts, as opposed to long-term-memory storage, there is reason to think that such representations are different in several ways • I have suggested several such differences (Pylyshyn, 1978) – e.g., • Working memory contents typically involve relationships among tokens and contain no quantifiers or negation, e.g., (x)F(x) is represented by a finite set of x’s, each of which has property F(x) (i.e., “all circles are red” is represented by a set of circles each of which is red) • In the present talk I will focus on another way that such representations are special – in the way they encode space. Because these representations are not tied to vision, and do not even require a visual cortex or need to be accompanied by conscious experience, they are best referred to as spatial representations rather than mental images
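As a rough illustration of the quantifier-free idea on this slide, the sketch below holds “all circles are red” as a finite list of tokens rather than as a quantified formula. The Token structure and the helper name are hypothetical, introduced only for this example; nothing here is claimed to be the format working memory actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One individual held in working memory (hypothetical structure)."""
    shape: str
    color: str


# "All circles are red" stored as a finite set of tokens, each of which is red.
# No quantifier and no negation appears anywhere in the stored content.
working_memory = [
    Token("circle", "red"),
    Token("circle", "red"),
    Token("circle", "red"),
]


def holds_for_each(tokens, predicate):
    """Check a property by inspecting the tokens one by one."""
    return all(predicate(t) for t in tokens)


all_circles_red = holds_for_each(working_memory, lambda t: t.color == "red")
```

The generalization is recovered by inspecting the tokens; it is not itself an item in the store.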

What are some constraints on a theory of spatial representation? • First I will attempt to tease out some functional requirements that may apply to a system for representing space and spatial relations in perception and especially in spatial reasoning • These requirements may explain why people often assume that there is a unified global frame of reference for vision and spatial reasoning that is implemented as a spatial display in the brain. • These requirements also serve to introduce an alternative proposal that meets the conditions without assuming a global spatial display

Some conditions on a system of codes for representing spatial relations (1) • The system must be able to represent magnitudes • Psychophysical evidence shows that we have encodings of magnitudes (at least relative magnitudes) and that the magnitudes that are encoded (i.e., the semantics of the codes) have a particular systematic effect in reasoning (e.g., scalar variance, Fechner’s law, symbolic distance effect, etc.). • This suggests that the codes themselves must have properties that explain these systematic magnitude effects (which would not be the case if the magnitudes were encoded as numerals)

Some conditions on a system of codes for representing spatial relations (2) • The system must represent stable spatial configurations • Spatial configurations involve relations over multiple objects – in that sense they are holistic and require simultaneous access to multiple objects (multiple arguments in relational predicates must be simultaneously bound) • What is special about such configurations is that they may allow some spatial ‘inferences’ by pattern lookup without reference to independent geometrical axioms (see “Using space to represent spatial properties” later).

Some conditions on a system of codes for representing spatial relations (3) • The system must somehow ‘capture’ the continuity and connectedness of space. This leaves many unanswered questions: • Does continuity entail that empty places are represented as such? • Does continuity entail that the representational system itself determines that distances meet metrical axioms (e.g., the triangle inequality AB + BC ≥ AC) or that they are Euclidean? • Does continuity entail that the representation of movements of objects is constrained so that in getting from A to B objects must pass through ‘intermediate’ locations? • The proposal I will present later gives a partial answer to these

Some conditions on a system of codes for representing spatial relations (4) • The system must represent spatial properties across modalities, including proprioception and the motor system • It must be possible for a pattern such as SQUARE(w,x,y,z) to involve objects in different modalities • Spatial representations must be able to engage the motor system in a fairly direct manner • One of the characteristics of what we call a “spatial representation” is that we can ‘point to’ represented things (e.g., in our mental image). • But note that motor actions towards perceptual and imagined representations are not identical because they engage different perceptual-motor systems (Goodale et al. 1994)

Some conditions on a system of codes for representing spatial relations (5) • The system must be able to represent spatial relations in 3D • When relations in depth are encoded, they must be in a similar format to the encoding of relations in the plane, since the two have to operate together • Experimental evidence from such mental imagery phenomena as ‘mental rotation’ or ‘mental scanning’ shows identical functions in depth as in the plane

Summary of constraints to be met: A system of spatial representations must somehow do the following: • It must represent magnitudes • It must represent holistic configurations which enable at least some direct one-step inferences (by pattern-matching) • It must capture connectedness and continuity • It must represent spatial relations seamlessly across modalities and engage the motor system • It must represent distances in depth as well as in the plane in a uniform manner (i.e., it must represent 3D) • I will return to these constraints when I discuss a different proposal for how we ‘represent’ space

Two additional common assumptions about spatial representation The foregoing list of constraints has frequently led people to make two assumptions about spatial representation that I will argue are not justified: • The single frame of reference assumption is the assumption that when we represent spatial layouts in perception or in thought we do so in a single global frame of reference, as opposed to a patchwork of distinct but coordinated frames • Our conscious awareness of spatial layouts suggests a single frame of reference, but like a lot of properties of conscious awareness this may be illusory • The holism/stability assumption is the assumption that when we represent spatial layouts in perception or thought the representation simultaneously contains a large number of objects and properties in a stable spatial configuration

Why an inner display for vision? In vision the spatial-display theory was meant to explain why our visual experience is panoramic and stable even though the visual inputs are highly local, partial and constantly changing. But many studies have shown that there is no such rich stable panoramic display (e.g., change blindness, superposition, etc.; see O’Regan, 1992).

Why an inner display for spatial reasoning? The spatial-display theory was also meant to explain how a mental representation can meet the spatial conditions listed earlier – by creating a 2D image in a real spatial medium. Such a display was assumed to use the same global 2D spatial medium that is used in vision. But both display assumptions have serious problems.

The global spatial display assumption • There are many deep problems with the assumption that spatial properties are represented in vision and reasoning by an inner spatial display which corresponds to our experience of a stable world (perceived or imagined), many of which I have discussed in connection with the ‘picture theory’ of mental imagery (Behavioral and Brain Sciences, 2002) • One of the main problems relevant to the present discussion is the assumption that visual perception, cross-modal spatial integration, visuomotor control, and spatial reasoning derive from a single representation in an allocentric reference frame • There are many reasons to doubt that there is a unified global frame of reference for representing spatial information

Reasons to reject the Master Map assumption • There are many known frames of reference between perception and motor control, relying on both external and internal sensors • While gaze-centered coordinates are common in motor control, they are gain-modulated by inputs from eye, head and body positions as well as by motor intentions (Andersen & Buneo, 2002; Duhamel, Colby & Goldberg, 1992) • Visual information is also represented in hand- and body-centered frames of reference (Làdavas, 2002) • The neglect syndrome appears in many different frames of reference • Motor control necessarily involves many different frames of reference, including joint-angle, proprioceptive, kinesthetic, and even frames that depend on groups of spindle bundles • Earlier (downstream) frames of reference are often not overwritten but may continue to have observable consequences in perceptual-motor coordination and in errors in kinesthetically-guided motion (Baud-Bovy & Viviani, 1998), so multiple frames continue to exist in the nervous system

A different way of approaching the question of spatial representation • Based on such problems with the global spatial display assumption, I have proposed a provisional hypothesis that preserves some of the advantages of the global spatial display, but assumes that the relevant spatial properties are in the perceived world and can be accessed if we have the right access mechanisms for selecting and indexing objects in the perceived world • For ease of reference let us call this the Projection Hypothesis because it is as though the spatial display were projected onto the real space we perceive (though with only objects’ identities and locations, and none of their other visual properties)

The projection hypothesis • The projection hypothesis relies on the spatial properties of the concurrently perceived world to meet the 5 conditions outlined earlier. It rests on two theoretical postulates: • We have a system of “pointers” (such as the FINST perceptual index mechanism to be described later) by which a small number of perceived objects in the world can be selected and indexed. Indexes provide a fixed reference to their targets despite changes in targets’ locations • When we perceive a scene that contains indexed objects, our perceptual system is able to treat those selected objects as though they were assigned unique labels. Thus our perceptual system is able to detect novel configurational properties among these indexed objects.

Aside on FINST indexes • Because FINST indexes play a central role in this story I will make a short detour to illustrate this mechanism and to give some examples of indexes at work

Pick out 3 dots I will cue and keep track of them • After you pick out the 3 cued dots, I’ll ask you to move your attention from the center one. Describe the new relation among the three dots. • In a field of identical elements you can select several of them and move your attention among them (e.g., “move one up” or “move 2 right”, etc.) so long as at no time do you have to hold on to more than 4 dots (Intriligator & Cavanagh, 2001)

In making relational judgments you must select and keep track of several objects at once. When we judge that certain objects are collinear, we must first pick out the relevant objects while ignoring all their properties except their location. Such picking out and referring are the basic functions of FINST indexes

Several objects must be picked out at once in making relational judgments • In making relational judgments such as inside or on-the-same-contour you must pick out the relevant individual objects first. Are dots Inside-same-contour? On-same-contour?

Other experimental demonstrations of FINST indexes • Recognizing the cardinality of small sets of things: Subitizing vs counting (Trick, 1994) • Searching through subsets – selecting items to search through (Burkell, 1997) • Selecting subsets and maintaining the selection during a saccade (Currie, 2002) • Multiple Object Tracking (MOT)

Subset selection for search Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.

Subset search results: • Only properties of the subset matter • If the subset is a single-feature search it is fast and the slope (RT vs number of items) is shallow • If the subset is a conjunction search set, it takes longer and is more sensitive to the set size • The distance between targets does not matter, so observers don’t seem to be scanning the display looking for the target but can switch their attention directly to the subset items

Selective search is also found when a saccade occurs between the late onset cues and start of search Even with a saccade between selection and access, items can be accessed efficiently

Demonstrating the function of FINSTs with Multiple Object Tracking (MOT) • In a typical MOT experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off. • After these 4 targets are briefly identified, all objects resume their identical appearance and move randomly. The observers’ task is to keep track of the ones that had been designated as targets at the start • After a period of 5-10 seconds the motion stops and observers must indicate, using a mouse, which objects are the targets

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Keep track of the objects that flash

Our explanation is that FINST indexes are bound to targets when they flash and remain bound for the duration of the trial. At the end of the trial they allow attention to be moved to each target in turn so that the targets can be selected.

FINST indexes allow selected objects to be accessed directly and without searching for specific properties: indexes stay bound to objects as the objects move

If you were like the cartoon character Plastic Man and could place your fingers on things in the world so as to refer to them uniquely, and if you could then move your gaze or attention to them, you would possess FINgers of INSTantiation (FINSTs)!

End of aside on FINSTs! Summary • The FINST mechanism provides a limited set of indexical pointers bound to perceived objects • FINSTs can associate perceived objects with objects of thought • The binding is stable over some period of time (e.g., a few seconds) and continues despite motion of the objects or eye movements. • Perception is able to treat the indexed objects as though they were perceptually marked
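The summary above can be sketched as a small data structure: a limited pool of pointers, each bound to an object rather than to a location or a property. This is only an illustrative sketch; the class name, the capacity of 4, and the toy scene are assumptions made for the example, not a model taken from the talk.

```python
class FinstPool:
    """A small pool of indexes bound to objects (the capacity of 4 is an assumption)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bindings = {}  # index label -> object identity

    def bind(self, label, object_id):
        """Bind an index to an object; fail if every index is already in use."""
        if label not in self.bindings and len(self.bindings) >= self.capacity:
            raise RuntimeError("no free index")
        self.bindings[label] = object_id

    def access(self, label, scene):
        """Go straight to the indexed object; no search over properties is needed."""
        return scene[self.bindings[label]]


# Toy scene: 8 identical objects, object identity -> current (x, y) location.
scene = {obj: (float(obj), 0.0) for obj in range(8)}

pool = FinstPool()
for label, obj in zip("abcd", [1, 3, 4, 6]):  # the four cued targets
    pool.bind(label, obj)

# The objects move. The bindings refer to the objects, not to where they were.
scene = {obj: (x + 2.5, y - 1.0) for obj, (x, y) in scene.items()}
tracked = {label: pool.access(label, scene) for label in "abcd"}
```

Because each binding stores an object identity instead of coordinates, the same index still reaches its target after the motion, which is the property the tracking demonstration relies on.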

Examples of the projection hypothesis • To illustrate how the projection hypothesis works, first consider index-based projection in the visual modality, where indexes can convert some apparently mental-space phenomena into perceived-space phenomena (although I will return to the non-visual case shortly, the visual case is more salient and tends to dominate other modalities) • Examples from some ‘mental imagery’ experiments • Mental scanning (Kosslyn, 1973) • Mental image superposition (Podgorny & Shepard, 1978) • Visual-motor adaptation (Finke, 1979) • S-R compatibility to imagined locations (Tlauka, 1998)

Studies of mental scanning: often cited to suggest that representations have metrical properties [Figure: time to “see” a feature on the image, plotted against distance on the image]

Brain image or index-based projection? • A way to do this task: • Associate places on the imagined map with places in the world that you perceive • Move your attention or gaze from one place to another as they are named

Using a perceived room to anchor FINSTs tagged with map labels

Using vision with selected ‘labeled’ objects • If you ‘project’ the pattern of map places by picking out objects in the room in front of you that correspond roughly to these memorized locations, then you can scan attention from one such marked object to another. The space here is real and the equation time = distance ÷ speed is a physical principle, not tacit knowledge about the world. • You can also use the tagged objects to infer configurational properties you may not have noticed, despite somehow memorizing the location of all objects • Which 3 or more places on the map are collinear? • Which place on the map is furthest North, South, East, West? • Which 3 places form an isosceles triangle? • Such configurational consequences can be detected, as opposed to logically inferred, so long as they involve only a few places, because the visual system can examine a scene with labeled indexed objects
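To make the questions on this slide concrete, here is a minimal sketch in which a few labeled places have real positions and the answers are read off that layout. The place names and coordinates are invented for illustration, and the code computes from coordinates what the slide says vision detects directly in the scene, so it should be taken as an analogy rather than a model of the visual system.

```python
from itertools import combinations

# Hypothetical map places projected onto perceived objects: label -> (x, y) in the room.
places = {
    "hut": (0.0, 0.0),
    "well": (2.0, 2.0),
    "tree": (4.0, 4.0),
    "lake": (1.0, 5.0),
    "beach": (5.0, 1.0),
}


def collinear(p, q, r, tol=1e-9):
    """Three points are collinear when the triangle they span has (near-)zero area."""
    area2 = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(area2) <= tol


# "Which 3 places on the map are collinear?"
collinear_triples = [
    names
    for names in combinations(sorted(places), 3)
    if collinear(*(places[n] for n in names))
]

# "Which place on the map is furthest North?" (larger y is treated as North here)
northmost = max(places, key=lambda n: places[n][1])


def scan_time(a, b, speed=1.0):
    """time = distance / speed for moving attention between two indexed places."""
    (x1, y1), (x2, y2) = places[a], places[b]
    return ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5 / speed
```

In this toy layout the scan from the hut to the tree takes twice as long as the scan from the hut to the well simply because the distance is twice as great; nothing about distance needs to be stored as knowledge.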

Another example of a result attributable to FINST-based projection: Podgorny-Shepard experiment Remember the following pattern and imagine it after it is gone Are the following dots on or off the imagined pattern?

The pattern of reaction times is the same for perceived shapes as for recalled shapes • Both when the F display is seen and when the F is imagined, the time to judge that the dot was on the F was fastest when the dot was at the vertex of the F and slower when it was on an arm of the F (slowest when it was one square away). • Does this show that the F and dots are superimposed on a display in the brain and perceived with the visual system? • A more plausible explanation is that the cells corresponding to rows and columns of the F in the matrix are indexed and thus made distinct, allowing vision to be used to judge whether the dots fall on those rows/columns.

Perceptual-motor adaptation to imagined hand position (Finke, 1979) • If you wear prism displacing lenses and repeatedly reach for objects in front of you for just a few minutes, you adapt to the erroneous feedback. When the lenses are removed you overshoot in the opposite direction. • If, instead of wearing lenses, you move your hand invisibly while you imagine that your hidden hand is at the displaced location, you get the same adaptation phenomena • Does this show that both your imagined hand and other properties of the scene are displayed somewhere in your visual system? • All you need are indexes to several objects in the visual scene, together with a distinct label for each (e.g., hand, block). This allows attention or even gaze to move to them. • No other visual properties need to be represented in order to create the discrepancy between felt and ‘seen’ (i.e. indexed) position that is required for adaptation to occur

S-R Compatibility effect with a visual display. The Simon effect: It is faster to make a response in the direction of an attended object than in another direction. Response for A is faster when YES is on the left in these displays

S-R Compatibility effect with a recalled (mental) display. The same RT pattern occurs for a recalled display as for a perceived one. RT is faster when the A is recalled (imagined) as being on the left

In all these cases you only need indexes to a few visual objects located in appropriate places • In all examples we have seen, the results can be predicted without appealing to a mental display, if you assume that: • You can index a few visible objects (including texture elements on an apparently plain surface) and • The visual system can treat indexed objects as distinct or visually labeled

Visual indexes can anchor spatial representations to a scene containing visual objects: But how does this work without vision (e.g., in the dark)? • We must rely on our remarkable capacity to orient to (point to, navigate towards, …) perceived and recalled objects (including proprioceptive ‘objects’) in space without vision • Call this general capacity our spatial sense • How can the projection hypothesis account for this apparently world-centered spatial sense without assuming a global allocentric frame of reference? • Answer: Just as it does with vision, by anchoring represented objects to (non-visually) perceived objects in the world

The spatial sense and the projection hypothesis • Indexing non-visual ‘objects’ must exploit auditory and somatosensory signals, and perhaps even preparatory motor programs (the ‘intentional’ frame of reference proposed by Andersen & Buneo, 2002; Duhamel, Colby & Goldberg, 1992) • Is there some special problem about somatosensory inputs that makes them different from visual inputs?

Is there a problem about somatosensory inputs providing ‘objects’ for anchoring the spatial sense? • Unlike visual objects, the “objects” in the somatosensory modalities are not fixed in an allocentric frame of reference • Notice that even in vision and audition, objects are always moving relative to sensors, so representations must be updated to take account of movements (Andersen, 1999; Stricanne, Andersen & Mazzoni, 1996) • Does the spatial sense entail a representation in a global allocentric frame of reference? • Does coordinating between somatosensory and visual inputs require a single global representational frame of reference?

Some concrete examples of spatial skills that suggest a global frame of reference • The assumption of a global spatial representation underlying our sense of space is suggested by such observations as your ability (not always very accurate) to do the following in the dark: • Point to (or touch) a finger of your other hand • Move your eye towards or reach towards a source of sound • Reach towards where your hand was a second or so earlier • Imagine a rectangle and point to where its vertices are in space • Pick a random point on each side of the imagined rectangle and join pairs of points on opposite sides of the rectangle. Describe and point to where the newly drawn lines intersect. • Look at the things in front of you and then turn around and point to the location of one of the objects you saw that is now behind you (as in the experiments by Attneave & Farrar, 1977)

How are indexes going to help with such examples? • In order for the somatosensory case to work the way the purely visual case worked: • We need to specify how it is possible to index an object in space using somatosensory signals, and • We need to show that a limited number of selected (indexed) individuals are involved, as in the visual case.

What is the real problem of our sense of space? • In order to solve the problem of how we index objects in the world using somatosensory inputs we need to solve the problem of how we recognize two such inputs as corresponding to (reaching) the same thing in the world • This is the problem of the equivalence of movements, or of proprioceptive inputs, corresponding to reaching the same object – it’s the problem that Henri Poincaré recognized as the central problem of understanding our sense of space (Poincaré’s “Why space has three dimensions” in Dernières Pensées, 1913) • Solving this problem requires solving the problem of coordinating signals across frames of reference • That’s why mechanisms of coordinate transformation are of central importance – they generate the relevant equivalences!
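The role of coordinate transformation in generating equivalences can be sketched with two readings of one object, one gaze-centered and one arm-centered. The pairwise transformation below maps one frame directly into the other without passing through any global map. The planar rigid transformation, the function names, and all the numbers are assumptions chosen for illustration; they are not taken from the talk or from the cited physiology.

```python
import math


def gaze_to_arm(point, offset, angle):
    """Re-express a gaze-centered point in the arm-centered frame.

    offset and angle give the pose of the gaze frame within the arm frame
    (a planar rigid transformation, assumed here for simplicity).
    """
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (offset[0] + c * x - s * y, offset[1] + s * x + c * y)


def same_place(p, q, tol=1e-6):
    """Two readings count as equivalent when they land on the same location."""
    return math.dist(p, q) <= tol


# Hypothetical readings of one object from two differently centered sensors.
seen_in_gaze_frame = (1.0, 0.0)   # straight ahead along the line of gaze
felt_in_arm_frame = (-0.3, 2.5)   # where the reaching hand reports the object

# Current pose of the gaze frame relative to the arm frame (assumed values).
gaze_offset, gaze_angle = (-0.3, 1.5), math.pi / 2

seen_in_arm_frame = gaze_to_arm(seen_in_gaze_frame, gaze_offset, gaze_angle)
equivalent = same_place(seen_in_arm_frame, felt_in_arm_frame)
```

The two raw readings share no numbers in common; only the transformation between the two frames reveals that they pick out the same thing.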
