Desiderata for a form of representation

What is thinking? Thinking/reasoning is the process by which we go beyond the information given (beyond what we see or are told) • Distinguish between representations involved in the course of actively reasoning and those that constitute “standing knowledge” • Only active representations are referred to as thought. These are sometimes viewed as being in “working memory” (STM) • Such active representations take part in reasoning, solving problems and drawing inferences from standing knowledge. • “Standing knowledge” is said to be in “long term memory” (LTM). • The BIG questions: • Do we think in language (e.g., English) or • Do we think in pictures, or • Both or neither

Desiderata for a form of representation • A format for representing thoughts must meet certain conditions (Fodor & Pylyshyn, 1988) : • The capacity to think is productive (there is no limit to how many distinct thoughts the competence encompasses), • Therefore thoughts are built from a finite set of concepts • The capacity to represent and to draw inferences is systematic: If we have the capacity to think certain thoughts then we also have the capacity to think other related thoughts. • Thoughts may be false but they are not ambiguous to the thinker • When sentences are ambiguous it is because they express several possible unambiguous thoughts.

Thinking in words • Our experience of “thinking in words” is that of carrying on an inner dialogue with ourselves. But consider a typical fragment of such a dialogue. It conforms to Gricean principles of discourse: Be maximally informative – don’t say things that your audience already knows. • Since our mental dialogue follows these maxims it is clear that much is left unsaid. But according to the view that one thinks in words, if it is unsaid then it is also unthought! Or, conversely, if it was thought, it was not thought in words. It was left to the imagination of the listener – but according to this view there is no room for imagining something other than what was said. • Thoughts experienced as inner dialogue grossly underdetermine what is thought, so words cannot be the vehicle of thought.

As I sit here thinking about what I will say in this lecture, I observe myself thinking, “I’d better find a concrete example of this or nobody will understand what I mean, and then they certainly will not believe it!” If this was my thought, then what did I mean by “example” or “this”? And who was I referring to when I said “nobody”? Was there a presupposition that I wanted to persuade someone? Obviously I knew what I meant, but how was this knowledge represented? Not in words since I cannot find it anywhere in my consciousness. And if it was there in unconscious words, it would still have the same properties of anaphora, ambiguity, presupposition, and entailment since those are inherent in natural language.

Lingua Mentis • The representation of thoughts needs to meet the four conditions just listed (finite conceptual base, productivity, systematicity, freedom from ambiguity) • For that reason, thought requires a format similar to a logical calculus (or LF). Call it the Language of Thought (LOT), after Fodor’s famous 1975 book. • This is not to say that reasoning cannot use other forms of representation in addition to LOT. • Because LOT appears ill suited to represent magnitudes, the proposal that there is an additional (perhaps analog) form of representation is attractive • But none proposed so far is satisfactory – perhaps because the notion of an analog is ill defined.

Representational and inferential systematicity • Representational systematicity (Fodor & Pylyshyn, 1988) refers to the fact that if you can think certain thoughts then you have the capacity to think an indefinite number of other related thoughts: e.g., if you can think both that snow is white and that crows are black then you have the concepts snow, crow, white, and black, which gives you the capacity also to think snow is black and crows are white. • Inferential or rule systematicity (Pylyshyn, 1984, Chapter 3) refers to the fact that for representations to enter into rules, the representations must have the relevant distinct constituents. For a rule of inference such as “From P ∨ Q and Not-Q infer P” the parts P and Q have to be explicitly recognizable. The same is true of if-then rules. Suppose a system’s behavior is expressed by a pair of rules such as (1) if Q1 and Q2 hold, then execute action A1, and (2) if Q1 and Q3 hold, then execute action A2. The three distinct conditions Q1, Q2 and Q3 must be constituents of a representation of the state of the system to which a rule applies. The rule could not be expressed by a representation that fuses the conditions, as connectionist models do (with Qn≡F[Q1,Q2]). (This will be discussed later by Jerry Fodor.)
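The two kinds of systematicity can be sketched in a few lines of Python. This is only an illustration under assumed encodings that are not part of the lecture: a thought is taken to be a (subject, predicate) pair, a system state is a set of condition names, and the names thinkable_thoughts, RULES and applicable_actions are hypothetical.

```python
from itertools import product

# Hypothetical toy encoding: a thought is a (subject, predicate) pair.
known_thoughts = [("snow", "white"), ("crows", "black")]

def thinkable_thoughts(thoughts):
    """Recombine the constituents of the known thoughts into every
    subject-predicate pairing (representational systematicity)."""
    subjects = {s for s, _ in thoughts}
    predicates = {p for _, p in thoughts}
    return set(product(subjects, predicates))

# Rules whose conditions are distinct constituents of the state description.
RULES = [
    ({"Q1", "Q2"}, "A1"),  # (1) if Q1 and Q2 hold, execute A1
    ({"Q1", "Q3"}, "A2"),  # (2) if Q1 and Q3 hold, execute A2
]

def applicable_actions(state):
    """Return the action of every rule whose conditions all appear in the state."""
    return [action for conditions, action in RULES if conditions <= state]

# A structured state exposes Q1 and Q2 separately, so rule (1) can apply.
structured_state = {"Q1", "Q2"}

# A fused state replaces both conditions with one opaque token (an assumed
# stand-in for F[Q1,Q2]); Q1 is no longer a recognizable part, so no rule matches.
fused_state = {"Q1&Q2"}
```

With these definitions, thinkable_thoughts(known_thoughts) contains ("snow", "black") and ("crows", "white") as well as the two original thoughts, and applicable_actions finds rule (1) for structured_state but nothing for fused_state.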

We often have strong experiences about the steps we go through in solving a problem: But does that tell us how we solve it? • Arnheim’s Visual Thinking (1969) • Rudolf Arnheim claims that when people solve visual analogy problems, they go through a sequence of mental states that involve a “rich and dazzling experience” of “instability,” of “fleeting, elusive resemblances,” and of being “suddenly struck by” some perceptual relationships. If this is true, does it explain how we solve the problems? • What steps do you go through in understanding language? • How do you experience thinking about numbers? • What is 9 + 7? What is 6 x 8? Which is larger, 379 or 397? • Daniel Tammet, autistic savant • Ramachandran’s Science Video • What does the description of Daniel’s experience tell us?

Thinking in pictures (or pictures + words) • There is a large literature on scientific discovery that credits images as the cause of the discovery (e.g., the benzene ring) • Are pictures better than words for expressing thoughts or for creating new thoughts? Why are images often cited as the format for non-verbal or intuitive thoughts? • To understand how pictures or words could serve as the basis for encoding thoughts we need to understand the assumptions underlying the claim that thoughts are encoded in pictures or words. • What’s missing is an understanding of the distinction between form and content, which itself rests on another distinction central to cognitive science – the distinction between architecture and representation (more on this later)

The problem of experiential access to mental contents and processes How well do you know your own mind? • What does how things look tell you about the contents of your mental representation? • Must there be a representation corresponding to an appearance? • What do the changing contents of your conscious experience tell you about the changing representations that your mind goes through? Does it provide a trace of the process? • What do the contents of experience tell you about how you make decisions or solve problems? (Example later) • Does a description of your experience provide the basis for an account that is explanatory?

Why suppose that thoughts might be represented in the form of pictures? • Over 65% of the cortex is devoted to vision • Most of our knowledge of the world comes through vision • If we have a visual module, why not use it to encode/decode thoughts?

There are many questions about what goes on when we have the experience of “seeing with the mind’s eye” • Is mental imagery a special form of thought? If so, in what way is it special? • Are mental images sensory-based and modality-specific? • Are mental images like pictures? In what respect? • Are images different from other forms of thought? Do they, for example, resemble what they represent? • Does mental imagery use the visual system? If so, what does that tell us about the format of images? • Is there neurophysiological evidence for a pictorial “display” in visual cortex? • What if a display were found in human visual cortex?

These questions will be addressed this week (and maybe even next week) • But if mental imagery is to be thought of as being closely related to vision, we first need to ask some questions about what vision is like. • First we need to recognize that what drives the imagery-vision parallel is the similar phenomenology, yet the phenomenology of vision is very misleading

The phenomenology of seeing (including its completeness, its filled-out fine details, and its panoramic scope) turns out to be an illusion! • We see far less, and with far less detail and scope, than our phenomenology suggests • Objectively, outside a small region called the fovea, we are colorblind and our sight is so bad we are legally blind. The rest of the visual field is seriously distorted and even in the fovea not all colors are in focus at once. • More importantly, we register surprisingly little of what is in our field of view. Despite the subjective impression that we can see a great deal that we cannot report, recent evidence suggests that we cannot even tell when things change as we watch.

What do we take in when we see? • What we actually take in functionally depends on: • Whether you are asking about the preconceptual information or the conceptual (seeing-as) information • Even pre-conceptual information is impoverished and built up over time. We will see later that this consists primarily in individuating and keeping track of individual objects • Whether the information was attended or not. Although unattended information is not entirely screened out, it is certainly curtailed and sometimes even inhibited.

Examples of attentional inhibition • Negative Priming (DeSchepper & Treisman, 1996). • Is there a figure on the right that is the same as the figure on the left? • When the figure on the left is one that had appeared as an ignored figure on the right, RT is long and accuracy poor. • This “negative priming” effect persisted over 200 intervening trials and lasted for a month!

The effect of attention on whether objects are perceived/encoded: Inattentional Blindness (Mack, A., & Rock, I., 1998. Inattentional blindness. Cambridge, MA: MIT Press)

Inattentional Blindness • The background task is to report which of two arms of the + is longer. One critical trial per subject, after about 3 or 4 background trials. Another “critical” trial presented as a divided attention control. • 25% of subjects failed to see the square when it was presented in the parafovea (2° from fixation). • But 65% failed to see it when it was at fixation! • When the background task cross was made 10% as large, Inattentional Blindness increased from 25% to 66%. • It is not known whether this IB is due to concentration of attention on the primary task, or whether there is inhibition of outside regions.

Where does this leave us? • Given the examples of memory errors, should we conclude that seeing is a process of constructing conceptual descriptions? • Most cognitive scientists and AI people would say yes, although there would be several types of exception. • There remains the possibility that for very short durations (e.g. 0.25 sec) there is a form of representation very like visual persistence – sometimes called an “iconic storage” (Sperling, 1960). • From a neuroscience perspective there is evidence of a neural representation in early vision – in primary visual cortex – that is retinotopic and therefore “pictorial.” • Doesn’t this suggest that a “picture” is available in the brain in vision? • We shall see later that this evidence is misleading and does not support a picture theory of vision or of visual memory • A major theme of later lectures will be to show that an important mechanism of vision is not conceptual but causal: Visual Indexes • Many people continue to hold a version of the “picture theory” of mental representation in mental imagery. More on this later.

Architecture and Process • We now come to the most important distinction of all – that between behavior attributable to the architecture of a system and that attributable to properties of the things that are represented. Without this distinction we cannot distinguish between phenomena that reveal the nature of the system and phenomena that reflect the effects of external variables • So here is an example to make the point

An illustrative example: Behavior of a mystery box (figure: the box’s behavior plotted over time). What does this behavior pattern tell us about the nature of the box?

The moral of this example: Regularities in behavior may be due to one of two very different causes: (1) the inherent nature of the system (its relatively fixed structure), or (2) the nature of what the system represents (what it “knows”).

The imagery debate The main difference between picture-theorists and the rest of us (me) is in how we answer the following question: • Do experimental findings on mental imagery (such as those I will review) tell us anything about the properties of a special imagery architecture? Or do they tell us about the knowledge that people have about how things would look if they were actually to see them (together with some common psychophysical skills)? • While these are the main alternatives, there are also other reasons why experiments come out as they do • Notice that the architecture alternative includes properties of the format adopted in a particular domain of representation – e.g., the Morse code used by the code-box in our example

Examples to probe your intuition Imagine various events unfolding before your “mind’s eye” – • Imagine a bicyclist racing up a hill. Down a hill? • Imagine turning a large heavy wheel. A light wheel. • Imagine a baseball being hit. What shape trajectory does it trace out? Where would you run to catch it? • Imagine a coin dropping and whirling on its edge as it eventually settles. Describe how it behaves. • Imagine a heavy ball (a shot-put) being dropped at the same time as a light ball (a tennis ball). Indicate when they hit the floor. Repeat for different heights. • Form a vivid auditory image of Alfred Brendel playing the minute waltz so you hear every note clearly. How long will it take? Why?

What color do you see when two color filters overlap?

Conservation of volume example

A basic mistake: Failure to distinguish between properties of the world being represented and properties of the representation or of the representational medium. “Representation of object O with property P” is ambiguous between these two parsings: Representation of (object O with property P) vs. (Representation of object O) with property P

Major Question 1: Constraints imposed by imagery Why do things happen the way they do in your imagination? • Is it because of the format of your image or your cognitive architecture? Or because of what you know? • Did it reveal a capacity of mind? • Or was it because you made it do what it did? • Can you make your image have any properties you choose? Or behave in any way you want? Why not? • How about imagining an object from all directions at once, or from no particular direction? • How about imagining a 4-dimensional object? • Can you imagine a printed letter which is neither upper nor lower case? A triangle that is not a particular type?

More demonstrations of the relation between vision and imagery • Images constructed from descriptions • The D-J example(s) • The two-parallelogram example • Amodal completion • Reconstruals: Slezak

Can images be visually reinterpreted? • There have been many claims that people can visually reinterpret images • These have all been cases where one could easily figure out what the combined image would look like without actually seeing it (e.g., the J – D superposition). • Peterson’s careful examination of visual “reconstruals” showed (contrary to her own conclusion) that images are never ambiguous (no Necker cube or figure-ground reversals) and when new construals were achieved from images they were quite different from the ones achieved in vision (more variable, more guessing from cues, etc). • The best evidence comes from a philosopher (Slezak, 1992, 1995)

Slezak figures Pick one (or two) of these animals and memorize what they look like. Now rotate it in your mind by 90 degrees clockwise and see what it looks like.

Slezak figures rotated 90°

Do this imagery exercise: Imagine a parallelogram like this one. Now imagine an identical parallelogram directly below this one. Connect each corner of the top parallelogram with the corresponding corner of the bottom parallelogram. What do you see when you imagine the connections? Did the imagined shape look (and change) like the one you see now?

Amodal completion by imagery?

Is this what you saw?

Images and space • Are images spatial – i.e. do they have spatial properties such as size, distance, and relations such as above, next-to, in-between? Do the axioms of Euclidean geometry and measure theory apply to them? • ab + bc ≥ ac and ab = ba • If ∠abc = 90°, then ab² + bc² = ac² • If yes, what would that entail about how they must be instantiated in the brain? • Could they be analogue? What constraints does that impose? • Is the space 2-D or 3-D? • Might they be in some “functional space” – i.e., behave as though they were spatial without having to be in real physical brain-space? What does that entail?
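To make concrete what it would mean for these relations to “apply,” here is a minimal Python sketch. It assumes, purely for illustration, that image locations are literal points (x, y) in a Euclidean plane; the function names are hypothetical. For literal points the relations hold automatically, which is exactly what a merely “functional” space would have to guarantee by some other means.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def satisfies_metric_axioms(a, b, c, tol=1e-9):
    """Check ab + bc >= ac (triangle inequality) and ab = ba (symmetry)."""
    triangle = dist(a, b) + dist(b, c) >= dist(a, c) - tol
    symmetry = abs(dist(a, b) - dist(b, a)) <= tol
    return triangle and symmetry

def satisfies_pythagoras(a, b, c, tol=1e-9):
    """Check ab^2 + bc^2 = ac^2; meaningful only when the angle at b is 90 degrees."""
    return abs(dist(a, b) ** 2 + dist(b, c) ** 2 - dist(a, c) ** 2) <= tol

# A right angle at b: sides of length 3 and 4, hypotenuse of length 5.
a, b, c = (3.0, 0.0), (0.0, 0.0), (0.0, 4.0)
```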

Do mental images have size? • Imagine a mouse across the room so its image size (% of your image display it occupies) is small • Now imagine it close to you so it fills your image display • Of these two conditions, in which does it take longer to answer “can you see the mouse’s whiskers?” • Imagine a horse. How close can you come to the image before it starts to overflow your image display? Repeat with a toaster, a table, a person’s face, etc.

Do mental images have size? Imagine a very small mouse. Can you see its whiskers? Now imagine a huge mouse. Can you see its whiskers? Which is faster?

Image of (small X) vs Small(image of X)

Mental rotation Time to judge whether (a)-(b) or (b)-(c) are the same except for orientation increases linearly with the angle between them (Shepard & Metzler, 1971)

Imagine this shape rotating in 3D When you make it rotate in your mind, does it seem to retain its rigid 3D shape without re-computing it?

The missing bit of logic: • What is assumed in the case of mental rotation? • According to Prinz (2002), p. 118, “If visual-image rotation uses a spatial medium of the kind Kosslyn envisions, then images must traverse intermediate positions when they rotate from one position to another. The propositional system can be designed to represent intermediate positions during rotation, but that is not obligatory.” • But what makes this obligatory in “functional space”?

How are these ‘assumptions’ realized? • Assumptions such as rigidity must therefore be a property inherent in the architecture (the ‘display’) • That raises the question of what kind of architecture could possibly enforce rigidity of form. This brings us back to the proposed architecture – a physical display • Notice, however, that such a display, by itself, does not rigidly maintain the shape as orientation is changed. • There is evidence that rotation is incremental not holistic and is dependent on the complexity of the form and the task • Also such rigidity could not be part of the architecture of an imagery module because we can easily imagine situations in which rigidity does not hold (e.g. imagine a rotating snake!).

Mental Scanning • Some hundreds of experiments have now been done demonstrating that it takes longer to scan attention between places that are further apart in the imagined scene. In fact the relation is linear between time and distance. • These have been reviewed and described in: • Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the mind. Cahiers de Psychologie Cognitive / Current Psychology of Cognition, 18(4), 409-465. • Rarely cited are experiments by Pylyshyn & Bannon which I will summarize for you.

Studies of mental scanning: Does it show that images have metrical space? (Pylyshyn & Bannon. See Pylyshyn, 1981) • Conclusion: The image scanning effect is Cognitively Penetrable • i.e., it depends on goals and beliefs, or on Tacit Knowledge.

The central problem with imagistic explanations… What is assumed in imagist explanations of mental scanning? • In actual vision, it takes longer to scan a longer distance because real distance, real motion, and real time are involved; therefore this equation holds due to natural law: Time = distance / speed. But what ensures that a corresponding relation holds in an image? The obvious answer is: Because the image is laid out in real space! • But what if that option is closed for empirical reasons? • Imagists appeal to a “Functional Space” which they liken to a matrix data structure in which some pairs of cells are closer and others further away, and to move from one to another it is natural that you pass through intermediate cells • Question: What makes these sorts of properties “natural” in a matrix data structure?
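The closing question can be made concrete with a small Python sketch. It assumes nothing about any actual imagery model: the 5x5 matrix and the functions jump and scan are hypothetical. The same data structure supports both a direct jump and a cell-by-cell scan, so the dependence of step count on distance comes from the procedure one chooses to run, not from the matrix itself.

```python
# Hypothetical 5x5 "image" matrix; cell contents are arbitrary labels.
image = [[f"cell({r},{c})" for c in range(5)] for r in range(5)]

def jump(matrix, start, goal):
    """Go straight to the goal cell: one indexing step, whatever the distance."""
    r, c = goal
    return matrix[r][c], 1  # (content reached, number of steps taken)

def scan(matrix, start, goal):
    """Move one neighboring cell at a time (diagonal moves allowed),
    visiting intermediate cells on the way to the goal."""
    (r, c), steps = start, 0
    while (r, c) != goal:
        r += (goal[0] > r) - (goal[0] < r)  # step row index toward the goal
        c += (goal[1] > c) - (goal[1] < c)  # step column index toward the goal
        steps += 1
    return matrix[r][c], steps

near = scan(image, (0, 0), (0, 1))   # adjacent cell
far = scan(image, (0, 0), (4, 4))    # opposite corner
direct = jump(image, (0, 0), (4, 4))
```

Here scan takes more steps for the far cell than for the near one, while jump takes a single step in either case. Nothing in the matrix makes one procedure more “natural” than the other.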

Kosslyn view: Images as depictive representations • “A depictive representation is a type of picture, which specifies the locations and values of configurations of points in a space. For example, a drawing of a ball on a box would be a depictive representation. • The space in which the points appear need not be physical…, but can be like an array in a computer, which specifies spatial relations purely functionally. That is, the physical locations in the computer of each point in an array are not themselves arranged in an array; it is only by virtue of how this information is ‘read’ and processed that it comes to function as if it were arranged into an array (with some points being close, some far, some falling along a diagonal, and so on). • Depictive representations convey meaning via their resemblance to an object, with parts of the representation corresponding to parts of the object… When a depictive representation is used, not only is the shape of the represented parts immediately available to appropriate processes, but so is the shape of the empty space … Moreover, one cannot represent a shape in a depictive representation without also specifying a size and orientation….” (Kosslyn, 1994, p 5)

Thou shalt not cheat • There is no natural law or principle that requires the representations of time, distance and speed to be related according to the motion equation. You could equally easily imagine an object moving instantly or according to any motion relation you like, since it is your image! • There are two possible answers to why the relation Time = (Representation of distance) / (Representation of speed) typically holds in an image-scanning task: • Because subjects have tacit knowledge that this is what would happen if they viewed a real display, or • Because the matrix is taken to be a simulation of a real-world display, as it often is in computer science

Thou shalt not cheat • What happens in ALL imagist accounts of phenomena, including mental scanning and mental rotation is that imagists assume that images have the properties of real space in order to provide a principled explanation, and then retreat to some “functional” or not-quite-real space when it is pointed out that they are assuming that images are laid out in real brain space. • This happens with mental rotation as well, even though it is an involuntary and universal way of solving the rotated-figure task so long as the task involves tokens of enantiomorphs. • Experiments have shown that: • No rotation occurs if the figures have landmarks or asymmetries that can be used to identify them, and • Records of eye movements show that mental rotation is done incrementally: It is not a holistic rotation as experienced. • The “rate of rotation” depends on the conceptual complexity of the recognition task so is not a result of the architecture

A final point… • In Kosslyn, Thompson & Ganis (2007) the authors cite Ned Block to the effect that one does not need an actual 2D surface, so long as the connections upstream from the cortical surface can decode certain pairs of neurons in terms of their imagined distance. Think of long stretchy axons going from a 2D surface to subsequent processes. Imagine that the neurons are randomly moved around so they are no longer on a 2D layout. As long as the connections remain fixed it will still behave as though there was a 2D surface. • Call this the “encrypted 2D layout” version of literal space.

The encrypted-spatial layout alternative • By itself the encrypted-layout alternative will not do because without referring to the original locations, the relation between pairs of neurons and scan time is not principled. In the end the only principle we have is Time=distance/speed so unless the upstream system decrypts the neuron locations into their original 2D surface locations the explanation for the increase in time with increased imagined distance remains a mere stipulation. It stipulates, but does not explain why, when two points are further away in the imagined layout it takes longer to scan between them or why scanning between them requires that one visit ‘intermediate’ locations along the way. • But this is what we need to explain! So long as what we have is a stipulation, we can apply it to any form of representation! What was a principled explanation with the literal 2D display has now been given up for a mere statement of how it shall be.
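The argument can be illustrated with a short Python sketch, offered only as a toy model under stated assumptions: “neurons” are integer slots, the scrambling is a random permutation, and the names make_encrypted_layout and scan_time are hypothetical. The slot numbers alone say nothing about distance; Time = distance / speed can be computed only by consulting the key that maps each slot back to its original 2D location.

```python
import math
import random

def make_encrypted_layout(points, seed=0):
    """Assign each original 2D point to a randomly chosen neuron slot.
    The returned dict (slot -> original point) acts as the decryption key."""
    rng = random.Random(seed)
    slots = list(range(len(points)))
    rng.shuffle(slots)
    return dict(zip(slots, points))

def scan_time(slot_a, slot_b, key, speed=1.0):
    """Time = distance / speed, where distance is taken between the ORIGINAL
    2D locations recovered through the key, not between slot numbers."""
    (x1, y1), (x2, y2) = key[slot_a], key[slot_b]
    return math.hypot(x2 - x1, y2 - y1) / speed

points = [(0, 0), (3, 4), (6, 8)]
key = make_encrypted_layout(points)
slot_of = {point: slot for slot, point in key.items()}  # inverse lookup for the demo

t_near = scan_time(slot_of[(0, 0)], slot_of[(3, 4)], key)
t_far = scan_time(slot_of[(0, 0)], slot_of[(6, 8)], key)
```

In this sketch t_far comes out twice as large as t_near only because scan_time decrypts the slots into the original coordinates. Remove the key and the relation between a pair of slots and a scan time has to be stipulated case by case.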
