Publications

(For additional bibliometric data, see Google Scholar.)

  • Hafri, A., Bonner, M. F., Landau, B., & Firestone, C. (in press). A phone in a basket looks like a knife in a cup: Role-filler independence in visual processing. Open Mind.
    [pdf][data]

  • Sun, Z., Han, S., & Firestone, C. (2024). Caricaturing shapes in visual memory. Psychological Science.
    [pdf][data][demos]

  • Morales, J., & Firestone, C. (2024). Empirical evidence for perspectival similarity. Psychological Review, 131, 311–320.
    [pdf] - This paper is (mostly) a reply to Burge & Burge (2022). See also a companion article by Cheng and colleagues.

  • Firestone, C., & Phillips, I. (2023). Chasing an equation for awareness [book review]. Science, 382, 1251.
    [pdf]

  • Croom, S., Zhou, H., & Firestone, C. (2023). Seeing and understanding epistemic actions. Proceedings of the National Academy of Sciences, 120, e2303162120.
    [pdf][data][demos]

  • Hafri, A., Green, E. J., & Firestone, C. (2023). Compositionality in visual perception [commentary]. Behavioral and Brain Sciences, 46, e277.
    [pdf]

  • Nartker, M., Firestone, C., Egeth, H., & Phillips, I. (2023). Six ways of failing to see (and why the differences matter). i-Perception, 14(5), 1–6.
    [pdf]

  • Goh, R. Z., Phillips, I., & Firestone, C. (2023). The perception of silence. Proceedings of the National Academy of Sciences, 120, e2301463120.
    [pdf][data][demos]

  • Morales, J., & Firestone, C. (2023). Philosophy of perception in the psychologist's laboratory. Current Directions in Psychological Science, 32, 307–317.
    [pdf]

  • Firestone, C., & Phillips, I. (2023). Seeing fast and thinking slow [book review]. Science, 379, 1196.
    [pdf] - This is a review of The Border Between Seeing and Thinking, by Ned Block.

  • *Nartker, M., *Zhou, Z., & Firestone, C. (2023). When will AI misclassify? Intuiting failures on natural images. Journal of Vision, 23(4):4, 1–15.
    [pdf][data]

  • Phillips, I., & Firestone, C. (2023). Visual adaptation and the purpose of perception. Analysis, 83, 555–575.
    [pdf] - Reply by Block.

  • *Hafri, A., *Boger, T., & Firestone, C. (2022). Melting ice with your mind: Representational momentum for physical states. Psychological Science, 33, 725–735.
    [pdf][data][demos]

  • Mandelbaum, E., Dunham, Y., Feiman, R., Firestone, C., Green, E. J., Harris, D., Kibbe, M. M., Kurdi, B., Mylopoulos, M., Shepherd, J., Wellwood, A., Porot, N., & Quilty-Dunn, J. (2022). Problems and mysteries of the many languages of thought. Cognitive Science, 46, e13225.
    [pdf]

  • Morales, J., & Firestone, C. (2022). A new perspective on mental rotation. Current Biology, 32, R1281–R1283.
    [pdf]

  • Sun, Z., & Firestone, C. (2022). Beautiful on the inside: Aesthetic preferences and the skeletal complexity of shapes. Perception, 51, 904–918.
    [pdf][data][try it!]

  • Lepori, M. A., & Firestone, C. (2022). Can you hear me now? Sensitive comparisons of human and machine perception. Cognitive Science, 46, e13191.
    [pdf][data][try it!]

  • Groh, M., Epstein, Z., Firestone, C., & Picard, R. (2022). Deepfake detection by human crowds, machines, and machine-informed crowds. Proceedings of the National Academy of Sciences, 119, e2110013119.
    [pdf][data][try it!]

  • Sun, Z., & Firestone, C. (2022). Seeing and speaking: How verbal 'description length' encodes visual complexity. Journal of Experimental Psychology: General, 151, 82–96.
    [pdf][data][listen]

  • Morales, J., Bax, A., & Firestone, C. (2021). Perspectival interference up close. Proceedings of the National Academy of Sciences, 118, e2025440118.
    [pdf][data][supplement]
    - This publication is a letter replying to Linton, who was in turn replying to our original PNAS paper, "Sustained representation of perspectival shape". It gets its own entry on our publications page because it reports new data!

  • Won, I., Gross, S., & Firestone, C. (2021). "Impossible" somatosensation and the (ir)rationality of perception. Open Mind, 5, 30–41.
    [pdf][data]

  • Hafri, A., & Firestone, C. (2021). The perception of relations. Trends in Cognitive Sciences, 25, 475–492.
    [pdf][cover]

  • Little, P. C., & Firestone, C. (2021). Physically implied surfaces. Psychological Science, 32, 799–808.
    [pdf][data][demos]

  • Rivera-Aparicio, J., Yu, Q., & Firestone, C. (2021). Hi-def memories of Lo-def scenes. Psychonomic Bulletin & Review, 28, 928–936.
    [pdf][data]

  • Sun, Z., & Firestone, C. (2021). Curious objects: How visual complexity guides attention and engagement. Cognitive Science, 45, e12933.
    [pdf][data]

  • Firestone, C. (2020). Performance vs. competence in human-machine comparisons. Proceedings of the National Academy of Sciences, 117, 26562–26571.
    [pdf]

  • Sun, Z., & Firestone, C. (2020). Optimism and pessimism in the predictive brain. Trends in Cognitive Sciences, 24, 683–685.
    [pdf]

  • Morales, J., Bax, A., & Firestone, C. (2020). Sustained representation of perspectival shape. Proceedings of the National Academy of Sciences, 117, 14873–14882.
    [pdf][data][demos]
    - Check out some very interesting follow-up discussion (ft. Jonathan Cohen and several other philosophers) here.

  • Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24, 346–348.
    [pdf]

    Responses:
    - Klein (2020)
    - Seth et al. (2020)
    - Van de Cruys, Friston, & Clark (2020)

  • Mandelbaum, E., Won, I., Gross, S., & Firestone, C. (2020). Can resources save rationality? "Anti-Bayesian" updating in cognition and perception [commentary]. Behavioral and Brain Sciences, 43, e16.
    [pdf]

    - This is a reply to a target article by Lieder & Griffiths. For the original article, including their reply to our commentary, see here.

  • Guan, C., & Firestone, C. (2020). Seeing what's possible: Disconnected visual parts are confused for their potential wholes. Journal of Experimental Psychology: General, 149, 590–598.
    [pdf][data]

  • Valenti, J. J., & Firestone, C. (2019). Finding the "odd one out": Memory color effects and the logic of appearance. Cognition, 191, 103934.
    [pdf][data]

  • Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10, 1334.
    [pdf][data]

  • Lowet, A. S., Firestone, C., & Scholl, B. J. (2018). Seeing structure: Shape skeletons modulate perceived similarity. Attention, Perception, & Psychophysics, 80, 1278–1289.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2017). Seeing and thinking in studies of embodied "perception": How (not) to integrate vision science and social psychology. Perspectives on Psychological Science, 12, 341–343.
    [pdf]

    - This is a reply to a target article by Schnall. See another reply by Durgin and a reply to those replies by Schnall in turn.

  • Firestone, C., & Scholl, B. J. (2016). Seeing and thinking: Foundational issues and empirical horizons [response to commentaries]. Behavioral and Brain Sciences, 39, e264.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2016). Cognition does not affect perception: Evaluating the evidence for 'top-down' effects [target article]. Behavioral and Brain Sciences, 39, e229.
    [pdf]

  • Firestone, C., & Keil, F. C. (2016). Seeing the tipping point: Balance perception and visual shape. Journal of Experimental Psychology: General, 145, 872–881.
    [pdf]

  • Firestone, C. (2016). Embodiment in perception: Will we know it when we see it? In H. Kornblith & B. McLaughlin (eds.), Alvin Goldman and his Critics (pp. 318–334). Wiley Blackwell.
    [pdf]

    - Reply by Goldman.

  • Firestone, C., & Scholl, B. J. (2016). 'Moral perception' reflects neither morality nor perception. Trends in Cognitive Sciences, 20, 75–76.
    [pdf]

    - Response (by Gantman & Van Bavel), and a rejoinder (by us) to that response.

  • Firestone, C., & Scholl, B. J. (2015). When do ratings implicate perception vs. judgment? The "overgeneralization test" for top-down effects. Visual Cognition, 23, 1217–1226.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2015). Enhanced visual awareness for morality and pajamas? Perception vs. memory in top-down effects. Cognition, 136, 409–416.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2015). Can you experience top-down effects on perception? The case of race categories and perceived lightness. Psychonomic Bulletin & Review, 22, 694–700.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2014). "Please tap the shape, anywhere you like": Shape skeletons in human vision revealed by an exceedingly simple measure. Psychological Science, 25, 377–386.
    [pdf]

  • Firestone, C., & Scholl, B. J. (2014). "Top-down" effects where none should be found: The El Greco fallacy in perception research. Psychological Science, 25, 38–46.
    [pdf]

  • Firestone, C. (2013). On the origin and status of the "El Greco fallacy". Perception, 42, 672–674.
    [pdf]

    Some interesting source materials:
    - Original 'Astigmatismo Del Greco' pamphlet
    - 1914 Dutch news article on astigmatic El Greco theory

  • Firestone, C. (2013). How 'paternalistic' is spatial perception? Why wearing a heavy backpack doesn't — and couldn't — make hills look steeper. Perspectives on Psychological Science, 8, 455–473.
    [pdf]

    Responses:
    - Proffitt (2013)
    - Witt (2015)


Conference Abstracts

What factors can imply that something is *there*? Low-level visual cues such as coincidental clipping or occlusion can make us see surfaces that are not physically present, as in Kanizsa figures or amodal completion. Here, we show how such basic perceptual processes can also be driven by a surprisingly *high-level* cue: physical interaction. In three experiments, subjects who saw an object "bounce" off an invisible surface better identified a real surface that matched the invisible surface's implied orientation. Thus, seemingly low-level object detection can be driven by high-level cues, creating illusory contours defined by "physics".

A powerful approach for representing images is to determine their shortest possible "description length". A surprising prediction of such approaches is an inverted-U-shaped relationship between an image's "objective" complexity and the complexity of its representation, because overly complex stimuli often have "simple" explanations. Here, 3 experiments explore this relationship in a novel way: Subjects verbally described complex shapes, dot arrays, or motion-paths; 10,000 such speech clips revealed a striking quadratic relationship between raw image complexity and the complexity of their verbal descriptions. In other words, what we *say* about images can reveal how we *see* them.

Arguably the most foundational principle in perception research is that our visual experience of the world goes beyond the retinal image; we represent the distal environment outside of our minds rather than the proximal stimulation that reaches us. For example, when a circular coin is tilted, it casts an elliptical silhouette on our eyes; but we appreciate the circular object that it truly is, beyond the perspectival ellipse it projects to us.

Despite the ubiquity of shape constancy, philosophers struggle to agree on the nature of our experience when an object’s environmental shape doesn’t match its perspectival shape. Does a tilted coin look like a circular object, or an elliptical object? A broad set of philosophical discussions have a stake in this question, including the “objectivity” of perception (Burge 2010), the contents of visual experience (Peacocke 1983; Schellenberg 2008; Smith 2002; Tye 2000), the distinction between perception, imagery, and cognition (Nanay 2018) and the reliability of introspection (Schwitzgebel 2011).

Notwithstanding the centrality of this question, philosophical debates have traditionally involved only appeals to introspection (e.g., Locke, 1975; Gibson, 1986; Noë 2004; Schwitzgebel, 2011), despite suggestions that empirical data could be relevant (Schwenkler and Weksler 2018; Kelly 2008). Here, we provide the first empirical test of this core philosophical question, and provide evidence that perspectival shapes are explicitly represented by our visual systems after 3D representations arise.

Psychophysical evidence for perspectival shape experiences

The logic of our studies is simple: If a rotated circular coin is genuinely perceived as an ellipse, then it should impair search for an objectively elliptical object. In other words, if your task is to locate an objectively elliptical object, then a rotated circular object should serve as an effective “distractor”. Here, five experiments that exploit a distinctive pattern of results from visual search show that this is the case.

Subjects saw a simple two-item search array containing highly realistic images of differently shaped 3D “coins”; their task on each trial was simply to locate a distally elliptical coin (for demos, visit www.perceptionresearch.org/coins). Strikingly, objectively circular coins slowed search for elliptical objects when the circular coins were rotated in depth, even when subjects clearly reported seeing them as circular. This pattern arose for images containing both static (Exp.1) and motion-based (Exp.2) depth cues; it held not only for speeded judgments but also in a delayed-judgment condition in which subjects viewed the coins for a sustained period before responding (Exp.3); this pattern also arose for images where the coins’ size was completely non-predictive of their “ellipticality” (Exp.4). Finally, a completely different paradigm (Exp.5) showed that viewing a rotated circular object boosted subsequent identification of an elliptical line-drawing, suggesting that rotated objects also prime their perspectival shapes.

We conclude that objects in the world have a surprisingly persistent dual character in the mind: Their objective shape “out there”, and their perspectival shape “from here”. Moreover, we present these results as a case study of a new way that empirical data can directly bear on philosophical questions.

References: www.perceptionresearch.org/coins

How does what we say reflect what we see? A powerful approach to representing objects is for the mind to encode them according to their shortest possible "description length". Intriguingly, such information-theoretic encoding schemes often predict a non-linear relationship between an image's "objective" complexity and the resources devoted to representing it, because excessively complex stimuli might have simple underlying explanations (e.g. if they were generated randomly). How widely are such schemes implemented in the mind? Here, we explore a surprising relationship between the perceived complexity of images and the complexity of *spoken descriptions* of those images. We generated a library of visual shapes, and quantified their complexity as the cumulative surprisal of their internal skeletons — essentially measuring the *amount of information* in the objects. Subjects then freely described these shapes in their own words, producing thousands of unique audio clips. Interestingly, the length of such spoken descriptions predicted explicit judgments of perceived complexity (by a separate group of subjects), as well as visual search speed in arrays containing those objects. But perhaps more surprisingly, the dataset of spoken descriptions revealed a striking quadratic relationship between the objects’ objective complexity and the length of their spoken descriptions: Both low-complexity stimuli *and* high-complexity stimuli received relatively shorter verbal descriptions, with a peak in spoken description length occurring for intermediately complex objects. Follow-up experiments extended this work to grouped objects, as well as to patterns of motion. The results establish a surprising connection between linguistic expression and visual perception: The way we describe images can reveal how our visual systems process them.

Perception research traditionally investigates how actual states of the world are seen: how we perceive the shapes, colors, and locations that objects actually have. By contrast, everyday life provokes us to consider possible states of the world that have not yet obtained (and may never obtain). For example, when assembling furniture or completing a jigsaw puzzle, we may appreciate not only the particular shapes of individual objects, but also their potential to combine into new objects with distinct shapes of their own. What is the nature of this experience? Here, we explore how visual processing extracts not only what objects are, but also what they could become, by showing how the mind literally confuses potential objects for real ones. In 5 experiments inspired by the puzzle game Tetris, subjects had to respond to a particular target within a stream of distracting “tetrominoes”; surprisingly, subjects false-alarmed more often to pairs of tetrominoes that could create their target than to pairs of tetrominoes that couldn’t, essentially representing possible objects as if they were physically present on the display. This pattern held for several types of objects and transformations, and could not be explained by previously known factors, such as spatial alignment, representational momentum due to imputed gravity, or various forms of response bias. We suggest that possible states of the world can not only be contemplated in moments of deliberate reflection, but also automatically represented by more basic mechanisms of perception and attention.

Perhaps the most basic task faced by visual processing is detecting an object’s presence; before computing an object’s shape, size, or orientation, we must first register that it is there. To this end, vision infers the presence of objects using cues such as continuity behind occluders, coincidental clipping of multiple figures, or obstructed motion. But these cues concern only low-level aspects of visual processing; could more sophisticated forms of input cue an object’s presence? Here we explore how a surprisingly high-level cue—physical interaction—can imply the presence of hidden surfaces, in ways that directly facilitate later detection.

Subjects watched a video of a man stepping onto a box; but the box had been digitally removed, giving the impression of the man stepping on an “invisible” surface. Afterwards, a visible line appeared where the man had stepped; its orientation either matched the invisible surface implied by his action (horizontal) or did not match (vertical), and subjects’ task was to report the visible line’s orientation. In other trials, the man ran into an invisible wall, followed by a congruent vertical line or incongruent horizontal line. Subjects were faster and more accurate at reporting the line’s orientation when it matched the orientation of the physically implied surface vs. when it conflicted. Further experiments extended this effect to more idealized animations of disks bouncing off horizontal or tilted invisible surfaces. This work demonstrates how a seemingly low-level process—detecting the presence of an object—can be influenced by a surprisingly high-level cue: otherwise-unexplained physical interactions.

Social stereotypes impact the conclusions we draw about other people. Women, for example, are deemed to be less likely to succeed than men in particularly intellectually demanding tasks (Bian et al. 2018). This suggests that higher-order judgments about qualities like ‘brilliance’ or ‘genius’ can be shaped by gender stereotypes. But could gender stereotypes be so cognitively entrenched that they could affect more basic perceptual judgments as well? For example, could harboring the stereotype ‘doctors are men’ make it more difficult to see a female doctor?

Consider an analogy with infections, where we might ask what sorts of judgments are susceptible to the infection and what judgments are immune. Prima facie, it seems plausible that stereotypes would only have a narrow scope of infectious influence—i.e. gender stereotypes could impact higher-level judgments; but it would be very surprising indeed for stereotypes to alter more basic and incidental perceptual responses.

Here, we present such surprising results. Our findings suggest that stereotyping has a considerably wider scope of causal influence than has been appreciated in the philosophical and psychological literature.

Arguably the most foundational principle in all of perception research is that our visual experience of the world goes beyond the retinal image; we represent the distal environment outside of our minds rather than the proximal stimulation that reaches us. This principle has been articulated since at least Helmholtz, and explains well-known phenomena such as color constancy (https://www.illusionsindex.org/ir/checkershadow) and amodal completion (https://www.illusionsindex.org/i/kanizsa-triangle). Shape perception, however, has long been the most influential example of such “unconscious inference”. For example, when a circular coin is tilted, it casts an elliptical silhouette on our eyes; but we appreciate the circular object that it truly is, beyond the perspectival ellipse it projects to us.

Despite the ubiquity of shape constancy, philosophers have struggled to agree on the nature of our experience when an object’s environmental shape doesn’t match its perspectival shape. Does a tilted coin look like a circular object, or an elliptical object? Importantly, an unusually broad set of philosophical discussions have a stake in this question, including the “objectivity” of perception (Burge 2010), the contents of visual experience (Peacocke 1983; Schellenberg 2008; Smith 2002; Tye 2000), the distinction between perception, imagery, and cognition (Nanay 2018) and the reliability of introspection (Schwitzgebel 2011). For example, one’s commitment to the objective representational nature of perception (Burge 2010) might weaken if some form of “ellipticality” persists in our minds when we see a tilted coin.

Can empirical data break a philosophical deadlock?

Despite the centrality of this question in so many philosophical domains, these debates—in both their classical and their contemporary incarnations—have traditionally involved only appeals to introspection. Locke, for instance, insists that the world looks flat: “When we set before our eyes a round globe…the idea thereby imprinted in our mind is of a flat circle”; and that we only have higher-level thoughts and judgments about a 3D world: “[judgment] alters the appearances into their causes.” (1975, II.9.8) Gibson’s intuitions disagree: “No one ever saw the world as a flat patchwork of colors” (1986, 286). This pattern of disagreement prevails today, with philosophers similarly relying on introspection. Noë, for example, thinks tilted coins look both circular and elliptical (2004, 163), while Schwitzgebel writes: “My first and recurring inclination is to say that the penny looks just plain circular […] not elliptical at all, in any sense or by any effort I can muster.” (2011, 19)

Strikingly, these discussions haven’t put this core question to empirical test, despite suggestions that empirical data could be relevant (Schwenkler and Weksler 2018; Kelly 2008). Here, we provide the first empirical test of this core philosophical question. In five experiments, we exploit a distinctive pattern of results from visual search to show that perspectival shape representations guide the deployment of visual attention and action—the first empirical evidence that perspectival shapes are explicitly represented by our visual systems after 3D representations arise, and in ways that bear directly on the philosophical questions posed earlier.

Psychophysical evidence for perspectival shape experiences

The logic of our studies is simple: If a rotated circular coin is genuinely perceived as an ellipse, then it should impair search for an objectively elliptical object. In other words, if your task is to locate an objectively elliptical object, then a rotated circular object should serve as an effective “distractor”. Here, five experiments show that this is the case.

Subjects saw a simple two-item search array containing highly realistic images of differently shaped 3D “coins”; their task on each trial was simply to locate a distally elliptical coin (for demos, visit www.perceptionresearch.org/coins). Strikingly, objectively circular coins slowed search for elliptical objects when the circular coins were rotated in depth, even when subjects clearly reported seeing them as circular. This pattern arose for images containing both static (Exp.1) and motion-based (Exp.2) depth cues; it held not only for speeded judgments but also in a delayed-judgment condition in which subjects viewed the coins for a sustained period before responding (Exp.3); this pattern also arose for images where the coins’ size was completely non-predictive of their “ellipticality” (Exp.4). Finally, a completely different paradigm (Exp.5) showed that viewing a rotated circular object boosted subsequent identification of an elliptical line-drawing, suggesting that rotated objects also prime their perspectival shapes.

We conclude that objects in the world have a surprisingly persistent dual character in the mind: Their objective shape “out there”, and their perspectival shape “from here”. Moreover, we present these results as a case study of a new way that empirical data can directly bear on philosophical questions.

References: www.perceptionresearch.org/coins

Social stereotypes shape our judgments about people around us. What types of judgments are susceptible to such biased interference? A striking example of stereotype bias involves the treatment of people whose identities run counter to our stereotypes, as when women are assumed to be students, research assistants, or nurses rather than professors, principal investigators, or doctors. Can such stereotypes also intrude on representations that have nothing to do with the content of the stereotype in question? Here, we explore how the assumptions we make about other people can impair our ability to process completely incidental, and surprisingly low-level, aspects of their appearance, including even their location in space. We collected professional headshots of male and female physicians from a major medical institution, and asked subjects simply to indicate the direction of the depicted person's shoulders (left or right), an extremely straightforward task that subjects performed with near-ceiling accuracy. The key manipulation was a cue on each trial that the upcoming image would be of a "doctor" or a "nurse", and a regularity in the experiment such that "doctor"-labeled images tended to face one way and "nurse"-labeled images tended to face the other way. Even though the gender of the depicted individuals was completely irrelevant to any aspect of the task, subjects were slower to judge the orientation of stereotype-incongruent people (female "doctors" and male "nurses") than stereotype-congruent people (male "doctors" and female "nurses"), even though the images were identical in both conditions (with only the labels changing), including in a large direct replication. Follow-up experiments without these regularities showed that this effect couldn't be explained by the raw surprisingness of, e.g., seeing a man when expecting a nurse; instead, these results suggest that even straightforward forms of statistical learning (here, between labels and orientations) can be intruded upon by long-held social biases, and in ways that alter processing of incidental and basic visual features.

Perception research traditionally investigates how actual states of the world are seen: how we perceive the shapes, colors, and locations that objects actually have. By contrast, everyday life provokes us to consider possible states of the world that have not yet obtained (and may never obtain). For example, when assembling furniture or completing a jigsaw puzzle, we may appreciate not only the particular shapes of individual objects, but also their potential to combine into new objects with distinct shapes of their own. What is the nature of this experience? Here, we explore how visual processing extracts not only what objects are, but also what they could become. Our previous work showed that, for extremely simple displays, pairs of geometrically compatible objects prime their potential completions, such that (e.g.) two puzzle pieces activate representations of a completed puzzle. Here, we explore how the mind literally confuses potential objects for real ones. In 5 experiments inspired by the puzzle game Tetris, subjects had to respond to a particular target within a stream of distracting “tetrominoes”; surprisingly, subjects false-alarmed more often to pairs of tetrominoes that could create their target than to pairs of tetrominoes that couldn’t, essentially representing possible objects as if they were physically present on the display. This pattern held for several types of objects and transformations, and could not be explained by previously known factors, such as spatial alignment, representational momentum due to imputed gravity, or various forms of response bias. We suggest that possible states of the world can not only be contemplated in moments of deliberate reflection, but also automatically represented by more basic mechanisms of perception and attention.

Our minds effortlessly recognize the objects and environments that make up the scenes around us. Yet scene understanding relies on much richer information, including the relationships between objects—such as which objects may be in, on, above, below, behind, or in front of one another. Such spatial relations are the basis for especially sophisticated inferences about the current and future physical state of a scene (“What will fall if I bump this table?” “What will come with if I grab this cup?”). Are such distinctions made by the visual system itself? Here, we ask whether spatial relations are extracted at a sufficiently abstract level such that particular instances of these relations might be confused for one another. Inspired by the observation that certain spatial distinctions show wide agreement across the world’s languages, we focus on two cross-linguistically “core” categories—Containment (“in”) and Support (“on”). Subjects viewed streams of natural photographs that illustrated relations of either containment (e.g., phone in basket; knife in cup) or support (e.g., spoon on jar; tray on box). They were asked to press one key when a specific target image appeared (e.g., a phone in a basket) and another key for all other images. Although accuracy was quite high, subjects false-alarmed more often for images that matched the target’s spatial-relational category than for those that did not, and they were also slower to reject images from the target’s spatial-relational category. Put differently: When searching for a knife in a cup, the mind is more likely to confuse these objects with a phone in a basket than with a spoon on a jar. We suggest that the visual system automatically encodes a scene’s spatial composition, and it does so in a surprisingly broad way that abstracts over the particular content of any one instance of such relations.

Perhaps the most basic task faced by visual processing is to detect the presence of objects; before computing an object's color, shape, or orientation, we must first register that something is there. Detection may be fairly straightforward when an object is fully visible, but in many realistic viewing conditions the visual system can only infer the presence of objects using cues such as continuity behind occluders, coincidental clipping of multiple figures, or unified motion against a background. All such cues, however, concern fairly low-level aspects of visual processing, encompassing only basic geometric and kinetic factors. Might more sophisticated forms of input cue an object's presence? Here, we explore how a surprisingly high-level cue, physical interaction, can imply the presence of a hidden surface, in ways that directly facilitate later detection. Subjects saw an animation of a disk falling and then unexpectedly bouncing off an 'invisible' surface. Sometimes, the disk bounced straight back up, implying a flat surface; other times, the disk’s bounce implied an angled surface. Afterwards, a visible line appeared where the disk had just bounced, whose orientation either matched or didn't match the surface implied by the disk’s exit trajectory; subjects' task was simply to report the orientation of this new, visible line, regardless of the physical events that came before it. Subjects were faster and more accurate at reporting the line's orientation when it matched the orientation of the physically implied surface vs. when it conflicted. Follow-up experiments extended this work to attentive search in multi-event displays; again, detection of a specific oriented line was facilitated by seeing physical interactions that implied surfaces with that orientation. This work shows how a process as basic and seemingly low-level as detecting the presence of an object or contour can be influenced by a surprisingly high-level cue: otherwise-unexplained physical interactions.

Arguably the most foundational principle in perception research is that our visual experience of the world goes beyond the retinal image; we perceive the distal environment itself rather than the proximal stimulation it causes. Shape, in particular, may be the paradigm case of such “unconscious inference”: When a circular coin is rotated in depth, for example, we experience it as the circular object it truly is, rather than as the perspectival ellipse it projects on the retina. But what is the fate of such perspectival shapes? Once our visual system infers that an elliptical projection arose from a distally circular object, do our minds continue to represent the “ellipticality” of the rotated coin? If so, objectively circular objects should, when rotated, impair search for objectively elliptical objects. Here, four experiments demonstrate that this is so, suggesting that perspectival shape representations persist far longer than is traditionally assumed. Subjects saw a simple two-item search array containing cue-rich images of differently shaped 3D “coins”; their task on each trial was simply to locate a distally elliptical coin. Surprisingly, objectively circular coins slowed search for elliptical objects when the circular coins were rotated in depth, even when subjects clearly reported seeing them as circular. This pattern arose for images containing both static (Exp.1) and motion-based (Exp.2) depth cues, and it held not only for speeded judgments but also in a delayed-judgment condition in which subjects viewed the coins for a sustained period before responding (Exp.3). Finally, a completely different paradigm (Exp.4) showed that viewing a rotated circular object boosted subsequent identification of an elliptical line-drawing, suggesting that rotated objects also prime their perspectival shapes. 
We conclude that objects in the world have a surprisingly persistent dual character in the mind: Their objective shape “out there”, and their perspectival shape “from here”.
Memories fade over time: A crisp hike on a wooded trail becomes harder to vividly recall as it moves further into the past. As the quality of a memory wanes, what happens to that memory’s content? For example, as one’s memory of a hike fades and loses clarity, might one also recall the day itself as literally being dimmer or more faded? Or might the opposite occur: Might we recall the experience as having been clearer and more detailed than it really was, even as our ability to recall those details is diminished? Here, four experiments demonstrate a surprising bias to remember visual scenes as having been more vivid and higher quality than they really were. Subjects saw images of natural scenes that had been blurred to varying degrees. A brief delay followed each scene, after which a new instance of the same scene appeared; subjects adjusted the blur of the new image to match the blur of the scene they had just viewed. Surprisingly, a powerful bias emerged wherein subjects misremembered scenes as being sharper and more vivid (i.e., less blurry) than they had truly appeared moments earlier. Follow-up experiments extended this phenomenon to saturation (with a bias to remember scenes as more colorful) and pixelation (with a bias to remember scenes as appearing at a higher resolution), while ruling out various response biases (e.g., a preference to look at sharper scenes, or to give extreme responses). The strength and pervasiveness of this bias suggest that, just as the mind fills in the details surrounding scenes in phenomena such as boundary extension, a similar process occurs within a scene itself: A phenomenon of “vividness extension”, whereby scenes are remembered as being more vivid than they really were.
How does what we say reflect what we see? A powerful approach to representing objects is for the mind to encode them according to their shortest possible "description length". Intriguingly, such information-theoretic encoding schemes often predict a non-linear relationship between an image's "objective" complexity and the actual resources devoted to representing it, because excessively complex stimuli might have simple underlying explanations (e.g. if they were generated randomly). How widely are such schemes implemented in the mind? Here, we explore a surprising relationship between the perceived complexity of images and the complexity of spoken descriptions of those images. We generated a library of visual shapes, and quantified their complexity as the cumulative surprisal of their internal skeletons — essentially measuring the amount of information in the objects. Subjects then freely described these shapes in their own words, producing more than 4000 unique audio clips. Interestingly, we found that the length of such spoken descriptions could be used to predict explicit judgments of perceived complexity (by a separate group of subjects), as well as ease of visual search in arrays containing simple and complex objects. But perhaps more surprisingly, the dataset of spoken descriptions revealed a striking quadratic relationship between the objective complexity of the stimuli and the length of their spoken descriptions: Both low-complexity stimuli and high-complexity stimuli received relatively shorter verbal descriptions, with a peak in spoken description length occurring for intermediately complex objects. Follow-up experiments went beyond individual objects to complex arrays that varied in how visually grouped or random they were, and found the same pattern: Highly grouped and highly random arrays were tersely described, while moderately grouped arrays garnered the longest descriptions. 
The results establish a surprising connection between linguistic expression and visual perception: The way we describe images can reveal how our visual systems process them.
Some of the most striking phenomena in visual perception are “impossible figures”: objects or scenes that could never exist in real life, such as a staircase that ascends in every direction, or a triangle with three 90° angles. How pervasive are such experiences in the mind? Specifically, could there be impossible multisensory experiences? Here, we explore one such example that is both (i) phenomenologically striking, and (ii) theoretically significant for notions of perception as rational Bayesian inference. In the Size-Weight Illusion, a smaller object is perceived as heavier than an objectively-equally-weighted larger object. This illusion, though not “impossible”, is puzzling: typically, our interpretation of new data is attracted towards our priors, but the size-weight illusion instead seems to involve repulsion from our priors; faced with ambiguous sensory evidence (i.e., two equally massive objects), we experience the object we expected to be lighter as heavier. Can the insight from this illusion be used to create an impossible perceptual experience? In three experiments, subjects were shown three visually identical boxes in a stack, and were asked to compare the weight of all three boxes lifted together vs. the top box lifted alone. Unbeknownst to them, the top box contained 250g of copper, while the other two boxes were empty. Which felt heavier? As in the classic size-weight illusion, the single top box felt heavier than all three combined, no matter whether the subjects hefted the boxes themselves (Exp.1), had them placed on their hands (Exp.2), or lifted them with strings rather than grasping the boxes directly (Exp.3). However, this outcome is impossible: A subset (box A alone) could never weigh more than its superset (boxes A, B, and C together). Evidently, the mind tolerates not only improbable, but also impossible, integration of information across modalities, and in a way one can feel for oneself.
How similar is the human visual system to the sophisticated machine-learning systems that mirror its performance? Models of object categorization based on convolutional neural networks (CNNs) have achieved human-level benchmarks in labeling novel images. These advances not only support new technologies, but may also serve as candidate models for human vision itself. However, unlike human vision, CNNs can be “fooled” by adversarial examples — carefully crafted images that appear as nonsense patterns to humans but are recognized as familiar objects by machines, or that appear as one object to humans and a different object to machines. This extreme divergence between human and machine classification challenges the promise of these new advances, both as applied image-recognition systems and as models of human vision. Surprisingly, however, little work has empirically investigated human classification of adversarial stimuli; do humans and machines fundamentally diverge? Here, we show that human and machine classification of adversarial stimuli are robustly related. We introduce a “machine-theory-of-mind” task in which observers are shown adversarial images and must anticipate the machine’s label from a set of various alternatives. Across eight experiments on five prominent and diverse adversarial imagesets, human subjects reliably identified the machine’s preferred labels over relevant foils. We observed this result not only in forced-choice settings between two candidate labels, but also when subjects freely chose among dozens of possible labels. Moreover, this pattern persisted for images with strong antecedent identities (e.g., an orange adversarially perturbed into a “power drill”), and even for images described in the literature as “totally unrecognizable to human eyes” (e.g., unsegmented patterns of colorful pixels that are classified as an “armadillo”). 
We suggest that human intuition may be a more reliable guide to machine (mis)classification than has typically been imagined, and we explore the consequences of this result for minds and machines alike.
When assembling furniture or completing a jigsaw puzzle, we appreciate not only the particular shapes of individual objects, but also their potential to *combine* into new objects. How does the mind extract this property? In 5 experiments inspired by Tetris, subjects had to respond to a particular target within a stream of “tetrominoes”; however, subjects false-alarmed more often to pairs of tetrominoes that could create their target than to tetromino-pairs that couldn’t—essentially confusing ‘potential’ objects for real ones. We suggest that the mind automatically represents not only what objects *are*, but also what they *could become*.
Human vision is increasingly well-approximated by cutting-edge Convolutional Neural Networks. However, such models are “fooled” by so-called adversarial examples — carefully-crafted images that appear as nonsense to humans but as objects to CNNs. Surprisingly, however, little work has investigated human performance on such stimuli; could humans “crack” adversarial images by predicting the machine’s classifications? In four experiments on three prominent adversarial imagesets, subjects reliably identified the machine’s chosen label over relevant foils — even for images previously considered “totally unrecognizable to human eyes”. Computer object-representation may resemble a human’s more than recent challenges suggest.
[pdf]
How do prior assumptions about uncertain data inform our inferences about those data? Increasingly, such inferences are thought to work in the mind the way they *should* work in principle — with our interpretations of uncertain evidence being nudged toward our prior hypotheses in a “rational” manner approximating Bayesian inference. This approach has taken the mind and brain sciences by storm, being successfully applied to perception, learning, memory, decision-making, language, and development — leading psychologists, neuroscientists, and philosophers to argue that “humans act as rational Bayesian estimators” (Clark, 2013) and that we have a fundamentally “Bayesian brain” (Knill & Pouget, 2004).

Do any mental phenomena resist such a rational analysis? Whereas some researchers have suggested so by pointing to cases where people reason poorly about various kinds of evidence (as in, e.g., base-rate neglect or the conjunction fallacy), we focus here on a more specific — and perhaps more puzzling — sort of interaction between prior hypotheses and new evidence. In particular, whereas inferences about new data are typically *attracted toward* prior expectations, we show here how inferences may also be *repelled away* from prior expectations, in seeming defiance of normative statistical inference. We do this both by reporting new experiments that investigate these phenomena, and also by reevaluating previously under-emphasized findings. We call such inferences “antirational” (to distinguish them from mere *irrationality*), because they appear to proceed exactly opposite the recommendation of a rational analysis. We conclude by discussing the consequences of such phenomena for foundational issues in philosophy and psychology.

What is this class of phenomena? Consider the classic *size-weight illusion* (Charpentier, 1891), wherein subjects are shown a large object and a small object that are in fact objectively equal in mass, and the subject is asked to lift them both up. Which object should feel heavier? The straightforward “Bayesian” prediction is that the *larger* object should feel heavier, since the ambiguous evidence (two objects giving approximately equal resistance) should be resolved in favor of the strong prior (that larger = heavier). However, the surprising result of the size-weight illusion, replicated hundreds of times over the last century, is that the *smaller* object feels heavier! For this reason, the size-weight illusion is sometimes considered a “problem case” for larger-scale theories of a Bayesian mind/brain (Clark, 2013).
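
As a rough illustration of the “Bayesian” prediction described above, the sketch below combines a Gaussian prior with a Gaussian likelihood, for which the posterior mean is a precision-weighted average of the two. The specific numbers (prior means, standard deviations, and the felt weight) are hypothetical values chosen only for illustration; they are not taken from the experiments.

```python
def posterior_mean(prior_mean, prior_sd, obs, obs_sd):
    """Posterior mean when a Gaussian prior is combined with a Gaussian likelihood."""
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_sd ** 2
    # Precision-weighted average of the prior mean and the observation.
    return (prior_precision * prior_mean + obs_precision * obs) / (
        prior_precision + obs_precision
    )


# Hypothetical values, in grams: both objects give the same sensory evidence,
# but the prior expects the larger object to be heavier.
felt_weight = 250.0
large_estimate = posterior_mean(prior_mean=400.0, prior_sd=100.0,
                                obs=felt_weight, obs_sd=100.0)
small_estimate = posterior_mean(prior_mean=100.0, prior_sd=100.0,
                                obs=felt_weight, obs_sd=100.0)

print(large_estimate, small_estimate)
```

Under these assumed values, the estimate for the larger object is pulled upward toward its prior and the estimate for the smaller object is pulled downward, so a model of this kind predicts that the larger object should feel heavier. The size-weight illusion goes the opposite way.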

At the same time, this single illusion is a somewhat ‘fringe’ phenomenon, involving many factors that are not well understood in their own right. Our goal is thus to demonstrate that the very same logic that makes the size-weight illusion so puzzling is actually highly generalizable, and can be exploited to produce other kinds of antirational updating, including in other areas of cognitive science where it may be easier to rule out alternative explanations (cf. Peters et al., 2016).

We demonstrated this by studying the perception of numerosity. Across nine experiments, subjects briefly saw arrays of two spatially intermixed sets of objects (e.g. several dozen squares and circles). Over the course of the session, subjects learned that one set was typically more numerous than the other — for example, that there are typically more squares than circles. Surprisingly, however, subjects who were then shown an *equal* number of squares and circles on a subsequent trial (such that it was unclear exactly which had more) judged the *circles* to be more numerous. In other words, just as in the size-weight illusion, subjects adjusted their inferences *away* from their prior hypotheses about what they would see — seemingly doing exactly the opposite of what a rational model would dictate.

Follow-up experiments (1) generalized this phenomenon to many other kinds of stimuli, including not only shapes but also colors (blue vs. yellow dots) and configurally-defined letters (Ts vs. Ls); (2) ruled out low-level sensory adaptation, since the effects also occur (a) even when various sensory dimensions are equated; and (b) even at very short exposure durations (100ms) and very long intertrial intervals (1 minute) — conditions that do not reliably produce adaptation in other contexts; (3) extended the effect beyond one specific judgment made by subjects, since the results obtain both with two-alternative forced-choice (“which has more?”) and precise enumeration (“how many are there?”).

This work points to a new and general sort of phenomenon in the mind: A kind of contrast effect between hypotheses and evidence that consists in adjusting away from our priors. The existence of such an “antirational” class of mental phenomena is both a discovery to be explained by cognitive science, and a challenge to notions of mental processes as rational inferences championed by psychologists and philosophers alike.

How do prior assumptions about uncertain data inform our inferences about those data? Increasingly, such inferences are thought to work in the mind the way they should work in principle — with our interpretations of uncertain evidence being nudged towards our prior hypotheses in a “rational” manner approximating Bayesian inference. By contrast, here we explore a class of phenomena that appear to defy such normative principles of inference: Whereas inferences about new data are typically attracted toward prior expectations, we demonstrate how inferences may also be repelled away from prior expectations. In seven experiments, subjects briefly saw arrays of two spatially intermixed sets of objects (e.g. several dozen squares and circles). Over the course of the session, subjects learned that one set was typically more numerous than the other — for example, that there are typically more squares than circles. Surprisingly, upon forming the expectation that they would continue to see more squares, subjects who were then shown an equal number of squares and circles (such that it was unclear exactly which had more) judged the circles to be more numerous, seemingly adjusting their inferences away from their prior hypothesis about what they would see. Six follow-up experiments show how this effect is not explained by low-level sensory adaptation (occurring even when various sensory dimensions are equated), generalizes to many kinds of stimuli (including colors, and configurally-defined letters), and is robust to different measures (not only forced-choice [“which has more?”] but also precise enumeration [“how many are there?”]). We discuss how this “expectation contrast” effect is a genuine case of adjusting “away” from our priors, in seeming defiance of normative principles of inference. We also point to a broader class of phenomena that may behave in this way, and explore their consequences for Bayesian models of perception and cognition.
Some properties of objects are intrinsic to the objects themselves, whereas other properties encompass that object’s relationship to other objects or events in a scene. For example, when completing a jigsaw puzzle, we might notice not only the singular properties of an individual piece (e.g., its particular shape), but also its relationship to other pieces — including its ability to combine with another piece to form a new object. Here, we explore how the visual system represents the potential for two discrete objects to create something new. Our experiments were inspired by the puzzle game Tetris, in which players combine various shapes to build larger composite objects. Subjects saw a stream of images presented individually, and simply had to respond whenever they saw a certain target image (such as a complete square), and not at any other time. The stream also included distractor images consisting of object-pairs (shaped like the “tetrominoes” of Tetris) that either could or could not combine to produce the subject’s target. Accuracy was very high, but subjects occasionally false-alarmed to the distractor images. Remarkably, subjects were more likely to false-alarm to tetromino-pairs that could create their target than to tetromino-pairs that could not, even though both kinds of images were visually dissimilar to the target. We also observed a priming effect, whereby target responses were faster when the previous trial showed tetrominoes that could create the target vs. tetrominoes that could not. Follow-up experiments revealed that these effects were not simply due to a general response bias favoring matching shapes, nor were the results explained simply by representational momentum due to perceived “gravity” (since the effects generalized to 90-degree rotations of the tetromino-pair images). These results suggest that the mind automatically and rapidly evaluates discrete objects for their potential to combine into something new.
We can readily appreciate whether a tower of blocks will topple or a stack of dishes will collapse. How? Recent work suggests that such physical properties of scenes are extracted rapidly and efficiently as part of automatic visual processing (Firestone & Scholl, VSS2016, VSS2017). However, physical reasoning can also operate in ways that seemingly differ from visual processing. For example, subjects who are explicitly told that some blocks within a tower are heavier than others can rapidly update their judgments of that tower’s stability (Battaglia et al., 2013); by contrast, automatic visual processing is typically resistant to such explicit higher-level influence (Firestone & Scholl, 2016). Here, we resolve this apparent conflict by revealing how distinct flexible and inflexible processes support physical understanding. We showed subjects towers with differently-colored blocks, where one color indicated a 10x-increase in mass. Subjects successfully incorporated this information into their judgments of stability, accurately identifying which towers would stand or fall by moving their cursors to corresponding buttons. However, analyses of these cursor trajectories revealed that some towers were processed differently than others. Specifically, towers that were “stable” but that would have been unstable had the blocks been equally heavy (i.e. towers with unstable geometries) yielded meandering cursor trajectories that drifted toward the incorrect stability judgment (“fall”) before eventually arriving at the correct judgment (“stand”). By contrast, towers that were “stable” both in terms of their differentially heavy blocks and in terms of their superficial geometries produced considerably less drift. In other words, even when subjects accurately understood how a tower would behave given new information about mass, their behaviors revealed an influence of more basic visual (geometric) cues to stability. 
We suggest that physical understanding may not be a single process, but rather one involving separable stages: a fast, reflexive, “perceptual” stage, and a slower, flexible “cognitive” stage.
A notoriously tricky “bar bet” proceeds as follows: One patron wagers another that the distance around the rim of a standard pint glass is about twice the glass’s height. Surprisingly, this patron is usually correct, owing to a powerful (but, to our knowledge, unexplained) visual illusion wherein we severely underestimate the circumferences of circles. Here, we characterize this illusion and test an explanation of it: We suggest that the difficulty in properly estimating the perimeters of circles and other shapes stems in part from the visual system’s representation of such shapes as closed objects, rather than as open contours which might be easier to ‘mentally unravel’. Subjects who saw circles of various sizes and adjusted a line to match the circles’ circumferences greatly underestimated circumference — initially by a magnitude of over 35%. (Care was taken to exclude subjects who conflated circumference with diameter.) Estimates for these closed circles were then compared to estimates of the perimeter of a circle that was missing a continuous 18-degree segment of arc. We predicted that removing a portion of the circle’s perimeter would, paradoxically, cause the circle’s perimeter to appear longer, since this violation of closure would bias the visual system to process the stimulus as an open contour. Results revealed that, indeed, this manipulation very reliably reduced the magnitude of this “pint glass illusion” by as much as 30%, such that a circle missing a portion of its circumference was judged to have a greater perimeter than a complete, closed circle of the same diameter. We suggest that the property of closure not only influences whether a stimulus is processed as an object, but also constrains how easily such a stimulus can be manipulated in the mind.
Objects in the world frequently strike us as being complex (and informationally rich), or simple (and informationally sparse). For example, a crenulate and richly-organized leaf might look more complex than a plain stone. What is the nature of our experience of complexity — and why do we have this experience in the first place? We algorithmically generated hundreds of smoothed-edge shapes, and determined their complexity by computing the cumulative surprisal of their internal skeletal structure — essentially quantifying the amount of information in the object. Subjects then completed a visual search task in which a single complex target appeared among identical simple distractors, or a single simple target appeared among identical complex distractors. Not only was search for complex targets highly efficient (8ms/item), but it also exhibited a search asymmetry: a complex target among simple distractors was found faster than a simple target among complex distractors — suggesting that visual complexity is extracted ‘preattentively’. (These results held over and above low-level properties that may correlate with complexity, including area, number of sides, spatial frequency, angular magnitudes, etc.). Next, we explored the function of complexity; why do we experience simplicity and complexity in the first place? We investigated the possibility that visual complexity is an attention-grabbing signal indicating that a stimulus contains something worth learning. Subjects who had to memorize and later recall serially presented objects recalled complex objects better than simple objects — but only when such objects appeared within a set of other objects, and not when they were presented one-at-a-time (suggesting that the effect is not driven simply by increased distinguishability of complex shapes). 
We suggest not only that object complexity is extracted efficiently and preattentively, but also that complexity arouses a kind of 'visual curiosity' about objects that improves subsequent learning and memory.
Does a gray banana look yellow? Does a heart look redder than a square? A line of research stretching back nearly a century suggests that knowing an object’s canonical color can alter its visual appearance. Are such effects truly perceptual, or might they instead reflect biased responses without altering online color perception? Here, we replicate such classical and contemporary “memory-color effects”, but then extend them to include conditions with counterintuitive hypotheses that would be difficult for subjects to grasp; across multiple case studies, we find that such conditions eliminate or even reverse memory-color effects in ways unaccounted-for by their underlying theories. We first replicated the classic finding that hearts are judged as redder than squares, as measured by matching a color-adjustable background to a central stimulus. But when we varied the shape of the background itself (to be either square or heart-shaped), subjects who estimated a square’s color by adjusting a heart-shaped background made the background redder than when adjusting a square-shaped background, whereas a memory-color theory would predict the opposite pattern. Next, we successfully replicated the more recent finding that gray disks and blueish bananas are judged as more purely gray than are gray bananas (which purportedly appear yellow); however, we also found that a blueish disk is judged to be more gray than a blueish banana, exactly opposite the prediction of memory-color theories. Moreover, when asked to identify the “odd color out” from an array of three objects (e.g., gray disk, gray banana, and blueish banana) subjects easily identified the blueish banana as the odd color out, even though memory-color theories predict that subjects should pick the gray banana. We suggest that memory-color effects may not be truly perceptual, and we discuss the utility of this general approach for separating perception from cognition.
Is working memory simply the reactivation of perceptual representations? Decoding experiments with fMRI suggest that perceptual areas maintain information about what we have seen in working memory. But is this activity the basis of visual working memory itself? If it is, then perceptual interference during maintenance should impair our ability to remember. We tested this prediction by measuring visual working memory performance with and without interfering mask gratings, presented during the memory delay at the same location as the to-be-remembered stimulus. Participants memorized the orientations of 1-4 sample gratings, which appeared for 800ms. After a 5-second pause, the participants were exposed to a target grating in the same location as one of the sample gratings, and the target grating was rotated either clockwise or counterclockwise relative to the original. The task was to identify the direction of change. The key manipulation was that during the 5-second maintenance period, participants were exposed either to a blank screen, or to a rapidly changing stream of mask gratings in each of the previously occupied positions. We reasoned that if visual working memory relies on early perceptual substrates then exposure to conflicting masks that putatively activate the same substrates should impair performance (relative to no-mask trials). In other words, there should be interference between the rapidly changing perceptual inputs and the perceptually maintained memory representations at the same retinal location. Contrary to this prediction, there was no difference in performance between the masked and unmasked conditions. We did, however, observe significantly reduced accuracy as a function of set size (the number of sample gratings in a trial). This evidence suggests that representations in early perceptual brain regions may not play a functional role in maintaining visual features.