Perception & Mind Lab Presentations @ VSS 2022

Friday, May 13

Symposium | Beyond objects and features: High-level relations in visual perception
Alon Hafri, Chaz Firestone

A typical VSS program devotes sections to low-level properties such as motion, orientation, and location; higher-level properties such as faces, scenes, and materials; and core visual processes such as working memory and attention. Yet notably absent from these sections are relational representations: properties holding *between* elements, beyond any properties each element has on its own. For example, beyond perceiving red apples and glass bowls, we may also see apples contained inside bowls; beyond perceiving an object and its motion, we may see it collide with another object; and beyond perceiving two agents, we may also see them socially interact. The aim of this symposium is to showcase work that investigates relational representation using the methods and tools of vision science, including classic paradigms from visual cognition, modern neuroimaging techniques, and state-of-the-art computational modeling. A central theme is that fully understanding the nature of visual perception, including core processes such as object and scene representation, visual attention, and working memory, requires considering how visual elements relate to one another. First, Alon Hafri and Chaz Firestone will provide an overview of the "relational landscape". They will delineate criteria for determining whether a relational property is perceived rather than merely judged or inferred, and they will discuss several case studies exemplifying this framework. Second, Melissa Võ will discuss her work on "scene grammar", whereby the mind represents natural environments in terms of the typical composition of their objects (e.g., soap generally appears on sinks). Võ suggests that certain clusters of objects (especially "anchor objects") guide visual search, object perception, and memory. Third, Liuba Papeo will present her work on social relations (e.g., when two agents approach, argue, or fight). Papeo shows that the visual system identifies social relations through a prototypical "social template", and she explores the ways such representations generalize across visual contexts. Fourth, Daniel Kaiser will extend the discussion from objects to scene structure. Using neuroimaging evidence, he shows that natural scene processing is fundamentally relational: when configural relations between scene parts are disrupted, there are downstream consequences for scene and object processing. Finally, Hongjing Lu and Phil Kellman will discuss the computational machinery necessary to achieve relational representations. Although deep-learning models achieve remarkable success at many vision tasks, Lu and Kellman present modeling evidence arguing that abstract structure is necessary for representing visual relations in ways that go beyond mere pattern classification. Overall, this work explores the crucial role that relational structure plays in how we see the world around us, and raises important questions for future vision science research. David Marr famously defined vision as the capacity to "know what is where by looking", that is, to represent objects and their features, located somewhere in space. The work showcased here adds an exciting dimension to this capacity: not only what and where, but "how" visual elements are configured in their physical and social environment.

Talk: Friday, May 13, 12:00pm - 2:00pm ET, Talk Room 1

[VSS Link] [Paper]

Saturday, May 14

Attentional prioritization by absent parts
Jorge Morales, Chaz Firestone

Stimuli attract attention when they appear suddenly, when they differ from other stimuli, or when they otherwise become salient. Can absent stimuli attract attention too? The missing tooth in a child’s smile, or the missing wheel of a bicycle, often seems salient and noticeable. Previously, we obtained initial evidence that visual processing privileges absent objects. Here, we dramatically expand on this work by exploring the timecourse and mechanism of absence-enhanced visual processing. In Experiment 1, subjects saw line drawings of objects missing a part (e.g., a butterfly missing a wing). A probe appeared 150 ms after stimulus onset, either on the object or in empty space, and subjects classified its location accordingly. Crucially, empty-space probes appeared either on an “absent” part (i.e., empty space where the missing part “should” have been) or on true empty space. Even at such a short probe-onset asynchrony, subjects classified “absent” probes faster than true empty-space probes, suggesting attentional prioritization of absent parts. In Experiment 2, stimuli appeared inside a bounding box in two conditions: (1) with enough space for the missing part to have fit had it been there (Room); and (2) with stimuli slightly skewed such that the missing part could not have fit (No Room). Relative to object probes, subjects classified “absent” probes more slowly in the No-Room condition than in the Room condition, as if the bounding box interfered with absence-guided attentional enhancement. In Experiment 3, stimulus and probe appeared simultaneously, which massively reduced the previously observed interference; this ruled out the possibility that the interference was produced by the crowded position of the stimuli near the edge of the box. Together, these results suggest that absence-enhanced attention is fast and automatic (and thus unlikely to be driven by voluntary mental imagery), and that it is an active process that can be disrupted via spatial interference. Not only can present objects attract attention, but absences are salient too.

Poster: Saturday, May 14, 8:30am - 12:30pm ET, Pavilion, Attention: Reward, Capture

[VSS Link]

Sequential construction of visual relations
Alon Hafri, Zekun Sun, Chaz Firestone

You return to your locked-up bicycle and immediately notice that the front wheel is missing. (Oh no! It must have been stolen.) As you stare at your incomplete frame, you have a visceral sense of the wheel's *absence*: there isn't just empty space where the wheel should be; there is a missing wheel. What is the nature of this experience? Whereas we typically think of perception and attention as being directed toward (present) objects, here we explore attention to missing or absent parts. Six experiments show that regions of space with missing parts ("absent space") are processed differently than more ordinary empty regions ("empty space"). Subjects saw line drawings of objects missing a part (e.g., a bicycle missing a wheel, a butterfly missing a wing, a jacket missing a sleeve), and then judged whether a probe appeared on the object or not. Intriguingly, when non-object probes appeared in absent space (e.g., where the front wheel should have been), subjects classified them faster than when probes appeared in empty space (e.g., next to the bicycle). We found this effect with spatially adjacent probes (E1), with probes distributed around the stimulus (E2), and when subjects had to discriminate the probe’s color instead of its position (E3 & E4), suggesting that "absent" space attracts attention automatically and efficiently. In contrast, no reaction-time difference was found with scrambled images (which destroyed the appearance of absence), even though the images' low-level features and the probes' relative positions were preserved (E5). Finally, the absent-part attentional benefit was lost when stimuli were placed closer to the border of a bounding box, creating the impression that the absent part couldn’t "fit" (E6). We conclude that, despite not being "objects" at all, absences are prioritized over otherwise identical empty spaces by mechanisms of perception and attention.

Poster: Saturday, May 14, 8:30am – 12:30pm ET, Banyan Breezeway, Scene Perception: Spatiotemporal Statistics

[VSS Link]

Monday, May 16

Intuiting machine failures
Makaela Nartker, Zhenglong Zhou, Chaz Firestone

A key ingredient of effective collaboration is knowing the strengths and weaknesses of one’s collaborators. But what if one’s collaborator is a machine-classification system, of the sort increasingly appearing in semi-autonomous vehicles, radiologists’ offices, and other contexts in which visual processing is automated? Such systems have both remarkable strengths (including superhuman performance at certain classification tasks) and striking weaknesses (including susceptibility to bizarre misclassifications); can naive subjects intuit when such systems will succeed or fail? Here, five experiments (N=900) investigate whether humans can anticipate when machines will misclassify natural images. E1 showed subjects two natural images: one that reliably elicits misclassifications from multiple state-of-the-art convolutional neural networks (CNNs), and another that reliably elicits correct classifications. We found that subjects could predict which image was misclassified on a majority of trials. E2 and E3 showed that subjects are sensitive to the nature of such misclassifications: subjects performed better when they were told what the misclassification was (but not which image received it), and worse when the label shown was from another, randomly chosen category. Crucially, in E4, we asked subjects either to (a) choose which image they thought was misclassified, or (b) choose the image that is the worst example of its category. While both instructions resulted in subjects choosing misclassified images above chance, subjects who were instructed to identify misclassifications performed better. In other words, humans appreciate the cues that mislead machines, beyond simply considering the prototypicality of an image. Lastly, E5 explored a more naturalistic setting. Here, instead of making an N-alternative forced choice (NAFC), subjects identified potential misclassifications from a stream of individual images; even in this setting, subjects were remarkably successful in anticipating machine errors. Thus, humans can anticipate when and how their machine collaborators succeed and fail, a skill that may be of great value to human-machine teams.

Talk: Monday, May 16, 9:30am ET, Talk Room 1, Object Recognition: Models, Reading

[VSS Link]

Attention to fire
Caroline Myers, Chaz Firestone, Justin Halberda

Fire is a natural phenomenon of tremendous evolutionary significance, from cooking to tool-making to war and warmth. Might our minds be tuned to the features that characterize this important visual stimulus? Here, we investigated whether and how fire selectively guides visual attention. In a visual search task, adult observers viewed a display containing a variable number of burning fires placed randomly along an imaginary circle, and reported whether a target (one fire burning differently from the others) was present or absent. Stimuli were created from high-resolution video of fires burning (a) normally or (b) in reverse (achieved by playing the video frames backward). Remarkably, subjects were both faster and more accurate at detecting a reverse-burning fire among normally burning fires than a normally burning fire among reverse-burning fires: a classic “search asymmetry” indicating privileged visual processing of the relevant features. This initial effect was further explored in multiple follow-up experiments. First, this search asymmetry persisted with both in-phase and out-of-phase burning; in other words, subjects were faster to find a reverse-burning fire among normally burning fires both (a) when the distractor fires all burned in unison, and (b) when they burned out of sync, thus forming a more variable search array. Second, and crucially, we tested the specificity with which the visual properties of fire selectively guide attention by rotating all of the fires 90°; this simple manipulation completely eliminated the search asymmetry observed earlier, suggesting that these privileged patterns of attention arise only for fire stimuli presented under naturalistic conditions. This experiment also served as a control for the lower-level visual properties of fire, which were equated across conditions. Collectively, these findings raise the intriguing possibility that fire is privileged by visual attention, echoing the evolutionary significance of this vital stimulus.

Poster: Monday, May 16, 8:30am – 12:30pm ET, Pavilion, Attention: Search and Salience

[VSS Link]

Looking tight: Visual judgments of knot strength reveal the limits of physical scene understanding
Sholei Croom, Chaz Firestone

While early studies cataloged striking errors of physical scene understanding in naive observers, a resurgence of research has instead revealed a remarkable ability to intuit how physical scenes will unfold. Properties such as stability, mass, gravity, and kinematics may all elicit accurate intuitions, especially when presented in naturalistic settings. A leading interpretation of these findings is that physical scene understanding recruits a general-purpose “physics engine”. Under this theory, the mind simulates how a scene will unfold by modeling or approximating its physics, and fails only for physics problems that are contrived or presented without context. But might there be tasks that persistently strain physical reasoning, even when presented naturalistically and in realistic contexts? Here, five experiments explore this question by evaluating intuitions about the strength of simple knots. Knots are naturalistic stimuli that are ubiquitous across cultures and time periods, and in many contexts evaluating them correctly can spell the difference between safety and peril. Despite this, here we show that naive observers fail to discern even striking differences in strength between knots. In a series of two-alternative forced-choice tasks, observers viewed simple “bends” (i.e., knots joining two lengths of string) and decided which would require more force to undo. Though the relative strength of these bends is well documented, observers’ judgments completely failed to reflect these distinctions, including for naturalistic photographs (E1), idealized renderings (E2), dynamic videos (E3), and even when the bends were accompanied by schematic diagrams of the knots’ structures (E4). Moreover, these failures persisted even though observers demonstrated visual understanding of the topological differences between the knots (E5); in other words, even when observers knew exactly what kind of knot they were viewing, they failed to extract or predict its physical behavior. These results expose a blind spot in physical reasoning, posing a new kind of challenge to general-purpose theories of scene understanding.

Poster: Monday, May 16, 8:30am – 12:30pm ET, Pavilion, Perception and Action: Affordances

[VSS Link]

phiVis: Philosophy of Vision Science
A VSS satellite event, organized by Kevin Lande and Chaz Firestone

The past decade has seen a resurgence in conversation between vision science and philosophy of perception on questions of fundamental interest to both fields, such as: What do we see? What is seeing for? What is seeing? The phiVis workshop is a forum for continuing and expanding this interdisciplinary conversation. Short talks by philosophers of perception that engage with the latest research in vision science will be followed by discussion with a slate of vision scientists.

Conversations between philosophers of vision and vision scientists have enriched research programs in both fields. On the one hand, the latest generation of philosophers of vision are deeply immersed in the scientific literatures on natural scene statistics, visual short-term memory, ensemble perception, contour integration, amodal completion, visual salience, multi-sensory integration, visual adaptation, and much else. On the other hand, vision scientists have found a great deal of value in responding to and thinking together with philosophers about the mechanisms and effects of perceptual constancies, attentional selection, object perception, and perceptual uncertainty, to name just a handful of topics. These conversations are not only intrinsically interesting for everyone involved; they have also been fruitful sources of research and collaboration. However, opportunities for dialogue are all too rare, often occurring only through chance interactions or one-off workshops. The phiVis satellite is meant to be a platform to extend these discussions. Our first event took place at the 2021 V-VSS and drew nearly 300 attendees. Join us this year, in person, for phiVis 2!

Event: Monday, May 16, 3:30pm - 5:30pm ET, Blue Heron Room.
Register: in person; online.

[VSS Link]

Tuesday, May 17

How to look unique
Zekun Sun, Qian Yu, Justin Halberda, Chaz Firestone

No two snowflakes are alike; each one is unique, with its own distinct properties. But our visual experience often fails to reflect this: if we view an array of snowflakes, for example, it may be difficult to recognize the distinctness of each individual, even if they truly are unique at bottom. What, then, *does* make an object look unique (i.e., most different from its peers)? Here, five experiments (N=336) explore a deep connection between visual complexity and the experience of uniqueness. We algorithmically generated random-looking shapes across varying complexity levels, based on their skeletal surprisal. In E1, subjects dragged and dropped the shapes to place similar objects near one another and dissimilar objects far from one another. Remarkably, both highly simple and highly complex shapes were placed closer to one another on average, whereas medially complex shapes showed the greatest overall dispersion; in other words, sets of medially complex objects were judged as having the most unique members. To explore the nature of this quadratic pattern, four follow-up experiments targeted different stages of cognitive processing. We found the same quadratic pattern in another conceptually laden task (E2), in which subjects were shown a named object and decided which other objects would have the same name: medially complex objects were less likely to be given similar names than either highly complex or highly simple objects. Intriguingly, however, this quadratic pattern did *not* appear in more rapid perceptual processes, including tests of discrimination (E3), change detection (E4), and visual search (E5), all of which showed linear relationships between complexity and performance. The findings suggest a role for complexity in visual uniqueness that evolves through the hierarchy of cognitive processing, whereby medially complex objects are seen as the most unique objects in higher-level processes, whereas simple objects are most distinctive in lower-level processes.

Poster: Tuesday, May 17, 8:30am - 12:30pm ET, Banyan Breezeway, Object Recognition: Perceptual Similarity

[VSS Link]

Seeing nothing happening: Moments of absence as perceptual events
Rui Zhe Goh, Ian Phillips, Chaz Firestone

Event representations canonically arise when something is doing something, such as a ball bouncing or a clock chiming. But does perception represent eventhood even when nothing is happening, as in a break in the rain or a moment of silence? Here, by leveraging the fact that event representations distort experienced duration, we demonstrate that perception constructs positive and differentiated representations of absence. Experiment 1 exploited the recent discovery that a single continuous event is perceived as longer than two discrete events having the same objective duration (the “one-is-more illusion”; Yousif & Scholl, 2019). Instead of events with actual objects, our observers viewed periods of absence, during which an eye-catching object (e.g., a drifting UFO) briefly vanished before reappearing. Remarkably, observers judged a single continuous absence as longer than two discrete absences separated by a brief reappearance, demonstrating that this illusion occurs with absences as well as presences. Experiment 2 generalized this finding to audition. Observers were immersed in ambient noise (e.g., a busy restaurant) and occasionally heard periods of silence. Again, one continuous silence was heard as longer than two equivalent silences. Finally, we asked: Do experiences of absence have a common, purely negative character, or can they be positively represented as *different* from one another? Experiment 3 explored this question by adapting the “oddball illusion”, in which a stimulus that breaks a sequence (e.g., a looming circle after several non-looming circles) is seen as lasting longer (Tse et al., 2004). In our study, observers viewed several “standard absences”, in which one of two circles disappeared for a fixed duration, followed by an “oddball absence”, in which the other circle disappeared for a variable duration. Observers judged oddball absences as longer than standards, suggesting that oddball absences were perceived differently from standard absences. Thus, moments of absence elicit differentiated, perceptual event representations.

Poster: Tuesday, May 17, 8:30am - 12:30pm ET, Pavilion, Temporal Processing: Timing Perception, Duration

[VSS Link]

Attending to future objects
Chenxiao Guan, Chaz Firestone

In addition to attending to continuous regions of space, we also attend to discrete visual objects. A foundational result in this domain is that attention "spreads" within an object: If we attend to one portion of an object, we can't help but attend to the rest of it, as revealed by facilitated probe detection at other within-object locations. But what can be the objects of object-based attention? In particular, is this process limited only to the here and now, or can we also attend to objects that don't yet exist but merely *could* exist at some future time? Here, we explore how attention spreads not only to other locations within a single, present object, but also to disconnected object-parts that could combine with a presently attended object to create a new object. We designed stimulus triplets consisting of one puzzle-piece-like central shape and two nearby puzzle-piece-like shapes, one of which could neatly combine with the central shape and one of which could not (as determined by the presence of certain protrusions and indentations). Shortly after stimulus onset, two letters appeared, one on the central shape and another on one of the two smaller parts: either the "combinable" piece or the "non-combinable" piece. Subjects simply decided whether the two letters were the same or different. We found that subjects were faster to evaluate letter similarity when the two letters appeared on shapes that could combine into one than when they appeared on two shapes that could not, with no differences in accuracy (ruling out a speed-accuracy tradeoff). Follow-up experiments ruled out mere similarity as a driver of this effect, isolating combinability per se. We suggest that attention can select not only actual objects that are present now, but also "possible" objects that may be present in the future.

Talk: Tuesday, May 17, 11:45am ET, Talk Room 1, Attention: Features, Objects, Endogenous

[VSS Link]

Automatic simulation of unseen physical events
Tal Boger, Chaz Firestone

When a feature (e.g., the letter T) becomes associated with an object (e.g., a square), we are faster to detect that feature if it later appears on that same object than if it appears elsewhere, even after the object changes location—a foundational result known as the object-specific preview benefit (OSPB). But what if that feature is a physical entity with mass and extent (e.g., a ping-pong ball with a T printed on it), and the object is a container (e.g., a wooden box) in which the ball could slide and roll? Do we (merely) create a static association between the feature and the object? Or do we represent the feature’s location “within” the object, even after the feature disappears from view? Here, we exploit the OSPB to explore how attention automatically represents the physical dynamics of unseen objects. Observers viewed a letter drop into a box, which then moved to another location before abruptly disappearing, revealing the letter in one of several locations; then, observers reported whether or not it had changed (e.g., remaining a T, or changing from a T to an L). If object-feature bindings rely on simple association, then subjects should be fastest to detect the feature in the same box-relative location where it last appeared. However, our results showed that facilitation was greatest when the feature appeared where it “should” have been, as predicted by physical simulation of how a ball slides and rolls within a moving box (which would leave it in a different box-relative location than it started). Follow-up experiments ruled out lower-level explanations, including biases toward the screen’s center and to the last screen-relative location where the feature was seen. We suggest that perception automatically simulates the forces acting on unseen objects, such that feature-object bindings incorporate complex physical interactions.

Poster: Tuesday, May 17, 2:45pm - 6:45pm ET, Pavilion, Attention: Awareness

[VSS Link]

Visual guessing is anti-Bayesian
Justin Halberda, Caroline Myers, Emily Sanford, Chaz Firestone

When we don’t know what we’ve seen, we often guess. How? One possibility is that guessing is "random", such that observers sample from a uniform distribution across feature values. For example, if shown a colored patch or oriented line, observers who don't know what they saw might randomly select a color or orientation (as captured by, e.g., a lapse parameter). Alternatively, Bayesian accounts contend that guesses are drawn from distributions weighted toward prevalent, high-precision values; on this account, a guessing observer might choose saturated colors, cardinal orientations, or other features with high prior probability and perceptual precision. However, a third possibility is that guessing is anti-Bayesian: by leveraging strategic metacognition, uncertain observers may select feature values that they *struggle* to perceive, as if reasoning “it must have been tilted, because if it were vertical, I would have noticed it.” These accounts make strikingly different predictions about guessing: (1) random values; (2) biases toward high-precision values; (3) biases toward low-precision values. Here, we show that guesses can derive from this metacognitive strategy, relying on an intuitive estimate of one’s own perceptual capacities. Adult observers performed a visual working memory task in which three oriented arrows simultaneously appeared at isoeccentric locations for 0, 16, 33, 66, or 132ms, before being masked. Afterwards, observers reported the orientation of a single target arrow, indicated by a response cue. On trials where stimuli were presented, observers were significantly above chance at reporting arrow orientation across display times. However, on 0-ms “guess” trials, when stimuli were physically absent, participants' guesses revealed systematic non-uniformities. Specifically, observers were less likely to report cardinal orientations (which are represented with higher precision, and are more common in natural environments) than oblique orientations. Both psychophysical and computational modeling results suggest that guessing is both strategic and metacognitive: Guessing reflects the complement of precision.

Poster: Tuesday, May 17, 2:45pm - 6:45pm ET, Pavilion, Visual Memory: Models and Mechanisms

[VSS Link]

Wednesday, May 18

Shape bias at a glance: Comparing human and machine vision on equal terms
Katherine Hermann, Chaz Firestone

Recent work has highlighted a seemingly sharp divergence between human and machine vision: whereas people exhibit a shape bias, preferring to classify objects according to their shape (Landau et al., 1988), standard ImageNet-trained CNNs prefer to use texture (Geirhos et al., 2018). However, existing studies have tested people under different conditions from those faced by a feedforward CNN, presenting stimuli long enough for feedback and attentive processes to come online, and using tasks that may bias judgments towards shape. Does this divergence remain when testing conditions are more fairly aligned? In six pre-registered experiments (total N=1064) using brief stimulus presentations (50ms), we asked participants whether a stimulus exactly matched a target image (e.g., a feather-textured bear). Stimuli (a) matched exactly (the same image), (b) matched in shape but not texture (“shape lure”, e.g., a pineapple-textured bear), (c) matched in texture but not shape (“texture lure”, e.g., a feather-textured scooter), or (d) matched in neither shape nor texture (“filler”). We tested whether false-alarm rates differed for shape lures versus fillers, for texture lures versus fillers, and for texture lures versus shape lures. This paradigm avoids explicit object categorization and naming, allowing us to test whether a shape bias is already present in perception, regardless of how shape is weighted in subsequent cognitive and linguistic processing. We find that people do rely on shape more than texture, false-alarming significantly more often for shape lures than for texture lures. However, although shape-biased, participants are still lured by texture information, false-alarming significantly more often for texture lures than for fillers. These findings are robust to stimulus type (including multiple previously studied stimulus sets) and mask type (pink noise, scramble, no mask), and establish a new benchmark for assessing the extent to which feedforward computer vision models are “humanlike” in their shape bias.

Talk: Wednesday, May 18, 11:15am ET, Talk Room 2, Human Vision and Neural Networks: General considerations

[VSS Link]

V-VSS | June 1-2

Pretending not to see: Pretense behavior reveals the limits of self-simulation
Matan Mazor, Ian Phillips, Chaz Firestone

What we do depends on what we know. But what if we wish to decouple our behavior from our knowledge, by appearing not to know something that we really do? Such pretense behavior relies on counterfactual self-simulation (an understanding of how we would behave if our knowledge were different), and so provides an opportunity to investigate how well people can emulate a hypothetical knowledge state. Here, we examined the ability to both produce and visually detect pretense behavior, using the game "Battleships". Normally, "Battleships" is played by searching for ships hidden behind cells in a grid. In our studies, we instead showed subjects where all the ships were but asked them to play as if they hadn’t seen our hints and were ignorant of the ships' locations. Analyzing non-pretend (regular) games played by the same subjects, we identify robust behavioral patterns in the location and timing of cell selections, including serial dependencies in hit probability and decision time, and an association between decision time and the entropy of the posterior distribution over candidate actions. Critically, we show that pretend games exhibit similar, but exaggerated, behavioral patterns. By comparing pretend and non-pretend games to the modeled behavior of a near-optimal Bayesian agent, we find that pretend behavior is markedly less optimal than non-pretend behavior. This is because pretenders often play in ways that don’t make sense given the limited knowledge they pretend to have. However, despite these striking differences, independent "judge" participants were completely unable to discriminate the games of pretenders from those of non-pretenders. Thus, while pretenders behave in ways that could reveal their pretense to a keen eye, these subtle patterns are not detected by naïve observers. We conclude by discussing the implications of our findings for simulation accounts of theory of mind and metacognition.

Poster: June 1-2, V-VSS

[VSS Link]