Perception & Mind Lab Presentations @ VSS 2026



Friday, May 15

 
Gender bias in visual categorization: When DiCaprio is an actor but Jolie is a woman
Rui Zhe Goh, Jiayi Li, Ian Phillips, Chaz Firestone

Objects can be categorized at differing levels of abstraction; for example, the same image may be classified as a blue jay, a bird, an animal, or a living thing. This pattern also arises for people; for example, the same image may be classified as Gordon Ramsay, a chef, a Scot, or a person. But which categories are most visually salient, and is such salience biased by gender? We investigated this question using a visual categorization task. Subjects saw images of objects and people and were asked simply to name any category corresponding to the image. In addition to objects such as vegetables and buildings, each subject saw one woman and one man, drawn from one of the following classes: 1) famous people from various professions (e.g., Angelina Jolie and Leonardo DiCaprio, Serena Williams and LeBron James); 2) stock photographs of anonymous people from minimally gendered categories (e.g., tourist, pedestrian); 3) AI-generated images of these same minimally gendered categories, created to be identical except for gender. Every image was part of a matched pair of men and women from the same image class, but no subject saw a man and woman from the same matched pair. (For example, a subject could see Angelina Jolie and LeBron James, but never Angelina Jolie and Leonardo DiCaprio.) Remarkably, subjects were more likely to categorize women than men according to their gender. Put differently, they were biased to say “actor” for DiCaprio but “woman” for Jolie, and “pedestrian” for male pedestrians but “woman” for female pedestrians. This trend arose for all three image classes, suggesting that it was not driven by idiosyncratic image differences or knowledge of the people depicted. Our results reveal a gender bias in visual categorization that may have pervasive cognitive and social implications.

Poster 16.335: Friday, May 15, 3:45pm – 6:00pm ET, Banyan Breezeway, Face and Body Perception: Social cognition 1

Sunday, May 17

 
Can you count this? Perceiving affordances for mental actions
Lana Milman, Ian Phillips, Chaz Firestone

We can often see how easy a physical action is to execute, such as climbing stairs or grasping an object. Is the same true for mental actions? Can we also see how easy it is to execute a cognitive operation, before actually carrying it out? Whereas the perception of physical affordances has a long history in cognitive science, the hypothesis that we perceive “mental affordances” (McClelland, 2020) has received less empirical attention. Here, four experiments explore this hypothesis for the mental affordance “countability”: how easily an array of objects can be precisely counted. In Experiments 1–2, subjects briefly (500 ms or 2500 ms) saw two “cookies” (circles) containing a number of “M&Ms” (dots), varying in size (large/small/mixed), color (single/mixed), opacity (full/partial), and grouping (clustered/dispersed). Subjects selected whichever cookie seemed easier to count (causing it to reappear onscreen), and then counted that cookie’s M&Ms. In Experiments 3–4, subjects counted a single cookie’s M&Ms, without indicating preference. Results showed that even a 500 ms preview was sufficient to form strong preferences for countability, with subjects preferring fully opaque to partially opaque M&Ms, clustered M&Ms to dispersed M&Ms, and large M&Ms to mixed and small M&Ms. However, these impressions were only partially accurate. Subjects performed better with opaque M&Ms, and with large as opposed to mixed and small M&Ms, in line with their preferences. But performance was better for dispersed M&Ms than for clustered M&Ms, contrary to their preference. Moreover, despite expressing no preference for single over mixed-color M&Ms, subjects performed better with single colors. These results suggest that naïve observers can rapidly form impressions of a mental affordance and use it to guide behavior, but that, just as with physical affordances, we may be imperfectly calibrated to our actual capabilities.

Poster 33.317: Sunday, May 17, 8:30am – 12:30pm ET, Banyan Breezeway, Perceptual Organization: Ensembles

 
Phrasal momentum
Chaz Firestone, Tal Boger

The world looks different from one moment to the next, and various mental processes anticipate such changes. A foundational example of this anticipation is representational momentum (RM), wherein the mind ‘plays forward’ visual events. For example, falling objects are recalled as closer to the ground, and rotating rectangles are recalled as more rotated. Though early studies hypothesized that RM primarily concerns the anticipation of motion, recent work demonstrates that it arises for other continuous properties, such as the brightness of a stimulus or even how melted an ice cube appears. Here, we extend representational momentum even further than these sophisticated phenomena, to discrete (rather than continuous) properties, as well as to more conceptual (rather than purely visual) properties of the world. Participants watched animations of common phrases being typed; the animations were interrupted partway through (e.g., “The quick brown fox jum”), and participants used a slider to recall the phrase’s state at the moment it disappeared. Five experiments revealed ‘phrasal momentum’: The mind plays forward the phrases we see, in ways that distort visual memory. This distortion was greater for real phrases than for (a) strings of “X”s (Experiment 1), (b) length-equated strings of random letters (Experiment 2), and (c) scrambled versions of the words composing the phrases (e.g., “heT kicqu rbnow oxf”; Experiment 3). We also observed a stronger distortion for normal, in-order phrases than for word-shuffled versions of those same phrases (e.g., “The quick brown fox…” vs. “Fox quick the brown…”; Experiment 4). Finally, Experiment 5 found that phrasal momentum follows a conceptual ‘gradient’; in a single experiment combining all the aforementioned conditions, normal phrases produced the strongest RM effects, followed by shuffled phrases, scrambled letters, random letters, and Xs. Thus, representational momentum arises even for discrete and non-visual stimuli.

Poster 36.325: Sunday, May 17, 2:45pm – 6:45pm ET, Banyan Breezeway, Visual Memory: Encoding and retrieval, capacity

 
Pantomimed actions recruit intuitive knowledge about visuomotor feedback
Sholei Croom, Chaz Firestone

Visually guided actions arise from a complex synergy between perception and action; when someone grabs a cup to take a drink, for example, the mechanics of their reach are updated online in response to evolving perceptual input. To what extent are ordinary observers aware of this aspect of others’ goal-directed behavior? Here, we explored this question through “pantomimed actions”, in which people perform actions with imaginary objects. We created a stimulus set of videos in which agents performed both genuine object-directed actions (e.g., stepping over a box) and pantomimes of those actions (e.g., stepping over an imagined box). We asked both (a) whether naïve observers who watch these videos can distinguish real actions from pantomimed actions, and (b) which kinds of information underwrite this performance. In Experiment 1, subjects watched raw video of real and pantomimed actions side-by-side, with a black ‘censor bar’ covering the real (or imagined) object’s location. Under these conditions, subjects were able to distinguish the two action types at rates above chance; for example, they could tell whether someone was interacting with a real (vs. merely imagined) box, and also whether someone was shuffling between two real (vs. merely imagined) poles. Moreover, subjects’ text responses reflected rich inferences about which features of the movement should be diagnostic. However, in Experiment 2, subjects viewed the same actions but with body movements instead depicted by simple ‘pose skeletons’ (dots connected by lines on a black background) generated from the original videos. Under these conditions, observer performance dropped to chance, despite the kinematic information being preserved across experiments. Together, these results suggest that ordinary people can relate differences in action kinematics to differences in sensory conditions, but that this capacity must be grounded in contextual information about the actor’s relation to their environment.

Talk 35.24: Sunday, May 17, 6:00pm ET, Talk Room 2, Action

Monday, May 18

 
Perceiving animacy in otherwise-identical images
Tal Boger, Chaz Firestone

Even when completely motionless, some objects look animate (e.g., dogs and elephants) while others don’t (e.g., boots and sofas). A rich literature suggests that the mind automatically encodes this property, reporting striking effects of perceived animacy on visual attention and working memory. However, objects that differ in animacy tend to differ in many lower-level features (e.g., shape), and follow-up work has revealed that such confounds often account for these seemingly high-level effects. Thus, whether animacy itself drives visual processing remains unanswered and even controversial. Here, we take a new approach to this question by exploiting “visual anagrams”: static images whose interpretations change radically with orientation. We used a diffusion model to generate such images in ways that varied animacy (e.g., a dog in one orientation and a boot when rotated, or an elephant in one orientation and a sofa when rotated). Each anagram image contains the exact same pixels in either orientation, such that animacy varies across the two interpretations while nearly all lower-level features remain constant. Nine experiments used this approach to demonstrate that animacy itself guides memory and attention. Experiments 1–2 found that changes to an object in a memory array were more detectable when they altered animacy (e.g., dog→sofa is more detectable than dog→elephant, even when the elephant and sofa are the very same image, just rotated). Experiments 3–6 found that animate targets in a search array were easier to find among inanimate distractors than among other animate distractors (and vice versa). Finally, Experiments 6–9 verified that differences in orientation (one of the only lower-level features uncontrolled by visual anagrams) cannot account for these results; the effects disappeared when using silhouetted, blurred, and pixelated versions of the anagrams. Thus, visual processing extracts animacy itself, over and above its covarying lower-level features.

Poster 43.305: Monday, May 18, 8:30am – 12:30pm ET, Banyan Breezeway, Object recognition: Categories

Tuesday, May 19

 
Introspecting visual biases
Noa Perlmutter, Chaz Firestone, Ian Phillips

We are aware of the world and its properties; for example, we can see the number, size, and motion of objects around us. Are we also aware of the internal processes that generate these percepts? Whereas it is controversial whether higher-level cognitive processes are accessible to introspection (e.g., knowing why we make various choices), basic perceptual biases are widely assumed to be closed off from introspection. In contrast to this consensus, here we reveal successful introspection of three classic effects in vision and visual memory: numeric underestimation, size contrast, and representational momentum. Experiment 1 investigated numeric underestimation: Subjects briefly saw an array of 11–100 dots, and estimated their numerosity; on a subset of trials, subjects were then asked whether they thought they had underestimated or overestimated. Experiment 2 investigated the Ebbinghaus illusion: Subjects saw a target circle surrounded by flankers of variable size, and adjusted a second circle to match the target; subjects were then asked whether they thought they had underestimated or overestimated. Experiment 3 investigated representational momentum: Subjects saw a fish glide across the screen and disappear, and estimated its final seen location; subjects were then asked whether they thought their estimate was too far left or right. We replicated all three biases: Subjects underestimated numerosity, were influenced by flankers, and extrapolated motion. Strikingly, however, all three experiments also revealed awareness of the effects themselves. For example, in Experiment 1, subjects correctly answered that they underestimated numerosity. And in Experiments 2 and 3, they showed trial-by-trial awareness of their biases, as computed by detection-theoretic statistics. These findings go beyond successful metacognition of performance (i.e., knowing whether one is performing well or poorly) to awareness of directional effects themselves, suggesting deeper and subtler access to internal mental processes than traditionally assumed.

Talk 52.24: Tuesday, May 19, 11:30am ET, Talk Room 2, Decision Making

 
phiVis: Philosophy of Vision Science
A VSS satellite event, organized by Kevin Lande and Chaz Firestone
www.phivis.org

The past decade has seen a resurgence in conversation between vision science and philosophy of perception on questions of fundamental interest to both fields, such as: What do we see? What is seeing for? What is seeing? The phiVis workshop is a forum for continuing and expanding this interdisciplinary conversation. Short talks by philosophers of perception that engage with the latest research in vision science will be followed by discussion with a slate of vision scientists.

Event: Tuesday, May 19, 1:00pm – 3:00pm ET, Banyan/Citrus Room.

[phiVis Link]