
Perception & Mind Lab Presentations @ VSS 2025
Saturday, May 17
Controlling for everything: Canonical size effects with identical stimuli
Chaz Firestone, Tal Boger
[email protected]
Among the most impressive effects in recent vision science are those associated with “canonical size”. When a building and a rubber duck occupy the same number of pixels on a display, the mind nevertheless encodes the real-world size difference between them. Such encoding occurs automatically, organizes neural representations, and drives higher-order judgments. However, objects that differ in canonical size also differ in many mid- and low-level visual properties; this makes it difficult—and seemingly impossible—to isolate canonical size from its covariates (which are known to produce similar effects on their own). Can this challenge be overcome? Here, we leverage a new technique called “visual anagrams”, which uses diffusion models to generate static images whose interpretations change with image orientation. For example, such an image may look like a rabbit in one orientation and an elephant when upside-down. We created a stimulus set of visual anagrams whose interpretations differed in canonical size; each image depicted a canonically large object in one orientation but a canonically small object when rotated, while being pixel-wise identical in every other respect. Six experiments show that most (though not all) canonical size effects survive such maximal control. Experiments 1–2 tested Stroop effects probing the automaticity of canonical size encoding; consistent with previous findings, subjects were faster to correctly judge the onscreen size of an object when its canonical size was congruent with its onscreen size. Experiments 3–4 tested effects on viewing-size preferences; consistent with previous findings, subjects chose larger views for canonically larger objects. Experiments 5–6 tested efficient visual search when targets differed from distractors in canonical size; departing from previous findings, we found no such search advantage. This work not only applies a long-awaited control to classic experiments on canonical size, but also presents a case study of the usefulness of visual anagrams for vision science.
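To give a sense of how such stimuli can be produced, here is a minimal, illustrative sketch of the multi-view denoising idea behind visual anagrams: at each step, a diffusion model's noise estimates for the image and for a rotated copy are aligned and averaged, so the finished image supports one interpretation upright and another when flipped. This is not the stimulus-generation code used in the work above; `predict_noise`, the prompts, and the update rule are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate_180(img: np.ndarray) -> np.ndarray:
    """The view transformation pairing the two interpretations (180-degree rotation)."""
    return np.rot90(img, k=2, axes=(0, 1))

def predict_noise(img: np.ndarray, prompt: str, t: int) -> np.ndarray:
    """Stand-in for a text-conditioned diffusion model's noise estimate.
    A real implementation would call a pretrained pixel-space diffusion model here."""
    return rng.standard_normal(img.shape)

def sample_anagram(shape=(64, 64, 3), steps=50,
                   prompt_upright="a building",
                   prompt_rotated="a rubber duck") -> np.ndarray:
    """Schematic multi-view denoising loop: average the aligned noise estimates
    for the upright and rotated interpretations at every step."""
    x = rng.standard_normal(shape)                    # start from pure noise
    for t in reversed(range(steps)):
        eps_up = predict_noise(x, prompt_upright, t)  # estimate for the upright reading
        eps_rot = rotate_180(                         # estimate for the rotated reading,
            predict_noise(rotate_180(x), prompt_rotated, t)
        )                                             # mapped back into the upright frame
        eps = 0.5 * (eps_up + eps_rot)                # combine the two views
        x = x - (1.0 / steps) * eps                   # toy update; a real sampler follows
                                                      # a proper noise schedule
    return x

print(sample_anagram().shape)  # (64, 64, 3)
```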
Talk 22.15: Saturday, May 17, 10:45am - 12:30pm ET, Talk Room 1, Object Recognition: Categories and neural mechanisms
[VSS Link]
Complexity is a cognitive universal: Evidence from cross-modal transfer
Tal Boger, Chaz Firestone
[email protected]
What connects a sharply twisted shape, a many-layered melody, and the multisyllabic string “animipatorun”? These items are unrelated in nearly every respect; they span different modalities, arise from different domains, and have independent properties. Nevertheless, they seem unified by their *complexity*: Each is informationally dense relative to prototypical stimuli of its kind (cf. a square, a major scale, or a short string). Does the mind appreciate the complexity these stimuli share, even across dramatically different properties? Here, 4 experiments demonstrate *transfer* across these different stimuli, suggesting that a ‘universal’ representation of complexity exists in the mind. In Experiment 1, participants learned a reward rule for simple and complex shapes; selecting a complex shape was worth more (or fewer) points than selecting a simple shape. After this learning phase, participants saw new stimuli that also differed in their complexity: two arrays of colored dots, one uniform and the other highly varied. Without any further instruction, subjects transferred the reward rule to the dots, spontaneously selecting the more complex (or simpler) dot array. Experiment 2 generalized this pattern to audition: Subjects who learned that complex shapes were worth more points spontaneously selected complex melodies. Experiment 3 extended this result even further, finding successful transfer from shapes to letter-strings. In each case, this transfer arose bidirectionally. Finally, Experiment 4 tested the automaticity of such transfer. In a Stroop-like task, two shapes of differing complexity appeared above two letter-strings of differing complexity, and participants judged which shape (or letter-string) was more complex. Though only one stimulus class (either the shapes or the letter-strings) was task-relevant, participants were faster to judge the complexity of the target stimulus when the task-irrelevant stimulus was congruent in complexity. We suggest that visual, auditory, and linguistic complexity are ‘unified’ in the mind, supporting spontaneous and automatic transfer across modalities.
Poster 26.427: Saturday, May 17, 2:45pm - 6:45pm ET, Pavilion, Multisensory Processing: Audiovisual integration
[VSS Link]
Response duration: A ubiquitous implicit measure of confidence
Hanbei Zhou, Rui Zhe Goh, Ian Phillips, Chaz Firestone
[email protected]
Among the most reliable connections between internal mental processing and external behavior is *response time*, with easier, more accurate, and more confident judgments typically made faster. But which aspects of response time are relevant? Whereas psychophysical studies traditionally focus on the time taken to initiate a response, an underexplored measure is the duration of the response itself—not just the amount of time between stimulus onset and keypress (reaction time), but also how long one holds down the key before releasing it (response duration). Response duration is a ubiquitous and freely available data source, yet almost no studies report or analyze it (Pfister et al., 2023). Here, 3 varied experiments demonstrate that response duration reliably predicts subjective confidence, independent of reaction time. In Experiment 1, subjects detected faces within white noise, with difficulty manipulated by varying face opacity. Subjects responded with a keypress (with both keyUp and keyDown events recorded separately), followed by a confidence judgment. Remarkably, subjects held down the response key longer during trials in which they subsequently reported lower confidence, as if making these face-detection judgments in a tentative fashion. The same pattern held in another visual task (judging the coherence of random-dot motion; Experiment 2), and a cognitive task (classifying American cities as geographically Eastern or Western; Experiment 3). In all cases, response duration accounted for variance in confidence that was not predicted by reaction time. Response duration has distinct advantages as a measure of confidence: It taps confidence at the time of judgment (rather than retrospectively), it can be used when traditional confidence judgments are difficult to elicit (e.g., in animals or infants), and it may be less affected by biases associated with explicit reports. Our results suggest that response duration is a valuable and untapped source of information, raising many avenues for future investigation.
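As an illustration of the measure itself (not the authors' analysis pipeline), the sketch below shows how reaction time and response duration can be derived from stimulus-onset, keydown, and keyup timestamps, and how confidence can be regressed on both to ask whether duration carries information beyond reaction time. All timestamps and ratings here are made-up placeholder values for demonstration only.

```python
import numpy as np

# Placeholder event times (seconds) and confidence ratings for five trials.
onset   = np.array([0.0, 5.0, 10.0, 15.0, 20.0])        # stimulus onset
keydown = np.array([0.62, 5.48, 10.71, 15.55, 20.66])   # response key pressed
keyup   = np.array([0.71, 5.55, 10.88, 15.61, 20.81])   # response key released
confidence = np.array([4.0, 5.0, 2.0, 5.0, 3.0])        # e.g., 1-5 rating

reaction_time     = keydown - onset      # classic RT: onset to keydown
response_duration = keyup - keydown      # how long the key was held

# Ordinary least squares: confidence ~ intercept + reaction time + response duration.
X = np.column_stack([np.ones_like(reaction_time), reaction_time, response_duration])
beta, *_ = np.linalg.lstsq(X, confidence, rcond=None)
print(dict(zip(["intercept", "b_reaction_time", "b_response_duration"], beta.round(3))))
```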
Talk 25.17: Saturday, May 17, 5:15pm - 7:00pm ET, Talk Room 1, Decision Making
[VSS Link]
Sunday, May 18
Are they only acting? Performing and observing pantomimed actions
Sholei Croom, Chaz Firestone
[email protected]
Ordinary observers can infer the goals of others’ actions, such as where someone is walking or which object they are reaching for. However, actions are more than their goals; underlying visually-guided behavior are complex dynamics between an agent’s body and the environment. What do observers know about such dynamics and the behaviors that emerge from them? Here, we explore this question through “pantomimed actions”, in which people perform actions with imaginary objects. Pantomimed actions differ kinematically from genuine object-directed actions because the absence of visual information disrupts the typical perception-action feedback cycle. If observers can distinguish real actions from pantomimed actions, this would reveal finer-grained intuitions about the dynamics underlying visually-guided action. We created a set of videos in which agents performed object-directed actions involving no physical contact with the objects (e.g., stepping over a box, ducking under an overhang, or weaving between poles). In half of the videos, actors interacted with real boxes, real overhangs, etc.; in the other half, they were instructed to move *as if* interacting with these objects (in both cases, a censor box occluded where the object was or would have been). Then, independent subjects watched these videos and judged which was which: Which videos showed real actions, and which showed pantomimes? Collapsing across all actions, observers discriminated real actions from pantomimes at rates above chance. However, certain actions were more discriminable than others; for some actions, actors successfully ‘fooled’ observers into thinking an object was present. Our work supports two conclusions: (1) Observers are sensitive to kinematic differences distinguishing genuine visually-guided actions from their pantomimed counterparts (revealing surprisingly fine-grained intuitions about visuomotor processing); (2) The ability to “fake” actions may be more robust than previously suggested (e.g., findings that actors cannot make a light box seem heavy).
Poster 36.333: Sunday, May 18, 2:45pm - 6:45pm ET, Banyan Breezeway, Face and Body Perception: Body
[VSS Link]
Tuesday, May 20
Who drew this? Children appreciate visual style differently than adults
Shari Liu, Chaz Firestone, Tal Boger
[email protected]
Perception often confronts us with the distinction between *content*—what something is—and *form*—how it appears or is represented. For example, the same letter may appear in different typefaces, the same tool may be made of different materials, and the same body may take on different poses. Perhaps the richest example of this distinction arises in visual art: When viewing a painting, for example, we can discern not only what is depicted (e.g., a mountain or a sunset) but also the *manner* in which it is depicted (e.g., an impressionist sketch or a realistic portrayal). What are the origins of our capacity to distinguish content and form? And how might this capacity change throughout development? Artistic style presents an intuitive way to pit content against form, making it a useful case study for these questions. Here, in 3 experiments, we introduced participants to artists who produced various scenes with distinct contents and styles (e.g., a mountain sketched with crayons vs. a beach rendered as a detailed comic). Participants then saw a critical third scene whose content matched one artist’s drawing but whose style matched the other, and were asked which artist produced this critical scene. Whereas adults attributed the critical scene to an artist based on style (responding, e.g., that the crayon artist produced the new crayon scene, even with differing content; Experiment 1), children aged 4-7 years behaved *oppositely*, attributing based on content (responding, e.g., that the mountain artist produced the mountain scene, even with differing style; Experiment 2). We also replicated this pattern on LookIt, an online platform for collecting developmental data (Experiment 3). This work supports two conclusions: (1) The capacity to distinguish content from form arises early; but (2) the way this capacity is applied shifts throughout development.
Poster 53.349: Tuesday, May 20, 8:30am - 12:30pm ET, Banyan Breezeway, Perceptual Organization: Aesthetics
[VSS Link]
Irresistibly logical: Disjunctive inferences facilitate visual recognition of likely and unlikely events
Nathaniel Braswell, Chaz Firestone, Nicolò Cesana-Arlotti
[email protected]
Whereas logical inference is typically associated with symbolic notation and laborious proofs, it also arises intuitively in everyday reasoning. Previous work shows that human infants deploy basic disjunctive inferences to infer occluded objects’ identities (Cesana-Arlotti et al., 2018). In two prior studies (Braswell et al., VSS 2023), we discovered that this developmentally basic logical computation arises spontaneously when adults recognize objects in visual scenes. In particular, we showed adults visual events wherein objects are hidden and then revealed in ways that either follow or violate the logically predicted outcomes; subjects responded faster when a revealed object’s identity was consistent with the inference’s prediction than when it violated it. Here, we explore whether such inferences are automatic and even “irresistible”, arising in circumstances where the subject receives statistical evidence contrary to the inference. In Experiment 1, participants had to identify two kinds of concurrent objects: ones logically predicted by the events in the scene and logically unrelated ones. Strikingly, participants recognized objects predicted by the logical inference faster than identical-looking objects that were logically unrelated to the scene, suggesting that logical inferences facilitated and expedited visual processing. In Experiment 2, we manipulated the statistical distribution of revealed objects to create cases where the logically predicted outcome was statistically unlikely (by a ratio of 2:1). Remarkably, participants often misidentified the statistically likely object as the improbable one merely because logic compelled them to do so, even though the logical prediction was contradicted by the prior statistical evidence. In other words, even when it would have benefited participants not to reason logically, they couldn’t help but do so. This work shows how methods from vision science can illuminate the mind's logical capacities. Our findings suggest the presence of core logical inferences that automatically facilitate visual processing and are hard to resist despite prevailing counterevidence.
Poster 56.462: Tuesday, May 20, 2:45pm - 6:45pm ET, Pavilion, Scene Perception: Spatiotemporal factors
[VSS Link]
The perception of countability: A case study of 'mental affordances'
Lana Milman, Ian Phillips, Chaz Firestone
[email protected]
In addition to physical actions (e.g., climbing a staircase or grasping an object), we also perform mental actions (e.g., counting objects in our head or shifting attention). Recent work in philosophy of mind proposes that, just as we can appreciate whether and how easily we can execute various physical actions (physical affordances), we can also do the same for mental actions — appreciating in advance how effectively we will be able to execute a certain cognitive operation before actually carrying it out (the “mental affordance hypothesis”; McClelland, 2020). Here, we explore this hypothesis for counting and its corresponding mental affordance, “countability” — i.e., how quickly and accurately an array of objects can be precisely counted. Subjects briefly (500ms or 2500ms) saw two “cookies” (circles), each containing a number of “M&Ms” (dots). The M&Ms varied in size (large, small, or mixed), color (single or mixed), and opacity (full or partial). Subjects selected whichever cookie seemed easier to count (causing it to reappear onscreen), and then counted that cookie’s M&Ms. Results showed that even a 500ms preview was sufficient for subjects to accurately predict many aspects of their own counting performance on a given display, including that larger M&Ms would be easier to count than smaller M&Ms, that opaque M&Ms would be easier to count than semitransparent M&Ms, and so on. However, subjects also “misperceived” countability: They preferred to count cookies with mixed-size M&Ms over cookies with uniformly small M&Ms, even though counting performance was better on the latter. Our results suggest that naive observers can rapidly form impressions of a mental affordance and use it to guide behavior. Moreover, as with physical affordances, we may be imperfectly calibrated to our actual capabilities.
Poster 56.347: Tuesday, May 20, 2:45pm - 6:45pm ET, Banyan Breezeway, Undergraduate Just-In-Time 2
[VSS Link]
phiVis: Philosophy of Vision Science
A VSS satellite event, organized by Kevin Lande and Chaz Firestone
www.phivis.org
The past decade has seen a resurgence in conversation between vision science and philosophy of perception on questions of fundamental interest to both fields, such as: What do we see? What is seeing for? What is seeing? The phiVis workshop is a forum for continuing and expanding this interdisciplinary conversation. Short talks by philosophers of perception that engage with the latest research in vision science will be followed by discussion with a slate of vision scientists.
Event: Tuesday, May 20, 1:15pm - 3:15pm ET, Banyan/Citrus Room.
RSVP: in person; online.
[VSS Link]