Perception & Mind Lab Presentations @ VSS 2024

Saturday, May 18

Sensitivity to highly salient features in dynamic inattentional blindness
Makaela Nartker, Chaz Firestone, Howard Egeth, Ian Phillips
[email protected]

In inattentional blindness (IB), subjects who fail to report unexpected stimuli are typically assumed not to have seen them. Recent work challenges this assumption by showing that inattentionally blind subjects can respond above chance to stimuli they report not noticing (Nartker et al., 2022), suggesting that inattention may not completely abolish awareness. However, these results have been limited to briefly and peripherally presented static stimuli (e.g., a line appearing on the edge of a display for 200 ms). Does this pattern extend to long-lasting IB involving highly salient dynamic stimuli? Here we report data from a large-scale online study (N>10,000) addressing precisely this question. Subjects were shown a gray rectangular display containing moving white and black squares, and counted how often the white squares bounced off its perimeter (adapted from Wood & Simons, 2017). For some subjects, the third trial included an additional brightly colored and highly salient shape (a circle or triangle that was orange or green), which traversed the height of the display for five full seconds. After this critical trial, subjects were asked the standard IB question: “Did you notice anything unusual on the last trial that wasn’t there on previous trials?” (yes/no), followed by additional questions probing the extra object’s color, location and shape. By including absent trials in which no additional stimulus appeared, we found that subjects were biased to report not noticing (c=0.45, 95% CI=[0.41,0.49]), suggesting greater awareness than revealed by yes/no questioning. Consistent with this interpretation and our previous studies, inattentionally blind subjects could report the color of the unexpected object at above-chance levels (d′=0.12, 95% CI=[0.02,0.23]).
Strikingly, these ‘non-noticing’ subjects were also above-chance in discriminating the objects’ shape (d′=0.23, 95% CI=[0.13,0.33]), raising the possibility that even mid- or high-level features survive inattention.
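The criterion and sensitivity values above are standard signal-detection quantities. As a minimal sketch (the function name and the example hit/false-alarm rates below are illustrative, not the study's data or analysis code), both can be computed from yes/no responses via the inverse-normal (z) transform:

```python
from statistics import NormalDist

def dprime_criterion(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Signal-detection sensitivity (d') and criterion (c) from
    hit and false-alarm rates, via the inverse-normal (z) transform."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)             # separation of signal and noise distributions
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # positive c = conservative ("no") bias
    return d_prime, criterion

# Illustrative rates only: 60% hits, 40% false alarms.
d, c = dprime_criterion(0.60, 0.40)
```

On this convention, a positive criterion (like the c=0.45 reported above) indexes a conservative bias toward responding “no” — i.e., toward reporting not noticing.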

Poster 23.438: Saturday, May 18, 8:30am - 12:30pm ET, Pavilion, Attention: Inattention, attentional blindness, suppression

[VSS Link]

The psychophysics of style
Tal Boger, Chaz Firestone
[email protected]

Images vary not only in content, but also in style. When viewing a Monet painting, for example, we see both the scenery it depicts (lilies dotting a pond) and the manner in which it does so (broken brushstrokes, blended colors, etc.). Parsing images in this way is a remarkable perceptual achievement, akin to separating illumination and reflectance to achieve color constancy, or disentangling letter-identities from typefaces when reading. What is the nature of this process, and what are its psychophysical signatures? Here, 9 experiments reveal 3 new phenomena of style perception. (1) Style tuning. Using neural style-transfer models, we rendered natural scenes in the styles of famous artists. Then, inspired by ‘font tuning’ (wherein text is easier to read in a single typeface than in multiple typefaces), we asked observers to scan arrays of images and enumerate all scenes of one type (e.g., mountains). Observers were faster and more accurate in same-style arrays than mixed-style arrays [E1–E2]. Such tuning accumulated over time [E3] and survived controls for color and luminance [E4]. (2) Style discounting. Analogous to ‘discounting the illuminant’ in color constancy, we find that vision ‘discounts’ style. Changes to a scene’s content (e.g., Monet-pond → Monet-building) were more easily detected than changes to its style (Monet-pond → Klimt-pond; E5), even when low-level image statistics predicted the opposite [E6]. (3) Style extrapolation. After viewing items in a given style (e.g., a fork and knife from one cutlery set), observers misremembered seeing additional items from that style (the spoon from that set; E7), even with low-level similarity equated across lures [E8–E9]. Such errors suggest spontaneous representation of the unseen items — as if mentally ‘rendering’ objects in newly learned styles. While we typically associate style with more qualitative approaches, our work explores how tools from vision research can illuminate its psychological basis.

Poster 26.430: Saturday, May 18, 2:45pm - 6:45pm ET, Pavilion, Object Recognition: Visual preference

[VSS Link]

Sunday, May 19

The psychophysics of compositionality: Relational scene perception occurs in a canonical order
Alon Hafri, Zekun Sun, Chaz Firestone
[email protected]

An intriguing proposal in recent literature is that vision is compositional: Just as individual words combine into larger linguistic structures (as when “vase,” “table,” and “on” compose into the phrase “the vase on the table”), many visual representations contain discrete constituents that combine in systematic ways (as when we perceive a vase on a table in terms of the vase, the table, and the relation physical-support). This raises a question: What principles guide the compositional process? In particular, how are such representations composed in time? Here we explore the psychophysics of scene composition, using spatial relations as a case study. Inspired by insights from psycholinguistics, we test the hypothesis that the mind builds relational representations in a canonical order, such that ‘reference’ objects (those that are large, stable, and/or exert physical ‘control’; e.g., tables)—rather than ‘figure’ objects (e.g., vases resting atop them)—take precedence in forming relational representations. In Experiment 1, participants performed a ‘manual construction’ task, positioning items to compose scenes from sentences (e.g., “the vase is on the table”). As hypothesized, participants placed reference-objects first (e.g., table, then vase). Next, we explored whether this pattern arises in visual processing itself. In Experiment 2, participants were faster to recognize a target scene specified by a sentence when the reference-object (table) appeared before the figure-object (vase) than vice versa. Notably, this pattern arose regardless of word order (reference- or figure-first) and generalized to different objects and relations. Follow-ups showed that this effect emerges rapidly (within 100ms; Experiment 3), persists in a purely visual task (Experiment 4), and cannot be explained by size or shape differences between objects (Experiment 5).
Our findings reveal psychophysical principles underlying visual compositionality: the mind builds relational representations in a canonical order, respecting each element’s role in the relation.

Talk 31.22: Sunday, May 19, 8:30am ET, Talk Room 2, Scene Perception: Behaviour, psychophysics

[VSS Link]

Automatic logical inferences in visual scene processing
Nathaniel Braswell, Chaz Firestone, Nicolò Cesana-Arlotti
[email protected]

The human capacity for logic is responsible for some of our grandest achievements; without it, formal mathematics, economic systems, and architectural marvels would be elusive. Yet logical cognition is not limited to rarefied intellectual challenges—it also arises in everyday contexts, such as inferring that a glass on a table must be yours because your friend is holding theirs. Previous work shows that a primitive logical operation—disjunctive syllogism (p OR q; NOT p; therefore, q)—is deployed by infants to infer the identities of objects (Cesana-Arlotti et al., 2018). This raises an intriguing question: Do such logical inferences arise automatically in adults, and even impact processing of visual scenes? In Experiment 1, adults viewed events wherein an ambiguous object was ‘scooped’ by a cup from a two-item set (snake and ball). Upon seeing one of the objects outside the cup (snake), adults responded more slowly when the revealed object’s identity violated their logical prediction (snake) than when it was consistent (ball). The effect persisted over 40 trials, even though the revealed identity was random—suggesting that adults were executing this inference automatically. Put differently, they ‘couldn’t help’ but infer the hidden object’s identity, even when they knew they shouldn’t. Experiment 2 tested whether this effect resulted from one item’s appearance priming the other. We devised scenes with a third item in the cup, preventing logical inferences about the cup’s contents. A Bayes Factor analysis found strong evidence for the null hypothesis of no response time differences, confirming that logical inference drives the Experiment 1 effect. These findings open avenues in both logical cognition and scene processing. First, our results suggest that logical inferences may be spontaneously deployed to resolve visually uncertain events.
Additionally, methods from vision science may serve as a previously unexplored tool for uncovering the nature of our mind's fundamental logical capacities.

Talk 31.26: Sunday, May 19, 9:30am ET, Talk Room 2, Scene Perception: Behaviour, psychophysics

[VSS Link]

When does response duration track performance?
Hanbei Zhou, Rui Zhe Goh, Ian Phillips, Chaz Firestone
[email protected]

A founding insight of psychophysics was to link internal mental processes to the timing of the behaviors they produce. Perhaps the most obvious and well-characterized example is the relationship between performance and response time, as when salient targets are found faster in visual search or when more confident perceptual decisions are made more quickly. But what is “response time”? Whereas nearly all psychophysical studies that measure the timing of behavior focus on the time taken to initiate a response, another potentially relevant magnitude is the duration of the response itself — e.g., not just how long it takes between the appearance of a stimulus and the onset of a keypress, but also how long one holds down the key before letting it go. Recent work makes a theoretical case that response duration may be a neglected source of data about visual processing (Pfister et al., 2023); here, 4 experiments provide empirical support for this proposal. Subjects completed a detection task in which a field of white noise either contained or didn’t contain a face, with difficulty manipulated by varying the face’s opacity. Subjects responded with a keypress (with both keyUp and keyDown events recorded separately). Remarkably, on more difficult trials, subjects not only took longer to initiate a response but also held down the response key for longer, as if answering in a tentative fashion. Response duration also tracked accuracy, with subjects holding down the response key for longer on incorrect as opposed to correct trials. These effects emerged again in a direct replication, but not in follow-up experiments using easier tasks. Overall, our results suggest that response duration may be an untapped source of information about performance — especially in tasks with high uncertainty — raising a wealth of avenues for future investigation.
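The keyDown/keyUp distinction above can be made concrete with a small sketch (the function name and the example timestamps are hypothetical, not the study's code): a single keypress yields two separate intervals, the classic response time and the response duration:

```python
def response_measures(stim_onset: float, key_down: float, key_up: float) -> tuple[float, float]:
    """Split a keypress into response time (stimulus onset -> keyDown)
    and response duration (keyDown -> keyUp), both in seconds."""
    rt = key_down - stim_onset      # time to initiate the response
    duration = key_up - key_down    # how long the key was held
    return rt, duration

# Hypothetical trial: stimulus at t=0, key pressed at 0.45 s, released at 0.62 s.
rt, dur = response_measures(stim_onset=0.0, key_down=0.45, key_up=0.62)
```

Recording keyDown and keyUp events separately, as in the experiments above, is what makes the second interval available at all; most psychophysics pipelines log only the first.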

Poster 33.314: Sunday, May 19, 8:30am - 12:30pm ET, Banyan Breezeway, Decision Making: Perceptual decision making 2

[VSS Link]

Learning or doing? Visual recognition of epistemic vs. pragmatic intent
Sholei Croom, Hanbei Zhou, Chaz Firestone
[email protected]

Whereas some actions are aimed at changing the world, others are aimed at learning about it. For example, someone might press on a door to open it, or to determine whether it’s locked; someone might place their toe into a pool to enter it, or to gauge its temperature; someone might shake a container to shuffle its contents, or to figure out what’s inside. The distinction between ‘pragmatic’ and ‘epistemic’ actions is recognized in other fields, but has only recently entered vision science: In previous work (Croom et al., 2023), we found that, when watching videos of someone shaking a box, observers can infer what information the shaker is trying to obtain (e.g., the number of objects inside vs. their shape). Here, we ask a broader question: Do epistemic actions share common visual features that distinguish them from pragmatic actions, even beyond particular action goals? We created a set of 216 videos, each showing a naive participant completing an epistemic action (determining the number, shape, or size of objects in a box) or a pragmatic action (shuffling the box’s contents, making the objects collide, or causing them to jump into the air). Then, 100 observers viewed these videos and were given a different task: To distinguish pragmatic actions from epistemic actions—i.e., who was acting to do something vs. to learn something. While some observers were given details about the specific actions they would see, other observers were simply told that some videos showed ‘learning’ and others showed ‘doing’. Regardless of whether they were informed (Experiment 1) or uninformed (Experiment 2) of the candidate actions, observers correctly distinguished pragmatic from epistemic actions, based purely on the box-shaking dynamics. Thus, learning looks different from doing: Beyond recognizing the particular goals of an action, observers can visually recognize epistemic vs. pragmatic intent.

Poster 36.332: Sunday, May 19, 2:45pm - 6:45pm ET, Banyan Breezeway, Scene Perception: Virtual environments, intuitive physics

[VSS Link]

Tuesday, May 21

phiVis: Philosophy of Vision Science
A VSS satellite event, organized by Kevin Lande and Chaz Firestone

The past decade has seen a resurgence in conversation between vision science and philosophy of perception on questions of fundamental interest to both fields, such as: What do we see? What is seeing for? What is seeing? The phiVis workshop is a forum for continuing and expanding this interdisciplinary conversation. Short talks by philosophers of perception that engage with the latest research in vision science will be followed by discussion with a slate of vision scientists.

Event: Tuesday, May 21, 12:30pm - 2:30pm ET, Banyan/Citrus Room.
Register: in person; online.

[VSS Link]

Wednesday, May 22

Number: Still a primary visual feature
Caroline Myers, Chaz Firestone, Justin Halberda
[email protected]

Some of the strongest evidence that number is a primary visual feature (like color or contrast) comes from experiments demonstrating visual adaptation to number (e.g., Burr & Ross, 2008), wherein staring at a large number of dots decreases numerosity estimates of subsequent probe displays. Recently, these findings have been challenged by a deflationary account on which these effects reflect spatiotopic attenuation to unchanging information (Yousif et al., 2023). Here, we conduct a crucial comparison of these accounts by testing number adaptation for arrays whose spatial properties constantly change. Centrally fixating observers viewed two large discs subtending 9° that independently and randomly translated on either side (left or right) of the display. During the adaptation phase, varying numbers of dots appeared and faded at changing locations within each continuously-moving disc. After 12 seconds, the dots disappeared and the discs continued moving for an additional 1000ms. Following this delay, a tone signaled the appearance of a new number of probe dots, appearing for 500ms in new locations within each disc; observers judged which disc contained more dots. On critical trials, probe dots were equal in number at the time of the tone. If number is a primary visual feature that can be bound to an object, then subjects should show an adaptation effect (thereby judging the disc previously containing the smaller adapter number as greater). If number adaptation effects are really spatiotopic attenuation to unchanging information, then subjects should not show an adaptation effect. Subjects showed the predicted adaptation effect. This result suggests that number adaptation persists despite drastic changes to spatiotopic and retinotopic position, contra an explanation in which adaptation results from filtering out information that remains unchanged between adapter and probe displays. 
Our findings re-open the case for genuine number adaptation and numerosity as a primary visual feature more broadly.

Poster 63.307: Wednesday, May 22, 8:30am - 12:30pm ET, Banyan Breezeway, Perceptual Organization: Segmentation, shapes, objects

[VSS Link]