Perception & Mind Lab Presentations @ V-VSS 2021



Saturday, May 22

 
Human detection of DeepFakes: A role for holistic face processing
Matt Groh, Ziv Epstein, Rosalind Picard, Chaz Firestone
groh@mit.edu

Two of the most significant recent advances in artificial intelligence are (1) the ability of machines to outperform humans on many perceptual tasks, and (2) the ability of machines to synthesize highly realistic images of people, objects, and scenes. Nevertheless, here we report a surprising human advantage at the intersection of these two domains: The ability to detect Deepfakes. Deepfakes are machine-manipulated media in which one person’s face is swapped with another to make someone falsely appear to do or say something they did not — and it is of major theoretical and practical importance to develop methods that can tell Deepfakes from authentic media. Here, we pit the winning computer vision model from the Deepfake Detection Challenge (DFDC) against ordinary human participants in a massive online study enrolling 7,241 people. Participants saw authentic and manipulated videos, and were asked to either (a) select which of two videos is a Deepfake (Experiment 1) or (b) report how confident they are that a single video is a Deepfake (Experiment 2). In the two-alternative forced-choice design, the average completely untrained participant outperformed the very best computer vision model. In the single-video design, the average participant outperformed the model on a sample of politically salient videos but underperformed the model on a sample of DFDC holdout videos (though approximately one fourth of participants outperformed the model on the DFDC sample). Follow-up experiments revealed that holistic face processing partly explains this human edge: When the actors’ faces were inverted, misaligned, or occluded, participants’ ability to identify Deepfakes was significantly impaired (whereas the model was impaired only by inverted videos, not by misaligned or occluded ones). These results reveal a human advantage in identifying Deepfakes today and suggest that harnessing specialized visual processing could be a promising “defense” against machine-manipulated media.

Talk: Saturday, May 22, 11:45am–12:00pm EDT, Talk Room 2, Face Perception: Models and Mechanisms

[VSS Link] [Preprint!]

Sunday, May 23

 
Tangled physics: Knots as a challenge for physical scene understanding
Sholei Croom, Chaz Firestone
scroom1@jhu.edu

A resurgence of interest in intuitive physics has revealed a remarkable capacity in humans to make common-sense predictions about the unfolding of physical scenes. For example, recent work has shown that observers can correctly judge properties such as stability, weight distribution, gravity and collision physics, especially in rich naturalistic images. These results have suggested that physical scene understanding recruits a general-purpose “physics engine” that reliably simulates how scenes will unfold. Here, we complicate this picture by introducing knots to the study of intuitive physics. Knots are naturalistic stimuli that appear across cultures and time periods, and are widely used in both mundane scenarios (e.g., closing a bag or tying one’s shoes) and more technical applications (e.g., securing a boat or even supporting a rock-climber). Yet, here we show that even basic judgments about knots strain human physical reasoning. Observers viewed photographs of simple “bends” (i.e., knots joining two lengths of string) that share strong visual similarity but greatly differ in structural integrity. For example, observers saw not only “reef” knots (a common knot used for millennia), but also “thief” knots (which differ only in the position of a single strand but are significantly less secure than “reefs”), along with “granny” and “grief” knots (which share a similar relationship). In a two-alternative forced-choice task, observers judged each knot’s stability relative to every other knot. Strikingly, observers reliably ranked weaker knots as strong and stronger knots as weak, both across knot-families (e.g., incorrectly judging granny knots as stronger than reef knots) and within a given family (e.g., failing to judge “thiefs” as weaker than “reefs”). These failures challenge a general-purpose “physics engine” hypothesis, and perhaps suggest that knots and other examples of soft-body physics recruit different cognitive processes than rigid-body physics.

Poster: Sunday, May 23, 4:00–6:00 pm EDT, Manatee, Scene Perception: Cognitive Processes

[VSS Link]

 
Attention to absences
Jorge Morales, Chaz Firestone
jorge.morales@jhu.edu
Bonus video! (Not available on VSS website)

You return to your locked-up bicycle and immediately notice that the front wheel is missing. (Oh no! It must have been stolen.) As you stare at your incomplete frame, you have a visceral sense of the wheel's *absence*; there isn't just empty space where the wheel should be — there is a missing wheel. What is the nature of this experience? Whereas we typically think of perception and attention as being directed toward (present) objects, here we explore attention to missing or absent parts. Six experiments show that regions of space with missing parts ("absent space") are processed differently than more ordinary empty regions ("empty space"). Subjects saw line drawings of objects missing a part (e.g., a bicycle missing a wheel, a butterfly missing a wing, a jacket missing a sleeve), and then judged whether a probe appeared on the object or not. Intriguingly, when non-object probes appeared in absent space (e.g., where the front wheel should have been), subjects classified them faster than when probes appeared in empty space (e.g., next to the bicycle). We found this effect with spatially adjacent probes (E1), probes distributed around the stimulus (E2), and when subjects had to discriminate the probe’s color instead of its position (E3 & E4), suggesting that "absent" space attracts attention automatically and efficiently. In contrast, no reaction-time difference was found with scrambled images (destroying the appearance of absence), even though the images' low-level features and the probes' relative positions were preserved (E5). Finally, the absent-part attentional benefit was lost when stimuli were placed closer to the border of a bounding box to create the impression that the absent part couldn’t "fit" (E6). We conclude that, despite not being "objects" at all, absences are prioritized over otherwise identical empty spaces by mechanisms of perception and attention.

Poster: Sunday, May 23, 8:00am–10:00am EDT, Osprey, Visual Memory: Imagery, Drawing, Scenes

[VSS Link]

 
What we’ve been missing about what we’ve been missing: Above-chance sensitivity to inattentional blindness stimuli
Makaela Nartker, Chaz Firestone, Howard Egeth, Ian Phillips
makaela@jhu.edu

Inattentional blindness—the failure to report clearly visible stimuli when attention is otherwise engaged—is among the most striking and well-known phenomena in psychology. But does inattention really render subjects “blind,” or do they see more than their reports suggest? Standardly, IB studies simply ask subjects whether they noticed anything unusual on the critical trial, treating anyone who says “no” as having failed to perceive the stimulus. Yet this yes/no measure is susceptible to bias. Subjects might respond “no” because they were under-confident about whether they saw anything (or whether what they saw counted as unusual), because they doubted that they could identify it, etc. Here, we address this problem by modifying the classic IB paradigm to allow derivation of signal-detection measures of sensitivity and bias. Subjects’ primary task was to report which arm of a briefly presented cross was longer. In Experiments 1 and 2, the last trial included an unexpected stimulus. However, after the traditional yes/no question, subjects also answered a two-alternative forced-choice (2AFC) question, e.g., “Was the stimulus on the left or right?”, or a forced-response question, e.g., “Was the stimulus red or blue?”. We found that subjects who reported not noticing the IB stimulus could nevertheless discriminate its features (e.g., color, location) well above chance. In Experiment 3, only two-thirds of subjects were shown an unexpected stimulus; responses from the remaining subjects provided a false-alarm rate with which to derive detection-theoretic statistics. Subjects also provided confidence ratings for their reports, allowing us to construct confidence-based ROC curves. As predicted, yes/no reports were conservatively biased (i.e., subjects tended to say “no”). Sensitivity did not differ significantly across yes/no and 2AFC tasks, suggesting that standard estimates of IB may be inflated by such biases. These results are consistent with a rarely discussed account of IB: Inattention does not abolish awareness; rather, it degrades it.
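For intuition about the detection-theoretic measures mentioned above, here is a minimal sketch of the standard textbook formulas (in Python, with placeholder rates rather than the study's data, and not necessarily the authors' actual analysis): sensitivity (d′) and criterion (c) from yes/no hit and false-alarm rates, and the d′ implied by 2AFC accuracy under the equal-variance model.

```python
# Minimal sketch of standard signal-detection quantities.
# Rates below are illustrative placeholders, NOT data from the study.
from scipy.stats import norm

def dprime_yes_no(hit_rate, fa_rate):
    """Sensitivity (d') and criterion (c) from yes/no hit and false-alarm rates."""
    z_h, z_f = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_h - z_f, -0.5 * (z_h + z_f)

def dprime_2afc(prop_correct):
    """Sensitivity implied by 2AFC accuracy (equal-variance Gaussian model)."""
    return (2 ** 0.5) * norm.ppf(prop_correct)

if __name__ == "__main__":
    # Hypothetical pattern: few "yes" responses (conservative bias) yet nonzero sensitivity.
    d_yn, c = dprime_yes_no(hit_rate=0.40, fa_rate=0.10)
    d_2afc = dprime_2afc(prop_correct=0.70)
    print(f"yes/no: d' = {d_yn:.2f}, criterion c = {c:.2f} (c > 0 means a bias toward 'no')")
    print(f"2AFC:   d' = {d_2afc:.2f}")
```

In this framework, a positive criterion captures the conservative bias described above: comparable d′ across the yes/no and 2AFC measures, despite many “no” responses, is the pattern the abstract reports.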

Poster: Sunday, May 23, 8:00am–10:00am EDT, Manatee, Attention: Inattention and Lapses

[VSS Link]

 
The evolution of complexity in visual memory
Zekun Sun, Subin Han, Chaz Firestone
zekun@jhu.edu

Memory rarely replicates exactly what we see; instead, it reconstructs past experiences with distortions and errors. In some cases, memories lose their clarity and detail as time passes; in other cases, however, memories “add” details that weren’t originally there. Though such biases are more commonly associated with naturalistic visual scenes (which may recruit higher-level knowledge or schemas), here we show how memory adds content to even the simplest of stimuli: ordinary geometric shapes. We generated a library of smooth-edged shapes, and manipulated their complexity by gradually simplifying their skeletal structure — essentially altering the “amount of information” in the shapes. On each trial of Experiment 1, subjects saw a novel shape; after a brief delay, a version of the same shape appeared at a different level of complexity, and subjects’ task was to “adjust” the new shape to match the one they had just seen, using a slider that altered the adjustable shape’s complexity. Surprisingly, subjects consistently misremembered the shapes as more complex than they really were (i.e., the shapes they produced had increasingly informationally dense skeletons). Experiment 2 showed that this finding emerges even across wider ranges of complexity, and Experiment 3 extended this phenomenon using the method of serial reproduction. In a “telephone game”, one observer’s recalled shape became the presented shape of the next observer, and so on; these reproduction chains amplified our observed complexity biases, such that 300 observers’ chains converged onto shapes much more complex than had initially been presented. Finally, Experiment 4 ruled out certain forms of strategic responding: the patterns remained regardless of subjects’ guesses about the effect’s expected direction. These findings reveal a new “complexity bias”, whereby even the most basic units of visual processing are remembered as being more information-dense than they really are.

Talk: Sunday, May 23, 10:30am–10:45am EDT, Talk Room 1, Visual Memory: Working and Long-Term

[VSS Link]

 
phiVis: Philosophy of Vision Science
A VSS satellite event, organized by Kevin Lande and Chaz Firestone
www.phivis.org

The past decade has seen a resurgence of interest in the intersection between vision science and the philosophy of perception. But opportunities for conversation between vision scientists and philosophers are still hard to come by. The phiVis workshop is a forum for promoting and expanding this interdisciplinary dialogue. Philosophers of perception can capitalize on the experimental knowledge of working vision scientists, while vision scientists can take advantage of the opportunity to connect their research to long-standing philosophical questions. Short talks by philosophers of perception that engage with the latest research in vision science will be followed by discussion with a slate of vision scientists, on topics such as probabilistic representation in perception, perceptual constancy, amodal completion, multisensory perception, visual adaptation, and much more.

Event: Sunday, May 23, 3:30pm–5:30pm EDT. Register for Zoom info.

[VSS Link]

Tuesday, May 25

 
Through the looking-glass: Visual sensitivity to chirality
Tal Boger, Ziv Epstein, Matt Groh, Chaz Firestone
tal.boger@yale.edu

If you woke up in Wonderland, could you tell? Wonderland, of course, is the mirror-reversed world discovered by Alice in Lewis Carroll's 1871 novel, "Through the Looking-Glass" — and so our question here is whether naive observers are sensitive to patterns that distinguish images from their mirror-reversals. Many patterns in the natural world are "chiral", such that their mirror images are not superimposable. In a series of large online studies (collecting nearly 100,000 judgments), participants were shown a flipped version and an original version of a natural image, and simply had to guess which was which with no other information. (No legible writing was present in the images.) Results revealed a striking sensitivity to chirality; participants were able to identify which image was flipped and which was normal at rates significantly above chance, even without any obviously distinguishing features. In Experiment 1, we showed participants images from a large database of social media photos. We observed above-chance performance not only in average accuracy across participants, but also at the image level: Over 80% of the 500 different images showed above-chance performance. Experiment 2 revealed that this chiral sensitivity pervaded the space of natural images and was not specific to any one image class: When we showed participants images from published databases of objects, natural scenes, artificial scenes, and faces, we again observed above-chance performance. Taken together, our results show that humans can not only identify visual chirality but also generalize it across different types of images. Chirality plays a role in a wide variety of natural processes, including the growth of seashells, the organization of chemical structures, and even the handedness of bimanual species. Our work here suggests that chirality arises not only in the world around us but also in human visual processing.
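As a concrete illustration of image-level above-chance performance, here is a small sketch (with invented counts, not the study's data; a per-image binomial test is simply one reasonable way to formalize the claim, not necessarily the authors' analysis) that compares each image's accuracy to the 50% guessing rate:

```python
# Illustrative per-image test against chance; counts are invented, not the study's data.
from scipy.stats import binomtest

# Hypothetical (n_correct, n_judgments) tallies for a few images
image_counts = {"img_001": (410, 600), "img_002": (355, 600), "img_003": (298, 600)}

for name, (k, n) in image_counts.items():
    # One-sided test: is accuracy for this image reliably above the 50% guessing rate?
    result = binomtest(k, n, p=0.5, alternative="greater")
    print(f"{name}: accuracy = {k / n:.2f}, p = {result.pvalue:.4f}")
```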

Poster: Tuesday, May 25, 3:00pm–5:00pm EDT, Manatee, Scene Perception: Models and Statistics

[VSS Link]

 
Melting ice with your mind: Dynamic representation of physical states
Alon Hafri, Tal Boger, Chaz Firestone
alon@jhu.edu

When a log burns, it transforms from a block of wood into a pile of ash. Such state-changes are among the most dramatic ways objects can change their appearance—going beyond mere changes of position or orientation. How does the mind represent changes of state? A foundational result in visual cognition is that memory extrapolates the positions of moving objects—a distortion called “representational momentum.” Here, we exploited this phenomenon to investigate mental representations in “state-space.” We created realistic animations of objects undergoing state-changes: ice melting, grapes shriveling, logs burning, etc. Participants observed interrupted segments of these animations, and then reported the last frame they saw using a slider. Four experiments showed representational momentum for state-changes, revealing dynamic representation of physical states. In Experiment 1, participants consistently reported a frame more “forward” in time (e.g., more melted) than they had actually seen. Experiment 2 showed that such representations are flexible, arising even for directions rarely encountered before: We included both forward- and backward-playing animations (e.g., both melting and “unmelting”) and observed representational momentum in both directions (e.g., for backward animations, participants remembered the ice as more “unmelted” than it really was). Experiment 3 controlled for low-level motion cues by showing that even a single static frame elicits representational momentum: Participants who saw one frame of each state-change misremembered it as further along its implied state-transformation. This also indicates that the mind privileges the physically natural forward direction. Finally, Experiment 4 ruled out biases that may have arisen from the response method (slider adjustments) by replicating our earlier results using a two-alternative forced-choice paradigm. Taken together, our findings reveal that mental representations of a dynamic world actively incorporate such dynamic changes, and in surprisingly broad ways: Whether in position or state, the mind extrapolates how objects change.

Talk: Tuesday, May 25, 2:15pm–2:30pm EDT, Talk Room 2, Visual Memory: Capacity, Models, Neural and Encoding

[VSS Link] [Preprint!]