Stefano Melacci
University of Siena - Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche
Dario Zanca
Department AIBE FAU Erlangen-Nuremberg, Germany
Course Type
Type B
Calendar
May 19-23, 9:00-13:00
Room
Aula 103
Program
Brief abstract
The course Bridging Human and Machine Vision explores the interdisciplinary convergence of human visual perception and machine vision technologies. The field integrates insights from technical disciplines (computer vision and artificial intelligence) and empirical disciplines (cognitive science, neuroscience, and psychology). The course is designed to give students a comprehensive understanding of how human visual processing can inform and enhance the development of machine vision algorithms, and vice versa.
Syllabus
Introduction to the Human Visual System
◦ Overview of the human visual system (HVS): The eye, retina, and visual pathways.
◦ Basic concepts in visual perception
◦ Introduction to Human Visual Attention
    ◦ Visual Attention and its Role: Selective attention and its neural underpinnings.
    ◦ Types of Visual Attention: Bottom-up vs. top-down attention mechanisms.
    ◦ Human attention modeling: Feature Integration Theory.
Computational Models of Human Attention
◦ Saliency prediction
    ◦ Classical Models of Attention: Itti’s model, GBVS.
    ◦ Supervised approaches: Learning human attention from gaze data.
◦ Scanpath prediction
Introduction to Artificial Vision Systems
◦ Fundamentals of Computer Vision: Image processing, feature extraction, and object recognition.
◦ Convolutional Neural Networks (CNNs): Architecture, training, and applications.
◦ Vision Transformers (ViT): Architecture, training, and applications.
Human-Inspired Vision Models
◦ CNNs vs. visual processing in the human brain
◦ Biologically-Inspired Architectures
    ◦ Human-attention-enhanced models
    ◦ Foveated models
    ◦ V1-like models
Robustness in Vision Models: Real-world and worst-case (i.e., adversarial) distribution shifts.
◦ Evaluating Model Robustness: Metrics and benchmarks for assessing robustness in vision systems.
◦ Metamers: The concept of deep learning metamers.
Are Deep Learning Models Good Models of Human Vision?
◦ Behavioral alignment: Analyzing deep learning systems as decision makers.
◦ Representational alignment: Correlating deep learning activations with brain data.
◦ Future Directions: Opportunities for improving models and bridging gaps between human and artificial vision.