Pauwels, Karl

Ph.D. Thesis, Katholieke Universiteit Leuven, 2008

Our visual system is bombarded with sensory information. Since its computational resources are limited, it cannot perform all of its functions at all locations in the visual field simultaneously. A selection is made by **visual attention**, which acts as a filter that accentuates relevant and suppresses irrelevant information. In computer vision applications, especially those that employ visual motion, computational resources must likewise often be concentrated on specific regions. This thesis describes two studies that approach visual attention from different perspectives: an explanatory one, formulating and validating hypotheses about the neuronal mechanisms of visual attention, and an application-oriented one, using principles derived from attention research to facilitate the analysis of real-world video sequences.

The first study introduces a model for modulatory attention effects at the earliest stages of visual processing. Contrary to the classical view that visual attention enhances metabolic activity and occurs relatively late in the visual hierarchy, neurophysiological experiments by Vanduffel and co-workers have shown that, at early stages (in particular in the lateral geniculate nucleus, LGN, and primary visual cortex, V1), attention manifests itself as a ring of suppressed activity surrounding the attended stimulus representation. Current models cannot explain this effect. We formulate a hypothesis on the underlying mechanism and implement it as a novel computational model that reproduces, and thus provides an explanation for, this phenomenon. Our model predicts that the main mechanism of the attentional modulation is the diffusion of stimulus-driven relay cell activity to the reticular thalamic nucleus (RTN), both directly and through V1, followed by the RTN's inhibition of the LGN regions that surround the stimulus representation.

Whereas in the first study the attention control signal results from the task (orientation discrimination), the second study introduces a complex visual feature that automatically draws attention: **independent motion**, i.e., motion with respect to the stationary environment. Independent motion is relatively easy to detect when the observer is static, but when the observer is moving, both moving objects and the static environment generate visual motion on the retina (optic flow), and multiple cues must be combined to discriminate between the two origins. Humans perform this detection with ease and typically allocate additional resources (attention) to the regions where independent motion is present.

We introduce a set of novel biologically-inspired computer vision techniques that are robust to the nuisance factors typical of real-world video sequences recorded by moving observers. Combined, these techniques yield a saliency map for independent motion: a map that indicates the likelihood of independent motion at each location. First, two novel methods are introduced that enable the extraction of optic flow from unstable sequences. Unlike existing techniques, both methods can deal with the complexities of real-world scenes, where the distance to the scene is small, the range of depths within the scene is large, and moving objects are present. Next, an algorithm is introduced for the computation of self-motion (egomotion) from optic flow. Contrary to traditional methods, the proposed algorithm is insensitive to local minima (which correspond to suboptimal solutions), and this robustness is achieved without sacrificing accuracy.
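The thesis develops its own egomotion algorithm; as a rough illustration of how local minima can be sidestepped in this problem, the sketch below follows a classical idea (in the spirit of Heeger and Jepson's subspace method): exhaustively sample candidate translation directions, solve a linear least-squares problem for the rotation at each candidate after eliminating the unknown depths, and keep the global minimum. The function names, the sampling scheme, and the flow parameterization are illustrative assumptions, not the thesis' method.

```python
import numpy as np

def egomotion_residual(t, pts, flow, f):
    """For a fixed candidate translation direction t, eliminate the unknown
    inverse depths, solve linearly for the rotation omega, and return the
    least-squares residual of the instantaneous motion model."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(pts, flow):
        # Translational flow basis (scaled by the unknown inverse depth 1/Z)
        # and rotational flow basis of the instantaneous motion model.
        A = np.array([[-f, 0.0, x],
                      [0.0, -f, y]])
        B = np.array([[x * y / f, -(f + x * x / f), y],
                      [f + y * y / f, -x * y / f, -x]])
        a = A @ t
        a_perp = np.array([-a[1], a[0]])  # projecting onto a_perp removes 1/Z
        rows.append(a_perp @ B)
        rhs.append(a_perp @ np.array([u, v]))
    rows, rhs = np.asarray(rows), np.asarray(rhs)
    omega, *_ = np.linalg.lstsq(rows, rhs, rcond=None)
    err = rows @ omega - rhs
    return float(err @ err), omega

def estimate_egomotion(pts, flow, f, n_candidates=2000, seed=0):
    """Sample candidate translation directions densely on the unit hemisphere
    and keep the global minimum, so no gradient descent can get trapped in a
    local minimum of the residual surface."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_candidates, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    dirs[dirs[:, 2] < 0.0] *= -1.0  # translation direction has a sign ambiguity
    best = min((egomotion_residual(t, pts, flow, f) + (t,) for t in dirs),
               key=lambda r: r[0])
    residual, omega, t_hat = best
    return t_hat, omega
```

Dense sampling trades computation for robustness: the space of translation directions is only a 2-sphere, so exhaustive search over it is cheap compared to descending in the full parameter space, and a local refinement can be run around the best candidate if more accuracy is needed.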
Finally, a novel binocular disparity algorithm is introduced and combined with the other visual cues into a saliency map for independent motion. This map is built around an independent motion measure that provides better discriminability than existing techniques.
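As a hedged sketch of how such a map could be assembled (the thesis' actual measure is more elaborate), the snippet below assumes the egomotion (t, omega, with metric scale resolved by stereo), the focal length f, and the stereo baseline are known: depth recovered from disparity is used to predict the flow that self-motion alone would generate, and the deviation of the measured flow from this prediction is mapped to a per-pixel saliency. The parameter names and the Gaussian soft likelihood are illustrative assumptions.

```python
import numpy as np

def independent_motion_saliency(flow, disparity, t, omega, f, baseline,
                                sigma=0.5):
    """Predict the optic flow that self-motion alone would generate, given
    depth from binocular disparity, and map the deviation of the measured
    flow from this prediction to a per-pixel saliency in [0, 1]."""
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= w / 2.0  # image coordinates centred on the principal point,
    ys -= h / 2.0  # here assumed to lie at the image centre
    inv_depth = disparity / (f * baseline)  # stereo geometry: 1/Z = d / (f b)
    # Instantaneous motion model, per pixel: flow = (1/Z) * A t + B omega,
    # written out componentwise (same basis as in the egomotion sketch).
    u_pred = inv_depth * (-f * t[0] + xs * t[2]) \
        + (xs * ys / f) * omega[0] - (f + xs**2 / f) * omega[1] + ys * omega[2]
    v_pred = inv_depth * (-f * t[1] + ys * t[2]) \
        + (f + ys**2 / f) * omega[0] - (xs * ys / f) * omega[1] - xs * omega[2]
    residual = np.hypot(flow[..., 0] - u_pred, flow[..., 1] - v_pred)
    return 1.0 - np.exp(-residual**2 / (2.0 * sigma**2))  # soft likelihood
```

Pixels whose measured flow matches the self-motion prediction score near zero, while pixels on independently moving objects score near one; the resulting map can be thresholded into a detection mask or used directly to steer attention.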