Selective attention plays an important role in visual processing, both by reducing the scale of the problem and by actively gathering useful information. We propose a modified saliency map mechanism that uses a simple top-down, task-dependent cue to keep attention mainly on one object in the scene at a time for the first few shifts. This allows invariant object representations to be learned across attention shifts in a multiple-object scene. In this paper, we construct a neural network that can learn position- and viewpoint-invariant representations of objects across attention shifts in a temporal sequence.