Williams and Titsias (2004) have shown how to carry out unsupervised greedy learning of multiple objects from images (GLOMO), building on the work of Jojic and Frey (2001). In this paper we show that the earlier GLOMO approach can be greatly sped up for video sequence data by carrying out approximate tracking of the multiple objects in the scene. Our method is applied to raw image sequence data and extracts the objects one at a time. First, the moving background is learned, and moving objects are found at later stages. The algorithm recursively updates an appearance model of the tracked object so that possible occlusion of the object is taken into account, which makes tracking stable. We apply this method to learn multiple objects in image sequences as well as articulated parts of the human body.
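The recursive appearance update mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's probabilistic model: it swaps in a simple thresholded running average, and the function name `update_appearance`, the parameters `rate` and `tol`, and the toy data are all hypothetical. It only shows the general idea that pixels judged occluded are left out of the update so that the template does not drift.

```python
def update_appearance(template, patch, rate=0.2, tol=0.3):
    """One recursive update of a tracked object's appearance template.

    Assumption (not from the paper): a pixel counts as occluded when it
    differs from the current template by more than `tol`. Occluded pixels
    keep their old template value; visible pixels move toward the new
    observation by a fraction `rate`.
    """
    new_template, visible = [], []
    for t_row, p_row in zip(template, patch):
        # Visibility mask for this row of the grayscale patch.
        vis_row = [abs(p - t) <= tol for t, p in zip(t_row, p_row)]
        # Running-average update on visible pixels only.
        new_row = [(1.0 - rate) * t + rate * p if v else t
                   for t, p, v in zip(t_row, p_row, vis_row)]
        visible.append(vis_row)
        new_template.append(new_row)
    return new_template, visible


# Toy 2x2 grayscale example: the top row is covered by a dark occluder.
template = [[0.5, 0.5], [0.5, 0.5]]
patch = [[0.0, 0.0], [0.6, 0.6]]
new_template, visible = update_appearance(template, patch)
```

In this toy case the occluded top row keeps its previous template values, while the visible bottom row shifts slightly toward the new observation.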
|Title of host publication||Computer Vision and Pattern Recognition Workshop, 2004. CVPRW'04. Conference on|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages||1|
|Publication status||Published - 2004|