Movies and TV are a rich source of diverse and complex video of people, objects, actions, and locales "in the wild". Harvesting automatically labeled sequences of actions from video would enable the creation of large-scale and highly varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels, and shots are reordered into a sequence of long continuous tracks or threads that allow more accurate tracking of people, actions, and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a unified generative model, and we present a novel hierarchical dynamic programming algorithm that handles alignment and jump-limited reorderings in linear time. We present quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and retrieval of common actions in several episodes of popular TV series.
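To give a concrete sense of the text-to-text alignment step, the sketch below matches screenplay dialogue lines to closed-caption lines with a simple dynamic program. This is a minimal illustration under stated assumptions, not the paper's actual model: it uses a word-overlap (Jaccard) similarity and a monotone Needleman-Wunsch-style recurrence, whereas the paper's hierarchical dynamic program additionally handles jump-limited reorderings and the full scene/shot hierarchy. All function names and the `gap` penalty are illustrative choices.

```python
# Sketch: dynamic-programming alignment of screenplay lines to
# closed-caption lines. Assumptions (not from the paper): Jaccard
# word-overlap similarity and a fixed gap penalty.

def similarity(a, b):
    """Jaccard word overlap between two lines of text."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def align(script_lines, caption_lines, gap=-0.1):
    """Needleman-Wunsch-style alignment; returns matched (i, j) pairs."""
    n, m = len(script_lines), len(caption_lines)
    # score[i][j] = best score aligning the first i script lines
    # with the first j caption lines
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                score[i][j] = gap * (i + j)
                continue
            score[i][j] = max(
                score[i - 1][j - 1]
                + similarity(script_lines[i - 1], caption_lines[j - 1]),
                score[i - 1][j] + gap,   # skip a script line
                score[i][j - 1] + gap,   # skip a caption line
            )
    # trace back through the table to recover the matched pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = similarity(script_lines[i - 1], caption_lines[j - 1])
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))
```

Each step of the recurrence either matches a script line with a caption line or skips one of the two, so the table is filled in O(nm) time; the paper's linear-time guarantee comes from its more specialized hierarchical formulation, which this sketch does not reproduce.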