The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition Task in the PASCAL VOC 2010 data set; our annotations cover only the trainval data. Descriptions are available for all 2,424 trainval images, while object annotations and Visual Dependency Representations are available for a subset of 341 images.

Data Citation

Elliott, Desmond; Keller, Frank. (2014). Visual and Linguistic Treebank, [dataset].
