
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

1 Introduction: This model works at a finer level, embedding fragments of images (objects) and fragments of sentences into a common space. The paper states that both global level of images and sentences and the finer level of their resp…
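To make the idea of a common fragment space concrete, here is a minimal sketch. It assumes linear projections for both modalities and inner-product similarity between embedded fragments. The names `W_img` and `W_sent`, the toy dimensions, and the max-then-average aggregation are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: project image and sentence fragments into one space
# and score every pair with an inner product. All values are toy data.

def project(matrix, vector):
    """Apply a linear map (matrix @ vector) using plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def fragment_scores(image_frags, sent_frags, W_img, W_sent):
    """Score each (image fragment, sentence fragment) pair in the common space."""
    img_emb = [project(W_img, f) for f in image_frags]
    sent_emb = [project(W_sent, f) for f in sent_frags]
    return [[dot(v, s) for s in sent_emb] for v in img_emb]

def image_sentence_score(scores):
    """Assumed aggregation: each sentence fragment keeps its best-matching
    image fragment, and the best scores are averaged."""
    n_sent = len(scores[0])
    best = [max(row[j] for row in scores) for j in range(n_sent)]
    return sum(best) / n_sent

# Two image fragments (3-d features) and two sentence fragments (2-d features),
# both mapped into a 2-d common space.
W_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_sent = [[1.0, 0.0], [0.0, 1.0]]
image_frags = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0]]
sent_frags = [[2.0, 0.0], [0.0, 3.0]]

scores = fragment_scores(image_frags, sent_frags, W_img, W_sent)
total = image_sentence_score(scores)
```

In this toy setup each sentence fragment aligns with exactly one image fragment, so the pairwise score matrix is diagonal. In practice the projections would be learned so that matching fragments score higher than mismatched ones.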

Multimodal Convolutional Neural Network for Matching Image and Sentence (Accepted by ICCV 2015)

This paper provides an end-to-end framework to match "image representation" and "word composition". More specifically, it consists of an image CNN encoding the image content, and one matching CNN learning the joint representation of image …
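The matching step can be sketched as a one-dimensional convolution over word vectors in which every window also sees the image vector. This is a simplified illustration under stated assumptions: a single filter, a window of two words, ReLU activation, and max pooling. The functions `matching_conv` and `match_score` and all numbers are hypothetical, not the paper's architecture.

```python
# Hypothetical sketch: one convolution filter that looks at a window of words
# together with the image vector, followed by max pooling.

def relu(x):
    return max(0.0, x)

def matching_conv(word_vecs, image_vec, kernel, bias, window=2):
    """Slide over the sentence; each window of word vectors is concatenated
    with the image vector before the filter is applied."""
    outputs = []
    for i in range(len(word_vecs) - window + 1):
        segment = [x for w in word_vecs[i:i + window] for x in w] + list(image_vec)
        outputs.append(relu(sum(k * x for k, x in zip(kernel, segment)) + bias))
    return outputs

def match_score(word_vecs, image_vec, kernel, bias):
    """Assumed pooling: keep the strongest response over all windows."""
    return max(matching_conv(word_vecs, image_vec, kernel, bias))

# Three words with 2-d embeddings and a 1-d image encoding (toy values).
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_vec = [0.5]
kernel = [1.0, 0.0, 0.0, 1.0, 2.0]  # 2 words * 2 dims + 1 image dim
bias = -1.0

responses = matching_conv(word_vecs, image_vec, kernel, bias)
score = match_score(word_vecs, image_vec, kernel, bias)
```

A real model would stack several such layers with many filters and feed the pooled features to a scoring network. The point here is only that the image content and the word composition interact inside the same convolution.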