List of articles for the month starting 2016-02-01

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

1 Introduction This paper addresses object localization and image captioning jointly by proposing a fully convolutional localization network (FCLN). The architecture is composed of a ConvNet, a novel dense localization layer, and an RNN l…
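Below is a minimal numpy sketch of the three-stage pipeline named in this excerpt (ConvNet features, a dense localization layer that proposes regions, an RNN that captions each region). All sizes, the one-anchor-per-cell scoring, and the crude crop-and-pool in place of the paper's differentiable bilinear sampling are illustrative assumptions, not the paper's exact layers.

```python
# Sketch of the FCLN pipeline: features -> region proposals -> per-region RNN.
import numpy as np

rng = np.random.default_rng(0)

# 1) Pretend ConvNet output: C x H x W feature map for one image.
C, H, W = 8, 16, 16
features = rng.standard_normal((C, H, W))

# 2) Dense localization layer stand-in: score one anchor per spatial cell
#    (a 1x1-conv-style dot product) and keep the top-k cells as proposals.
w_score = rng.standard_normal(C)
scores = np.einsum("c,chw->hw", w_score, features)
k = 3
top = np.dstack(np.unravel_index(np.argsort(scores, axis=None)[-k:], scores.shape))[0]

# 3) Crop a fixed-size window around each proposal and mean-pool it
#    (a crude stand-in for the paper's bilinear sampling).
def region_vector(y, x, size=3):
    y0, y1 = max(0, y - size // 2), min(H, y + size // 2 + 1)
    x0, x1 = max(0, x - size // 2), min(W, x + size // 2 + 1)
    return features[:, y0:y1, x0:x1].mean(axis=(1, 2))

# 4) Toy RNN decoder: one recurrent step per word, conditioned on the region.
V, D = 10, C                                   # vocabulary size, hidden size
W_h, W_out = rng.standard_normal((D, D)), rng.standard_normal((V, D))
for y, x in top:
    h = np.tanh(region_vector(y, x))           # init hidden state from region
    for _ in range(4):                         # emit 4 toy "word" ids
        h = np.tanh(W_h @ h)
        print(int(np.argmax(W_out @ h)), end=" ")
    print(f"<- caption token ids for region at cell ({y}, {x})")
```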

Spatial Transformer Networks

1 Introduction A desirable property of a system that is able to reason about images is to disentangle object pose and part deformation from texture and shape. In order to overcome the drawback that CNNs lack the ability to be spatially in…
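A compact numpy sketch of the spatial transformer idea: a localization net predicts an affine transform, a grid generator maps each output pixel back to input coordinates, and bilinear sampling makes the warp differentiable. Here the "localization net" is replaced by a fixed rotation purely for illustration.

```python
import numpy as np

def spatial_transform(img, theta):
    """Warp `img` (H x W) with the 2x3 affine matrix `theta`."""
    H, W = img.shape
    # Grid generator: normalized output coordinates in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])    # 3 x HW
    sx, sy = theta @ grid                                        # source coords
    # Map back to pixel indices and sample bilinearly.
    fx, fy = (sx + 1) * (W - 1) / 2, (sy + 1) * (H - 1) / 2
    fx, fy = np.clip(fx, 0, W - 1), np.clip(fy, 0, H - 1)
    x0, y0 = np.floor(fx).astype(int), np.floor(fy).astype(int)
    x0, y0 = np.clip(x0, 0, W - 2), np.clip(y0, 0, H - 2)
    wx, wy = fx - x0, fy - y0
    out = (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x0 + 1] * wx * (1 - wy)
           + img[y0 + 1, x0] * (1 - wx) * wy + img[y0 + 1, x0 + 1] * wx * wy)
    return out.reshape(H, W)

# Rotate a toy image by 30 degrees via the inverse mapping.
a = np.deg2rad(30)
theta = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0]])
img = np.zeros((32, 32)); img[12:20, 8:24] = 1.0
print(spatial_transform(img, theta).shape)
```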

Generating Images from Captions with Attention

1 Introduction This work proposes a sequential deep learning model to generate images from captions: the model draws a patch on a canvas at each time step and attends to the relevant word at each time. 2 Related work Deep discriminative model vs …
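A toy numpy sketch of the "attend to the relevant word at each drawing step" idea: at every step a soft attention weight is computed over the caption's word vectors, and their weighted sum conditions the generator's next update. The word embeddings, state update, and sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_words, d = 5, 6, 8                      # drawing steps, caption length, dim
words = rng.standard_normal((n_words, d))    # caption word embeddings
h = np.zeros(d)                              # generator state

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for t in range(T):
    alpha = softmax(words @ h)     # one weight per caption word
    context = alpha @ words        # weighted "relevant word" summary
    h = np.tanh(h + context)       # toy state update; a real model uses an RNN
    print(f"step {t}: most attended word = {int(alpha.argmax())}")
```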

DRAW: A Recurrent Neural Network For Image Generation

1 Introduction We draw pictures not all at once, but in a sequential, iterative fashion. This work proposes an architecture to create a scene over a time series and refine the sketches successively. The core of DRAW is a pair of recurrent neura…
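A minimal numpy sketch of DRAW's iterative loop as summarized here: an encoder RNN reads the error between the target and the canvas, a latent code is sampled, and a decoder RNN writes an additive update onto a persistent canvas. Attention windows are omitted and all matrices are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Z, T = 16, 4, 6                       # image size, latent size, time steps
x = rng.random(D)                        # target "image" (flattened)
canvas = np.zeros(D)
h_enc, h_dec = np.zeros(D), np.zeros(D)
W_enc, W_mu, W_dec, W_write = (rng.standard_normal(s) * 0.1 for s in
                               [(D, 2 * D), (Z, D), (D, D + Z), (D, D)])

for t in range(T):
    error = x - canvas                       # what is still missing
    h_enc = np.tanh(W_enc @ np.concatenate([error, h_dec]))
    mu = W_mu @ h_enc
    z = mu + rng.standard_normal(Z)          # reparameterized latent sample
    h_dec = np.tanh(W_dec @ np.concatenate([h_dec, z]))
    canvas = canvas + W_write @ h_dec        # successive refinement of the canvas
    print(f"step {t}: reconstruction error = {np.abs(x - canvas).mean():.3f}")
```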

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

(Study checklist: have I mastered the various machine-learning models with latent variables? Neural networks, RBMs, the various probabilistic graphical models; EM, variational inference, mean field, and the other latent-variable solution methods; the convex optimization behind those algorithms; deriving the formulas.) 1 Introduction In the past, to solve the image caption task, one always extracts features from im…
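A small numpy sketch of the soft visual attention this paper is known for: at each output word, a weight is computed for every spatial CNN feature vector, and their weighted sum (the "glimpse") conditions the next word. The scorer and decoder here are toy stand-ins with assumed sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, V, T = 14 * 14, 8, 12, 4    # spatial cells, feature dim, vocab, words
a = rng.standard_normal((L, D))   # annotation vectors from a conv layer
h = np.zeros(D)                   # decoder state
W_att, W_h, W_out = (rng.standard_normal(s) for s in [(D, D), (D, 2 * D), (V, D)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for t in range(T):
    alpha = softmax(a @ (W_att @ h))     # attention over the 14x14 grid
    context = alpha @ a                  # expected image "glimpse"
    h = np.tanh(W_h @ np.concatenate([h, context]))
    print(f"word {t}: token id {int(np.argmax(W_out @ h))}, "
          f"peak attention at cell {int(alpha.argmax())}")
```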

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

1 Introduction This work uses the encoder-decoder framework to solve the problem of image-caption generation. For the encoder, the model learns a joint sentence-image embedding where sentence embeddings are encoded using an LSTM, and ima…
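A numpy sketch of the joint sentence-image embedding objective: both modalities are mapped into one space and a bidirectional pairwise ranking loss pushes matching pairs closer than mismatched ones. The encoders are replaced by random projections; only the loss structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, d = 4, 10, 6, 5
images = rng.standard_normal((n, d_img))     # stand-in CNN features
sents = rng.standard_normal((n, d_txt))      # stand-in LSTM final states
W_i, W_t = rng.standard_normal((d_img, d)), rng.standard_normal((d_txt, d))

def embed(x, W):
    e = x @ W
    return e / np.linalg.norm(e, axis=1, keepdims=True)   # unit length

x, s = embed(images, W_i), embed(sents, W_t)
scores = x @ s.T               # cosine similarities; matching pairs on diagonal
margin = 0.1
pos = np.diag(scores)
# Hinge on every mismatched pair in both directions:
# contrastive sentences per image, and contrastive images per sentence.
loss = (np.maximum(0, margin - pos[:, None] + scores).sum()
        + np.maximum(0, margin - pos[None, :] + scores).sum()
        - 2 * n * margin)      # remove the diagonal terms, which equal margin
print(f"ranking loss: {loss:.3f}")
```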

Deep Visual-Semantic Alignments for Generating Image Descriptions

1 Introduction This paper uses a CNN to learn image region embeddings and a bidirectional RNN to learn sentence embeddings, associates them in a common multimodal space, and uses a structured objective to align the two modalities. Then it proposes a mult…
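A toy numpy version of the alignment score behind that structured objective: every region embedding is compared with every word embedding in the shared space, and each word is credited with its best-matching region, so the image-sentence score is a sum of per-word maxima. The embeddings here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
R, Nw, d = 5, 7, 8                      # regions, words, embedding size
regions = rng.standard_normal((R, d))   # CNN region embeddings
words = rng.standard_normal((Nw, d))    # BRNN word embeddings

sim = regions @ words.T                        # region-word similarities, R x Nw
image_sentence_score = sim.max(axis=0).sum()   # best region per word, summed
print(f"alignment score: {image_sentence_score:.3f}")
print("best region per word:", sim.argmax(axis=0))
```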

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

1 Introduction In the past, image caption models could only be trained on paired image-sentence corpora. To address this limitation, the authors propose the Deep Compositional Captioner, which can generate descriptions about objects that don'…

Generation and Comprehension of Unambiguous Object Descriptions

1 Introduction The normal image caption task suffers from the difficulty of evaluation: there is no convincing evaluation metric that says one generated caption is strictly better than another. So this work does not generate description fr…