Paper Reading -> Learning like a Child: Fast Novel Concept Learning from Sentence Descriptions of Images

1 Introduction      

This paper addresses the problem of generating descriptions from images. What sets it apart from other work is that the authors propose a method to deal with new concepts not seen in the training set. More specifically, once the model is trained on old image data, and new image data arrives describing visual concepts absent from the old training data, how do we deal with the new data? One way is to retrain the model from scratch and discard the old parameters; that is not a terrible method, but retraining the whole model wastes both time and computational resources. So the authors suggest adding some parameters corresponding to the new data and keeping the old parameters fixed while training the new ones.

   The authors present a Novel Visual Concept learning from Sentences (NVCS) framework. First, train the base model on the old data using m-RNN.

f:id:PDFangeltop1:20160111165058p:plain

  There is a transposed weight sharing (TWS) strategy that converts the 1024-dimensional multimodal vector into a 512-dimensional one. The motivation is, first, to reduce the number of parameters (from N*1024 to about N*512, where N may exceed 100,000), and second, to make it easy to add parameters corresponding to new data.

f:id:PDFangeltop1:20160111165613p:plain

f:id:PDFangeltop1:20160111170515p:plain  

f:id:PDFangeltop1:20160111170508p:plain Discard UM and replace it with UI and UD, where UI is a 1024*512 matrix.

f:id:PDFangeltop1:20160111170512p:plain
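A rough numpy sketch of the TWS idea. The dimensions (1024, 512) are from the post; the toy vocabulary size, the tanh nonlinearity, and the initialization are my assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

D_m, D_w = 1024, 512        # multimodal and word-embedding dims (from the post)
N = 10_000                  # toy vocabulary; the post says N may exceed 100,000

rng = np.random.default_rng(0)
U_w = rng.standard_normal((N, D_w)) * 0.01   # word-embedding matrix (shared, transposed, plays the role of UD)
U_i = rng.standard_normal((D_w, D_m)) * 0.01 # intermediate projection UI

m = rng.standard_normal(D_m)                 # multimodal vector at one time step

# Without TWS: a separate decoding matrix UM of shape (N, D_m).
# With TWS:    logits = U_w @ g(U_i @ m); U_w is shared with the embedding layer,
#              so only UI's D_w*D_m parameters are genuinely new.
hidden = np.tanh(U_i @ m)                    # nonlinearity g (tanh assumed)
logits = U_w @ hidden
probs = np.exp(logits - logits.max())        # softmax over the vocabulary
probs /= probs.sum()

# Parameter comparison for a realistic vocabulary of 100,000 words:
N_real = 100_000
print(N_real * D_m)              # UM alone:  102,400,000
print(N_real * D_w + D_w * D_m)  # UD + UI:    51,724,288
```

Since UD is tied to the word-embedding matrix, the only extra cost over the embedding layer itself is the small UI projection.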

For Novel Concept Learning, we first fix the originally learned weights corresponding to the old data.

More specifically, Ud = [Ud_old, Ud_new]; we keep Ud_old fixed and train only Ud_new.

And then we fix the baseline probability.

 

f:id:PDFangeltop1:20160111170518p:plain

We set bn' to the average value of the elements of bo' and keep bn' fixed.

So we fix Ud_old, bo', and bn', and train only Ud_new.
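The whole novel-concept update can be sketched as a toy numpy example. The sizes, learning rate, and the single cross-entropy SGD step are illustrative assumptions; only the structure (frozen old rows, new bias fixed to the mean of the old biases, gradient flowing into the new rows alone) follows the post:

```python
import numpy as np

rng = np.random.default_rng(1)
N_old, N_new, D = 1000, 20, 512   # base vocab, novel-concept words, hidden dim (toy sizes)

U_d_old = rng.standard_normal((N_old, D)) * 0.01  # learned on the base data, kept fixed
b_old = rng.standard_normal(N_old) * 0.01         # bo', kept fixed

# New rows for the novel words; their bias bn' is fixed to the mean of bo'
U_d_new = np.zeros((N_new, D))
b_new = np.full(N_new, b_old.mean())              # Baseline Probability Fixation

def word_probs(h):
    """Softmax over the concatenated decoding matrix Ud = [Ud_old; Ud_new]."""
    logits = np.vstack([U_d_old, U_d_new]) @ h + np.concatenate([b_old, b_new])
    p = np.exp(logits - logits.max())
    return p / p.sum()

# One SGD step on cross-entropy; only U_d_new receives a gradient
h = rng.standard_normal(D)        # hidden/multimodal vector at this time step
target = N_old + 3                # index of a novel word
p_before = word_probs(h)
grad_logits = p_before.copy()
grad_logits[target] -= 1.0        # d(cross-entropy)/d(logits) = p - onehot
U_d_new -= 0.1 * np.outer(grad_logits[N_old:], h)  # update the new rows only
p_after = word_probs(h)
print(p_after[target] > p_before[target])  # True: the novel word became more likely
```

Because Ud_old, bo', and bn' are frozen, the model's behavior on the base vocabulary is disturbed as little as possible while the novel words are learned.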

2 Dataset and Evaluation

They use the MSCOCO dataset, hold out the objects "cat" and "motor" to form the novel-concept dataset, and use the remainder as the base dataset. The relations among the train, validation, and test sets are shown below.

f:id:PDFangeltop1:20160111171341p:plain

They calculate the BLEU and METEOR scores, which evaluate the overall quality of the generated sentences.
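For reference, BLEU can be sketched in a few lines. This is a simplified sentence-level version against a single reference, without smoothing; real evaluations (including this paper's) use corpus-level BLEU, typically with multiple references:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions times a brevity penalty. No smoothing."""
    if not candidate:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "a cat sits on the mat".split()
print(bleu(ref, ref))                    # 1.0 for a perfect match
print(bleu("a dog runs".split(), ref))   # 0.0 (no matching bigrams)
```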

The TWS and Baseline Probability Fixation strategies are effective.

 f:id:PDFangeltop1:20160111175311p:plain

 And we can get reasonable scores with only a small number of training examples (the NC training set).

f:id:PDFangeltop1:20160111175521p:plain

With about 10-50 training images, the model achieves comparable performance.