With the advent of deep learning, multi-modal data has attracted great interest. One multi-modal task within the computer vision domain is visual question answering (VQA): a question and an image are fed to the model, which processes both inputs and answers the question according to the image. To the best of our knowledge, current techniques look at the image and give only one answer to the question. In some situations, however, the question has several valid answers. In this thesis, we address this problem and define a new setting for the VQA task in which the model extracts all answers given a question and an image. We propose two ways of addressing the problem, as well as a new computationally efficient approach to cope with multiple-answer VQA. Since no suitable dataset is available for this task, we also provide a new dataset. Experiments show that our model reduces the number of operations by 94 percent.

Keywords: Visual Question Answering, Deep Learning, Convolutional Neural Network, Multiple Answers, Recurrent Neural Network, Multi-modal Data