In this dissertation, scene understanding is investigated. The main goal in scene understanding is to build a machine such that it can perceive like human and understand the major parts of the image. Scene understanding includes two important tasks: object detection and semantic segmentation. It is shown that many state-of-the-art approaches in object detection and semantic segmentation focus on incorporating the high level information in an effective way. Hence, this dissertation concentrates on finding an effective way to incorporate high level information. To do this, we benefit from human thinking. Hence, high level information is extracted through explicit grouping of low-level information. In many previous research works, the high-level information is extracted implicitly such that it is discriminative in the entire dataset. Whereas, if it is obtained based on one image and then it is completed using other images, then we have better performance. Human does it in the same way. We investigated this idea in both object detection and in semantic segmentation. In the proposed object detection method, a set of discriminative parts are extracted for each object category through explicitly grouping of low-level features. In our approach, in the training phase, the object model is learned incrementally. In semantic segmentation, a new nonparametric approach is proposed which does not require a learning model. Also, regions in test image are grouped to form one semantically meaningful unit. All introduced nonparametric approaches are based on patch correspondence. Our proposed method does not require explicit patch matching which makes it relatively fast and effective. Also, the application of semantic segmentation in content-aware image retargeting is investigated. In image retargeting, each human based on his understanding of image, produces a different retargeted image. This is due to that different semantic kashida; TEXT-ALIGN: justify; TEXT-KASHIDA: 0%; MARGIN: 0cm 0cm 0pt; unicode-bidi: embed; DIRECTION: ltr" Key Words Scene understanding, high-level information, object detection, semantic segmentation, image retargeting.