Human action recognition is a dynamic and challenging field of study in computer vision. It is the task of classifying a human action from a video sequence of a subject performing that action. The task has attracted considerable attention because of its wide range of applications, including surveillance systems, sports video analysis, intelligent environments, and robot guidance. Its major challenges include cluttered backgrounds, camera motion, viewpoint variation, occlusion, low frame rates, and low resolution. Dynamic textures are temporally continuous and infinitely varying sequences of images with certain spatial and temporal stationarity properties. Examples include sea waves, smoke, foliage, and whirlwinds. Human action can be regarded as a type of dynamic texture because it exhibits statistical variations in the spatio-temporal domain, and local interest features capture these variations efficiently. The proposed method therefore describes human motion with dynamic textures and a visual dictionary. In this thesis, we apply spatio-temporal dynamic texture analysis to local features. To this end, spatio-temporal interest points are extracted with Laptev's detector. These are points at which a significant change occurs in both the spatial and the temporal domain. In other words, a point selected by the Laptev operator undergoes not only an intensity change but also a change in the magnitude or direction of its motion velocity. These interest points are then described by a dynamic texture descriptor: Local Binary Patterns on Three Orthogonal Planes (LBP-TOP). LBP-TOP extends the basic LBP operator by applying it on three orthogonal planes: XY, XT, and YT. On each plane we apply LBP-HF, a rotation-invariant image descriptor computed from discrete Fourier transforms of local binary pattern histograms.
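The descriptor stage can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It makes several assumptions: 8 neighbours at radius 1, a (T, H, W) grey-level cuboid around each interest point, and a simplified LBP-HF that keeps only the DFT magnitudes |H(n, u)| of the uniform-pattern histogram (the full LBP-HF also uses cross-terms between rows). The function names `lbp_codes`, `lbp_hf`, `hist_to_hf` and `lbp_top_hf` are hypothetical. Rotating the image by a multiple of 45 degrees cyclically shifts each row of the uniform histogram along its rotation axis, and the DFT magnitude is unchanged by such a shift. That is why the feature is rotation invariant.

```python
import numpy as np

# Assumed neighbourhood: 8 neighbours at radius 1, in counter-clockwise order
# starting from east, given as (row offset, column offset).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
P = len(OFFSETS)

# Each uniform code with n ones (1 <= n <= P-1) is a circular run of ones;
# map it to (n, r), where r is the bit position at which the run starts.
UNIFORM = {}
for n in range(1, P):
    for r in range(P):
        UNIFORM[sum(1 << ((r + k) % P) for k in range(n))] = (n, r)


def lbp_codes(plane):
    """Basic 8-bit LBP code for every interior pixel of a 2-D array."""
    plane = np.asarray(plane, dtype=np.float64)
    h, w = plane.shape
    center = plane[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p, (dr, dc) in enumerate(OFFSETS):
        neighbour = plane[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int64) << p
    return codes


def hist_to_hf(h, extra):
    """|DFT| along the rotation axis; a cyclic shift of a row leaves it unchanged."""
    mags = np.abs(np.fft.fft(h, axis=1))[:, : P // 2 + 1]  # conjugate symmetry
    return np.concatenate([mags.ravel(), extra])


def lbp_hf(codes):
    """Simplified LBP-HF feature (magnitudes only) from an array of LBP codes."""
    counts = np.bincount(np.ravel(codes), minlength=1 << P).astype(np.float64)
    total = max(counts.sum(), 1.0)
    h = np.zeros((P - 1, P))  # rows: n = 1..P-1 ones, columns: rotation r
    for code, (n, r) in UNIFORM.items():
        h[n - 1, r] = counts[code]
    all_zero, all_one = counts[0], counts[(1 << P) - 1]
    non_uniform = counts.sum() - h.sum() - all_zero - all_one
    extra = np.array([all_zero, all_one, non_uniform])
    return hist_to_hf(h / total, extra / total)


def lbp_top_hf(volume):
    """LBP-HF on the XY, XT and YT planes of a (T, H, W) cuboid, concatenated."""
    v = np.asarray(volume, dtype=np.float64)
    T, H, W = v.shape
    xy = np.concatenate([lbp_codes(v[t]).ravel() for t in range(T)])
    xt = np.concatenate([lbp_codes(v[:, y, :]).ravel() for y in range(H)])
    yt = np.concatenate([lbp_codes(v[:, :, x]).ravel() for x in range(W)])
    return np.concatenate([lbp_hf(xy), lbp_hf(xt), lbp_hf(yt)])
```

Under these assumptions each plane contributes 7 × 5 + 3 = 38 values, so `lbp_top_hf` returns a 114-dimensional vector per interest point. A 90-degree rotation of an image (for example `np.rot90`) leaves its `lbp_hf` feature unchanged.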
Next, we cluster the features with the k-means algorithm, so that each cluster center serves as a representative of the other members of its cluster. To compact the information and make features easy to compare for action classification, we construct a visual dictionary. The concept of a visual dictionary is often used in image segmentation and retrieval. The cluster centers are the words of our visual dictionary. Each action is described by a histogram of the number of occurrences of the dictionary words. The histogram has one bin per word, and each bin counts the occurrences of that word in the samples of the action. Finally, test sequences are given to the system, and their histograms are constructed in the same way as those of the training sequences. The test sequences are then classified with two algorithms, k-nearest neighbors (KNN) and support vector machines (SVM), which are among the most popular and powerful classifiers in computer vision. To evaluate the proposed method, we conducted experiments on the KTH dataset, which contains six actions: boxing, hand clapping, hand waving, jogging, running, and walking. The method recognizes the actions of this dataset with a mean accuracy of 90.27%, which is a reasonable performance compared with competing methods.
Keywords: human action recognition, motion analysis, dynamic texture, spatio-temporal features, visual dictionary, bag of words
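The dictionary and classification stages can be summarized with a minimal NumPy sketch. It assumes plain Lloyd's k-means, one normalized word histogram per video sequence, and Euclidean k-NN voting. The abstract does not state the distance measure, the number of words, or whether histograms are built per sequence or per action, so all three are assumptions here. The names `kmeans`, `bow_histogram`, `knn_predict` and `fake_sequence` are hypothetical, and the SVM classifier mentioned above is not shown.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; the returned (k, d) centres are the visual words."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its members; keep empty clusters in place.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def bow_histogram(descriptors, words):
    """Assign each descriptor to its nearest word and return normalised counts."""
    d = ((descriptors[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d.argmin(axis=1), minlength=len(words)).astype(np.float64)
    return counts / max(counts.sum(), 1.0)


def knn_predict(train_hists, train_labels, query, k=1):
    """Majority vote among the k training histograms closest to the query."""
    dist = np.linalg.norm(train_hists - query, axis=1)
    nearest = np.asarray(train_labels)[np.argsort(dist)[:k]]
    values, votes = np.unique(nearest, return_counts=True)
    return values[votes.argmax()]


# Toy usage: synthetic descriptors stand in for the LBP-TOP features of each video.
rng = np.random.default_rng(1)


def fake_sequence(offset, n=40, dim=4):
    return rng.normal(offset, 0.3, size=(n, dim))


train_seqs = [fake_sequence(0.0) for _ in range(3)] + [fake_sequence(3.0) for _ in range(3)]
train_labels = ["boxing"] * 3 + ["walking"] * 3
words = kmeans(np.vstack(train_seqs), k=4)
train_hists = np.array([bow_histogram(s, words) for s in train_seqs])
print(knn_predict(train_hists, train_labels, bow_histogram(fake_sequence(3.0), words), k=3))
```

The histogram length equals the number of words, which matches the description above. Because each histogram is normalized, sequences with different numbers of interest points can be compared directly.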