The visual analysis of human movement is an increasingly attractive topic in biometric research. Psychological studies indicate that people have a statistically significant ability to recognize others by the way they walk; gait recognition has therefore become a topic of great interest in computer vision. Authentication is the process of accepting or rejecting a claimed identity, and recent efforts propose ways to detect threats quickly in public places such as airports, banks, and subway stations without attracting the attention of the people observed. Compared to other biometrics, gait has some unique characteristics. Acquiring common biometrics is usually restrictive and time-consuming, whereas gait analysis is unobtrusive; gait also requires no contact or cooperation, can be measured at a distance, and is difficult to conceal or replicate. The field has gained great interest because of its many applications, such as human movement analysis, surveillance, video indexing, sports video analysis, and gait recognition.

Typical approaches to human gait recognition use either motion or shape information. Relying on a single modality, however, may be inadequate, and recent research has achieved very good performance with spatiotemporal analysis that combines both. This thesis presents a bag of video-words method for the analysis of human walking based on dynamic textures. Dynamic texture descriptors naturally encode motion information and, when applied to textured regions, encode appearance information as well; they can therefore describe human motion in both the spatial and temporal domains. Specifically, the Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) dynamic texture descriptor, which is robust to rotation and scale changes, is applied to local features to describe human movements in a spatiotemporal way.

Since the main prerequisite for obtaining the best description is extracting as many discriminative features as possible, a local representation is used for feature extraction. Local representations are invariant to changes in viewpoint, person appearance, and partial occlusion: the observation is described as a collection of local descriptors, or patches. Patches are sampled at space-time interest points, i.e., points with strong variations in both the spatial and temporal domains, and each patch is then described with the LBP-TOP descriptor (a sketch is given below). Because human walking exhibits statistical variations in both space and time, a walking video sequence can thus be represented as a collection of video-words once its spatiotemporal interest points have been extracted and described by a dynamic texture descriptor. A hierarchical K-means clustering algorithm (also sketched below) is then applied to obtain the initial visual dictionary of video-words, and each training sequence is represented by a feature vector over this dictionary. Finally, in the pattern matching step, these feature vectors are compared to recognize the walking subject.

Key words: human motion analysis, gait recognition, interest point, dynamic texture, local binary pattern from three orthogonal planes, clustering.
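
To make the descriptor concrete, the following is a minimal NumPy sketch of the basic LBP-TOP computation, not the exact implementation used in the thesis: the helper names lbp_histogram and lbp_top are illustrative, the plain 8-neighbor, radius-1 LBP variant is assumed, and the rotation- and scale-robust refinements mentioned above are omitted.

```python
import numpy as np

def lbp_histogram(plane, neighbors=8, radius=1):
    """Normalized histogram of basic LBP codes for one 2-D plane."""
    h, w = plane.shape
    center = plane[radius:h - radius, radius:w - radius]
    codes = np.zeros(center.shape, dtype=np.int32)
    for k in range(neighbors):
        # Offset of the k-th circular neighbor, rounded to the pixel grid.
        dy = int(round(radius * np.sin(2 * np.pi * k / neighbors)))
        dx = int(round(radius * np.cos(2 * np.pi * k / neighbors)))
        shifted = plane[radius + dy:h - radius + dy, radius + dx:w - radius + dx]
        codes |= (shifted >= center).astype(np.int32) << k  # one bit per neighbor
    hist, _ = np.histogram(codes, bins=2 ** neighbors, range=(0, 2 ** neighbors))
    return hist / max(hist.sum(), 1)

def lbp_top(volume):
    """LBP-TOP: concatenate the mean LBP histograms of the XY, XT and YT
    planes of a (time, height, width) video volume (here, a local patch)."""
    t, h, w = volume.shape
    xy = np.mean([lbp_histogram(volume[i, :, :]) for i in range(t)], axis=0)
    xt = np.mean([lbp_histogram(volume[:, j, :]) for j in range(h)], axis=0)
    yt = np.mean([lbp_histogram(volume[:, :, k]) for k in range(w)], axis=0)
    return np.concatenate([xy, xt, yt])  # 3 * 256 dimensions for 8 neighbors
```

Each space-time patch sampled at an interest point is passed through lbp_top, yielding one 768-dimensional descriptor per patch; the XY plane captures appearance while the XT and YT planes capture motion.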
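
The dictionary-building and encoding steps can be sketched in the same spirit. The sketch below assumes scikit-learn's KMeans for the per-level clustering; build_dictionary, bag_of_video_words, and the branching and depth parameters are illustrative choices, not the settings used in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(descriptors, branching=4, depth=3):
    """Hierarchical K-means: split the descriptors recursively and return
    the leaf centroids as the visual dictionary of video-words."""
    if depth == 0 or len(descriptors) < branching:
        return np.asarray([descriptors.mean(axis=0)])
    km = KMeans(n_clusters=branching, n_init=4, random_state=0).fit(descriptors)
    words = []
    for c in range(branching):
        members = descriptors[km.labels_ == c]
        if len(members) > 0:
            words.extend(build_dictionary(members, branching, depth - 1))
    return np.asarray(words)

def bag_of_video_words(descriptors, dictionary):
    """Represent one sequence as an L1-normalized histogram of the
    nearest-word assignments of its patch descriptors."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    assignments = np.argmin(d2, axis=1)
    hist = np.bincount(assignments, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()
```

An unseen walking sequence is encoded the same way against the learned dictionary, and its feature vector is then compared with the training vectors in the pattern matching step.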