Human Activity Recognition in Video Image Sequences Using a Dynamic Texture Descriptor

STUDENT

DEGREE

YEAR

Human action recognition is an active and challenging research area in computer vision. It is the task of classifying a human action from a video sequence of a subject performing that action. It has attracted considerable attention because of its wide range of applications in surveillance systems, sports video analysis, intelligent environments, and robot guidance. Cluttered backgrounds, camera motion, viewpoint variation, occlusion, low frame rate, and low resolution are the major challenges in analyzing human actions. Dynamic textures are temporally continuous and infinitely varying image sequences with certain spatial and temporal stationarity properties; examples include sea waves, smoke, foliage, and whirlwinds. Human action can be regarded as a type of dynamic texture because it exhibits statistical variations in the spatio-temporal domain, and local interest features capture these variations efficiently. The proposed method describes human motion with a dynamic texture descriptor and a visual dictionary. In this thesis, we adopt the idea of spatio-temporal analysis with dynamic textures on local features. To this end, spatio-temporal interest points are extracted using Laptev's method. These are points at which a significant change occurs in both the space and time domains: features selected by the Laptev operator undergo not only an intensity change but also a change in the magnitude or direction of motion velocity. These interest points are then described by a dynamic texture descriptor, local binary patterns on three orthogonal planes (LBP-TOP). LBP-TOP extends the basic LBP operator by applying it on three orthogonal planes: XY, XT, and YT. On each plane we apply LBP-HF, a rotation-invariant image descriptor computed from discrete Fourier transforms of local binary pattern histograms.
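The three-plane idea behind LBP-TOP can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a small gray-level volume stored as nested lists `volume[t][y][x]`, uses the basic 8-neighbor LBP code with radius 1, and omits the LBP-HF Fourier step. The function names `lbp_code`, `lbp_histogram`, and `lbp_top` are hypothetical.

```python
# Simplified LBP-TOP sketch (assumption: basic 8-neighbour LBP, radius 1,
# no LBP-HF step). The volume is indexed as volume[t][y][x].

# Neighbour offsets, visited clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(plane, r, c):
    """Return the 8-bit LBP code of the interior pixel plane[r][c]."""
    center = plane[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour contributes a 1-bit when it is >= the center value.
        if plane[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(plane):
    """Count LBP codes over all interior pixels of one 2-D plane."""
    hist = [0] * 256
    for r in range(1, len(plane) - 1):
        for c in range(1, len(plane[0]) - 1):
            hist[lbp_code(plane, r, c)] += 1
    return hist


def lbp_top(volume):
    """Concatenate normalized LBP histograms from the XY, XT and YT planes."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    xy = [volume[t] for t in range(T)]                       # one plane per frame
    xt = [[volume[t][y] for t in range(T)] for y in range(H)]  # rows = t, cols = x
    yt = [[[volume[t][y][x] for y in range(H)] for t in range(T)]
          for x in range(W)]                                 # rows = t, cols = y
    descriptor = []
    for planes in (xy, xt, yt):
        hist = [0] * 256
        for plane in planes:
            for code, count in enumerate(lbp_histogram(plane)):
                hist[code] += count
        total = sum(hist) or 1  # avoid division by zero on tiny volumes
        descriptor.extend(h / total for h in hist)
    return descriptor
```

In this sketch the descriptor has 3 × 256 = 768 entries, one normalized histogram per plane orientation. In the pipeline described above it would be computed on a small volume around each interest point.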
Next, we cluster the features with the k-means algorithm so that each cluster center represents the other members of that cluster. To compact the information and make it simple to compare features when classifying actions, we construct a visual dictionary, a concept often used in image segmentation and retrieval. The cluster centers are the words of our visual dictionary. Each action is described by a histogram of the occurrences of the dictionary words: the number of bins equals the number of words in the visual dictionary, and each bin counts the occurrences of one word in the samples of the action. Finally, test sequences are given to the system, and their histograms are constructed in the same way as for the training sequences. They are then classified by two algorithms, K-nearest neighbor (KNN) and support vector machine (SVM), which are among the most popular and powerful classification algorithms in computer vision. To verify the proposed method, we evaluated it on the KTH dataset, which contains six different actions: boxing, hand clapping, hand waving, jogging, running, and walking. The method recognizes the actions of this dataset with a mean accuracy of 90.27%, which is a reasonable performance compared with competing methods.

Keywords: human action recognition, motion analysis, dynamic texture, spatio-temporal features, visual dictionary, bag of words
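The dictionary-and-histogram step described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: the descriptors are toy 2-D vectors, k-means is initialized from evenly spaced input points (the source does not state an initialization), and classification uses a single nearest neighbor with Euclidean distance as a stand-in for the KNN and SVM classifiers. All function names are hypothetical.

```python
# Bag-of-visual-words sketch (assumptions: toy 2-D descriptors, evenly spaced
# k-means initialization, 1-nearest-neighbour classification).

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; returns k cluster centers (the visual words)."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_histogram(descriptors, centers):
    """Normalized count of how often each visual word occurs in a sequence."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def classify_1nn(query_hist, train_hists, train_labels):
    """Label of the training histogram closest to the query histogram."""
    best = min(range(len(train_hists)),
               key=lambda i: sq_dist(query_hist, train_hists[i]))
    return train_labels[best]


# Toy usage: two "actions", each with three made-up local descriptors.
walking = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0]]
boxing = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0]]
words = kmeans(walking + boxing, k=2)
train_hists = [bow_histogram(walking, words), bow_histogram(boxing, words)]
query = [[0.05, 0.05], [0.1, 0.1], [0.95, 1.0]]
label = classify_1nn(bow_histogram(query, words), train_hists,
                     ["walking", "boxing"])
```

Normalizing each histogram makes sequences with different numbers of interest points comparable. That is a design choice of this sketch rather than something stated in the source.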

Recognizing human activity in video image sequences is an important and active topic in computer vision. It refers to labeling an image sequence with an activity label, and it has attracted considerable attention because of its wide range of applications, including intelligent surveillance systems, robot guidance, autonomous cars, and video indexing. The problem can be viewed as a combination of representing the features of different activities and classifying them. It faces difficulties such as variation in how activities are performed, video recording conditions, differences between individuals, viewpoint change, occlusion, and scale change. On the representation side, human activity can be regarded as a particular kind of dynamic texture pattern. In this thesis, the idea of spatio-temporal analysis with a dynamic texture descriptor on local features is used to recognize human activity. First, spatio-temporal key points are extracted using Laptev's method. These points are then described with the LBP-TOP dynamic texture descriptor. Next, a clustering algorithm groups the features into clusters such that the center of each cluster represents the other members of that cluster. To make it easier to compare features and classify activities, we build a dictionary and treat each cluster center as a word in it. Each activity is then described as a histogram of the number of occurrences of each dictionary word in it. Finally, test samples are fed to the system to evaluate the proposed method, and classification is performed by two classifiers: K-nearest neighbor and support vector machine. To evaluate the proposed algorithm, we tested it on the KTH dataset, which contains 6 different activities. The method recognizes the activities in this dataset with an accuracy of 90.27%.

Keywords: human activity recognition, motion analysis, dynamic texture, spatio-temporal features, dictionary, bag of words