Aspect-Based Sentiment Analysis of Online Reviews Using Very Weakly Supervised Methods

STUDENT

DEGREE

YEAR

With the rapid growth of user-generated content on the internet, automatic sentiment analysis of textual reviews has become an active research topic in data mining and natural language processing. Sentiment analysis, or opinion mining, is the computational study of people’s opinions, appraisals, attitudes, and emotions toward entities, individuals, issues, events, topics, and their attributes. One of the main problems in this area is aspect-based sentiment analysis, which aims to detect aspects and the sentiments expressed about them. As the number of product and service reviews grows, it becomes essential to develop efficient models that can extract aspects and determine the sentiments associated with them. To date, aspect-based sentiment analysis approaches fall into two groups: frequency- and relation-based approaches, and model-based approaches. Most existing methods are domain dependent and produce many unrelated aspects. However, given the variety and wide range of products and services reviewed on the internet, supervised and domain-specific models are often impractical. In addition, labeled training data is scarce, and building enough of it is expensive, time consuming, and labor intensive. As a result, a model that relies on labeled data from one domain often performs poorly in another. Other main challenges in aspect-based sentiment analysis are jointly detecting aspects and sentiments, discovering implicit aspects, detecting multi-word aspects, and the language dependency of models. This thesis aims to improve on previous work and address these challenges by presenting unsupervised and weakly supervised models that require no labeled training data for aspect-based sentiment analysis.
To this end, this thesis proposes two domain-independent, weakly supervised models for aspect-based sentiment analysis of online customer reviews. The first is an unsupervised aspect-sentiment detection model, which requires no training data and detects both explicit and implicit aspects. The second is a joint aspect-sentiment detection model, JASE, which is based on topic modeling and detects sentiment words and aspect words simultaneously. To mine sentiments, the JASE model combines the advantages of frequency-based approaches with the semantic features of text. In the experiments, the proposed models are evaluated with standard information retrieval measures on English and Persian review datasets at the document and sentence levels, and they are compared with other approaches, including frequency- and relation-based approaches and supervised approaches. The experimental results show considerable improvements of the proposed models over conventional models, both unsupervised and supervised. The best precision and recall of the proposed unsupervised aspect-sentiment model are 90% and 71%, respectively. The best accuracy of the proposed JASE model is 86.79% for English reviews and 79.58% for Persian reviews.

Keywords: Sentiment analysis, natural language processing, opinion mining, aspect-based sentiment analysis, data mining, weakly supervised model.

With the ever-growing volume of textual information produced by users on the internet, sentiment analysis of text has become an attractive field of work among researchers in data mining and natural language processing. Sentiment analysis, or sentiment mining, is the computational study of users' sentiments, opinions, attitudes, and inclinations toward topics, objects, and their properties and aspects in text documents. One of the most important problems in sentiment analysis is aspect-level analysis, whose goal is to extract aspects and the sentiment words expressed about them. As online user reviews of products and services increase, the need arises for automatic models for aspect-level sentiment analysis. So far, various methods for aspect-level sentiment analysis have been proposed in two categories: frequency- and relation-based methods, and model-based methods. Many of the proposed methods depend on the topic domain and produce a large number of aspects. Given the wide range and great variety of products and services, domain-dependent methods are not a suitable solution. In addition, given the lack of labeled training data [1], preparing suitable labeled datasets across different topic domains is exhausting, costly, and time consuming; as a result, the need for labeled data leads to domain dependence. Beyond these challenges, other shortcomings of existing methods for aspect-level sentiment analysis concern the joint extraction of aspects and sentiments, the discovery of implicit aspects, the extraction of multi-word aspects, and the language dependence of the proposed methods. With the aim of improving previous models and offering solutions to the existing challenges, this thesis seeks a model that can perform aspect-level sentiment analysis with very weak supervision and without labeled data. To this end, this thesis presents two very weakly supervised models with minimal domain dependence for aspect-level sentiment analysis of online user reviews.
The unsupervised aspect-sentiment detection model is the first proposed model; it requires no labeled data and extracts implicit aspects in addition to explicit ones. The joint aspect-sentiment detection model, JASE, is presented as the second proposed model; based on topic modeling, it attempts to detect sentiment words and aspects simultaneously. In addition to using the advantages of frequency-based methods and the semantic properties of text, this model also takes the structure of the text into account when mining sentiments. In the experiments, with the goal of minimal language dependence, the proposed models are evaluated on review datasets in English [2] and Persian [3], at the document level and the sentiment aspect level, using standard information retrieval measures. In these experiments, the proposed models are compared with other methods, including frequency- and relation-based methods and supervised learning methods. The results of the proposed models, compared with other standard models, indicate improved performance over previous methods. According to the experiments, the best precision and recall obtained by the proposed unsupervised aspect-sentiment detection model are 90% and 71%, respectively. Likewise, the best accuracy of the proposed JASE model is 86.79% on the English review datasets and 79.58% on the Persian review datasets.

[1] Labeled training data
[2] English language
[3] Persian language