Detection and Investigation of Outliers in Fuzzy Regression Models

In fuzzy linear regression, introduced by Tanaka in 1982, some of the strict assumptions of the statistical regression model are relaxed. In the general fuzzy regression model, the input data (for the explanatory variables x) and the output data (for the dependent variable y) are fuzzy, the relationship between inputs and outputs is given by a fuzzy function, and the distribution of the data is possibilistic. The data need not have statistical properties. Fuzzy regression analysis can therefore be applied to many real-life problems in which the strict assumptions of the statistical model do not hold.

Outliers sometimes arise from large errors made while collecting, recording, or transferring data. Sometimes, however, they are correct observations that reveal an inadequacy of the model. When an outlier is detected, it should be investigated; we should not automatically omit it and continue the analysis. If outliers are genuine observations, they demonstrate that the model is inadequate, and they usually give the analyst valuable clues for building a better model. It is therefore important for the analyst to detect outliers and to investigate their effect on the various aspects of the analysis.

One drawback of Tanaka's model is its sensitivity to outliers, which makes the predicted intervals undesirably wide. Several methods have been proposed in recent years to remove this problem. One method introduces a new variable, constructs a fuzzy linear programming problem with fuzzy intervals, and obtains reasonable estimated intervals. In this way the estimates are not driven by the outliers and their effect is removed: all of the data, not just the outliers, influence the estimated interval. Another method adds some constraints to those of the original problem, detects the outliers, and modifies the constraints corresponding to them; this also removes the effect of the outliers. These methods, however, have drawbacks; for example, they require values for certain parameters to be fixed in advance.

To overcome these drawbacks, we use an omission approach: each observation is omitted in turn, the resulting change in the objective function of the linear programming problem is examined, and outliers are detected from these changes. A diagnostic measure quantifies the effect of each observation on the objective function, and attention is focused on the largest value of this measure. A box plot is then used to define the cutoff that decides whether this largest value indicates an outlier.
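The omission approach can be illustrated with a small, standard-library-only sketch. It rests on several assumptions that are not taken from the thesis: a single crisp predictor with crisp outputs; Tanaka's model with symmetric triangular coefficients and h = 0; a diagnostic measure chosen for illustration as D_k = J* − J*_(−k), the drop in the optimal total spread when observation k is deleted; the usual upper box-plot fence Q3 + 1.5·IQR as the cutoff; and a brute-force vertex-enumeration LP solver, which is practical only for a handful of observations. All function names are hypothetical.

```python
# Illustrative sketch only: the measure D_k, the 1.5*IQR fence and the
# vertex-enumeration solver are assumptions, not the thesis's exact method.
from itertools import combinations
from statistics import quantiles


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_square(rows, rhs, eps=1e-12):
    """Gauss-Jordan elimination with partial pivoting; returns None if singular."""
    m = [list(r) + [b] for r, b in zip(rows, rhs)]
    n = len(m)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < eps:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def tanaka_lp(xs, ys, h=0.0):
    """Tanaka-style LP for y = A0 + A1*x with symmetric triangular A_i = (a_i, c_i).

    Variables z = (a0, a1, c0, c1). Minimises the total spread
    sum_j (c0 + c1*|x_j|) subject to each crisp y_j lying in the h-level band
    a0 + a1*x_j +/- (1 - h)*(c0 + c1*|x_j|), with c0, c1 >= 0.
    Solved by enumerating vertices (4 active constraints each): tiny data only.
    """
    assert 0.0 <= h < 1.0
    w = 1.0 - h
    cons = []  # each entry (g, b) encodes the constraint g . z >= b
    for x, y in zip(xs, ys):
        cons.append(((1.0, x, w, w * abs(x)), y))      # upper end of band >= y_j
        cons.append(((-1.0, -x, w, w * abs(x)), -y))   # lower end of band <= y_j
    cons.append(((0.0, 0.0, 1.0, 0.0), 0.0))           # c0 >= 0
    cons.append(((0.0, 0.0, 0.0, 1.0), 0.0))           # c1 >= 0
    cost = (0.0, 0.0, float(len(xs)), float(sum(abs(x) for x in xs)))
    best = None
    for subset in combinations(cons, 4):
        z = solve_square([g for g, _ in subset], [b for _, b in subset])
        if z is None:
            continue
        if all(dot(g, z) >= b - 1e-9 for g, b in cons):  # feasible vertex
            value = dot(cost, z)
            if best is None or value < best[0]:
                best = (value, z)
    return best  # (optimal total spread J*, [a0, a1, c0, c1])


def deletion_measures(xs, ys, h=0.0):
    """Hypothetical diagnostic D_k = J* - J*_(-k) for every observation k."""
    xs, ys = list(xs), list(ys)
    full, _ = tanaka_lp(xs, ys, h)
    return [full - tanaka_lp(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:], h)[0]
            for k in range(len(xs))]


def flag_largest(measures, whisker=1.5):
    """Index of the largest measure if it exceeds the upper box-plot fence, else None."""
    q1, _, q3 = quantiles(measures, n=4)
    fence = q3 + whisker * (q3 - q1)
    k = max(range(len(measures)), key=lambda i: measures[i])
    return k if measures[k] > fence else None


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0, 14.1, 15.9]  # made-up data; x = 6 is far off the trend
D = deletion_measures(xs, ys)
print([round(d, 2) for d in D])
print(flag_largest(D))  # → 5
```

On these made-up data, deleting the point at x = 6 collapses the wide band forced by that point, so its D_k is by far the largest and lies above the fence. Deleting any other point leaves the band wide and changes J* only slightly. For realistic sample sizes, a proper LP solver (for example `scipy.optimize.linprog`) would replace the enumeration.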

Regression models are used to relate a dependent variable to a number of independent variables. Building such models requires observations of the variables under study. Classical regression assumes that these variables and their observations are precise. In a given study, however, the observations of one or more variables may be imprecise or may have been reported imprecisely, and the variables under study may also be inherently related in an imprecise, vague (approximate) way. In such situations, fuzzy regression is one of the main alternatives to classical regression. One type of fuzzy regression is possibilistic regression, first proposed by Tanaka et al. This thesis explains this type of regression and describes the estimated models for the case in which the model coefficients are fuzzy and the observed outputs are either fuzzy or crisp.

One objection to the method of Tanaka et al. is its sensitivity to outliers: in their presence the prediction intervals obtained are wide, which is undesirable. Several methods have been proposed so far to overcome this problem. One introduces a new variable, formulates a fuzzy linear programming problem with fuzzy intervals, and obtains reasonable estimation intervals. Another adds a number of constraints to those of the original problem, identifies the outliers, and modifies the constraints corresponding to them; in this way, too, the effect of the outliers is eliminated. In addition, the method of Tanaka et al. gives poor results when the trend of the spreads runs opposite to the trend of the data; this problem is resolved by a new method that removes the sign constraint on the spreads in the linear programming problem. To detect outliers, a method is presented in which each observation is deleted in turn, its effect on the objective function of the linear programming problem is examined, and the outlier is identified.
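For reference, a commonly cited form of the possibilistic linear program of Tanaka et al. is sketched below. The notation is ours, not the thesis's: symmetric triangular fuzzy coefficients A_i = (α_i, c_i) with centre α_i and spread c_i, crisp inputs with x_{j0} = 1, symmetric triangular outputs Y_j = (y_j, e_j), and credibility level 0 ≤ h < 1. Crisp outputs correspond to e_j = 0.

```latex
\begin{aligned}
\min_{\alpha,\,c}\quad & J = \sum_{j=1}^{n} \sum_{i=0}^{p} c_i \,\lvert x_{ji} \rvert \\
\text{s.t.}\quad & \sum_{i=0}^{p} \alpha_i x_{ji} + (1-h) \sum_{i=0}^{p} c_i \lvert x_{ji} \rvert \;\ge\; y_j + (1-h)\, e_j, \qquad j = 1, \dots, n, \\
& \sum_{i=0}^{p} \alpha_i x_{ji} - (1-h) \sum_{i=0}^{p} c_i \lvert x_{ji} \rvert \;\le\; y_j - (1-h)\, e_j, \qquad j = 1, \dots, n, \\
& c_i \ge 0, \qquad i = 0, \dots, p.
\end{aligned}
```

Subtracting the second constraint from the first gives (1 − h) Σ_i c_i |x_{ji}| ≥ (1 − h) e_j, that is, Σ_i c_i |x_{ji}| ≥ e_j ≥ 0 for every observation. So if the sign constraint c_i ≥ 0 is dropped, the fitted spread still stays nonnegative at each observed x_j, although not necessarily at other values of x.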