بررسی روش های موجود و ارائه روشی جدید در ارتباط با تغییرات ساختار ژنوم

STUDENT

DEGREE

YEAR

A haplotype is a set of Single Nucleotide Polymorphisms from a given DNA and has valuable genetic information. are the most common genetic structural variants and haplotypes encode in a single DNA. Haplotype has a relevant role in several genetics studies like mapping complex disease gens, drug design and so on. However retrieving directly haplotype from DNA sample using existing technonolgies is expensive and also time consuming. This motivates the increasing interests in techniques for inferring haplotype data from genotypes which can be obtained quickly and economically. Each genotype is a description of the conflated on two haplotypes inherited from both parents. Although a number of association studies can be done using only genotypes, haplotype information is essential for the detailed analysis of the mechanisms of disease. While assessing the genetic contribution to traits, it may often be much more informative to have haplotype data rather than to have only genotype data. The haplotype inference problem under the assumption of pure parsimony consists in finding the smallest number of haplotypes that explain a given set of genotypes. This problem is proved to be NP-hard. Several methods have been proposed since Clarck introduced the first algorithm to solve haplotype inference problem in 1990. Methods for solving this problem are divided into two algorithms, namely exact and inexact methods. Existing exact methods guarantee obtaining purely parsimonious solutions but have exponential time-complexities and are not practical for large number or length of genotypes. However, inexact methods are relatively fast but do not always obtain optimum solutions. In this dissertation at frist an exact algorithm called ISHIPs is proposed which has an improvement on an existing sat-based algorithm namelySHIPs. SHIPs is the best current exact algorithm for haplotype inference probleb by pure parsimony. ISHIPs uses a heuristic algorithm namely Collhaps to compute an upper bound for the problem in the preprocessing phase. Then make a sat-based model given genotypes. And try to solve the model by using an existing sat-solver. In the next part of the dissertation a heuristic method proposed to solve haplotype inference problem called freeCollhaps. freeCollhaps algorithm introduces freedom degree concept and uses it to compute differences between a pair haplotypes.freeCollhaps has an iterative procedure. In the next step we use freeCollhaps as preprocessing phase of ISHIPs algorithm. For improving freeCollhaps algorithm, we add a simplification phase to this algorithm. The simplification phase causes to reduce the complexity of the problem instances.the best current heuristic until now is Collhaps which has a good performance on large scale datasets. The performance of proposed algorithm was evaluated on several datasets. Datasets are divided into two groups, random generated (simulated) da

یک هاپلوتیپ مجموعه ای از چندشکلی های تک نوکلئوتیدی (اسنیپ ها)، و اسنیپ از معمول ترین تغییرات ساختاری می باشد. از آنجایی که برخی از بیماری های ژنتیکی ناشی از تغییرات در ساختار ژنوم است، بررسی هاپلوتیپ‌ها?مورد توجه بسیاری از محققین قرارگرفته است. بدلیل محدودیت های تکنیکی در بدست آوردن هاپلوتیپ ها، در اغلب موارد به‌جای هاپلوتیپ‌ها از ژنوتیپ‌ها که اطلاعات کمتری را در بر دارند استفاده می‌شود. لذا مسئله بدست آوردن یک مجموعه هاپلوتیپ?از روی ژنوتیپ ها، که استنباط هاپلوتیپ?نامیده می‌شود، از اهمیت ویژه‌ای برخوردار است. در صورتی که برای یک مجموعه ژنوتیپ به دنبال یافتن کمترین تعداد هاپلوتیپ مورد نیاز برای توصیف آن مجموعه باشیم مسئله ی استنباط هاپلوتیپ با دیدگاه صرفه جویی کامل را داریم. نشان داده شده است که مسئله استنباط هاپلوتیپ، در حالت صرفه جویی کامل، یک مسئله NP-سخت می باشد. از زمان ارائه اولین الگوریتم توسط کلارک تا کنون روش های متعددی برای حل این مسئله مطرح شده ، که به دو دسته الگوریتم های دقیق و غیر دقیق تقسیم می شوند. در حال حاضر بهترین الگوریتم دقیق برای این مسئله استنباط هاپلوتیپ در دیدگاه صرفه جویی کامل الگوریتمSHI می‌باشد که مسئله را به تعدادی مسئله ارضای بولین کاهش می‌دهد. متاسفانه زمان اجرای این الگوریتم، علی رغم بهبودهای صورت گرفته، برای اندازه های بزرگ مسئله زیاد بوده و هنوز روش های افزایش سرعت برای آن مورد نیاز می‌باشد. در الگوریتم های غیر دقیق نیز الگوریتم Collhaps بهترین کارایی را دارد،که یک الگوریتم مکاشفه ای می باشد. اما می توان هنوز کارایی آن را بهبود بخشید. در این پایان نامه دو الگوریتم پیشنهاد شده است. یکی از این الگوریتم ها دقیق و دیگری غیر دقیق می باشد. در حقیقت الگوریتم دقیق ارائه شده بهبودی بر بهترین الگوریتم دقیق موجود می باشد. الگوریتم مکاشفه ای ارائه شده نیز بهبودی بر الگوریتم Collhaps می باشد. سپس از الگوریتم مکاشفه ای پیشنهادی به عنوان بخش پیش پردازش در الگوریتم پیشنهادی دقیق ارائه شده استفاده می گردد. کارایی هر کدام از الگوریتم های پیشنهادی توسط آزمایشات متعدد برای روی چندین مجموعه داده تایید می‌گردد. کلمات کلیدی: استنباط هاپلوتیپ، دیدگاه صرفه جویی کامل، الگوریتم دقیق، الگوریتم مکاشفه ای، NP-سخت 1- فصل اول