Learning the Goalkeeping Behavior of a Simulated Humanoid Soccer Robot Using Deep Deterministic Reinforcement Learning

This research applies reinforcement learning (RL) and neural networks to learning the behavior of a humanoid goalie robot in a three-dimensional soccer simulation environment. RL is a branch of machine learning concerned with choosing actions in an unknown environment so as to maximize cumulative reward. One of the major goals of robotics and artificial intelligence is for a team of autonomous humanoid robots to win a soccer game against a human team. Among all of an agent's behaviors, goalkeeping is an important problem in a soccer match. RL in an environment with continuous states and actions provides a suitable method for learning the agent's behavior at every time step. With the advances made so far in this field, humanoid goalie robots using designed controllers and RL have been able to block a large number of ground shots. Achieving better performance requires a method that lets the agent respond appropriately in a more complex environment and to a wider variety of shots, including aerial shots. Controlling the agent's behavior in complex environments is therefore necessary. Traditional RL algorithms, however, are inefficient in environments with the following two attributes: 1) high-dimensional state spaces, such as the pixels of camera images, and 2) high-dimensional continuous action spaces. This research tackles the goalie problem with an RL algorithm for continuous control, in which two asynchronous RL learners are used to achieve better performance. Performance on this problem is measured as the number of shots blocked by the goalie in the goalie challenge. Recently, powerful RL methods, such as deep RL and RL with an actor-critic architecture based on policy gradients, have been proposed for solving robot control problems with high-dimensional action spaces. Using these two methods and deep neural networks with a more robust network architecture, a new hybrid method is proposed that can solve continuous control problems.
In this research, the goalie problem for a humanoid soccer robot is first modeled using two reinforcement learners. To determine the state of the environment, a method is proposed for predicting the trajectory of the ball. A skill description language is then used to design skills such as diving, which lets the goalie cover more area, and the action space is specified. The two reinforcement learners are then combined to control the behavior of the humanoid goalie robot. Finally, it is shown that the RL agent blocks ground and aerial shots more effectively than the methods implemented by top teams.
Keywords: Humanoid Soccer Robot, Three-Dimensional Soccer Simulation, Goalie Challenge, Deep Reinforcement Learning, Policy Gradient, Deep Neural Networks, Behavior Learning, Continuous Control
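For reference, the "cumulative reward" that an RL agent maximizes is commonly written as a discounted return. The notation below is the standard textbook form and is an assumption here: the abstract does not state the discount factor $\gamma$, the reward symbol $r_t$, or the exact objective used in the thesis.

```latex
% Discounted return from time step t (gamma is an assumed discount factor)
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1

% Objective: find a policy \pi that maximizes the expected return
J(\pi) = \mathbb{E}_{\pi}\left[ G_0 \right]
```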

The main focus of this research is reinforcement learning and neural networks applied to the problem of teaching goalkeeping behavior to a humanoid robot in a three-dimensional soccer simulation environment. Reinforcement learning is a branch of machine learning concerned with learning to select actions in an unknown environment so as to maximize the sum of the rewards received. One important goal in the development of robotics and artificial intelligence is for a team of autonomous humanoid soccer robots to defeat human teams in a soccer game. Among all of an agent's behaviors, goalkeeping is an important problem in a soccer match. Reinforcement learning provides a suitable control method for learning the agent's behavior at every moment. With the progress made so far in this area, the goalie robot has been able to block a large number of ground shots using designed controllers and reinforcement learning. Achieving higher performance requires a method that teaches goalkeeping behavior to the agent so that it reacts appropriately in a more complex environment and to different kinds of shots, including aerial shots. Teaching goalkeeping behavior to the agent in complex environments is therefore essential. Traditional reinforcement learning algorithms struggle with environments that have two properties: 1) a high-dimensional continuous state space, such as the pixels of camera images, and 2) a high-dimensional continuous action space. This research examines how to model the goalkeeping problem with a reinforcement learning algorithm for continuous control and how to use two asynchronous reinforcement learners to achieve the highest performance. Performance in this problem is the number of shots blocked by the goalie in the goalie challenge. In recent years, powerful reinforcement learning methods, such as deep reinforcement learning and reinforcement learning with an actor-critic architecture based on policy gradients, have been proposed for solving robot control problems with high-dimensional action spaces. Using these two methods and deep neural networks with a robust network architecture, a new hybrid method is presented that can solve continuous control problems.
In this research, the goalkeeping problem of the humanoid soccer robot is first modeled using two reinforcement learners for a single agent. To determine the state of the environment, a method for predicting the ball's trajectory is presented and used to define the state space of the problem. A skill description language is then used to design skills such as diving, so that the goalie covers a larger area, and the action space of the problem is specified. The two reinforcement learning models are then combined to teach goalkeeping behavior to the simulated humanoid robot. Finally, it is shown that the reinforcement learning agent is more effective at blocking ground and aerial shots taken at the goal than the methods implemented by top teams.
Keywords: Humanoid Soccer Robot, Three-Dimensional Soccer Simulation, Goalie Challenge, Deep Reinforcement Learning, Policy Gradient, Deep Neural Networks, Behavior Learning, Continuous Control
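The abstract mentions predicting the ball's trajectory to determine the state of the environment, but does not describe the method. The following is only a minimal sketch of one simple way such a prediction could work, assuming drag-free ballistic motion with no bounce or friction. The names `BallState` and `predict_goal_line_crossing`, the coordinate convention, and the gravity constant are all hypothetical and are not taken from the thesis or from any simulator API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

GRAVITY = 9.81  # m/s^2; assumed value, the simulator's constant may differ


@dataclass
class BallState:
    """Hypothetical ball state: position (m) and velocity (m/s)."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float


def predict_goal_line_crossing(
    ball: BallState, goal_x: float
) -> Optional[Tuple[float, float, float]]:
    """Estimate (time, y, z) at which the ball reaches the plane x = goal_x.

    Assumes constant horizontal velocity and gravity only (no drag, no
    bounce, no rolling friction). Returns None if the ball is not moving
    toward the goal line.
    """
    if ball.vx == 0.0:
        return None
    t = (goal_x - ball.x) / ball.vx
    if t < 0.0:
        return None  # ball is moving away from the goal line
    y = ball.y + ball.vy * t
    z = ball.z + ball.vz * t - 0.5 * GRAVITY * t * t
    # Clamp to the ground plane: a crude stand-in for a ball that has landed.
    return t, y, max(z, 0.0)
```

Under these assumptions, the predicted crossing time and position could serve as a compact part of the state passed to a learner, for example `predict_goal_line_crossing(BallState(0, 0, 0, -10, 1, 0), goal_x=-15.0)` for a ground shot. A realistic predictor would also need to model drag, bounces, and sensor noise.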