Computer networks are important examples of distributed dynamic systems. Distributed control in these systems, especially at the routing level, is needed to make network behavior adaptive to changes in topology, data traffic, services, and so on. Recently, researchers have built on advances in machine learning to design new routing algorithms that adapt better to such changes. Reinforcement learning is a learning method that does not rely on labeled examples; its goal is to learn a policy, a mapping from perceptions to actions, from the feedback (rewards) received from the environment. This learning task can be viewed as a search over policies, each of which is evaluated through its interactions with the environment. Q-learning is one of the most widely used reinforcement learning algorithms. In this thesis, the network is modeled as a multiagent system in which every router is an agent. Each agent uses Q-learning to learn the states of the network and to choose the best possible action in each state. In this model, the state of each node is defined as a function of the status of its adjacent nodes and of the links connecting it to them. Consequently, any change in the status of a link or a node affects the states of the adjacent nodes (agents) and leads them to take more appropriate actions in response.
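The abstract does not give the state encoding, reward, or learning parameters, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes: (1) a router's state is the packet's destination plus the up/down status of each adjacent link (a simplification that ignores the status of adjacent nodes); (2) an action is the choice of neighbor to forward to; (3) the reward is the negative delay of the chosen link; and (4) the receiving neighbor reports its own best Q-value, which is used as the bootstrap term, similar in spirit to Q-routing. All names (`RouterAgent`, `route_once`, the toy topology and delays) are hypothetical.

```python
import random
from collections import defaultdict


class RouterAgent:
    """A router modeled as a tabular Q-learning agent (hypothetical sketch)."""

    def __init__(self, name, neighbors, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.name = name
        self.neighbors = list(neighbors)  # actions: forward to one of these neighbors
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.epsilon = epsilon            # exploration probability
        self.q = defaultdict(float)       # (state, action) -> estimated return, default 0.0

    def make_state(self, destination, link_up):
        # Assumed encoding: destination plus the up/down status of each adjacent link.
        return (destination,
                tuple(link_up[frozenset((self.name, n))] for n in self.neighbors))

    def actions(self, state):
        # Only neighbors reachable over a working link are valid actions.
        _, links = state
        return [n for n, up in zip(self.neighbors, links) if up]

    def best_value(self, state):
        # max_a Q(state, a) over valid actions; 0.0 if no link is up.
        return max((self.q[(state, a)] for a in self.actions(state)), default=0.0)

    def choose(self, state, rng, explore=True):
        acts = self.actions(state)
        if not acts:
            return None
        if explore and rng.random() < self.epsilon:
            return rng.choice(acts)                           # explore
        return max(acts, key=lambda a: self.q[(state, a)])    # exploit

    def update(self, state, action, reward, next_best):
        # Q-learning update: Q <- Q + alpha * (r + gamma * max_a' Q(s', a') - Q)
        key = (state, action)
        self.q[key] += self.alpha * (reward + self.gamma * next_best - self.q[key])


def route_once(agents, delay, link_up, src, dst, rng, max_hops=10):
    """Forward one packet from src to dst; each router learns from the hop it takes."""
    node = src
    for _ in range(max_hops):
        if node == dst:
            return True
        agent = agents[node]
        state = agent.make_state(dst, link_up)
        nxt = agent.choose(state, rng)
        if nxt is None:
            return False                                  # no working outgoing link
        reward = -delay[frozenset((node, nxt))]           # lower delay => higher reward
        if nxt == dst:
            next_best = 0.0                               # packet delivered
        else:
            nxt_agent = agents[nxt]                       # neighbor reports its best estimate
            next_best = nxt_agent.best_value(nxt_agent.make_state(dst, link_up))
        agent.update(state, nxt, reward, next_best)
        node = nxt
    return node == dst


# Toy network: A-B-D is fast, A-C-D is slow because link A-C has delay 5.
topology = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
delay = {frozenset(e): d for e, d in
         [(("A", "B"), 1), (("B", "D"), 1), (("A", "C"), 5), (("C", "D"), 1)]}
link_up = {e: True for e in delay}
agents = {n: RouterAgent(n, nbrs) for n, nbrs in topology.items()}
rng = random.Random(0)

for _ in range(500):
    route_once(agents, delay, link_up, "A", "D", rng)

a = agents["A"]
print(a.choose(a.make_state("D", link_up), rng, explore=False))  # B

# A link failure changes A's state, so A's chosen action changes too.
link_up[frozenset(("A", "B"))] = False
print(a.choose(a.make_state("D", link_up), rng, explore=False))  # C
```

Because the link status is part of the state in this sketch, a failure on an adjacent link moves the agent into a different state, and it acts according to the Q-values it holds (or learns) for that state. This mirrors, in a simplified form, the abstract's point that changes in a link or node alter the states of adjacent agents and thereby their actions.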