Reinforcement learning is one of the principal methods for making control systems intelligent. Inspired by human learning and behavior, it endows a system with the capability to learn. Q-learning, a widely used reinforcement learning method, learns how to control a system through trial and error. Designing and implementing a practical Q-learning algorithm requires both faster learning convergence and control signals of reduced amplitude. In this research, several approaches built on Q-learning are introduced to control the Cart-Pole system. In the first step, the convergence speed is improved by employing the maximum reward in the Q-learning update; convergence is accelerated further by adding an update condition on the value function inspired by delayed Q-learning. In the second step, the k-nearest neighbor (k-NN) concept is incorporated into standard Q-learning to moderate the amplitude of the control signals applied to the system, and convergence of the resulting algorithm is verified by the Lyapunov method. In the third step, a new method combining delayed Q-learning and k-nearest neighbor is presented, and its convergence is proved via the Lyapunov theorem. A genetic algorithm is applied to obtain optimal values of the problem parameters, such as the learning rate, discount factor, updating condition, number of neighbors, and parameter decay rate. Numerical simulations on the Cart-Pole system show faster convergence to a prescribed radius around the closed-loop equilibrium point; moreover, the control signal amplitude is reduced in the k-NN-based methods. Finally, all three proposed methods are compared with standard Q-learning, delayed Q-learning, and k-nearest-neighbor-based reinforcement learning, and conclusions are drawn.
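
As a rough illustration of the mechanisms summarized above, the following Python sketch combines a tabular Q-learning update with a delayed-Q-learning-style update condition and k-NN averaging of action values over neighboring discretized states. The state discretization, the threshold eps1, the neighbor weighting, and all identifiers are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact algorithm): tabular Q-learning
# with (i) a delayed-Q-learning-style update condition and (ii) k-NN
# smoothing of action values over neighboring discretized states.
# The discretization, threshold, and neighbor metric below are assumptions.
import numpy as np

n_states, n_actions = 100, 3          # assumed discretization of Cart-Pole
alpha, gamma = 0.1, 0.99              # learning rate, discount factor
eps1 = 1e-3                           # assumed update threshold (delayed-style)
k = 5                                 # number of neighbors for k-NN averaging

Q = np.zeros((n_states, n_actions))

def knn_action_values(s):
    """Average Q over the k nearest states (here: nearest by index distance)."""
    dists = np.abs(np.arange(n_states) - s)
    neighbors = np.argsort(dists)[:k]
    return Q[neighbors].mean(axis=0)

def update(s, a, r, s_next):
    """Apply the Q-learning update only when the temporal-difference
    error exceeds eps1, mimicking delayed Q-learning's update condition."""
    target = r + gamma * Q[s_next].max()
    if abs(target - Q[s, a]) > eps1:
        Q[s, a] += alpha * (target - Q[s, a])

def select_action(s, eps_greedy=0.1):
    """Epsilon-greedy selection on the k-NN-smoothed values, intended to
    yield smoother (lower-amplitude) control signals than raw Q values."""
    if np.random.rand() < eps_greedy:
        return np.random.randint(n_actions)
    return int(np.argmax(knn_action_values(s)))
```

In this sketch the parameters (alpha, gamma, eps1, k) are fixed by hand; in the proposed methods they are instead tuned by a genetic algorithm.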