Improvements in fabrication technology have resulted in exponential growth in FPGA size and complexity in recent years. One of the most critical stages in the FPGA CAD flow is block placement, which places the logic elements on the FPGA surface so as to shorten the wire length and the time delay of the circuit. In most design tools, annealing methods are widely chosen because of their superior quality of results and robustness. In terms of desktop computing power, technology improvements have increased the number of parallel cores, which makes potentially significant speed-ups achievable through parallel programming techniques. In recent years, General Purpose computing on Graphics Processing Units (GPGPU) has offered a promising way to improve runtime using only commodity hardware. Therefore, converting the serial version of an algorithm into a parallel one that runs on multiple threads is a suitable way to gain a high speed-up. In this thesis we studied the VPR tool, which is one of the most popular FPGA placement and routing tools in research and industry. The VPR placement algorithm is based on simulated annealing, which is inherently serial; as a result, finding a parallel method that achieves a good speed-up on multiple threads while maintaining exactly the same deterministic result as the serial version is difficult. On the other hand, a fully parallel implementation using completely independent annealing moves would give a large speed-up, but its quality would be drastically worse than that of the serial version. Therefore, our goal was to overcome this serial nature by using multiple threads concurrently to obtain a proper speed-up without a significant reduction in the quality of result.
We proposed and analyzed four parallel methods and simulated them on a CPU in a way that mimics execution on a SIMD device architecture such as a GPGPU with hundreds of concurrent threads. The parallel moves method, which distributes the swapping function among the threads, can yield a good speed-up on a limited number of threads, but the quality may degrade as the number of threads increases. A method based on speculative computation starts a new move using the predicted result of the current move; the upper bound of its speed-up was limited because it depends on the move acceptance rate in VPR. The third method, called the average coordination of the blocks, failed due to irresolvable block conflicts. It tried to derive the new placement by gathering the results of all of

Key Words: Parallelizing, FPGA, Placement, VPR, Simulated Annealing