The use of convolutional neural networks (CNNs) in many image recognition algorithms has increased significantly. The huge number of computations and the large volume of data in these networks call for high-performance accelerators in their hardware implementations, and many efficient accelerators have accordingly been proposed. In the conventional design approach, the CNN layers are processed iteratively, layer by layer. Because of the large volume of intermediate data, an accelerator following this approach must use off-chip memory to store the data passed between layers. In this work, by exploiting the dataflow across the convolutional layers, parts of the input data are stored in on-chip memory, and with an appropriate computation schedule, adjacent CNN layers are computed in a pipeline structure without the need to store intermediate data. In this approach, only the output of the last layer needs to be stored in off-chip memory. To evaluate the performance of the proposed accelerator, named the MLCP architecture, three adjacent convolutional layers are processed concurrently in a pipeline structure. The results are compared with those of the SLCP architecture, in which the computations are performed layer by layer. Both the SLCP and MLCP architectures are designed at the RTL level in Verilog HDL and implemented on a Xilinx Zynq-7000 family FPGA. The MLCP architecture achieves a 73% reduction in on-chip storage when intermediate data are kept in on-chip memory, and a 6.6-times lower off-chip memory access rate when intermediate data are kept in off-chip memory. Moreover, by applying optimization techniques and parallel computation, the throughput of the MLCP architecture is 2.7 times higher than that of the SLCP architecture. The same approach is also used to implement the first two convolutional layers of the VGG-16 network. Along with achieving a performance of 232 GOPS, this implementation reduces the number of BRAMs and the number of external memory accesses compared with traditional implementations, which increases its energy efficiency relative to other works.

Key Words: Convolutional Neural Network (CNN), Multi-Layer Processing, Pipeline Processing, Off-chip Memory Access, Hardware Implementation, FPGA.
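
To make the memory-saving idea concrete, the following sketch illustrates fused processing of two stacked 3x3 convolution layers in software terms: intermediate rows are produced and consumed through a small line buffer, so the full intermediate feature map is never materialized, mirroring the multi-layer pipeline described above. The actual implementation is RTL Verilog on an FPGA; this Python sketch is only illustrative, and the names (fused_two_layer_conv, conv3x3_row), the single-channel input, unit stride, and valid padding are simplifying assumptions rather than details from the paper.

import numpy as np

def conv3x3_row(rows, kernel):
    # One output row of a 3x3 "valid" convolution, given 3 input rows.
    w = rows.shape[1]
    return np.array([np.sum(rows[:, c:c+3] * kernel) for c in range(w - 2)])

def conv3x3(img, kernel):
    # Reference layer-by-layer convolution (materializes the full output).
    return np.stack([conv3x3_row(img[i:i+3], kernel) for i in range(img.shape[0] - 2)])

def fused_two_layer_conv(image, k1, k2):
    # Fused computation of two stacked 3x3 layers: only a 3-row line
    # buffer of the intermediate feature map is kept, instead of the
    # whole intermediate image -- the essence of multi-layer pipelining.
    line_buffer = []
    out_rows = []
    for i in range(image.shape[0] - 2):
        line_buffer.append(conv3x3_row(image[i:i+3], k1))  # produce a layer-1 row
        if len(line_buffer) == 3:
            out_rows.append(conv3x3_row(np.stack(line_buffer), k2))  # consume it in layer 2
            line_buffer.pop(0)  # evict the intermediate row no longer needed
    return np.stack(out_rows)

# The fused result matches the conventional layer-by-layer computation:
img = np.random.rand(16, 16)
k1, k2 = np.random.rand(3, 3), np.random.rand(3, 3)
assert np.allclose(fused_two_layer_conv(img, k1, k2), conv3x3(conv3x3(img, k1), k2))

In hardware, the same idea maps to per-layer line buffers in BRAM feeding a pipelined datapath, which is where the intermediate-storage and off-chip-access savings reported above come from.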