Purpose/Objective: Online image-guided radiotherapy (IGRT) requires rapid image processing. Fast parallel computation on graphics processing units (GPUs) has recently become more accessible through general-purpose programming interfaces. We present a GPU implementation of the Horn and Schunck method for deformable registration of 4DCT lung acquisitions to exemplify the use of GPUs in IGRT.

Materials/Methods: The registration is evaluated on the POPI-model acquired at the Léon Bérard Cancer Center, France. The model consists of thorax CT image series (482×360×141 voxels, voxel size 0.98×0.98×2.0 mm³) from 10 respiration phases in a free-breathing volunteer, with 41 anatomical landmark points identified in each image series. The registration method is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm, based on the CUDA framework from Nvidia.

Results: On an Intel Core 2 CPU at 2.4 GHz, each registration took 30 minutes. On an Nvidia GeForce 8800 GTX GPU in the same machine, the same registration took 37 seconds, making the GPU version 48.7 times faster. The nine remaining respiration phases were each registered to the same reference image (full inhale). Accuracy was evaluated from landmark distances before and after deformable registration. Before registration, the mean landmark distance was 3.5 mm ± 2.0 mm (max = 14.0 mm). After registration, it was 1.1 mm ± 0.6 mm (max = 3.6 mm), which is well below the slice thickness of 2 mm.

Conclusion: Owing to its parallel architecture, the GPU reduced the registration time from 30 minutes to 37 seconds. Considering the slice spacing, we find the registration result acceptable. The accuracy is comparable to previous results for the Demons algorithm on the POPI-model (Vandemeulebroucke et al., ICCR 2007). The processing power of GPUs can be applied to many image processing tasks in IGRT, making the GPU a useful and cost-efficient tool on the way towards online IGRT.
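
To illustrate why the Horn and Schunck method lends itself to a parallel architecture, the following Python sketch runs a Jacobi-style version of the 3D update on the CPU. It is not the CUDA implementation described above: it is a minimal single-resolution sketch, and the function names, the 6-neighbour average, the edge replication at the borders, the regularisation weight `alpha` and the iteration count are all assumptions made for illustration. In this form, each voxel's new displacement depends only on values from the previous iteration, so the innermost loop body is the kind of work that could be assigned to one GPU thread per voxel.

```python
# Illustrative CPU sketch of a single-resolution 3D Horn and Schunck update.
# All names and defaults are hypothetical; this is not the authors' CUDA code.

OFFSETS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def clamp(i, n):
    """Clamp index i to [0, n - 1], which replicates the edge voxels."""
    return max(0, min(n - 1, i))


def shape_of(vol):
    """Return (nz, ny, nx) of a volume stored as nested lists vol[z][y][x]."""
    return len(vol), len(vol[0]), len(vol[0][0])


def zeros(shape):
    """Allocate a volume of zeros with the given (nz, ny, nx) shape."""
    nz, ny, nx = shape
    return [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]


def neighbour_mean(field, z, y, x):
    """Mean of the six face neighbours (assumed smoothing stencil)."""
    nz, ny, nx = shape_of(field)
    total = 0.0
    for dz, dy, dx in OFFSETS:
        total += field[clamp(z + dz, nz)][clamp(y + dy, ny)][clamp(x + dx, nx)]
    return total / len(OFFSETS)


def derivative(img, z, y, x, axis):
    """Central difference along axis (0=z, 1=y, 2=x), one-sided at borders.

    Assumes every dimension has at least two voxels.
    """
    shape = shape_of(img)
    hi, lo = [z, y, x], [z, y, x]
    hi[axis] = clamp(hi[axis] + 1, shape[axis])
    lo[axis] = clamp(lo[axis] - 1, shape[axis])
    diff = img[hi[0]][hi[1]][hi[2]] - img[lo[0]][lo[1]][lo[2]]
    return diff / (hi[axis] - lo[axis])


def horn_schunck_3d(fixed, moving, alpha=1.0, iterations=60):
    """Estimate a displacement field (u, v, w) along (x, y, z), in voxels."""
    shape = shape_of(fixed)
    nz, ny, nx = shape
    u, v, w = zeros(shape), zeros(shape), zeros(shape)
    for _ in range(iterations):
        new_u, new_v, new_w = zeros(shape), zeros(shape), zeros(shape)
        for z in range(nz):
            for y in range(ny):
                for x in range(nx):
                    # Everything below reads only the previous iterate,
                    # so voxels can be updated independently of each other.
                    ix = derivative(fixed, z, y, x, 2)
                    iy = derivative(fixed, z, y, x, 1)
                    iz = derivative(fixed, z, y, x, 0)
                    it = moving[z][y][x] - fixed[z][y][x]
                    ub = neighbour_mean(u, z, y, x)
                    vb = neighbour_mean(v, z, y, x)
                    wb = neighbour_mean(w, z, y, x)
                    # Residual of the linearised brightness-constancy
                    # constraint, damped by the regularisation weight.
                    common = (ix * ub + iy * vb + iz * wb + it) / (
                        alpha ** 2 + ix * ix + iy * iy + iz * iz
                    )
                    new_u[z][y][x] = ub - ix * common
                    new_v[z][y][x] = vb - iy * common
                    new_w[z][y][x] = wb - iz * common
        u, v, w = new_u, new_v, new_w
    return u, v, w
```

As a sanity check on a toy input, if `fixed` is a linear intensity ramp along x and `moving` is the same ramp minus 0.5, the estimated x-displacement converges towards 0.5 voxels while the y and z components remain zero. A multi-resolution scheme like the one described above would wrap such an update in a coarse-to-fine loop, which this sketch omits.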