Diffusion-based Data Augmentation for Object Counting Problems
CoRR(2024)
摘要
Crowd counting is an important problem in computer vision due to its wide
range of applications in image understanding. Currently, this problem is
typically addressed using deep learning approaches, such as Convolutional
Neural Networks (CNNs) and Transformers. However, deep networks are data-driven
and are prone to overfitting, especially when the available labeled crowd
dataset is limited. To overcome this limitation, we have designed a pipeline
that utilizes a diffusion model to generate extensive training data. We are the
first to generate images conditioned on a location dot map (a binary dot map
that specifies the location of human heads) with a diffusion model. We are also
the first to use these diverse synthetic data to augment the crowd counting
models. Our proposed smoothed density map input for ControlNet significantly
improves ControlNet's performance in generating crowds in the correct
locations. Also, Our proposed counting loss for the diffusion model effectively
minimizes the discrepancies between the location dot map and the crowd images
generated. Additionally, our innovative guidance sampling further directs the
diffusion process toward regions where the generated crowd images align most
accurately with the location dot map. Collectively, we have enhanced
ControlNet's ability to generate specified objects from a location dot map,
which can be used for data augmentation in various counting problems. Moreover,
our framework is versatile and can be easily adapted to all kinds of counting
problems. Extensive experiments demonstrate that our framework improves the
counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS
datasets, showcasing its effectiveness.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要