Abstract: Person re-identification is the task of retrieving a specified target person across multiple data sources. Because infrared (IR) and visible-light (VIS) images differ substantially in appearance, cross-modal retrieval between the two is one of the main challenges. To retain retrieval ability in low light or at night, matching must incorporate cross-modal modeling of infrared images. In this paper, we propose a new method that guides attention using human keypoints: global features are split into local features under keypoint guidance, and the original model is then retrained with the generated local masks to strengthen its attention to different local information. With this method, the model can better understand and exploit the key regions of an image, thereby improving the accuracy of person re-identification.
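
To make the idea of keypoint-guided local masks concrete, the following is a minimal sketch rather than the method as implemented in the paper. The abstract does not say how the masks are built or applied, so the Gaussian mask shape, the `sigma` value, the mask-weighted average pooling, and the names `keypoint_masks` and `local_features` are all illustrative assumptions. The retraining step is not shown.

```python
import numpy as np


def keypoint_masks(keypoints, height, width, sigma=2.0):
    """Build one soft Gaussian mask per keypoint (assumed mask shape).

    keypoints: iterable of (x, y) positions in feature-map coordinates.
    Returns an array of shape (K, height, width) with values in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    masks = []
    for x, y in keypoints:
        dist_sq = (xs - x) ** 2 + (ys - y) ** 2
        masks.append(np.exp(-dist_sq / (2.0 * sigma ** 2)))
    return np.stack(masks)


def local_features(feature_map, masks, eps=1e-8):
    """Split a global feature map into local descriptors.

    feature_map: (C, H, W) global feature map.
    masks: (K, H, W) soft masks, one per keypoint.
    Returns (K, C): a mask-weighted average of the features per keypoint.
    """
    weighted = np.einsum("chw,khw->kc", feature_map, masks)
    return weighted / (masks.sum(axis=(1, 2))[:, None] + eps)


# Toy example: a 4-channel 8x6 feature map and three hypothetical keypoints.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 8, 6))
keypoints = [(3, 1), (3, 4), (2, 7)]  # (x, y), e.g. head, torso, legs
masks = keypoint_masks(keypoints, height=8, width=6)
parts = local_features(feature_map, masks)
print(masks.shape, parts.shape)
```

In this sketch each keypoint yields one local descriptor. A soft mask is used instead of a hard crop so that nearby context still contributes, which is one plausible design choice among several.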