Research on application of a Faster R-CNN based on upper and lower layers in face detection

Jiang Lin (姜 琳), University of Toyama (富山大学)



Face detection technology is now widely used in fields such as security, campus management and finance. As society progresses and the technology matures, it will inevitably be applied in more fields. However, the human face is a very common yet complex pattern that carries a large amount of information. Distinguishing faces from other objects against complex backgrounds is challenging, and detection becomes harder still with variations in proportions, pose, facial expression, lighting, image quality, age, occlusion and other factors. Therefore, to be robust, a detection algorithm must account for the interference that the various complex backgrounds of faces can introduce.

 Faster R-CNN is a region-based convolutional network method for object detection. It improves detection efficiency and accuracy by using deep convolutional networks to extract features from, and classify, the objects to be detected. Compared with traditional face detection techniques, Faster R-CNN uses Region of Interest pooling (ROI pooling) so that the network shares computation across candidate regions, which speeds up the model. Traditional CNN structures maintain a degree of translation and rotation invariance with respect to specific locations in the input image. However, this spatial invariance holds only for local regions of the input; overall spatial rotational invariance across the superimposed local regions cannot be achieved for the entire image. The reason is that the pooling layer in a CNN has several limitations: much useful information is lost during feature extraction, and pooling operates only locally on its input. The intermediate feature maps of the CNN are therefore heavily distorted, which makes it difficult for CNNs to handle spatial transformations such as image rotation and scaling. In addition, the feature maps produced during feature extraction are not a global transformation of the input data, which is a further restriction. When the training face dataset is large, a large number of candidate regions are generated and occupy disk space. When the candidate regions are passed to the CNN, they are normalised in advance, which causes a loss of information. Each candidate region is fed through the network separately, so the same features are extracted repeatedly and resources are wasted.
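
 The idea of sharing computation through ROI pooling can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function name `roi_max_pool`, the single-channel feature map and the end-exclusive box convention are assumptions made for illustration. The feature map is computed once, and each candidate region is then pooled from it to a fixed size instead of being passed through the network again.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a shared feature map to a fixed output size.

    feature_map: 2-D array (H, W), computed once for the whole image.
    roi: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive
         (an assumed convention for this sketch).
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size

    # Split the region into an out_h x out_w grid of bins; bins may differ in size.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)

    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guarantee every bin covers at least one cell.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max()
    return pooled


# One shared feature map, two candidate regions of different sizes.
shared = np.arange(36).reshape(6, 6)
for box in [(0, 0, 4, 4), (2, 1, 6, 6)]:
    print(box, roi_max_pool(shared, box).tolist())
```

 Both regions yield a 2 x 2 output although their sizes differ, which is what allows a fixed-size classifier head to follow the pooling step.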

 The network in this paper improves on the traditional Faster R-CNN: it proposes a multi-scale Faster R-CNN method based on upper and lower layers that can robustly detect small targets at different scales, in different poses and in different environments. Experimental results show that the multi-scale Faster R-CNN based on upper and lower layers achieves better detection performance than Faster R-CNN at the same testing cost. Compared with current state-of-the-art face detection methods, the improved method in this paper can accurately detect faces with different poses in various environments.

 The contributions and originality of this paper are summarised as follows.
 (1) A spatial affine transform is proposed for the first time to improve the detection capability of the original Faster R-CNN. The spatial affine transform identifies face parts by detecting meaningful regions in the image, which improves the original network's detection of small face parts. The experimental results also verify that the proposed network, UPL-RCNN, detects faces more effectively than other networks.
 (2) A method combining the upper and lower layers is proposed. The upper layer adopts the bionic transformation strategy and the lower layer adopts the feature region strategy. This enables the original network to robustly detect small targets at different scales, in different poses and in different environments.
 (3) The bionic spatial transform uses feature fusion to enhance the continuity of the action, which further improves face recognition.
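
 To make the notion of a spatial affine transform on a feature map concrete, the following NumPy sketch resamples a 2-D map under a 2 x 3 affine matrix using bilinear interpolation. The paper does not specify its transform at this level of detail, so this is a generic, spatial-transformer-style illustration under stated assumptions, not the method proposed here: the function name `affine_warp`, the normalised [-1, 1] coordinate convention and the example matrices are all hypothetical.

```python
import numpy as np


def affine_warp(feature_map, theta, out_shape):
    """Resample a 2-D feature map under a 2x3 affine matrix `theta`.

    Coordinates are normalised to [-1, 1]; `theta` maps each output
    location to the input location it is sampled from (assumed convention).
    """
    h, w = feature_map.shape
    out_h, out_w = out_shape

    # Regular grid of output coordinates in homogeneous form, shape (3, N).
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])

    # Affine map to source coordinates, then convert to pixel units.
    src_x, src_y = np.asarray(theta, dtype=float) @ grid
    px = np.clip((src_x + 1) * (w - 1) / 2, 0, w - 1)
    py = np.clip((src_y + 1) * (h - 1) / 2, 0, h - 1)

    # Bilinear interpolation between the four neighbouring cells.
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = px - x0, py - y0
    top = feature_map[y0, x0] * (1 - wx) + feature_map[y0, x1] * wx
    bottom = feature_map[y1, x0] * (1 - wx) + feature_map[y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).reshape(out_h, out_w)


fmap = np.arange(16, dtype=float).reshape(4, 4)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # leaves the map unchanged
zoom = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]      # samples the central half at full resolution
print(affine_warp(fmap, identity, (4, 4)))
print(affine_warp(fmap, zoom, (4, 4)))
```

 In this sketch the zoom matrix enlarges the central region of the map, which is one way an affine transform could give a small face region more spatial resolution before further processing.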



