Vehicle Detection based on Spatial Saliency and Local Image Features in H.265 (HEVC) 4K Video and Evaluation Model for Quality of Detection Accuracy
Abstract
To realize a safe and secure road transportation system, intelligent transportation systems (ITS) are being widely researched. Vehicle detection technology is important for traffic optimization and management, and object detection using information obtained from images, including still images, and from sensors has been studied extensively. One of the main challenges addressed in this study is the development of vehicle detection. Most existing visual saliency models assume that the input images, in which salient objects are to be detected, are free from complex backgrounds and overlapping areas. Moreover, they are very sensitive to complex scenes and varying illumination, and they cannot detect the objects of interest in the input video. This study develops a vehicle detection method that uses spatial saliency and local image features. Scale Invariant Feature Transform (SIFT) and Harris features, combined with a spatial saliency model, play an important role in detecting vehicles in the scene. A one-to-one symmetric search is performed on the descriptors to select a set of matched interest point pairs for vehicle detection; this search is useful for detecting the object of interest in the context of saliency detection. We use 4K video of a road scene containing different types of vehicles. The proposed method is able to detect the desired overlapping objects in the road scene without the heavy computation required by training-based methods. In addition, the detection performance is compared with that of other saliency-based methods, and the proposed method performs better than these conventional methods.
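As an illustration of the matching step, the following is a minimal sketch of a one-to-one symmetric search, read here as mutual nearest-neighbour matching: a pair is kept only if each descriptor is the other's closest match. This reading, the Euclidean distance, the function names `nearest` and `symmetric_matches`, and the toy 2-D descriptors (standing in for real SIFT vectors) are all assumptions made for illustration and are not taken from the study.

```python
import math


def nearest(query, candidates):
    """Return the index of the candidate closest to query (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def symmetric_matches(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are mutual nearest neighbours."""
    if not desc_a or not desc_b:
        return []
    a_to_b = [nearest(d, desc_b) for d in desc_a]  # best match in B for each A
    b_to_a = [nearest(d, desc_a) for d in desc_b]  # best match in A for each B
    # Keep a pair only when the match holds in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical toy descriptors (2-D for readability).
set_a = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
set_b = [(0.1, 0.0), (5.0, 5.2), (0.9, 0.1), (9.0, 9.0)]

matches = symmetric_matches(set_a, set_b)
print(matches)  # [(0, 0), (1, 2), (2, 1)]
```

In this toy data the last descriptor in `set_b` has a nearest neighbour in `set_a`, but that neighbour prefers a different partner, so the pair is rejected. This is the property that makes the symmetric check stricter than a one-directional nearest-neighbour search.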
Images and videos used in Internet-based applications are typically stored in the compressed domain, in formats such as MPEG-2, H.264, and MPEG-4, because compression reduces storage space and greatly increases delivery speed for Internet users. Most systems need to transmit data to a central server and must deal with issues such as limited bandwidth and quality. Consequently, they need to transmit video of reasonably high quality in the compressed domain for further processing by vision-based systems, such as person identification, fraud detection, and vehicle detection for road monitoring. Furthermore, existing saliency detection models are implemented in the uncompressed domain, and analysis of their performance is lacking. Therefore, detecting objects of interest with conventional saliency-based methods and determining what constitutes reasonably high-quality video in the compressed domain remain challenging research issues. In this context, we analyze the proposed detection method in the compressed domain and show that it gives better results than conventional methods and single-feature-based detection.
To evaluate a vehicle detection method, it is necessary to know the correct vehicle position, which is regarded as the “ground truth”. For this reason, many detection models define the area of the target object as the area that people judge to belong to the object. In many studies, the ground truth is represented by a rectangle. We examine the relationship between Intersection over Union (IoU) and the subjective evaluation of vehicle detection when the detected position is shifted from the ground truth position. In this study, subjective evaluation experiments were carried out on misalignment from the ground truth in vehicle detection. We also investigate a subjective evaluation model with respect to far and near views in vehicle detection. The experimental results show a significant difference between leftward and rightward misalignment even when the IoU value is the same. Finally, we propose indices for vehicle detection that build on IoU and take the subjective evaluation model into account.
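To make concrete why IoU alone cannot distinguish the direction of misalignment, the sketch below computes IoU for a rectangular ground truth and for detections shifted left and right by the same amount. The box format `(x_min, y_min, x_max, y_max)`, the helper names `iou` and `shift`, and the pixel coordinates are hypothetical choices for illustration; they do not reproduce the experimental setup or the proposed indices.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift(box, dx, dy=0.0):
    """Translate a box by (dx, dy) without changing its size."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


# Hypothetical ground-truth rectangle: 100 px wide, 60 px high.
ground_truth = (100.0, 100.0, 200.0, 160.0)

iou_left = iou(ground_truth, shift(ground_truth, -20.0))
iou_right = iou(ground_truth, shift(ground_truth, 20.0))
print(iou_left, iou_right)  # both equal 4800 / 7200, about 0.667
```

Because IoU depends only on the overlapping and combined areas, equal shifts in opposite directions give identical values. Any index meant to reflect the left/right difference observed in the subjective experiments therefore needs information beyond the IoU value itself.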