De-identification is the process of removing all identifying information about a person from an image or video, while preserving as much information as possible about the action and its context. De-identification and recognition are thus opposite processes. Identifying information captured on video can include the face, silhouette, posture, and gait. This presentation on de-identification discusses a general framework that protects the privacy of individuals while still conveying a sufficient feel for the human activities in the space being imaged. It is easy to hide the identity of individuals by replacing a conservative area around them with, say, black pixels.
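The naive blacking-out approach can be sketched in a few lines; the box coordinates and function name below are illustrative, not taken from the source.

```python
import numpy as np

def black_out(frame, box):
    """Naive de-identification: set every pixel inside a conservative
    bounding box (x, y, w, h) around the person to black (0)."""
    x, y, w, h = box
    out = frame.copy()          # leave the original frame untouched
    out[y:y + h, x:x + w] = 0   # zero the rectangle
    return out
```

Applied per frame, this hides identity completely, but it also destroys all information about the action inside the box, which motivates the gentler transformations discussed later.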
Different Scenarios and De-identification
- Casual videos: captured for other purposes and then shared.
- Public surveillance videos: from cameras watching public spaces such as airports, streets, and stores.
- Private surveillance videos: from cameras placed at the entrances of semi-private spaces such as offices.
There are different criteria for de-identification. The face plays a dominant role in both automatic and manual identification. The body silhouette and gait are important cues available in videos, along with race and gender. De-identification can be subverted or "attacked" to reveal the identity of the individuals involved. Reversing the de-identification transformation is the most obvious line of attack. Recognizing persons from the face, silhouette, gait, and so on is being actively pursued in computer vision. Manual identification is another way to subvert de-identification, though it is considerably more expensive. Brute-force verification is a further way to attack a de-identified video.
Storage of Videos
The de-identification should be selectively reversible when needed. In one approach, only the transformed video is transmitted or recorded. Another approach is to store the original video, under sufficiently strong encryption, alongside the de-identified video.
The system comprises three modules: Detect and Track, Segmentation, and De-identification. The first step is to detect the presence of a person in the scene; a patch-based recognition approach is used for object tracking. The output of the human detector becomes the input to the tracking module. The detector is re-run every F frames, and the value of F depends on the amount of movement in the video.
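The detector and patch-based tracker themselves are not described here, so the sketch below only illustrates the every-F-frames control flow; `detect_person` is a stub and the carry-forward of the last box stands in for real patch tracking.

```python
def detect_person(frame):
    """Stub detector: returns a bounding box (x, y, w, h) or None.
    A real system would run a trained human detector here."""
    return (10, 20, 32, 64)  # placeholder box for illustration

def track(frames, F=5):
    """Run the detector every F frames; on intermediate frames, reuse
    the last box (a real system would update it by patch tracking)."""
    boxes = []
    box = None
    for i, frame in enumerate(frames):
        if i % F == 0:       # re-detect every F-th frame
            box = detect_person(frame)
        boxes.append(box)
    return boxes
```

A small F suits videos with fast movement (frequent re-detection); a large F suffices when the scene is nearly static.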
The bounding boxes of the human in every frame are stacked across time to generate a video tube of the person. Multiple video tubes are formed if there are multiple people in the video. The video space is first divided into fixed voxels of size (X × Y × T). Segmentation assigns each voxel a label: 1 for foreground and 0 for background. The energy E associated with the graph takes the usual graph-cut form of a data term plus a smoothness term.
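The notes omit the actual expression for E. A standard graph-cut segmentation energy, which a two-label formulation like this typically uses (the symbols D, V, λ, and the neighborhood set N below are the conventional ones, not taken from the source), is:

```latex
E(f) \;=\; \sum_{p} D_p(f_p) \;+\; \lambda \sum_{(p,q) \in \mathcal{N}} V_{p,q}(f_p, f_q)
```

Here f_p ∈ {0, 1} is the label of voxel p, D_p measures how poorly the label fits the voxel's appearance, and V_{p,q} penalizes neighboring voxels that take different labels, with λ balancing the two terms. Minimizing E with a min-cut/max-flow algorithm yields the foreground/background labeling.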
There are two de-identification transformations: exponential blur of the pixels of a voxel, and line integral convolution (LIC). In the exponential blur, all neighbouring voxels of a foreground voxel within distance a participate in de-identification. The parameter a controls the amount of de-identification: the larger the value of a, the stronger the de-identification.
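A minimal sketch of the exponential blur, assuming weights that fall off as exp(−d/a) with spatial-temporal distance d (the exact weighting function is not given in the notes):

```python
import numpy as np

def exponential_blur(video, fg_mask, a):
    """Replace each foreground voxel with an exponentially weighted
    average of the voxels within distance `a` of it.
    video:   float array of shape (X, Y, T)
    fg_mask: boolean array of the same shape (1 = foreground)
    a:       blur radius; larger a => stronger de-identification."""
    X, Y, T = video.shape
    out = video.astype(float).copy()
    r = int(np.ceil(a))
    for x, y, t in zip(*np.nonzero(fg_mask)):
        acc, wsum = 0.0, 0.0
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dt in range(-r, r + 1):
                    d = np.sqrt(dx * dx + dy * dy + dt * dt)
                    if d > a:
                        continue  # only neighbours within distance a
                    xx, yy, tt = x + dx, y + dy, t + dt
                    if 0 <= xx < X and 0 <= yy < Y and 0 <= tt < T:
                        w = np.exp(-d / a)  # assumed weight fall-off
                        acc += w * video[xx, yy, tt]
                        wsum += w
        out[x, y, t] = acc / wsum
    return out
```

Only foreground voxels are rewritten, so the background scene stays sharp while the person is progressively smeared as a grows.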
Line integral convolution (LIC) is a technique for imaging a vector field on a texture. For each vector in the field, a long and narrow filter kernel is generated whose direction is tangential to the vector and whose length is 2L. LIC distorts the boundaries of the person, which tends to obfuscate silhouettes. The LIC used here employs a saddle-shaped vector field.
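A bare-bones LIC can be sketched as follows: for each pixel, trace a streamline L steps forward and backward along the field and average the texture values visited, giving a kernel of total length 2L. The saddle field (vx, vy) = (x − cx, −(y − cy)) below is one simple example of the saddle shape mentioned above; it is an assumption, not the field from the source.

```python
import numpy as np

def lic(texture, vx, vy, L=10):
    """Minimal line integral convolution: average the texture along a
    streamline of length 2L centred at each pixel of the field (vx, vy)."""
    H, W = texture.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            acc, n = 0.0, 0
            for sign in (1.0, -1.0):      # trace forward, then backward
                y, x = float(i), float(j)
                for _ in range(L):
                    yi, xi = int(round(y)), int(round(x))
                    if not (0 <= yi < H and 0 <= xi < W):
                        break             # streamline left the image
                    acc += texture[yi, xi]
                    n += 1
                    u, v = vx[yi, xi], vy[yi, xi]
                    norm = np.hypot(u, v)
                    if norm < 1e-9:
                        break             # stagnation point of the field
                    x += sign * u / norm  # unit step along the field
                    y += sign * v / norm
            out[i, j] = acc / max(n, 1)
    return out

def saddle_field(H, W):
    """An illustrative saddle-shaped vector field centred on the image."""
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    return xs - cx, -(ys - cy)
```

Smearing pixel values along these curved streamlines is what bends and breaks the person's outline, so the silhouette no longer matches its original shape.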