This work extends our preliminary work presented at the IEEE ICIP’14 conference and was published in 2016 as a CSVT journal paper.
Principle
As digital content spreads over public networks, concerns arise about its protection against various unwanted uses: undesired viewing, modification, reproduction, etc. Among the different research tracks, Selective Encryption (SE) methods offer a robust and efficient scheme for video protection. These methods exploit a detailed understanding of the encoding and decoding processes to encrypt only a selected part of the stream.
Selective Encryption does not aim at protecting the privacy of the whole message. Instead, the objective is to modify the message content in such a way that an unauthorized receiver (i.e., here, a standard decoder for the source video) can still decode any ciphered stream normally. Only the displayed content is distorted and made unintelligible by SE. A good example of a syntax element that can be validly modified in the High Efficiency Video Coding (HEVC) standard is the sign of Motion Vectors (MV). Since it is encoded as a single bin in the by-pass mode of CABAC, its modification has no effect on the decoding process (apart from the inversion of the MV). On the other hand, modifying the block type of a Coding Unit (CU) will often result in a stream that is not format compliant. Indeed, a decoder processing a CU whose type is set to Inter instead of Intra will expect some motion information. This data does not exist, and the mismatch will break the decoding process.
The objective of SE techniques is thus to identify all (or a selected subset) of the stream elements that can be scrambled without disrupting a standard decoding process.
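The difference between a scramblable element and a structure-breaking one can be sketched with a toy bitstream. The format below is purely hypothetical and far simpler than HEVC: each unit starts with one type bit, followed by 2 mode bits for an intra unit or by a sign bit and 3 magnitude bits for an inter unit. The names `parse_units` and `flip` are illustrative only.

```python
from typing import List, Tuple

INTRA, INTER = 0, 1  # hypothetical one-bit unit types (not HEVC syntax)


def parse_units(bits: List[int]) -> List[Tuple[str, int]]:
    """Parse the toy stream; raise ValueError if a unit is truncated."""
    units, pos = [], 0
    while pos < len(bits):
        unit_type = bits[pos]
        pos += 1
        # The payload length depends on the unit type, as in a real codec.
        n_payload = 2 if unit_type == INTRA else 4
        if pos + n_payload > len(bits):
            raise ValueError("truncated unit: stream is not format compliant")
        payload = bits[pos:pos + n_payload]
        pos += n_payload
        if unit_type == INTRA:
            units.append(("intra", payload[0] * 2 + payload[1]))
        else:
            sign = payload[0]
            magnitude = payload[1] * 4 + payload[2] * 2 + payload[3]
            units.append(("inter", -magnitude if sign else magnitude))
    return units


def flip(bits: List[int], index: int) -> List[int]:
    """Return a copy of the stream with one bit inverted."""
    out = list(bits)
    out[index] ^= 1
    return out


# One inter unit (motion value +5) followed by one intra unit (mode 3).
stream = [1, 0, 1, 0, 1,  0, 1, 1]

print(parse_units(stream))           # original content
print(parse_units(flip(stream, 1)))  # sign bit flipped: still parses, value negated
try:
    parse_units(flip(stream, 5))     # type bit flipped: parser expects more bits
except ValueError as err:
    print("decoding failed:", err)
```

Flipping the sign bit only changes the decoded value, whereas flipping the type bit changes how many bits the parser expects next, which is the kind of mismatch described above.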
The proposed extension
One constraint set in the literature for a proper encryption of CABAC-encoded streams is the restriction of the Encryption Space (ES) to bins coded in by-pass (or model-free) mode only ([1], [2], [3], [4], [5]). These bins neither use nor update probability contexts during arithmetic coding. Thus, their modification does not cause any desynchronization between encoder and decoder.
Some syntax elements fall into this first category of by-pass encoded codewords. We can thus consider the following syntax elements as possible candidates for encryption: sign and level of the Motion Vector (MV) difference, MV reference index, MV prediction index, sign and level of residual information, and SAO-related codewords. Note that some additional constraints may be needed in order to ensure the exact preservation of the source format. Nevertheless, this list can be considered the state-of-the-art (SoA) Encryption Space (ES) [4].
In our recent work [6], we propose an extension of the ES through the inclusion of a new cipherable syntax element: the prediction modes of intra blocks/units. More generally, we provide a theoretical description of CABAC regular mode encryption, based on a specific monitoring of CABAC context modeling.
In doing so, we showed that encrypting these prediction modes significantly increases the structural deterioration of video content (especially on I-frames), providing a solid complement to previous SE schemes relying on residual encryption (mainly affecting the texture of objects) and motion vector encryption (disrupting the movement of objects). With the aim of fully hiding video content, the proposed scheme shows both numerical and visual improvements over state-of-the-art by-pass-only encryption schemes.
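As a minimal sketch of this constraint, the snippet below XORs a keystream onto by-pass bins only and leaves regular (context-coded) bins untouched. Several things here are assumptions made for illustration: bins are represented as `(value, is_bypass)` pairs before arithmetic coding, the keystream comes from SHA-256 in counter mode as a stand-in for whatever cipher a real system would use, and the additional format constraints that some syntax elements require are ignored.

```python
import hashlib
from typing import Iterator, List, Tuple

Bin = Tuple[int, bool]  # (bin value, True if coded in by-pass mode)


def keystream_bits(key: bytes) -> Iterator[int]:
    """Yield pseudo-random bits from SHA-256 in counter mode (illustrative stand-in)."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(8):
                yield (byte >> shift) & 1
        counter += 1


def scramble_bypass_bins(bins: List[Bin], key: bytes) -> List[Bin]:
    """XOR the keystream onto by-pass bins; regular bins pass through unchanged."""
    stream = keystream_bits(key)
    out = []
    for value, is_bypass in bins:
        if is_bypass:
            value ^= next(stream)  # one keystream bit consumed per by-pass bin
        out.append((value, is_bypass))
    return out


bins = [(1, False), (0, True), (1, True), (0, False), (1, True)]
encrypted = scramble_bypass_bins(bins, b"demo-key")
decrypted = scramble_bypass_bins(encrypted, b"demo-key")  # XOR is its own inverse
print(decrypted == bins)
```

Because regular bins are never altered, the context models evolve identically on both sides, and the number of bins is unchanged. Applying the same function with the same key restores the original bins.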
Efficient decomposition of the ES
In order to correctly understand the effect of scrambling particular syntax elements in the compressed domain, we propose the following decomposition of the ES:
– Motion Encryption Domain (SoA): Sign and level of motion vector difference, MV reference index, MV prediction index.
– Texture Encryption Domain (SoA): Sign and level of residual information, SAO related codewords.
– Structure Encryption Domain (proposed): Luma prediction modes for intra blocks/units.
One can easily note that motion encryption (gathering all MVD-related codewords) has no effect on I-frames, which limits SoA approaches to texture encryption in these frames. We thus show in this section that the inclusion of structure encryption significantly improves the visual scrambling of I-frames. In addition, we highlight the fact that structure encryption performs well throughout the GOP, contributing to the concealment of structural content, which remained one of the weak points of the SoA results.
Numerical results use the metrics developed in [7]: the Edge Similarity Score (ESS) and the Luminance Similarity Score (LSS). Specifically designed for partially encrypted content, these metrics allow a more precise analysis than PSNR/SSIM scores.
The first thing we observe is that motion encryption has very little effect on scrambling efficiency compared to texture and structure encryption, and absolutely no effect on I-frames (GOP of 16 frames in the experiments shown). The second observation is that structure encryption achieves better ESS scores (related to edges) while performing less well in terms of LSS (related to luminance). This result follows directly from the theoretical effect of each encryption domain: texture encryption modifies the residual data, which has a great impact on the average luminance of each block. Conversely, structure encryption mainly modifies the direction of edges, whilst preserving the color components of each block.
The visual results in this section confirm that the deterioration of structural content (lines on the floor, background, player outlines…) is mainly the result of the proposed encryption of luma prediction modes as a structural encryption domain. Our approach thus provides a solid complement to the two SoA encryption domains, allowing any video stream to be protected more efficiently.
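The decomposition above, together with the remark about I-frames, can be written down as a small lookup table. This is only an illustrative data structure: the key names, the `requires_inter_prediction` flag and the function `effective_domains` are hypothetical, and the element labels are descriptive strings rather than syntax element names from the standard.

```python
from typing import Dict, List

# Each encryption domain, its origin, and whether it relies on inter prediction.
ENCRYPTION_SPACE: Dict[str, dict] = {
    "motion": {
        "origin": "SoA",
        "elements": ["MVD sign", "MVD level", "MV reference index", "MV prediction index"],
        "requires_inter_prediction": True,
    },
    "texture": {
        "origin": "SoA",
        "elements": ["residual sign", "residual level", "SAO codewords"],
        "requires_inter_prediction": False,
    },
    "structure": {
        "origin": "proposed",
        "elements": ["intra luma prediction mode"],
        "requires_inter_prediction": False,
    },
}


def effective_domains(frame_type: str) -> List[str]:
    """List the domains that can affect a frame of the given type ('I' or 'P')."""
    if frame_type not in ("I", "P"):
        raise ValueError(f"unsupported frame type: {frame_type!r}")
    return [
        name
        for name, domain in ENCRYPTION_SPACE.items()
        # I-frames carry no motion data, so inter-only domains are skipped.
        if frame_type != "I" or not domain["requires_inter_prediction"]
    ]


print(effective_domains("I"))  # ['texture', 'structure']
print(effective_domains("P"))  # ['motion', 'texture', 'structure']
```

The table only encodes which domains can act on a frame type. How strongly each one degrades a given P-frame depends on the content, for instance on how many intra-coded units the frame contains.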
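The reasoning above can be illustrated on a single 4×4 block. The sketch below is a toy model, not HEVC intra prediction: it assumes only two prediction modes (pure vertical and pure horizontal copies of the neighbouring samples), a constant residual, and hand-picked neighbour values. Structure encryption is modelled as swapping the prediction mode, and texture encryption as flipping the residual sign.

```python
from typing import List, Tuple

Block = List[List[int]]


def predict(mode: str, top: List[int], left: List[int]) -> Block:
    """Toy intra prediction: copy the top row downwards or the left column across."""
    n = len(top)
    if mode == "vertical":
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        return [[left[r]] * n for r in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def reconstruct(pred: Block, residual: Block) -> Block:
    """Add the residual to the prediction, sample by sample."""
    return [[p + e for p, e in zip(p_row, e_row)] for p_row, e_row in zip(pred, residual)]


def mean(block: Block) -> float:
    """Average luminance of the block."""
    return sum(map(sum, block)) / (len(block) * len(block[0]))


def gradient_energy(block: Block) -> Tuple[int, int]:
    """Sum of absolute differences along rows (gx) and along columns (gy)."""
    gx = sum(abs(row[c + 1] - row[c]) for row in block for c in range(len(row) - 1))
    gy = sum(abs(block[r + 1][c] - block[r][c])
             for r in range(len(block) - 1) for c in range(len(block[0])))
    return gx, gy


top = left = [60, 100, 60, 100]            # assumed neighbouring samples
residual = [[20] * 4 for _ in range(4)]    # assumed constant residual
flipped = [[-e for e in row] for row in residual]

original = reconstruct(predict("vertical", top, left), residual)
structure_enc = reconstruct(predict("horizontal", top, left), residual)  # mode changed
texture_enc = reconstruct(predict("vertical", top, left), flipped)       # sign flipped

for name, block in [("original", original), ("structure", structure_enc), ("texture", texture_enc)]:
    print(name, mean(block), gradient_energy(block))
```

In this toy case the mode swap keeps the mean luminance at 100 but moves all gradient energy from one direction to the other, while the sign flip keeps the edge pattern and shifts the mean from 100 to 60. In a real decoder a modified prediction mode also propagates to neighbouring blocks, which this sketch does not model.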
1. Numerical comparison of SE efficiency for each encryption domain over a few GOPs.
2. Visual comparison of SE efficiency for each encryption domain.
from left to right: original, motion, texture and structure encryption domains
from top to bottom: 1st (I) and 10th (P) frame of BBpass
1. Comparison of the overall SE efficiency on a zoomed part of some frames of the BBpass sequence.
from left to right: original, state of the art, proposed encryption
from top to bottom: 1st (I), 5th (P), 10th (P) and 15th (P) frame of BBpass
2. Comparison of the overall SE efficiency on the 1st (I) frame of the Stefan sequence.
from left to right: original, state of the art, proposed encryption
[1] – Z. Shahid, W. Puech, and M. Chaumont, “Fast protection of H.264/AVC by selective encryption of CABAC for I and P frames,” in European Signal Processing Conference, Glasgow, Scotland, 2009.
[2] – M. Asghar and M. Ghanbari, “An efficient security system for CABAC bin-strings of H.264/SVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 3, pp. 425–437, June 2013.
[3] – F. Dufaux and T. Ebrahimi, “H.264/AVC video scrambling for privacy protection,” in IEEE ICIP, San Diego, CA, United States, Oct. 2008.
[4] – Z. Shahid and W. Puech, “Visual protection of HEVC video by selective encryption of CABAC binstrings,” IEEE Transactions on Multimedia, vol. 16, no. 1, pp. 24–36, Jan. 2014.
[5] – G. Van Wallendael, A. Boho, J. De Cock, A. Munteanu, and R. Van de Walle, “Encryption for High Efficiency Video Coding with video adaptation capabilities,” IEEE Transactions on Consumer Electronics, vol. 59, no. 3, pp. 634 – 642, Aug. 2013.
[6] – B. Boyadjis, C. Bergeron, B. Pesquet-Popescu, and F. Dufaux, “Extended Selective Encryption of H.264/AVC (CABAC) and HEVC encoded video streams,” IEEE Transactions on Circuits and Systems for Video Technology, no. 99, 2015.
[7] – Y. Mao and M. Wu, “A joint signal processing and cryptographic approach to multimedia encryption,” IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 2061–2075, July 2006.