This white paper explains the Compact Descriptors for Video Analysis (CDVA) standard specification. The specification defines a descriptor that exploits the temporal redundancy in video. It also defines a descriptor component based on features extracted by a convolutional neural network (CNN), so that the standard benefits from recent progress in deep learning.