THESIS
2020
Abstract
Steganography has been widely studied in the computer vision community for digital watermarking
or secret messaging. Recently, various deep learning-based approaches have been
proposed for the steganography task. In terms of capacity, however, existing methods
cannot hide large data types such as video content because of its high bitrate. In this thesis, we
push this limit by investigating methods for hiding video content inside audio files. To address this
novel cross-modal steganography task, we first introduce several baseline models built on existing
methods. Our evaluation, however, reveals clear limitations in these baselines.
To mitigate these limitations, we devise a new optimization-based method named HvFO
that improves not only the reconstructed video quality but also the fidelity of the embedded audio.
HvFO builds on recent advances in flow-based generative models, which enable an effective mapping
of audio to latent codes, with further optimization so that nearby codes correspond to perceptually
similar signals. We show that compressed video data can be concealed in the latent codes
of audio sequences while preserving the fidelity of both the hidden video and the original audio.
We can embed 128x128 video inside audio of the same duration, or higher-resolution video inside
longer audio sequences. Quantitative experiments show that our approach outperforms relevant
baselines in steganographic capacity and fidelity.