The audio tracks are in the background. v1 usually indicates a lower-pitched voice, and v2 a higher-pitched version.
I hope you like it!