Averaging length is measured in frames, so one second is about 30 frames. The blue line is the raw sound, the orange line is the smoothed curve, the white line is the average sound level used as a baseline, the grey line is the threshold for speaking, and the green background shows when you are speaking.
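The post doesn't say exactly how each curve is computed. Here is a minimal Python sketch of one plausible pipeline. It assumes the orange curve is a short moving average of the raw level, the white baseline is a longer moving average over the "averaging length" in frames, and the grey threshold is the baseline plus a fixed margin. The function name `detect_speaking`, the window lengths, and `margin` are all hypothetical choices for illustration.

```python
from collections import deque

FRAMES_PER_SECOND = 30  # approximate, per the post: a second is ~30 frames


def detect_speaking(levels, smooth_len=5, avg_len=3 * FRAMES_PER_SECOND, margin=10.0):
    """Hypothetical sketch: return (smoothed, baseline, threshold, speaking), one entry per frame.

    levels     -- raw loudness per frame (the blue line)
    smooth_len -- frames in the short moving average (the orange line); assumed value
    avg_len    -- frames in the long moving average used as a baseline (the white line)
    margin     -- assumed fixed offset added to the baseline to get the threshold (the grey line)
    """
    smooth_win = deque(maxlen=smooth_len)  # keeps only the most recent smooth_len frames
    avg_win = deque(maxlen=avg_len)        # keeps only the most recent avg_len frames
    smoothed, baseline, threshold, speaking = [], [], [], []
    for level in levels:
        smooth_win.append(level)
        avg_win.append(level)
        s = sum(smooth_win) / len(smooth_win)  # short-term average
        b = sum(avg_win) / len(avg_win)        # long-term average (baseline)
        t = b + margin                         # speaking threshold
        smoothed.append(s)
        baseline.append(b)
        threshold.append(t)
        speaking.append(s > t)                 # True frames = green background
    return smoothed, baseline, threshold, speaking


# Usage: 2 s of quiet, 0.5 s of loud sound, then 1 s of quiet again.
levels = [5] * 60 + [60] * 15 + [5] * 30
_, _, _, speaking = detect_speaking(levels)
```

A longer `avg_len` makes the baseline slower to react to background noise changes, while a longer `smooth_len` makes the speaking detection less jittery but slightly delayed.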
I might try to remake Talking Ben using this.