I have long audio recordings of users reading a script. Sometimes, if a
user thinks they have made a mistake, they repeat a prompt from the script.
My goal is to compare adjacent waveforms and measure how similar they are,
so that I can detect when a prompt has been repeated.
So far, I have a script that separates the words from silence and other
noise. It then simplifies each waveform into a series of simple hills and
compares the number of hills and their slopes with those of the adjacent
waveform. This catches about 73% of the repeats, which is pretty good, but
I’m hoping to do better. Does anyone know of a more rigorous, "scientific"
method for doing this?
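To make the comparison concrete, here is a minimal sketch of this kind of hill comparison in Python/NumPy. It is an illustration under assumptions, not the actual script. Every name is hypothetical (`split_on_silence`, `envelope`, `hill_features`, `hill_similarity`, `flag_repeats`), and so are the choices behind them: a 20 ms frame-RMS silence threshold, a 10 ms RMS envelope as the "simplified waveform", a hill's slope measured as its rise from the preceding valley, and a 0.8 score cutoff.

```python
import numpy as np


def split_on_silence(signal, rate, frame_ms=20, threshold=0.02, min_gap_frames=5):
    """Return (start, end) sample ranges of non-silent segments.

    Assumes `signal` is a mono float array scaled to [-1, 1]. A frame counts as
    silent when its RMS is below `threshold`; a segment ends only after
    `min_gap_frames` consecutive silent frames.
    """
    frame = max(1, int(rate * frame_ms / 1000))
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    voiced = np.sqrt(np.mean(frames ** 2, axis=1)) >= threshold
    segments, start, silent_run = [], None, 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap_frames:
                # the last voiced frame was i - silent_run
                segments.append((start * frame, (i - silent_run + 1) * frame))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start * frame, (n_frames - silent_run) * frame))
    return segments


def envelope(segment, rate, frame_ms=10):
    """Hypothetical 'simplified waveform': per-frame RMS, lightly smoothed."""
    frame = max(1, int(rate * frame_ms / 1000))
    n = len(segment) // frame
    env = np.sqrt(np.mean(segment[: n * frame].reshape(n, frame) ** 2, axis=1))
    if len(env) < 3:
        return env
    # 3-frame moving average so tiny wiggles are not counted as hills
    return np.convolve(env, np.ones(3) / 3, mode="same")


def hill_features(env):
    """Return (number of hills, rising slope of each hill)."""
    d = np.diff(env)
    slopes = []
    for p in range(1, len(env) - 1):
        if d[p - 1] > 0 and d[p] <= 0:  # local maximum = top of a hill
            v = p - 1
            while v > 0 and d[v - 1] > 0:  # walk back down the rising side
                v -= 1
            slopes.append((env[p] - env[v]) / (p - v))
    return len(slopes), np.array(slopes)


def hill_similarity(env_a, env_b):
    """Score in [0, 1]: hill-count agreement times slope agreement."""
    n_a, s_a = hill_features(env_a)
    n_b, s_b = hill_features(env_b)
    if n_a == 0 or n_b == 0:
        return 0.0
    count_score = min(n_a, n_b) / max(n_a, n_b)
    k = min(n_a, n_b)
    # relative slope difference, hill by hill, over the shared hills
    diffs = np.abs(s_a[:k] - s_b[:k]) / np.maximum(np.maximum(s_a[:k], s_b[:k]), 1e-12)
    return count_score * (1.0 - float(np.mean(diffs)))


def flag_repeats(signal, rate, threshold=0.8):
    """Score each segment against the next one; keep pairs scoring >= threshold."""
    segments = split_on_silence(signal, rate)
    envs = [envelope(signal[s:e], rate) for s, e in segments]
    flagged = []
    for i in range(len(envs) - 1):
        score = hill_similarity(envs[i], envs[i + 1])
        if score >= threshold:
            flagged.append((i, i + 1, score))
    return flagged
```

Calling `flag_repeats(signal, rate)` on a mono float recording returns `(index, next_index, score)` tuples for adjacent segments whose hill counts and slopes agree closely. All of the thresholds are hand-tuned guesses.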