The following images compare the first few notes of "Somewhere, Over the Rainbow" (sung by mezzo-soprano Lisa Turetsky) as represented by three displays: first a conventional speech spectrogram, then a conventional musical score, and finally the VoiceTracker spectrogram (with the musical score repeated).

The conventional spectrogram is a type of representation, often used by speech researchers, that shows the amount of energy across a range of frequencies as it changes over time. The phonemes that make up human speech are distinguished by (among other things) the amount of energy in various fixed frequency regions, known as formants. For example, the 'eh' sound of the syllable '-where' has energy in the 3-5 kHz range that is not present in the 'oh' sound of the word 'over.' This relationship holds regardless of who is speaking or singing, or the frequency of their speech fundamental. (In fact, this difference is discernible even in whispering, where there is no speech fundamental frequency -- only filtered white noise.)
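
As a rough illustration of what a conventional spectrogram computes, the sketch below slices a signal into short frames and measures the energy in each frequency bin of every frame. It is a minimal pure-Python example, not the software used to make the images here: the function name `spectrogram`, the frame size, the hop size, the Hann window, and the 1000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of magnitudes,
    one per frequency bin from 0 Hz up to half the sample rate."""
    # Hann window (an illustrative choice) to reduce leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k: slow, but easy to follow.
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            mags.append(abs(total))
        frames.append(mags)
    return frames

# Hypothetical test signal: a 1000 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
frames = spectrogram(tone)

# Find the strongest bin in the first frame and convert it to Hz.
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Plotting each frame as a column, with darkness proportional to magnitude, gives the familiar picture. Because every bin from 0 Hz to half the sample rate gets equal vertical space, the fundamental occupies only a thin strip near the bottom.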

Since the conventional spectrogram has to show energy at a wide range of frequencies, it can't devote much vertical space to the frequency of the fundamental (seen in the conventional spectrogram as the bottommost black line, 2-5 mm from the bottom of the graph). Because of this, small variations in the frequency of the fundamental are hard to discern. Another shortcoming of the conventional display, from the musician's point of view, is that the amount of energy at various frequencies (shown by the degree of blackness) is difficult to judge quantitatively -- it's hard to see exactly how much energy is present at a given frequency.

The VoiceTracker spectrogram (below) starts with the following simplifying assumption: that only energy at the frequency of the fundamental and its harmonics is of interest. (This of course assumes that there is a fundamental, that it can be found, et cetera -- which is true for normal singing, but not necessarily for speech and other sounds.) The scale is expanded to show the frequency of the fundamental more accurately; it is visible as the wiggly white line between the gray and the dark blue. (Sorry, it's not really much of a "line" at this resolution; see the full-screen image for a clearer view.)
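
The simplifying assumption above can be sketched in code: find the fundamental, then measure energy only at that frequency and its integer multiples. The example below is one possible approach under stated assumptions, not a description of how the VoiceTracker prototype works. It uses a plain autocorrelation peak to estimate the fundamental (a common textbook technique, swapped in here for illustration), and the names `estimate_fundamental` and `harmonic_amplitudes`, the 80-1000 Hz search range, and the synthetic test tone are all hypothetical.

```python
import cmath
import math

def estimate_fundamental(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental (Hz) from the strongest autocorrelation lag.
    Resolution is limited to whole-sample lags in this simple version."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag`.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def harmonic_amplitudes(samples, sample_rate, f0, count=3):
    """Measure amplitude only at f0, 2*f0, ..., count*f0,
    ignoring all energy between the harmonics."""
    n_samples = len(samples)
    amps = []
    for k in range(1, count + 1):
        freq = k * f0
        total = sum(samples[n] * cmath.exp(-2j * math.pi * freq * n / sample_rate)
                    for n in range(n_samples))
        amps.append(2 * abs(total) / n_samples)
    return amps

# Hypothetical test tone: 200 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 8000
true_amps = [1.0, 0.5, 0.25]
tone = [sum(a * math.sin(2 * math.pi * 200 * (k + 1) * n / SAMPLE_RATE)
            for k, a in enumerate(true_amps))
        for n in range(800)]

f0 = estimate_fundamental(tone, SAMPLE_RATE)
amps = harmonic_amplitudes(tone, SAMPLE_RATE, f0)
```

A real voice signal would need this repeated frame by frame, and a more robust pitch estimator than a bare autocorrelation peak. The sketch also fails in the same way the text warns about: if the sound has no fundamental, the estimate is meaningless.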

Below and above the line indicating the frequency of the fundamental, bands of color are added to show the amplitude of the fundamental and its harmonics, making it easier to judge the relative strength of these timbral components. (Unfortunately, the absolute frequency of these elements cannot be seen, so this display is not suitable for studying speech -- at least, not in the conventional way.)

A conventional spectrogram shows absolute frequency over a wide range and can show non-harmonic energy (such as the filtered white noise of many consonants), not just the fundamental and its overtones. The VoiceTracker, by sacrificing information about non-harmonic sounds, is able to provide an expanded view of the frequency of the fundamental, useful for examining vibrato and attack (as well as for identifying the melodic line). It also shows how much energy is in each of the harmonics of a tone, which is what largely determines tone "color."
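
To make the "expanded view" concrete, the sketch below takes a series of fundamental-frequency estimates (one per analysis frame) and expresses each as a deviation in cents from the centre pitch, where 100 cents is one equal-tempered semitone. This is only an illustration of how vibrato width could be quantified; the function names, the use of a geometric mean as the centre, and the sample pitch track are assumptions, not part of the VoiceTracker prototype.

```python
import math

def cents_from_reference(freq_hz, reference_hz):
    """Pitch distance in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(freq_hz / reference_hz)

def vibrato_extent_cents(f0_track):
    """Return (centre pitch in Hz, peak-to-peak pitch swing in cents)
    for a list of per-frame fundamental estimates."""
    # Use the geometric mean as the centre so that equal swings
    # above and below it come out equal in cents.
    log_mean = sum(math.log2(f) for f in f0_track) / len(f0_track)
    center_hz = 2.0 ** log_mean
    deviations = [cents_from_reference(f, center_hz) for f in f0_track]
    return center_hz, max(deviations) - min(deviations)

# Hypothetical pitch track: a note centred on 440 Hz that swings
# 50 cents above and 50 cents below over one vibrato cycle.
track = [440.0 * 2.0 ** (c / 1200.0)
         for c in (0, 25, 50, 25, 0, -25, -50, -25)]
center_hz, extent = vibrato_extent_cents(track)
```

Swings of this size are only a few hertz at 440 Hz, which is why they are hard to see on a display whose vertical axis spans several kilohertz.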

The VoiceTracker is thus able to provide a picture of the elements that most concern singers in the process of shaping vocal tone.

Please note: The VoiceTracker is an idea and a prototype, not (yet) a piece of hardware or software. Please write with feedback! Email us your responses and questions.