In dealing with sound, and especially musical sound, there's the concept of pitch, or frequency. The idea is that there are higher and lower sounds, and that there is some way to measure the distance between them. There are a lot of complications to this idea, having to do with timbre; for example, if you sing a Middle C with the "ooooo" vowel, you might say it sounds "lower" than if you sing it with the "eeeee" vowel. That kind of difference is outside the scope of this discussion; here, I'm just dealing with the difference between singing a Middle C and, say, a C-sharp.
There are two ways of measuring these distances that are commonly in use, and they correspond (more or less) to the words "pitch" and "frequency". The units of pitch are what we typically use to describe distances in music, for example, the distance of an octave, or a half-step, or a perfect fifth, etc. The units of frequency are typically used outside of a musical context, and are expressed as cycles per second or hertz (Hz).
Q: What is the relationship between these two ways of measuring?
A: In "frequency space," equal distances are defined as equal differences in Hz. In "pitch space," equal distances are defined as equal ratios between frequencies.
For example, the series 100 Hz, 200 Hz, 300 Hz, 400 Hz ... is progressing by steps of equal distance in frequency space (a distance of 100 Hz), and the series 100 Hz, 200 Hz, 400 Hz, 800 Hz ... is progressing by steps of equal distance in pitch space (the ratio 1:2, or an octave).
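The two series above can be checked with a few lines of Python. This is only a sketch: the function names `hz_steps` and `octave_steps` are made up for illustration, and measuring pitch distance in octaves (the base-2 logarithm of the frequency ratio) is one convenient choice of unit, not the only one.

```python
import math

def hz_steps(freqs):
    """Distance between neighbors in frequency space: the difference in Hz."""
    return [b - a for a, b in zip(freqs, freqs[1:])]

def octave_steps(freqs):
    """Distance between neighbors in pitch space: log2 of the frequency ratio, in octaves."""
    return [math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

arithmetic = [100, 200, 300, 400]  # equal differences in Hz
geometric = [100, 200, 400, 800]   # equal ratios (1:2)

print("Hz steps, arithmetic series:    ", hz_steps(arithmetic))
print("Octave steps, arithmetic series:", octave_steps(arithmetic))
print("Hz steps, geometric series:     ", hz_steps(geometric))
print("Octave steps, geometric series: ", octave_steps(geometric))
```

The first series has identical steps in Hz but shrinking steps in octaves; the second has identical steps in octaves but growing steps in Hz.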
Once you've established a scale by which to measure these differences, you can make a 2D graph of sound, with the horizontal dimension indicating time and the vertical dimension indicating pitch or frequency. For example, here is a picture of a single tone (a sine wave) changing frequency at a constant rate, dropping from 8000 Hz to 80 Hz in four seconds:
By comparison, here is a picture of a tone changing pitch at a constant rate, dropping from 8000 Hz to 80 Hz in four seconds:
In these pictures, the scale is in Hz, so the one that changes frequency at a constant rate looks like a straight line, and the one that changes pitch at a constant rate doesn't. We could do it the other way, of course. Here's what the constant frequency change sweep looks like if the vertical scale is pitch:
And here's what the constant pitch sweep looks like in that view:
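To put numbers on the two sweeps, here is a rough sketch that computes the instantaneous frequency of each one at whole-second marks. It does not synthesize audio, and the function names are hypothetical. It assumes that "constant rate of frequency change" means a fixed number of Hz per second, and that "constant rate of pitch change" means a fixed frequency ratio per second.

```python
def constant_frequency_sweep(f_start, f_end, duration, t):
    """Frequency (Hz) at time t when the tone moves by the same number of Hz each second."""
    return f_start + (f_end - f_start) * t / duration

def constant_pitch_sweep(f_start, f_end, duration, t):
    """Frequency (Hz) at time t when the tone moves by the same frequency ratio each second."""
    return f_start * (f_end / f_start) ** (t / duration)

# The sweep from the text: 8000 Hz down to 80 Hz in four seconds.
for t in range(5):
    lin = constant_frequency_sweep(8000, 80, 4, t)
    geo = constant_pitch_sweep(8000, 80, 4, t)
    print(f"t={t}s  constant-frequency: {lin:7.1f} Hz   constant-pitch: {geo:7.1f} Hz")
```

Both sweeps start and end at the same frequencies, but halfway through (at two seconds) the constant-frequency sweep is still at 4040 Hz, while the constant-pitch sweep has already dropped to about 800 Hz.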
Q: Which one is better?
A: Better for what?
Depending on what you're using the picture for, either method can be more useful. (This kind of picture is usually called a spectrogram.) For example, here is a spectrogram of my wife Lisa singing the opening of "Somewhere Over the Rainbow":
This view very clearly shows that the harmonics of her voice are equally spaced in frequency space (since they are whole-number multiples of the fundamental frequency).
However, if you're interested in pitch in a musical context, the frequency-oriented view is problematic, because a given musical interval is a different size in each frequency range. For example, 4000 Hz to 2000 Hz, the pitch distance of one octave, is about 16 mm (on my screen; your mileage may vary), whereas 2000 Hz to 1000 Hz (also an octave) is 8 mm, 1000 Hz to 500 Hz is 4 mm, 500 Hz to 250 Hz (roughly Middle C, which is about 262 Hz) is 2 mm, 250 Hz to 125 Hz is 1 mm, etc.
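The shrinking octaves can be reproduced with a small calculation. This is a sketch under stated assumptions: the linear scale factor is taken from the 16 mm figure above (16 mm per 2000 Hz), and the 4 mm per octave used for the pitch-based scale is an arbitrary value chosen only for comparison. The function names are hypothetical.

```python
import math

# Assumed from the text: on a linear scale, 2000 Hz of range spans about 16 mm.
MM_PER_HZ = 16 / 2000

def linear_height_mm(f_low, f_high, mm_per_hz=MM_PER_HZ):
    """On-screen height of an interval when the vertical axis is linear in Hz."""
    return (f_high - f_low) * mm_per_hz

def pitch_height_mm(f_low, f_high, mm_per_octave=4.0):
    """On-screen height of an interval when the vertical axis is linear in pitch.
    mm_per_octave is an arbitrary illustrative value."""
    return math.log2(f_high / f_low) * mm_per_octave

for top in (4000, 2000, 1000, 500, 250):
    bottom = top / 2  # one octave below
    print(f"{top:4d} Hz to {bottom:6.1f} Hz: "
          f"frequency scale {linear_height_mm(bottom, top):5.1f} mm, "
          f"pitch scale {pitch_height_mm(bottom, top):4.1f} mm")
```

On the frequency scale each octave is half the height of the one above it, while on the pitch scale every octave gets the same height, whatever value is chosen for it.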
If you want to overlay musical notation onto a spectrogram, then, you're better off using the pitch-based scale.
The frequency-based scale is usually called linear, and the pitch-based scale is called log (for "logarithmic"), because of this relationship between frequency (f) and pitch (p):