
Conversation between Alex and Stephen

2007feb22

Alex,

Let me respond to the final line of your last letter:

 I am curious as to how you yourself approach the issue of emotional analogy.

I don't have anything like a comprehensive answer.  I've been moved to tears by music, buoyed up spiritually, made dizzy, felt like a hero ... I have only the vaguest idea how these things happen ... not to mention that I don't have a coherent framework in which an explanation for them could be formulated.  There are big holes ... like: what is sensation, what is consciousness, what is conscious experience?  When I think about what I know I'm doing when I experience music, there are lots of things I can describe mechanically (which is to say that I can imagine, for example, building a machine that would do those things), but these descriptions don't explain what I feel (they only correspond to it).

Fortunately, I don't need that kind of explanation to make some progress.  Regardless of how emotion works, how sensation works, how consciousness works ... the fact is, they all do work, so I can at least study the relations between stimuli and experiences.

For example, we have the ability to recognize curves as related, across all modes of perception.  (On this subject: if you haven't read Manfred Clynes' book Sentics, The Touch of Emotions, you might want to.)  We can order all these things on a spectrum from smooth to sharp: rocks, rhythms, melodies, harmonic progressions, timbres, curves drawn on a piece of paper, plot lines in a story, numbers in a series ... and all smoothness reminds us of all other smoothness, seems like it in some way.  We might be tempted to say "of course they all seem alike --- they're all smooth," but of course that begs the question.  If I want to make a tool that translates the smoothness of a melody into the smoothness of a line drawn on a computer screen, I need to be able to detect, to measure, smoothness algorithmically.
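
To make that concrete, here's a toy sketch (in Python) of one way a smoothness measure might start.  The measure I use here, the mean squared second difference of a sequence, is just an illustrative stand-in for "smoothness," not a claim about the right measure:

    import numpy as np

    def roughness(values):
        # Mean squared second difference: near 0 for a straight line or
        # gentle curve, larger for jagged sequences.  'values' can be
        # melody pitches, points on a drawn curve, numbers in a series.
        v = np.asarray(values, dtype=float)
        d2 = np.diff(v, n=2)
        return float(np.mean(d2 ** 2))

    smooth_melody = [60, 62, 64, 65, 67, 69, 71, 72]   # ascending scale
    jagged_melody = [60, 72, 61, 70, 63, 74, 60, 71]   # wide leaps
    print(roughness(smooth_melody))   # small (0.5)
    print(roughness(jagged_melody))   # large (hundreds)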

When you start trying to design a "smoothness detector," one of the first issues you have to address is the matter of scale.  If you look at an audio waveform sample by sample, it might look smooth or irregular.  Is this the right scale at which to look?  It depends.  If you look at a single sine wave, a pure tone, it will look pretty smooth, and it will sound pretty smooth ... timbrally, anyway.  What about melody?  Can you tell the difference between a smooth melody and a jagged melody by looking at a waveform?  Maybe.  Could you distinguish a smooth harmonic progression from a chaotic one?  No --- it's the wrong scale (not to mention the wrong domain).  To build a reasonable system, you need a bunch of smoothness detectors, ones that work at all sorts of scales and in all sorts of domains.
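
Here's a sketch of the multi-scale idea; the window sizes are arbitrary placeholders, and a real system would use domain-appropriate transforms (pitch tracking, chord estimation) rather than simple averaging:

    import numpy as np

    def roughness(v):
        d2 = np.diff(np.asarray(v, dtype=float), n=2)
        return float(np.mean(d2 ** 2)) if d2.size else 0.0

    def multiscale_roughness(signal, window_sizes=(1, 4, 16, 64)):
        # Average the signal over progressively larger windows, then
        # measure roughness of each averaged version.  Sample-to-sample
        # jaggedness shows up at small windows; slower kinds of
        # roughness only show up once the fast detail is averaged away.
        x = np.asarray(signal, dtype=float)
        results = {}
        for w in window_sizes:
            n = len(x) // w
            if n < 3:
                break
            coarse = x[: n * w].reshape(n, w).mean(axis=1)
            results[w] = roughness(coarse)
        return results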

However, once you get beyond the waveform domain, you run into the next problem: grouping.  If you want to determine whether a melody is smooth, you need to figure out which notes are in the melody.  Which pieces of periodic energy belong to a given note?  Which notes belong to which instruments?  Which notes in an instrument belong to a given melodic line?
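
As a toy illustration of how grouping might be attempted, here's a sketch that uses a single cue, pitch proximity, to split notes into lines.  A real system would have to weigh many cues together (timing, timbre, loudness), so treat this as a picture of the problem rather than a solution:

    def group_by_pitch_proximity(notes, max_leap=7):
        # Each note (onset, pitch) joins the line whose last pitch is
        # nearest, or starts a new line if the nearest leap exceeds
        # max_leap semitones.
        lines = []
        for onset, pitch in sorted(notes):
            best_line, best_gap = None, None
            for line in lines:
                gap = abs(line[-1][1] - pitch)
                if best_gap is None or gap < best_gap:
                    best_line, best_gap = line, gap
            if best_line is not None and best_gap <= max_leap:
                best_line.append((onset, pitch))
            else:
                lines.append([(onset, pitch)])
        return lines

    # two interleaved voices, one high and one low
    notes = [(0.0, 72), (0.25, 48), (0.5, 74), (0.75, 50), (1.0, 76), (1.25, 52)]
    print(group_by_pitch_proximity(notes))
    # -> [[(0.0, 72), (0.5, 74), (1.0, 76)], [(0.25, 48), (0.75, 50), (1.25, 52)]]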

Of course, you could side-step the problem of finding melodies and just tell the system where the melodies are, but you're not saving yourself a lot of the (conceptual) work, because grouping happens within melodies, too.  How do you know where a phrase ends?  How do you determine which notes make up a motive, where the dividing lines between motives are?  You could side-step this, too, and tell the system "this is a motive, this is a phrase," etc., but eventually, you've got to let the system do some useful (read "hard") work, or else you end up with the "pushy" system we talked about before, where you have to tell it what effect you want to convey at each point: "here feel sad, here have an epiphany ..."
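
For example, a crude phrase-boundary guesser might use nothing but timing, placing a boundary wherever the gap between notes is much longer than the gaps just before it.  This is a sketch of one cue, not a real segmenter, and the threshold is an arbitrary placeholder:

    def phrase_boundaries(onsets, gap_factor=1.5):
        # A boundary goes wherever the gap between successive notes is
        # much longer than the average of the few gaps before it.
        gaps = [b - a for a, b in zip(onsets, onsets[1:])]
        boundaries = []
        for i, g in enumerate(gaps):
            recent = gaps[max(0, i - 4):i] or gaps[:1]
            if g > gap_factor * (sum(recent) / len(recent)):
                boundaries.append(i + 1)   # a new phrase starts at note i+1
        return boundaries

    onsets = [0.0, 0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 4.5]
    print(phrase_boundaries(onsets))   # -> [4]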

The way I think about turning raw audio/score data into something more meaningful can be characterized as extracting features and doing transforms.  This is also a way to describe what animals do when they hear.  Feature extraction can be done at lots of different scales, and it's often convenient to think of it hierarchically (though there are places where that doesn't work).  In an animal, the first transform that happens is the conversion of mechanical energy into neural pulses.  This also involves a transform from a 1-dimensional quantity (changes in air pressure over time) into a 2-dimensional quantity (changes in neural activity at different resonant frequencies over time).  From this 2D quantity I can extract various features.  For example, I can derive changes in energy; positive changes are the precursors to onsets of acoustic events.  Places in the frequency spectrum where energy remains constant are the precursors to harmonics, which are the precursors to pitched things (like notes).
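
As a rough sketch of those two steps, here's an ordinary short-time Fourier transform standing in for the cochlear transform, plus an "energy increase" feature of the sort that precedes onset detection.  The window and hop sizes are arbitrary placeholders:

    import numpy as np

    def spectrogram(x, n_fft=1024, hop=256):
        # 1-D pressure signal -> 2-D (frequency x time) energy map,
        # an ordinary STFT standing in for the cochlea's transform.
        window = np.hanning(n_fft)
        frames = []
        for start in range(0, len(x) - n_fft, hop):
            spectrum = np.fft.rfft(window * x[start:start + n_fft])
            frames.append(np.abs(spectrum) ** 2)
        return np.array(frames).T   # rows: frequency bins, columns: time frames

    def onset_strength(spec):
        # Positive changes in energy over time, summed across frequency;
        # peaks here are the precursors of note onsets.  (Bins whose
        # energy stays roughly constant are the precursors of harmonics.)
        rise = np.maximum(np.diff(spec, axis=1), 0.0)
        return rise.sum(axis=0)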

Imagine that you were able to see what was happening inside your head when you listen to music.  That is, for every stage of neural processing, you were able to see an animated picture of the features that your auditory system was extracting (and the transforms of those features).  What would that be like?  Since we're able to look at some of those (especially from the earlier stages of processing), we can get an idea.  The first thing that's striking is: even with the rawest data, the data straight out of the cochlea, the picture is very recognizable; if you look at a cochleagram while you're listening to the sound it represents, you immediately see the correspondence.  Depending on the scale at which you examine the output, you can see the difference between rough and smooth things.
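
You can get a crude approximation of that picture by plotting an ordinary spectrogram of a sound; it lacks the cochlea's frequency spacing and compression, but the correspondence with what you hear is already easy to see.  Here's a minimal sketch (the tones and parameters are made up for the example):

    import numpy as np
    from scipy.signal import spectrogram
    import matplotlib.pyplot as plt

    sr = 8000
    t = np.arange(2 * sr) / sr
    x = np.sin(2 * np.pi * 440 * t)                    # a steady tone...
    x[sr:] += 0.7 * np.sin(2 * np.pi * 660 * t[sr:])   # ...joined by another

    f, times, Sxx = spectrogram(x, fs=sr, nperseg=512)
    plt.pcolormesh(times, f, np.log(Sxx + 1e-12))      # brightness = energy
    plt.xlabel("time (s)")
    plt.ylabel("frequency (Hz)")
    plt.show()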

However, in a cochleagram, there are things you can't see so easily, and things you can't see at all.  That's because they are only obvious when you've done the necessary transforms.  For example, understanding rhythm requires establishing a context of predictability, of expectation.  For this, the "features" are more complex; they no longer map one-to-one to things you're perceiving; a rhythm is a many-to-one mapping --- many rhythmic events mapped to one rhythmic template --- and there are many templates operating at a time.
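
Here's a toy sketch of the many-to-one idea: many onsets scored against candidate templates (a beat period and phase), keeping the one that fits best.  A real listener keeps several templates operating at once, which this doesn't attempt, and the tolerances and search ranges are placeholders:

    import numpy as np

    def template_score(onsets, period, phase, tol=0.05):
        # Fraction of onsets that land within 'tol' seconds of a grid
        # with the given period and phase: one crude rhythmic template.
        onsets = np.asarray(onsets, dtype=float)
        offset = (onsets - phase) % period
        dist = np.minimum(offset, period - offset)
        return float(np.mean(dist < tol))

    def best_template(onsets, periods=np.arange(0.3, 1.0, 0.01)):
        # Many onsets mapped onto one template: try candidate periods
        # and phases, keep the one that explains the most onsets.
        best = (0.0, None, None)
        for period in periods:
            for phase in np.arange(0.0, period, 0.01):
                score = template_score(onsets, period, phase)
                if score > best[0]:
                    best = (score, round(float(period), 2), round(float(phase), 2))
        return best                      # (score, period, phase)

    onsets = [0.02, 0.51, 1.00, 1.49, 2.01, 2.52]   # roughly every half second
    print(best_template(onsets))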

How does all this relate to emotion?  These features correspond to things we perceive, and our sense of the emotional content arises from them.  At the highest level, emotion might be a feature.  I don't expect to get there any time soon, though; there are a lot of mid-level features that I need to learn about first.

Since it's about time for me to start my day job, let me stop here; I'll respond to the rest of your letter this weekend.

S.
