Tuesday, December 19, 2006

Depth Perception and the wave

Mike Jorgenson

I tried to do a little analysis of the frequencies in the waveforms in relation to the sense of perceived distance/depth in the recording. First, a few disclaimers. 1) I'm not 100% sure I'm analyzing the frequencies correctly. 2) The sounds I've chosen are based on what I feel to be good (or poor) examples of depth. That said, here we go.
I looked at the frequencies using the spectral frequency view in Adobe Audition. I took tests 03-01, 03-14, 08-03, and 10-11 to try to compare a few things.
First, I took tests 03-01 and 03-14. These clips are from the same test and the particular clicks I compared are the same click. In addition, I chose 03-14 because its position was not too far off of the 03-01 position. So, in theory they should sound generally the same. However, they are from different rigs. I found a few things. First, I thought in general the 03-01 sounded like there was a little more distance to the clicks. From just hearing it I thought that one of the major differences was the middle and background noise that affected this depth perception. It seems that the brighter the middle and background tones, the closer the object is perceived to be. Comparing the two waveforms, this seemed to be confirmed. (This sound is the 03-01 rig click 2x, followed by the 03-14 rig click 2x). The 03-14 rig has much more green and orange in this picture, signifying higher energy sound, in the 4 kHz to 10 kHz frequency range.
Compared with the other tests I looked at, the tests that sounded closer (08-03 and 10-11) also had more energy in this same range. In those tests the lower frequencies seemed to roll off in the decay, whereas in the 03 tests they did not; there they seemed to remain consistent from about 6 kHz down. In the 08 and 10 tests you can see this visually: as the green dissipates it bulges in the middle (looking vertically at the click) and then sucks in at the lower frequencies.
However, there is also something else to it. Looking horizontally across a small range of frequencies (say roughly one 'spike' horizontally), there are different shapes produced by the high energy and low energy colors through the decay. For example, the 03-01 test has a fat, thin base on the left that spikes fairly quickly (see pic). Test 08-03 has a more bulbous base with only a small spike. And test 10-11 has a fairly consistent slope to it. I perceived test 08-03 as having better depth than test 10-11 and worse than 03-01. So it seems that the more gradual the slope of the energy, the closer the sound appears to be.
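
For anyone who wants to put a rough number on that "slope" instead of eyeballing the colors, here is a minimal sketch in Python. It is not what Audition does. It assumes you have already pulled out the energy of one click in one frequency band, frame by frame, starting at the peak; the function name and the example numbers are made up for illustration.

```python
import math

def decay_slope_db_per_s(levels, frame_seconds):
    """Least-squares slope of level (in dB) against time for one click's decay.

    levels: per-frame energies (linear scale, all > 0), starting at the peak.
    frame_seconds: time between consecutive frames (an assumed, fixed hop).
    A more negative result means a faster, steeper decay.
    """
    db = [10.0 * math.log10(e) for e in levels]
    times = [i * frame_seconds for i in range(len(db))]
    n = len(db)
    mean_t = sum(times) / n
    mean_db = sum(db) / n
    num = sum((t - mean_t) * (d - mean_db) for t, d in zip(times, db))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Hypothetical decays: one drops 3 dB per frame, the other 1 dB per frame.
steep = [10 ** (-3 * i / 10) for i in range(10)]
gradual = [10 ** (-1 * i / 10) for i in range(10)]
print(decay_slope_db_per_s(steep, 0.01), decay_slope_db_per_s(gradual, 0.01))
```

If the observation above holds, the clicks that sound closer would come out with the slope nearer to zero, but that is something to check against the recordings, not something this sketch proves.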
The rigs that I chose to look at actually say something about this, I think. Test 03-01 is one of the tests on the dowel with only a simple baffle. Test 08-03 is a double boundary. Test 10-11 is a triple boundary. It seems that the more boundaries you have, the closer the sound is perceived to be. Like I said though, I'm no expert. But this is what I found to be true.
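
The 4 kHz to 10 kHz comparison above could also be checked numerically. The sketch below is only an illustration: it assumes a short mono excerpt of one click as a list of floats, uses a plain (slow) DFT from the standard library, and the names `tone` and `band_energy_fraction` are hypothetical helpers, not anything from Audition.

```python
import cmath
import math

def tone(freq_hz, sample_rate=48000, n=480):
    """Synthetic sine wave, used here only as a stand-in for a real excerpt."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def band_energy_fraction(samples, sample_rate, lo_hz, hi_hz):
    """Fraction of the (non-DC) spectral energy that falls in [lo_hz, hi_hz].

    Plain O(n^2) DFT, so keep the excerpt short (a few hundred samples).
    """
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, skipping DC
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= k * sample_rate / n <= hi_hz:
            in_band += power
    return in_band / total if total else 0.0

# A 6 kHz tone sits inside the band, a 1 kHz tone sits outside it.
print(band_energy_fraction(tone(6000), 48000, 4000, 10000))
print(band_energy_fraction(tone(1000), 48000, 4000, 10000))
```

Running the same function on the same click from two rigs would give two fractions to compare, which is a cruder but more repeatable version of judging how much green and orange shows up in that range.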

1 comment:

us said...

Hi Mike--

This is a heck of an ambitious study. Let it be known that everything that is confusing in it can be traced to my bad advice.

Here are some crude building blocks as best as I can tell, quickly:

(1) Assuming the difference will be attributed to the mic rig, can one be sure that we can identify one recording as better at portraying a sense of "distance" compared to another? (This at least assumes same place, same mics, same sound impulse)

(2) If so, can we still discern the difference when two recording levels are used?

As recording level is increased (without over-modulation), one's sense of "reach" into space also increases. Playback loudness is very much connected to the amount of distance one can hear in a recording, because things in the distance are inherently quieter and quieter.

Let's assume we can tell that one rig works better because we can hear the reverb "tail" for a longer period of time before it dissolves into a constant field of background drones at the barn.

To test if we can discern this between two rigs would require having the two rigs as close as possible to each other, the same mics and the exact same stimulus. Why? Because a louder impulse will produce an echo that remains above the masking of the background sounds longer.
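
A back-of-the-envelope sketch of that last point, in Python. It assumes the tail decays at a steady rate in dB, described by an RT60 (the time it takes the level to fall 60 dB), and that the background is a constant level. The function name and all the numbers are invented for illustration, not measured at the barn.

```python
def audible_tail_seconds(impulse_db, background_db, rt60_seconds):
    """Time the reverb tail stays above the background level.

    Assumes a linear decay in dB at 60 / rt60_seconds dB per second
    and a constant background level (both are simplifications).
    """
    if impulse_db <= background_db:
        return 0.0  # the impulse never rises above the background
    decay_rate_db_per_s = 60.0 / rt60_seconds
    return (impulse_db - background_db) / decay_rate_db_per_s

# Same room and background, two impulse levels 6 dB apart (made-up values).
quiet_click = audible_tail_seconds(90.0, 40.0, 2.0)
loud_click = audible_tail_seconds(96.0, 40.0, 2.0)
print(quiet_click, loud_click)
```

Under these assumptions every extra dB of impulse level buys the same extra slice of audible tail, which is why an unrepeatable click could easily swamp any difference between two rigs.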

In short, this test is a very tall order. We'd need a very large, extremely quiet location and perfectly repeatable impulses. Ideally, we'd like the reverb tail to blend into the self noise of the mics, and not into any background sounds, but short of that, at least not have a loud factory next door. At this level of subtlety, I bet we could hear things in the recordings that the sonograms won't show.

Sorry about the wild reverb tail chase, Mike. We still learned something and most important, have justified going camping in order to perform some critical research. Rob D.