Spatial Sound in Virtual Reality

10 Ambisonics Encoding

Given that direction vectors have unit length, $\|\theta\| = 1$, and that their inner product is $\Theta_1^{\mathrm{T}}\theta = \cos(\phi)$, where $\phi$ is the angle enclosed by the direction of arrival $\theta$ and the microphone direction $\Theta_1$, the pickup pattern of a cardioid microphone aiming at $\Theta_1$ can be described as $\frac{1}{2} + \frac{1}{2}\Theta_1^{\mathrm{T}}\theta$.
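As a quick numerical check of the cardioid formula, the following minimal sketch evaluates the pattern for a few directions of arrival. The helper names `unit` and `cardioid_gain` and the azimuth/elevation convention (x pointing front, y left, z up) are assumptions made for this illustration, not something defined in the text.

```python
import math

def unit(azimuth_deg, elevation_deg=0.0):
    """Unit direction vector for an azimuth/elevation pair given in degrees.

    Assumed convention (illustrative only): x = front, y = left, z = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def cardioid_gain(mic_dir, source_dir):
    """Cardioid pickup: 1/2 + 1/2 * (inner product of the two unit vectors)."""
    dot = sum(m * s for m, s in zip(mic_dir, source_dir))
    return 0.5 + 0.5 * dot

mic = unit(0.0)                            # microphone aiming straight ahead
on_axis = cardioid_gain(mic, unit(0.0))    # source in front of the capsule
side = cardioid_gain(mic, unit(90.0))      # source at 90 degrees to the side
rear = cardioid_gain(mic, unit(180.0))     # source directly behind
```

Up to floating-point rounding, the gain is 1 on axis, 0.5 at the side and 0 at the rear, which is the familiar cardioid shape.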

To encode the direct output (A-Format) of the capsules into something that can be used to reproduce the soundfield through a speaker array, it is necessary to convert the signals into what is called the B-Format [15: Zotter, Franz, and Matthias Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. 2019]:


A straightforward conversion can be expressed as:

W = 0.5(LF + LB + RF + RB)

X = 0.5(LF − LB + RF − RB)

Y = 0.5(LF + LB − RF − RB)

Z = 0.5(LF − LB − RF + RB)

but given the non-coincident arrangement of the capsules, filters are needed for the conversion to reproduce the soundfield with spatial accuracy. Consider the tetrahedral array of the Zoom H3, which has a spacing between capsules of approximately 28 mm: with the speed of sound at 343 m/s, waves with frequencies above 12250 Hz (343 / 0.028) have wavelengths shorter than the distance between the capsules. At those frequencies the array can no longer be treated as coincident, and it is necessary to apply a series of filters. The consequence of the capsule spacing is that while a B-format microphone has very good polar patterns at low to medium frequencies, the frequency response of the B-format signals at high frequencies depends on the direction of the sound. A second reason why these filters are necessary is to ensure that the B-format signals remain exactly in phase over the entire useful frequency range [16: Adriaensen, Fons. "A Tetrahedral Microphone Processor for Ambisonic Recording" (2007)].
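The unfiltered matrix and the frequency estimate above can be sketched in a few lines. This is only an illustration of the arithmetic in the text: the function names `a_to_b` and `coincidence_limit_hz` are hypothetical, the capsule labels follow the equations above, and the correction filters discussed in this section are deliberately left out.

```python
SPEED_OF_SOUND = 343.0  # m/s, the value used in the text

def coincidence_limit_hz(spacing_m, c=SPEED_OF_SOUND):
    """Frequency whose wavelength equals the capsule spacing (f = c / d)."""
    return c / spacing_m

def a_to_b(lf, lb, rf, rb):
    """Unfiltered A-to-B conversion for one sample per capsule.

    lf, lb, rf, rb are the left-front, left-back, right-front and
    right-back capsule samples; returns the tuple (W, X, Y, Z).
    No non-coincidence correction filters are applied here.
    """
    w = 0.5 * (lf + lb + rf + rb)
    x = 0.5 * (lf - lb + rf - rb)
    y = 0.5 * (lf + lb - rf - rb)
    z = 0.5 * (lf - lb - rf + rb)
    return w, x, y, z

# Roughly 12250 Hz for the 28 mm spacing quoted for the Zoom H3
limit = coincidence_limit_hz(0.028)

# Equal signal on all four capsules ends up only in the omnidirectional W channel
w, x, y, z = a_to_b(1.0, 1.0, 1.0, 1.0)
```

In a real converter the same matrix would be applied per sample (or per frequency bin) and followed or preceded by the filters described above.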

A tetrahedral first-order Ambisonic microphone array is therefore better suited to creating soundscapes and background ambiance than to accurate localization of sound sources.

"The SoundField by RØDE plug-in uses a new time-frequency adaptive approach for A to B-format conversion. This complex mathematical process means the phase between the A-format channels are aligned prior to application of the conversion matrix – essentially correcting for the non-coincidence of the capsules prior to any further processing. This makes correction filters unnecessary and yields significantly improved frequency responses and directivity patterns – as well as delivering a more natural sound that allows the exceptional quality of the NT-SF1 capsules to shine." [17]

There are two B-format specifications. They differ by the sequence in which the four channels are arranged:

  • Furse-Malham (FuMa) = WXYZ

  • ACN (AmbiX) = WYZX
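Converting between the two channel orders is a simple permutation. The sketch below uses hypothetical helper names and handles only the first-order ordering listed above. Note that the two conventions also differ in normalization in practice (FuMa attenuates W by 3 dB, while AmbiX uses SN3D), which a pure reordering does not address.

```python
# Index of each ACN-ordered channel inside a FuMa-ordered frame:
# FuMa = (W, X, Y, Z), ACN = (W, Y, Z, X)
FUMA_TO_ACN = (0, 2, 3, 1)
# Inverse permutation: index of each FuMa-ordered channel inside an ACN frame
ACN_TO_FUMA = (0, 3, 1, 2)

def fuma_to_acn_order(frame):
    """Reorder one first-order frame from WXYZ to WYZX (no gain change)."""
    return tuple(frame[i] for i in FUMA_TO_ACN)

def acn_to_fuma_order(frame):
    """Reorder one first-order frame from WYZX back to WXYZ (no gain change)."""
    return tuple(frame[i] for i in ACN_TO_FUMA)

acn = fuma_to_acn_order(("W", "X", "Y", "Z"))
fuma = acn_to_fuma_order(acn)
```

Here `acn` holds the channels in W, Y, Z, X order, and `fuma` restores the original W, X, Y, Z order.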