What is Realistic Audio in Gaming and XR Environments?

Aspects of realistic audio go by many names, including immersive audio, spatialized audio, positional audio and 3D audio. Whatever the terminology, this type of audio comprises several features, not all of which are implemented in every gaming environment. The most common feature, spatialization, simulates sound arriving from any direction: not just from the left and right, as with stereo, but from anywhere around you, including in front, above and behind. Environmental effects are usually found in more advanced audio environments. These include reverberation, the simulation of sound reflecting off different surfaces, and occlusion, which reproduces how sound is perceived when it travels through or around an object. Other effects, such as the Doppler shift (the change in perceived pitch as a source moves toward or away from the listener) and distance attenuation (the drop in loudness with distance), are additional features that advanced realistic audio simulations can implement.
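
To make the last two effects concrete, here is a minimal Python sketch, assuming a point source in free field (no reflections or air absorption) and a speed of sound of 343 m/s. It applies the 1/r distance law, under which level drops by about 6 dB for each doubling of distance, and the classic Doppler formula for motion along the line between source and listener. The function names and the 1 m reference distance are illustrative choices, not any particular engine's API.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in dry air at about 20 degrees C (assumed)


def distance_gain(distance_m: float, reference_distance_m: float = 1.0) -> float:
    """Inverse-distance (1/r) amplitude gain for a point source in free field.

    Inside the reference distance the gain is clamped to 1.0 so that very
    close sources do not become arbitrarily loud.
    """
    if distance_m <= reference_distance_m:
        return 1.0
    return reference_distance_m / distance_m


def gain_to_db(gain: float) -> float:
    """Convert an amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


def doppler_frequency(source_freq_hz: float,
                      source_speed_toward: float,
                      listener_speed_toward: float = 0.0,
                      c: float = SPEED_OF_SOUND) -> float:
    """Perceived frequency for motion along the source-listener line.

    Speeds are in m/s, positive when moving toward the other party and
    negative when moving away. Both must stay well below c.
    """
    return source_freq_hz * (c + listener_speed_toward) / (c - source_speed_toward)


if __name__ == "__main__":
    # Doubling the distance halves the amplitude: about -6 dB.
    print(round(gain_to_db(distance_gain(4.0) / distance_gain(2.0)), 2))  # -6.02
    # A 440 Hz horn approaching, then receding, at 20 m/s.
    print(round(doppler_frequency(440.0, 20.0), 1))   # 467.2
    print(round(doppler_frequency(440.0, -20.0), 1))  # 415.8
```

For instance, a 440 Hz horn approaching at 20 m/s is heard at roughly 467 Hz and drops to about 416 Hz once it recedes at the same speed. Game audio engines typically expose a tunable rolloff curve rather than the strict 1/r law, so designers can keep distant sounds audible.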

One of the most advanced features of realistic audio is the use of individualized Head-Related Transfer Functions (HRTFs). A full technical explanation of HRTFs is beyond the scope of this blog, but in essence an HRTF describes how a listener's head, torso and outer ears filter sound arriving from each direction; an individualized HRTF accounts for the unique shape of a user's ears to provide the most accurate perception of sound in 3D space. HRTFs are commonly measured in a special room using in-ear microphones, with a sound source moved around the head of a research subject or a dummy. Advances in computer vision now allow individualized HRTFs to be calculated from photos or video of a person's upper torso, head and ears: the images are used to reconstruct a 3D model of these body parts, and numerical acoustic simulations of that model then predict how sound would be captured at the ears, yielding the HRTF. Currently, most games that use true spatialized audio are encoded with an HRTF averaged over a database of subjects or measured on one of the standard dummy heads, which are designed to have average dimensions. Such generic HRTFs provide basic spatialization, but with individualized HRTFs the sound becomes very close to the real thing. That said, research has shown that subjects can adapt to more generic HRTFs, as measured by their performance on localization tasks (1).
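
The role of head geometry can be illustrated with a deliberately crude Python sketch. Binaural renderers apply an HRTF by convolving a mono source with a pair of head-related impulse responses (HRIRs), one per ear. Here, as a hypothetical stand-in for a measured HRIR pair, each ear's response is reduced to a delay and a gain: the interaural time difference (ITD) comes from Woodworth's spherical-head approximation, and the far-ear level drop is an arbitrary 0.6. The 8.75 cm "average" head radius and all function names are assumptions for illustration. A real HRTF also captures the direction-dependent spectral filtering of the outer ear, which is a large part of what makes HRTFs differ between people.

```python
import math

SPEED_OF_SOUND = 343.0          # m/s (assumed)
AVERAGE_HEAD_RADIUS_M = 0.0875  # assumed "average" head radius


def woodworth_itd(azimuth_rad: float, head_radius_m: float = AVERAGE_HEAD_RADIUS_M) -> float:
    """Interaural time difference (s) from Woodworth's spherical-head model.

    Intended for azimuths from 0 (straight ahead) to pi/2 (directly to one side).
    """
    return (head_radius_m / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))


def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Direct linear convolution, O(N*M); production renderers use FFT-based convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def toy_hrir_pair(azimuth_rad, sample_rate=48_000,
                  head_radius_m=AVERAGE_HEAD_RADIUS_M, far_ear_gain=0.6):
    """Hypothetical HRIR pair for a source on the listener's right.

    Near (right) ear: unit impulse. Far (left) ear: impulse delayed by the
    ITD and attenuated. No pinna filtering is modelled.
    """
    delay_samples = round(woodworth_itd(azimuth_rad, head_radius_m) * sample_rate)
    right_ir = [1.0]
    left_ir = [0.0] * delay_samples + [far_ear_gain]
    return left_ir, right_ir


def render_binaural(mono, azimuth_rad, **hrir_options):
    """Convolve a mono signal with each ear's impulse response; returns (left, right)."""
    left_ir, right_ir = toy_hrir_pair(azimuth_rad, **hrir_options)
    left = convolve(mono, left_ir)
    right = convolve(mono, right_ir)
    n = max(len(left), len(right))  # pad so both channels have equal length
    return left + [0.0] * (n - len(left)), right + [0.0] * (n - len(right))


if __name__ == "__main__":
    for radius in (0.08, 0.095):
        itd_ms = woodworth_itd(math.pi / 2, radius) * 1000
        print(f"head radius {radius} m -> ITD at 90 degrees: {itd_ms:.2f} ms")
    left, right = render_binaural([1.0, 0.5], math.pi / 2)
    print(left.index(0.6), len(left) == len(right))  # 31 True
```

Changing only the head radius from 8 cm to 9.5 cm shifts the ITD for a source at 90° from about 0.60 ms to about 0.71 ms, a small example of why a one-size-fits-all HRTF can only approximate an individual listener's cues.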

Stay tuned for future blogs in which we discuss how players can enhance the delivery of realistic audio through their choice of headset and through increasingly individualized HRTFs.

(1) C. Mendonça, G. Campos, P. Dias, J. Vieira, J. P. Ferreira, and J. A. Santos, "On the Improvement of Localization Accuracy with Non-Individualized HRTF-Based Sounds," J. Audio Eng. Soc., vol. 60, no. 10, pp. 821-830, (2012 October).