
The Physics of Sound in Extended Reality
As anyone who consumes media knows, sound is an essential component of any media experience. With the exception of content made specifically for hearing-impaired audiences, every movie, series, documentary, TV commercial (TVC) and creative film project goes through an intensive sound editing process that cuts unwanted noise and perfects the rest.
Sound is also an essential component of any extended reality (XR) experience, whether through virtual reality (VR), augmented reality (AR) or mixed reality (MR). However, as visual media moves away from traditional 2D platforms, such as TVs and computer screens, and into 3D space, sound must adapt to achieve the level of realism audiences now expect.
Extended reality aims to provide users with interactive and immersive virtual experiences that are as realistic as possible, and to achieve this, the whole experience – from visuals to sound – must be dynamically rendered at a high quality, adapting to the viewer’s position or physical movements. However, sound behaves differently in virtual environments than in the real world, making it important to understand the physics of sound in XR.
The Physics of XR Sound Editing
In traditional audio systems, sound is typically recorded, mixed, and played back in mono (a single channel), stereo (a left-to-right image) or surround (a horizontal ring of speakers) formats. With the rise of XR technologies, however, sound is now being played from all directions simultaneously, including above and below. For most people, the most familiar example of height channels in traditional media is the Dolby Atmos system used in more modern cinemas. This three-dimensional audio helps to create a more realistic and immersive experience for the user, but has so far only been utilised alongside a traditional screen.
To bring all this together into a more interactive media space, such as through virtual reality or augmented reality, sound waves must be accurately simulated in real-time and adapt to the audience’s place in the virtual environment.
This is achieved using spatial audio techniques, such as the head-related transfer function (HRTF), which models how sound waves interact with the listener’s head and ears. By simulating the unique acoustic properties of each listener’s head, HRTF processing helps to create a more realistic sound field, allowing sounds to be perceived as coming from specific locations in the XR environment. Anyone with a pair of Apple AirPods Pro can experience a basic form of this through their “spatial audio” feature.
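To make the idea concrete, here is a minimal, illustrative sketch of HRTF-style binaural rendering in Python: a mono source is convolved with a left-ear and a right-ear head-related impulse response (HRIR) for the desired direction. The HRIR arrays below are placeholders for illustration only; a real pipeline would load measured responses (for example from a SOFA dataset) and swap them as the source or listener moves.

```python
# Minimal sketch of HRTF-based binaural rendering (illustrative only).
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with left/right HRIRs to place it in space."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)  # shape: (samples, 2) stereo

# Placeholder data: one second of noise as the "source" and dummy 256-tap HRIRs.
fs = 48_000
source = np.random.randn(fs).astype(np.float32)
hrir_l = np.zeros(256, dtype=np.float32)
hrir_l[0] = 1.0    # stand-in impulse for the near ear
hrir_r = np.zeros(256, dtype=np.float32)
hrir_r[20] = 0.7   # crude interaural delay and attenuation for the far ear

binaural = render_binaural(source, hrir_l, hrir_r)
```

In practice the renderer also crossfades between HRIR pairs as the relative direction changes, so that sources move smoothly rather than jumping between positions.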
As visual technology develops at a fast pace, so must the audio, which is why XR-related editing – from sound to visuals – is becoming one of the leading challenges for filmmakers today.

Spatial Audio in Extended Reality
Spatial audio plays a different role in each of the XR fields. In VR, spatial audio is used to create a convincing and immersive audio experience that matches the virtual environment being enjoyed by the user.
In AR, spatial audio can be used to provide additional context and information about the user’s surroundings, and is often more targeted or specific in range.
In MR, spatial audio is used to blend virtual and real-world sounds, creating a seamless soundscape for the user based on what is being presented. This could range from realistic environmental sounds to otherworldly sounds that build atmosphere or inform.
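As a concrete illustration of how an AR cue can be kept targeted or specific in range, here is a minimal sketch of a range-limited distance roll-off: the source follows a simple inverse-distance gain law and is muted entirely beyond a cut-off radius. The function name and the numbers are hypothetical choices for illustration, not any particular engine’s API.

```python
# Minimal sketch of range-limited distance attenuation for an AR sound cue
# (inverse-distance law with a hard cut-off radius; values are illustrative).

def distance_gain(distance_m, reference_m=1.0, max_range_m=8.0):
    """Gain factor for a source at distance_m; silent beyond max_range_m."""
    if distance_m >= max_range_m:
        return 0.0
    return min(1.0, reference_m / max(distance_m, 1e-6))

# Example: a navigation prompt anchored 3 m away plays at roughly 1/3 gain,
# and is muted entirely once the user is more than 8 m from it.
print(distance_gain(3.0))   # ~0.33
print(distance_gain(12.0))  # 0.0
```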
Depending on the media type, the approach to sound editing must then be adapted to fit. While none of this is out of reach for sound editors today, it can create a multitude of difficulties along the way and often results in far more back-and-forth between editors during post – with adaptations and concessions made on each side to find the best result.
Real Time Rendered Audio in Extended Reality
Despite the many benefits of spatial audio in XR, there are still several challenges to overcome. For example, audio must be dynamically rendered in real-time to match the user’s movements, which can be computationally intensive. Additionally, the user’s position and orientation must be accurately tracked to ensure that the spatial audio is correctly aligned with the virtual environment.
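As a rough sketch of the tracking side, the snippet below recomputes a virtual source’s direction relative to the listener’s tracked position and head yaw each frame, yielding the azimuth and elevation an HRTF renderer (like the convolution sketch earlier) would consume. The names and the coordinate convention (x = right, y = up, z = forward) are assumptions for illustration; a real application would read the pose from its XR runtime and handle full three-axis head rotation.

```python
# Minimal sketch of per-frame listener tracking for spatial audio.
# Hypothetical names; not a specific engine or runtime API.
import numpy as np

def source_direction(source_pos, listener_pos, listener_yaw):
    """Azimuth, elevation (radians) and distance of a source in the head frame."""
    offset = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    c, s = np.cos(-listener_yaw), np.sin(-listener_yaw)   # undo the head yaw
    x = c * offset[0] + s * offset[2]
    y = offset[1]
    z = -s * offset[0] + c * offset[2]
    azimuth = np.arctan2(x, z)                             # 0 = straight ahead
    elevation = np.arctan2(y, np.hypot(x, z))
    return azimuth, elevation, np.linalg.norm(offset)

# Each frame: read the tracked pose, recompute the direction, and hand it to
# whatever spatial renderer is in use.
az, el, dist = source_direction([2.0, 1.5, 4.0], [0.0, 1.6, 0.0], np.deg2rad(30))
```

The computational cost comes from doing this, plus the per-source filtering, for every sound in the scene at interactive frame rates, which is why efficient real-time rendering remains an active area of work.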
So how can filmmakers provide users with a more realistic and engaging XR experience? Well, apart from leaps and bounds being made in spatial audio development, researchers are now exploring ways to incorporate other sensory modalities, such as haptic feedback and smell, into XR environments. At MOONJI Production, our full-suite XR studio is primed and ready to experiment with such concepts, both new and proven.
The idea here is that by combining multiple sensory modalities, it is possible to create a more convincing and immersive virtual experience that more closely matches the real world. The joke “smellovision” of the past may actually become a reality in the near future. As XR technologies continue to evolve and become more widespread, the importance of understanding the physics of sound in XR will only increase, leading to even more realistic and immersive XR experiences in the future.