Estimation of Direction of Arrival of Multiple Sound Sources in 3D Space Using B-Format

This contribution deals with sound source localization in three-dimensional space. An energetic analysis method based on B-format signal processing is presented. The method is able to localize multiple sound sources in three-dimensional space. A single SoundField microphone can be used to pick up the B-format signals indirectly. The method has been simulated in Matlab and tested in a real environment. Experimental results demonstrate the validity of the method.


I. INTRODUCTION
In recent years, several sound source localization methods have been developed to localize targets. They can be divided mainly into active and passive systems. Active systems send a sound pulse, receive the echo returned by a target, and then calculate the distance between the target and the main station. This method is used in active SONAR (sound navigation and ranging) [1]. Passive systems listen to the sound coming from the targets in order to locate them. This method is used in passive SONAR. Passive systems can be divided into groups according to the physical principle they use to localize the sound sources. The most commonly used principles are time delay estimation [2] and phase difference [3]. The physical principle behind the phase difference and the time delay is essentially the same, but the methods differ in how the estimate is obtained. Two or more microphones are used to pick up the sound coming from the sound sources, and the time delay between the microphone signals is then calculated. The time delay can be calculated as the lag that maximizes the correlation between the signals picked up by the microphones. When the method is used to localize several sound sources, more microphones are needed. The phase difference depends on the frequency of the sound signal and on the propagation path difference. It should be calculated in the frequency domain after applying the short-time Fourier transform, with a Hanning window for instance. The corresponding outputs for each signal are then multiplied to obtain the cross spectrum. The cross spectra of the overlapping frames are then averaged to get the phase difference spectrum [4].
H. Khaddour is with the Department of Telecommunication FEEC, Brno University of Technology, Brno, Czech Republic (phone: +420-541-149-210; fax: +420-541-149-192; e-mail: xkhadd00@stud.feec.vutbr.cz).
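The time delay principle described in the introduction can be illustrated with a short sketch. It is not taken from any of the cited methods: the function names, the 0.2 m microphone spacing, the 16 kHz sampling rate, and the far-field two-microphone geometry are all assumptions chosen for illustration, and the correlation is computed directly rather than with an FFT.

```python
import math

def cross_correlation_delay(x, y, max_lag):
    """Return the lag in samples at which y best matches a delayed copy of x."""
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x[i] with y[i + lag] over the overlapping samples only.
        total = 0.0
        for i, sample in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                total += sample * y[j]
        if total > best_value:
            best_lag, best_value = lag, total
    return best_lag

def delay_to_angle(lag, fs, spacing, c=343.0):
    """Convert a sample lag to a far-field arrival angle in degrees (assumed geometry)."""
    # Path difference divided by microphone spacing, clamped to the valid asin range.
    ratio = max(-1.0, min(1.0, lag / fs * c / spacing))
    return math.degrees(math.asin(ratio))

# Hypothetical test signal: a decaying 440 Hz burst that reaches microphone B
# five samples later than microphone A.
fs = 16000
burst = [math.sin(2 * math.pi * 440 * n / fs) * math.exp(-n / 200) for n in range(400)]
mic_a = burst + [0.0] * 20
mic_b = [0.0] * 5 + burst + [0.0] * 15

lag = cross_correlation_delay(mic_a, mic_b, max_lag=20)
angle = delay_to_angle(lag, fs, spacing=0.2)
```

With a single microphone pair, only one angle relative to the pair's axis can be recovered, which is one reason why delay-based methods need more microphones for several sources or for three-dimensional localization.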
Many sound source localization methods have been proposed in the last decade. They differ in the number of sound sources they can localize and in their ability to localize in three-dimensional space. Newer methods try to reduce the number of microphones used. A method proposed in [5] uses three microphones to localize sound sources in three-dimensional space. However, that method needs a special reflector and source counting, and it is used to localize a dominant sound source. Other methods can localize multiple sound sources, but they use more microphones. For instance, in [6] an array of eight microphones is used for sound source localization and tracking. That method, however, is also able to estimate the distance of the sound source.
This paper presents an approach referred to as sound source direction estimation using energetic analysis, which aims at estimating the direction of arrival of multiple sound sources in three-dimensional space based on energetic analysis of B-format signals. Three B-format signals are needed to estimate the direction of the sound sources in the horizontal plane only, while four B-format signals are needed to estimate the direction of the sound sources in three-dimensional space.
The paper is organized as follows: B-format signals are described in Section II. The energetic analysis method is introduced in Section III. Section IV presents the simulation results. Experimental results in both the horizontal and vertical planes are presented in Section V, and the conclusion can be found in Section VI.

Hasan Khaddour, Jiří Schimmel, and Michal Trzos

II. B-FORMAT SIGNALS
A. B-format Principle
B-format signals consist of four signals, namely $w$, $x$, $y$, and $z$, which carry the information about the acoustic field near the microphone [7]. The signals $x$ and $y$ carry information about the horizontal plane, $z$ carries information about the vertical plane, and $w$ is an omnidirectional signal, see Fig. 1. The encoding equations for B-format signals are [7]

$$
\begin{aligned}
w &= \frac{s}{\sqrt{2}}, \\
x &= s \cos\theta \cos\varphi, \\
y &= s \sin\theta \cos\varphi, \\
z &= s \sin\varphi,
\end{aligned}
\tag{1}
$$

where $\theta$ represents the azimuth angle of the source, $\varphi$ represents the elevation angle of the source, and $s$ represents the sound signal.
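As a minimal sketch of the B-format encoding step, the following code assumes the common convention in which the omnidirectional channel carries the source scaled by 1/sqrt(2) and the three directional channels carry the source weighted by the direction cosines of azimuth and elevation. The function names `encode_b_format` and `mix_b_format` are hypothetical, and summing the encoded channels of several sources is an assumption about how a multi-source scene would be simulated.

```python
import math

def encode_b_format(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal arriving from (azimuth, elevation) into (w, x, y, z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gain_w = 1.0 / math.sqrt(2.0)          # omnidirectional channel
    gain_x = math.cos(az) * math.cos(el)   # front-back figure-of-eight
    gain_y = math.sin(az) * math.cos(el)   # left-right figure-of-eight
    gain_z = math.sin(el)                  # up-down figure-of-eight
    return tuple([g * s for s in signal] for g in (gain_w, gain_x, gain_y, gain_z))

def mix_b_format(encoded_sources):
    """Sum the B-format channels of several encoded sources sample by sample."""
    return tuple(
        [sum(samples) for samples in zip(*channel)]
        for channel in zip(*encoded_sources)
    )

# Two hypothetical sources sharing the same two-sample signal.
left = encode_b_format([1.0, -2.0], azimuth_deg=90, elevation_deg=0)
above = encode_b_format([1.0, -2.0], azimuth_deg=0, elevation_deg=90)
scene = mix_b_format([left, above])
```

A source at 90 degrees azimuth in the horizontal plane appears only in the w and y channels, and a source directly overhead appears only in w and z, which matches the polar patterns of the four components.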

III. ENERGETIC ANALYSIS METHOD
The principle of the energetic analysis method is that the sound source lies in the direction opposite to the sound intensity vector. This principle is also used in directional audio coding (DirAC) [10].
In the time domain, the instantaneous acoustic intensity can be written as [11]

$$ \vec{I}(t) = p(t)\,\vec{u}(t) $$

where $p$ is the acoustic pressure and $\vec{u}$ represents the particle velocity vector.
In the energetic analysis method, the sound signals are first divided in time and then in frequency using the short-time Fourier transform (STFT). For each time frame, the intensity vectors are computed in the frequency domain. The instantaneous intensity vector can be derived from the B-format signals and written as [12]

$$ \vec{I}(t,f) = \left[ I_x(t,f),\; I_y(t,f),\; I_z(t,f) \right] $$

where its components can be derived from the equations

$$
\begin{aligned}
I_x(t,f) &= -\frac{\sqrt{2}}{Z_0}\,\mathrm{Re}\{ W^*(t,f)\, X(t,f) \}, \\
I_y(t,f) &= -\frac{\sqrt{2}}{Z_0}\,\mathrm{Re}\{ W^*(t,f)\, Y(t,f) \}, \\
I_z(t,f) &= -\frac{\sqrt{2}}{Z_0}\,\mathrm{Re}\{ W^*(t,f)\, Z(t,f) \},
\end{aligned}
$$

where $Z_0$ is the acoustic impedance of the air, $t$ is time, $f$ is frequency, $^*$ denotes the complex conjugate, and $W$, $X$, $Y$, and $Z$ are the Fourier transforms of the B-format signals $w$, $x$, $y$, and $z$, respectively. The negative sign follows from the encoding (1): the signals $x$, $y$, and $z$ are positive toward the source, whereas the particle velocity of a plane wave points away from it. After calculating the intensity vector for each time frame, the direction of the sound can be calculated as the direction opposite to the intensity vector. The azimuth is [11]

$$ \theta(t,f) = \operatorname{atan2}\!\left( -I_y(t,f),\; -I_x(t,f) \right) $$

where $\operatorname{atan2}$ is the four-quadrant inverse tangent, and the elevation is

$$ \varphi(t,f) = \arctan\!\left( \frac{-I_z(t,f)}{\sqrt{I_x^2(t,f) + I_y^2(t,f)}} \right). $$

As can be seen from the previous equations, the azimuth and the elevation are calculated for each frequency bin in each time frame, see Fig. 3. During a single time frame, each frequency bin carries information about the direction of the one sound source whose intensity is dominant in that bin. We assume that only a single sound source is dominant in each bin. This assumption can hold because the sound signals differ from each other and have different spectral intensity in each time frame. After calculating the azimuth and elevation, a statistical step is needed to choose the most likely angles from which the sound arrives. Assuming there is only one sound source, the estimated direction of arrival is the angle that maximizes the sum of the function $P$ over the whole frequency interval for each time frame. The azimuth can be written as

$$ \hat{\theta}(t) = \arg\max_{\alpha} \sum_{k=1}^{K} P\!\left( \alpha \mid \theta(t,k) \right) $$

and the elevation as

$$ \hat{\varphi}(t) = \arg\max_{\beta} \sum_{k=1}^{K} P\!\left( \beta \mid \varphi(t,k) \right) $$

where $\hat{\theta}$ and $\hat{\varphi}$ are the estimated sound source angles (azimuth and elevation, respectively), $K$ is the number of frequency bins, $\theta(t,k)$ is the vector of azimuths and $\varphi(t,k)$ the vector of elevations, $t$ denotes the time frame index, $k$ is the frequency bin index, and $P(\alpha \mid \theta(t,k))$ is the probability that the signal comes from the direction $\alpha$, given the angle estimated in each frequency bin with the direction equations above.
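The per-bin direction estimate and the statistical selection of the most likely angles can be sketched for a single time frame as follows. This is an illustration under stated assumptions rather than the authors' implementation: a plain DFT of one 64-sample frame stands in for the windowed STFT, a histogram with 5-degree bins stands in for the probability function, the B-format gains follow the common convention (w scaled by 1/sqrt(2)), the intensity is computed up to the positive factor sqrt(2)/Z0 with a negative sign so that the source lies opposite to it, and all function names and test tones are hypothetical.

```python
import cmath
import math
from collections import Counter

def encode(signal, azimuth_deg, elevation_deg):
    """Encode a mono frame into (w, x, y, z) with the assumed B-format gains."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    gains = (1 / math.sqrt(2), math.cos(az) * math.cos(el),
             math.sin(az) * math.cos(el), math.sin(el))
    return tuple([g * s for s in signal] for g in gains)

def dft(frame):
    """Plain DFT of one frame, non-negative frequencies only."""
    n = len(frame)
    return [sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(frame))
            for k in range(n // 2 + 1)]

def bin_directions(w, x, y, z, min_energy=1e-6):
    """Return (azimuth, elevation) in degrees for every bin that carries energy."""
    spectra = [dft(channel) for channel in (w, x, y, z)]
    directions = []
    for wk, xk, yk, zk in zip(*spectra):
        if abs(wk) ** 2 < min_energy:
            continue  # skip bins without signal energy
        # Intensity components up to the positive factor sqrt(2)/Z0.
        ix = -(wk.conjugate() * xk).real
        iy = -(wk.conjugate() * yk).real
        iz = -(wk.conjugate() * zk).real
        # The source lies opposite to the intensity vector.
        azimuth = math.degrees(math.atan2(-iy, -ix))
        elevation = math.degrees(math.atan2(-iz, math.hypot(ix, iy)))
        directions.append((azimuth, elevation))
    return directions

def histogram_peaks(angles, count, bin_width=5):
    """Return the `count` most frequent angle bins, sorted in ascending order."""
    votes = Counter(bin_width * round(a / bin_width) for a in angles)
    return sorted(angle for angle, _ in votes.most_common(count))

def tone(bins, n=64):
    """Sum of sinusoids centred on the given DFT bins (no spectral leakage)."""
    return [sum(math.sin(2 * math.pi * k * m / n) for k in bins) for m in range(n)]

# Two hypothetical sources with disjoint spectra, so one source dominates each bin.
first = encode(tone([3, 5, 9]), azimuth_deg=30, elevation_deg=0)
second = encode(tone([12, 17]), azimuth_deg=-100, elevation_deg=20)
w, x, y, z = ([a + b for a, b in zip(c1, c2)] for c1, c2 in zip(first, second))

directions = bin_directions(w, x, y, z)
azimuth_peaks = histogram_peaks([d[0] for d in directions], count=2)
elevation_peaks = histogram_peaks([d[1] for d in directions], count=2)
```

The example depends on the sources occupying different frequency bins, which is the single-dominant-source assumption in its most favourable form. With real speech the spectra overlap, so the histogram would be accumulated over many frames before its peaks are read off.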

IV. SIMULATION RESULTS
Simulation results show the ability of this method to estimate the direction of arrival of sound sources in both the vertical and horizontal planes. Assuming three sound sources around the microphone, B-format signals can be generated from the source signals according to (1). In the first simulation scenario, three sound sources were assumed to be around the microphone, in the absence of noise. As can be seen in Fig. 4, the method was able to estimate the directions of the sound sources correctly; the peaks denote the three estimated angles.
[...] microphones in both horizontal and vertical planes. The signals were assumed to be equidistantly separated (i.e., 4 degrees from each other in the horizontal plane and 5 degrees from each other in the vertical plane). Simulation results are shown in Fig. 6. As can be seen, the method is able to determine the direction of the sound sources in both the vertical and horizontal planes; the peaks denote the directions of arrival of the sound sources. The presence of the noise signals affected the accuracy of the method, because some frequency bins point to the directions of the noise sources. The SNR between the sound signals and the noise signal in our simulation is about 26 dB.
Fig. 6. Simulation result in the presence of a pseudo-random noise signal and a fan noise signal.
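When noise sources are added to a simulation of this kind, the noise has to be scaled to a chosen signal-to-noise ratio before it is encoded and mixed. The following sketch shows one way to do this. The function names, the uniform pseudo-random noise, the 440 Hz test tone, and the definition of SNR as a ratio of mean powers are assumptions, not details taken from the simulation above.

```python
import math
import random

def mean_power(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)

def scale_noise_to_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals `snr_db`."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * v for v in noise]

def measured_snr_db(signal, noise):
    """SNR in decibels computed from the mean powers."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

# Hypothetical source and pseudo-random noise, 0.1 s at 16 kHz.
rng = random.Random(0)
source = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

scaled_noise = scale_noise_to_snr(source, noise, snr_db=26.0)
noisy_source = [s + v for s, v in zip(source, scaled_noise)]
```

In a multi-source simulation the scaled noise would be treated as a further source with its own direction, so that it enters all four B-format channels with the appropriate gains.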

V. EXPERIMENTAL RESULTS
The measurements were carried out in the acoustic laboratory at the Department of Telecommunications FEEC, Brno University of Technology, where the conditions of the experiment were the same as in sound control rooms, listening rooms, or living rooms with a high-quality listening environment; the laboratory provides a semi-diffuse field with reverberation time RT60 < 0.3 s in all octave bands.

Fig. 7. Recording the sound using the SoundField microphone.
The measurements were carried out in both the horizontal and vertical planes. Recordings were made of three speakers (three men) who stood around the microphone in arbitrary positions, see Fig. 7. A SoundField microphone was used to pick up the sound; after recording the A-format signals, the B-format signals were derived according to (1). In the first part of our experiment, the three men were talking simultaneously in three arbitrary positions around the microphone, see Fig. 7; the measurements were repeated forty times. The results of those forty measurements in the horizontal plane are shown in Fig. 8 using box plots. The boxes have lines at the lower quartile, median, and upper quartile values. The whiskers show the extent of the rest of the data. The outliers are shown as red crosses outside the whiskers. As can be seen in Fig. 8, the median error was about 5 degrees for the first speaker and 4 degrees for the second and third speakers. In the second part of the experiment, the same three men were talking simultaneously in the vertical plane; the measurement was repeated twenty times. The absolute angle error in the vertical plane is shown in Fig. 9; the median error in this case was about 5 degrees for the first and second speakers and 4 degrees for the third speaker. The error of the method comes mostly from the reverberation in the room and from the noise signals.
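The error statistics summarized by such box plots can be computed as sketched below. The angle values are made-up numbers used only to exercise the code and are not measurements from the experiment; the function names and the choice of the inclusive quartile method are likewise assumptions.

```python
import statistics

def absolute_angle_error(estimated_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    # Wrap the difference into [-180, 180) so that 350 and 10 degrees differ by 20.
    diff = (estimated_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

def box_plot_summary(errors):
    """Lower quartile, median, and upper quartile of a list of errors."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}

# Made-up estimates for a hypothetical speaker at 45 degrees azimuth.
true_azimuth = 45.0
estimates = [44.0, 47.0, 52.0, 39.0, 45.0]

errors = [absolute_angle_error(e, true_azimuth) for e in estimates]
summary = box_plot_summary(errors)
```

Wrapping the difference before taking the absolute value matters for azimuth, where estimates on either side of the plus or minus 180 degree boundary would otherwise be counted as errors of nearly 360 degrees.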
As can be seen in Fig. 8 and Fig. 9, the method is able to estimate the direction of arrival of multiple sound sources in both the horizontal and vertical planes, with a median error of about 4 to 5 degrees. By comparison, the method presented in [6] uses eight microphones for three-dimensional localization and tracking of sound sources, whereas our method estimates the direction of the sound sources in three-dimensional space using four signals. However, the absolute angle error is larger in our method (about 4 to 5 degrees), whereas the angular accuracy of the method presented in [6] was better than one degree for a stationary source at a distance of 1.5 m. The simulation results for the method presented in [5] showed that it was able to localize a dominant sound source using three microphones. The absolute angle-of-arrival error of that method depends on the kind of added noise and on the SNR: the angle error was about 3% in the presence of white Gaussian noise at an SNR of about -20 dB, and 100% in the presence of pink noise at an SNR below 0 dB. Our method, however, is able to localize multiple sound sources using only three signals in the horizontal plane and four signals in three-dimensional space in the presence of mixed fan noise and pseudo-random noise at an SNR of about -26 dB.

VI. CONCLUSION
A method for three-dimensional sound source direction estimation was presented. The method is able to estimate the direction of multiple sound sources in both the horizontal and vertical planes. Simulation results showed the effectiveness of the method both in the absence and in the presence of noise signals. Experimental results showed that the method was able to estimate the direction of sound sources in three-dimensional space.

Fig. 1. Polar patterns of B-format components.
To record B-format signals directly, a combination of coincident conventional microphones is needed: three figure-of-eight microphones are used to pick up the signals $x$, $y$, and $z$, and an omnidirectional microphone is used to pick up the signal $w$.
B. A-format Signals
B-format signals can be derived from A-format signals. A single SoundField microphone can be used to pick up A-format signals [8]. As can be seen in Fig. 2, the microphone consists of four capsules that pick up the sound in the left-front, right-front, left-back, and right-back directions.
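The derivation of B-format from A-format signals can be sketched with the widely used sum-and-difference matrix for a tetrahedral capsule arrangement. Several points are assumptions not stated in the text: that the capsules point left-front-up, right-front-down, left-back-down, and right-back-up; the overall scale factor of 0.5; and the omission of the frequency-dependent equalization that a real microphone needs to compensate for capsule spacing. The function name is hypothetical.

```python
def a_to_b_format(flu, frd, bld, bru, scale=0.5):
    """Convert four capsule signals to (w, x, y, z) by sums and differences.

    flu: front-left-up, frd: front-right-down,
    bld: back-left-down, bru: back-right-up (assumed orientation).
    """
    capsules = list(zip(flu, frd, bld, bru))
    w = [scale * (a + b + c + d) for a, b, c, d in capsules]  # all capsules
    x = [scale * (a + b - c - d) for a, b, c, d in capsules]  # front minus back
    y = [scale * (a - b + c - d) for a, b, c, d in capsules]  # left minus right
    z = [scale * (a - b - c + d) for a, b, c, d in capsules]  # up minus down
    return w, x, y, z

# A sound that reaches all four capsules equally carries no directional part.
uniform = a_to_b_format([1.0], [1.0], [1.0], [1.0])

# A sound picked up only by the front-left-up capsule is positive in x, y, and z.
single = a_to_b_format([1.0], [0.0], [0.0], [0.0])
```

Because the conversion is a fixed linear combination, it can be applied sample by sample before any time-frequency analysis.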

Fig. 5. Spectral density distribution for a fan noise sound signal.

Fig. 8. Average absolute angle error for the three speakers in the horizontal plane.

Fig. 9. Average absolute angle error for the three speakers in the vertical plane.