Direction-of-arrival estimation of animal vocalizations for monitoring animal behavior and improving estimates of abundance

Autonomous recording units (ARUs) show promise for improving the spatial and temporal coverage of biodiversity monitoring programs, and for improving the resolution with which the behaviors of animals can be monitored on small spatial scales. Most ARUs, however, provide the user with little to no ability to determine the direction of an incoming sound, a shortcoming that limits the utility of ARU recordings for assessing the abundance of animals. We present a recording system constructed from two Wildlife Acoustics SM3 recording units that can estimate the direction-of-arrival (DOA) of an incoming signal with high accuracy. Field tests of this system revealed that 95% of sounds were estimated within 12° of the true DOA in the azimuth angle and 9° in the elevation angle, and that the system was largely robust to background noise and accurate to at least 30 m. We tested the ability of the system to discriminate up to four simulated birds singing simultaneously and show that the system generally performed well at this task, but, as expected, fainter and longer sounds were more likely to be overlapped and therefore undetected by the system. We propose that a microphone system that can estimate the DOA of sounds, such as the system presented here, may improve the ability of ARUs to assess abundance during biodiversity surveys by facilitating more accurate localization of sounds in three dimensions.


INTRODUCTION
Studies of the ecology and behavior of free-living animals have traditionally relied upon human observers for data collection. For practical reasons, however, human observers are limited in their ability to observe animals across both spatial and temporal scales. Technological advances hold promise to improve coverage through the adoption of autonomous sensors for the collection of biological data (Porter et al. 2009). One type of automated sensor is the autonomous recording unit (ARU) for passive acoustic data collection, which is increasingly being employed for studies on marine mammals (Stafford et al. 1998, Laurinolli et al. 2003, Bonnel et al. 2008, Mouy et al. 2012, Rone et al. 2012), terrestrial mammals (Heinicke et al. 2015), bats (MacSwiney González et al. 2008), and birds (Mennill et al. 2006, Brandes 2008, Collier et al. 2010). ARUs confer several advantages over traditional methods of observation. For instance, they can continuously record for many hours, provide a permanent record of sounds, reduce interobserver variation, and are minimally intrusive to the animals under study. In many cases, ARUs yield data that would be otherwise impossible or impractical to collect, such as measures of song output from a single bird spanning days or weeks (Taff et al. 2014), or recordings of nocturnal singing behavior (Celis-Murillo et al. 2016a,b).
At least one drawback of ARUs relative to human observers is the lack of directionality in the sound recordings, a shortcoming that has only rarely been addressed (Rone et al. 2012). This may be particularly limiting when ARUs are deployed for biodiversity assessments, when multiple birds of the same species may vocalize simultaneously. Human observers in the field can more easily discriminate sounds arriving from different directions, allowing them to count several individuals of a species from a single point, and to track the movements of vocalizing individuals over time. Commercially available ARUs, in contrast, typically include one (e.g., Sieve Analytics Arbimon Touch) or two (e.g., Songmeter SM3 and River Forks E3A-CM) microphones, and are thus limited in the spatial information they can provide; a recording made with a single microphone provides no information regarding the direction of arrival of a sound, while a binaural recording can only discriminate along a single dimension, i.e., whether a sound arrived from the left side or the right side. The lack of directionality in sound recordings makes abundance difficult to quantify from ARU data, leading to occasional underestimates of the abundance of some species (Hobson et al. 2002, but see Lambert and McDonald 2014, Drake et al. 2016). Data derived from ARUs, as a result, are often reduced to documenting the presence or absence of a given species (Haselmayer and Quinn 2000, Hutto and Stutzman 2009, Celis-Murillo et al. 2012, Tegeler et al. 2012, Digby et al. 2013, Wimmer et al. 2013, Klingbeil and Willig 2015, Leach et al. 2016), thereby discarding much of the information available to a field observer and reducing their utility for inferring population numbers (Nur et al. 1999).
We present the results from experiments using a system for estimating the direction-of-arrival (DOA) of an incoming sound to a microphone in both the azimuth and elevation directions. This system builds upon previous DOA estimation systems that have been used for the study of vocalizing animals. In marine systems, sonobuoys developed by the U.S. Navy are able to estimate the DOA of an incoming sound source, and were used by Rone et al. (2012) to triangulate the calls of Northern Pacific right whales (Eubalaena japonica) during marine mammal surveys. DOA estimation of marine mammals has also been accomplished from small microphone arrays mounted near the seafloor (Wiggins et al. 2012) and from hydrophone arrays towed behind a boat (Miller and Tyack 1998, Leaper et al. 2000). In terrestrial ecosystems, Celis-Murillo et al. (2009) created a system that used four orthogonal microphones to record the soundscape. The soundscape was subsequently recreated in the laboratory by setting up speakers around an observer, and experimentation revealed that similar or better estimates of abundance could be made for many bird species in the laboratory than were obtained on point counts by observers in the field.
In contrast to the system of Celis-Murillo et al. (2009), our system employs signal processing algorithms to estimate DOA in both the azimuth and elevation directions. This approach is similar to that employed by Wang et al. (2005) for localizing woodpeckers, Kojima et al. (2017) for analyzing soundscapes, and by Ali et al. (2009) for localizing marmots and birds. Although these systems proved accurate, they have been used sparingly in the field, likely because of a reliance on custom-built equipment or software that is costly and requires considerable expertise to operate. We attempted to address this by constructing a system from commercially available ARUs (two SM3 recording units; Wildlife Acoustics, Inc., Maynard, MA, USA) that are robust to field conditions, easy to use, and widely used by biologists for bioacoustic surveys. Here we present experiments aimed at (1) testing the accuracy of DOA estimation, and (2) testing the ability of the device to count a small number (up to four) of simultaneously vocalizing birds from a single point.

Recording system
The basic design of our system is shown in Figure 1, and a brief description of the recording system can also be found in Taylor et al. (2016). The system uses four simultaneously recording microphones to estimate the direction from which a sound arrived, based on the phase differences of the incoming sound waves at the microphones (Fig. 1a). We attached four microphones (SMM-A1 microphones; Wildlife Acoustics, Inc.) to two GPS-enabled SM3 recording devices (Wildlife Acoustics, Inc.), each with two recording channels. Microphones were attached to a central mount, as shown in Fig. 1b. The ideal dimensions of the microphone mount vary as a function of the frequency of the sounds to be recorded (Wang et al. 2005), and the dimensions must be known with high precision to facilitate accurate determination of the DOA of a signal. To accomplish this, we 3-D printed the microphone mount specifically for the SMM-A1 microphones, such that the microphones were positioned in an isotropic configuration and each microphone was separated by 68 mm from the three other microphones (Taylor et al. 2016). The geometry of the microphones was intended for analysis of frequencies between 1000 and 6000 Hz, which includes the frequency of the songs of most songbird species. For all recordings collected here, devices were set to record at a sampling rate of 96000 Hz with a high-pass filter at 1000 Hz, and with a gain of 30 dB.

Fig. 1. The recording setup used for experiments. (a) A view from above showing the recording setup. The four microphones were attached to a microphone mount in a precise, isotropic configuration (center). Sound waves originating from a source (red and blue birds) propagate from the source outward (red and blue circles). Sound waves reaching the microphones are in different phases at each of the four microphones. Signal processing algorithms, like the MUSIC algorithm used in our experiment, can use these phase differences to determine the angle from which each sound arrived (α and β for the red and blue birds, respectively, relative to an arbitrary reference angle, labeled 0). Though only the azimuth angle is depicted here, the algorithm also determines the elevation angle. (b) A photograph of the four microphones and 3-D printed microphone mount used in the experiment. In the field, the microphone mount is affixed to a tripod. Not shown: the GPS-enabled SM3 sound recording units to which these microphones were attached.
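Four microphones that are each 68 mm from the other three sit at the vertices of a regular tetrahedron. The following minimal Python sketch builds such a geometry and computes the largest arrival-time difference that can occur between two microphones at the 96000 Hz sampling rate. The speed of sound (343 m/s) and the orientation of the tetrahedron are assumptions made for illustration and are not taken from the mount files.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air near 20 °C
EDGE = 0.068             # m, microphone spacing reported in the text
SAMPLE_RATE = 96000      # Hz, sampling rate reported in the text


def tetrahedron(edge):
    """Vertices of a regular tetrahedron centred on the origin.

    The orientation is arbitrary (hypothetical); only the spacing matters here.
    """
    s = edge / (2.0 * math.sqrt(2.0))
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]


MICS = tetrahedron(EDGE)

# Every pair of microphones should be exactly one edge length apart.
pair_distances = [math.dist(p, q) for p, q in itertools.combinations(MICS, 2)]

# Largest possible arrival-time difference between two microphones,
# reached when the sound travels along the line joining them.
max_delay_s = EDGE / SPEED_OF_SOUND
max_delay_samples = max_delay_s * SAMPLE_RATE

print([round(d * 1000, 1) for d in pair_distances])  # spacings in mm
print(round(max_delay_samples, 1))                   # delay in samples
```

Under these assumptions the full range of inter-microphone delays spans only about 19 samples, which gives a sense of why sample-level synchronization between channels matters for this kind of array.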

Recording synchronization
Complicating matters in this iteration of the recording system was the need for precise synchronization between the four recording channels. In our device, the four channels were connected to two separate SM3 devices that could record no more than two channels each. Though GPS attachments were used to bring these two devices into synchrony, experimentation revealed that even this level of synchrony was not sufficient for our needs.
A workaround was achieved by fixing a single earbud headphone to the microphone mount on an attachment designed to place the earbud equidistant from one of the channels from each recording device. This earbud was attached to a cell phone programmed to emit a synchronization signal (in our case, a recording of 40 single notes played on a piano over the course of 30 seconds) every five minutes, which could subsequently be used to ensure synchronization of the two devices. The files needed to 3-D print both the main mount and the earbud attachment can be found on Figshare (http://dx.doi.org/10.6084/m9.figshare.3792780).

SDEer, software for DOA analysis
We created a Matlab App, called SDEer, to conduct analysis of recordings. This software includes a user-friendly Graphical User Interface (GUI), allowing straightforward use by inexperienced users. The software and detailed instructions for its use can be found on Figshare (see link above).
The extraction of DOA from a raw sound recording consists of three steps: synchronization, segmentation, i.e., event detection, and DOA estimation. SDEer allows synchronization to be conducted either manually or automatically. Manual synchronization can be used if the magnitude of the time offset between the two SM3 units is known, and this offset can be specified in SDEer. If the offset between the two units is not known, SDEer can calculate it automatically. To do so, the raw sound file used as a synchronization signal must be provided to the software, and SDEer uses waveform cross-correlation to locate this signal in the recordings from the two SM3 devices. Because the synchronization signal was broadcast from a point equidistant from one of the channels from each SM3 unit, these detected synchronization signals can be used to calculate the offset between the two devices, and then to bring the two devices into precise synchrony.
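The cross-correlation step can be illustrated with a short, self-contained Python sketch. It is not the SDEer implementation: the function names are hypothetical, a synthetic chirp stands in for the piano notes, and the recordings are simulated noise with the chirp inserted at different positions.

```python
import math
import random


def best_lag(recording, template):
    """Return the sample index where `template` best matches `recording`."""
    n = len(template)
    scores = [
        sum(recording[lag + i] * template[i] for i in range(n))
        for lag in range(len(recording) - n + 1)
    ]
    return scores.index(max(scores))


def sync_offset(rec_a, rec_b, template):
    """Samples to drop from the start of B so its sync signal lines up with A.

    A negative value means the samples should be dropped from A instead.
    """
    return best_lag(rec_b, template) - best_lag(rec_a, template)


# Hypothetical data: a short chirp stands in for the synchronization signal.
random.seed(0)
chirp = [math.sin(2 * math.pi * (0.01 * i + 0.0005 * i * i)) for i in range(200)]


def simulate(start, length=1000, noise=0.05):
    """Low-level noise with the chirp added at sample `start`."""
    rec = [random.gauss(0.0, noise) for _ in range(length)]
    for i, value in enumerate(chirp):
        rec[start + i] += value
    return rec


unit_a = simulate(start=300)   # one channel from the first recorder
unit_b = simulate(start=337)   # one channel from the second recorder
offset = sync_offset(unit_a, unit_b, chirp)
print(offset)
```

Because the synchronization source is equidistant from the two channels, any difference between the two detected positions can be attributed to the recorders' clocks rather than to acoustic travel time.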
Segmentation can also be accomplished either automatically or manually. If automatic segmentation is desired, the user can select from among 10 different end-point detection algorithms to automatically detect the start- and end-points of sounds of interest from the recording. If manual segmentation is preferred, the program can read in a .TextGrid file created using the free online linguistics software Praat (Boersma and Weenink 2014), containing manually detected start- and end-points for the signals of interest.
DOA estimation is thereafter accomplished by specifying the geometry of the microphones, selecting a DOA estimation algorithm, and specifying the desired precision of DOA estimation, the range of possible angles from which the source might originate, and the expected frequency bandwidth of the sounds of interest. Though the DOA algorithms differ in their details, all rely on phase differences in the signal of interest to estimate the relative delay of the same signal arriving at each microphone, from which the most likely direction of the source can be inferred. The selected algorithm "searches" through the possible angles from which the signal could have originated and outputs the direction with the highest likelihood of having generated the signal. Specifying a higher precision of estimation or a greater range of angles to be searched has the consequence that the DOA takes more time to compute. Once DOAs have been estimated, the results can be manually copied from SDEer to another program, or SDEer can output a .TextGrid file with additional tiers showing the direction of each signal of interest.

DOA estimation experiments
Experiments to test the accuracy of the system were carried out over two days at the UCLA La Kretz Field Station in Malibu, CA, USA (34° 5'49.33"N, 118° 48'57.03"W). On 9 February 2016, we tested the accuracy of DOA estimation for single sources at 10 m, 15 m, 20 m, 25 m, and 30 m from the microphone ("distance experiments"); on 16 February 2016, we tested the performance of the system when up to four sources played simultaneously at a distance of 15 m ("multisource experiments"). On both days, the recording device was placed in the middle of a gradually sloped, grassy field, with playback speakers placed at various predetermined azimuth angles. Elevation angles were not varied systematically, but varied according to the topography, such that the elevation angles included in this analysis ranged from approximately -10° to +15° from horizontal.
Each of the four playback stimuli was an unaltered recording of one of four songbird species: Bewick's Wren (BEWR, Thryomanes bewickii), California Thrasher (CATH, Toxostoma redivivum), Black-headed Grosbeak (BHGR, Pheucticus melanocephalus), and Cassin's Vireo (CAVI, Vireo cassinii), singing their typical songs during the breeding season in California. The recordings, therefore, contained bursts of song interspersed with periods of silence, as is typical for these species. Thus, even when multiple recordings played at the same time in the multisource experiments, not every single burst of sound overlapped with a song from another speaker; many, by chance, were unimpeded by background noise.
The four species differed in the acoustic and temporal properties of their songs, most strikingly in the length of each song, the length of the silent intervals between songs, and the frequencies of the songs, which are summarized in Table 1. During playbacks, sound files were played on a loop for a variable period (30-300 s), before being turned off and moved to another location. During the distance experiments, the loop consisted of the recordings of all four species played consecutively. During the multisource experiments, a single species was played on a loop from each location. At times during the multisource experiments, up to four speakers broadcast the songs of up to four different species, while at other times the four speakers all broadcast CAVI song to simulate a situation, such as might commonly be encountered during point-count surveys or studies of counter-singing interactions, where multiple individuals of the same species sing at the same time from different territories. An effort was made to standardize the amplitude of the broadcasts so that each recording registered approximately 80 dB when measured at 1 m from the microphone using a Radioshack Sound Level Meter 33-2055 (Radioshack Corporation, Fort Worth, TX, USA), which is accurate to within ± 2 dB.

Table 1. Temporal and frequency characteristics of recordings from four species used as playback stimuli to test the accuracy of the microphone system for direction-of-arrival estimation.
We measured the angle from the microphone mount to each speaker using an Alton AT0132300 Tripod Multi-Beam and Rotary Laser Level set (Alton Industries Group, Ltd., Batavia, IL, USA). We pointed the laser level toward the speaker to estimate the azimuth angle from which each source originated, ranging from 0° to 359°. Elevation angles were approximated using trigonometry based on the known distance to the speaker and an estimated elevation relative to the center of the four microphones.
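As a worked illustration of the trigonometric step, the sketch below converts a distance and a height difference into an elevation angle. It assumes the measured distance is the horizontal distance to the speaker; the function name and the example values are hypothetical.

```python
import math


def elevation_deg(horizontal_distance_m, height_difference_m):
    """Elevation angle of the speaker as seen from the array centre.

    Assumes `horizontal_distance_m` is measured along the horizontal and
    `height_difference_m` is positive when the speaker is above the array.
    """
    return math.degrees(math.atan2(height_difference_m, horizontal_distance_m))


# Hypothetical example: a speaker 15 m away and 2 m above the microphones.
example = elevation_deg(15.0, 2.0)
print(round(example, 1))
```

With these example values the elevation is roughly 7.6°, which falls within the range of elevations reported for the field site.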

Analysis
All analyses were conducted in Matlab using the SDEer package.
For current purposes, we elected to use manually segmented recordings because the automated system often combined multiple sounds into one, even when they did not overlap. During manual segmentation, we noted the number of distinct sound sources that appeared to have contributed to the sound in question, allowing us to assess the performance of the algorithm on overlapping versus nonoverlapping sounds. The resulting TextGrid file was used to provide temporal boundaries for the signals of interest, which were then analyzed to extract the DOA of each signal. We used the MUSIC algorithm for DOA estimation (Schmidt 1986), though we recognize that different algorithms may differ in their accuracy and sensitivity to recording conditions. The algorithm searched angles from 0° to 359° in the azimuth angle, and from -30° to +75° in the elevation angle, with 1° of resolution. The analysis was restricted to the frequency band from 1000 to 6000 Hz.
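To make the grid search concrete, the following Python sketch applies a narrowband, single-source form of MUSIC to simulated data from a tetrahedral array with 68 mm spacing. It is an illustration under stated assumptions, not the SDEer code: the array orientation, the 343 m/s speed of sound, the single 3000 Hz analysis frequency, the simulated snapshots, and all function names are hypothetical, and the dominant eigenvector is obtained by power iteration to keep the example free of external libraries.

```python
import cmath
import math
import random

C = 343.0                       # speed of sound in m/s (assumed)
S = 0.068 / (2 * math.sqrt(2))  # half-width of the cube enclosing the array
MICS = [(S, S, S), (S, -S, -S), (-S, S, -S), (-S, -S, S)]  # assumed orientation


def steering(az_deg, el_deg, freq):
    """Relative phase of a plane wave from (az, el) at each microphone."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    return [cmath.exp(2j * math.pi * freq * sum(p * q for p, q in zip(mic, u)) / C)
            for mic in MICS]


def covariance(snapshots):
    """Sample covariance matrix R = mean(x x^H) of the 4-channel snapshots."""
    n = len(snapshots[0])
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / len(snapshots)
             for j in range(n)] for i in range(n)]


def dominant_eigenvector(R, iterations=50):
    """Power iteration: converges to the eigenvector of the largest eigenvalue."""
    v = [1 + 0j] * len(R)
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(len(R))) for i in range(len(R))]
        norm = math.sqrt(sum(abs(z) ** 2 for z in w))
        v = [z / norm for z in w]
    return v


def music_single_source(snapshots, freq, step=1, el_range=(-30, 75)):
    """Grid search for the peak of the MUSIC pseudo-spectrum (one source)."""
    sig = dominant_eigenvector(covariance(snapshots))
    best = (float("-inf"), None, None)
    for az in range(0, 360, step):
        for el in range(el_range[0], el_range[1] + 1, step):
            a = steering(az, el, freq)
            proj = abs(sum(s.conjugate() * z for s, z in zip(sig, a))) ** 2
            # Energy of a(az, el) left in the noise subspace: a^H (I - s s^H) a.
            noise_energy = len(a) - proj
            score = 1.0 / max(noise_energy, 1e-12)
            if score > best[0]:
                best = (score, az, el)
    return best[1], best[2]


# Simulated snapshots for a hypothetical source at azimuth 45°, elevation 10°.
random.seed(1)
FREQ = 3000.0
truth = steering(45, 10, FREQ)
snaps = []
for _ in range(200):
    s = complex(random.gauss(0, 1), random.gauss(0, 1))
    snaps.append([s * a + complex(random.gauss(0, 0.05), random.gauss(0, 0.05))
                  for a in truth])

az_hat, el_hat = music_single_source(snaps, FREQ)
print(az_hat, el_hat)
```

The default search range and step mirror the settings described above (0° to 359° in azimuth, -30° to +75° in elevation, 1° resolution). A broadband implementation would need to combine information across the 1000 to 6000 Hz band, which this sketch does not attempt.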

DOA estimation at different distances
In the distance experiments, we tested the accuracy of DOA estimation at five-meter increments between 10 and 30 m from the microphone, at two different angles (45° and 285°). During these experiments, the sounds of all four species were broadcast at each distance from the microphone and from each angle. The error distributions at these distances are shown in Figure 2, and show that the vast majority of sounds were localized accurately within 10° at distances up to 30 m from the microphones. Unexpectedly, the lowest accuracy was obtained at 15 m at an angle of 285°. Of 78 sounds originating from this location, 16 were estimated to have arrived from ~315° in the azimuth angle, an error of about 30°. All 16 of these errors were BHGR song, which may indicate a lower performance on localizing the songs of some species than others. In particular, previous research has found that tonal sounds lacking frequency modulation, such as those often delivered by BHGR, are often less easily localizable than less tonal sounds (McGregor et al. 1997, Bower and Clark 2005, Mennill et al. 2012). Errors of this magnitude were not common at any other distance or angle, nor were they commonly encountered in the subsequent experiments outlined below, so further experiments will be required to determine the extent to which the acoustic properties of sounds influence the accuracy of DOA estimation.

DOA estimation for monitoring behavior and counting individuals
During the multisource experiments, we tested the performance of the system for counting multiple vocalizing birds. Here, a sound comprising four overlapping sounds was annotated equivalently to a sound that originated from a single source, though the number of signals contributing to each recorded sound was noted. Results from all trials are visualized in Figures 3 and 4. Across trials, the estimated direction of the source was typically within a few degrees of one of the true directions. The azimuth angle showed an accuracy of 5.1 ± 7.6° (mean ± SD). This error distribution was highly skewed by relatively few highly erroneous DOA estimates, and 92% of all sounds were localized within 10° of the true direction. Accuracy in the elevation angle was 3.5 ± 3.1°, and 96% of all localizations were within 10° of the true source in the elevational direction.
For the purposes of counting multiple birds of the same species, it is clear from Figs. 4a-4e that up to four individuals singing simultaneously can be counted using this system. Fig. 4b reveals a caveat, namely that it would be challenging to separate two birds if they are too close to each other in angular distance. When birds were separated by more than about 20°, estimated directions clustered together in the vicinity of the true DOA, and could be readily counted with a high level of confidence.

Overall accuracy of DOA estimation
To summarize the above results, we combined all DOA estimations from the distance experiments and the multisource experiments to assess the overall accuracy of the system under the various conditions tested here. The mode, median, and mean errors in the azimuth angle were 2°, 4°, and 5.4°, respectively, and in the elevation angle were 0°, 3°, and 3.6°, respectively. Ninety-five percent of all signals were estimated within 12° of the true DOA in the azimuth angle, and within 9° in the elevation angle (Fig. 5).
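Summary statistics of this kind depend on treating azimuth as a circular quantity, so that an estimate of 358° for a source at 2° counts as a 4° error rather than 356°. The short Python sketch below shows one way to compute such errors; the function names and the example bearings are hypothetical and are not data from the experiments.

```python
import statistics


def azimuth_error(estimate_deg, truth_deg):
    """Smallest absolute difference between two bearings, in degrees (0-180)."""
    diff = abs(estimate_deg - truth_deg) % 360
    return min(diff, 360 - diff)


def fraction_within(errors, tolerance_deg):
    """Proportion of errors that are no larger than the tolerance."""
    return sum(e <= tolerance_deg for e in errors) / len(errors)


# Hypothetical estimates and true bearings, for illustration only.
estimates = [358, 3, 12, 47]
truths = [2, 2, 10, 45]
errors = [azimuth_error(e, t) for e, t in zip(estimates, truths)]

print(errors)
print(statistics.mean(errors), statistics.median(errors))
print(fraction_within(errors, 2))
```

Elevation errors need no wrap-around handling because the searched elevation range does not cross a discontinuity.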

Overlapping signals and acoustic masking
An important consideration when analyzing sounds is the effect of background noise or overlapping sounds on the performance of the system. In terrestrial ecosystems, soundscapes can be complex, making background noise the norm rather than the exception.
Our data were suitable for analyzing the effects of overlapping sounds on the performance of the DOA algorithm: of the 1387 sounds analyzed during the multisource experiments, 839 were isolated sounds from a single speaker, 343 included overlapping sounds from two speakers, 144 included overlapping sounds from three speakers, and 60 included overlapping sounds from all four speakers. Accuracy measures under these varying levels of signal overlap are provided in Table 2.
In general, the algorithm selected as the true DOA one of the speakers that had contributed to the incoming signal, seemingly selecting the signal with the most energy, though the accuracy declined slightly as the amount of overlap increased (Table 2). An important consequence of this was that, when signals overlapped, one or a few speakers tended to "win out" over others, effectively masking the presence of one or more of the other speakers. This acoustic masking effect is most evident in Figures 4h-4j, where one or more speakers were not detected at all. This resulted from the predominance of the BHGR signal during these experiments, which was both positioned uphill of the microphones and broadcast songs that were very short and frequent. It is also possible that subtle differences in signal amplitude between speakers exacerbated this effect. The BEWR signal, in contrast, contained long songs that were prone to being overlapped (Fig. 6), leading BEWR songs to be less frequently detected.
There are a few ways this issue can be addressed. First, some algorithms, including the MUSIC algorithm employed here (Schmidt 1986), have been shown elsewhere to successfully discriminate the directions of two or more overlapping signals (Zhang et al. 2014). Once the directions of sources have been identified, beamforming can be used to amplify a signal from a particular direction while filtering out sounds from other directions, as a means of enhancing the sound of interest (Zhang et al. 2014). An alternative approach, feasible when using manual segmentation for the detection of sounds, is to ignore signals that are obviously overlapped and only localize sounds that occur in isolation. In the case of the long songs of the BEWR, this may mean localizing a portion of a given song when parts of the song are overlapped, as demonstrated in Figure 6. We employed this approach to recreate the results from Figures 4h-4j in Figures 7a-7c, showing that the BEWR signal can be recovered from a complex soundscape by focusing on nonoverlapped parts of the songs.

Computational costs
We tested the time needed for DOA estimation of 100 CAVI sounds on a standard laptop (Toshiba Satellite R845-S95). We estimated the DOA of these sounds by setting the algorithm to search with varying levels of angular resolution, from 1° of resolution in the azimuth and elevation angles, to 10° in both directions. When estimating DOA with the highest angular resolution (i.e., 1° in azimuth x 1° in elevation), this task took 6365 seconds, or 64 seconds per sound. The time required for DOA estimation declined as a function of the resolution of the angles to be searched, such that a resolution of 2° x 2° took one-quarter as long, a 3° x 3° resolution took one-ninth as long, etc. DOA estimation with 10° x 10° resolution took just 32 seconds. The decline in computation time was nearly perfectly described by the equation

$t = 6365 \cdot r^{-2}$ (1)

(R² = 0.999), where t is the computation time in seconds and r is the angular resolution in degrees. Given that DOA estimation was only accurate within about 10°, it is likely that computation could be expedited by searching a coarser grid of angles, with minimal effect on overall performance. Alternatively, if only azimuth angles are of interest, computational costs could be reduced by searching a coarser grid in the elevation angle, while maintaining high resolution in the azimuth angle. Other algorithms have also been shown to reduce computation costs significantly (Zhang et al. 2014), and may be incorporated into future versions of the SDEer software.

Fig. 6. Spectrogram representation of a short period of song during the experiment with four sound sources. BHGR is highlighted in blue, CAVI in yellow, CATH in pink, and BEWR in red. When sounds were segmented coarsely, i.e., the area bounded by the two vertical opaque green lines, short bursts of BHGR song often overlapped the much longer BEWR song, leading to masking of the BEWR song. This effect could be minimized, and the BEWR sounds analyzed, by selecting nonoverlapped portions of BEWR song for analysis (transparent green boxes). The results of this approach are shown in Figure 7.
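The roughly inverse-square relationship between resolution and run time follows from the number of candidate angles in the search grid. The Python sketch below counts grid points for the search ranges used here and scales the reported 6365 s accordingly; it assumes that run time is proportional to the number of angles searched, which is a simplification, and the function names are hypothetical.

```python
AZ_RANGE = (0, 359)       # degrees, azimuth search range reported in the text
EL_RANGE = (-30, 75)      # degrees, elevation search range reported in the text
FULL_RES_SECONDS = 6365   # reported time for 100 sounds at 1° x 1°


def grid_points(step):
    """Number of (azimuth, elevation) pairs searched at a given step size."""
    n_az = len(range(AZ_RANGE[0], AZ_RANGE[1] + 1, step))
    n_el = len(range(EL_RANGE[0], EL_RANGE[1] + 1, step))
    return n_az * n_el


def predicted_seconds(step):
    """Run time if cost is proportional to the number of angles searched."""
    return FULL_RES_SECONDS * grid_points(step) / grid_points(1)


for step in (1, 2, 3):
    print(step, grid_points(step), round(predicted_seconds(step), 1))
```

At 1° resolution the grid contains 38160 candidate directions; halving the resolution in both angles cuts this to exactly one quarter. Coarsening only the elevation step, as suggested above, would reduce the count linearly rather than quadratically.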

DOA estimation for surveying population abundance
Our results illustrate that our DOA estimation system can successfully discriminate and determine the bearing to a small number of incoming sources, and that DOA estimation remains accurate to at least 30 m from the speaker and in the presence of overlapping sounds. In both the azimuth and elevation angles, our system could reliably locate incoming sounds with an accuracy that typically fell within 10° of the true source. As a comparison, human listeners under laboratory conditions have been shown to be capable of discriminating azimuth angles within 2°, and elevation angles within 3.5°, though errors increase to as high as 20° depending on the location of the source relative to the head (Middlebrooks and Green 1991). This ability to determine the directionality of incoming sounds forms a critical component of auditory scene analysis, the process by which the human brain successfully analyzes complex and overlapping sounds, extracting meaningful information from noisy auditory inputs (Bregman 1990). DOA estimation using multiple microphones, as shown here, may yield similar benefits when applied in digital recording systems, bringing the capabilities of ARU technology more in line with the abilities of human observers.
We believe that the DOA estimation errors presented here are conservative because of the difficulties of assessing the "true DOA" in the field. For instance, careful inspection of Figures 3 and 4 reveals that estimation errors for some sources were often biased in one direction or another. The speaker positioned at an azimuth of 10° was consistently estimated closer to 5°. Similarly, elevation angle estimates for the speaker positioned at an azimuth of 100° were consistently a few degrees low. We suspect that our estimations of the true locations of the speakers were accurate within about 1° or 2°, on average. Accordingly, the errors in accuracy reported here may be 1° or 2° higher than the true errors. Future experiments should explore more accurate methods for measuring the true DOA of sound sources, to minimize the potential influence of measurement error on accuracy estimates.
Given the performance of our system at estimating DOA, we suggest that this system, or one like it, can help advance ARU surveys beyond a reliance on the presence/absence of species, and toward more accurate assessments of abundance. Indeed, given the known shortcomings of human observers, including inaccurate assessments of abundance (Simons et al. 2007), and considerable variation between observers (Alldredge et al. 2007, Simons et al. 2007, Celis-Murillo et al. 2009), it is plausible that such a system may someday surpass human observers in its ability to count the number of vocalizing birds within audible distance of a microphone.
Some studies have reported high agreement between ARU- and field-based counts, even without DOA information, which may raise the question of whether DOA is a necessary feature for counting birds from ARU recordings. As an example, Venier et al. (2012) derived very similar counts from ARU-based surveys as were obtained in the field. On 220 surveys, they detected an average of about 14 individuals of 10 species on each point with both methods. The modest number of individuals per species (~1.4 individuals per species per point) in their dataset may have contributed to the high performance of ARUs on these surveys. When species are represented by one or a few individuals, counting them from a stereo recording is expected to be straightforward, and ARUs likely suffice in their current form. Such situations may be the norm, but are not universal: Drake et al. (2016) report challenging situations where > 6 Yellow Rails (Coturnicops noveboracensis) could be heard from a single point; listening to a stereo recording in such circumstances is largely unhelpful because of the chaotic nature of the soundscape. The likelihood of encountering such high densities of birds varies considerably from one species to the next. Yellow Rails may be an exceptional case, because their locally restricted breeding habitat means they are sometimes found at high densities where suitable habitat presents itself (Leston and Bookhout 2015). In this situation, we expect DOA to confer the highest benefit. In more typical circumstances where one or a few individuals of a species can be heard from a single point, the benefits of DOA may be reduced, but should increase as a function of population density.
For DOA estimates to be used for counting individuals, a protocol is needed to convert DOA to counts for each species, for example by clustering sounds arriving from similar angles and assuming they came from the same bird. Simple clustering of DOA estimates risks overcounting individuals, especially when an individual moves during a survey, but incorporating information regarding species-specific calling rates (Drake et al. 2016) or mobility could help clarify whether two sounds likely originated from the same individual. Conversion of raw counts to estimates of population density will further necessitate estimates of detection probability and the effective survey radius of the microphones (Marques et al. 2013).
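As a rough illustration of the simple clustering idea described above, the following Python sketch groups the azimuth estimates for one species into clusters separated by angular gaps and reports the number of clusters as a naive count. It is not a protocol used in this study: the function name `count_sources` and the 15° gap threshold are hypothetical choices made for illustration, and the sketch ignores elevation, calling rates, and movement.

```python
def count_sources(azimuths_deg, max_gap_deg=15.0):
    """Naive count of singing individuals from azimuth estimates (degrees).

    Sounds whose azimuths lie within `max_gap_deg` of a neighbor are
    assumed to come from the same bird. The threshold is a hypothetical
    value, not one derived in the study.
    """
    if not azimuths_deg:
        return 0
    # Place all angles on the circle [0, 360) and sort them.
    angles = sorted(a % 360.0 for a in azimuths_deg)
    # Gaps between consecutive angles, plus the wrap-around gap
    # from the largest angle back to the smallest.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 360.0 - angles[-1])
    # Each gap wider than the threshold separates two clusters.
    n_breaks = sum(g > max_gap_deg for g in gaps)
    # If no gap exceeds the threshold, all sounds form one cluster.
    return max(n_breaks, 1)


# Two groups of estimates, near 45 degrees and near 285 degrees.
print(count_sources([44, 46, 47, 283, 286, 288]))
```

A bird that moves during the survey would produce several clusters under this scheme, which is the overcounting risk noted above.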

DOA estimation for monitoring movements
In addition to counting individuals, this system appears suitable for tracking the movements of vocalizing individuals relative to the microphone. Information regarding the location and movements of an animal can provide crucial context when examining variation in signaling behaviors that may vary according to the location of the signaler (Simpson 1985, Haff et al. 2015), and to studies of vocal interactions involving multiple individuals (Vehrencamp et al. 2014). To fully realize this potential, it will be necessary to estimate not only the DOA of a source, but also its absolute location. Our system could be used to accomplish this in two ways. The first, and simplest, way would be to combine the direction estimate from a single node, i.e., a set of four microphones, as shown in Figure 1b, with the relative amplitude of the sound at the microphone. Using amplitude as a proxy for distance, an absolute location can be roughly estimated. This method would require calibration for each species, and accuracy will be affected by variables such as the direction the animal is facing (Patricelli et al. 2007), vegetation structure (Morton 1975), and the amplitude of production at the source, which is known to vary as a function of social context (Akçay et al. 2015, Reichard and Welklin 2015).
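The single-node approach can be sketched as follows, under strong simplifying assumptions that are ours rather than the study's: sound level is assumed to fall off by spherical spreading only (about 6 dB per doubling of distance), the source level at a reference distance is assumed known from species-specific calibration, and azimuth is assumed to be measured clockwise from the y axis. The function names and example values are hypothetical.

```python
import math


def distance_from_level(received_db, ref_db, ref_distance_m=1.0):
    """Estimate source distance from received level (dB).

    Assumes pure spherical spreading: the level drops by
    20*log10(d / ref_distance_m) relative to `ref_db`, the calibrated
    level at `ref_distance_m`. Excess attenuation (vegetation,
    atmosphere) and source directionality are ignored.
    """
    return ref_distance_m * 10 ** ((ref_db - received_db) / 20.0)


def location_from_doa(azimuth_deg, elevation_deg, distance_m):
    """Convert a DOA and a distance into (x, y, z) relative to the node.

    Assumed convention: azimuth clockwise from the +y axis,
    elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance_m * math.cos(el)
    return (horizontal * math.sin(az),
            horizontal * math.cos(az),
            distance_m * math.sin(el))


# Hypothetical example: a song calibrated at 80 dB at 1 m is received at 60 dB.
d = distance_from_level(received_db=60.0, ref_db=80.0)
print(d, location_from_doa(azimuth_deg=45.0, elevation_deg=5.0, distance_m=d))
```

Because the factors listed above (facing direction, vegetation, variable source amplitude) all violate the spreading assumption, the resulting location should be read as a rough estimate only.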
The second way of estimating absolute location would be to combine DOA estimates from multiple recording nodes to triangulate the vocalizing bird in three-dimensional space (Wang et al. 2005, Ali et al. 2009, Griffin et al. 2015). The principles of this approach are similar to those in radio telemetry studies, where an animal's location is estimated based on the intersection of bearings to the animal's radio transmitter measured from different locations (White and Garrott 1990). Compared with single-node localization, triangulation is expected to be more accurate, with accuracy increasing as more nodes are added to a recording array.
Concomitant with these increases in accuracy, however, are increases in equipment costs and the need for more involved analyses; the ideal method for any study will depend on the research questions being asked and the resources available to carry out the research.
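One common way to triangulate from several bearings, sketched below, is to find the point that minimizes the summed squared perpendicular distance to the bearing line from each node. This is a generic least-squares formulation offered as an illustration, not the method of the studies cited above; the function names, the azimuth convention (clockwise from the +y axis), and the node coordinates are assumptions.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Unit direction vector; azimuth assumed clockwise from +y."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az),
            math.cos(el) * math.cos(az),
            math.sin(el))


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate(nodes, doas):
    """Least-squares intersection of bearing lines in 3D.

    nodes: list of (x, y, z) node positions in meters.
    doas:  list of (azimuth_deg, elevation_deg), one per node.
    Solves sum_i (I - u_i u_i^T) x = sum_i (I - u_i u_i^T) p_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, (az, el) in zip(nodes, doas):
        u = unit_vector(az, el)
        for i in range(3):
            for j in range(3):
                # Projector onto the plane perpendicular to the bearing.
                proj = (1.0 if i == j else 0.0) - u[i] * u[j]
                A[i][j] += proj
                b[i] += proj * p[j]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("bearings are parallel; location is undetermined")
    # Cramer's rule: replace column k of A with b.
    solution = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = b[i]
        solution.append(det3(Ak) / d)
    return tuple(solution)


# Two hypothetical nodes 20 m apart, both hearing the same source.
print(triangulate([(0, 0, 0), (20, 0, 0)], [(45.0, 10.0), (315.0, 10.0)]))
```

With more than two nodes the same system is simply accumulated over all bearings, which is one way the expected gain in accuracy from additional nodes could be realized.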
Localization of animal sounds is not new, having been accomplished in many studies of marine mammals (Janik et al. 2000), terrestrial vertebrates (Blumstein et al. 2011), and migrating birds (Stepanian et al. 2016). Localization has most commonly been accomplished by assessing differences in the arrival time of a signal at multiple, widely spaced microphones, a technique commonly referred to as the time-differences-of-arrival (TDOA) approach to localization (Stafford et al. 1998, Janik et al. 2000, Bower and Clark 2005, Mennill et al. 2006, 2012, Collier et al. 2010, Stepanian et al. 2016). The microphone picking up the sound first is presumed to be closest to the sound source, and the relative time delay of the signal at all other microphones is used to estimate the absolute location based on the speed of sound transmission through the relevant medium, i.e., air or water. A similarity between the TDOA- and DOA-based approaches is that both rely on differences in the arrival time of a signal at multiple microphones; the TDOA approach uses time differences between widely spaced microphones to directly calculate the location of a source, whereas the DOA approach uses time differences between closely spaced microphones to estimate direction. Though the two approaches are based on the same fundamental principles, they likely confer distinct benefits.
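To make the link between arrival-time differences and direction concrete, the sketch below converts the delay between two closely spaced microphones into an angle of incidence using the standard far-field (plane-wave) relation, delay = spacing × sin(angle) / speed of sound. It illustrates the principle only and is not the algorithm implemented in our software; the function name, the 343 m/s speed of sound, and the 0.5 m spacing in the example are assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed value)


def angle_from_delay(delay_s, spacing_m, c=SPEED_OF_SOUND):
    """Angle of incidence (degrees) from the delay between two microphones.

    Far-field assumption: the source is distant enough that the wavefront
    is planar, so delay = spacing * sin(angle) / c. The angle is measured
    from the broadside direction (perpendicular to the microphone pair).
    """
    ratio = c * delay_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Hypothetical example: microphones 0.5 m apart, delay of 0.5 ms.
print(angle_from_delay(delay_s=0.0005, spacing_m=0.5))
```

A single pair leaves a front-back ambiguity and gives no elevation, which is one reason a node needs several microphones in a three-dimensional arrangement.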
A benefit of DOA for localization is that it does not require precise synchronization between nodes of an array, only between microphones within a node. In theory, precise synchronization within a node should be easier than between nodes, because all microphones could be wired to a single four-channel recording system. Localization using TDOA, in contrast, requires synchronization between the nodes in an array. This has been accomplished in the past either by wiring the nodes together, an expensive and labor-intensive task (Mennill et al. 2006), or, more recently, through the use of GPS (Mennill et al. 2012) or wireless communication (Collier et al. 2010). Any errors in the synchronization between nodes will affect the accuracy of TDOA-based location estimates.
A benefit of TDOA localization is that each node in an array can comprise just a single microphone (e.g., Mennill et al. 2006), while DOA estimation requires at least three microphones per node (in this study, four were used). The requirement of more microphones per node for DOA is at least partially offset by the smaller number of nodes required to estimate locations using DOA: TDOA requires at least three distinct nodes to estimate an absolute location, while DOA can accomplish this with as few as one self-contained unit. Both methods are expected to produce more accurate location estimates when using a larger number of nodes.
A further consideration is that TDOA approaches are typically limited in their ability to estimate the vertical position of a vocalizing animal, and for this reason are generally used for two-dimensional localization (Janik et al. 2000, Laurinolli et al. 2003, Bower and Clark 2005, Mennill et al. 2006, 2012, Collier et al. 2010, but see Stepanian et al. 2016). Our results suggest that DOA-based localizations should be accurate in both the horizontal and vertical dimensions (Fig. 5b).
These differences between TDOA- and DOA-based localization approaches are largely hypothetical. In practice, considerations of cost, ease-of-use, the relevance of the vertical dimension, and the need for synchronization must be evaluated in light of the research questions being asked. Most importantly, additional experiments are needed to test the accuracy of DOA for localization and to directly compare its accuracy with TDOA-based localizations on the same data, because accuracy will be a critical piece of information for most applications.

Future directions
DOA technology, although promising, has infrequently been deployed for practical surveying and behavioral monitoring purposes (Wiggins et al. 2012). As a result, several methodological issues associated with its use remain to be investigated, in addition to those outlined above. Tests of the system in a more complex ecosystem would be desirable. Our experiments took place in an open field, and it is possible that performance would decline in forest environments, as has been the case for other localization systems (McGregor et al. 1997, Mennill et al. 2012). One explanation for the frequent errors in DOA estimation in Figure 2c-2d, for example, is that they may have been caused by echoes or reverberations leading to erroneous estimations of direction; echoes and reverberations are expected to be more prominent as the number of obstacles and surfaces in the habitat increases, but the extent to which this will hinder the performance of this system remains to be examined.
Distances were only tested to 30 m in our experiment because a fence surrounded our study site, preventing testing at a greater range of distances. At these distances, there was no clear relationship between DOA accuracy and distance (Fig. 2). In exceptional circumstances, sounds can be detectable by an ARU at distances up to 400 m (Lambert and McDonald 2014), so there is a need to test the performance of DOA estimation across larger distances. The results of such an experiment would have direct implications for the maximal size of a localization array, and for the maximal distance to which this system could be used to count birds. We expect that signal-to-noise ratio, rather than distance, will be the most critical determinant of DOA accuracy. If so, the effective radius for DOA estimation will likely vary by species, with weather conditions, and as a function of background noise and microphone quality.
In addition, we only tested the system on sounds originating at vertical angles within 15° of horizontal, and the algorithm only searched vertical angles between -30° and +75°. In practical deployments, sounds may originate from any angle. The microphones used here were arranged in an isotropic configuration, so we expect the system to show similar performance, regardless of the angle of incidence of the sound. It is possible, however, that echoes off the ground may be more pronounced for sounds originating from above, thereby affecting DOA accuracy. Such considerations are expected to be most important for studies aimed at tracking birds in forest environments or when they are flying overhead (Stepanian et al. 2016).
Given the above methodological concerns, experiments are needed to test the ability of the system to track the movements of, and count, real birds in the field under a broader range of conditions, e.g., weather, background noise, or habitat structure. These could entail simultaneous point counts and ARU surveys at a variety of points, like the approach taken by Venier et al. (2012) but with the addition of a spatial component, where the ARU and human are tasked with estimating the number, direction, and location of calling birds. Performance could be assessed by comparing ARU-based estimates with simultaneous human estimates of bird locations and distances. Moreover, the relative importance of DOA estimation for counts could be isolated by comparing counts derived from a stereo recording (using two of the microphones) with counts from the DOA-enabled system and those of the human observer. Experiments of this sort are planned for the near future.
The most important barrier to the widespread adoption of this system remains the lack of suitable hardware. Although our system used commercially available SM3 recording units, synchronization of the two recording units proved challenging. The use of an earbud headphone to broadcast the synchronization signal required minimal background noise, conditions that may not be attainable in the field, especially during the breeding season when biotic noise is at a maximum. We anticipate that a future iteration of this system can overcome this issue either by connecting the two devices to a single time source, or by using a system that records four or more channels by default. The success of our software at extracting DOA from incoming sounds suggests that neither software nor analytical techniques currently limit the adoption of these methods for acoustic monitoring; we hope that by demonstrating the benefits of this capability and by discussing potential applications, future acoustic monitoring systems might be designed to include four recording channels in a particular geometry, and that DOA estimation may someday become a basic feature of ARU systems for bioacoustic monitoring.

CONCLUSION
We presented a system for estimating the DOA of an arriving sound. The generally high performance of our system suggests that DOA estimation will likely be useful for biologists seeking to employ passive acoustic recordings in their research. DOA estimation may provide at least two primary benefits: to contribute toward more accurate estimates of abundance and to track vocalizing animals through space. The accuracy of our system appears sufficient for both of these purposes, but widespread adoption of systems such as ours is limited by the lack of hardware designed for this particular task. Promisingly, however, the primary hardware limitation is related to synchronization of two independent recording units, which we expect can be addressed in a future version of the system. We hope that DOA estimation capabilities will be a standard feature of ARUs in the future, allowing biologists to count, track, and study animals with greater precision than ever before.

AUTHOR CONTRIBUTIONS
Richard W. Hedley and Yiwei Huang contributed equally to this study. Richard W. Hedley conducted experiments, carried out data analysis, and wrote and revised the manuscript. Yiwei Huang conducted experiments, designed and constructed hardware, and wrote the software needed for data analysis.
Responses to this article can be read online at: http://www.ace-eco.org/issues/responses.php/963

Fig. 2. Error distributions for direction-of-arrival (DOA) estimation carried out at five distances from the microphone system. Each azimuth and elevation distribution represents a combination of errors from trials at two locations (45° and 285° in azimuth relative to the microphone system). Gray bars indicate counts of azimuth (a, c, e, g, and i) and elevation errors (b, d, f, h, and j) relative to the true DOA. Dotted vertical lines indicate the accuracy threshold below which 95% of errors occurred. Numbers at the lower right of each plot indicate the number of sounds with errors greater than 35°, and total sample sizes are indicated in each plot window. Sample sizes include sounds from all four species.

Fig. 3. Visual representation of the results of experiments testing the ability of the microphone system to determine the direction-of-arrival (DOA), in both the azimuth and elevation directions, to two sources being broadcast at the same time from varying directions. All sounds were broadcast from speakers placed 15 m from the microphones. Xs denote the DOA of each source, as measured in the field, and the species name is indicated beneath each X. Colored rectangles indicate the number of sounds estimated to have originated from each direction (blue: 1 sound; orange: 2-5 sounds; red: > 5 sounds). DOA estimates for 535 sounds are shown. Though DOA was estimated with a resolution of 1°, rectangles are shown with a resolution of 5° for clarity.

Fig. 4. Visual representation of the results of experiments testing the ability of the microphone system to determine the direction-of-arrival (DOA), in both the azimuth and elevation directions, to multiple sources being broadcast at the same time from varying directions. The number of sources was either two (f), three (a, d, and g), or four (b, c, e, h, i, and j). All sounds were broadcast from speakers placed 15 m from the microphones. Xs denote the DOA of each source, as measured in the field, and the species name is indicated beneath each X. Colored rectangles indicate the number of sounds estimated to have originated from each direction (blue: 1 sound; orange: 2-5 sounds; red: > 5 sounds). DOA estimates for 852 sounds are shown. Though DOA was estimated with a resolution of 1°, rectangles are shown with a resolution of 5° for clarity.

Fig. 5. Summary of the magnitude of errors in the azimuth (a) and elevation (b) directions across all experiments conducted at varying distances from the microphone (10, 15, 20, 25, or 30 m) and with varying numbers of sources (1, 2, 3, or 4). Numbers at the lower right of each plot indicate the number of sounds with errors greater than 35°. In total, direction-of-arrival was estimated for 2129 sounds.

Fig. 7. Reanalysis of the sounds recorded and analyzed in Figure 4h (a), 4i (b), and 4j (c).By focusing on nonoverlapped portions of BEWR song, the DOA of BEWR could be accurately estimated, even though every song was overlapped by the shorter and more numerous BHGR songs in the original analysis.
Values denote the frequencies that divide each signal into two parts with equal energy. Values were calculated in Raven Pro (Bioacoustics Research Program 2014).
‡Frequencies beneath which lie 5% and 95% of the signal's energy.

Table 2. Summary of accuracy of direction-of-arrival (DOA) estimation when different numbers of distinct sources contributed to an analyzed sound, from one source (no overlap) to four sources (maximal overlap).