Studies of the ecology and behavior of free-living animals have traditionally relied upon human observers for data collection. For practical reasons, however, human observers are limited in their ability to observe animals across both spatial and temporal scales. Technological advances hold promise to improve coverage through the adoption of autonomous sensors for the collection of biological data (Porter et al. 2009). One type of automated sensor is the autonomous recording unit (ARU) for passive acoustic data collection, which is increasingly being employed for studies on marine mammals (Stafford et al. 1998, Laurinolli et al. 2003, Bonnel et al. 2008, Mouy et al. 2012, Rone et al. 2012), terrestrial mammals (Heinicke et al. 2015), bats (MacSwiney González et al. 2008), and birds (Mennill et al. 2006, Brandes 2008, Collier et al. 2010). ARUs confer several advantages over traditional methods of observation. For instance, they can continuously record for many hours, provide a permanent record of sounds, reduce interobserver variation, and are minimally intrusive to the animals under study. In many cases, ARUs yield data that would be otherwise impossible or impractical to collect, such as measures of song output from a single bird spanning days or weeks (Taff et al. 2014), or recordings of nocturnal singing behavior (Celis-Murillo et al. 2016a,b).
At least one drawback of ARUs relative to human observers is the lack of directionality in the sound recordings, a shortcoming that has only rarely been addressed (Rone et al. 2012). This may be particularly limiting when ARUs are deployed for biodiversity assessments, when multiple birds of the same species may vocalize simultaneously. Human observers in the field can more easily discriminate sounds arriving from different directions, allowing them to count several individuals of a species from a single point, and to track the movements of vocalizing individuals over time. Commercially available ARUs, in contrast, typically include one (e.g., Sieve Analytics Arbimon Touch) or two (e.g., Songmeter SM3 and River Forks E3A-CM) microphones, and are thus limited in the spatial information they can provide; a recording made with a single microphone provides no information regarding the direction of arrival of a sound, while a binaural recording can only discriminate along a single dimension, i.e., whether a sound arrived from the left side or the right side. The lack of directionality in sound recordings makes abundance difficult to quantify from ARU data, leading to occasional underestimates of the abundance of some species (Hobson et al. 2002, but see Lambert and McDonald 2014, Drake et al. 2016). Data derived from ARUs, as a result, are often reduced to documenting the presence or absence of a given species (Haselmayer and Quinn 2000, Hutto and Stutzman 2009, Celis-Murillo et al. 2012, Tegeler et al. 2012, Digby et al. 2013, Wimmer et al. 2013, Klingbeil and Willig 2015, Leach et al. 2016), thereby discarding much of the information available to a field observer and reducing their utility for inferring population numbers (Nur et al. 1999).
We present the results from experiments using a system for estimating the direction-of-arrival (DOA) of an incoming sound to a microphone in both the azimuth and elevation directions. This system builds upon previous DOA estimation systems that have been used for the study of vocalizing animals. In marine systems, sonobuoys developed by the U.S. Navy are able to estimate the DOA of an incoming sound source, and were used by Rone et al. (2012) to triangulate the calls of Northern Pacific right whales (Eubalaena japonica) during marine mammal surveys. DOA estimation of marine mammals has also been accomplished from small microphone arrays mounted near the seafloor (Wiggins et al. 2012) and from hydrophone arrays towed behind a boat (Miller and Tyack 1998, Leaper et al. 2000). In terrestrial ecosystems, Celis-Murillo et al. (2009) created a system that used four orthogonal microphones to record the soundscape. The soundscape was subsequently recreated in the laboratory by setting up speakers around an observer, and experimentation revealed that similar or better estimates of abundance could be made for many bird species in the laboratory than were obtained on point counts by observers in the field.
In contrast to the system of Celis-Murillo et al. (2009), our system employs signal processing algorithms to estimate DOA in both the azimuth and elevation directions. This approach is similar to that employed by Wang et al. (2005) for localizing woodpeckers, Kojima et al. (2017) for analyzing soundscapes, and by Ali et al. (2009) for localizing marmots and birds. Although these systems proved accurate, they have been used sparingly in the field, likely because of a reliance on custom-built equipment or software that is costly and requires considerable expertise to operate. We attempted to address this by constructing a system from commercially available ARUs (two SM3 recording units; Wildlife Acoustics, Inc., Maynard, MA, USA) that are robust to field conditions, easy to use, and widely used by biologists for bioacoustic surveys. Here we present experiments aimed at (1) testing the accuracy of DOA estimation, and (2) testing the ability of the device to count a small number (up to four) simultaneously vocalizing birds from a single point.
The basic design of our system is shown in Figure 1, and a brief description of the recording system can also be found in Taylor et al. (2016). The system uses four simultaneously recording microphones to estimate the direction from which a sound arrived, based on the phase differences of the incoming sound waves at the microphones (Fig. 1a). We attached four microphones (SMM-A1 microphones; Wildlife Acoustics, Inc.) to two GPS-enabled SM3 recording devices (Wildlife Acoustics, Inc.), each with two recording channels. Microphones were attached to a central mount, as shown in Fig. 1b. The ideal dimensions of the microphone mount vary as a function of the frequency of the sounds to be recorded (Wang et al. 2005), and the dimensions must be known with high precision to facilitate accurate determination of the DOA of a signal. To accomplish this, we 3-D printed the microphone mount specifically for the SMM-A1 microphones, such that the microphones were positioned in an isotropic configuration and each microphone was separated by 68 mm from the three other microphones (Taylor et al. 2016). The geometry of the microphones was intended for analysis of frequencies between 1000 and 6000 Hz, which includes the frequency of the songs of most songbird species. For all recordings collected here, devices were set to record at a sampling rate of 96000 Hz with a high-pass filter at 1000 Hz, and with a gain of 30 dB.
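As an illustration of the geometry described above, the sketch below generates coordinates for four points that are mutually 68 mm apart. It assumes that the isotropic configuration corresponds to a regular tetrahedron; the orientation of the axes and the function name are arbitrary choices made for this example and are not taken from the mount files.

```python
import math
from itertools import combinations

EDGE_M = 0.068  # microphone spacing stated in the text (68 mm)

def tetrahedron_positions(edge):
    """Return four (x, y, z) vertices of a regular tetrahedron centred on the origin.

    Assumption: alternating corners of a cube are used, which fixes an
    arbitrary orientation relative to the real mount.
    """
    # Alternating cube corners at (+-s, +-s, +-s) are 2*sqrt(2)*s apart.
    s = edge / (2.0 * math.sqrt(2.0))
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]

mics = tetrahedron_positions(EDGE_M)

# All six pairwise distances should equal the requested spacing.
spacings = [math.dist(a, b) for a, b in combinations(mics, 2)]
```

A coordinate list of this kind is the sort of input a DOA routine needs when the microphone geometry has to be specified.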
Complicating matters in this iteration of the recording system was the need for precise synchronization between the four recording channels. In our device, the four channels were connected to two separate SM3 devices that could record no more than two channels each. Though GPS attachments were used to bring these two devices into synchrony, experimentation revealed that even this level of synchrony was not sufficient for our needs.
A workaround was achieved by fixing a single earbud headphone to the microphone mount on an attachment designed to place the earbud equidistant from one of the channels from each recording device. This earbud was attached to a cell phone programmed to emit a synchronization signal (in our case, a recording of 40 single notes played on a piano over the course of 30 seconds) every five minutes, which could subsequently be used to ensure synchronization of the two devices. The files needed to 3-D print both the main mount and the earbud attachment can be found on Figshare (http://dx.doi.org/10.6084/m9.figshare.3792780).
We created a Matlab App, called SDEer, to conduct analysis of recordings. This software includes a user-friendly Graphical User Interface (GUI), allowing straightforward use by inexperienced users. The software and detailed instructions for its use can be found on Figshare (see link above).
The extraction of DOA from a raw sound recording consists of three steps: synchronization, segmentation (i.e., event detection), and DOA estimation. SDEer allows synchronization to be conducted either manually or automatically. Manual synchronization can be used if the magnitude of the time offset between the two SM3 units is known, and this offset can be specified in SDEer. If the offset between the two units is not known, SDEer can calculate it automatically. To do so, the raw sound file used as a synchronization signal must be provided to the software, and SDEer uses waveform cross-correlation to locate this signal in the recordings from the two SM3 devices. Because the synchronization signal was broadcast from a point equidistant from one of the channels from each SM3 unit, these detected synchronization signals can be used to calculate the offset between the two devices, and then to bring the two devices into precise synchrony.
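The automatic synchronization step can be sketched as follows. This is a minimal illustration of locating a known signal by waveform cross-correlation, not the SDEer implementation: the function names and toy data are hypothetical, and the correlation is unnormalized and computed by brute force, which would be too slow for real 96 kHz recordings.

```python
def best_lag(recording, template):
    """Return the sample index in `recording` where `template` correlates best."""
    n, m = len(recording), len(template)
    best_index, best_score = 0, float("-inf")
    for start in range(n - m + 1):
        # Unnormalized cross-correlation at this lag.
        score = sum(recording[start + k] * template[k] for k in range(m))
        if score > best_score:
            best_index, best_score = start, score
    return best_index

def device_offset(rec_a, rec_b, template):
    """Offset of device B relative to device A, in samples."""
    return best_lag(rec_b, template) - best_lag(rec_a, template)

# Toy data (hypothetical): the same synchronization signal appears at
# sample 5 on device A and at sample 8 on device B.
template = [0.0, 1.0, -1.0, 0.5, -0.5, 0.25]
rec_a = [0.0] * 5 + template + [0.0] * 20
rec_b = [0.0] * 8 + template + [0.0] * 17

offset = device_offset(rec_a, rec_b, template)  # B lags A by 3 samples
```

Shifting one device's channels by the estimated offset would then align the two pairs of channels.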
Segmentation can also be accomplished either automatically or manually. If automatic segmentation is desired, the user can select from among 10 different end-point detection algorithms to automatically detect the start- and end-points of sounds of interest from the recording. If manual segmentation is preferred, the program can read in a .TextGrid file created using the free online linguistics software Praat (Boersma and Weenink 2014), containing manually detected start- and end-points for the signals of interest.
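To make the idea of end-point detection concrete, the sketch below implements a simple short-time energy threshold. It is only one possible approach and is not claimed to be among the algorithms offered by SDEer; the frame length, threshold, and toy signal are hypothetical values chosen for the example.

```python
def detect_events(samples, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames whose mean
    energy exceeds `threshold`. The end index is exclusive."""
    events = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len            # a sound begins in this frame
        elif energy <= threshold and start is not None:
            events.append((start, i * frame_len))  # the sound ended before this frame
            start = None
    if start is not None:
        events.append((start, n_frames * frame_len))
    return events

# Toy signal (hypothetical): silence, a loud burst, silence, a quieter burst, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8 + [0.5, -0.5] * 2 + [0.0] * 4
events = detect_events(signal, frame_len=4, threshold=0.1)
```

A detector of this kind merges sounds separated by less than one frame, which is one way automated segmentation can combine distinct sounds into a single event.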
DOA estimation is thereafter accomplished by specifying the geometry of the microphones, selecting a DOA estimation algorithm, and specifying the desired precision of DOA estimation, the range of possible angles from which the source might originate, and the expected frequency bandwidth of the sounds of interest. Though the DOA algorithms differ in their details, all rely on phase differences in the signal of interest to estimate the relative delay of the same signal arriving at each microphone, from which the most likely direction of the source can be inferred. The selected algorithm “searches” through the possible angles from which the signal could have originated and outputs the direction with the highest likelihood of having generated the signal. Specifying a higher precision of estimation or a greater range of angles to be searched has the consequence that the DOA takes more time to compute. Once DOAs have been estimated, the results can be manually copied from SDEer to another program, or SDEer can output a .TextGrid file with additional tiers showing the direction of each signal of interest.
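The search over candidate angles can be illustrated with a deliberately simplified stand-in: a least-squares match between measured and predicted inter-microphone delays for a far-field source. This is not MUSIC or any other algorithm in SDEer. The speed of sound, the azimuth convention (counterclockwise from the x axis), the tetrahedral coordinates, and all function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air

# Assumed geometry: regular tetrahedron with 68 mm edges, centred on the origin.
_S = 0.068 / (2.0 * math.sqrt(2.0))
MICS = [(_S, _S, _S), (_S, -_S, -_S), (-_S, _S, -_S), (-_S, -_S, _S)]

def unit_vector(az_deg, el_deg):
    """Unit vector pointing from the array toward the source."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def predicted_delays(mics, az_deg, el_deg):
    """Arrival time at each microphone relative to microphone 0 (plane wave)."""
    u = unit_vector(az_deg, el_deg)
    # A microphone displaced toward the source hears the wavefront earlier.
    t = [-(m[0] * u[0] + m[1] * u[1] + m[2] * u[2]) / SPEED_OF_SOUND for m in mics]
    return [ti - t[0] for ti in t]

def grid_search_doa(mics, measured, az_range=(0, 359), el_range=(-30, 75), step=1):
    """Return the (azimuth, elevation) grid point whose predicted delays best match."""
    best = None
    for az in range(az_range[0], az_range[1] + 1, step):
        for el in range(el_range[0], el_range[1] + 1, step):
            pred = predicted_delays(mics, az, el)
            err = sum((p - q) ** 2 for p, q in zip(pred, measured))
            if best is None or err < best[0]:
                best = (err, az, el)
    return best[1], best[2]

# Simulate noise-free delays for a source at azimuth 45, elevation 10, then recover it.
measured = predicted_delays(MICS, 45, 10)
estimate = grid_search_doa(MICS, measured)
```

The nested loops make the cost of the search explicit: every additional candidate angle adds one more evaluation.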
Experiments to test the accuracy of the system were carried out over two days at the UCLA La Kretz Field Station in Malibu, CA, USA (34° 5'49.33" N, 118° 48'57.03" W). On 9 February 2016, we tested the accuracy of DOA estimation for single sources at 10 m, 15 m, 20 m, 25 m, and 30 m from the microphone (“distance experiments”); on 16 February 2016, we tested the performance of the system when up to four sources played simultaneously at a distance of 15 m (“multisource experiments”). On both days, the recording device was placed in the middle of a gradually sloped, grassy field, with playback speakers placed at various predetermined azimuth angles. Elevation angles were not varied systematically, but varied according to the topography, such that the elevation angles included in this analysis ranged from approximately -10° to +15° from horizontal.
Each of the four playback stimuli was an unaltered recording of one of four songbird species: Bewick’s Wren (BEWR, Thryomanes bewickii), California Thrasher (CATH, Toxostoma redivivum), Black-headed Grosbeak (BHGR, Pheucticus melanocephalus), and Cassin’s Vireo (CAVI, Vireo cassinii), singing their typical songs during the breeding season in California. The recordings, therefore, contained bursts of song interspersed with periods of silence, as is typical for these species. Thus, even when multiple recordings played at the same time in the multisource experiments, not every single burst of sound overlapped with a song from another speaker; many, by chance, were unimpeded by background noise.
The four species differed in the acoustic and temporal properties of their songs, most strikingly in the length of each song, the length of the silent intervals between songs, and the frequencies of the songs, which are summarized in Table 1. During playbacks, sound files were played on a loop for a variable period (30–300 s), before being turned off and moved to another location. During the distance experiments, the loop consisted of the recordings of all four species played consecutively. During the multisource experiments, a single species was played on a loop from each location. At times during the multisource experiments, up to four speakers broadcast the songs of up to four different species, while at other times the four speakers all broadcast CAVI song to simulate a situation, such as might commonly be encountered during point-count surveys or studies of counter-singing interactions, where multiple individuals of the same species sing at the same time from different territories. An effort was made to standardize the amplitude of the broadcasts so that each recording registered approximately 80 dB when measured at 1 m from the microphone using a Radioshack Sound Level Meter 33-2055 (Radioshack Corporation, Fort Worth, TX, USA), which is accurate to within ± 2 dB.
We measured the angle from the microphone mount to each speaker using an Alton AT0132300 Tripod Multi-Beam and Rotary Laser Level set (Alton Industries Group, Ltd., Batavia, IL, USA). We pointed the laser level toward the speaker to estimate the azimuth angle from which each source originated, ranging from 0° to 359°. Elevation angles were approximated using trigonometry based on the known distance to the speaker and an estimated elevation relative to the center of the four microphones.
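The trigonometric approximation of elevation can be written as a one-line calculation. The sketch assumes that the known distance is the horizontal distance to the speaker; if the slant distance were used instead, an arcsine would replace the arctangent. The function name and the example height are hypothetical.

```python
import math

def elevation_angle_deg(horizontal_distance_m, height_difference_m):
    """Elevation of the speaker above (+) or below (-) the array centre, in degrees."""
    return math.degrees(math.atan2(height_difference_m, horizontal_distance_m))

# Hypothetical example: a speaker 15 m away and an estimated 2 m above the array.
angle = elevation_angle_deg(15.0, 2.0)  # about 7.6 degrees
```

At 15 m, an error of 0.5 m in the estimated height changes the result by roughly 2°, which gives a sense of how strongly the reference elevation angle depends on the height estimate.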
All analyses were conducted in Matlab using the SDEer package. For current purposes, we elected to use manually segmented recordings because the automated system often combined multiple sounds into one, even when they did not overlap. During manual segmentation, we noted the number of distinct sound sources that appeared to have contributed to the sound in question, allowing us to assess the performance of the algorithm on overlapping versus nonoverlapping sounds. The resulting TextGrid file was used to provide temporal boundaries for the signals of interest, which were then analyzed to extract the DOA of each signal. We used the MUSIC algorithm for DOA estimation (Schmidt 1986), though we recognize that different algorithms may differ in their accuracy and sensitivity to recording conditions. The algorithm searched angles from 0° to 359° in the azimuth angle, and from -30° to +75° in the elevation angle, with 1° of resolution. The analysis was restricted to the frequency band from 1000 to 6000 Hz.
In the distance experiments, we tested the accuracy of DOA estimation at five-meter increments between 10 and 30 m from the microphone, at two different angles (45° and 285°). During these experiments, the sounds of all four species were broadcast at each distance from the microphone and from each angle. The error distributions at these distances are shown in Figure 2, and show that the vast majority of sounds were localized accurately within 10° at distances up to 30 m from the microphones. Unexpectedly, the lowest accuracy was obtained at 15 m at an angle of 285°. Of 78 sounds originating from this location, 16 of these were estimated to have arrived from ~315° in the azimuth angle, an error of about 30°. All 16 of these errors were BHGR song, which may indicate a lower performance on localizing the songs of some species than others. In particular, previous research has found that tonal sounds lacking frequency modulation, such as those often delivered by BHGR, are often less easily localizable than less tonal sounds (McGregor et al. 1997, Bower and Clark 2005, Mennill et al. 2012). Errors of this magnitude were not common at any other distance or angle, nor were they commonly encountered in the subsequent experiments outlined below, so further experiments will be required to determine the extent to which the acoustic properties of sounds influence the accuracy of DOA estimation.
During the multisource experiments, we tested the performance of the system for counting multiple vocalizing birds. Here, a sound comprising four overlapping sounds was annotated equivalently to a sound that originated from a single source, though the number of signals contributing to each recorded sound was noted. Results from all trials are visualized in Figures 3 and 4. Across trials, the estimated direction of the source was typically within a few degrees of one of the true directions. The azimuth angle showed an accuracy of 5.1 ± 7.6° (mean ± SD). This error distribution was highly skewed by relatively few highly erroneous DOA estimates, and 92% of all sounds were localized within 10° of the true direction. Accuracy in the elevation angle was 3.5 ± 3.1°, and 96% of all localizations were within 10° of the true source in the elevational direction.
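Because azimuth wraps around at 360°, errors of this kind need to be computed on a circle. The sketch below shows one way to do so and to summarize the share of estimates within 10°; the estimate and truth pairs are invented for illustration and are not data from these experiments.

```python
from statistics import mean

def azimuth_error_deg(estimated, true):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = (estimated - true) % 360.0
    return min(diff, 360.0 - diff)

# Hypothetical (estimated, true) azimuth pairs, including one that crosses 0 degrees.
pairs = [(5, 10), (359, 2), (315, 285), (101, 100)]

errors = [azimuth_error_deg(est, tru) for est, tru in pairs]
mean_error = mean(errors)
share_within_10 = sum(err <= 10 for err in errors) / len(errors)
```

Without the wrap-around step, the second pair would be scored as an error of 357° rather than 3°.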
For the purposes of counting multiple birds of the same species, it is clear from Figs. 4a–4e that up to four individuals singing simultaneously can be counted using this system. Fig. 4b reveals a caveat, namely that it would be challenging to separate two birds if they are too close to each other in angular distance. When birds were separated by more than about 20°, estimated directions clustered together in the vicinity of the true DOA, and could be readily counted with a high level of confidence.
To summarize the above results, we combined all DOA estimations from the distance experiments and the multisource experiments to assess the overall accuracy of the system under the various conditions tested here. The mode, median, and mean errors in the azimuth angle were 2°, 4°, and 5.4°, respectively, and in the elevation angle were 0°, 3°, and 3.6°, respectively. Ninety-five percent of all signals were estimated within 12° of the true DOA in the azimuth angle, and within 9° in the elevation angle (Fig. 5).
A consideration when analyzing sounds is the effect of background noise or overlapping sounds on the performance of the system. In terrestrial ecosystems, soundscapes can be complex, making background noise the norm rather than the exception. Our data were suitable to analyze the effects of overlapping sounds on the performance of the DOA algorithm: of the 1387 sounds analyzed during the multisource experiments, 839 were isolated sounds from a single speaker, 343 included overlapping sounds from two speakers, 144 included overlapping sounds from three speakers, and 60 included overlapping sounds from all four speakers. Accuracy measures under these varying levels of signal overlap are provided in Table 2.
In general, the algorithm selected as the true DOA one of the speakers that had contributed to the incoming signal, seemingly selecting the signal with the most energy, though the accuracy declined slightly as the amount of overlap increased (Table 2). An important consequence of this was that, when signals overlapped, one or a few speakers tended to “win out” over others, effectively masking the presence of one or more of the other speakers. This acoustic masking effect is most evident in Figures 4h-4j, where one or more speakers were not detected at all. This resulted from the predominance of the BHGR signal during these experiments, which was both positioned uphill of the microphones and broadcast songs that were very short and frequent. It is also possible that subtle differences in signal amplitude between speakers exacerbated this effect. The BEWR signal, in contrast, contained long songs that were prone to being overlapped (Fig. 6), leading BEWR songs to be less frequently detected.
There are a few ways this issue can be addressed. First, some algorithms, including the MUSIC algorithm employed here (Schmidt 1986), have been shown elsewhere to successfully discriminate the directions of two or more overlapping signals (Zhang et al. 2014). Once the directions of sources have been identified, beamforming can be used to amplify a signal from a particular direction while filtering out sounds from other directions, as a means of enhancing the sound of interest (Zhang et al. 2014). An alternative approach, feasible when using manual segmentation for the detection of sounds, is to ignore signals that are obviously overlapped and only localize sounds that occur in isolation. In the case of the long songs of the BEWR, this may mean localizing a portion of a given song when parts of the song are overlapped, as demonstrated in Figure 6. We employed this approach to recreate the results from Figures 4h-4j in Figures 7a-7c, showing that the BEWR signal can be recovered from a complex soundscape by focusing on nonoverlapped parts of the songs.
We tested the time needed for DOA estimation of 100 CAVI sounds on a standard laptop (Toshiba Satellite R845-S95). We estimated the DOA of these sounds by setting the algorithm to search with varying levels of angular resolution, from 1° of resolution in the azimuth and elevation angles, to 10° in both directions. When estimating DOA with the highest angular resolution (i.e., 1° in azimuth x 1° in elevation), this task took 6365 seconds, or 64 seconds per sound. The time required for DOA estimation declined as a function of the resolution of the angles to be searched, such that a resolution of 2° x 2° took one-quarter as long, a 3° x 3° resolution took one-ninth as long, etc. DOA estimation with 10° x 10° resolution took just 32 seconds. The decline in computation time was nearly perfectly described by the equation
(R² = 0.999). Given that DOA estimation was only accurate within about 10°, it is likely that computation could be expedited by searching a coarser grid of angles, with minimal effect on overall performance. Alternatively, if only azimuth angles are of interest, computational costs could be reduced by searching a coarser grid in the elevation angle, while maintaining high resolution in the azimuth angle. Other algorithms have also been shown to reduce computation costs significantly (Zhang et al. 2014), and may be incorporated into future versions of the SDEer software.
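The roughly quadratic dependence of computation time on angular resolution follows from the number of candidate directions in the search grid. The sketch below counts grid points for the search ranges used here (0° to 359° in azimuth, -30° to +75° in elevation), under the assumption that run time is proportional to the number of angles evaluated; it is not the fitted equation from the timing experiment, and the function name is hypothetical.

```python
def grid_points(step_deg, az=(0, 359), el=(-30, 75)):
    """Number of (azimuth, elevation) candidates searched at a given step size."""
    n_az = len(range(az[0], az[1] + 1, step_deg))
    n_el = len(range(el[0], el[1] + 1, step_deg))
    return n_az * n_el

full = grid_points(1)  # 360 azimuths x 106 elevations

# Size of the full 1-degree grid relative to coarser grids.
reduction = {step: full / grid_points(step) for step in (2, 3, 10)}

# Coarsening only the elevation axis keeps azimuth resolution at 1 degree.
azimuth_only = 360 * len(range(-30, 76, 10))
```

Under this assumption, a 2° step reduces the work fourfold, and coarsening elevation alone to 10° reduces it by roughly a factor of ten while leaving azimuth resolution unchanged.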
Our results illustrate that our DOA estimation system can successfully discriminate and determine the bearing to a small number of incoming sources, and that DOA estimation remains accurate to at least 30 m from the speaker and in the presence of overlapping sounds. In both the azimuth and elevation angles, our system could reliably locate incoming sounds with an accuracy that typically fell within 10° of the true source. As a comparison, human listeners under laboratory conditions have been shown to be capable of discriminating azimuth angles within 2°, and elevation angles within 3.5°, though errors increase to as high as 20° depending on the location of the source relative to the head (Middlebrooks and Green 1991). This ability to determine the directionality of incoming sounds forms a critical component of auditory scene analysis, the process by which the human brain successfully analyzes complex and overlapping sounds, extracting meaningful information from noisy auditory inputs (Bregman 1990). DOA estimation using multiple microphones, as shown here, may yield similar benefits when applied in digital recording systems, bringing the capabilities of ARU technology more in line with the abilities of human observers.
We believe that the DOA estimation errors presented here are conservative because of the difficulties of assessing the “true DOA” in the field. For instance, careful inspection of Figures 3 and 4 reveals that estimation errors for some sources were often biased in one direction or another. The speaker positioned at an azimuth of 10° was consistently estimated closer to 5°. Similarly, elevation angle estimates for the speaker positioned at an azimuth of 100° were consistently a few degrees low. We suspect that our estimations of the true locations of the speakers were accurate within about 1° or 2°, on average. Accordingly, the errors in accuracy reported here may be 1° or 2° higher than the true errors. Future experiments should explore more accurate methods for measuring the true DOA of sound sources, to minimize the potential influence of measurement error on accuracy estimates.
Given the performance of our system at estimating DOA, we suggest that this system, or one like it, can help advance ARU surveys beyond a reliance on the presence/absence of species, and toward more accurate assessments of abundance. Indeed, given the known shortcomings of human observers, including inaccurate assessments of abundance (Simons et al. 2007), and considerable variation between observers (Alldredge et al. 2007, Simons et al. 2007, Celis-Murillo et al. 2009), it is plausible that such a system may someday surpass human observers in its ability to count the number of vocalizing birds within audible distance of a microphone.
Some studies have reported high agreement between ARU- and field-based counts, even without DOA information, which may raise the question of whether DOA is a necessary feature for counting birds from ARU recordings. As an example, Venier et al. (2012) derived very similar counts from ARU-based surveys as were obtained in the field. On 220 surveys, they detected an average of about 14 individuals of 10 species on each point with both methods. The modest number of individuals per species (~1.4 individuals per species per point) in their dataset may have contributed to the high performance of ARUs on these surveys. When species are represented by one or a few individuals, counting them from a stereo recording is expected to be straightforward, and ARUs likely suffice in their current form. Such situations may be the norm, but are not universal: Drake et al. (2016) report challenging situations where > 6 Yellow Rails (Coturnicops noveboracensis) could be heard from a single point; listening to a stereo recording in such circumstances is largely unhelpful because of the chaotic nature of the soundscape. The likelihood of encountering such high densities of birds varies considerably from one species to the next. Yellow Rails may be an exceptional case, because their locally restricted breeding habitat means they are sometimes found at high densities where suitable habitat presents itself (Leston and Bookhout 2015). In this situation, we expect DOA to confer the highest benefit. In more typical circumstances where one or a few individuals of a species can be heard from a single point, the benefits of DOA may be reduced, but should increase as a function of population density.
For DOA estimates to be used for counting individuals, a protocol is needed to convert DOA to counts for each species, for example by clustering sounds arriving from similar angles and assuming they came from the same bird. Simple clustering of DOA estimates risks overcounting individuals, especially when an individual moves during a survey, but incorporating information regarding species-specific calling rates (Drake et al. 2016) or mobility could help clarify whether two sounds likely originated from the same individual. Conversion of raw counts to estimates of population density will further necessitate estimates of detection probability and the effective survey radius of the microphones (Marques et al. 2013).
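One simple form such a protocol could take is gap-based clustering of azimuth estimates on a circle. The sketch below treats any angular gap larger than a threshold as a boundary between individuals. The 20° default mirrors the separation at which sources were readily distinguished here, but it is an assumption rather than a validated setting, and the function name and toy azimuths are hypothetical. The sketch ignores movement and calling rates.

```python
def count_sources(azimuths_deg, max_gap_deg=20.0):
    """Count clusters of azimuth estimates, treating azimuth as circular."""
    if not azimuths_deg:
        return 0
    angles = sorted(a % 360.0 for a in azimuths_deg)
    # Gaps between neighbouring estimates, plus the wrap-around gap
    # from the largest angle back to the smallest.
    gaps = [angles[i + 1] - angles[i] for i in range(len(angles) - 1)]
    gaps.append(angles[0] + 360.0 - angles[-1])
    n_breaks = sum(1 for g in gaps if g > max_gap_deg)
    # On a circle, k breaks delimit k clusters; no break means one cluster.
    return max(n_breaks, 1)

# Hypothetical estimates: one bird near 0/360, one near 100, one near 200.
estimates = [8, 5, 12, 100, 97, 103, 200, 358]
n_birds = count_sources(estimates)
```

The estimate at 358° is grouped with those near 5° to 12° because the wrap-around gap is only 7°.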
In addition to counting individuals, this system appears suitable for tracking the movements of vocalizing individuals relative to the microphone. Information regarding the location and movements of an animal can provide crucial context when examining variation in signaling behaviors that may vary according to the location of the signaler (Simpson 1985, Haff et al. 2015), and to studies of vocal interactions involving multiple individuals (Vehrencamp et al. 2014). To fully realize this potential, it will be necessary to estimate not only the DOA of a source, but also its absolute location. Our system could be used to accomplish this in two ways. The first, and simplest, way would be to combine the direction estimate from a single node, i.e., a set of four microphones, as shown in Figure 1b, with the relative amplitude of the sound at the microphone. Using amplitude as a proxy for distance, an absolute location can be roughly estimated. This method would require calibration for each species, and accuracy will be affected by variables such as the direction the animal is facing (Patricelli et al. 2007), vegetation structure (Morton 1975), and the amplitude of production at the source, which is known to vary as a function of social context (Akçay et al. 2015, Reichard and Welklin 2015).
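The amplitude-as-distance idea can be sketched under the simplest possible assumption, spherical spreading, in which the sound level falls by about 6 dB for each doubling of distance. The 80 dB at 1 m reference matches the playback level used here; the function name and the received level in the example are hypothetical. Excess attenuation from vegetation, the direction the animal is facing, and variation in source amplitude are all ignored, so the result is at best a rough estimate.

```python
def distance_from_level(received_db, source_db_at_ref=80.0, ref_distance_m=1.0):
    """Estimate source distance from received level, assuming spherical spreading.

    Under this assumption, level(d) = level(ref) - 20 * log10(d / ref).
    """
    return ref_distance_m * 10 ** ((source_db_at_ref - received_db) / 20.0)

# Hypothetical example: a song received at 60 dB from a source producing 80 dB at 1 m.
estimate_m = distance_from_level(60.0)  # 20 dB of loss corresponds to 10 m
```

Combining such a distance with the azimuth and elevation from a single node would give a rough position in polar coordinates.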
The second way of estimating absolute location would be to combine DOA estimates from multiple recording nodes to triangulate the vocalizing bird in three-dimensional space (Wang et al. 2005, Ali et al. 2009, Griffin et al. 2015). The principles of this approach are similar to those in radio telemetry studies, where an animal’s location is estimated based on the intersection of bearings to the animal’s radio transmitter measured from different locations (White and Garrott 1990). Compared with single-node localization, triangulation is expected to be more accurate, with accuracy increasing as more nodes are added to a recording array. Concomitant with these increases in accuracy, however, are increases in equipment costs and the need for more involved analyses; the ideal method for any study will depend on the research questions being asked and the resources available to carry out the research.
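The triangulation principle can be illustrated in two dimensions by intersecting the bearings from two nodes. The sketch assumes azimuth is measured counterclockwise from the x axis and treats each bearing as an infinite line, so it does not check that the intersection lies in front of both nodes. The node positions and function name are hypothetical.

```python
import math

def intersect_bearings(p1, az1_deg, p2, az2_deg):
    """Return the (x, y) point where bearings from two nodes cross, or None if parallel."""
    d1 = (math.cos(math.radians(az1_deg)), math.sin(math.radians(az1_deg)))
    d2 = (math.cos(math.radians(az2_deg)), math.sin(math.radians(az2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product of the directions
    if abs(denom) < 1e-12:
        return None  # parallel bearings never intersect
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t * d1 = p2 + s * d2 for t.
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Hypothetical nodes 10 m apart, with bearings of 45 and 135 degrees.
location = intersect_bearings((0.0, 0.0), 45.0, (10.0, 0.0), 135.0)
```

With more than two nodes, bearings will generally not meet at a single point, and a least-squares or similar estimate would be needed instead.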
Localization of animal sounds is not new, having been accomplished in many studies of marine mammals (Janik et al. 2000), terrestrial vertebrates (Blumstein et al. 2011), and migrating birds (Stepanian et al. 2016). Localization has most commonly been accomplished by assessing differences in the arrival time of a signal at multiple, widely spaced microphones, a technique commonly referred to as the time-differences-of-arrival (TDOA) approach to localization (Stafford et al. 1998, Janik et al. 2000, Bower and Clark 2005, Mennill et al. 2006, 2012, Collier et al. 2010, Stepanian et al. 2016). The microphone picking up the sound first is presumed to be closest to the sound source, and the relative time delay of the signal at all other microphones is used to estimate the absolute location based on the speed of sound transmission through the relevant medium, i.e., air or water. A similarity between the TDOA- and DOA-based approaches is that both rely on differences in the arrival time of a signal at multiple microphones; the TDOA approach uses time differences between widely spaced microphones to directly calculate the location of a source, whereas the DOA approach uses time differences between closely spaced microphones to estimate direction. Though the two approaches are based on the same fundamental principles, they likely confer distinct benefits. A benefit of DOA for localization is that it does not require precise synchronization between nodes of an array, only between microphones within a node. In theory, precise synchronization within a node should be easier than between nodes, because all microphones could be wired to a single four-channel recording system. Localization using TDOA, in contrast, requires synchronization between the nodes in an array. This has been accomplished in the past either by wiring the nodes together, an expensive and labor-intensive task (Mennill et al. 2006), or, more recently, through the use of GPS (Mennill et al. 2012) or wireless communication (Collier et al. 2010). Any errors in the synchronization between nodes will affect the accuracy of TDOA-based location estimates.
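The shared principle behind the two approaches, a difference in arrival time between a pair of microphones, can be sketched in a few lines of Python. The example below is only a simplified illustration and not the algorithm used in this study: it estimates the delay by brute-force cross-correlation, then converts it either to a path-length difference (the quantity used with widely spaced microphones) or, under a far-field assumption, to an angle relative to the axis of a closely spaced pair. The function names, the speed of sound of 343 m/s, and the sign conventions are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed value)

def delay_samples(ref, other, max_lag):
    """Lag, in samples, at which `other` best matches `ref`.
    A positive lag means the sound reached `other` later than `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def path_difference_m(lag, sample_rate):
    """Extra distance travelled to reach `other`, as used in TDOA localization."""
    return SPEED_OF_SOUND * lag / sample_rate

def pair_angle_deg(lag, sample_rate, spacing_m):
    """Far-field angle between the source direction and the axis pointing
    from `other` toward `ref`, for two closely spaced microphones."""
    cos_theta = path_difference_m(lag, sample_rate) / spacing_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding and noise
    return math.degrees(math.acos(cos_theta))
```

A single pair constrains the source only to a cone around the microphone axis. Resolving a unique direction in three dimensions requires several non-coplanar pairs, which is consistent with the use of four microphones per node here. The sketch also shows why synchronization matters: any clock offset between the two channels adds directly to the estimated lag.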
A benefit of TDOA localization is that each node in an array can comprise just a single microphone (e.g., Mennill et al. 2006), while DOA estimation requires at least three microphones per node (in this study, four were used). The requirement of more microphones per node for DOA is at least partially offset by the smaller number of nodes required to estimate locations using DOA: TDOA requires at least three distinct nodes to estimate an absolute location, while DOA can accomplish this with as few as one self-contained unit. Both methods are expected to produce more accurate location estimates when using a larger number of nodes. A further consideration is that TDOA approaches are typically limited in their ability to estimate the vertical position of a vocalizing animal, and for this reason are generally used for two-dimensional localization (Janik et al. 2000, Laurinolli et al. 2003, Bower and Clark 2005, Mennill et al. 2006, 2012, Collier et al. 2010, but see Stepanian et al. 2016). Our results suggest that DOA-based localizations should be accurate in both the horizontal and vertical dimensions (Fig. 5b).
These differences between TDOA- and DOA-based localization approaches are largely hypothetical. In practice, considerations of cost, ease-of-use, the relevance of the vertical dimension, and the need for synchronization must be evaluated in light of the research questions being asked. Most importantly, additional experiments are clearly needed to test the accuracy of DOA for localization and to directly compare its accuracy with TDOA-based localizations on the same data because accuracy will surely be a critical piece of information for most applications.
DOA technology, although promising, has infrequently been deployed for practical surveying and behavioral monitoring purposes (Wiggins et al. 2012). As a result, several methodological issues associated with its use remain to be investigated, in addition to those outlined above. Tests of the system in a more complex ecosystem would be desirable. Our experiments took place in an open field, and it is possible that performance would decline in forest environments, as has been the case for other localization systems (McGregor et al. 1997, Mennill et al. 2012). One explanation for the frequent errors in DOA estimation in Figure 2c-2d, for example, is that they may have been caused by echoes or reverberations leading to erroneous estimations of direction; echoes and reverberations are expected to be more prominent as the number of obstacles and surfaces in the habitat increases, but the extent to which this will hinder the performance of this system remains to be examined.
Distances were only tested to 30 m in our experiment because a fence surrounded our study site, preventing testing at a greater range of distances. At these distances, there was no clear relationship between DOA accuracy and distance (Fig. 2). In exceptional circumstances, sounds can sometimes be detectable by an ARU at distances up to 400 m (Lambert and McDonald 2014), so there is a need to test the performance of DOA estimation across larger distances. The results of such an experiment would have direct implications for the maximal size of a localization array, and for the maximal distance to which this system could be used to count birds. We expect that signal-to-noise ratio, rather than distance, will be the most critical determinant of DOA accuracy. If so, the effective radius for DOA estimation will likely vary by species, with weather conditions, and as a function of background noise and microphone quality.
In addition, we only tested the system on sounds originating at vertical angles within 15° of horizontal, and the algorithm only searched vertical angles between -30° and +75°. In practical deployments, sounds may originate from any angle. The microphones used here were arranged in an isotropic configuration, so we expect the system to show similar performance, regardless of the angle of incidence of the sound. It is possible, however, that echoes off the ground may be more pronounced for sounds originating from above, thereby affecting DOA accuracy. Such considerations are expected to be most important for studies aimed at tracking birds in forest environments or when they are flying overhead (Stepanian et al. 2016).
Given the above methodological concerns, experiments testing the ability of the system to track the movements of, and count, real birds in the field under a broader range of conditions, e.g., weather, background noise, or habitat structure, are clearly needed. These could entail simultaneous point counts and ARU surveys at a variety of points, like the approach taken by Venier et al. (2012) but with the addition of a spatial component, where the ARU and the human observer are tasked with estimating the number, direction, and location of calling birds. Performance could be assessed by comparing ARU-based estimates with simultaneous human estimates of bird locations and distances. Moreover, the relative importance of DOA estimation for counts could be isolated by comparing counts derived from a stereo recording (using two of the microphones) with counts from the DOA-enabled system and those of the human observer. Experiments of this sort are planned for the near future.
The most important barrier to the widespread adoption of this system remains the lack of suitable hardware. Although our system used commercially available SM3 recording units, synchronization of the two recording units proved challenging. The use of an earbud headphone to broadcast the synchronization signal required minimal background noise, conditions that may not be attainable in the field, especially during the breeding season when biotic noise is at a maximum. We anticipate that a future iteration of this system can overcome this issue either by connecting the two devices to a single time source, or by using a system that records four or more channels by default. The success of our software at extracting DOA from incoming sounds suggests that neither software nor analytical techniques currently limit the adoption of these methods for acoustic monitoring; we hope that by demonstrating the benefits of this capability and by discussing potential applications, future acoustic monitoring systems might be designed to include four recording channels in a particular geometry, and that DOA estimation may someday become a basic feature of ARU systems for bioacoustic monitoring.
We presented a system for estimating the DOA of an arriving sound. The generally high performance of our system suggests that DOA estimation will likely be useful for biologists seeking to employ passive acoustic recordings in their research. DOA estimation may provide at least two primary benefits: to contribute toward more accurate estimates of abundance and to track vocalizing animals through space. The accuracy of our system appears sufficient for both of these purposes, but widespread adoption of systems such as ours is limited by the lack of hardware designed for this particular task. Promisingly, however, the primary hardware limitation is related to synchronization of two independent recording units, which we expect can be addressed in a future version of the system. We hope that DOA estimation capabilities will be a standard feature of ARUs in the future, allowing biologists to count, track, and study animals with greater precision than ever before.
We thank Charles Taylor for constructive comments on the manuscript. This research was funded by National Science Foundation Award Number 1125423. We thank the La Kretz Center for California Conservation Science for the use of their field station to conduct experiments.
Akçay, Ç., R. C. Anderson, S. Nowicki, M. D. Beecher, and W. A. Searcy. 2015. Quiet threats: soft song as an aggressive signal in birds. Animal Behaviour 105:267-274. http://dx.doi.org/10.1016/j.anbehav.2015.03.009
Ali, A. M., S. Asgari, T. C. Collier, M. Allen, L. Girod, R. E. Hudson, K. Yao, C. E. Taylor, and D. T. Blumstein. 2009. An empirical study of collaborative acoustic source localization. Journal of Signal Processing Systems 57:415-436. http://dx.doi.org/10.1007/s11265-008-0310-7
Alldredge, M. W., T. R. Simons, and K. H. Pollock. 2007. Factors affecting aural detections of songbirds. Ecological Applications 17:948-955. http://dx.doi.org/10.1890/06-0685
Bioacoustics Research Program. 2014. Raven pro: interactive sound analysis software (Version 1.5). Cornell Lab of Ornithology, Ithaca, New York, USA. [online] URL: http://www.birds.cornell.edu/raven
Blumstein, D. T., D. J. Mennill, P. Clemins, L. Girod, K. Yao, G. Patricelli, J. L. Deppe, A. H. Krakauer, C. Clark, K. A. Cortopassi, S. F. Hanser, B. McCowan, A. M. Ali, and A. N. G. Kirschel. 2011. Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus. Journal of Applied Ecology 48:758-767. http://dx.doi.org/10.1111/j.1365-2664.2011.01993.x
Boersma, P., and D. Weenink. 2014. Praat: doing phonetics by computer. [online] URL: http://www.praat.org/
Bonnel, J., G. Le Touze, B. Nicolas, J. I. Mars, and C. Gervaise. 2008. Automatic and passive whale localization in shallow water using gunshots. Oceans 2008:1-6. http://dx.doi.org/10.1109/oceans.2008.5151937
Bower, J. L., and C. W. Clark. 2005. A field test of the accuracy of a passive acoustic location system. Bioacoustics 15:1-14. http://dx.doi.org/10.1080/09524622.2005.9753535
Brandes, T. S. 2008. Automated sound recording and analysis techniques for bird surveys and conservation. Bird Conservation International 18:S163-S173. http://dx.doi.org/10.1017/s0959270908000415
Bregman, A. S. 1990. Auditory scene analysis. MIT Press, Cambridge, Massachusetts, USA.
Celis-Murillo, A., T. J. Benson, J. R. Sosa-López, and M. P. Ward. 2016a. Nocturnal songs in a diurnal passerine: attracting mates or repelling intruders? Animal Behaviour 118:105-114. http://dx.doi.org/10.1016/j.anbehav.2016.04.023
Celis-Murillo, A., J. L. Deppe, and M. F. Allen. 2009. Using soundscape recordings to estimate bird species abundance, richness, and composition. Journal of Field Ornithology 80:64-78. http://dx.doi.org/10.1111/j.1557-9263.2009.00206.x
Celis-Murillo, A., J. L. Deppe, and M. P. Ward. 2012. Effectiveness and utility of acoustic recordings for surveying tropical birds. Journal of Field Ornithology 83:166-179. http://dx.doi.org/10.1111/j.1557-9263.2012.00366.x
Celis-Murillo, A., K. W. Stodola, B. Pappadopoli, J. M. Burton, and M. P. Ward. 2016b. Seasonal and daily patterns of nocturnal singing in the Field Sparrow (Spizella pusilla). Journal of Ornithology 157:853-860. http://dx.doi.org/10.1007/s10336-015-1318-y
Collier, T. C., A. N. G. Kirschel, and C. E. Taylor. 2010. Acoustic localization of antbirds in a Mexican rainforest using a wireless sensor network. Journal of the Acoustical Society of America 128:182-189. http://dx.doi.org/10.1121/1.3425729
Digby, A., M. Towsey, B. D. Bell, and P. D. Teal. 2013. A practical comparison of manual and autonomous methods for acoustic monitoring. Methods in Ecology and Evolution 4:675-683. http://dx.doi.org/10.1111/2041-210X.12060
Drake, K. L., M. Frey, D. Hogan, and R. Hedley. 2016. Using digital recordings and sonogram analysis to obtain counts of Yellow Rails. Wildlife Society Bulletin 40:346-354. http://dx.doi.org/10.1002/wsb.658
Griffin, A., A. Alexandridis, D. Pavlidi, Y. Mastorakis, and A. Mouchtaris. 2015. Localizing multiple audio sources in a wireless acoustic sensor network. Signal Processing 107:54-67. http://dx.doi.org/10.1016/j.sigpro.2014.08.013
Haff, T. M., A. G. Horn, M. L. Leonard, and R. D. Magrath. 2015. Conspicuous calling near cryptic nests: a review of hypotheses and a field study on White-browed Scrubwrens. Journal of Avian Biology 46:289-302. http://dx.doi.org/10.1111/jav.00622
Haselmayer, J., and J. S. Quinn. 2000. A comparison of point counts and sound recording as bird survey methods in Amazonian southeast Peru. Condor 102:887-893. http://dx.doi.org/10.1650/0010-5422(2000)102[0887:ACOPCA]2.0.CO;2
Heinicke, S., A. K. Kalan, O. J. J. Wagner, R. Mundry, H. Lukashevich, and H. S. Kühl. 2015. Assessing the performance of a semi-automated acoustic monitoring system for primates. Methods in Ecology and Evolution 6:753-763. http://dx.doi.org/10.1111/2041-210X.12384
Hobson, K. A., R. S. Rempel, H. Greenwood, B. Turnbull, and S. L. Van Wilgenburg. 2002. Acoustic surveys of birds using electronic recordings: new potential from an omnidirectional microphone system. Wildlife Society Bulletin 30:709-720.
Hutto, R. L., and R. J. Stutzman. 2009. Humans versus autonomous recording units: a comparison of point-count results. Journal of Field Ornithology 80:387-398. http://dx.doi.org/10.1111/j.1557-9263.2009.00245.x
Janik, V. M., S. M. Van Parijs, and P. M. Thompson. 2000. A two-dimensional acoustic localization system for marine mammals. Marine Mammal Science 16:437-447. http://dx.doi.org/10.1111/j.1748-7692.2000.tb00935.x
Klingbeil, B. T., and M. R. Willig. 2015. Bird biodiversity assessments in temperate forest: the value of point count versus acoustic monitoring protocols. PeerJ 3:e973. http://dx.doi.org/10.7717/peerj.973
Kojima, R., O. Sugiyama, K. Hoshiba, K. Nakadai, R. Suzuki, and C. E. Taylor. 2017. Bird song scene analysis using a spatial-cue-based probabilistic model. Journal of Robotics and Mechatronics 29:236-246. http://dx.doi.org/10.20965/jrm.2017.p0236
Lambert, K. T. A., and P. G. McDonald. 2014. A low-cost, yet simple and highly repeatable system for acoustically surveying cryptic species. Austral Ecology 39:779-785. http://dx.doi.org/10.1111/aec.12143
Laurinolli, M. H., A. E. Hay, F. Desharnais, and C. T. Taggart. 2003. Localization of North Atlantic right whale sounds in the Bay of Fundy using a sonobuoy array. Marine Mammal Science 19:708-723. http://dx.doi.org/10.1111/j.1748-7692.2003.tb01126.x
Leach, E. C., C. J. Burwell, L. A. Ashton, D. N. Jones, and R. L. Kitching. 2016. Comparison of point counts and automated acoustic monitoring: detecting birds in a rainforest biodiversity survey. Emu 116:305-309. http://dx.doi.org/10.1071/MU15097
Leaper, R., D. Gillespie, and V. Papastavrou. 2000. Results of passive acoustic surveys for odontocetes in the Southern Ocean. Journal of Cetacean Research and Management 2:187-196.
Leston, L., and T. A. Bookhout. 2015. Yellow Rail (Coturnicops noveboracensis). In P. G. Rodewald, editor. The birds of North America. Cornell Lab of Ornithology, Ithaca, New York, USA. [online] URL: https://birdsna.org/Species-Account/bna/species/yelrai
MacSwiney González, M. C., F. M. Clarke, and P. A. Racey. 2008. What you see is not what you get: the role of ultrasonic detectors in increasing inventory completeness in Neotropical bat assemblages. Journal of Applied Ecology 45:1364-1371. http://dx.doi.org/10.1111/j.1365-2664.2008.01531.x
Marques, T. A., L. Thomas, S. W. Martin, D. K. Mellinger, J. A. Ward, D. J. Moretti, D. Harris, and P. L. Tyack. 2013. Estimating animal population density using passive acoustics. Biological Reviews 88:287-309. http://dx.doi.org/10.1111/brv.12001
McGregor, P. K., T. Dabelsteen, C. W. Clark, J. L. Bower, and J. Holland. 1997. Accuracy of a passive acoustic location system: empirical studies in terrestrial habitats. Ethology Ecology and Evolution 9:269-286. http://dx.doi.org/10.1080/08927014.1997.9522887
Mennill, D. J., M. Battiston, D. R. Wilson, J. R. Foote, and S. M. Doucet. 2012. Field test of an affordable, portable, wireless microphone array for spatial monitoring of animal ecology and behaviour. Methods in Ecology and Evolution 3:704-712. http://dx.doi.org/10.1111/j.2041-210X.2012.00209.x
Mennill, D. J., J. M. Burt, K. M. Fristrup, and S. L. Vehrencamp. 2006. Accuracy of an acoustic location system for monitoring the position of duetting songbirds in tropical forest. Journal of the Acoustical Society of America 119:2832-2839. http://dx.doi.org/10.1121/1.2184988
Middlebrooks, J. C., and D. M. Green. 1991. Sound localization by human listeners. Annual Review of Psychology 42:135-159. http://dx.doi.org/10.1146/annurev.ps.42.020191.001031
Miller, P. J., and P. L. Tyack. 1998. A small towed beamforming array to identify vocalizing resident killer whales (Orcinus orca) concurrent with focal behavioral observations. Deep Sea Research II 45:1389-1405. http://dx.doi.org/10.1016/S0967-0645(98)00028-9
Morton, E. S. 1975. Ecological sources of selection on avian sounds. American Naturalist 109:17-34. http://dx.doi.org/10.1086/282971
Mouy, X., D. Hannay, M. Zykov, and B. Martin. 2012. Tracking of Pacific walruses in the Chukchi Sea using a single hydrophone. Journal of the Acoustical Society of America 131:1349-1358. http://dx.doi.org/10.1121/1.3675008
Nur, N., S. L. Jones, and G. R. Geupel. 1999. Statistical guide to data analysis of avian monitoring programs. Biological Technical Report BTP-R6001-1999. U.S. Department of the Interior, Fish and Wildlife Service, Washington, D.C., USA.
Patricelli, G. L., M. S. Dantzker, and J. W. Bradbury. 2007. Differences in acoustic directionality among vocalizations of the male Red-winged Blackbird (Agelaius pheoniceus) are related to function in communication. Behavioral Ecology and Sociobiology 61:1099-1110. http://dx.doi.org/10.1007/s00265-006-0343-5
Porter, J. H., E. Nagy, T. K. Kratz, P. Hanson, S. L. Collins, and P. Arzberger. 2009. New eyes on the world: advanced sensors for ecology. BioScience 59:385-397. http://dx.doi.org/10.1525/bio.2009.59.5.6
Reichard, D. G., and J. F. Welklin. 2015. On the existence and potential functions of low-amplitude vocalizations in North American birds. Auk 132:156-166. http://dx.doi.org/10.1642/AUK-14-151.1
Rone, B. K., C. L. Berchok, J. L. Crance, and P. J. Clapham. 2012. Using air-deployed passive sonobuoys to detect and locate critically endangered North Pacific right whales. Marine Mammal Science 28:E528-E538. http://dx.doi.org/10.1111/j.1748-7692.2012.00573.x
Schmidt, R. 1986. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation 34:276-280. http://dx.doi.org/10.1109/TAP.1986.1143830
Simons, T. R., M. W. Alldredge, K. H. Pollock, and J. M. Wettroth. 2007. Experimental analysis of the auditory detection process on avian point counts. Auk 124:986-999. http://dx.doi.org/10.1642/0004-8038(2007)124[986:EAOTAD]2.0.CO;2
Simpson, B. S. 1985. Effects of location in territory and distance from neighbours on the use of song repertoires by Carolina Wrens. Animal Behaviour 33:793-804. http://dx.doi.org/10.1016/S0003-3472(85)80012-9
Stafford, K. M., C. G. Fox, and D. S. Clark. 1998. Long-range acoustic detection and localization of blue whale calls in the northeast Pacific Ocean. Journal of the Acoustical Society of America 104:3616-3625. http://dx.doi.org/10.1121/1.423944
Stepanian, P. M., K. G. Horton, D. C. Hille, C. E. Wainwright, P. B. Chilson, and J. F. Kelly. 2016. Extending bioacoustic monitoring of birds aloft through flight call localization with a three-dimensional microphone array. Ecology and Evolution 6:7039-7046. http://dx.doi.org/10.1002/ece3.2447
Taff, C. C., G. L. Patricelli, and C. R. Freeman-Gallant. 2014. Fluctuations in neighbourhood fertility generate variable signalling effort. Proceedings of the Royal Society B: Biological Sciences 281:20141974. http://dx.doi.org/10.1098/rspb.2014.1974
Taylor, C. E., T. Huang, and K. Yao. 2016. Distributed sensor swarms for monitoring bird behavior: an integrated system using wildlife acoustics recorders. Artificial Life and Robotics 21:268-273. http://dx.doi.org/10.1007/s10015-016-0295-4
Tegeler, A. K., M. L. Morrison, and J. M. Szewczak. 2012. Using extended-duration audio recordings to survey avian species. Wildlife Society Bulletin 36:21-29. http://dx.doi.org/10.1002/wsb.112
Vehrencamp, S. L., J. M. Ellis, B. F. Cropp, and J. M. Koltz. 2014. Negotiation of territorial boundaries in a songbird. Behavioral Ecology 25:1436-1450. http://dx.doi.org/10.1093/beheco/aru135
Venier, L. A., S. B. Holmes, G. W. Holborn, K. A. McIlwrick, and G. Brown. 2012. Evaluation of an automated recording device for monitoring forest birds. Wildlife Society Bulletin 36:30-39. http://dx.doi.org/10.1002/wsb.88
Wang, H., C. E. Chen, A. Ali, S. Asgari, R. E. Hudson, K. Yao, D. Estrin, and C. Taylor. 2005. Acoustic sensor networks for woodpecker localization. Advanced Signal Processing Algorithms, Architectures, and Implementations XV 5910:591009.1-591009.12. http://dx.doi.org/10.1117/12.617983
White, G. C., and R. A. Garrott. 1990. Analysis of wildlife radio-tracking data. Academic, San Diego, California, USA.
Wiggins, S. M., M. A. McDonald, and J. A. Hildebrand. 2012. Beaked whale and dolphin tracking using a multichannel autonomous acoustic recorder. Journal of the Acoustical Society of America 131:156-163. http://dx.doi.org/10.1121/1.3662076
Wimmer, J., M. Towsey, P. Roe, and I. Williamson. 2013. Sampling environmental acoustic recordings to determine bird species richness. Ecological Applications 23:1419-1428. http://dx.doi.org/10.1890/12-2088.1
Zhang, J., G. Kossan, R. W. Hedley, R. E. Hudson, C. E. Taylor, K. Yao, and M. Bao. 2014. Fast 3D AML-based bird song estimation. Unmanned Systems 2:249-259. http://dx.doi.org/10.1142/S2301385014400044