There is growing interest in combining data from multiple point count studies to draw inferences about environmental processes influencing birds at larger spatial and temporal scales than the original studies intended (Cumming et al. 2010). Traditionally, human observers have collected point count data (hereafter HPC) by identifying species using acoustic and visual cues while following standardized protocols (Ralph et al. 1995). However, many differences exist between HPC studies in the point count methods used, e.g., duration of count, fixed or unlimited distance counts (Matsuoka et al. 2014). In addition, concerns about human observers not detecting species that are present during a single visit have led to calls for replicating effort at the same locations (Royle and Nichols 2003, Kéry et al. 2005). The use of repeated point counts at the same location within a season to account for varying detection probability among visits has increased interest in the use of autonomous recording units (ARUs; Haselmayer and Quinn 2000, Hobson et al. 2002).
A major benefit of ARUs is that humans only visit each location twice and spend time only deploying and picking up the ARU. The ARU itself can record over an extended period and create an almost unlimited number of repeated surveys of virtually any duration (Haselmayer and Quinn 2000, Hobson et al. 2002). Human observers are more likely to detect some species visually, which can increase the odds of detection, although visual detection area is likely much smaller than aural detection area in heavily vegetated environments (Haselmayer and Quinn 2000, Hutto and Stutzman 2009). Human observers can also estimate distances to individual birds to enable the use of a bounded point count radius and/or distance-based density estimation (Buckland et al. 1993). The relative importance of being able to cost-effectively conduct repeated visits via ARUs versus estimate distance via HPC is unclear in terms of accuracy and precision when assessing trend and status of birds. Regardless, to make the best use of point count data, ornithologists need to evaluate ways to standardize HPC and ARU data to use both data types in the same analyses.
To accurately use data from different point count datasets, ornithologists have converted counts to a common standard, which is typically density (Sólymos et al. 2013). Estimating density of birds using point counts requires the following: (1) accounting for individuals that are available to be detected but do not vocalize or are not seen (Farnsworth et al. 2002, Dawson and Efford 2009); and (2) accounting for declining detection of more distant individuals (Buckland et al. 1993). Removal sampling can address the problem of animal availability based on multiple time intervals that can exist for both HPC and ARU data. However, the second problem of correcting for the area sampled and the distance over which birds are counted is more fundamental. Sound travels different distances depending on the vegetation and atmospheric conditions occurring between the signaller and the receiver (Holland 2001, Padgham 2004, Simons et al. 2007, Pacifici et al. 2008, Tarrero et al. 2008). Detectability can also vary between observers depending on factors such as age, sex, and experience (Pearson et al. 1995, Helzner et al. 2005). To compare the observed number of bird detections between point counts in two separate studies or in two separate vegetation types within the same study, ornithologists should account for the distance travelled by bird song and effective area sampled (Yip et al. 2017). Otherwise, biases in our understanding of habitat selection, population status, and temporal trend may occur if environmental conditions influencing sound transmission significantly differ between sites and times.
There are three main approaches for calculating the area over which bird sounds are detected and thus converted to density: (1) fixed-distance point counts, hereafter FIXED (Hutto et al. 1986, Petit et al. 1995); (2) maximum detected distance (MDD) at which a given species can be detected (Emlen and DeJong 1981, Rosenberg and Blancher 2005); or (3) effective detection radius (EDR) based on distance-sampling methods (Buckland et al. 1993). The FIXED approach does not seem to be possible for ARU-based point counts because signal strength from a species, and hence accuracy of distance estimation will differ because of sound absorption and reflectance varying among environmental conditions (Petit et al. 1995, Padgham 2004, Pacifici et al. 2008). In addition, such approaches discard a lot of useful data on birds that are detected past the fixed distance. In contrast, ornithologists can calculate MDD and EDR for a species from ARU-based data if (1) there are known distances to recordings of birds, and (2) if there is some simultaneously collected distance data from HPC for which MDD or EDR and ARUs can be compared and calibrated. Partners in Flight has used MDD to estimate population sizes (Rosenberg and Blancher 2005), but the Partners in Flight approach to estimating MDD is coarse and does not consider vegetation or atmospheric effects that influence MDD, leading to concerns about this approach when calculating density (Thogmartin et al. 2006). EDR accounts for the decline in detectability as the distance from an observer increases, but like the FIXED approach, EDR varies among species and environmental conditions, and reliable EDR estimates depend on well-trained field observers, accurate distance estimation, and point count methods meeting assumptions of distance sampling (Buckland et al. 1993).
Understanding how microphone and recording settings influence the area sampled for birds is crucial to ensuring that long-term monitoring and comparisons made between studies are valid using ARU techniques. Research programs and monitoring agencies have different preferences, goals, and budgets, which influence the type of ARU they decide to use, and these ARUs must be calibrated to account for differences in area sampled if results are to be compared. Availability of different ARUs also changes over time as ARUs are continuously improved.
Our approach to comparing ARU models and how far they detect birds relative to human observers relies on using song broadcasts of known amplitude and distance. Distance-based broadcasts, whereby a sound is played at varying distances from the observer or recorder, are labor-intensive. A potential alternative could involve using a relatively limited number of distances when conducting broadcast trials but varying the volume (amplitude) of the broadcast speaker between ambient background levels and the upper range at which birds are known to sing. Quantifying the relationship between amplitude and distance of different species for different ARUs could be a cost-effective way of ensuring that all ARUs are calibrated to a known and documented standard. Although the true relationship between amplitude and distance is unknown, this approach effectively identifies relative differences among ARUs.
We had three objectives. First, we developed and tested two field broadcast and modeling methods to evaluate how detection of birds is influenced by distance, ARU type, amplitude, and environmental variables relative to HPC. We did this by broadcasting sounds with varying frequencies and under different vegetation conditions over a range of distances. We then tested which sounds were detected by HPC in the field and when listening to ARU recordings in the lab. Second, we used known principles of sound physics to estimate EDR and MDD for various species. Third, we provided an approach for standardizing HPC and ARU data in the same analysis by creating generalized correction factors and a simple approach to calibration that can be used to standardize raw counts to density regardless of the method of sampling.
We collected data near Calling Lake (55°11' N, 113°12' W) and Lac la Biche, Alberta (54°38' N, 111°58' W) in August 2014. We conducted our surveys in August to reduce the chance of confusing broadcasted sounds (see below) with the songs of real birds. Broadcasts took place between 07:00–20:00 MST. We recorded broadcasted sounds that we used in our study at a total of 20 sites using ARUs (10 road sites, 5 coniferous forests, and 5 deciduous forests). Coniferous sites consisted primarily of white spruce (Picea glauca) while deciduous sites consisted primarily of trembling aspen (Populus tremuloides). Road sites occurred on flat, low-use forestry roads composed of gravel and clay. At a subset of the 20 sites (8 road, 4 coniferous, and 4 deciduous), observers stood adjacent to the ARUs and indicated which broadcasted sounds they were able to detect.
At each site, we broadcasted known sounds from varying distances (see below) and evaluated whether or not a human observer could detect them. At the same time and location, we also recorded the broadcast sounds on four types of ARUs. All recordings made by the ARUs used 2-channel stereo recordings at 44 kHz and 16-bit .wav format. The four ARUs were (1) Wildlife Acoustics’ SongMeter SM2+ GPS-enabled recording units equipped with SMX-II weatherproof microphones (5 units); (2) Wildlife Acoustics’ SM3 ARUs (5 units); (3) RiverForks CZM recorders (2 units); and (4) Zoom H1 handheld recorders (3 units). We broadcasted sounds with an Alpine ® SPR-60, 6-1/2" car speaker/tweeter and an Alpine ® UTE-42BT car stereo/audio player (Gentec Int'l, Markham, Ontario), both installed into an 11" (width) x 10" (depth) x 15" (height) plywood speaker box, along a transect from 12 to 1312 m. We placed the speaker at 25-m intervals for the first 400 m, 50-m intervals between 400 and 800 m, and 100-m intervals for broadcasts beyond 800 m. The same sequence of calls was broadcast at each distance. On forested transects where the ARU was not visible from the transmitting unit, we used a GPS and compass to properly align the speaker toward the ARU.
The broadcasted sequence began with a series of 7 pure tones (at frequencies of 1000Hz, 1414Hz, 2000Hz, 2828Hz, 4000Hz, 5656Hz, and 8000Hz) generated using Adobe Audition CS6. The song sequence following the tones consisted of 23 boreal bird species and 2 amphibian species broadcast in the following order: Clay-colored Sparrow (Spizella pallida; CCSP), Black-and-white Warbler (Mniotilta varia; BAWW), Lincoln’s Sparrow (Melospiza lincolnii; LISP), Brown-headed Cowbird (Molothrus ater; BHCO), Red-breasted Nuthatch (Sitta canadensis; RBNU), Dark-eyed Junco (Junco hyemalis; DEJU), White-throated Sparrow (Zonotrichia albicollis; WTSP), Cape May Warbler (Setophaga tigrina; CMWA), Common Raven (Corvus corax; CORA), Belted Kingfisher (Megaceryle alcyon; BEKI), Olive-sided Flycatcher (Contopus cooperi; OSFL), Pine Siskin (Spinus pinus; PISI), Tennessee Warbler (Oreothlypis peregrina; TEWA), Warbling Vireo (Vireo gilvus; WAVI), Rose-breasted Grosbeak (Pheucticus ludovicianus; RBGR), Ovenbird (Seiurus aurocapilla; OVEN), Yellow Rail (Coturnicops noveboracensis; YERA), Western Toad (Anaxyrus boreas; WETO), Canadian Toad (Anaxyrus hemiophrys; CATO), Northern Saw-whet Owl (Aegolius acadicus; NSWO), Boreal Owl (Aegolius funereus; BOOW), Long-eared Owl (Asio otus; LEOW), Great Gray Owl (Strix nebulosa; GGOW), and Barred Owl (Strix varia; BADO). We selected these species for a variety of song characteristics that may affect probability of detection (pitch, song length). All sounds were normalized in Audition to bring peak amplitude to a standardized level. We broadcasted sounds at 90 dB, which we measured 1 m from the speaker system (based on fast-time A-weighting) using a handheld sound meter (Sper Scientific 840018).
At each transect, we attached each of the 4 ARU types to a tree or post at a height of 1.5 m. This was the same height as the speaker broadcasting the recordings at the starting point of the transect and we chose transects with minimal elevational change. For each point along a transect, we recorded the time of the broadcast and distance of the broadcast speaker from the ARUs and human observer using a GPS (± 3 m). We also measured temperature, humidity, and wind speed during each broadcast using a Kestrel 3000 pocket weather meter. Following the end of the broadcasted sequence, the first observer moved the speaker an additional 25 m along the transect and the process was repeated.
We clipped recordings into individual files for each distance from each type of ARU. Observers in the lab listened to these files at standardized volume levels and noted which species and tones they could identify and detect for each distance and each type of ARU recording. For this experiment, observers in the lab listened to tones and songs in the recordings in the original sequence that the tones and songs were broadcast to make things directly comparable to the HPC. For pure tones, observers only had to identify that a tone was present, not what frequency was broadcast. Using this method, we generated a large dataset of detections or nondetections from sounds that were known to have occurred (n = 96,502). During the HPC, the observer in the field recorded whether they could hear and correctly identify each sound as it was broadcast in sequence.
We divided data randomly into 70% training data (n = 1898 for each species or tone, without replacement) for model development and 30% test data (n = 813 for each species or tone) for model validation (sample function, R [R Core Team 2013]). We assessed the detection/nondetection of each species or tone using generalized linear models (glm function, R [R Core Team 2013]) with a binomial error family. All models included distance as a predictor of whether a tone or song was detected. We used a model where distance was the only predictor as a null model, where p(d) declined with distance at the same rate in different habitats, in different weather conditions, and for human observers versus different ARU brands. We compared this null model to 11 candidate models (Table 1). For the weather models, we had considered temperature as well, but dropped that variable because it was positively correlated with humidity.
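The random 70/30 partition described above can be sketched as follows. This is a minimal illustration in Python rather than the R `sample` call used in the study; the function name, the seed, and the placeholder records are assumptions made for the example, and the total of 2711 records per sound is simply the sum of the reported training and test sizes.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=1):
    """Randomly split records into training and test sets without replacement."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility here
    n_train = round(train_fraction * len(records))
    train_idx = set(rng.sample(range(len(records)), n_train))
    train = [r for i, r in enumerate(records) if i in train_idx]
    test = [r for i, r in enumerate(records) if i not in train_idx]
    return train, test

# Placeholder stand-in for the detection/nondetection records of one sound
# (1898 training + 813 test records = 2711 in total).
records = list(range(2711))
train, test = split_train_test(records)
print(len(train), len(test))
```

Because the training indices are drawn without replacement and the test set is their complement, no record can appear in both sets.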
We used Akaike’s Information Criterion (AIC) to rank the relative fit of models (Burnham and Anderson 2002, Arnold 2010). To assess the absolute model fit or goodness-of-fit of the top AIC-ranked model, we used the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for each species as a test statistic (roc function, pROC package, R [Robin et al. 2011]). AUC measures how well the best model discriminates between actual detections and nondetections, i.e., the probability that a randomly chosen detection receives a higher predicted probability than a randomly chosen nondetection. We calculated AUC for the test data set excluded from model generation. We rated models with AUC > 0.70 as having sufficient ability to correctly predict if a song or tone was or was not detected (Vanagas 2004).
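To make the AUC criterion concrete, the sketch below computes AUC from predicted detection probabilities using the rank-based (Mann-Whitney) identity. It is an illustration only, not the pROC implementation used in the study, and the labels and scores are made-up values.

```python
def auc(labels, scores):
    """AUC as the share of (detection, nondetection) pairs ranked correctly.

    Ties between a detection and a nondetection count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one detection and one nondetection")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test-set outcomes (1 = detected) and model predictions.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.70, 0.60, 0.55, 0.30, 0.20, 0.05]
value = auc(labels, scores)
print(value)  # 15 of 16 pairs are ranked correctly, i.e. 0.9375
```

A value of this size would clear the 0.70 threshold applied to the models in this study.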
EDR gives the radius of the circle where the expected number of available individuals not detected within the distance equals the expected number of the detected individuals outside of that distance (Buckland et al. 1993). We estimated EDR for our Calling Lake dataset with a separate set of models rather than the set used for modeling detectability. The shape of the distance function describes how detection probability attenuates as a function of broadcast speaker distance (d) from the ARUs and human observer. The distance function is a strictly monotonically decreasing function of distance. There are many different mathematical formulations to describe this shape; however, we chose the half-normal distance function because of its simplicity, as well as the fact that its standard deviation parameter (τ) is directly interpretable as effective detection radius (EDR) for unlimited, i.e., not truncated, point counts in bird surveys (Sólymos et al. 2013). In the half-normal distance function, detection at a given distance can be modeled as p(d) = exp(-d²/τ²) in which detection declines as object distance (d) from the observer increases, but declines at a slower rate as τ increases. We transformed distance in metres to -d² prior to modeling to linearize the relationship. We used the coefficients for different predictors in the best model to calculate EDR for each species or tone for different vegetation types, human observers, and ARU types. In all models, we set the intercept to zero so that p(d) = 1 at d = 0, and used a complementary log-log link function instead of the usual logit link function for GLMs with a binomial dependent variable, to simplify the estimation of EDR and approximate a log-linear model (Yip et al. 2017).
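The interpretation of τ as an effective radius follows from integrating the half-normal function over an unlimited circular plot. The short derivation below is the standard distance-sampling argument; the symbol A_eff for the effective area sampled is notation introduced here for illustration.

```latex
\begin{align}
p(d) &= \exp\!\left(-\frac{d^{2}}{\tau^{2}}\right), \\
A_{\mathrm{eff}} &= \int_{0}^{\infty} 2\pi d \, p(d)\, \mathrm{d}d
  = \pi\tau^{2} \int_{0}^{\infty} e^{-u}\, \mathrm{d}u
  = \pi\tau^{2}, \qquad u = \frac{d^{2}}{\tau^{2}} .
\end{align}
```

The effective area therefore equals the area of a circle of radius τ, which is why τ can be read directly as the EDR for untruncated counts.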
EDR was estimated as τ = (1/β)^0.5, where β is the sum of coefficients for the main effect of distance (transformed as -d²) and any interaction effects with -d² (for example: βARU[relative to human observer] + β-d² + βHabitat[relative to coniferous forest]). After calculating EDR for the human observer and each ARU type, we then calculated a correction factor for the effective area sampled by each ARU type relative to human observers (A'/A = EDR²ARU'/EDR²human) in each vegetation type. This correction factor can be used to standardize the area parameter for animal density when comparing data from ARUs and human observers.
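The two calculations in this paragraph can be sketched as below. The function names and the coefficient values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not estimates from this study.

```python
import math

def edr_from_beta(beta_sum):
    """EDR (tau) from the summed coefficients on -d^2: tau = (1 / beta)^0.5."""
    if beta_sum <= 0:
        # A non-positive sum implies no decline with distance, so EDR is undefined.
        raise ValueError("EDR is undefined unless the summed coefficient is positive")
    return math.sqrt(1.0 / beta_sum)

def area_correction(edr_aru, edr_human):
    """Ratio of effective areas sampled, A'/A = EDR_ARU^2 / EDR_human^2."""
    return (edr_aru / edr_human) ** 2

# Hypothetical summed coefficients for a human observer and one ARU type.
beta_human = 4.0e-6
beta_aru = 6.25e-6

edr_human = edr_from_beta(beta_human)  # about 500 m
edr_aru = edr_from_beta(beta_aru)      # about 400 m
ratio = area_correction(edr_aru, edr_human)
print(round(edr_human), round(edr_aru), round(ratio, 2))
```

In this made-up case the ARU samples roughly 64% of the area sampled by the human observer, so a larger summed coefficient translates into a smaller EDR and a smaller effective area.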
We performed Monte Carlo simulations to (1) estimate uncertainty in EDR point estimates for each sound, and (2) test for statistical differences between different vegetation types. We generated coefficients (n = 1000) using maximum-likelihood estimates and variance-covariance matrices from the original models to calculate 90% confidence intervals from the predicted values (Appendix 1; Yip et al. 2017). We omitted EDR estimates that (1) failed to solve because of a lack of nondetections in the raw data, or (2) failed to generate confidence intervals because of high uncertainty when predicting from the original model.
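A minimal sketch of this kind of simulation is given below, assuming coefficients are drawn from a multivariate normal distribution defined by the maximum-likelihood estimates and their variance-covariance matrix. The estimates, the covariance matrix, the seed, and the rule of discarding draws with a non-positive coefficient sum are all assumptions made for illustration; the exact procedure is documented in Appendix 1 and Yip et al. (2017).

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def simulate_edr_interval(mle, vcov, n_draws=1000, level=0.90, seed=1):
    """Percentile interval for EDR from simulated coefficient draws."""
    rng = random.Random(seed)
    lower = cholesky(vcov)
    edrs = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mle]
        # Correlated draw: mle + L z, where L is the Cholesky factor of vcov.
        draw = [m + sum(lower[i][k] * z[k] for k in range(i + 1))
                for i, m in enumerate(mle)]
        beta_sum = sum(draw)
        if beta_sum > 0:  # assumption: skip draws for which EDR is undefined
            edrs.append(math.sqrt(1.0 / beta_sum))
    edrs.sort()
    tail = (1.0 - level) / 2.0
    lo = edrs[int(tail * (len(edrs) - 1))]
    hi = edrs[int((1.0 - tail) * (len(edrs) - 1))]
    return lo, hi, len(edrs)

# Hypothetical estimates: main effect of -d^2 and one interaction with -d^2.
mle = [5.0e-6, 1.25e-6]
vcov = [[2.5e-13, -5.0e-14],
        [-5.0e-14, 1.0e-13]]
lo, hi, kept = simulate_edr_interval(mle, vcov)
print(round(lo), round(hi), kept)
```

With these inputs the point estimate is 400 m, and the simulated 90% interval brackets it. Two EDR estimates could then be judged different when their intervals do not overlap.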
We estimated MDD for the same data by selecting the largest distance with a correctly identified detection based on the 95% quantile of positive detections for each species. We estimated MDD separately for ARUs and human observers using the same data for our EDR calculations to compare results from both approaches. After estimating MDD, we calculated the maximum area sampled and correction factors for each ARU type relative to human observers (A'/A = MDD²ARU'/MDD²human) in each vegetation type, using the same method as for calculating correction factors for EDR.
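The MDD calculation can be sketched as follows, assuming a linear-interpolation quantile (the same rule as the default in R's `quantile` function). The detection distances are made-up values, not data from the study.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def mdd(detection_distances):
    """MDD as the 95% quantile of distances with a correct detection."""
    return quantile(detection_distances, 0.95)

# Hypothetical distances (m) at which one sound was correctly detected.
human_distances = [50 * i for i in range(1, 22)]  # 50, 100, ..., 1050
aru_distances = [40 * i for i in range(1, 22)]    # 40, 80, ..., 840

mdd_human = mdd(human_distances)
mdd_aru = mdd(aru_distances)
area_ratio = (mdd_aru / mdd_human) ** 2  # A'/A from squared MDD values
print(round(mdd_human), round(mdd_aru), round(area_ratio, 2))
```

Using a high quantile rather than the single largest detection distance makes the estimate less sensitive to one unusually distant detection.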
We used known distance data and broadcasts of the same species and tones to explore effects of sound amplitude on detection by ARUs. We conducted the amplitude study from September–October 2014 in the Blackfoot-Cooking Lake Natural Area (53°25' N, 112°49' W) near Edmonton, Alberta from 09:00–16:00 MST. We placed 10 transects in open vegetation (> 75% grass cover, < 5% shrub cover, 0% tree cover) and 10 in denser vegetation (mature deciduous stands composed primarily of trembling aspen with small amounts of balsam poplar [Populus balsamifera] and white spruce).
At each transect we placed a SM2+ ARU in the same setup as the previous experiment and broadcasted songs and tones from distances of 50, 100, and 150 m. We broadcast each song or tone at 11 sound pressure levels (A-weighted SPL, a measure of sound pressure relative to the threshold for human hearing) from 40 to 90 dB at 5-dB increments (23 songs × 11 amplitudes = 253 sounds played at each of the three distances). Each sequence of sounds at each amplitude lasted 1:43 and the full broadcast for all amplitudes was 18:53. For each distance within a transect, we noted temperature, humidity, and wind speed values averaged over the duration of the broadcast using a handheld Kestrel 3000 weather meter (Nielsen-Kellerman Co., Boothwyn, Pennsylvania).
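The broadcast design arithmetic in this paragraph can be checked with a few lines of Python. All numbers come from the text; the variable names are only for illustration.

```python
# Sound pressure levels from 40 to 90 dB in 5-dB steps.
levels_db = list(range(40, 91, 5))
n_levels = len(levels_db)

# Sounds played at each distance: 23 songs at every amplitude.
sounds_per_distance = 23 * n_levels

# One sequence lasts 1:43 (min:s); the full broadcast repeats it per amplitude.
sequence_seconds = 1 * 60 + 43
total_seconds = sequence_seconds * n_levels
minutes, seconds = divmod(total_seconds, 60)

print(n_levels, sounds_per_distance, f"{minutes}:{seconds:02d}")
```

This reproduces the 11 levels, the 253 sounds per distance, and the total broadcast time of 18:53.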
Following field data collection, we used the programs PRAAT © version 5.4 and Adobe Audition © version 5.0 to cut all recordings into separate clips for each call on the recording and labelled calls according to site type (open or closed), site number (1–10), species call/tone, and amplitude. We randomized the clipped files by shuffling them with generic empty clips (containing only ambient background noise). Without knowing the file contents, 4 volunteers trained in avian call detection and recognition listened to and labelled each sound clip by whether or not a call was heard, and if so, of what species.
As in the HPC/ARU study, we used GLMs with intercept set to 0, distance transformed to -d², and a complementary log-log link function to model whether or not a given song or tone was detected by observers listening to the ARU recordings. For each species or tone, we used a model where additive effects of distance and SPL were the only predictors of detection as a null model, where p(d) declined with distance at the same rate in different habitats and weather conditions, and varied with broadcast amplitude. We compared this null model to five candidate models (Table 2). We followed the same procedure for assessing the relative fit of the above GLMs using AIC, and assessed the goodness-of-fit of the highest ranked or most parsimonious model for each species, using AUC statistics and receiver operating characteristic curves as in the HPC/ARU experiment (Table 3).
We used the coefficients for different predictors in the best model to calculate EDR for each species or tone for different vegetation types and SPLs as with the previous experiment. EDR was estimated as τ = (1/β)^0.5, where β is the sum of coefficients for the main effect of distance (transformed as -d²) and any interaction effects with -d² (for example: βSPL[45-90 dB in 5-dB increments] + β-d² + βOpen habitat[relative to closed habitat]). We estimated uncertainty using the same Monte Carlo method to calculate 90% confidence intervals for our EDR estimates. We did not estimate MDD for our second experiment because of a lack of precision with our distance variables (only three were used).
Detectability declined as distance to sound increased for all species and tones (mean ± SD across all models βx = 1.312 × 10⁻⁵ ± 1.399 × 10⁻⁵; Table 4). Declines in detection rate were greater in both coniferous (mean βconiferous = -0.165 ± 1.066 relative to road) and deciduous (mean βdeciduous = -1.482 ± 1.456 relative to road) vegetation types in comparison to open roadside transects (Fig. 1). Ninety percent confidence intervals for our estimates of EDR from human detection data showed significant differences between roadside and forested detection distance for 18 of 32 sounds (5656Hz, 8000Hz, BAWW, BEKI, BHCO, BLWA, CCSP, DEJU, LISP, OSFL, OVEN, PISI, RBGR, RBNU, TEWA, WAVI, WTSP, YERA; Table 5). We were unable to assess roadside confidence intervals for five sounds (1414Hz, 2828Hz, CMWA, BOOW, NSWO) because of undefined EDR estimates. We found no significant difference in detection distance between coniferous and deciduous vegetation types. ARU type also influenced detectability although this varied depending on the species or tone present. However, detectability was generally higher for human observers relative to ARUs (mean relative to human: βSM2 = -2.108 ± 1.312, βSM3 = -0.963 ± 1.086, βRiverForks = -1.181 ± 1.353, βZoom = -1.643 ± 1.407; Fig. 1). All top performing models included distance, transect type, and ARU type as important predictors (Table 5). The top performing model for 16 species and tones (1000Hz, 1414Hz, 2000Hz, BADO, CMWA, BOOW, CATO, CORA, GGOW, LEOW, NSWO, OSFL, RBGR, RBNU, WETO, WTSP) included humidity, which positively influenced detectability for all sounds with the exception of CMWA (mean βhumidity = 0.020 ± 0.013). Three species (CATO, WETO, YERA) had wind in their top performing model, which also had a positive influence (mean βwind = 0.191 ± 0.046).
Interaction effects between ARU and transect type were part of the top performing model for seven sounds (BADO, BAWW, CMWA, BOOW, GGOW, LEOW, TEWA) indicating that detectability varied with both the type of ARU and the transect the sounds were broadcast through. For these sounds, detectability declines suddenly relative to ARUs as distance increases, particularly in coniferous vegetation types. Mean (± SD) wind speed averaged over the duration of the broadcast sequence at each distance along a transect was 1.1 ± 1.4 km/h. Mean temperature during each broadcast was 23.7 ± 6.5 °C. Relative humidity was 59.0 ± 18.9%. Performance for all models was excellent (AUC: min = 0.9180, max = 0.9659, median = 0.9647; Table 5).
EDR and MDD values showed consistent differences between humans and different ARUs (Mean ± SD EDR for all sounds: Human = 494 ± 233 m, SM2 = 421 ± 188 m, SM3 = 461 ± 198 m, RiverForks = 470 ± 222 m, Zoom = 431 ± 183 m; MDD: Human = 567 ± 266 m, SM2 = 427 ± 235 m, SM3 = 485 ± 231 m, RiverForks = 516 ± 250 m, Zoom = 442 ± 208 m; Fig. 2; Appendix 1, 2). Species and tones with lower detection probability (e.g., higher frequency tones, CMWA, BAWW, BLWA, YERA) had smaller EDR values than species with higher detection probability (e.g., lower-frequency tones, RBGR, toads, owls). EDR and MDD values were generally higher along roadsides than in forests (Mean EDR: Roadside = 612 ± 182 m, Coniferous = 365 ± 114 m, Deciduous = 378 ± 163 m; Mean MDD: Roadside = 674 ± 227 m, Coniferous = 364 ± 155 m, Deciduous = 425 ± 221 m). Human observers were consistently able to hear farther than the ARUs and had higher EDR and MDD values. SM2s had the lowest EDR and MDD values (mean ratios across all sounds: EDRSM2/EDRHuman = 0.789 ± 0.624; MDDSM2/MDDHuman = 0.558 ± 0.212; Fig. 2). RiverForks (EDRRiverForks/EDRHuman = 0.940 ± 0.680; MDDRiverForks/MDDHuman = 0.861 ± 0.354) and SM3 (EDRSM3/EDRHuman = 0.897 ± 0.137; MDDSM3/MDDHuman = 0.770 ± 0.259) had the most similar detection distance relative to humans. These ratios increased at higher sound frequencies for the SM2 and RiverForks but decreased with Zoom recorders (Fig. 2; Appendix 1, 2). Thus, SM2s require larger correction factors (= [EDRSM2/EDRHuman]⁻¹) than other types of ARUs relative to humans.
For all species and tones in the sound amplitude study, detection probability declined with increasing distance (mean ± SD across all models βx = 2.913 × 10⁻⁴ ± 1.476 × 10⁻⁴; Table 6) and decreasing sound amplitude (mean βSPL = 0.183 ± 0.037). Probability of detection at a given distance was higher in open vegetation than in closed vegetation (mean βOpenHabitat = 1.983 ± 0.899, relative to closed habitat). The best model predicting detection of each species or tone generally included distance, vegetation type, and amplitude (Table 3). Three sounds (1414Hz, LEOW, YERA) included wind in their top performing model, two sounds (4000Hz, WETO) included humidity, and one sound (CMWA) included both wind and humidity. Wind negatively influenced detectability (mean βWind = -0.168 ± 0.076) for all four sounds while humidity had a positive effect for CMWA (βHumidity = 0.023) and WETO (βHumidity = 0.042) but negative for 4000Hz (βHumidity = -0.036). Mean wind speed averaged over the duration of the broadcast sequence at each distance along a transect was 4.0 ± 2.8 km/h. Mean temperature during each broadcast was 15.2 ± 6.4 °C. Relative humidity was 50.5 ± 14.2%. Performance for all models was excellent (AUC: min = 0.8705, max = 0.9836, median = 0.9495; Table 3).
As in the human-ARU comparison study, species with relatively low detection probability (e.g., BAWW, CMWA) had smaller EDR values than species with relatively high detection probability (e.g., owls; Appendix 3). EDR values were generally higher in open vegetation than closed vegetation and increased as sound amplitude increased. When sounds were pooled into one general model, we found no significant interaction effects between SPL and the type of sound (i.e., species or tone) indicating a consistent positive relationship between EDR and SPL for all sounds broadcasted (Fig. 3). Many EDR values were undefined at higher broadcast SPL in open vegetation because of an inadequate number of nondetections. For EDR to be defined, nondetections must occur at the furthest distances, which did not occur at higher sound amplitudes.
Detectability of avian vocalizations can be influenced by the surrounding environment (Darras et al. 2016, Yip et al. 2017) and by the methods used to record and identify observations (Haselmayer and Quinn 2000). We compared detection distances of different ARUs as well as human observers in the field and found differences in detectability depending on which method was used. Using the ARU-human comparison calculated here, we conclude that ARU data can be integrated with HPC datasets into larger analyses to increase the scope of inferences made about birds (Cumming et al. 2010). For example, EDR has been estimated for over 100 species by the Boreal Avian Modelling Project (hereafter BAM; http://www.borealbirds.ca/) using human-based distance estimation. Similarly, MDD for all North American species have been agreed upon by Partners in Flight (hereafter PIF; Rosenberg and Blancher 2005). To illustrate, BAM estimates EDR for BAWW to be 50.1 m and PIF uses an MDD value of 100 m (PIF Science Committee 2013). Thus, for surveys in deciduous forest using an SM2 wildlife recorder, the EDR correction factor calculated from our study would be 0.757 and the MDD correction factor 0.779 (Appendix 1, 2). The corrected EDR would then be 37.9 m and corrected MDD would be 77.9 m for counts done using an SM2 in similar habitat. Ornithologists can directly compare density estimates from HPC and ARU data after standardizing both data types using this technique, enabling organizations like BAM or PIF to augment their existing HPC data with ARU data.
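The BAWW example above reduces to a single multiplication per metric, as this short sketch shows. The values are those quoted in the text, and the factors are applied directly to the human-based distances as in the worked example; the function name is an illustrative assumption.

```python
def corrected_distance(human_based_distance_m, correction_factor):
    """Scale a human-based detection distance to its ARU equivalent."""
    return human_based_distance_m * correction_factor

# BAWW in deciduous forest recorded with an SM2 (values quoted in the text).
bam_edr_m = 50.1   # BAM human-based EDR
pif_mdd_m = 100.0  # PIF MDD
edr_factor = 0.757
mdd_factor = 0.779

corrected_edr = corrected_distance(bam_edr_m, edr_factor)
corrected_mdd = corrected_distance(pif_mdd_m, mdd_factor)
print(round(corrected_edr, 1), round(corrected_mdd, 1))
```

Rounded to one decimal place, the results match the 37.9 m and 77.9 m reported above.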
Human field observers had the highest detectability and detection distances in comparison to recordings from the SM2s, SM3s, RiverForks, and Zoom recorders. SM2s had the lowest detectability and detection distances followed by Zoom recorders, RiverForks, and SM3s. The use of ARUs to record animals introduces additional static, white noise, and electronic interference during the detection process of avian vocalizations, likely contributing to the patterns of decreasing detectability from recordings. However, we presented observers with a limited variety of species and sounds and in the first experiment, observers knew the order that the sounds would be occurring. When sounds are unpredictable and there is uncertainty about what species may be present, detections from recordings will likely increase relative to field surveys from humans because of the opportunity to double check observations in a lab-based environment.
Probability of detecting species declined more rapidly with increasing distance in closed vegetation than in open vegetation in both of our experiments (first experiment: roadside vs forest, second experiment: open grassland vs closed forest). These results are consistent with previously documented differences in detection between vegetation types (Schieck 1997, Pacifici et al. 2008). However, we observed differences in the effect of weather variables between experiments, which may have been due to the distance over which the experiments occurred. Weather effects were influential for sounds with larger EDR values (17/32 sounds; Table 5) in our first experiment as in Holland (2001) and Simons et al. (2007), but were not as prevalent in our second experiment (6/32 sounds; Table 3). In our second experiment, broadcasts only occurred to a maximum of 150 m, meaning weather variables may not have had as much distance over which to act on broadcasted signals, suggesting there may be an interaction between weather conditions, distance, and sound transmission. Humidity had a consistently positive effect on detectability except for one species (CMWA) in our first experiment and one tone (4000Hz) in our second. However, the relationship between wind and detectability differed between the first (positive relationship) and second (negative relationship) experiment although wind was not included in many of our top performing models. We did not record the direction of the wind relative to the direction of our broadcasts, which may have contributed to this pattern. We also recorded higher but more consistent wind speeds in our second experiment relative to the first. A more limited range of wind speeds in the second experiment may be the reason wind was not included in those models as often. Knowing how factors like weather influence the area sampled is crucial to converting counts from ARUs and humans to accurate density estimates and is an area that we argue needs more work.
We found that EDR was consistently and positively correlated with broadcast SPL regardless of species (Fig. 3). This is important for two reasons. First, we broadcast sounds at 90 dB, which we believe to be at the upper end of the range of amplitudes at which birds vocalize (Brumm 2004, Patricelli et al. 2007). Our speaker was also oriented directly at the receiver, which may have resulted in unrealistically high EDRs. However, the importance of this study lies in the relative difference in EDR between treatments, which should remain the same regardless of SPL. Given that EDR increased consistently with SPL for all species (Fig. 3), we believe singing volume could be estimated for real birds by combining predictions from our EDR models and correction factors with BAM’s human-based estimates of EDR, albeit with varying degrees of uncertainty depending on model performance. This would also require the assumptions that EDRs estimated from BAM were calculated under similar conditions and that human observers estimate EDR accurately. It is not clear how accurately humans measure EDR, and our results show the importance of environmental variables such as the openness of the surrounding environment. Although our best-performing models suggest that EDR increases consistently with SPL for most sounds, there were outlier sounds (BADO, LEOW) for which EDR increased differently from the general trend (Fig. 3), possibly because of uncertainty in our EDR estimates.
The second reason that the consistent response of EDR to SPL is important is that it may provide a simpler way to calibrate ARUs to humans and to each other. More recorder models are becoming available, and those currently in use are routinely replaced with newer models that have different gain settings, sensitivities, and residual electronic noise. All of these factors influence the area sampled for birds relative to humans and other ARUs (Rempel et al. 2005). Sound frequency acted differently on each recorder, suggesting that microphone frequency response plays a role in detectability. For SM2s, detectability decreased and differences in EDR and the resulting correction factors increased with frequency, whereas the opposite was observed for SM3s and Zoom recorders (Fig. 2). The method we used to compare EDR between various recorders and human observers in our first experiment provided high-resolution information on relative differences in detection distance, but was time consuming to carry out. We argue that, in the future, EDR could be calibrated at different amplitudes for multiple brands of ARUs using relatively few distances, as in our second experiment, because EDR decreased consistently for most sounds as SPL declined and the relative difference would be comparable to that in EDR at 90 dB. This would allow researchers to calculate a correction factor more quickly based on the relative difference.
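A minimal sketch of the calibration idea, assuming the correction factor is expressed as the ratio of the ARU EDR to the human EDR and that the area sampled is a circle of radius EDR. The function names and the EDR values are hypothetical and are not taken from our results.

```python
import math


def effective_area_ha(edr_m):
    """Effective area sampled (ha), assuming a circle of radius EDR (m)."""
    if edr_m <= 0:
        raise ValueError("EDR must be positive")
    return math.pi * edr_m ** 2 / 10_000.0


def correction_factor(edr_aru_m, edr_human_m):
    """Relative difference in EDR, expressed as ARU EDR / human EDR."""
    if edr_aru_m <= 0 or edr_human_m <= 0:
        raise ValueError("EDRs must be positive")
    return edr_aru_m / edr_human_m


# Hypothetical EDRs (m) for one sound, for illustration only.
edr_human = 120.0
edr_aru = 60.0

delta = correction_factor(edr_aru, edr_human)
area_ratio = effective_area_ha(edr_aru) / effective_area_ha(edr_human)

print(f"correction factor (EDR ratio): {delta:.2f}")
print(f"ratio of areas sampled:        {area_ratio:.2f}")
```

Because area scales with the square of the radius, the ratio of areas sampled is the square of the EDR ratio, so a modest difference in detection distance translates into a larger difference in area sampled.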
Our results provide further evidence supporting the conclusions of previous researchers (Haselmayer and Quinn 2000, Hobson et al. 2002, Celis-Murillo et al. 2009) that counts derived from ARUs and human observers are relatively comparable. However, our study tested detectability under relatively controlled conditions through broadcasts and with a limited variety of species and sounds. Results may differ when field observers must identify overlapping vocalizations, unfamiliar species, or sounds in acoustically busy sampling periods, all of which would likely have a larger influence on detectability for field observers than for ARUs. Although human observers generally detected more of the broadcast sounds than the ARUs did (particularly the SM2+), the EDR and effective area sampled by some ARUs were comparable to those of human observers for some species. Furthermore, differences between recorders should be irrelevant if we can standardize data from different sources by offsetting varying detection distances and areas of ARUs. Influences of weather on EDR can be controlled to an extent by survey protocol (e.g., survey only when wind is < 2 on the Beaufort scale, when there is no rain, etc.), and corrections for variables such as vegetation/habitat type can be calculated separately (Yip et al. 2017) and applied in conjunction with the corrections calculated in this study.
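One way such offsetting could be sketched, assuming counts are converted to density by dividing by the effective area sampled (with the human EDR scaled by a recorder-specific correction factor) and by an availability probability. All function names, parameter names, and numbers below are hypothetical; this is an illustration under those assumptions rather than the analysis used in this study.

```python
import math


def log_offset(edr_human_m, correction=1.0, availability=1.0):
    """Log of (effective area in ha * availability probability).

    `correction` is an assumed recorder-specific EDR ratio
    (ARU EDR / human EDR); use 1.0 for human point counts.
    """
    if edr_human_m <= 0 or correction <= 0:
        raise ValueError("EDR and correction must be positive")
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    area_ha = math.pi * (correction * edr_human_m) ** 2 / 10_000.0
    return math.log(area_ha * availability)


def density_per_ha(count, edr_human_m, correction=1.0, availability=1.0):
    """Convert a raw count to birds per hectare using the log offset."""
    return count / math.exp(log_offset(edr_human_m, correction, availability))


# Hypothetical example: the same count from a human and from an ARU
# whose EDR is assumed to be half that of the human observer.
human_density = density_per_ha(2, edr_human_m=100.0)
aru_density = density_per_ha(2, edr_human_m=100.0, correction=0.5)

print(f"human: {human_density:.2f} birds/ha")
print(f"ARU:   {aru_density:.2f} birds/ha")
```

In this sketch the same raw count implies a higher density for the recorder with the smaller detection distance, because it sampled a smaller area; expressing the adjustment as a log offset is a design choice that lets counts from both sources enter the same count model.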
Although we demonstrate that simultaneous comparisons of HPC and ARU data potentially enable the calculation of EDR and densities of birds from ARU recordings, this approach still relies on accurate distance estimation during HPCs, an assumption that is frequently violated during avian surveys (Alldredge et al. 2007, Nadeau and Conway 2012). Errors in distance estimation can bias EDR and bird density calculations and will persist when using our correction approach for ARU data. Factors unrelated to distance estimation should also be considered before collating these two types of point counts for the same analysis. First, some detections in HPC may be only visual, particularly of rare or quiet species that are unavailable to ARUs, or of rarely vocalizing species that are unlikely to be detected in short-duration recordings (Haselmayer and Quinn 2000, Hutto and Stutzman 2009). Second, because ARUs provide a permanent record for review, there may be a negative bias associated with species detection in HPC relative to ARU recordings because people listening to ARU data can relisten to a sound (Tegeler et al. 2012). This bias could be modeled as observer effects. Calibration of ARUs should also be an important part of the permanent record. Microphone sensitivity can decrease with use (Turgeon et al. 2017) and influence the area surveyed. Microphone quality should be checked regularly to ensure minimal variation in detection distance within recorder models. Variation in detectability between observers can be large and can influence results from both HPC and ARU recordings, in part because of differences in hearing ability and experience identifying species (Sauer et al. 1994). Observer variation within ARU point counts is likely lower than in HPC because a permanent record allows multiple observers to process recordings and double-check unknown species.
Our study design should have minimized interobserver variability because observers were presented with a limited number of sounds that they could review prior to the experiment. Observers were also men and women between the ages of 18 and 28, who are more likely to have similar hearing levels (Emlen and DeJong 1992).
Our objectives were to investigate relative differences between ARUs and HPC. We provide methods for standardizing and correcting detection distances to derive avian densities from ARUs by accounting for differences in the area surveyed by each method. We used the ecosystems presented in this study as a case study to demonstrate application of this method; however, these methods can be applied to other habitat types to broaden their use. This approach to density estimation would be more logistically feasible and affordable than studies using microphone arrays to obtain density (Efford et al. 2009). Integration of data from ARUs and HPCs could allow for larger meta-analyses that draw inferences about interactions between birds and the environment at larger spatial scales (Cumming et al. 2010).
We would like to acknowledge the financial support of the Alberta Biodiversity Monitoring Institute, Alberta Conservation Association, Northern Scientific Training Program, Canadian Circumpolar Institute, Ecological Monitoring Committee for the Lower Athabasca, and Joint Oil Sands Monitoring Program. Daniel Yip was supported by an Industrial Postgraduate Scholarship from the Natural Sciences and Engineering Research Council of Canada and Suncor Energy. The authors have no potential conflicts of interest to declare. The hard work of Logan McLeod, Cassandra Hardie, Natasha Annich, Elizabeth Beck, and Ricky Kong in the field is greatly appreciated.
Alldredge, M. W., T. R. Simons, and K. H. Pollock. 2007. A field evaluation of distance measurement error in auditory avian point count surveys. Journal of Wildlife Management 71:2759-2766. http://dx.doi.org/10.2193/2006-161
Arnold, T. W. 2010. Uninformative parameters and model selection using Akaike’s Information Criterion. Journal of Wildlife Management 74:1175-1178. http://dx.doi.org/10.1111/j.1937-2817.2010.tb01236.x
Brumm, H. 2004. The impact of environmental noise on song amplitude in a territorial bird. Journal of Animal Ecology 73:434-440. http://dx.doi.org/10.1111/j.0021-8790.2004.00814.x
Buckland, S. T., D. R. Anderson, K. P. Burnham, and J. L. Laake. 1993. Distance sampling: estimating abundance of biological populations. Chapman & Hall, London, UK.
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multi-model inference: a practical information-theoretic approach. Second edition. Springer-Verlag, New York, New York, USA.
Celis-Murillo, A., J. L. Deppe, and M. F. Allen. 2009. Using soundscape recordings to estimate bird species abundance, richness, and composition. Journal of Field Ornithology 80:64-78. http://dx.doi.org/10.1111/j.1557-9263.2009.00206.x
Cumming, S. G., K. Lefevre, E. Bayne, T. Fontaine, F. K. A. Schmiegelow, and S. J. Song. 2010. Toward conservation of Canada’s boreal forest avifauna: design and application of ecological models at continental extents. Avian Conservation and Ecology 5(2):8. http://dx.doi.org/10.5751/ace-00406-050208
Darras, K., P. Pütz, Fahrurrozi, K. Rembold, and T. Tscharntke. 2016. Measuring sound detection spaces for acoustic animal sampling and monitoring. Biological Conservation 201:29-37. http://dx.doi.org/10.1016/j.biocon.2016.06.021
Dawson, D. K., and M. G. Efford. 2009. Bird population density estimated from acoustic signals. Journal of Applied Ecology 46:1201-1209. http://dx.doi.org/10.1111/j.1365-2664.2009.01731.x
Efford, M. G., D. K. Dawson, and D. L. Borchers. 2009. Population density estimated from locations of individuals on a passive detector array. Ecology 90:2676-2682. http://dx.doi.org/10.1890/08-1735.1
Emlen, J. T., and M. J. DeJong. 1981. The application of song detection threshold distance to census operations. Studies in Avian Biology 6:346-352.
Emlen, J. T., and M. J. DeJong. 1992. Counting birds: the problem of variable hearing abilities. Journal of Field Ornithology 63:26-31.
Farnsworth, G. L., K. H. Pollock, J. D. Nichols, T. R. Simons, J. E. Hines, and J. R. Sauer. 2002. A removal model for estimating detection probabilities from point-count surveys. Auk 119:414-425. http://dx.doi.org/10.1642/0004-8038(2002)119[0414:ARMFED]2.0.CO;2
Haselmayer, J., and J. S. Quinn. 2000. A comparison of point counts and sound recording as bird survey methods in Amazonian southeast Peru. Condor 102:887-893. http://dx.doi.org/10.1650/0010-5422(2000)102[0887:ACOPCA]2.0.CO;2
Helzner, E. P., J. A. Cauley, S. R. Pratt, S. R. Wisniewski, J. M. Zmuda, E. O. Talbott, N. de Rekeneire, T. B. Harris, S. M. Rubin, E. M. Simonsick, F. A. Tylavsky, and A. B. Newman. 2005. Race and sex differences in age-related hearing loss: the health, aging and body composition study. Journal of the American Geriatrics Society 53:2119-2127. http://dx.doi.org/10.1111/j.1532-5415.2005.00525.x
Hobson, K. A., R. S. Rempel, H. Greenwood, B. Turnbull, and S. L. Van Wilgenburg. 2002. Acoustic surveys of birds using electronic recordings: new potential from an omnidirectional microphone system. Wildlife Society Bulletin 30:709-720.
Holland, K. R. 2001. Principles of sound radiation. Pages 1-43 in J. Borwick, editor. Loudspeaker and headphone handbook. Third edition. Focal Press, Oxford, UK.
Hutto, R. L., S. M. Pletschet, and P. Hendricks. 1986. A fixed-radius point count method for nonbreeding and breeding season use. Auk 103:593-602.
Hutto, R. L., and R. J. Stutzman. 2009. Humans versus autonomous recording units: a comparison of point-count results. Journal of Field Ornithology 80:387-398. http://dx.doi.org/10.1111/j.1557-9263.2009.00245.x
Kéry, M., J. A. Royle, and H. Schmid. 2005. Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications 15:1450-1461. http://dx.doi.org/10.1890/04-1120
Matsuoka, S. M., C. L. Mahon, C. M. Handel, P. Sólymos, E. M. Bayne, P. C. Fontaine, and C. J. Ralph. 2014. Reviving common standards in point-count surveys for broad inference across studies. Condor 116:599-608. http://dx.doi.org/10.1650/CONDOR-14-108.1
Nadeau, C. P., and C. J. Conway. 2012. Field evaluation of distance-estimation error during wetland-dependent bird surveys. Wildlife Research 39:311-320. http://dx.doi.org/10.1071/WR11161
Pacifici, K., T. R. Simons, and K. H. Pollock. 2008. Effects of vegetation and background noise on the detection process in auditory avian point-count surveys. Auk 125:600-607. http://dx.doi.org/10.1525/auk.2008.07078
Padgham, M. 2004. Reverberation and frequency attenuation in forests-implications for acoustic communication in animals. Journal of the Acoustical Society of America 115:402. http://dx.doi.org/10.1121/1.1629304
Partners in Flight (PIF) Science Committee. 2013. Population estimates database, version 2013. [online] URL: http://rmbo.org/pifpopestimates
Patricelli, G. L., M. S. Dantzker, and J. W. Bradbury. 2007. Differences in acoustic directionality among vocalizations of the male Red-winged Blackbird (Agelaius phoeniceus) are related to function in communication. Behavioral Ecology and Sociobiology 61:1099-1110. http://dx.doi.org/10.1007/s00265-006-0343-5
Pearson, J. D., C. H. Morrell, S. Gordon-Salant, L. J. Brant, E. J. Metter, L. L. Klein, and J. L. Fozard. 1995. Gender differences in a longitudinal study of age-associated hearing loss. Journal of the Acoustical Society of America 97:1196-1205. http://dx.doi.org/10.1121/1.412231
Petit, D. R., L. J. Petit, V. A. Saab, and T. E. Martin. 1995. Fixed-radius point counts in forests: factors influencing effectiveness and efficiency. Pages 49-56 in C. J. Ralph, J. R. Sauer, and S. Droege, editors. Monitoring bird populations by point counts. General Technical Report PSW-GTR-149. U.S. Forest Service, Pacific Southwest Research Station, Albany, California, USA.
R Core Team. 2013. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. [online] URL: http://www.R-project.org/
Ralph, C. J., J. R. Sauer, and S. Droege. 1995. Monitoring bird populations by point counts. General Technical Report PSW-GTR-149. U.S. Forest Service, Pacific Southwest Research Station, Albany, California, USA. http://dx.doi.org/10.2737/psw-gtr-149
Rempel, R. S., K. A. Hobson, G. Holborn, S. L. Van Wilgenburg, and J. Elliott. 2005. Bioacoustic monitoring of forest songbirds: interpreter variability and effects of configuration and digital processing methods in the laboratory. Journal of Field Ornithology 76:1-11. http://dx.doi.org/10.1648/0273-8570-76.1.1
Robin, X., N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Müller. 2011. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12:77. http://dx.doi.org/10.1186/1471-2105-12-77
Rosenberg, K. V., and P. J. Blancher. 2005. Setting numerical population objectives for priority landbird species. U.S. Forest Service General Technical Report PSW-GTR-191. U.S. Forest Service, Pacific Southwest Research Station, Albany, California, USA.
Royle, J. A., and J. D. Nichols. 2003. Estimating abundance from repeated presence-absence data or point counts. Ecology 84:777-790. http://dx.doi.org/10.1890/0012-9658(2003)084[0777:EAFRPA]2.0.CO;2
Sauer, J. R., B. G. Peterjohn, and W. A. Link. 1994. Observer differences in the North American breeding bird survey. Auk 111:50-62. http://dx.doi.org/10.2307/4088504
Schieck, J. 1997. Biased detection of bird vocalizations affects comparisons of bird abundance among forested habitats. Condor 99:179-190. http://dx.doi.org/10.2307/1370236
Simons, T. R., M. W. Alldredge, K. H. Pollock, and J. M. Wettroth. 2007. Experimental analysis of the auditory detection process on avian point counts. Auk 124:986-999. http://dx.doi.org/10.1642/0004-8038(2007)124[986:EAOTAD]2.0.CO;2
Sólymos, P., S. M. Matsuoka, E. M. Bayne, S. R. Lele, P. Fontaine, S. G. Cumming, D. Stralberg, F. K. A. Schmiegelow, and S. J. Song. 2013. Calibrating indices of avian density from non-standardized survey data: making the most of a messy situation. Methods in Ecology and Evolution 4:1047-1058. http://dx.doi.org/10.1111/2041-210X.12106
Tarrero, A. I., M. A. Martin, J. González, M. Machimbarrena, and F. Jacobsen. 2008. Sound propagation in forests: a comparison of experimental results and values predicted by the Nord 2000 model. Applied Acoustics 69:662-671. http://dx.doi.org/10.1016/j.apacoust.2007.01.007
Tegeler, A. K., M. L. Morrison, and J. M. Szewczak. 2012. Using extended-duration audio recordings to survey avian species. Wildlife Society Bulletin 36:21-29. http://dx.doi.org/10.1002/wsb.112
Thogmartin, W. E., F. P. Howe, F. C. James, D. H. Johnson, E. T. Reed, J. R. Sauer, and F. R. Thompson III. 2006. A review of the population estimation approach of the North American landbird conservation plan. Auk 123:892-904. http://dx.doi.org/10.1642/0004-8038(2006)123[892:AROTPE]2.0.CO;2
Turgeon, P. J., S. L. Van Wilgenburg, and K. L. Drake. 2017. Microphone variability and degradation: implications for monitoring programs employing autonomous recording units. Avian Conservation and Ecology 12(1):9. http://dx.doi.org/10.5751/ace-00958-120109
Vanagas, G. 2004. Receiver operating characteristic curves and comparison of cardiac surgery risk stratification systems. Interactive Cardiovascular and Thoracic Surgery 3:319-322. http://dx.doi.org/10.1016/j.icvts.2004.01.008
Yip, D. A., E. M. Bayne, P. Sólymos, J. Campbell, and D. Proppe. 2017. Sound attenuation in forested and roadside environments: implications for avian point count surveys. Condor 119:73-84. http://dx.doi.org/10.1650/CONDOR-16-93.1