Experimentally derived detection distances from audio recordings and human observers enable integrated analysis of point count data

Point counts are one of the most commonly used methods for assessing bird abundance. Autonomous recording units (ARUs) are increasingly being used as a replacement for human-based point counts. Previous studies have compared the relative benefits of human versus ARU-based point count methods, primarily with the goal of understanding differences in species richness and the abundance of individuals over an unlimited distance. What has not been done is an evaluation of how to standardize these two types of data so that they can be compared in the same analysis, especially when there are differences in the area sampled. We compared detection distances between human observers in the field and four commercially available recording devices (Wildlife Acoustics SM2, SM3, RiverForks, and Zoom H1) by simulating vocalizations of various avian species at different distances and amplitudes. We also investigated the relationship between sound amplitude and detection to simplify ARU calibration. We used these data to calculate correction factors that can be used to standardize detection distances of ARUs relative to each other and human observers. In general, humans in the field could detect sounds at greater distances than an ARU, although detectability varied depending on species song characteristics. We provide correction factors for four commonly used ARUs and propose methods for calibrating ARUs relative to each other and human observers.


INTRODUCTION
There is growing interest in combining data from multiple point count studies to draw inferences about environmental processes influencing birds at larger spatial and temporal scales than the original studies intended (Cumming et al. 2010). Traditionally, human observers have collected point count data (hereafter HPC) by identifying species using acoustic and visual cues while following standardized protocols (Ralph et al. 1995). However, HPC studies differ in many aspects of the point count methods used, e.g., duration of count and fixed or unlimited distance counts (Matsuoka et al. 2014). As well, concerns about human observers not detecting species that are present during a single visit have led to calls for replicating effort at the same locations (Royle and Nichols 2003, Kéry et al. 2005). The use of repeated point counts at the same location within a season to account for varying detection probability among visits has increased interest in the use of autonomous recording units (ARUs; Haselmayer and Quinn 2000, Hobson et al. 2002).
A major benefit of ARUs is that humans only visit each location twice and spend time only deploying and picking up the ARU. The ARU itself can record over an extended period and create an almost unlimited number of repeated surveys of virtually any duration (Haselmayer and Quinn 2000, Hobson et al. 2002). Human observers are more likely to detect some species visually, which can increase the odds of detection, although visual detection area is likely much smaller than aural detection area in heavily vegetated environments (Haselmayer and Quinn 2000, Hutto and Stutzman 2009). Human observers can also estimate distances to individual birds to enable the use of a bounded point count radius and/or distance-based density estimation (Buckland et al. 1993). The relative importance of being able to cost-effectively conduct repeated visits via ARUs versus estimate distance via HPC is unclear in terms of accuracy and precision when assessing trend and status of birds. Regardless, to make the best use of point count data, ornithologists need to evaluate ways to standardize HPC and ARU data to use both data types in the same analyses.
Address of Correspondent: Daniel A. Yip, CW405 Biological Sciences Building, University of Alberta, Edmonton, AB, Canada, T6G 2E9, dayip@ualberta.ca. Avian Conservation and Ecology 12(1): 11. http://www.ace-eco.org/vol12/iss1/art11/
To accurately use data from different point count datasets, ornithologists have converted counts to a common standard, which is typically density (Sólymos et al. 2013). Estimating density of birds using point counts requires the following: (1) accounting for individuals that are available to be detected but do not vocalize or are not seen (Farnsworth et al. 2002, Dawson and Efford 2009); and (2) accounting for declining detection of more distant individuals (Buckland et al. 1993). Removal sampling can address the problem of animal availability based on multiple time intervals that can exist for both HPC and ARU data. However, the second problem of correcting for the area sampled and the distance over which birds are counted is more fundamental. Sound travels different distances depending on the vegetation and atmospheric conditions occurring between the signaller and the receiver (Holland 2001, Padgham 2004, Simons et al. 2007, Pacifici et al. 2008, Tarrero et al. 2008). Detectability can also vary between observers depending on factors such as age, sex, and experience (Pearson et al. 1995, Helzner et al. 2005). To compare the observed number of bird detections between point counts in two separate studies or in two separate vegetation types within the same study, ornithologists should account for the distance travelled by bird song and effective area sampled (Yip et al. 2017). Otherwise, biases in our understanding of habitat selection, population status, and temporal trend may occur if environmental conditions influencing sound transmission significantly differ between sites and times.
There are three main approaches for calculating the area over which bird sounds are detected and thus converted to density: (1) fixed-distance point counts, hereafter FIXED (Hutto et al. 1986, Petit et al. 1995); (2) maximum detected distance (MDD) at which a given species can be detected (Emlen and DeJong 1981, Rosenberg and Blancher 2005); or (3) effective detection radius (EDR) based on distance-sampling methods (Buckland et al. 1993). The FIXED approach does not seem to be possible for ARU-based point counts because signal strength from a species, and hence accuracy of distance estimation, will differ because of sound absorption and reflectance varying among environmental conditions (Petit et al. 1995, Padgham 2004, Pacifici et al. 2008).
In addition, such approaches discard a lot of useful data on birds that are detected past the fixed distance. In contrast, ornithologists can calculate MDD and EDR for a species from ARU-based data if (1) there are known distances to recordings of birds, and (2) there is some simultaneously collected distance data from HPC against which MDD or EDR from ARUs can be compared and calibrated. Partners in Flight has used MDD to estimate population sizes (Rosenberg and Blancher 2005), but the Partners in Flight approach to estimating MDD is coarse and does not consider vegetation or atmospheric effects that influence MDD, leading to concerns about this approach when calculating density (Thogmartin et al. 2006). EDR accounts for the decline in detectability as the distance from an observer increases, but like the FIXED approach, EDR varies among species and environmental conditions, and reliable EDR estimates depend on well-trained field observers, accurate distance estimation, and point count methods meeting assumptions of distance sampling (Buckland et al. 1993).
Understanding how microphone and recording settings influence the area sampled for birds is crucial to ensuring that long-term monitoring and comparisons made between studies are valid using ARU techniques. Research programs and monitoring agencies have different preferences, goals, and budgets, which influence the type of ARU they decide to use, and these ARUs must be calibrated to account for differences in area sampled if results are to be compared. Availability of different ARUs also changes over time as ARUs are continuously improved.
Our approach to comparing ARU models and how far they detect birds relative to human observers relies on using song broadcasts of known amplitude and distance. Distance-based broadcasts, whereby a sound is played at varying distances from the observer or recorder, are labor-intensive. A potential alternative could involve using a relatively limited number of distances when conducting broadcast trials but varying the volume (amplitude) of the broadcast speaker between ambient background levels and the upper range at which birds are known to sing. Quantifying the relationship between amplitude and distance of different species for different ARUs could be a cost-effective way of ensuring that all ARUs are calibrated to a known and documented standard.
Although the true relationship between amplitude and distance is unknown, this approach effectively identifies relative differences among ARUs.
We had three objectives. First, we developed and tested two field broadcast and modeling methods to evaluate how detection of birds is influenced by distance, ARU type, amplitude, and environmental variables relative to HPC. We did this by broadcasting sounds with varying frequencies and under different vegetation conditions over a range of distances. We then tested which sounds were detected by HPC in the field and when listening to ARU recordings in the lab. Second, we used known principles of sound physics to estimate EDR and MDD for various species. Third, we provided an approach for standardizing HPC and ARU data in the same analysis by creating generalized correction factors and a simple approach to calibration that can be used to standardize raw counts to density regardless of the method of sampling.

Study area
We collected data near Calling Lake (55°11' N, 113°12' W) and Lac la Biche, Alberta (54°38' N, 111°58' W) in August 2014. We conducted our surveys in August to reduce the chance of confusing broadcasted sounds (see below) with the songs of real birds. Broadcasts took place between 07:00 and 20:00 MST. We recorded broadcasted sounds that we used in our study at a total of 20 sites using ARUs (10 road sites, 5 coniferous forests, and 5 deciduous forests). Coniferous sites consisted primarily of white spruce (Picea glauca) while deciduous sites consisted primarily of trembling aspen (Populus tremuloides). Road sites occurred on flat, low-use forestry roads composed of gravel and clay. At a subset of the 20 sites (8 road, 4 coniferous, and 4 deciduous), observers stood adjacent to the ARUs and indicated which broadcasted sounds they were able to detect.

Data collection
At each site, we broadcasted known sounds from varying distances (see below) and evaluated whether or not a human observer could detect them. At the same time and location, we also recorded the broadcast sounds on four types of ARUs. All ARUs recorded in 2-channel stereo at 44 kHz in 16-bit .wav format. The four ARUs were (1) Wildlife Acoustics' SongMeter SM2+ GPS-enabled recording units equipped with SMX-II weatherproof microphones (5 units); (2) Wildlife Acoustics' SM3 ARUs (5 units); (3) RiverForks CZM recorders (2 units); and (4) Zoom H1 handheld recorders (3 units).
We broadcasted sounds with an Alpine® SPR-60, 6-1/2" car speaker/tweeter and an Alpine® UTE-42BT car stereo/audio player (Gentec Int'l, Markham, Ontario), both installed into an 11" (width) x 10" (depth) x 15" (height) plywood speaker box, along a transect from 12 to 1312 m. We placed the speaker at 25-m intervals for the first 400 m, 50-m intervals between 400 and 800 m, and 100-m intervals for broadcasts beyond 800 m. The same sequence of calls was broadcast at each distance. On forested transects where the ARU was not visible from the transmitting unit, we used a GPS and compass to properly align the speaker toward the ARU.
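The spacing rule above can be turned into a list of speaker positions. The following is a minimal sketch, not the authors' protocol script: it assumes the interval switches at exactly 400 m and 800 m from the recorder and that the step is chosen from the current speaker position. The function name `broadcast_distances` is hypothetical.

```python
def broadcast_distances(start=12, end=1312):
    """Return speaker distances (m) from the recorder along one transect.

    Assumption: the step is picked from the current position, using
    25 m below 400 m, 50 m from 400 m up to 800 m, and 100 m beyond.
    """
    distances = [start]
    while distances[-1] < end:
        current = distances[-1]
        if current < 400:
            step = 25
        elif current < 800:
            step = 50
        else:
            step = 100
        distances.append(current + step)
    return distances


distances = broadcast_distances()
print(len(distances), distances[0], distances[-1])
```

Under these assumptions the sequence starts at 12 m and lands exactly on the 1312 m endpoint reported in the text, which suggests the rule is at least consistent with the stated transect length.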
The broadcasted sequence began with a series of 7 pure tones (at frequencies of 1000 Hz, 1414 Hz, 2000 Hz, 2828 Hz, 4000 Hz, 5656 Hz, and 8000 Hz) generated using Adobe Audition CS6. We selected these species for a variety of song characteristics that may affect probability of detection (pitch, song length). All sounds were normalized in Audition to bring peak amplitude to a standardized level. We broadcasted sounds at 90 dB, which we measured 1 m from the speaker system (based on fast-time A-weighting) using a handheld sound meter (Sper Scientific 840018).
At each transect, we attached each of the 4 ARU types to a tree or post at a height of 1.5 m. This was the same height as the speaker broadcasting the recordings at the starting point of the transect, and we chose transects with minimal elevational change.
For each point along a transect, we recorded the time of the broadcast and distance of the broadcast speaker from the ARUs and human observer using a GPS (± 3 m). We also measured temperature, humidity, and wind speed during each broadcast using a Kestrel 3000 pocket weather meter. Following the end of the broadcasted sequence, the first observer moved the speaker an additional 25 m along the transect and the process was repeated.
We clipped recordings into individual files for each distance from each type of ARU. Observers in the lab listened to these files at standardized volume levels and noted which species and tones they could identify and detect for each distance and each type of ARU recording. For this experiment, observers in the lab listened to tones and songs in the same sequence in which they were broadcast, so that results were directly comparable to the HPC. For pure tones, observers only had to identify that a tone was present, not what frequency was broadcast. Using this method, we generated a large dataset of detections or nondetections from sounds that were known to have occurred (n = 96,502). During the HPC, the observer in the field recorded whether they could hear and correctly identify each sound as it was broadcast in sequence.

Modeling sources of variation influencing detection of sounds
We divided data randomly into 70% training data (n = 1898 for each species or tone, without replacement) for model development and 30% test data (n = 813 for each species or tone) for model validation (sample function, R [R Core Team 2013]). We assessed the detection/nondetection of each species or tone using generalized linear models (glm function, R [R Core Team 2013]) with a binomial error family. All models included distance as a predictor of whether a tone or song was detected. We used a model where distance was the only predictor as a null model, where p(d) declined with distance at the same rate in different habitats, in different weather conditions, and for human observers versus different ARU brands. We compared this null model to 11 candidate models (Table 1). For the weather models, we had considered temperature as well, but dropped that variable because it was positively correlated with humidity.
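The 70/30 split described above can be illustrated with a short sketch. The authors used the `sample` function in R; the Python version below is only an analogue, and the function name `split_train_test`, the seed, and the placeholder records are assumptions made for illustration. The total of 2711 records per sound is simply the sum of the reported training and test sizes.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=1):
    """Randomly split records into training and test sets without replacement."""
    rng = random.Random(seed)  # seed is arbitrary, chosen only for repeatability
    n_train = round(train_fraction * len(records))
    train_idx = set(rng.sample(range(len(records)), n_train))
    train = [r for i, r in enumerate(records) if i in train_idx]
    test = [r for i, r in enumerate(records) if i not in train_idx]
    return train, test


# Placeholder records standing in for the 1898 + 813 = 2711 trials per sound.
records = list(range(2711))
train, test = split_train_test(records)
print(len(train), len(test))  # 1898 813
```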
We used Akaike's Information Criterion to rank the relative fit of models (Burnham and Anderson 2002, Arnold 2010). To assess the absolute model fit or goodness-of-fit of the top AIC-ranked model, we used the area-under-the-curve (AUC) within receiver-operator curves for each species as a test statistic (roc function, pROC package, R [Robin et al. 2011]). AUC measures the proportion of actual detections and nondetections that were correctly predicted by the best model as opposed to false negatives or positives. We calculated AUC for the test data set excluded from model generation. We rated models with AUC > 0.70 as having sufficient ability to correctly predict if a song or tone was or was not detected (Vanagas 2004).
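As a rough illustration of the two statistics named above, the sketch below ranks models by AIC (2k minus twice the log-likelihood) and computes AUC with the standard rank-based definition: the probability that a detected trial receives a higher predicted probability than a nondetected one, with ties counted as one half. The authors used R and the pROC package; the log-likelihoods, parameter counts, labels, and scores here are invented for illustration only.

```python
def aic(log_lik, n_params):
    """Akaike's Information Criterion for one fitted model."""
    return 2 * n_params - 2 * log_lik


def rank_models(models):
    """models maps name -> (log_lik, n_params); returns (name, AIC, delta AIC) sorted by AIC."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(((n, s, s - best) for n, s in scores.items()), key=lambda t: t[1])


def auc(labels, scores):
    """Rank-based AUC: share of (detected, nondetected) pairs ordered correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical fits, not values from the study.
ranked = rank_models({"distance": (-520.0, 1), "distance + habitat": (-480.0, 3)})
print(ranked[0][0])  # distance + habitat

# Hypothetical test-set outcomes (1 = detected) and predicted probabilities.
test_auc = auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.2, 0.6, 0.1])
print(test_auc > 0.70)  # True
```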

Estimating effective detection radius for different sounds
EDR gives the radius of the circle where the expected number of available individuals not detected within the distance equals the expected number of the detected individuals outside of that distance (Buckland et al. 1993). We estimated EDR for our Calling Lake dataset with a separate set of models rather than the set used for modeling detectability. The shape of the distance function describes how detection probability attenuates as a function of broadcast speaker distance (d) from the ARUs and human observer. The distance function is a strictly monotonic decreasing function with increasing distance. There are many different mathematical formulations to describe this shape; however, we chose the half-normal distance function because of its simplicity, as well as the fact that its standard deviation parameter (τ) is directly interpretable as effective detection radius (EDR) for unlimited, i.e., not truncated, point counts in bird surveys (Sólymos et al. 2013). In the half-normal distance function, detection at a given distance can be modeled as $p(d) = \exp(-d^2/\tau^2)$, in which detection declines as object distance (d) from the observer increases, but declines at a slower rate as τ increases. We transformed distance in metres to $-d^2$ prior to modeling to linearize the relationship. We used the coefficients for different predictors in the best model to calculate EDR for each species or tone for different vegetation types, human observers, and ARU types. In all models, we set the intercept to zero so that p(d) = 1 at d = 0, and used a complementary log-log link function instead of the usual logit link function for GLMs with a binomial dependent variable, to simplify the estimation of EDR and approximate a log-linear model (Yip et al. 2017). EDR was estimated as $\tau = (1/\beta)^{0.5}$, where β is the sum of coefficients for the main effect of distance (transformed as $-d^2$) and any interaction effects with $-d^2$ (for example: $\beta_{\mathrm{ARU[relative\ to\ human\ observer]}} + \beta_{-d^2} + \beta_{\mathrm{Habitat[relative\ to\ coniferous\ forest]}}$). After calculating EDR for the human observer and each ARU type, we then calculated a correction factor for the effective area sampled by each ARU type relative to human observers ($A'/A = \mathrm{EDR}^2_{\mathrm{ARU}} / \mathrm{EDR}^2_{\mathrm{human}}$) in each vegetation type. This correction factor can be used to standardize the area parameter for animal density when comparing data from ARUs and human observers.
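The relationships in this paragraph can be checked numerically with a small sketch. It uses only the formulas stated above: the half-normal function p(d) = exp(-d²/τ²), τ = (1/β)^0.5, and the area ratio EDR²_ARU / EDR²_human. The function names and all coefficient values are hypothetical and were chosen only to give round numbers.

```python
import math


def detection_probability(distance_m, tau_m):
    """Half-normal detection function p(d) = exp(-d^2 / tau^2)."""
    return math.exp(-(distance_m ** 2) / (tau_m ** 2))


def edr_from_beta(beta):
    """EDR (m) from the summed coefficients on -d^2; undefined unless beta > 0."""
    if beta <= 0:
        raise ValueError("EDR is undefined when the summed coefficient is not positive")
    return (1.0 / beta) ** 0.5


def area_correction(edr_aru, edr_human):
    """Ratio of effective area sampled by an ARU to that sampled by a human observer."""
    return edr_aru ** 2 / edr_human ** 2


# Hypothetical coefficients: a main effect on -d^2 plus one interaction term for the ARU.
beta_human = 1.0e-4
beta_aru = sum([1.0e-4, 5.625e-5])

edr_human = edr_from_beta(beta_human)  # about 100 m
edr_aru = edr_from_beta(beta_aru)      # about 80 m
print(round(edr_human, 1), round(edr_aru, 1), round(area_correction(edr_aru, edr_human), 2))
```

At d = τ the detection probability equals exp(-1), roughly 0.37, which is one way to read the EDR on a fitted detection curve.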
We performed Monte Carlo simulations to (1) estimate uncertainty in EDR point estimates for each sound, and (2) test for statistical differences between different vegetation types. We generated coefficients (n = 1000) using maximum-likelihood estimates and variance-covariance matrices from the original models to calculate 90% confidence intervals from the predicted values (Appendix 1; Yip et al. 2017). We omitted EDR estimates that (1) failed to solve because of a lack of nondetections in the raw data, or (2) failed to generate confidence intervals because of high uncertainty when predicting from the original model.
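A minimal sketch of this kind of simulation is given below, assuming only two coefficients contribute to β and that they follow a bivariate normal distribution defined by the estimates and their variance-covariance matrix. The function name `simulate_edr_interval`, the percentile rule, and all numeric inputs are hypothetical; the authors' own code is described in Appendix 1 and is not reproduced here.

```python
import random


def simulate_edr_interval(mean, cov, n_draws=1000, level=0.90, seed=1):
    """Percentile interval for EDR from simulated coefficient pairs.

    mean: (b1, b2) coefficient estimates on -d^2 (main effect, interaction).
    cov:  2x2 variance-covariance matrix of those estimates.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = cov[0][0] ** 0.5
    l21 = cov[1][0] / l11
    l22 = (cov[1][1] - l21 ** 2) ** 0.5
    edrs = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b1 = mean[0] + l11 * z1
        b2 = mean[1] + l21 * z1 + l22 * z2
        beta = b1 + b2
        if beta > 0:  # EDR cannot be solved for non-positive beta, so skip the draw
            edrs.append((1.0 / beta) ** 0.5)
    edrs.sort()
    tail = (1 - level) / 2
    lower = edrs[int(tail * (len(edrs) - 1))]
    upper = edrs[int((1 - tail) * (len(edrs) - 1))]
    return lower, upper


# Hypothetical estimates and covariance matrix, not values from the study.
mean = (1.0e-4, 2.0e-5)
cov = [[1.0e-10, 2.0e-11], [2.0e-11, 4.0e-11]]
lower, upper = simulate_edr_interval(mean, cov)
point = (1.0 / sum(mean)) ** 0.5
print(lower < point < upper)  # True
```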
We estimated MDD for the same data by selecting the largest distance with a correctly identified detection based on the 95% quantile of positive detections for each species. We estimated MDD separately for ARUs and human observers using the same data for our EDR calculations to compare results from both approaches. After estimating MDD, we calculated the maximum area sampled and correction factors for each ARU type relative to human observers ($A'/A = \mathrm{MDD}^2_{\mathrm{ARU}} / \mathrm{MDD}^2_{\mathrm{human}}$) in each vegetation type, using the same method as for calculating correction factors for EDR.
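The MDD calculation can be sketched as a quantile over the distances at which a sound was detected. The version below assumes a linearly interpolated quantile (the text does not state which quantile definition was used), and the trial data are invented so that the numbers come out round. The names `quantile` and `mdd` are hypothetical.

```python
def quantile(values, q):
    """Linearly interpolated quantile (an assumed definition)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def mdd(trials, q=0.95):
    """MDD (m) as the q-quantile of distances with a positive detection.

    trials: iterable of (distance_m, detected) pairs.
    """
    heard = [d for d, detected in trials if detected]
    return quantile(heard, q)


# Hypothetical trials: the human hears broadcasts out to 200 m, the ARU out to 160 m.
human = [(d, True) for d in range(0, 201, 10)] + [(d, False) for d in range(210, 301, 10)]
aru = [(d, True) for d in range(0, 161, 10)] + [(d, False) for d in range(170, 301, 10)]

mdd_human, mdd_aru = mdd(human), mdd(aru)
correction = mdd_aru ** 2 / mdd_human ** 2  # area ratio A'/A
print(round(mdd_human, 1), round(mdd_aru, 1), round(correction, 2))
```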

Study area
We used known distance data and broadcasts of the same species and tones to explore effects of sound amplitude on detection by ARUs. We conducted the amplitude study from September to October 2014 in the Blackfoot-Cooking Lake Natural Area (53°25' N, 112°49' W) near Edmonton, Alberta, between 09:00 and 16:00 MST. We placed 10 transects in open vegetation (> 75% grass cover, < 5% shrub cover, 0% tree cover) and 10 in denser vegetation (mature deciduous stands composed primarily of trembling aspen with small amounts of balsam poplar [Populus balsamifera] and white spruce).

Data collection
At each transect, we placed an SM2+ ARU in the same setup as the previous experiment and broadcasted songs and tones from distances of 50, 100, and 150 m. We broadcast each song or tone at 11 sound pressure levels (A-weighted SPL, a measure of sound pressure relative to the threshold for human hearing) from 40 to 90 dB at 5-dB increments (23 songs × 11 amplitudes = 253 sounds played at each of the three distances). Each sequence of sounds at each amplitude lasted 1:43 and the full broadcast for all amplitudes was 18:53. For each distance within a transect, we noted temperature, humidity, and wind speed values averaged over the duration of the broadcast using a handheld Kestrel 3000 weather meter (Nielsen-Kellerman Co., Boothwyn, Pennsylvania).
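The counts and durations in this paragraph follow from simple arithmetic, which the short sketch below reproduces using only the figures stated in the text. Variable names are illustrative.

```python
# Sound pressure levels from 40 to 90 dB in 5-dB steps.
spl_levels = list(range(40, 91, 5))

n_sounds = 23                                  # songs and tones per sequence, as stated
sounds_per_distance = n_sounds * len(spl_levels)

sequence_seconds = 1 * 60 + 43                 # one sequence at one amplitude (1:43)
total_seconds = sequence_seconds * len(spl_levels)
minutes, seconds = divmod(total_seconds, 60)

print(len(spl_levels), sounds_per_distance, f"{minutes}:{seconds:02d}")  # 11 253 18:53
```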
Following field data collection, we used the programs PRAAT © version 5.4 and Adobe Audition © version 5.0 to cut all recordings into separate clips for each call on the recording and labelled calls according to site type (open or closed), site number (1-10), species call/tone, and amplitude. We randomized the clipped files by shuffling them with generic empty clips (containing only ambient background noise). Without knowing the file contents, 4 volunteers trained in avian call detection and recognition listened to and labelled each sound clip by whether or not a call was heard, and if so, of what species.

Modeling sources of variation influencing detection of sounds
As in the HPC/ARU study, we used GLMs with intercept set to 0, distance transformed to $-d^2$, and a complementary log-log link function to model whether or not a given song or tone was detected by observers listening to the ARU recordings. For each species or tone, we used a model where additive effects of distance and SPL were the only predictors of detection as a null model, where p(d) declined with distance at the same rate in different habitats and weather conditions, and varied with broadcast amplitude. We compared this null model to five candidate models (Table 2). We followed the same procedure for assessing the relative fit of the above GLMs using AIC, and assessed the goodness-of-fit of the highest ranked or most parsimonious model for each species, using AUC statistics and receiver operating curves as in the HPC/ARU experiment (Table 3).

Estimating effective detection radius for different sounds
We used the coefficients for different predictors in the best model to calculate EDR for each species or tone for different vegetation types and SPLs as with the previous experiment. EDR was estimated as $\tau = (1/\beta)^{0.5}$, where β is the sum of coefficients for the main effect of distance (transformed as $-d^2$) and any interaction effects with $-d^2$ (for example: $\beta_{\mathrm{SPL[45\text{-}90\ dB\ in\ 5\text{-}dB\ increments]}} + \beta_{-d^2} + \beta_{\mathrm{Open\ habitat[relative\ to\ closed\ habitat]}}$). We estimated uncertainty using the same Monte Carlo method to calculate 90% confidence intervals for our EDR estimates. We did not estimate MDD for our second experiment because of a lack of precision with our distance variables (only three were used).

Using known distance data to estimate effects of recorder technology, vegetation type, weather, and species detection

Effective detection radii for humans and ARUs in different vegetation types with known-distance data
Detectability declined as distance to sound increased for all species and tones (mean ± SD across all models: $\beta_x = 1.312 \times 10^{-5} \pm 1.399 \times 10^{-5}$; Table 4). Declines in detection rate were greater in both coniferous (mean $\beta_{\mathrm{coniferous}} = -0.165 \pm 1.066$ relative to road) and deciduous (mean $\beta_{\mathrm{deciduous}} = -1.482 \pm 1.456$ relative to road) vegetation types in comparison to open roadside transects (Fig. 1). Ninety percent confidence intervals for our estimates of EDR from human detection data showed significant differences between roadside and forested detection distance for 18 of 32 sounds (5656Hz, 8000Hz, BAWW, BEKI, BHCO, BLWA, CCSP, DEJU, LISP, OSFL, OVEN, PISI, RBGR, RBNU, TEWA, WAVI, WTSP, YERA; Table 5). We were unable to assess roadside confidence intervals for five sounds (1414Hz, 2828Hz, CMWA, BOOW, NSWO) because of undefined EDR estimates. We found no significant difference in detection distance between coniferous and deciduous vegetation types. ARU type also influenced detectability, although this varied depending on the species or tone present. However, detectability was generally higher for human observers relative to ARUs (mean relative to human: $\beta_{\mathrm{SM2}} = -2.108 \pm 1.312$, $\beta_{\mathrm{SM3}} = -0.963 \pm 1.086$, $\beta_{\mathrm{RiverForks}} = -1.181 \pm 1.353$, $\beta_{\mathrm{Zoom}} = -1.643 \pm 1.407$; Fig. 1). All top performing models included distance, transect type, and ARU type as important predictors (Table 5). The top performing model for 16 species and tones (1000Hz, 1414Hz, 2000Hz, BADO, CMWA, BOOW, CATO, CORA, GGOW, LEOW, NSWO, OSFL, RBGR, RBNU, WETO, WTSP) included humidity, which positively influenced detectability for all sounds with the exception of CMWA (mean $\beta_{\mathrm{humidity}} = 0.020 \pm 0.013$). Three species (CATO, WETO, YERA) had wind in their top performing model, which also had a positive influence (mean $\beta_{\mathrm{wind}} = 0.191 \pm 0.046$). Interaction effects between ARU and transect type were part of the top performing model for seven sounds (BADO, BAWW, CMWA, BOOW, GGOW, LEOW, TEWA), indicating that detectability varied with both the type of ARU and the transect the sounds were broadcast through. For these sounds, detectability declines suddenly relative to ARUs as distance increases, particularly in coniferous vegetation types. Mean (± SD) wind speed averaged over the duration of the broadcast sequence at each distance along a transect was 1.1 ± 1.4 km/h. Mean temperature during each broadcast was 23.7 ± 6.5 °C. Relative humidity was 59.0 ± 18.9%. Performance for all models was excellent (AUC: min = 0.9180, max = 0.9659, median = 0.9647; Table 5).

Table 3. Model selection for factors influencing detection probability of different sounds for the SPL experiment and AUC statistics on test data for the top AIC-ranked model testing differences in detection with varying SPL. All sounds used the same models for selection. We selected top models using lowest AICc value and ∆AICc. For multiple models with ∆AICc < 2, we selected the simplest model with fewest parameters (Arnold 2010). "df" is the degrees of freedom and "logLik" is the log likelihood value for that particular model. "*" indicates variable interactions.

Using known distance data to estimate effects of sound amplitude on detection

Effective detection radii for sounds at different amplitudes in different vegetation types
For all species and tones in the sound amplitude study, detection probability declined with increasing distance (mean ± SD across all models: $\beta_x = 2.913 \times 10^{-4} \pm 1.476 \times 10^{-4}$; Table 6) and decreasing sound amplitude (mean $\beta_{\mathrm{SPL}} = 0.183 \pm 0.037$). Probability of detection at a given distance was higher in open vegetation than in closed vegetation (mean $\beta_{\mathrm{OpenHabitat}} = 1.983 \pm 0.899$, relative to closed habitat). The best model predicting detection of each species or tone generally included distance, vegetation type, and amplitude (Table 3). Three sounds (1414Hz, LEOW, YERA) included wind in their top performing model, two sounds (4000Hz, WETO) included humidity, and one sound (CMWA) included both wind and humidity. Wind negatively influenced detectability (mean $\beta_{\mathrm{Wind}} = -0.168 \pm 0.076$) for all four sounds, while humidity had a positive effect for CMWA ($\beta_{\mathrm{Humidity}} = 0.023$) and WETO ($\beta_{\mathrm{Humidity}} = 0.042$) but a negative effect for 4000Hz ($\beta_{\mathrm{Humidity}} = -0.036$). Mean wind speed averaged over the duration of the broadcast sequence at each distance along a transect was 4.0 ± 2.8 km/h. Mean temperature during each broadcast was 15.2 ± 6.4 °C. Relative humidity was 50.5 ± 14.2%. Performance for all models was excellent (AUC: min = 0.8705, max = 0.9836, median = 0.9495; Table 3).
As in the human-ARU comparison study, species with relatively low detection probability (e.g., BAWW, CMWA) had smaller EDR values than species with relatively high detection probability (e.g., owls; Appendix 3). EDR values were generally higher in open vegetation than closed vegetation and increased as sound amplitude increased. When sounds were pooled into one general model, we found no significant interaction effects between SPL and the type of sound (i.e., species or tone), indicating a consistent positive relationship between EDR and SPL for all sounds broadcasted (Fig. 3). Many EDR values were undefined at higher broadcast SPL in open vegetation because of an inadequate number of nondetections. For EDR to be defined, nondetections must occur at the furthest distances, which did not occur at higher sound amplitudes.

DISCUSSION
Detectability of avian vocalizations can be influenced by the surrounding environment (Darras et al. 2016, Yip et al. 2017) and by the methods used to record and identify observations (Haselmayer and Quinn 2000). We compared detection distances of different ARUs as well as human observers in the field and found differences in detectability depending on which method was used. Using the ARU-human comparison calculated here, we conclude that ARU data can be integrated with HPC datasets into larger analyses to increase the scope of inferences made about birds (Cumming et al. 2010). For example, EDR has been estimated for over 100 species by the Boreal Avian Modelling Project (hereafter BAM; http://www.borealbirds.ca/) using human-based distance estimation. Similarly, MDD values for all North American species have been agreed upon by Partners in Flight (hereafter PIF; Rosenberg and Blancher 2005). For BAWW, BAM estimates EDR to be 50.1 m and PIF uses an MDD value of 100 m (PIF Science Committee 2013). Thus, for surveys in deciduous forest using an SM2 wildlife recorder, the EDR correction factor calculated from our study would be 0.757 and the MDD correction factor 0.779 (Appendix 1, 2). The corrected EDR would then be 37.9 m and corrected MDD would be 77.9 m for counts done using an SM2 in similar habitat. Ornithologists can directly compare density estimates from HPC and ARU data after standardizing both data types using this technique, enabling organizations like BAM or PIF to augment their existing HPC data with ARU data.
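The BAWW example can be reproduced with a few lines. This sketch mirrors the arithmetic in the paragraph above, which multiplies each published human-based distance by its correction factor; the helper names are hypothetical, and the area calculation is an added illustration of how a corrected radius translates into area sampled, not a step reported by the authors.

```python
import math


def apply_correction(human_distance_m, correction_factor):
    """Scale a human-based detection distance by an ARU correction factor."""
    return human_distance_m * correction_factor


def sampled_area_ha(radius_m):
    """Area (ha) of a circle with the given radius in metres."""
    return math.pi * radius_m ** 2 / 10_000


# Values quoted in the text for BAWW and an SM2 in deciduous forest.
corrected_edr = apply_correction(50.1, 0.757)
corrected_mdd = apply_correction(100.0, 0.779)
print(round(corrected_edr, 1), round(corrected_mdd, 1))  # 37.9 77.9

# Illustration only: area implied by the human-based and corrected EDR.
print(sampled_area_ha(50.1) > sampled_area_ha(corrected_edr))  # True
```

Dividing a count by the area implied by the corrected radius would give a density per hectare, assuming no further adjustment for availability.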
Human field observers had the highest detectability and detection distances in comparison to recordings from the SM2s, SM3s, RiverForks, and Zoom recorders. SM2s had the lowest detectability and detection distances followed by Zoom recorders, RiverForks, and SM3s. The use of ARUs to record animals introduces additional static, white noise, and electronic interference during the detection process of avian vocalizations, likely contributing to the patterns of decreasing detectability from recordings. However, we presented observers with a limited variety of species and sounds, and in the first experiment, observers knew the order in which the sounds would occur. When sounds are unpredictable and there is uncertainty about what species may be present, detections from recordings will likely increase relative to field surveys from humans because of the opportunity to double check observations in a lab-based environment.
Probability of detecting species declined more rapidly with increasing distance in closed vegetation than in open vegetation in both of our experiments (first experiment: roadside vs forest, second experiment: open grassland vs closed forest). These results are consistent with previously documented differences in detection between vegetation types (Schieck 1997, Pacifici et al. 2008). However, we observed differences in the effect of weather variables between experiments, which may have been due to the distance over which the experiments occurred. Weather effects were influential for sounds with larger EDR values (17/32 sounds; Table 5) in our first experiment, as in Holland (2001) and Simons et al. (2007), but were not as prevalent in our second experiment (6/32 sounds; Table 3). In our second experiment, broadcasts occurred only to a maximum of 150 m, so weather variables may have had less distance over which to act on broadcast signals; this suggests there may be an interaction between weather conditions, distance, and sound transmission. Humidity had a consistently positive effect on detectability except for one species (CMWA) in our first experiment and one tone (4000 Hz) in our second. However, the relationship between wind and detectability differed between the first (positive relationship) and second (negative relationship) experiments, although wind was not included in many of our top performing models. We did not record the direction of the wind relative to the direction of our broadcasts, which may have contributed to this pattern. We also recorded higher but more consistent wind speeds in our second experiment relative to the first. A more limited range of wind speeds in the second experiment may be the reason wind was not included in those models as often. Knowing how factors like weather influence the area sampled is crucial to converting counts from ARUs and humans to accurate density estimates and is an area that we argue needs more work.
Fig. 3. Influence of the sound pressure level (dB) of our song broadcasts on EDR for tones, owls, songbirds, and all other species, plotted separately. We found no statistically significant interaction between different species/tones, although EDR for two species of owl (GGOW and LEOW) appears to increase at a greater rate with distance than other sounds.
We found that EDR was consistently and positively correlated with broadcast SPL regardless of species (Fig. 3). This is important for two reasons. First, we broadcast sounds at 90 dB, which we believe to be in the upper range of amplitudes at which birds might vocalize (Brumm 2004, Patricelli et al. 2007). We also had our speaker oriented directly at the receiver, which may result in unrealistic and overestimated EDRs. However, the importance of this study lies in the relative difference in EDR between treatments, which should remain the same regardless of SPL.
Given that EDR increased consistently with SPL for all species (Fig. 3), we believe singing volume could be estimated for real birds by applying predictions from our EDR models and correction factors to BAM's human-based estimates of EDR, albeit with varying degrees of uncertainty depending on model performance. This would also rest on the assumption that EDRs estimated by BAM were calculated under similar conditions and that human observers estimate EDR accurately. It is not clear how accurate EDR measurements by humans are, and our results show the importance of environmental variables such as the openness of the surrounding environment. Although our best performing models suggest that EDR increases consistently with SPL for most sounds, there were outlier sounds (BADO, LEOW) where EDR increased differently relative to the general trend (Fig. 3), possibly because of uncertainty in our EDR estimates.
The second reason that the consistent response of EDR to SPL is important is that it may provide a simpler way to calibrate ARUs to humans and to each other. More recorder models are becoming available and the ones currently in use are routinely being updated with newer models, which have different gain settings, sensitivities, and residual electronic noise. All of these factors influence the area sampled for birds relative to humans and other ARUs (Rempel et al. 2005). Sound frequency acted differently on each recorder, suggesting that microphone frequency response plays a role in detectability. Detectability decreased, and differences in EDR and the resulting correction factors increased, with frequency for SM2s, while the opposite was observed with SM3s and Zoom recorders (Fig. 2). The method we used to compare EDR between various recorders and human observers in our first experiment provided high resolution information on relative differences in detection distance, but was time consuming to carry out. We argue that, in the future, EDR could be calibrated at different amplitudes for multiple brands of ARUs using relatively few distances, as in our second experiment, because EDR decreased consistently for most sounds as SPL declined, so the relative difference in EDR would be comparable to that at 90 dB. This would allow researchers to calculate a correction factor more quickly based on the relative difference.
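Because each correction factor is defined as the ratio of ARU to field-observer detection areas (Appendix 1), a calibration of this kind reduces to comparing two detection radii. The following is a minimal sketch assuming circular detection areas; the function names and the radii are hypothetical illustrations, not values from this study.

```python
import math


def detection_area(radius_m):
    """Area (m^2) of a circular detection region with the given radius (m)."""
    return math.pi * radius_m ** 2


def area_correction_factor(aru_radius_m, human_radius_m):
    """Ratio of ARU detection area to field-observer detection area.

    Assumes both detection regions are circles centred on the survey point.
    """
    if human_radius_m <= 0:
        raise ValueError("human detection radius must be positive")
    return detection_area(aru_radius_m) / detection_area(human_radius_m)


# Hypothetical radii for illustration only (not measured values).
cf = area_correction_factor(aru_radius_m=80.0, human_radius_m=100.0)
print(f"Correction factor: {cf:.2f}")
```

A value below 1 indicates that the recorder samples a smaller area than a field observer, which is the interpretation given for the correction factors in Fig. 2.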
Our results provide further evidence supporting the conclusions of previous researchers (Haselmayer and Quinn 2000, Hobson et al. 2002, Celis-Murillo et al. 2009) that counts derived from ARUs and human observers are relatively comparable. However, our study tested detectability under relatively controlled conditions through broadcasts and with a limited variety of species and sounds. The results found in this study may differ when field observers must identify overlapping vocalizations, unfamiliar species, or sounds in acoustically busy sampling periods, which would likely have a larger influence on detectability for field observers than for ARUs. Although human observers generally appeared to detect more of the broadcast sounds than the different ARUs (particularly the SM2+), EDR and effective area sampled by some ARUs were comparable to those of human observers for some species. Furthermore, differences between recorders should be irrelevant if we can standardize data from different sources by offsetting the varying detection distances and areas of ARUs.
Influences of weather on EDR can be controlled to an extent by survey protocol (e.g., survey only when wind is < 2 on the Beaufort scale, when there is no rain, etc.) and corrections for variables such as vegetation/habitat type can be calculated separately (Yip et al. 2017) and applied in conjunction with corrections calculated in this study.
Although we demonstrate that simultaneous comparisons of HPC and ARU data potentially enable the calculation of EDR and densities of birds from ARU recordings, this approach still relies on accurate distance estimation during HPCs, an assumption that is frequently violated during avian surveys (Alldredge et al. 2007, Nadeau and Conway 2012). Errors in distance estimation can bias EDR and bird density calculations and will persist when using our correction approach for ARU data. There are also factors unrelated to distance estimation that should be considered before collating these two types of point counts for the same analysis. First, some detections in HPC may be only visual, particularly of rare or quiet species that are unavailable to ARUs, or of rarely vocalizing species that are unlikely to be detected in short-duration recordings (Haselmayer and Quinn 2000, Hutto and Stutzman 2009). Second, because ARUs provide a permanent record for review, there may be a negative bias associated with species detection in HPC relative to ARU recordings because people listening to ARU data can relisten to a sound (Tegeler et al. 2012). This bias could be modeled as observer effects. Calibration of ARUs should also be an important part of the permanent record. Microphone sensitivity can decrease with use (Turgeon et al. 2017) and influence the area surveyed. Microphone quality should be checked regularly to ensure minimal variation in detection distance within recorder models. Variation in detectability between observers can be large and can influence results from both HPC and ARU recordings, in part because of differences in hearing ability and experience identifying species (Sauer et al. 1994). Observer variation within ARU point counts is likely lower than in HPC because a permanent record allows multiple observers to process recordings and double check unknown species. Our study should minimize interobserver variability because observers were presented with a limited number of sounds that they could review prior to the experiment. Observers were also males and females between the ages of 18 and 28, who are more likely to have similar hearing levels (Emlen and DeJong 1992).
Our objectives were to investigate relative differences between ARUs and HPC. We provide methods for standardizing and correcting detection distances to derive avian densities from ARUs by accounting for differences in the area surveyed by each method. We used the ecosystems presented in this study as a case study to demonstrate application of this method; however, these methods can be applied to other habitat types to broaden their use. This approach to density estimation would be more logistically feasible and affordable than studies using microphone arrays to obtain density (Efford et al. 2009). Integration of data from ARUs and HPCs could allow for larger meta-analyses that make inferences about interactions between birds and the environment at larger spatial scales (Cumming et al. 2010).
Responses to this article can be read online at: http://www.ace-eco.org/issues/responses.php/997
Appendix 1. Detection radius, detection area, and correction factors calculated for EDR for different songs and tones detected by four brands of autonomous recording units (ARUs), from listening trials conducted at 20 transects near Calling Lake and Lac La Biche, Alberta, Canada in 2014. Correction factors are relative to human observers in the field and are calculated using a ratio of ARU to Field Observer detection areas.

Fig. 1. Probability of detecting OSFL with distance from ARU in (A) open (roadside) and closed (forested) habitat, and (B) with human observers, RiverForks, SM2, SM3, and Zoom recorders. Predictions are calculated from binomial detection data and plotted with 95% confidence intervals.

Fig. 2. Correction factors for (A) EDR and (B) MDD of various ARU types at different frequencies. ARUs are in comparison to human detection as a reference. Correction factors are calculated using a ratio of detection area of ARU to detection area of a field observer (Appendix 1, 2). Correction factors less than 1 mean smaller detection distances than human observers in the field and can be applied to ARU data to standardize it with data from HPC.

Table 1. Candidate models to be compared against a null distance model for the autonomous recording units (ARUs) experiment.

Table 2. Candidate models to be compared against a null distance model for the SPL experiment.

Table 4. Model coefficients (recorder type, habitat type, distance, interactions between habitat type and distance, wind, humidity) for the top AIC-ranked model predicting probability of detecting each species and tone with RiverForks (RF), SM2, SM3, and Zoom (Zm) recorders, in listening trials conducted at 20 transects near Calling Lake and Lac La Biche, Alberta, Canada in 2014. "x" is equal to -(Distance)². The reference level for coniferous (Co) and deciduous (Dec) habitat is roadside habitat. "NA" means that variable was not included in the top model for that sound. "*" indicates variable interactions.

Table 5. Model selection for factors influencing detection probability of different sounds for the ARU experiment and AUC statistics on test data for the top AIC-ranked model testing differences in detection distance between multiple models of ARU. All sounds used the same models for selection. We selected top models using lowest AICc value and ∆AICc. For multiple models with ∆AICc < 2, we selected the simplest model with fewest parameters (Arnold 2010). "df" is the degrees of freedom and "logLik" is the log likelihood value for that particular model. "*" indicates variable interactions.

Table 6. Model coefficients (amplitude, distance, habitat type, interactions between habitat type and distance, wind, humidity) for the top AIC-ranked model predicting probability of detecting each species and tone, in listening trials conducted along 20 transects in the Blackfoot-Cooking Lake Natural Area near Edmonton, Alberta, Canada in 2014. "x" is equal to -(Distance)². SPL = amplitude (dB). The reference level for open habitat is closed habitat. "NA" means that variable was not included in the top model for that sound.
Values less than 1 indicate a smaller detection area relative to human observers and values greater than 1 indicate greater detection area relative to human observers. Correction factors can be applied to ARU data to standardize survey areas with those of observers in the field. "NA" indicates EDR values that could not be solved by our models due to uncertainty caused by insufficient non-detections.
Appendix 2. Detection radius, detection area, and correction factors calculated for MDD of different songs and tones detected by four brands of autonomous recording units (ARUs), from listening trials conducted at 20 transects near Calling Lake and Lac La Biche, Alberta, Canada in 2014. Correction factors are relative to human observers in the field and are calculated using a ratio of ARU to Field Observer detection areas. Values less than 1 indicate a smaller detection area relative to human observers and values greater than 1 indicate greater detection area relative to human observers. Correction factors can be applied to ARU data to standardize survey areas with those of observers in the field.