Long-term monitoring programs such as the North American Breeding Bird Survey (hereafter BBS; Sauer et al. 2014) and the Christmas Bird Count (Link et al. 2006) provide crucial data that help to guide species status assessments and conservation efforts (Downes et al. 2016). Despite the success of these monitoring programs, many species are insufficiently monitored because of large geographic gaps in avian monitoring programs (Sauer et al. 2003, Francis et al. 2009, Machtans et al. 2014), or because species' life-history traits, e.g., crepuscular/nocturnal behavior, make them hard to monitor using traditional census methods (Goyette et al. 2011, Zwart et al. 2014). Many of the gaps in monitoring effort exist in regions, e.g., boreal forest, or habitats where poor access, logistical constraints, high cost, and a lack of skilled observers have hindered monitoring (Sauer et al. 2003, Francis et al. 2009, Machtans et al. 2014). Thus, it would be beneficial to augment monitoring efforts with alternative methods for poorly sampled species, habitats, and regions to better guide species assessments and conservation prioritization, and to test hypotheses about the causes of population change.
One potential solution to monitor bird populations in regions and habitats in which it is difficult to send skilled observers is to augment human observers with stereo recordings (Hobson et al. 2002, Francis et al. 2009, Klingbeil and Willig 2015). This approach is a plausible solution because most avian monitoring and research programs collect count data using point count surveys that primarily rely upon detection of acoustic cues (Hobson et al. 2002, Blumstein et al. 2011, Matsuoka et al. 2014). Indeed, several comparisons of data from stereo recordings against human observers in the field suggest that recordings provide relatively similar estimates of species abundance and community composition (Hobson et al. 2002, Blumstein et al. 2011, Venier et al. 2012, Klingbeil and Willig 2015).
Although many comparisons of acoustic recordings with point counts conducted by human observers suggest the data are generally comparable, subtle differences nonetheless do exist (Hutto and Stutzman 2009, Venier et al. 2012). For example, Hutto and Stutzman (2009) found that significantly more species were detected using point counts than acoustic recordings and speculated that this may in part be due to differences in the radius over which species were audible between methods. In addition to potential differences between a given recording system and human observers, previous research has also shown variable species detection between different recording systems (Rempel et al. 2013). Thus, broader incorporation of data from autonomous recording units (ARUs) into monitoring programs may require correcting for differential detectability between ARUs and human observers in the field (Sidie-Slettedahl et al. 2015). We describe a sampling design and analysis framework to relate counts from ARUs to traditional point count data and derive estimated bird densities for both by correcting data for biases in species availability and perceptibility, following earlier developments by Bart and Earnst (2002) and Sólymos et al. (2013). Development of this method will allow for efficient and cost-effective augmentation of acoustic monitoring programs with ARU technology for species and regions with poor survey coverage.
We develop a framework to use simultaneously conducted human point count and ARU surveys to estimate statistical offsets to adjust for systematic differences between counts conducted by humans versus ARUs (following Sólymos et al. 2013). We use field data to test whether our approach removes bias in ARU-based counts relative to densities estimated from field observers conducting point counts using both distance estimation and time removal sampling. Our proposed approach to correcting counts between ARUs and humans would be most easily applied if the statistical offsets do not vary with other factors affecting detection, and we therefore tested whether the offsets differed by habitat or with environmental noise. Based on the literature and field experience, we hypothesized that the ratio of counts from ARUs relative to counts from human observers conducted at the same time and location would be < 1, on the assumption that the detection radius of the ARU would be less than that of a human observer. We also hypothesized that the same ratio would be smaller when recordings were made in deciduous vs. other forest types and/or under windy conditions because microphones tend to amplify leaf rustle (B. Turnbull, personal communication).
We conducted our study in the boreal forest of Saskatchewan, Canada. Study sites were located in Bird Conservation Region 6 (Boreal Plains Ecozone) and Bird Conservation Region 8 (Boreal Shield Ecozone) between 53° 34'N, 103° 43' W and 58° 08'N, 109° 28' W (Fig. 1) in the summers of 2014 and 2015. Surveys were conducted at 363 unique point count stations distributed among 105 study sites, each of which constituted a unique forest stand. We sampled between 1 and 12 point count locations per study site (median = 3). Effort was approximately equally distributed between years (n = 205 in 2014 and n = 200 in 2015), with 42 point count stations in 9 separate study sites surveyed in both years. Based on the 250 m resolution Land Cover Map of Canada 2005 (LCC05; Latifovic et al. 2008), the most frequently sampled land cover classes were open coniferous (29.2%), closed mature mixed (16.3%), and open mature deciduous (11.6%) forest, with the remainder of the point count samples distributed among 13 other land cover classes (Table 1).
All surveys were conducted by one of five observers between 15 minutes prior to sunrise and 4.5 hours after sunrise between 1–29 June. Upon arriving at the point count station, observers attached a Song Meter SM2+ ARU with a pair of SMX-II microphones (Wildlife Acoustics Inc., Maynard, MA) to the nearest tree at approximately head height and began a manual recording. ARUs were programmed to record in stereo in WAV format, using a sampling rate of 44,100 samples per second and factory default acoustic gain settings for the microphone preamplifier. The observer stood approximately 3 m from the ARU to avoid introducing extraneous noise into the recordings. Observers then announced when they began and ended a simultaneous 10-minute point count to ensure data from the subsequent transcription of ARU recordings were collected over the identical time frame to point counts conducted by the human observers.
Our point count protocol followed those recommended in Matsuoka et al. (2014). In brief, field observers placed observed or acoustically detected individuals into one of three distance bins (0–50, 50–100, > 100 m) while conducting point counts. Observers were trained in distance estimation prior to field work, in addition to opportunistically ground-truthing distance estimates using GPS units to estimate distances between point count centroids and birds heard or observed while walking between point count locations. To account for differences in availability, observers additionally coded observations to the time interval (0–3, 3–5, and 5–10 minutes) of initial detection, thus treating any subsequent detections of the same individual as though the individual had been “removed” from the population (Farnsworth et al. 2002).
To avoid introducing additional observer biases, ARU recordings were transcribed by the same observer that conducted the field count. During transcription, each acoustically identified individual was coded into one of ten 1-minute time intervals (0–1, 1–2, through 9–10 minutes) to facilitate estimation of availability using a count-removal approach. ARU-based intervals were subsequently collapsed to the 0–3, 3–5, and 5–10 minute intervals to match the human observer-based design. Transcribers were not privy to field notes during transcription, which was conducted after the field season. Unlike counts conducted under field conditions, transcribers were allowed to pause and/or rewind the recording, e.g., to confirm identification, as is frequently done in data transcription. Finally, transcribers categorized environmental noise recorded from the ARU data for each point count using a five-point scale (1 = none, 2 = light, 3 = moderate, 4 = heavy, and 5 = excessive).
Sólymos et al. (2013) previously demonstrated that point count data can be adjusted for differences in field methodologies if the data include information on the time (Farnsworth et al. 2002, Sólymos et al. 2013) and distance (Matsuoka et al. 2012, Sólymos et al. 2013) intervals in which the individuals were first heard. These extra data allow the application of removal (Farnsworth et al. 2002, Sólymos et al. 2013) and distance modeling (Buckland et al. 2001, Matsuoka et al. 2012) to estimate components of detection probability. Specifically, removal or time-of-detection methods allow the estimation of the probability that an individual bird present at the time of survey gave a visual or auditory cue and was therefore available for detection, i.e., availability (p), while distance sampling allows estimation of the probability that the available birds were detected (perceptibility [q]) given that they were available (Alldredge et al. 2007a, Nichols et al. 2009). The two components of the observation process can be estimated independently of each other using conditional maximum likelihood estimation (see Appendix in Sólymos et al. 2013). Sólymos et al. (2013) established that incorporating the components of detection probability as statistical offsets in generalized linear (GLM) or generalized linear mixed effects (GLMM) models effectively adjusts count data for differences in point count methodology. The offset-based method of Sólymos et al. (2013) forms the basis of our approach to placing ARUs and human observers on a similar footing, but assumes we can approximate both components of detection for both humans and ARUs. Obtaining p for both ARUs and humans is simply a matter of removal sampling (Farnsworth et al. 2002, Sólymos et al. 2013) or employing time-of-detection methods (Alldredge et al. 2007a,b).
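As an illustration of the removal estimator, availability under a constant cue rate φ (so that p(t) = 1 − exp(−φt)) can be estimated by maximizing the conditional multinomial likelihood over the 0–3, 3–5, and 5–10 minute intervals described above. The following is a minimal Python sketch with illustrative counts, not the “detect” package used in the study:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Interval break points (minutes) matching the 0-3, 3-5, 5-10 removal design
breaks = np.array([0.0, 3.0, 5.0, 10.0])

def neg_log_lik(log_phi, counts):
    """Negative conditional multinomial log-likelihood for a removal model
    with a constant cue rate phi, so that p(t) = 1 - exp(-phi * t)."""
    phi = np.exp(log_phi)
    surv = np.exp(-phi * breaks)                       # P(not yet detected) at each break
    cell = (surv[:-1] - surv[1:]) / (1.0 - surv[-1])   # cell probs, conditional on detection
    return -np.sum(counts * np.log(cell))

def fit_removal(counts):
    """Maximize the conditional likelihood over log(phi); return phi and p(10 min)."""
    res = minimize_scalar(neg_log_lik, bounds=(-5.0, 2.0), args=(counts,),
                          method="bounded")
    phi = np.exp(res.x)
    return phi, 1.0 - np.exp(-phi * breaks[-1])

counts = np.array([120, 35, 45])   # first detections per interval (illustrative)
phi, p = fit_removal(counts)
```

Fitting counts generated from a known cue rate recovers that rate, which is a useful sanity check when adapting the likelihood to other interval designs.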
The use of Global Positioning System (GPS) synchronized ARU arrays allows distance to sound source to be directly estimated via differences in timing of sound arrival to linked ARUs (Dawson and Efford 2009, Mennill et al. 2012). The use of synchronized ARU arrays can provide accurate and precise estimates of density (Dawson and Efford 2009, Mennill et al. 2012), but is expensive owing to the need for many ARUs spaced over small distances, e.g., ~30 m (Mennill et al. 2012). To reduce costs, it would therefore be advantageous to devise methods of estimating distance-related detection error for single ARUs sampled with an unknown effective detection radius (hereafter EDR). Thus, we require a method to indirectly estimate detectability for single ARUs to apply the methods of Sólymos et al. (2013).
For a count conducted by a human observer (H), the expected value of the count for a single species from a point count survey can be expressed as:

E[YH] = N ∙ p(tj) ∙ q(rk) = D ∙ AH ∙ p(tj) ∙ q(rk)
where YH is the count, N is the species’ abundance, D is the point level density (per unit area), AH is the area sampled, p(tj) is the probability of an individual singing (and being detected) at least once during the cumulative duration of the count (tj) given that it is present to be detected (j=1,...,J; the number of time intervals), and q(rk) is the probability that an individual bird within point count radius (rk) is detected given that it provides a cue, e.g., song (k=1,...,K; the number of distance intervals). Although the area sampled (AH) is typically unknown, it can be estimated via distance sampling, for example using binomial or multinomial distance estimators to estimate the effective detection radius (EDR, denoted here as τ) assuming perfect detectability (q = 1) within this effective distance:

E[YH] = D ∙ π ∙ τH² ∙ p(tj)
The simplest approach to determine the relationship between counts from human observers and those from ARUs is to conduct paired sampling (or “double sampling” sensu Bart and Earnst 2002). If we simultaneously use an ARU (A) to record the same acoustic environment in which a human observer (H) is conducting a point count, the population density to which both are exposed is identical by design; i.e., D = DH = DA. As a result, if all else is equal then differences in the observed counts from ARUs and human observers should be primarily due to differences in the area sampled by each method. We note, however, that minor differences in estimated abundances could also be due to differences in the probability of detecting cues from individual birds (pH versus pA) related to differences in how detections are made in the field versus in the laboratory, e.g., the possibility of double checking recordings or the lack of external distractions in the laboratory. This assumption (pH = pA) can be explicitly tested by estimating pH and pA from the data by recording time intervals in which individuals were first detected and using removal models (Farnsworth et al. 2002, Sólymos et al. 2013) or using time-of-detection methods (Alldredge et al. 2007a,b). For the sake of simplicity, we start by assuming that pH and pA are equal. If we divide the expected values of the counts, we can observe the expected relationship between the areas sampled by humans versus ARUs:

E[YA] / E[YH] = (DA ∙ π ∙ τA² ∙ pA) / (DH ∙ π ∙ τH² ∙ pH) = τA² / τH²
So, if we let δ be such that:

τA = δ ∙ τH

then E[YA] / E[YH] = (δ ∙ τH)² / τH² = δ².
As a result, the ratio of mean counts derived from the ARU to mean counts by the human observer provides an estimate of a squared scaling constant (δ²) that mathematically relates τH to the unknown EDR of an ARU (τA).
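As a minimal numeric sketch of this scaling (the counts and the human EDR below are made up, not values from the study):

```python
import math

# Illustrative paired mean counts (hypothetical)
mean_aru, mean_human = 4.5, 5.0

delta_sq = mean_aru / mean_human   # ratio of mean counts estimates delta^2
delta = math.sqrt(delta_sq)        # scaling constant relating tau_A to tau_H

tau_H = 70.0                       # human EDR in metres (hypothetical)
tau_A = delta * tau_H              # implied (unknown) ARU EDR
```

Here a 10% deficit in mean ARU counts implies the ARU samples an effective radius about 5% smaller than the human observer's, because the ratio of counts scales with the ratio of sampled areas (radius squared).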
Counts are often modeled in log-linear Poisson generalized linear (GLM) or generalized linear mixed (GLMM) models. If we estimate τH and pH using distance and removal sampling, respectively, then following Sólymos et al. (2013) we can calculate a correction factor, C = π ∙ τH² ∙ pH, such that the mean of a count made at point count location i (i=1,...,n; the number of locations) by a human observer can be expressed as:

E[YiH] = Di ∙ C, or equivalently log(E[YiH]) = log(Di) + log(π ∙ τH² ∙ pH)
Poisson or negative binomial GLM or GLMMs can be fit in this fashion combining human observer and ARU based counts using an indicator function (IA) taking 0 value for human observers and 1 for ARU based counts:

log(E[Yi]) = log(Di) + log(π ∙ τH² ∙ pH) + IA ∙ log(δ²)
Log density is estimated as a linear combination of predictor variables and corresponding coefficients.
Prior to analysis, we removed species, e.g., gulls, ducks, that are poorly monitored using point count methods because they are frequently detected as flyovers and thus violate the closure assumption. We began by estimating EDRH from our model calibration data. We limited analyses to species with at least 15 detections and fit half-normal binomial distance models to estimate EDRs (Matsuoka et al. 2012). We then fit count removal models to both the human observer and ARU data using a model in which we included survey type as a factor to test for a difference in species availability. We considered p unequal if the 95% confidence interval (hereafter 95% CI) for the ARU survey parameter estimate did not overlap zero. Distance and removal models were fit using the “detect” R package, based on a conditional maximum likelihood estimation procedure (Sólymos et al. 2016).
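The half-normal distance model can be sketched as a multinomial likelihood over the 0–50, 50–100, and > 100 m bins; this illustrative Python version stands in for the “detect” package, and the bin counts are made up:

```python
import numpy as np
from scipy.optimize import minimize_scalar

edges = np.array([0.0, 50.0, 100.0, np.inf])   # 0-50, 50-100, > 100 m distance bins

def cell_probs(tau):
    """Multinomial cell probabilities under a half-normal detection function
    q(r) = exp(-r^2 / tau^2), conditional on detection; the integral of
    2*pi*r*q(r) over a bin is proportional to the difference in exp terms."""
    w = np.exp(-(edges / tau) ** 2)            # exp(-inf) -> 0 for the open last bin
    cells = w[:-1] - w[1:]
    return cells / cells.sum()

def fit_edr(counts):
    """Maximum likelihood tau; under this parameterization tau is the EDR."""
    nll = lambda tau: -np.sum(counts * np.log(cell_probs(tau)))
    res = minimize_scalar(nll, bounds=(10.0, 300.0), method="bounded")
    return res.x

counts = np.array([160, 90, 25])               # detections per bin (illustrative)
edr = fit_edr(counts)
```

The fitted τ can then feed directly into the offset term log(τ² ∙ π ∙ p) described above.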
Although δ can be approximated as the square root of the ratio between the arithmetic mean ARU and human observer counts (see above), we are interested in deriving maximum likelihood estimates and associated confidence intervals of δ that account for the sampling design. These can be derived from coefficients (δ² = exp[β]) of Poisson or negative binomial regression, which can be interpreted as the ratio of counts between levels of a treatment. We used Poisson GLMMs to estimate δ by including a fixed effect factor for survey type (ARU vs. human as the reference category), and included random intercepts for station and visit to account for paired observations between human observers and ARUs. Following Sólymos et al. (2013), we used our human observer data to derive species-specific estimates of log(EDRH² ∙ π ∙ p) and included these as statistical offsets in our GLMMs. We estimated δ for each species for which the comparison of availability between ARUs and humans (above) showed pH to be approximately equal to pA, as per the assumptions of our approach.
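The ratio interpretation of exp[β] can be checked directly: for a Poisson model with only an intercept and a survey type indicator (offsets and random effects omitted for simplicity), exp[β] equals the ratio of the two mean counts exactly. A hand-rolled Newton-Raphson fit with made-up counts illustrates this:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a log-link Poisson GLM by Newton-Raphson (no offset, for brevity)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        XtWX = X.T @ (X * mu[:, None])          # Fisher information
        beta = beta + np.linalg.solve(XtWX, X.T @ (y - mu))
    return beta

y_H = np.array([3, 5, 2, 4, 6, 3, 4, 5])        # human counts (illustrative)
y_A = np.array([3, 4, 2, 3, 5, 3, 4, 4])        # paired ARU counts (illustrative)

y = np.concatenate([y_H, y_A]).astype(float)
I_A = np.concatenate([np.zeros(y_H.size), np.ones(y_A.size)])
X = np.column_stack([np.ones(y.size), I_A])      # intercept + survey type indicator

beta = poisson_irls(X, y)
delta_sq = np.exp(beta[1])                       # equals ratio of mean counts
ratio = y_A.mean() / y_H.mean()
```

This equivalence is why the regression coefficient and the empirical count ratio should agree closely, with the GLMM version additionally supplying design-aware confidence intervals.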
We validated the predictive performance of our models and examined bias in density estimates (relative to those estimated by the human observers in the field) by using repeated random subsampling of the data. In each repeated subsample, data were partitioned by randomly selecting 70% of the study sites (n = 74) for developing GLMMs from which we estimated δ, and 30% of the study sites (n = 31) were withheld as independent validation samples. We repeated this random sampling 50 times. In each repeated sample, we estimated δ using the aforementioned GLMM structure and calculated the 95% CI across the 50 replicated analyses. We also calculated empirical estimates of δ by dividing the mean ARU count by the mean human observer count in the withheld validation data for each of the 50 repeated subsampling events. We then assessed whether the 95% CIs for the GLMM-based estimates of δ overlapped with 95% CIs from the empirical estimates of δ and examined the (Pearson’s) correlation between both estimates of δ. In addition, we also examined whether the inclusion of δ in statistical offsets reduced bias in predicted densities from ARU surveys within each random subsampling. We began by estimating density for human observations by fitting a GLMM to the subset of each validation subsample in which we included a random intercept for study site and a statistical offset, i.e., log(EDRH² ∙ π ∙ p) to generate mean study site level density estimates. We then fit two competing models to the ARU data from the same sites with the same random effects structure as used for the human observer data, but fit one (our “null” model) in which we included the statistical offset used for the human observer data, and a competing model in which we used the δ estimate from the model calibration data within the same iteration to estimate the offset as log([δ *EDRH]² ∙ π ∙ p). 
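The site-level partitioning can be sketched as follows; splitting whole study sites (rather than individual stations) keeps paired human/ARU data together, and the numbers match the 105-site, 70/30 design described above:

```python
import numpy as np

rng = np.random.default_rng(1)
sites = np.arange(105)                  # study-site identifiers

def split_sites(rng, sites, train_frac=0.7):
    """Partition whole study sites into calibration (train) and
    validation (test) sets; ceil keeps 70% of 105 at 74 sites."""
    shuffled = rng.permutation(sites)
    n_train = int(np.ceil(train_frac * len(sites)))
    return shuffled[:n_train], shuffled[n_train:]

# 50 repeated random subsamples, as in the validation procedure
splits = [split_sites(rng, sites) for _ in range(50)]
train, test = splits[0]
```

Each calibration partition would be used to estimate δ, and the corresponding validation partition to compare GLMM-based and empirical estimates and to assess bias in ARU-derived densities.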
Based on these models, we calculated bias as the difference between the mean density predicted from the models fit to the ARU data minus the predicted mean density estimated from the human observer data. We were interested in testing whether δ values estimated by different approaches were statistically different. Because δ represents a relative difference between two methods and we used the same EDRH estimates, incorporating the uncertainty around EDRH would not have changed our results. It should be noted however that propagating the error through modeling (e.g. as described in Sólymos et al. 2013) might be required when estimating bird densities.
Finally, we used our full data set to assess whether δ varied between habitat types (deciduous/mixed wood habitat types versus all other categories) and environmental noise conditions. We constructed six a priori GLMMs, all of which included random intercepts for station and visit as above and offsets calculated from the human observer data. The candidate set comprised a null (intercept-only) model; a model with a fixed effect for survey type (two-level factor: ARU vs. human); models with survey type plus habitat type (two-level factor: deciduous/mixed vs. other) or survey type plus environmental noise as main effects; and two models that added the survey type × habitat type or survey type × noise interaction. Although noise was an ordinal variable, previous analyses suggested a reasonably linear response of counts to our noise variable, and we therefore treated noise as a linear covariate. We did not consider models including all three main effects and interactions. We selected among competing models based on Akaike’s information criterion (AIC; Burnham and Anderson 2002), and we only considered models with a ΔAIC of < 2 as potentially competitive. If habitat or environmental noise differentially impacted the detection radius of an ARU relative to a human observer, models including the interaction terms should receive the greatest support.
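The ΔAIC screening can be sketched as follows; the maximized log-likelihoods and parameter counts below are hypothetical placeholders, not values from our analysis:

```python
def aic(log_lik, k):
    """Akaike's information criterion: 2k - 2 * maximized log-likelihood."""
    return 2 * k - 2 * log_lik

# Hypothetical fits for the six candidate models described above
models = {
    "null":              (-512.4, 1),
    "survey":            (-511.9, 2),
    "survey + habitat":  (-511.8, 3),
    "survey + noise":    (-511.7, 3),
    "survey x habitat":  (-511.6, 4),
    "survey x noise":    (-511.5, 4),
}

aics = {m: aic(ll, k) for m, (ll, k) in models.items()}
best = min(aics.values())
delta_aic = {m: a - best for m, a in aics.items()}
competitive = [m for m, d in delta_aic.items() if d < 2]
```

With these placeholder likelihoods, only the null and survey-type models fall within ΔAIC < 2, mirroring the kind of outcome in which interaction models receive little support.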
Forty-one species met our minimum sample size criteria (Table A1.1). Across species, effective detection radii (EDRH) ranged from ~34 to 167 m (median = 68 m; Table A1.1). Of the 41 species for which we estimated availability based on count-removal models, models did not successfully converge for one species (Northern Flicker, Colaptes auratus), and five species did not meet the assumption of equal availability based on parameter estimates for the ARU factor in removal models. Estimates of availability (p) were strongly correlated (Pearson’s r = 0.81, p < 0.001) between count removal models fit to human observer versus ARU count data (Fig. 2; Table A1.1). Species not meeting the assumption of equal availability included Ruffed Grouse, Bonasa umbellus (survey effect for ARU; β = 0.84, SE = 0.37), Red-eyed Vireo, Vireo olivaceus (β = 0.30, SE = 0.09), Tennessee Warbler, Oreothlypis peregrina (β = 0.25, SE = 0.10), Chestnut-sided Warbler, Setophaga pensylvanica (β = 0.43, SE = 0.20), and Chipping Sparrow, Spizella passerina (β = 0.66, SE = 0.24).
Models examining variation in paired counts between human observers and ARUs suggested that the majority of species were slightly less detectable on ARU recordings than in the field (Fig. 3; Table A1.2). Across species, the median estimate of δ was 0.95 (minimum = 0.78, maximum = 1.11); however, 95% CIs overlapped one for 18 out of 35 species (Fig. 3; Table A1.2). Comparison of 95% CIs around estimates of δ against those for empirical ratios of ARU to human observer counts showed overlap for all 35 species (Table A1.2). Across species, estimates of δ from our calibration models were positively correlated with empirical ratios derived from the withheld validation samples (Fig. 3; Pearson’s r = 0.84, p < 0.001).
Applying δ estimates within the statistical offsets resulted in reduced bias for 33 out of 35 species compared to modeling the data using offsets taken solely from human observer data (Fig. 4). Failing to incorporate δ estimates within the statistical offsets resulted in 24 species with negative biases in their density estimates (Fig. 4). Of the 24 species with negatively biased density estimates derived using uncorrected offsets (taken from human observer data), five species (Ovenbird [Seiurus aurocapilla], Dark-eyed Junco [Junco hyemalis], Ruby-crowned Kinglet [Regulus calendula], Clay-colored Sparrow [Spizella pallida], and Connecticut Warbler [Oporornis agilis]) had 95% CIs that did not overlap zero (Fig. 4), whereas density estimates for these same species were unbiased when δ was incorporated in the offsets (Fig. 4). For Ovenbird, failing to incorporate δ within the offset resulted in density being underestimated by 0.10 birds ha⁻¹ on average. Similarly, densities of Dark-eyed Junco, Ruby-crowned Kinglet, Clay-colored Sparrow, and Connecticut Warbler were underestimated by 0.07, 0.04, 0.02, and 0.01 birds ha⁻¹, respectively. Conversely, estimates of Philadelphia Vireo (Vireo philadelphicus) and Cedar Waxwing (Bombycilla cedrorum) densities were less biased on average when the statistical offsets did not incorporate δ; however, 95% CIs overlapped zero for both offset approaches (Fig. 4), suggesting both methods produced unbiased estimates. Although the remainder of the species had 95% CIs that overlapped zero for both of the statistical offset approaches, incorporating δ also (on average) reduced overestimation of densities (Fig. 4). For example, using uncorrected versus δ-corrected statistical offsets resulted in greater overestimation of density on average for Cape May Warbler (Setophaga tigrina; 0.02 vs. 0.01 birds ha⁻¹), Palm Warbler (Setophaga palmarum; 0.03 vs. 0.00 birds ha⁻¹), American Redstart (Setophaga ruticilla; 0.03 vs. 0.02 birds ha⁻¹), Brown Creeper (Certhia americana; 0.03 vs. 0.01 birds ha⁻¹), Orange-crowned Warbler (Oreothlypis celata; 0.03 vs. 0.01 birds ha⁻¹), and Nashville Warbler (Oreothlypis ruficapilla; 0.04 vs. 0.01 birds ha⁻¹).
Based upon AIC model selection, the null model was the most parsimonious for 18 species, the model only including the survey type factor was the most parsimonious for two species, the model including factors for both survey and habitat type was the most parsimonious for 10 species, and the model including survey type and noise was the most parsimonious model for five species (Table 2). There was substantial model uncertainty for virtually all species; however, neither of the interaction models received substantial support (Table 2). Across species, the minimum ΔAIC (3.43) for the survey type by habitat type interaction was observed for Dark-eyed Junco, and parameter estimates from the interaction in that model show little evidence for an effect (β = -0.38, SE = 0.51). Similarly, the most substantial support for the survey type by noise interaction was observed for American Robin (Turdus migratorius; ΔAIC = 3.21), which similarly showed little evidence for an effect (β = 0.17, SE = 0.32).
Our results provide further evidence supporting the conclusions of previous researchers (Haselmayer and Quinn 2000, Hobson et al. 2002, Celis-Murillo et al. 2009, Blumstein et al. 2011) that raw counts derived from acoustic recordings and from human observers are relatively comparable. In paired comparisons between ARUs and human observers, we found that the null models were favored over models incorporating a survey type effect for 18 out of 35 species. This was further supported by parameter estimates for the survey effect (δ) that overlapped one for the majority of the species that we modeled. Together, these lines of evidence suggest that only occasional, minor biases exist in count data from ARUs relative to human observations.
Despite the relative similarity of many of the raw counts, systematic biases were apparent, and for five species the 95% CIs for estimated bias in densities did not overlap zero if the statistical offsets did not incorporate δ; thus analytically dealing with these biases will be important for data integration. This may be especially important since the biases may differ between acoustic recorder types and brands (Rempel et al. 2013, Yip et al. 2017) and/or may change with equipment wear (Turgeon et al. 2017). We demonstrated that correcting ARU data for differential detectability can effectively remove the majority of these biases. Because our experimental design and statistical analysis resulted in similar species availability, we suggest that the key source of bias in the counts derives from differences in detection radius between human observers and ARUs.
Our results suggest that the relatively simple approach of pairing human observers with ARUs allows data to be successfully corrected for systematic biases between counts. Repeated random subsampling of our data suggested that δ estimates were relatively robust to sampling variation and were correlated with empirical estimates calculated from data withheld from independent study sites. In addition, applying offsets incorporating δ reduced bias for almost every species examined. Furthermore, we found little support for interactions of survey type with habitat type or with environmental noise in their effects on δ. The lack of support for the interaction models suggests that biases between ARUs and human observers are relatively consistent (but see below). Therefore, paired sampling can generally be used to derive corrections that are readily obtained from common Poisson regression models (GLM or GLMM), available in most modern statistical software, simply by including survey type as a factor.
In addition to our approach facilitating integration of ARU data with human point counts, it has the added benefit that avian density estimates can be derived from single ARUs as long as the human observers include distance estimation in their survey protocol. Alternative methods exist to derive densities from ARUs, typically employing acoustic localization from arrays of synchronized ARUs (Dawson and Efford 2009, Campbell and Francis 2012, Mennill et al. 2012). Acoustic arrays require a greater financial investment in ARUs because multiple ARUs are required for an array and each is more expensive owing to additional hardware (GPS). For example, as of the time of writing, Wildlife Acoustics Inc. (http://www.wildlifeacoustics.com/store#song-meter-sm3) charges US$1049 for a single SM3 ARU plus an additional US$299 for the accompanying GPS module. Although acoustic localization is rapidly evolving, it can be logistically and computationally difficult. Thus, our approach provides a logistically feasible and affordable alternative to more complicated designs. Future paired comparisons between ARU and/or human point counts placed within acoustic arrays or traditional spot mapping grids (Bart and Earnst 2002) would further improve certainty in point count density estimation and would naturally fit with the analytical approach we describe here.
An alternative method to correcting biases between ARUs and human point counts would be through playback experiments. Yip et al. (2017) conducted playback experiments in which species calls were played along transects at various distances away from human observers and ARUs. An experimental approach has the advantage that calls come from known distances; however, it also requires the experimenter to make assumptions about the amplitude at which birds sing or call, because how loud wild birds sing is variable and generally unknown (Brackenbury 1979). In addition, the effects of directionality on song amplitude in wild birds are not well described but are known to affect detection probability in experimental playbacks (Alldredge et al. 2007a,c). Experimentally replicating the impact of bird orientation relative to the point count location therefore complicates the approach. In contrast, with respect to density estimation, our approach assumes that observers accurately estimated distance, which can be inaccurate (Alldredge et al. 2007c). We suggest that both paired comparisons and experimental playbacks could be used in a complementary fashion to estimate correction factors between human point counts and ARUs. Further, we suggest that paired sampling is a pragmatic approach to obtain statistical offsets for the majority of species and does not require assumptions about amplitude and directionality of songs. Where sample sizes become limiting because of species rarity, the experimental approach of Yip et al. (2017) would allow estimates to be obtained for species for which δ cannot be estimated because of a lack of detections.
Similar to our results, Yip et al. (2017) generally found estimates of δ that were less than 1. Unlike our results, however, Yip et al. (2017) found evidence for habitat-related variation in δ. Presumably, detection for both human observers and ARUs was similarly affected by habitat and environmental noise in our experiment, and thus our density estimates may be biased low; however, systematic differences between survey types were apparently corrected. Given the apparent difference between our results and those of Yip et al. (2017), future analyses under a broader set of habitat conditions and with a broader range of species may provide evidence for the need for stratification to improve the corrections we have employed here. For example, we did not fit observer- or habitat-specific distance models, which would presumably improve precision because there can be substantial interobserver variation in distance estimation (Nadeau and Conway 2012). Despite our not having estimated observer- or habitat-specific offsets, our validation still suggests a substantial reduction in bias despite random sampling among habitats and observers. Greater effort should be put into replicating our design with more combinations of species, habitats, and environmental conditions, and into estimating how much annual effort should be placed on paired sampling, because the added time for transcription is an added cost of our method.
Our results and external validation provide evidence that data from human observers and ARUs can be placed on a similar footing. We therefore recommend that monitoring and research programs begin further integrating ARUs with human-observed point counts to take advantage of the relative merits of both methods. Not only would this increase sample sizes, it would also allow researchers to better understand the factors influencing detection probability, owing to the ease of obtaining repeated samples with programmable ARUs. Although further sampling could refine our estimates, we have shown that our approach reduces bias related to survey type. It therefore provides an easily implemented means of integrating ARU data with human-observer point counts, allowing expanded monitoring efforts and facilitating meta-analyses with historical point count data to examine factors influencing avian populations (Cumming et al. 2010, Sólymos et al. 2013).
This work was supported by operating grants to SVW from the Wildlife and Habitat Assessment Section (PR) of the Canadian Wildlife Service - Environment and Climate Change Canada. We thank E. Cumming, B. Obermayer, and C. Chutter for field assistance on this project. This publication is a contribution of the Boreal Avian Modelling (BAM) Project, an international research collaboration on the ecology, management, and conservation of boreal birds. We acknowledge BAM’s members, avian and biophysical Data Partners, and funding agencies (including Environment and Climate Change Canada and the U.S. Fish & Wildlife Service), listed in full at http://www.borealbirds.ca/index.php/acknowledgements. We would like to thank K. Hobson, B. Klingbeil, and two anonymous reviewers for comments that helped us to improve the manuscript.
Alldredge, M. W., T. R. Simons, K. H. Pollock, and K. Pacifici. 2007a. A field evaluation of the time-of-detection method to estimate population size and density for aural avian point counts. Avian Conservation and Ecology 2(2):13. https://doi.org/10.5751/ACE-00205-020213
Alldredge, M. W., K. H. Pollock, T. R. Simons, J. A. Collazo, and S. A. Shriner. 2007b. Time-of-detection method for estimating abundance from point-count surveys. Auk 124:653-664. http://dx.doi.org/10.1642/0004-8038(2007)124[653:TMFEAF]2.0.CO;2
Alldredge, M. W., T. R. Simons, and K. H. Pollock. 2007c. A field evaluation of distance measurement error in auditory avian point count surveys. Journal of Wildlife Management 71(8):2759-2766. http://dx.doi.org/10.2193/2006-161
Bart, J., and S. Earnst. 2002. Double sampling to estimate density and population trends in birds. Auk 119:36-45. http://dx.doi.org/10.1642/0004-8038(2002)119[0036:DSTEDA]2.0.CO;2
Blumstein, D. T., D. J. Mennill, P. Clemins, L. Girod, K. Yao, G. Patricelli, J. L. Deppe, A. H. Krakauer, C. Clark, K. A. Cortopassi, S. F. Hanser, B. McCowan, A. M. Ali, and A. N. Kirschel. 2011. Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus. Journal of Applied Ecology 48:758-767. http://dx.doi.org/10.1111/j.1365-2664.2011.01993.x
Brackenbury, J. H. 1979. Power capabilities of the avian sound-producing system. Journal of Experimental Biology 78(1):163-166.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas. 2001. Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
Burnham, K. P., and D. R. Anderson. 2002. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York, New York, USA. http://dx.doi.org/10.1007/978-1-4757-2917-7
Campbell, M., and C. M. Francis. 2012. Using microphone arrays to examine effects of observers on birds during point count surveys. Journal of Field Ornithology 83:391-402. http://dx.doi.org/10.1111/j.1557-9263.2012.00389.x
Celis-Murillo, A., J. L. Deppe, and M. F. Allen. 2009. Using soundscape recordings to estimate bird species abundance, richness, and composition. Journal of Field Ornithology 80:64-78. http://dx.doi.org/10.1111/j.1557-9263.2009.00206.x
Cumming, S. G., K. L. Lefevre, E. Bayne, T. Fontaine, F. K. A. Schmiegelow, and S. J. Song. 2010. Toward conservation of Canada’s boreal forest avifauna: design and application of ecological models at continental extents. Avian Conservation and Ecology 5(2):8. http://dx.doi.org/10.5751/ACE-00406-050208
Dawson, D. K., and M. G. Efford. 2009. Bird population density estimated from acoustic signals. Journal of Applied Ecology 46:1201-1209. http://dx.doi.org/10.1111/j.1365-2664.2009.01731.x
Downes, C. M., M.-A. R. Hudson, A. C. Smith, and C. M. Francis. 2016. The Breeding Bird Survey at 50: scientists and birders working together for bird conservation. Avian Conservation and Ecology 11(1):8. http://dx.doi.org/10.5751/ACE-00855-110108
Farnsworth, G. L., K. H. Pollock, J. D. Nichols, T. R. Simons, J. E. Hines, and J. R. Sauer. 2002. A removal model for estimating detection probabilities from point-count surveys. Auk 119:414-425. http://dx.doi.org/10.1642/0004-8038(2002)119[0414:ARMFED]2.0.CO;2
Francis, C. M., P. J. Blancher, and R. D. Phoenix. 2009. Bird monitoring programs in Ontario: What have we got and what do we need? Forestry Chronicle 85:202-217. http://dx.doi.org/10.5558/tfc85202-2
Goyette, J. L., R. W. Howe, A. T. Wolf, and W. D. Robinson. 2011. Detecting tropical nocturnal birds using automated audio recordings. Journal of Field Ornithology 82:279-287. http://dx.doi.org/10.1111/j.1557-9263.2011.00331.x
Haselmayer, J., and J. S. Quinn. 2000. A comparison of point counts and sound recording as bird survey methods in Amazonian southeast Peru. Condor 102:887-893. http://dx.doi.org/10.1650/0010-5422(2000)102[0887:ACOPCA]2.0.CO;2
Hobson, K. A., R. S. Rempel, H. Greenwood, B. Turnbull, and S. L. Van Wilgenburg. 2002. Acoustic surveys of birds using electronic recordings: new potential from an omnidirectional microphone system. Wildlife Society Bulletin 30:709-720.
Hutto, R. L., and R. J. Stutzman. 2009. Humans versus autonomous recording units: a comparison of point-count results. Journal of Field Ornithology 80:387-398. http://dx.doi.org/10.1111/j.1557-9263.2009.00245.x
Klingbeil, B. T., and M. R. Willig. 2015. Bird biodiversity assessments in temperate forest: the value of point count versus acoustic monitoring protocols. PeerJ 3:e973. https://doi.org/10.7717/peerj.973
Latifovic, R., I. Olthof, D. Pouliot, and J. Beaubien. 2008. Land cover map of Canada 2005 at 250m spatial resolution. Natural Resources Canada/ESS/Canada Centre for Remote Sensing, Ottawa, Ontario, Canada. [online] URL: ftp://ftp.ccrs.nrcan.gc.ca/ad/NLCCLandCover/LandcoverCanada2005_250m/
Link, W. A., J. R. Sauer, and D. K. Niven. 2006. A hierarchical model for regional analysis of population change using Christmas Bird Count data, with application to the American Black Duck. Condor 108:13-24. http://dx.doi.org/10.1650/0010-5422(2006)108[0013:AHMFRA]2.0.CO;2
Machtans, C. S., K. J. Kardynal, and P. A. Smith. 2014. How well do regional or national Breeding Bird Survey data predict songbird population trends at an intact boreal site? Avian Conservation and Ecology 9(1):5. http://dx.doi.org/10.5751/ACE-00649-090105
Matsuoka, S. M., E. M. Bayne, P. Sólymos, P. C. Fontaine, S. G. Cumming, F. K. A. Schmiegelow, and S. J. Song. 2012. Using binomial distance-sampling models to estimate the effective detection radius of point-count surveys across boreal Canada. Auk 129:268-282. http://dx.doi.org/10.1525/auk.2012.11190
Matsuoka, S. M., C. L. Mahon, C. M. Handel, P. Sólymos, E. M. Bayne, P. C. Fontaine, and C. J. Ralph. 2014. Reviving common standards in point count surveys for broad inference across studies. Condor 116:599-608. http://dx.doi.org/10.1650/CONDOR-14-108.1
Mennill, D. J., M. Battiston, D. R. Wilson, J. R. Foote, and S. M. Doucet. 2012. Field test of an affordable, portable, wireless microphone array for spatial monitoring of animal ecology and behaviour. Methods in Ecology and Evolution 3:704-712. http://dx.doi.org/10.1111/j.2041-210X.2012.00209.x
Nadeau, C. P., and C. J. Conway. 2012. Field evaluation of distance-estimation error during wetland-dependent bird surveys. Wildlife Research 39:311-320. http://dx.doi.org/10.1071/WR11161
Nichols, J. D., L. Thomas, and P. B. Conn. 2009. Inferences about landbird abundance from count data: recent advances and future directions. Pages 201-236 in D. L. Thomson, E. G. Cooch, and M. J. Conroy, editors. Modeling demographic processes in marked populations. Environmental and Ecological Statistics series. Springer, New York, New York, USA. http://dx.doi.org/10.1007/978-0-387-78151-8_9
Rempel, R. S., C. M. Francis, J. N. Robinson, and M. Campbell. 2013. Comparison of audio recording system performance for detecting and monitoring songbirds. Journal of Field Ornithology 84:86-97. http://dx.doi.org/10.1111/jofo.12008
Sauer, J. R., J. E. Fallon, and R. Johnson. 2003. Use of North American Breeding Bird Survey data to estimate population change for bird conservation regions. Journal of Wildlife Management 67:372-389. http://dx.doi.org/10.2307/3802778
Sauer, J. R., J. E. Hines, J. E. Fallon, K. L. Pardieck, D. J. Ziolkowski Jr., and W. A. Link. 2014. The North American Breeding Bird Survey, results and analysis 1966-2013. Version 01.30.2015. USGS Patuxent Wildlife Research Center, Laurel, Maryland, USA.
Sidie-Slettedahl, A. M., K. C. Jensen, R. R. Johnson, T. W. Arnold, J. E. Austin, and J. D. Stafford. 2015. Evaluation of autonomous recording units for detecting 3 species of secretive marsh birds. Wildlife Society Bulletin 39:626-634. http://dx.doi.org/10.1002/wsb.569
Sólymos, P., S. M. Matsuoka, E. M. Bayne, S. R. Lele, P. Fontaine, S. G. Cumming, D. Stralberg, F. K. A. Schmiegelow, and S. J. Song. 2013. Calibrating indices of avian density from non-standardized survey data: making the most of a messy situation. Methods in Ecology and Evolution 4:1047-1058. http://dx.doi.org/10.1111/2041-210X.12106
Sólymos, P., M. Moreno, and S. R. Lele. 2016. detect: Analyzing wildlife data with detection error. R package version 0.4-0. The R Project for Statistical Computing, Vienna, Austria. [online] URL: https://CRAN.R-project.org/package=detect
Turgeon, P. J., S. L. Van Wilgenburg, and K. L. Drake. 2017. Microphone variability and degradation: implications for monitoring programs employing autonomous recording units. Avian Conservation and Ecology 12(1):9. http://dx.doi.org/10.5751/ace-00958-120109
Venier, L. A., S. B. Holmes, G. W. Holborn, K. A. Mcilwrick, and G. Brown. 2012. Evaluation of an automated recording device for monitoring forest birds. Wildlife Society Bulletin 36:30-39. http://dx.doi.org/10.1002/wsb.88
Yip, D. A., L. Leston, E. M. Bayne, P. Sólymos, and A. Grover. 2017. Experimentally derived detection distances from audio recordings and human observers enable integrated analysis of point count data. Avian Conservation and Ecology 12(1):11. http://dx.doi.org/10.5751/ace-00997-120111
Zwart, M. C., A. Baker, P. J. K. McGowan, and M. J. Whittingham. 2014. The use of automated bioacoustic recorders to replace human wildlife surveys: an example using nightjars. PLoS ONE 9(7):e102770. http://dx.doi.org/10.1371/journal.pone.0102770