Integrating wetland bird point count data from humans and acoustic recorders

Wetland loss is cause for concern for populations of many wetland bird species throughout North America. The North American Breeding Bird Survey, the primary resource for broad-scale avian population data, does not provide sufficient data for many marsh bird species. Targeted marsh bird monitoring programs have been implemented across the continent in an attempt to fill this gap. Despite these efforts, a number of wetland species are so elusive that they remain an analytical challenge because of small sample sizes and low detectability. Thus, there is need for tools and approaches that will increase sampling efficiency and boost geographic representation. Autonomous recording units (ARUs) have the potential to address some of these challenges, but require the ability to combine in-person survey data with ARU data for collective analysis. Our primary objective was to estimate statistical offsets, or correction factors, to account for systematic differences between in-person and ARU counts of wetland-associated bird species. We found that ARU recordings were generally equivalent to in-person point counts, with bias in a small number of species (2 of 19 for Song Meter SM2 and 1 of 16 for Song Meter SM4 Acoustic Recorders; Wildlife Acoustics Inc. ©, Maynard, MA). However, bias was removed in all of the species through use of our correction factors. Therefore, our correction factors were effective for integrating in-person and ARU point count data even for species where differences exist. We also found that commercially available SM4 recorders have larger effective detection radii than SM2 recorders. Researchers should consider the microphone sensitivity and signal-to-noise ratios of any recording unit before purchasing, and more sensitive models with lower noise should be used where possible. 
Our results, and particularly our correction factors, are useful for biologists combining in-person and ARU point count data to achieve larger sample sizes, higher statistical power, and ultimately better information for more effective wetland conservation.


INTRODUCTION
Wetlands have declined in North America, in some areas dramatically, with emergent marsh suffering particularly high losses (Tiner 1984, Snell 1987, Dahl 2006). The primary causes of wetland loss are agriculture and development, which often also remove the natural upland habitats adjacent to marshes (Tiner 1984). Because of this habitat loss, there is concern for populations of marsh birds such as bitterns, rails, and grebes that depend on emergent wetlands for breeding and stopover during migration (Saunders et al. 2019). Although population trends for many of these species are known in some regions (e.g., Tozer 2016), monitoring of certain species remains limited throughout large parts of their range. The primary challenge is that many marsh birds have low detection rates, requiring several repeat visits to a large number of survey sites to obtain sufficient data for statistical modeling (Tozer et al. 2006, 2016, Steidl et al. 2013). Marsh birds are also elusive and crepuscular or nocturnal, typically necessitating night-time visits and/or call-broadcasts to entice individuals to call and be detected by observers (Tozer et al. 2006, Conway 2011). To further complicate monitoring efforts, many wetland complexes can be difficult or dangerous to access even in daylight, e.g., traversing floating mats of marsh vegetation on foot. Because of concerns about marsh bird populations and difficulties in monitoring these species, there is a need for tools and approaches that increase spatial and temporal efficiency and allow larger sample sizes. Autonomous recording units (ARUs) have the potential to address some of these challenges by increasing detection of marsh birds and expanding the spatial and temporal capacity of monitoring programs (Shonfield and Bayne 2017). ARUs do have drawbacks, including high start-up cost, ongoing maintenance costs, potential equipment failure, and demands of data storage and management.
However, many practitioners find the benefits of ARUs outweigh their disadvantages, and the use of ARUs is increasingly prevalent worldwide in the field of biodiversity monitoring (Haselmayer and Quinn 2000, Hobson et al. 2002, Steer 2010). ARUs have been shown to be effective tools for detecting marsh birds and for supplementing traditional monitoring programs (Klingbeil and Willig 2015, Znidersic et al. 2020). ARUs can be deployed using the same spatial sampling scheme as in-person point counts, but eliminate the need for a human to be present at the time of observations. They can be programmed to make recordings of any specified duration at any time of the day or night, making many "revisits" possible at every sampling location. Furthermore, they can be deployed during winter when wetlands are frozen and easily accessible by anyone who is capable of navigating to a waypoint (Shonfield and Bayne 2017). Thus, ARUs overcome many of the wetland-specific challenges of traditional in-person surveys.
Research based on paired surveys shows that ARU surveys are generally equivalent to in-person point counts for species that are primarily detected using aural cues, although ARUs sometimes detect slightly fewer individuals and species on average (Alquezar and Machado 2015, Sedláček et al. 2015, Vold et al. 2017). Recent advances demonstrate that density estimates can be obtained from recordings by following the same standardization protocol that is used to integrate in-person point counts of varying lengths (Sólymos et al. 2013, Bombaci and Pejchar 2019). This protocol accounts for imperfect detection for both in-person and ARU surveys, and applies a species-specific correction factor to data from ARU surveys. The correction factor describes the difference between the effective detection radii of human observers and ARU recordings. Given the potential of ARUs to enhance marsh bird monitoring efforts, it is important to quantify the correction factors so that bird survey data from either source can be combined for analysis. Here we apply the framework for paired point counts created by Van Wilgenburg et al. (2017) to a suite of birds inhabiting marshes and their adjacent uplands. In doing so, our primary objective is to estimate statistical offsets, or correction factors, to account for systematic differences between in-person and recording-based counts of wetland-associated bird species. As a secondary objective, we also test for differences in correction factors between two commercial recorders in widespread use, the Song Meter SM2 and Song Meter SM4 Acoustic Recorders (Wildlife Acoustics Inc. ©, Maynard, MA). As bioacoustic technology advances, the internal noise of recording units is decreasing and the microphone sensitivity is increasing (Wildlife Acoustics 2011, 2019). Therefore, we anticipated that correction factors would vary among models of ARU as well as among bird species that vocalize at various frequencies.
We predicted that mean counts from ARU surveys would be slightly less than mean counts from in-person surveys, and that SM2 counts would be slightly less than SM4 counts because of differences in their microphone sensitivities and respective effective detection radii. Results from this work and the associated correction factors will be useful for researchers who wish to combine wetland bird monitoring data from in-person and ARU sources in a single analysis.

METHODS

Sampling
We conducted this study in the Canadian provinces of Saskatchewan, Ontario, Prince Edward Island, New Brunswick, and Nova Scotia in 2018 and 2019. All surveys were completed during weather favorable for detecting birds (no precipitation, wind < 20 km/h) at wetlands dominated by emergent vegetation, i.e., marshes, during the marsh bird breeding season from 23 May to 29 June. In total, 602 surveys were conducted by 13 observers at 480 survey sites across 94 wetlands or wetland complexes. Of these 480 sites, 301 were part of existing monitoring programs and 179 were selected opportunistically. Most sites were located on roads directly adjacent to wetlands (~90%), some were off-road at the edge of a wetland and its adjacent upland (~5%), and a small minority were in the middle of large marsh complexes (~5%). Upland vegetation included prairie, agriculture, parkland, and forest.
Each in-person survey was a passive point count, i.e., no call-broadcast was used, lasting five minutes. Marsh bird surveys typically include a call-broadcast sequence (Conway 2011) following the passive segment to evoke calls from birds that otherwise might remain undetected. However, the use of call-broadcasts would have influenced the movement and singing rate of individual birds and thus violated the assumptions of distance and time removal sampling, so we compared only results from passive surveys. In the field, we recorded observations of a group of focal species. The minute interval (i.e., minute 0, 1, 2, 3, or 4) and the distance band (0-50 m, 50-100 m, or > 100 m) were noted. Observers recorded the full length of each in-person survey in the field using an SM2 recorder with a pair of SMX-II microphones, and surveyors in Saskatchewan used an SM4 in addition to the SM2. Sensitivity of SM2 microphones was tested using an Extech Model 407744 Sound Level Calibrator and those that tested below -42 dB were not used in the study (Turgeon et al. 2017). SM4 microphones were tested using the same instrument, and all tested above -35 dB. ARUs were affixed to a tripod at a height of approximately 1.3 m during recordings of surveys. ARUs were programmed to record in stereo WAV file format, with a sampling rate of 44100 Hz and factory default settings for the microphone preamplifier. Observers stood 3 to 5 m away from the ARUs to minimize recording extraneous noise.
To minimize observer bias, we transcribed ARU recordings at the end of the field season, and each recording was interpreted by the same observer who had conducted the paired in-person point count. Observers did not consult field notes, and we randomized file names so that information about date and location was not known to the observer during transcription. During transcription, we noted individuals of focal species and the minute interval of first detection. Transcription was done using software that allowed observers to watch spectrograms as they listened to recordings (e.g., Audacity®; Raven Pro, © Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Ithaca, NY). Unlike under field conditions, transcribers were allowed to pause and rewind recordings to confirm identification, as well as consult with colleagues for difficult identifications (which occurred fewer than 10 times during our study), as is typically done in audio transcription. Although these differences in methodology made the comparison between humans and ARUs slightly less equal, they made our correction factors more accurate for realistic data collection. A noise code was assigned to each recording on the same five-point scale as in the field, and paired surveys that had recordings with excessive noise interference were not included in our analysis.

Framework
Sólymos et al. (2013) demonstrated that data from point counts of varying lengths and radii can be adjusted using time-of-detection (time removal) and distance sampling so that they can be combined into larger datasets without creating systematic biases. The expected count at a point count location is a function of a species' availability (p) and perceptibility (q). Availability is the probability that an individual gives a cue during the survey period, given it is present, and is a function of its singing rate (φ). Time removal models can be fitted to estimate singing rates, and therefore availability (Farnsworth et al. 2002, Sólymos et al. 2013). These models fit time to first detection ($t_{ij}$) to an exponential function that is described by φ, assuming a constant singing rate:

$$f(t_{ij}) = \phi \exp(-t_{ij}\,\phi)$$

Thus p can be calculated from the cumulative distribution function from time t = 0 to t = $t_{iJ}$:

$$p(t_{iJ}) = 1 - \exp(-t_{iJ}\,\phi)$$

where $t_{iJ}$ is the duration, in minutes, of the point count. Perceptibility (q) is the probability that an individual is detected, given it is available, and depends on its effective detection radius (τ). Distance sampling models can be fitted to estimate effective detection radii, and therefore perceptibility (Buckland et al. 2001, Matsuoka et al. 2012). These models fit distance of individual birds (r) to a half-normal distribution that is described by τ:

$$g(r) = \exp(-r^2/\tau^2)$$

We assumed that all available birds at r = 0 were detected, that r was measured without error, and that birds were detected at their initial location.
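The time removal and distance sampling relationships can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the exponential time-to-first-detection form and the half-normal distance function exp(-r²/τ²) used in the QPAD literature, and the function names and input values are hypothetical. The truncated-count perceptibility follows from integrating the half-normal function over a circle of radius r_max.

```python
import math


def availability(phi: float, duration_min: float) -> float:
    """Probability that a bird gives at least one cue during the count:
    p = 1 - exp(-t * phi), with phi in cues per minute (assumed constant)."""
    return 1.0 - math.exp(-duration_min * phi)


def effective_area_ha(tau_m: float) -> float:
    """Effective area sampled, A = pi * tau^2, converted from m^2 to hectares."""
    return math.pi * tau_m ** 2 / 10_000.0


def perceptibility(tau_m: float, r_max_m: float) -> float:
    """Average detection probability within a count truncated at r_max,
    assuming g(r) = exp(-r^2 / tau^2):
    q = tau^2 * (1 - exp(-r_max^2 / tau^2)) / r_max^2."""
    ratio = (r_max_m / tau_m) ** 2
    return (1.0 - math.exp(-ratio)) / ratio


# Hypothetical inputs: singing rate of 0.5 cues/min, 5-min count, tau = 100 m.
p = availability(phi=0.5, duration_min=5.0)
area = effective_area_ha(tau_m=100.0)
q = perceptibility(tau_m=100.0, r_max_m=100.0)
```

With these made-up inputs, availability is high because a 5-min count is long relative to the assumed singing rate, whereas a bird with a lower φ would have a correspondingly lower p.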
The paired point count method of Van Wilgenburg et al. (2017) assumes that true bird abundance (N) is equal for in-person and ARU surveys. It models differences in detection by making the simplifying assumption that in-person and ARU surveys have equal availability ($p_H = p_A$), but allows for differences in effective detection radii ($\tau_H \neq \tau_A$). The assumption that $p_H = p_A$ is then explicitly tested using time removal models. We did not model differences in p, although this is possible and may be necessary for some species. We also reasoned that there were at most only very small differences in false positive or negative misidentifications during in-person compared to ARU surveys, so we did not adjust for such differences (Rempel et al. 2019). For complete methodology, see Van Wilgenburg et al. (2017). Briefly, the expected count for a species for an in-person survey ($Y_H$) at any point count location was:

$$E[Y_H] = N\,p\,q$$

where N is the species' true abundance (number of birds), p is availability, and q is perceptibility. We note that N is not observed directly. This can be rewritten as:

$$E[Y_H] = D\,A_H\,p\,q$$

where D is point-level density (birds/area) and $A_H$ is the area sampled by a human observer. The exact area sampled is not known but the effective area sampled can be estimated using distance sampling. The in-person effective area sampled is described by:

$$A_H = \pi\,\tau_H^2$$

where $\tau_H$ is the in-person effective detection radius. We assumed perfect detectability (q = 1) within this distance. The expected count for an ARU survey ($Y_A$) takes the same form, with effective area $A_A = \pi\,\tau_A^2$:

$$E[Y_A] = D_A\,\pi\,\tau_A^2\,p_A$$

We assumed density and availability were equal between survey methods ($D_A = D_H$ and $p_A = p_H$), and we defined a new variable:

$$\delta^2 = \frac{\tau_A^2}{\tau_H^2} = \frac{E[Y_A]}{E[Y_H]}$$

where δ² is the squared correction factor that describes the difference between ARU and in-person detection radii. This can be estimated as above, by comparing the mean counts of ARU and in-person surveys, but it can also be estimated by back-transforming the fixed-effect "survey type" coefficient (β) of Poisson or negative binomial generalized linear models (GLMs) or generalized linear mixed models (GLMMs), i.e., δ² = exp(β).
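The two routes to the correction factor described here can be sketched in a few lines. This is an illustrative example only: the function names and input values are hypothetical, and it assumes a log-link count model in which the survey type coefficient β is coded as ARU relative to in-person.

```python
import math


def delta_from_beta(beta: float) -> float:
    """Back-transform a log-link 'survey type' coefficient:
    delta^2 = exp(beta), so delta = sqrt(exp(beta))."""
    return math.sqrt(math.exp(beta))


def delta_from_mean_counts(mean_aru: float, mean_human: float) -> float:
    """Empirical route: delta^2 is the ratio of mean ARU count to mean
    in-person count, assuming equal density and availability."""
    return math.sqrt(mean_aru / mean_human)


# Hypothetical values: ARUs record 81% of the in-person mean count.
delta_ratio = delta_from_mean_counts(mean_aru=0.81, mean_human=1.0)
delta_glm = delta_from_beta(math.log(0.81))
```

In this made-up case both routes give δ = 0.9, i.e., an ARU effective detection radius that is 90% of the in-person radius.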
Statistical models of count data can be fit using statistical offsets, or estimated correction factors ($\hat{C}_i$), that transform total counts to bird densities as individuals per unit area, taking imperfect detection into account. We defined the correction factor as:

$$\hat{C}_i = \pi\,\tau_H^2\,p$$

For Poisson GLMs, the offset is log-transformed (offset = $\log[\hat{C}_i]$). The mean count at any given in-person survey location ($\lambda_{i,H}$) can be expressed as:

$$\log(\lambda_{i,H}) = \log(D_i) + \log(\hat{C}_i)$$

The calculated δ² can then be incorporated into statistical offsets for ARU surveys and thus eliminate systematic biases in calculated bird densities. GLMMs can incorporate both human and ARU surveys when they are fit with these correction factors and include the δ² coefficient and $I_A$, an indicator function taking on a value of 0 for in-person surveys and 1 for ARU surveys:

$$\log(\lambda_i) = \log(D_i) + \log(\hat{C}_i) + I_A \log(\delta^2)$$
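To make the role of the offset concrete, the sketch below computes log offsets for in-person and ARU surveys and the expected count they imply for a given density. It is a hedged illustration rather than the authors' implementation: the helper names, the conversion to hectares, and the example values of τ_H, p, and δ are assumptions.

```python
import math


def log_offset(tau_h_m: float, p: float, delta: float = 1.0) -> float:
    """log(pi * (delta * tau_H)^2 * p), with area expressed in hectares.
    Use delta = 1 for in-person surveys and the estimated delta for ARUs."""
    area_ha = math.pi * (delta * tau_h_m) ** 2 / 10_000.0
    return math.log(area_ha * p)


def expected_count(density_per_ha: float, offset: float) -> float:
    """Expected count under a log-link model: lambda = D * exp(offset)."""
    return density_per_ha * math.exp(offset)


# Hypothetical species: tau_H = 100 m, availability 0.8, delta = 0.9.
human_offset = log_offset(100.0, 0.8)
aru_offset = log_offset(100.0, 0.8, delta=0.9)

# For the same density, the ARU is expected to count delta^2 as many birds.
count_ratio = expected_count(1.0, aru_offset) / expected_count(1.0, human_offset)
```

The difference between the two offsets is log(δ²), which is the same term that the indicator I_A switches on in the combined model.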

Data analysis
Data analysis followed Van Wilgenburg et al. (2017). Species with fewer than 14 detections on either in-person or ARU surveys were excluded from analysis. We began by estimating $\tau_H$ and singing rates for each species by fitting distance and removal models using the "detect" R package (Sólymos et al. 2018) as described by Sólymos et al. (2013). Time removal models were fitted separately for ARU surveys ($\phi_A$), in-person surveys ($\phi_H$), and the full dataset using survey type as a factor (φ). Analysis was not continued for species where the survey type factor in the full dataset model had a 95% confidence interval that did not overlap 0, i.e., where availability differed between survey types. We used the calculated $\tau_H$ and φ to estimate in-person correction factors ($\hat{C}_H$).
We estimated the correction factor (δ) and validated models by using repeated random subsampling of the data. For each species, we randomly selected data from 70% of survey stations where the species was present to develop GLMs to estimate δ. The remaining 30% of data were used to calculate δ again by dividing the mean count from ARU surveys by the mean count from in-person surveys. This was repeated 50 times and we calculated means and 95% confidence intervals for both measures of δ across all replicates. We examined the 95% confidence intervals and correlation of the GLM-based calculations of δ to the empirical ratios to validate the performance of the models.
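The repeated random subsampling scheme can be sketched as follows. This is a simplified illustration, not the analysis code used in the study: station counts are hypothetical, and the calibration estimate uses the ratio of total counts in place of a fitted GLM (for a Poisson GLM with only an intercept and a survey type factor, the back-transformed coefficient equals that ratio). Following the framework above, the ratio of mean counts is treated as δ², so δ is its square root.

```python
import math
import random


def delta_from_pairs(pairs):
    """delta = sqrt(mean ARU count / mean in-person count) for paired stations.
    Each pair is (in_person_count, aru_count); equal sample sizes let us use totals."""
    human_total = sum(h for h, _ in pairs)
    aru_total = sum(a for _, a in pairs)
    return math.sqrt(aru_total / human_total)


def repeated_subsampling(pairs, n_reps=50, calib_frac=0.7, seed=42):
    """Split stations 70/30 at random n_reps times and estimate delta on each part."""
    rng = random.Random(seed)
    n_calib = round(calib_frac * len(pairs))
    estimates = []
    for _ in range(n_reps):
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        calib, valid = shuffled[:n_calib], shuffled[n_calib:]
        estimates.append((delta_from_pairs(calib), delta_from_pairs(valid)))
    return estimates


# Hypothetical paired counts (in-person, ARU) at ten stations where the species occurred.
example_pairs = [(2, 2), (1, 1), (3, 2), (4, 4), (1, 1),
                 (2, 1), (5, 5), (1, 1), (2, 2), (3, 3)]
example_estimates = repeated_subsampling(example_pairs)
```

Comparing the calibration and validation columns of `example_estimates` across replicates mirrors the validation step described above, where the two measures are summarized by their means, confidence intervals, and correlation.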
We also used the 30% validation set to examine whether the inclusion of δ in statistical offsets reduced bias in predicted densities from in-person versus ARU surveys. We first fit a GLM with a statistical offset $\log(\pi\,\tau_H^2\,p)$, i.e., $\log(\hat{C}_H)$, to calculate average density using in-person data. We then fit two competing models using ARU data, one with the same statistical offset and the other including δ: $\log(\pi\,[\delta\,\tau_H]^2\,p)$. We calculated bias as the difference in estimated density between the in-person model and each competing ARU model. Unlike Van Wilgenburg et al. (2017), we used GLMs rather than GLMMs because including a random effect at the site level neither improved the models nor changed our results.
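The bias comparison can be illustrated with a small worked example. The sketch below is not the authors' code and rests on stated assumptions: it uses an intercept-only Poisson model, for which the maximum likelihood density estimate with a constant offset log(C) is simply the total count divided by n·C; the counts, τ_H, p, and δ are hypothetical; and the sign convention (ARU minus in-person, so that negative values mean the ARU underestimates density) is chosen for illustration.

```python
import math


def correction_factor(tau_h_m: float, p: float, delta: float = 1.0) -> float:
    """C = pi * (delta * tau_H)^2 * p, with area in hectares."""
    return math.pi * (delta * tau_h_m) ** 2 / 10_000.0 * p


def mean_density(counts, c_hat: float) -> float:
    """Density (birds/ha) from an intercept-only Poisson model with offset log(c_hat):
    the MLE is sum(counts) / (n * c_hat)."""
    return sum(counts) / (len(counts) * c_hat)


def aru_bias(human_counts, aru_counts, tau_h_m, p, delta):
    """Return (uncorrected bias, corrected bias) as ARU density minus in-person density."""
    d_human = mean_density(human_counts, correction_factor(tau_h_m, p))
    d_uncorrected = mean_density(aru_counts, correction_factor(tau_h_m, p))
    d_corrected = mean_density(aru_counts, correction_factor(tau_h_m, p, delta))
    return d_uncorrected - d_human, d_corrected - d_human


# Hypothetical case: the ARU hears birds out to half the in-person radius (delta = 0.5),
# so it counts one quarter as many birds at each station.
uncorrected, corrected = aru_bias([4, 8, 4], [1, 2, 1], tau_h_m=100.0, p=0.8, delta=0.5)
```

In this constructed case the uncorrected ARU density is too low, while including δ in the offset brings the ARU estimate back in line with the in-person estimate.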
We repeated the above analysis twice, once for paired in-person and SM2 surveys and once for paired in-person and SM4 surveys. Thus, estimates of δ were calculated separately for SM2 surveys and SM4 surveys.
We used the full dataset to create competing Poisson generalized linear mixed models (GLMMs) to predict species abundance using site as a random effect. We included a null model (intercept only), a model with a fixed effect for survey type (three levels: in-person, SM2, and SM4), a model with noise as a fixed effect, a model with both noise and survey type as fixed effects (survey type + noise), and a model that incorporated the interaction between noise and survey type (survey type * noise). Noise level (0-4) was treated as a linear covariate. We did not test for a year effect because we compared counts conducted at the exact same time and place; therefore, year effect is controlled for by the paired design. We used Akaike's Information Criterion (AIC) to select the most parsimonious of the five models (Akaike 1973, Beier et al. 2001). Next, we calculated δ by taking the median and the 2.5% and 97.5% quantiles from 10,000 Monte Carlo simulations drawn from a multivariate normal distribution, using the survey type GLMM coefficients as the means and their variance-covariance matrix as the covariance. Last, we plotted noise as a function of wind speed (km/h) for each of the three survey types separately, and fitted linear regressions to explore how noise varied with wind speed depending on the type of survey.
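The Monte Carlo step can be sketched for a single survey type coefficient. This is a simplified, hypothetical example: with one coefficient the multivariate normal reduces to a univariate normal with the coefficient's standard error, the values of β and its standard error are invented, and δ is back-transformed as sqrt(exp(β)) following the framework above.

```python
import math
import random
import statistics


def simulate_delta(beta: float, se: float, n_sims: int = 10_000, seed: int = 1):
    """Draw beta from Normal(beta, se), back-transform each draw to delta,
    and return the (2.5% quantile, median, 97.5% quantile)."""
    rng = random.Random(seed)
    draws = sorted(math.sqrt(math.exp(rng.gauss(beta, se))) for _ in range(n_sims))
    lower = draws[int(0.025 * n_sims)]
    upper = draws[min(n_sims - 1, int(0.975 * n_sims))]
    return lower, statistics.median(draws), upper


# Hypothetical SM4 coefficient: beta = 0.06 with standard error 0.05.
lower, median, upper = simulate_delta(beta=0.06, se=0.05)
overlaps_one = lower <= 1.0 <= upper
```

An interval that contains 1 would be read as no detectable difference between ARU and in-person effective detection radii for that species.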

RESULTS
Two focal species did not meet the criteria for minimum number of detections for SM2 paired surveys (Common Gallinule and Least Bittern) and four did not meet criteria for SM4 paired surveys (Common Gallinule, Least Bittern, Nelson's Sparrow, and Virginia Rail; Table A1.1). Effective detection radii for in-person counts ranged from 51 m to 181 m (Fig. 1). Availability across all species ranged from 0.171 to 1, and correlation in availability between in-person and ARU counts was weak but significant. The median estimate of δ across all species for SM2 surveys was 0.98 and the median estimate of δ for SM4 surveys was 1.03. Confidence intervals of δ for SM2 overlapped 1 for 13 of 19 species, and δ for SM4 overlapped 1 for 13 of 16 species (Fig. 3; Tables A1.5 and A1.6). SM4 δ estimates were greater than SM2 δ estimates for 14 of 16 species. Across species, estimates of δ using calibration data were highly correlated with estimates of δ using validation data (Fig. 4; Pearson's r = 0.99, p < 0.001 for both SM2 and SM4). Within each species, estimates of species density using in-person data for each of the 50 repeated random subsamples were well correlated with estimates of density using ARU data, with the exception of a few species: American Coot, Pied-billed Grebe, Red-winged Blackbird, and Yellow-headed Blackbird (Fig. A1.1). Biases in estimated densities are shown in Fig. 5. Of the 10 species with negative biases for SM2 surveys, two had 95% confidence intervals that did not overlap 0 (Virginia Rail and Nelson's Sparrow). For SM4 surveys, no species had a significant negative bias, and of eight species with positive biases, one had a 95% confidence interval that did not overlap 0 (Savannah Sparrow).

Fig. 5.
Bias in estimated densities from autonomous recording unit (ARU) surveys (top, SM2; bottom, SM4) compared to bias from the same surveys but including the correction factor δ in the statistical offset. Bias was calculated as the difference in predicted bird density (birds/ha) between in-person point counts and ARU point counts. In-person and uncorrected density was calculated using the QPAD approach (Sólymos et al. 2013) of applying species-specific offsets ($\log[\pi\,\tau_H^2\,p]$) to raw counts of species abundance. Corrected density was calculated using the same offset but including the correction factor δ: $\log(\pi\,[\delta\,\tau_H]^2\,p)$. Biases were calculated over 50 repeated random subsamples, using 70% of data to calculate δ estimates for use in corrected offsets and using the remaining 30% to calculate estimated in-person, corrected ARU, and uncorrected ARU densities. See main text for scientific species names.
Based on AIC model selection, the noise model was most parsimonious for 11 species, the null model most parsimonious for 4 species, the survey type + noise model most parsimonious for 4 species, the survey type model most parsimonious for 1 species, and the survey type * noise interaction model received no support (Table 1). There was no effect of wind on noise for in-person surveys (β = -0.035, SE = 0.021), but there was a significant positive effect for SM2 surveys (β = 0.165, SE = 0.028) and SM4 surveys (β = 0.108, SE = 0.035).

DISCUSSION
Our results confirm the growing consensus that ARU recordings are generally equivalent to in-person point counts, as shown by other studies (Alquezar and Machado 2015, Sedláček et al. 2015, Vold et al. 2017) and meta-analyses (Shonfield and Bayne 2017, Darras et al. 2018). Some studies show that in-person point counts detect more birds than ARU recordings (e.g., Borker et al. 2015, Sidie-Slettedahl et al. 2015), but most of these studies used older technology, e.g., SM1, with less sensitive microphones than newer models (Rempel et al. 2013), or they transcribed recordings using automated recognizers rather than manually. These studies may, therefore, reflect limitations of technology or methodology at the time of publication rather than the utility of ARUs in general. In contrast, studies comparing a typical sampling of an ARU deployment, i.e., manual transcription of multiple recordings per station, to a typical in-person point count (commonly one visit per station) generally found that ARUs perform better than in-person surveys, especially for nocturnal birds (Zwart et al. 2014, Klingbeil and Willig 2015, Bobay et al. 2018). This is not surprising, considering the greater number and broader diurnal distribution of sampling occasions possible from ARU recordings relative to single in-person counts. We were able to control for this difference with our paired human-ARU point counts, which had equal amounts of sampling time.
Our findings also show that the newer SM4s are more sensitive and have larger effective detection radii than SM2s, a finding that is consistent with previous research showing that newer recorder technology improves on older models (Rempel et al. 2013, Yip et al. 2017). Therefore, the make and model of recording unit needs to be accounted for by practitioners using ARUs in monitoring programs. Density estimates produced from ARU recordings were generally equivalent to those produced by in-person surveys (Fig. A1.1), and most estimates of δ had 95% confidence intervals that overlapped 1 (Fig. 3). In-person and uncorrected ARU density estimates from repeated random subsampling were highly correlated for most species, and for many species were nearly equivalent (Fig. A1.1). Models for most species did not include an ARU effect, and bias for most species overlapped one, further suggesting that in-person and ARU counts are equivalent for most species (Fig. 3; Table 1). However, when present, both minor and major biases were corrected by inclusion of δ, demonstrating that these correction factors will facilitate effective integration of in-person and ARU surveys.
Species with negative biases and δ values less than 1 were: for SM2, Pied-billed Grebe, Virginia Rail, American Bittern, Nelson's Sparrow, Yellow-headed Blackbird, and Red-winged Blackbird; and for SM4, Yellow-headed Blackbird and Red-winged Blackbird (Figs. 3 and 5). Both blackbird species were detected at high densities, and an accurate count without visual cues became impossible beyond a certain threshold (see Table A1.2 and Fig. A1.1). Specifically, recordings did not estimate densities higher than 1.0 birds/ha, and we never estimated more than 10 individual blackbirds of either species from a recording. By contrast, in-person surveys detected as many as 30 Yellow-headed Blackbirds and as many as 50 Red-winged Blackbirds. The limitations of recordings for estimating high densities are also shown by Drake et al. (2016), who found that transcribers underestimated the number of Yellow Rails (Coturnicops noveboracensis) when there were more than seven individuals vocalizing on a single recording. Furthermore, while the large majority (> 97%) of detections of most secretive marsh birds are aural, detections of a few marsh birds such as Horned Grebe (Podiceps auritus) and Eared Grebe (Podiceps nigricollis) are over 90% visual (K. L. Drake, unpublished data). Thus, in-person surveys will perform much better than ARUs for estimating abundance of certain species, especially blackbirds that occur at very high densities. A call index or a "too many to count" option for transcribers could make transcription more realistic for recordings with very high numbers of a particular species. For certain species, point counts, whether in-person or ARU, may be inappropriate, and other methods such as territory mapping may be required to obtain a density index. Our correction factors take these effects into account and will reduce systematic biases.
Nevertheless, depending on the goal of a study, it may be good practice to do a visual survey of wetlands when deploying or retrieving ARUs as a safeguard for detecting individuals of species that may be missed entirely on recordings.
With the exception of Nelson's Sparrow, the species with low δ values for SM2s have low-pitched calls. Low-frequency environmental noise (wind, traffic) tends to be amplified on recordings at lower frequencies (Fig. 6), possibly masking some of the songs of these species. This is similar to the results of Bombaci and Pejchar (2019), who also found that species with low-pitched calls had lower δ values. However, Van Wilgenburg et al. (2017) found that the extremely low-pitched drumming of the Ruffed Grouse (Bonasa umbellus) was better detected by ARUs. The effect of wind was less pronounced on the SM4s than on the SM2s, and the species with low-pitched calls were generally detected just as well on SM4 recordings as by people in the field.
Species with positive biases and δ values larger than 1 were (Figs. 3 and 5): for SM2, Brown-headed Cowbird and Yellow Warbler; and for SM4, Sora, Savannah Sparrow, Yellow Warbler, Le Conte's Sparrow, Clay-colored Sparrow, and American Coot. Species with high-pitched calls (Brown-headed Cowbird, Savannah Sparrow, Le Conte's Sparrow) tend to be detected better by ARUs; perhaps these species are more difficult to hear in person but more easily seen on a spectrogram. This effect is especially pronounced on SM4 recordings, which have more sensitive microphones, resulting in large differences in δ values for those species (Figs. 3 and 7).
Across species, the mean SM2 δ was slightly less than 1 (0.98) and estimates of δ were less than 1 for 14 of 19 species, supporting our prediction that $\tau_A$ would be slightly smaller than $\tau_H$. However, the mean SM4 δ was slightly greater than 1 (1.03) and estimates of δ were greater than 1 for 10 of 16 species, indicating that $\tau_A$ for SM4s is actually slightly larger than $\tau_H$. Measures of δ for SM4 recordings were higher than those for SM2 recordings for 15 of 16 species, supporting our prediction that $\tau_A$ would be larger for SM4s than for SM2s.

http://www.ace-eco.org/vol15/iss2/art9/

Fig. 7.
Spectrograms of simultaneous recordings at a wetland in Saskatchewan, Canada. Above, SM4; below, SM2. Higher microphone sensitivity (darker songs) and lower internal noise (lighter "background") are evident on the SM4 spectrogram. The songs of several Savannah Sparrows (Passerculus sandwichensis) can be seen; the red box shows one as an example. The observer estimated two Savannah Sparrows in the field, four on the SM4, and three on the SM2.

There are several types of ARUs, as well as several high-quality nonautonomous recording units that are affordable to hobbyists and are useful in situations when a person is available to operate the unit during recording. Rempel et al. (2013) tested several older models of recording units and showed that different makes and models had different microphone sensitivities and signal-to-noise ratios. They also showed that both signal-to-noise ratios and microphone sensitivities varied with audio frequency and that units with lower signal-to-noise ratios detected fewer birds than human surveyors and more sensitive recording units. Similarly, Darras et al. (2018) found by meta-analysis of 23 published papers that the signal-to-noise ratio of microphones had a positive effect on species richness. Thus, each model of recording unit with its particular specifications will need to be tested to determine the most accurate model- and species-specific correction factors.
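As a minimal sketch of how model-specific correction factors might be applied, the snippet below assumes, following the general framework of Van Wilgenburg et al. (2017), that δ is the ratio of the ARU to the in-person effective detection radius and that expected counts scale with the effective area sampled (πτ²), so that an ARU count is divided by δ² to place it on the in-person scale. The lookup table, the function names, and the use of the cross-species mean δs reported above as placeholders for species-specific estimates are all illustrative assumptions, not the analysis code used in this study.

```python
import math

# Hypothetical lookup of correction factors (delta = tau_A / tau_H) keyed by
# recorder model. The values are the cross-species means quoted in the text,
# used here only as placeholders for species-specific estimates.
DELTA = {"SM2": 0.98, "SM4": 1.03}


def area_ratio(delta: float) -> float:
    """Ratio of ARU to in-person effective area sampled: (tau_A / tau_H)^2."""
    return delta ** 2


def log_offset(delta: float) -> float:
    """Offset on the log scale, log(delta^2), for use in a count GLM."""
    return math.log(area_ratio(delta))


def to_in_person_scale(aru_count: float, delta: float) -> float:
    """Rescale an ARU count to the in-person scale, assuming equal availability."""
    return aru_count / area_ratio(delta)


if __name__ == "__main__":
    for model, delta in DELTA.items():
        corrected = to_in_person_scale(10, delta)
        print(model, round(area_ratio(delta), 4), round(corrected, 2))
```

In a regression setting the same adjustment would usually enter as an offset term (here `log_offset`) rather than by rescaling raw counts, which keeps the response as integer counts.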
Our estimates of availability (p) were close to 1 for most songbirds, especially for Red-winged and Yellow-headed Blackbirds, which we encountered at high densities in Saskatchewan. By contrast, availability estimates were lower and associated uncertainty was large for more elusive marsh birds, e.g., grebes, Virginia Rail, Sora, and American Coot. This is likely due to less frequent singing in these species (Conway 2011). These species account for the cases in which the correlation between estimates of p_H and p_A was somewhat weak. Despite the low correlation, p_A was higher than p_H for these species, even in the context of a paired point count, with the exception of American Coot and Virginia Rail (SM2). The low availability of elusive marsh birds is precisely why call-broadcast is typically used for in-person surveys: it entices the birds to sing and therefore increases availability. However, call-broadcast affects species movement and biases population estimates (Conway and Gibbs 2011, Zuberogoitia et al. 2011). The use of ARUs can overcome low availability without call-broadcast by transcribing multiple recordings from the same station, i.e., repeat visits, to increase the overall probability of detection. Algorithms can be used to estimate the number of recordings that would need to be processed in order to have a good chance of detecting a particular species (Sliwinski et al. 2016). Using the algorithms of Sliwinski et al. (2016), at least seven 5-min surveys would be required to have a > 90% chance of detecting a species when p = 0.28 (our lowest p_A estimate, for Virginia Rail). Transcribing this many repeat visits would increase the cost of manual transcription, but would ultimately provide more accurate population estimates, and could be more cost-effective than whatever number of repeat in-person visits might be required (Conway and Gibbs 2011).
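The arithmetic behind the repeat-visit requirement can be sketched with a simple approximation. The snippet below is not the algorithm of Sliwinski et al. (2016); it assumes that surveys are independent and that each has the same per-survey detection probability p, so the chance of at least one detection in n surveys is 1 − (1 − p)^n. The function names are hypothetical.

```python
def cumulative_detection(p: float, n_surveys: int) -> float:
    """Probability of at least one detection in n independent surveys."""
    return 1.0 - (1.0 - p) ** n_surveys


def surveys_needed(p: float, target: float) -> int:
    """Smallest number of surveys whose cumulative detection reaches target."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    # Step upward one survey at a time; this avoids rounding problems
    # that a closed-form logarithm can have exactly at the threshold.
    while cumulative_detection(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    p = 0.28  # lowest p_A estimate quoted in the text (Virginia Rail)
    for n in range(1, 9):
        print(n, round(cumulative_detection(p, n), 4))
    print("surveys needed for 0.90:", surveys_needed(p, 0.90))
```

Under this approximation, seven surveys at p = 0.28 give a cumulative probability of about 0.8997, essentially the 90% level cited in the text, and a strict 0.90 threshold is crossed at seven or eight surveys depending on how p is rounded (p = 0.281 already reaches it at seven).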
For this suite of elusive marsh birds, it is possible that availability will be higher in the absence of a human observer. Studies on forest bird populations have shown that a human observer has no effect on the position of individual birds, probability of occurrence, or singing rate (Campbell and Francis 2012, Hutto and Hutto 2020), and we expect similar results for many of our species. However, changes in marsh bird behavior due to the presence of an observer are an area needing more controlled research. A methodology similar to that of Campbell and Francis (2012), who used microphone arrays to test movement and vocalization of upland birds before and after the arrival of an observer, could be applied to marsh birds. This could be an important area of research for a suite of so-called "secretive" species for which call-broadcast surveys are advocated to increase availability in real time during the course of a survey. If p_A is indeed different when a human is not present, this could be accounted for in any offsets in order to integrate in-person and ARU point counts. Because many elusive marsh birds have highest detectability at night (Tozer et al. 2017), and because most in-person counts are conducted at dawn or dusk, differences in availability at different times of day will also need to be incorporated into statistical offsets for both ARU and in-person counts.
Most estimates of p_H were similar to those of p_A; however, p_A tended to be slightly higher and have smaller error than p_H, even for elusive marsh birds with generally low p (Fig. 2). This was also found by Van Wilgenburg et al. (2017; see Fig. 2, which shows more species above the 1:1 correspondence line than below) and Bombaci and Pejchar (2019; see Table 1). In time removal models, shorter times to first detection lead to higher estimates of p. Therefore, a higher p is equivalent to a shorter average time to first detection. This means that observers detected most birds slightly earlier on recordings than they did in the field, likely because of the ability to pause or rewind a recording to verify identifications. This pattern even occurred to some extent with grebes, suggesting that even without visual cues observers may be able to detect them earlier on a recording. The pattern also applies to both species that failed the equal availability test: Song Sparrow and Brown-headed Cowbird. We speculate that observers in the field needed to listen to the Song Sparrow several times to eliminate similar-sounding species (e.g., Vesper Sparrow, Pooecetes gramineus). Similarly, the call of the Brown-headed Cowbird is weak, very high, and short; therefore, it may be easy to miss in the field during the first few minutes when observers are focused on the louder and more prominent marsh birds. On the other hand, this call shows up very well on a spectrogram, and, with the ability to rewind, observers may be less overwhelmed in a lab setting. For these two species and other species that do not have equal availability between survey types, e.g., Ruffed Grouse (Van Wilgenburg et al. 2017), p could be incorporated into statistical offsets as was done by Sólymos et al. (2013) for point counts of varying lengths and at different times of day and year.
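The link between availability and time to first detection can be illustrated with the simplest form of a time removal model. The sketch below assumes a constant cue rate φ, so that times to first detection are exponentially distributed, p(t) = 1 − exp(−φt), and the mean time to first detection is 1/φ. The constant-rate form, the 5-min survey duration, and the function names are assumptions made for illustration; they are not taken from the models fitted in this study.

```python
import math


def availability(phi: float, duration_min: float) -> float:
    """Probability of at least one cue within the survey: 1 - exp(-phi * t)."""
    return 1.0 - math.exp(-phi * duration_min)


def rate_from_availability(p: float, duration_min: float) -> float:
    """Invert the removal model to recover the cue rate phi (per minute)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(1.0 - p) / duration_min


def mean_time_to_first_detection(phi: float) -> float:
    """Mean of an exponential waiting time with rate phi, in minutes."""
    return 1.0 / phi


if __name__ == "__main__":
    duration = 5.0  # assumed survey length in minutes
    for p in (0.28, 0.60, 0.90):
        phi = rate_from_availability(p, duration)
        print(p, round(phi, 4), round(mean_time_to_first_detection(phi), 2))
```

Because φ increases monotonically with p for a fixed survey length, any survey type with a higher p necessarily has a shorter mean time to first detection under this model, which is the equivalence stated in the text.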
ARUs are flexible and effective tools that offer a wide range of benefits. A recorder deployed for the full spring-summer season in temperate North America is able to collect data on species that vocalize in early spring (e.g., amphibians or owls), in early summer (e.g., grebes), at dawn (most songbirds), at dusk (e.g., nightjars), and at night (e.g., rails and bitterns; Shonfield and Bayne 2017). Autonomous recorders will address many challenges of marsh bird monitoring, such as infrequent singing and the reluctance of elusive species to vocalize when a human is present (Conway and Gibbs 2011). Bird song identification skills are not necessarily needed by field personnel who deploy and retrieve the recorders, thereby greatly expanding the pool of potential qualified field personnel, including volunteers for citizen science monitoring programs (Dickinson et al. 2010). With the emerging consensus that data from recorders are comparable to human observations, and the fact that methodological biases can be understood and accounted for (Sólymos et al. 2013, Yip et al. 2017, this study), we recommend that ARUs see increasing use to facilitate delivery of wetland bird surveys.
Responses to this article can be read online at: http://www.ace-eco.org/issues/responses.php/1661
In-person and uncorrected ARU estimates of species density (birds/ha). Calculations are based on GLM parameters of the 30% validation data subset within the repeated random subsampling framework. Each point represents one of 50 random subsamples. The line indicates 1:1 correspondence. Inset are histograms of GLM survey-effect estimates of δ using the 70% calibration data across all 50 subsamples. Where points tend to fall underneath the 1:1 line, δ is lower, and the larger the spread of the points the more uncertainty there is in the δ estimate. Figures are shown for each species and for both SM2 and SM4 densities. Graphs are labelled with 4-letter species codes (see Table A1).