In the still of the night: revisiting Eastern Whip-poor-will surveys with passive acoustic monitoring

Recent advances in passive acoustic monitoring warrant the review of survey protocols because passive acoustic monitoring can increase sampling effort with minimal additional cost. In particular, protocols for nocturnal species should be re-evaluated because automated processing with signal recognition is expected to perform well for these species and surveys conducted by human observers are often limited by safety concerns. We revisited the best survey practices for the Eastern Whip-poor-will (Antrostomus vociferus), a nocturnal species of conservation concern. Whip-poor-will surveys are typically limited to nighttime, but also to times of high lunar illumination because their calling rate is associated with moonlight levels. We used automated recognition to extract Whip-poor-will detections from a dataset of autonomous recording unit (ARU) recordings from sites with known Eastern Whip-poor-will occupancy in Ontario, Canada. Temperature and time relative to sunset had particularly strong quadratic effects on detectability, with detectability maximized at 13 °C and 4 hours after sunset. Moon altitude had a positive effect on detectability, while day of year and wind speed had negative effects. We found constraining surveys by optimal values of those detectability covariates was worthwhile only up until 10 recordings, at which point the cumulative probability of detecting an Eastern Whip-poor-will at each site was equal between constrained and unconstrained nocturnal recordings. The number of recordings required to reach an asymptote for detectability was between 81 and 97, depending on recording length. We provide objective-specific recommendations for Eastern Whip-poor-will surveys and suggest unconstrained passive acoustic monitoring as the preferred survey method for many objectives.
Given the rise of passive acoustic monitoring, survey practices for many species should be revisited because the increases in sampling effort provided by ARUs can improve cumulative detection probability and potentially outweigh the advantages of limiting surveys to times and dates of optimal detectability.


INTRODUCTION
Successful management of avian species includes regular evaluation of survey best practices in an adaptive management framework (Yoccoz et al. 2001, Nichols and Williams 2006, Lindenmayer and Likens 2009). Part of this process of revisiting survey protocols should include quantifying and understanding probability of detection (hereafter "detectability"; Kéry and Schmidt 2008). Understanding the conditions where detectability is maximized can result in improved precision of statistical estimates such as population trend (Diefenbach et al. 2007, Sauer et al. 2017), occupancy, or density (MacKenzie et al. 2002, Rosenberg et al. 2017). Imperfect detectability on surveys can also have further consequences for species, such as exclusion from biodiversity reporting (North American Bird Conservation Initiative Canada 2019), environmental impact assessment, range delineation, or other assessments.
Nightjars (Family Caprimulgidae) are nocturnal birds that require specialized protocols to maximize detectability. Constraining observations to darkness is important for detecting nightjars and has been shown to be critical for detecting population declines relative to dawn surveys (Knight et al. 2021). Surveying during optimal lunar conditions is another well-established survey suggestion for nightjars. Vocal activity of Common Poorwills (Phalaenoptilus nuttallii; Brauner 1953, Woods and Brigham 2008), Chuck-will's-widows (Antrostomus carolinensis; Harper 1938, Cooper 1981), Eastern Whip-poor-wills (Antrostomus vociferus; Cooper 1981, Mills 1986, Wilson and Watts 2006), Red-necked Nightjars (Caprimulgus ruficollis; Reino et al. 2015), Common Pauraque (Nyctidromus albicollis), and Little Nightjar (Setopagis parvula; Pérez-Granados et al. 2022) have all been positively associated with moonlight levels, including lunar phase or percent lunar illumination and lunar altitude. Recommendations have been made for surveys to be conducted around the full moon (Reino et al. 2015), and specifically when there is at least 50% lunar illumination (Wilson and Watts 2006). Citizen science monitoring protocols for nightjars, including the recently formalized Canadian Nightjar Survey, now recommend or require surveying within one week of the full moon. Lunar phase or illumination has also been included as a detectability covariate in occupancy studies (Farrell et al. 2017, Vala et al. 2020).
Avian survey methods are changing because of recent advances in passive acoustic monitoring, and so it follows that best survey practices should be revisited. Autonomous recording units (ARUs) facilitate surveying sites many more times than human observers can at little to no extra effort or cost (Shonfield and Bayne 2017, Gibb et al. 2018). ARUs are useful for surveying nightjars because frequent nocturnal surveys in remote areas can be both logistically challenging and dangerous.
Automated computer recognition of acoustic signals allows efficient processing of large amounts of acoustic recordings. The nocturnal soundscape is much quieter than the diurnal soundscape, reducing the likelihood of sound masking, i.e., overlap of sounds in time and frequency band. Reduced sound masking during the nocturnal period makes it much more straightforward for a computer to separate nightjar calls from the rest of the soundscape (Zwart et al. 2014, Knight et al. 2017, Pérez-Granados and Schuchmann 2020). Once detected, their frequent, simple vocalizations are relatively easy for computer algorithms to classify. Automated recognition facilitates the collection of large, highly detailed survey datasets with a time stamp for every vocalization. Other parameters such as relative sound level can also be derived concurrently and can be used to estimate the distance of detection for every vocalization. Although automated recognition does require visual or aural validation to remove false positive detections from the results, it can be approximately five times more efficient than aural processing alone (Knight et al. 2017), and there are postprocessing approaches available that can further increase this efficiency (Balantic and Donovan 2020, Knight et al. 2020).
Given these improvements in survey efficiency, constraining nightjar surveys by lunar conditions or other covariates (hereafter "constrained survey") may not always be the most effective survey recommendation. If increasing survey effort is of relatively minimal additional cost, as is the case with passive acoustic monitoring, conducting surveys that are not restricted to specific times or conditions (hereafter "unconstrained survey") may have a higher cumulative probability of detection because they allow for the collection of more information. In fact, an early assessment of ARU and recognizer technology for the Eastern Whip-poor-will concluded that passive acoustic monitoring and automated recognition could enable monitoring on "more dates than possible using established field protocols" (Clark and Fristrup 2009:8). Maximizing survey effort and, thus, potentially cumulative probability of detection is particularly important for objectives that require confirmation of species presence or absence such as environmental assessment or range delineation.
Our goal was to revisit best survey practices for the Eastern Whip-poor-will for both ARU and human surveys. The Eastern Whip-poor-will is a nightjar species that has a Near Threatened global conservation status (Cink et al. 2020), is listed as Threatened under Canada's Species at Risk Act, and is on the U.S. Fish and Wildlife Service's (USFWS) list of Birds of Conservation Concern. Developing standardized survey protocols has been identified as a high-priority knowledge gap for the Eastern Whip-poor-will (Environment and Climate Change Canada 2018). We were interested in informing ARU protocols because passive acoustic monitoring has been identified as a potential survey method for this species given its extensive range and nocturnal habits (Environment and Climate Change Canada 2018). We were also interested in informing human surveys, including the recently formalized citizen science Canadian Nightjar Survey. We used automated recognition to extract Whip-poor-will detections from a high temporal resolution dataset of ARU recordings from sites in southern Ontario with known Eastern Whip-poor-will occupancy. First, we used an occupancy analysis framework to determine which temporal and weather covariates best predicted Whip-poor-will detectability. We then examined the effects of sampling effort (in this case, recording length and number of recordings) on detectability and occupancy estimates and cumulative probability of detection. Next, we examined whether constraining surveys by detectability covariates, including percent lunar illumination, or using all available recordings resulted in higher cumulative probability of detection. We compared constrained and unconstrained surveys with mixed effects logistic regression across a range of recording lengths and number of recordings. We then used a nonlinear least squares growth model to determine the sample size at which cumulative probability of detection asymptoted to inform environmental assessment. Based on our results, we provide recommendations for Eastern Whip-poor-will survey protocols in an objective-based framework by differentiating between objectives that require confirmation of presence-absence and objectives that can accommodate imperfect detection.

METHODS

Study area and sites
We studied detectability of Eastern Whip-poor-wills near the center of their breeding range in eastern Ontario, Canada (Fig. 1). Our study sites included two areas: (i) the transition zone between the Canadian Shield and Mixedwood Plains ecozones (Crins et al. 2009), and (ii) the southern portion of Prince Edward County. We selected these areas as part of (i) a grassland bird breeding phenology study in 2017, and (ii) a systematic survey of breeding birds on the Prince Edward Point National Wildlife Area (PEP hereafter) in 2019. Although the surveys were focused on diurnal passerine birds, we also included crepuscular and nocturnal sampling in those areas given the high potential suitability of the habitat for Eastern Whip-poor-will. The vegetation community in both study areas consisted mostly of open shrubland and thicket, dominated by red cedar (Juniperus virginiana), red-osier dogwood (Cornus stolonifera), and prickly ash (Zanthoxylum americanum), surrounded by mature hardwood forest. At those two study areas, we selected 32 study sites with known Eastern Whip-poor-will occupancy. At the grassland area, initial suitability of 18 sites was evaluated using Google Earth imagery, where sites containing open grassland with < 50% forest canopy closure within a 100 m radius of each selected site were considered further. Six of the initial 18 grassland sites were removed after ground-truthing because of unsuitable habitat (n = 4) and excessive vehicular noise (n = 2), resulting in 12 locations for ARU deployment. At PEP, we divided the area into a grid of hexagons with 250 m between centroids and randomly selected hexagons with at least 750 m between centroids to maintain spatial independence of survey locations, resulting in 20 locations for ARU deployment.

Audio recording collection
All recordings were made using ARUs (Model SM2+, Wildlife Acoustics, Maynard, MA, USA). We used a sampling rate of 16 kHz, or double the maximum frequency of a typical Whip-poor-will song (Cink et al. 2020), and a bit depth of 16 bits. We used the same model of ARU between years to ensure a comparable signal-to-noise ratio (e.g., Darras et al. 2020), and conducted routine microphone testing and replacement between years to ensure microphone sensitivity and potential detection radius were not compromised (Turgeon et al. 2017). We affixed ARUs to vertical tree trunks or fence posts at a height of approximately 1.5 m and removed branches, leaves, or other debris near microphones that might impede clear recordings. In 2017, we collected recordings every third day at 30, 90, and 150 minutes after sunset from 16 April to 30 July, resulting in 108 10-minute recordings per site. In 2019, we collected recordings every second day at every hour on the hour from sunset to sunrise from 17 May to 16 July, resulting in 279 five-minute recordings per site. Premature battery failure of some ARUs resulted in fewer recordings at some sites, particularly in 2017 (Appendix 1).
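The per-site recording totals follow directly from the schedules described above. The short Python sketch below reproduces that arithmetic as a check. The function name is hypothetical, and the figure of nine hourly recordings per night in 2019 is inferred from the reported total (279 recordings over 31 nights) rather than stated in the text.

```python
from datetime import date, timedelta


def recording_days(start, end, every_n_days):
    """Return the recording dates from start to end (inclusive) at a fixed interval."""
    n_days = (end - start).days
    return [start + timedelta(days=d) for d in range(0, n_days + 1, every_n_days)]


# 2017: every third day, three recordings per night (30, 90, 150 min after sunset).
days_2017 = recording_days(date(2017, 4, 16), date(2017, 7, 30), 3)
per_site_2017 = len(days_2017) * 3

# 2019: every second day, hourly from sunset to sunrise.
# ASSUMPTION: nine recordings per night, inferred from 279 / 31 nights.
days_2019 = recording_days(date(2019, 5, 17), date(2019, 7, 16), 2)
per_site_2019 = len(days_2019) * 9

print(per_site_2017, per_site_2019)
```

Both totals match the values reported for a site with no battery failure.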

Recognizer construction
We used Song Scope software (Wildlife Acoustics, Maynard, MA, USA) to construct a recognizer for the Eastern Whip-poor-will. Song Scope is a signal detection recognizer that extracts Mel Frequency Cepstral Coefficients from each detected signal and computes the overall score using Hidden Markov Models. Although Song Scope was recently discontinued by its manufacturer, we chose it because it remains freely available and was shown to perform well for another nightjar species, the Common Nighthawk (Chordeiles minor; Knight et al. 2017). We selected 80 clips of full Eastern Whip-poor-will songs (Fig. 2) from a dataset of high-quality recordings collected across southern Ontario from 2015 to 2020 using autonomous recording units (Model SM2+, Wildlife Acoustics, Maynard, MA, USA) and a shotgun microphone system (Nagra SD digital audio recorder, Sennheiser ME66 shotgun microphone). We also explored building recognizers with just the second note of the song phrase (Fig. 2) and a longer clip of multiple song phrases, but found the single full phrase performed best in preliminary evaluation. We used only loud clips recorded at close range to maximize the probability of detection at 0 m and facilitate large-scale use of the recognizer (Knight and Bayne 2019, Knight et al. 2020). We removed any clips that were not fully detected by the signal detection process in Song Scope (settings available in Appendix 2), leaving 60 clips for recognizer training. The recognizer file we constructed for Song Scope software is available in Appendix 3.

Recognizer evaluation
We followed Knight et al. (2017) to evaluate the precision and recall of our recognizer and select a score threshold for processing. We randomly selected a test dataset of five-minute nighttime recordings where we knew Eastern Whip-poor-wills were present, visually scanned them for calls, and categorized them as present or absent. We continued this process until we had 20 presence and 20 absence recordings. We visually and/or aurally processed the presence recordings and counted the number of Eastern Whip-poor-will calls in each recording. We then used our trained Song Scope recognizer to scan those 40 selected recordings with a score threshold of 20 and a quality threshold of 20. We visually validated the recognizer results to separate true and false positives and found the recognizer detected Eastern Whip-poor-wills in 18 of the 20 (90%) presence recordings. Of the 3525 Eastern Whip-poor-will calls we detected in the test dataset by sight/sound, 1626 (46%) were detected by the recognizer. We then compared the number of detections in the recognizer results to the number of detections in the benchmark data across all possible score thresholds to determine precision (proportion of recognizer detections that were true positives) and recall (proportion of benchmark calls that were detected by the recognizer; Knight et al. 2017; Fig. 3). We also determined the recall of Eastern Whip-poor-will presence per recording across all possible score thresholds. Our results indicated the call-level recall of the recognizer was low, however, so we also examined the relationship between recall and sound energy, which is a proxy for detection distance (Hedley et al. 2020), to ensure this low recall was simply due to a lower effective detection radius than that of a human observer (Appendix 4).
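The threshold evaluation described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name, the example hits, and the convention that a hit is kept when its score is greater than or equal to the threshold are all assumptions made for illustration.

```python
def precision_recall(hits, n_true_calls, threshold):
    """Compute call-level precision and recall at one score threshold.

    hits: list of (score, is_true_positive) tuples, one per recognizer hit,
          labelled during visual validation.
    n_true_calls: number of calls counted in the benchmark (human) data.
    """
    # ASSUMPTION: hits scoring at or above the threshold are retained.
    kept = [is_tp for score, is_tp in hits if score >= threshold]
    true_pos = sum(kept)
    precision = true_pos / len(kept) if kept else float("nan")
    recall = true_pos / n_true_calls if n_true_calls else float("nan")
    return precision, recall


# Hypothetical validated hits: (score, validated as a true Whip-poor-will call?)
example_hits = [(75, True), (68, True), (62, False), (55, True), (40, False), (30, False)]

for threshold in (20, 60):
    p, r = precision_recall(example_hits, n_true_calls=6, threshold=threshold)
    print(threshold, round(p, 2), round(r, 2))
```

Raising the threshold discards low-scoring hits, which in this toy example raises precision and lowers recall. That is the trade-off a score threshold has to balance.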

Acoustic data processing
We scanned our entire set of recordings with our Song Scope recognizer with a score threshold of 60 and a quality threshold of 0. We selected a score threshold of 60 to balance false positives and false negatives at the recording level. Our recognizer evaluation suggested a score threshold of 60 resulted in detection of 30.9% of individual Eastern Whip-poor-will calls but 85% of five-minute recordings, as compared with a human listener (Fig. 3). We visually validated the recognizer results to separate true and false positives.
Fig. 3. Evaluation of Eastern Whip-poor-will (Antrostomus vociferus) call detection for a recognizer built in Song Scope software across multiple score thresholds. Precision is the proportion of recognizer hits that are true detections. Recall is the proportion of target species vocalizations detected by the recognizer. Presence-absence recall is the proportion of five-minute recordings containing Eastern Whip-poor-wills in which the recognizer detected the target species. Results are from a test dataset of 40 nocturnal recordings, 20 of which contained Eastern Whip-poor-will vocalizations. Dashed line represents the score threshold selected for acoustic data processing.

Detectability covariates
We collected temporal, solar, lunar, and weather detectability covariates for each recording at each study site. We used the "suncalc" package (Thieurmel and Elmarhraoui 2019) to calculate time since sunset and moon fraction for every recording. We used the "weathercan" package (LaZerte and Albers 2018) to retrieve weather variables from nearby weather stations. Of the available weather stations in the study area, we selected the nearest one that had the most complete hourly dataset available (Appendix 5). For each recording, we retrieved hourly wind speed and temperature data and daily total precipitation data. We also quantified potential weather effects on perceptibility via sound masking or degradation. We used the "hardRain" package (Metcalf et al. 2020) to quantify the signal-to-noise ratio (StN) and power spectrum density (PSD) between 4.4 and 5.6 kHz for each one-minute interval of recording. We chose that signal band because it has been shown to be effective at classifying heavy rainfall (Metcalf et al. 2020). We then checked all potential covariates for variance inflation and correlation and removed any variables with VIF > 5 or correlation > 0.7.
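The collinearity screen in the last step can be sketched as follows. This minimal Python example covers only the pairwise correlation check, plus the special case of the variance inflation factor for a model with two predictors, where VIF = 1 / (1 - r^2). The function names and covariate values are hypothetical, and applying the 0.7 cut-off to the absolute value of the correlation is an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(covariates, r_max=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above r_max."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(covariates.items(), 2):
        r = pearson(a, b)
        if abs(r) > r_max:
            flagged.append((name_a, name_b, r))
    return flagged


def vif_two_predictors(r):
    """VIF when a model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical covariate values; temperature_f duplicates temperature in other units.
example = {
    "temperature": [5, 10, 15, 20, 25],
    "wind": [12, 3, 18, 7, 10],
    "temperature_f": [41, 50, 59, 68, 77],
}
print(flag_collinear(example))
```

With more than two predictors, each VIF requires regressing that predictor on all of the others, which this sketch does not attempt.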

Statistical analysis
First, we used an occupancy modeling framework in the "unmarked" package (Fiske and Chandler 2011) in R version 4.0.3 (R Core Team 2020) to determine which covariates predicted Eastern Whip-poor-will detectability. We randomly sampled 50 recordings from each of the 32 study sites to even out sampling across the study sites (Appendix 1). We fit two sets of occupancy models: one with all potential solar, lunar, and temporal covariates, and one with weather covariates. We included time relative to sunset and temperature as second-order polynomials based on predicted relationships for these two covariates. We also included an interaction between moon altitude and fraction. For each set, we fit one global model and all potential combinations of the covariates in that model. We bootstrapped this visit selection and model fitting process 100 times. From each set, we selected the model with the highest model weight across the 100 bootstraps. We then combined all the covariates from the two best fitting models into a final model. We again randomly sampled 50 5-minute visits and fit them to this final model. We bootstrapped visit sampling and model fitting 100 times. We summarized the 100 bootstraps to obtain mean and standard error estimates for the coefficients in our final model.
Table 1. Occupancy model selection results for detectability (P) of Eastern Whip-poor-wills (Antrostomus vociferus) from acoustic recordings. Results are the mean and standard deviation (SD) of 100 bootstrapped models for each covariate set: (1) temporal, lunar, and solar (day = day of year, altitude = lunar altitude, sunset = time relative to sunset); (2) weather (PSD = power spectrum density, StN = signal to noise ratio, temperature = temperature in degrees Celsius, wind = wind speed, rain = total daily precipitation). Bold indicates the model with the highest mean model weight that was thus selected for subsequent analyses. Occupancy ~1 for all models.
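The bootstrapped model-selection step can be sketched in Python as below. The text does not say how model weights were computed, so this sketch assumes they are Akaike weights derived from AIC. The function names, model labels, and AIC values are all hypothetical.

```python
import math


def akaike_weights(aic_by_model):
    """Convert AIC values for one candidate set into Akaike weights."""
    best = min(aic_by_model.values())
    relative = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(relative.values())
    return {m: v / total for m, v in relative.items()}


def best_mean_weight(bootstraps):
    """Average each model's weight across bootstraps and return the top model."""
    totals = {}
    for aic_by_model in bootstraps:
        for model, weight in akaike_weights(aic_by_model).items():
            totals[model] = totals.get(model, 0.0) + weight
    means = {m: t / len(bootstraps) for m, t in totals.items()}
    return max(means, key=means.get), means


# Hypothetical AIC values from two bootstrap replicates of the same candidate set.
example_bootstraps = [
    {"day+altitude+sunset": 100.0, "day+sunset": 104.0, "null": 120.0},
    {"day+altitude+sunset": 98.0, "day+sunset": 97.0, "null": 115.0},
]
top_model, mean_weights = best_mean_weight(example_bootstraps)
print(top_model)
```

Averaging weights across replicates, rather than counting how often each model ranks first, lets a model that is consistently close to the best outrank one that wins narrowly in a few replicates.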
We then examined the effects of sampling effort on Eastern Whip-poor-will detectability and occupancy estimates. To examine the effects of sampling effort, we constructed occupancy models for a range of recording lengths (1 to 5 minutes) and number of recordings (1 to 30, 40, 50, 60, 70, 80, 90, 100). For each combination of these two sampling effort parameters, we randomly sampled the appropriate number of recordings from each study site and the appropriate recording length from each of those recordings, starting at the beginning of the recording. We fit an occupancy model to the validated recognizer data from those recordings, including the detectability covariates from the final model of our previous analysis. We bootstrapped this sampling and model fitting process 100 times and calculated the mean occupancy and detectability estimates and 95% confidence intervals (CI) across those bootstraps.
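The resampling scheme can be illustrated with a short Python sketch that draws a fixed number of recordings per site, truncates each to its first few minutes, and records whether the species was detected. It omits the occupancy model and reports only the naive proportion of site-by-bootstrap samples with a detection. The data layout (one list of per-minute detection flags per recording) and all names are assumptions.

```python
import random


def detected_in_sample(recordings, n_recordings, n_minutes, rng):
    """Sample recordings without replacement and check the first n_minutes of each."""
    sample = rng.sample(recordings, n_recordings)
    return any(any(rec[:n_minutes]) for rec in sample)


def cumulative_detection(sites, n_recordings, n_minutes, n_boot=100, seed=1):
    """Proportion of site-by-bootstrap samples with at least one detection."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        for recordings in sites.values():
            hits += detected_in_sample(recordings, n_recordings, n_minutes, rng)
    return hits / (n_boot * len(sites))


# Hypothetical data: 12 five-minute recordings per site, one flag per minute.
# Site A calls only in the fifth minute of every recording; site B never calls.
example_sites = {
    "A": [[False, False, False, False, True] for _ in range(12)],
    "B": [[False] * 5 for _ in range(12)],
}
print(cumulative_detection(example_sites, n_recordings=5, n_minutes=5))
print(cumulative_detection(example_sites, n_recordings=5, n_minutes=4))
```

Truncating from the start of each recording, as the text describes, means a shorter recording length can only lose detections relative to a longer one drawn from the same sample.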
Next, we examined the effects of constraining surveys by detectability covariates. In other words, how is sampling effort affected by using only recordings from times and days when the probability of detection is high, for example when surveys are conducted by a human observer? We restricted recordings to those collected between sunset and 7 hours after sunset, when the moon altitude was greater than 0.6 radians, when the temperature was between 7 and 20 degrees Celsius, and when the wind speed was less than 19 km/h. These thresholds were selected following recommendations from the Canadian Nightjar Survey and the results from the previous analysis (Table 1, Fig. 4). We repeated the previous analysis, with the exception that we only selected up to 10 recordings per study site because that was the minimum available after constraining our dataset by our detectability covariates.
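The constraint thresholds listed above translate directly into a filter over recordings. The Python sketch below is a hypothetical rendering. The class and field names are invented, and treating the "between" bounds as inclusive is an assumption the text does not settle.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    hours_after_sunset: float
    moon_altitude_rad: float
    temperature_c: float
    wind_kmh: float


def is_constrained(rec):
    """True if a recording meets every detectability constraint."""
    # ASSUMPTION: range bounds are inclusive; strict inequalities follow the text.
    return (
        0 <= rec.hours_after_sunset <= 7
        and rec.moon_altitude_rad > 0.6
        and 7 <= rec.temperature_c <= 20
        and rec.wind_kmh < 19
    )


# Hypothetical recordings: the first meets all constraints, the second fails on wind.
example_recordings = [
    Recording(hours_after_sunset=4.0, moon_altitude_rad=0.9, temperature_c=13.0, wind_kmh=5.0),
    Recording(hours_after_sunset=4.0, moon_altitude_rad=0.9, temperature_c=13.0, wind_kmh=25.0),
]
constrained = [rec for rec in example_recordings if is_constrained(rec)]
print(len(constrained))
```

Because every condition must hold at once, the constrained subset shrinks quickly, which is why only 10 recordings per site remained available for this analysis.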
We compared the cumulative probability of detection of constrained and unconstrained surveys using logistic regression. We used a binomial response variable of whether an Eastern Whip-poor-will was detected at each site for each bootstrap and included site as a random effect. We included length of recording, number of recordings, and covariate approach (constrained/unconstrained) as covariates and all two-way interactions. We only used bootstraps with up to 10 recordings to allow direct comparison of the two approaches.
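The trade-off behind this comparison can be illustrated with the standard relationship between per-recording and cumulative detection probability, 1 - (1 - p)^n, which assumes independent recordings with a constant detection probability p. The Python sketch below is not the regression used in the analysis. The function names are hypothetical and the per-recording probabilities are illustrative placeholders, not estimates from this study.

```python
import math


def cumulative(p, n):
    """Probability of at least one detection in n independent recordings."""
    return 1.0 - (1.0 - p) ** n


def recordings_needed(p, target):
    """Smallest n for which cumulative(p, n) reaches the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Illustrative placeholders: a constrained survey with higher per-recording
# detectability versus an unconstrained survey with lower detectability.
p_constrained, p_unconstrained = 0.25, 0.10

target = cumulative(p_constrained, 10)
n_unconstrained = recordings_needed(p_unconstrained, target)
print(round(target, 3), n_unconstrained)
```

Under these placeholder values, fewer than three times as many unconstrained recordings match ten constrained ones. That is the kind of increase in effort an ARU can supply at little extra cost.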
We also determined the proportion of the 32 sites at which a Whip-poor-will was detected per bootstrap and used an asymptotic nonlinear least squares growth model (Von Bertalanffy 1957) to model the proportion of sites with detections across number of recordings for each of the recording lengths (1 to 5 minutes). We defined the number of recordings required to reach the predicted asymptote as the x-value at which the fitted curve reached 99% of the asymptote. We repeated this analysis for recording length across number of visits (every 10 visits from 10 to 100).
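A minimal Python sketch of this step follows. It assumes a two-parameter form of the growth curve, y = A(1 - exp(-kx)), that passes through the origin, whereas the fitted model may have included an additional offset parameter. It also swaps in a simple grid search over k, with a closed-form least-squares solution for A at each k, in place of a general nonlinear least squares routine. All names and data are hypothetical.

```python
import math


def growth(x, asymptote, rate):
    """Asymptotic growth curve through the origin (assumed form)."""
    return asymptote * (1.0 - math.exp(-rate * x))


def fit_growth(xs, ys, candidate_rates):
    """Least-squares fit: grid search over rate, closed-form asymptote per rate."""
    best = None
    for k in candidate_rates:
        g = [1.0 - math.exp(-k * x) for x in xs]
        a = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
        sse = sum((y - a * gi) ** 2 for y, gi in zip(ys, g))
        if best is None or sse < best[2]:
            best = (a, k, sse)
    return best[0], best[1]


def recordings_to_reach(fraction, rate):
    """x-value at which the curve reaches a given fraction of its asymptote."""
    return -math.log(1.0 - fraction) / rate


# Hypothetical noise-free data generated from a known curve.
xs = list(range(1, 101))
ys = [growth(x, 0.75, 0.05) for x in xs]
rates = [i / 1000 for i in range(1, 201)]

asymptote, rate = fit_growth(xs, ys, rates)
print(round(asymptote, 3), rate, round(recordings_to_reach(0.99, rate), 1))
```

For this curve form, the number of recordings needed to reach 99% of the asymptote depends only on the rate parameter, not on the height of the asymptote.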

RESULTS
The recognizer reported 115,954 potential Eastern Whip-poor-will detections, of which 52,704 were validated as true positives and 63,250 were removed as false positives (precision = 0.45). Of those false positives, 56,663 (89.6%) detections occurred during the dawn chorus just before sunrise. The number of Eastern Whip-poor-will detections per minute of recording ranged from 0 to 336 (mean = 4.76, SD = 25.51). The mean rate of Eastern Whip-poor-will detections per site varied from 0.02/min to 30.54/min (mean = 5.03/min, SD = 8.08/min).
The model with the highest mean model weight across bootstraps for solar, lunar, and temporal covariates included everything except moon fraction; it included day of year, time relative to sunset, and moon altitude (Table 1). The model with the highest mean model weight across bootstraps for weather covariates included everything except total daily rain; it included temperature, wind speed, StN, and PSD (Table 1). Moon altitude had a positive effect on Eastern Whip-poor-will detectability, while day of year, wind speed, StN, and PSD all had negative effects, with wind speed having the strongest effect (Fig. 4). Temperature and time relative to sunset had quadratic effects on detectability, with the highest probability of detection at 13 °C and 4 hours after sunset, respectively.
Fig. 4. Mean and 95% confidence interval of effects of temporal, lunar, solar, weather, and recording covariates on the detectability of Eastern Whip-poor-wills (Antrostomus vociferus), as predicted by occupancy models. Predictions for each covariate are made while holding all other covariates at values that maximize detectability (i.e., maximum for moon altitude, mean for time since sunset and temperature, and minimum for day of year, wind speed, power spectrum density, and signal-to-noise ratio).
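For a quadratic effect on the logit scale, logit(p) = b0 + b1·x + b2·x², the covariate value that maximizes detectability is the vertex of the parabola, x* = -b1 / (2·b2), provided b2 is negative. The Python sketch below shows the calculation with hypothetical coefficients chosen only so that the optimum falls at 4. It assumes raw polynomial terms, so it would not apply directly to coefficients from standardized or orthogonal polynomials.

```python
import math


def quadratic_optimum(b1, b2):
    """Covariate value that maximizes a concave quadratic effect."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for the effect to have a maximum")
    return -b1 / (2.0 * b2)


def detectability(x, b0, b1, b2):
    """Inverse-logit of the quadratic linear predictor."""
    eta = b0 + b1 * x + b2 * x ** 2
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for time relative to sunset (hours), not study estimates.
b0, b1, b2 = -3.0, 0.8, -0.1
peak = quadratic_optimum(b1, b2)
print(peak)
```

Because the inverse-logit is monotonic, the maximum on the logit scale is also the maximum on the probability scale.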
The mean probability of detection varied between 0.059 for an unconstrained survey of 100 1-minute recordings and 0.325 for a constrained survey of 10 5-minute recordings (Fig. 5). The mean probability of detection per recording across all combinations of recording length and number of recordings was 0.103 when all recordings were included and 0.268 when recordings constrained by detectability covariates were included. Recording length affected mean probability of detection, with shorter recordings having a lower probability of detection, particularly for 1-minute recordings when surveys were constrained by detectability covariates (Fig. 5). Mean probability of detection was slightly higher for unconstrained surveys with few recordings; however, the main effect of number of recordings was to reduce the uncertainty of the detectability estimate.
The mean probability of occupancy estimates varied between 0.445 for an unconstrained survey of nine 2-minute recordings and 0.821 for a constrained survey of five 1-minute recordings (Fig. 5). The mean probability of occupancy per recording across all combinations of recording length and number of recordings was 0.608 when all recordings were included and 0.583 when only recordings constrained by detectability covariates were included. In general, the occupancy estimates were higher for longer recordings and increasing numbers of recordings; however, the one-minute constrained recordings had the highest occupancy estimates, likely because the model corrected for the particularly low detectability of this survey effort combination. For both constrained and unconstrained surveys, mean probability of occupancy peaked between 1 and 5 surveys, and then stabilized; however, all differences were well within the mean 95% confidence intervals of the bootstrapped model estimates.

Fig. 5. Effects of sampling effort (recording length and number of recordings) for autonomous recording unit surveys of Eastern Whip-poor-wills (Antrostomus vociferus). Probability of detection and probability of occupancy estimates are mean predictions and 95% confidence intervals from occupancy models of 100 bootstrapped datasets. Two model sets were examined: one in which all nocturnal recordings in the dataset were included ("unconstrained"), and one for which recordings were constrained by covariates that had been shown to affect detectability ("constrained"). The dotted line on the unconstrained model set indicates the maximum number of recordings for the constrained model set and is included to facilitate comparison between the two plots with differing x-axis scales.
The cumulative probability of detection ranged from 0.7% for one unconstrained 1-minute recording to 30.7% for 10 constrained 5-minute recordings (Fig. 6). The cumulative probability of detection for constrained surveys was more than twice as high as that of unconstrained surveys when only one recording was sampled (4.6% vs 1.9% for 5-minute recordings); however, as sample size increased, this difference decreased until they were nearly equal for 10 recordings (28.7% vs 27.9% for 5-minute recordings).
Fig. 6. Effects of sampling effort (recording length and number of recordings) for autonomous recording unit surveys of Eastern Whip-poor-wills (Antrostomus vociferus). Cumulative probability of detection was defined as whether an Eastern Whip-poor-will was detected within a set of randomly selected recordings. Mean and 95% confidence intervals of cumulative probability of detection were calculated from 100 bootstraps for each of the recording length and number of recording combinations. Two model sets were examined: one in which all nocturnal recordings in the dataset were included ("unconstrained"), and one for which recordings were constrained by covariates that had been shown to affect detectability ("constrained"). Shaded intervals represent the 95% confidence interval and are shown only for the unconstrained model set for visualization (both model sets had nearly identical confidence intervals).
For number of recordings, the proportion of sites at which an Eastern Whip-poor-will was detected reached an asymptote between 0.679 and 0.754, depending on recording length (Fig. 7). No asymptote was reached for one-minute recordings. The minimum sample size required to reach 99% of that asymptote was lowest for longer recordings; minimum sample size was 81, 84, 88, and 97 recordings for five-minute through two-minute recordings, respectively. For recording length, the predicted values for up to five minutes recording length did not reach the asymptote predicted by the nonlinear least squares growth model, but the maximum proportion of sites at which an Eastern Whip-poor-will was detected was between 0.480 and 0.837 for five-minute recordings. The proportion of sites at which an Eastern Whip-poor-will was detected did not reach one because there were several sites at which an Eastern Whip-poor-will was detected in only a few recordings (1 recording: 3 sites, 2 recordings: 1 site, 3 recordings: 3 sites).

Fig. 7. Effects of sampling effort (recording length and number of recordings) for autonomous recording unit surveys of Eastern Whip-poor-wills (Antrostomus vociferus). Cumulative probability of detection was defined as whether an Eastern Whip-poor-will was detected in any of the randomly selected recordings in each of 100 bootstraps for each of the recording length and number of recording combinations. Lines are the mean model predictions from an asymptotic nonlinear least squares growth model. The sample size required to reach the predicted asymptote was calculated as the x-intercept of 99% of the asymptote.

DISCUSSION
We used time series ARU survey data from sites with known Eastern Whip-poor-will occupancy to revisit survey protocols for this species. Temperature and time relative to sunset had particularly strong effects on detectability, with detectability maximized at intermediate values of both. Moon altitude had positive effects on detectability, while day of year, wind speed, and two acoustic measurements of potential sound masking or attenuation from weather (StN: signal-to-noise ratio; PSD: power spectrum density) negatively affected detectability. We found that constraining surveys by optimal values of those detectability covariates was worthwhile only up until 10 recordings, at which point the cumulative probability of detecting an Eastern Whip-poor-will at each site was equal between constrained and unconstrained nocturnal recordings. The number of recordings required to reach an asymptote for proportion of sites with detections was between 81 and 97 recordings, depending on recording length; however, we did not find an asymptote for recording length.
The design and evaluation of survey methods should be performed in an objective-specific framework (O'Connor et al. 2000, Yoccoz et al. 2001, Nichols and Williams 2006), and so we recommend that survey protocols be selected based on the project objectives and logistical constraints (Table 2). Choosing between constrained versus unconstrained and human observer versus ARU surveys should therefore be done on a case-by-case basis because there are trade-offs for each choice. In general, human surveys have higher detectability because of a larger effective detection radius (Yip et al. 2017a, Darras et al. 2018), but we showed here that passive acoustic monitoring has higher overall detectability because it is more efficient for collecting time series data (among many other benefits; Rempel et al. 2013, Shonfield and Bayne 2017, Darras et al. 2019). Passive acoustic monitoring does, however, require specialized equipment and training, which can be unrealistic for survey programs that cover large study areas. On one end of the objectives spectrum, surveys requiring confirmation of species presence or absence should use protocols that maximize cumulative probability of detection (e.g., Pérez-Granados et al. 2018; Table 2). Undue harm to Eastern Whip-poor-wills is of particular concern to environmental managers because this species is listed as Threatened under Canada's Species at Risk Act, and nest disturbance is prohibited under Canada's Migratory Bird Conservation Act. Our results suggested that at least 80 recordings, each at least 5 minutes long, are required to reach an asymptote in cumulative probability of detection (Table 2). We note, however, that our asymptote was not at a cumulative probability of detection of 1.00, but rather at 0.75 because there were several sites at which an Eastern Whip-poor-will was only detected a handful of times. 
These detections could be of prospecting individuals; however, the date when most of the detections occurred (June) is beyond the territory settlement phase of the breeding period in this region. Instead, we suggest these are individuals at the extreme periphery of their territory, which is typically 2-10 ha (Fitch 1958), or that these males are non-territorial "floaters" that were not breeding (Hunt 2016).
It is possible that samples longer than the 5-minute recordings we used could increase the proportion of sites at which an Eastern Whip-poor-will was detected because we did not find an asymptote of cumulative probability of detection for recording length. Guidance is mixed on whether fewer long recordings (Rempel et al. 2013, Sugai et al. 2020) or more short recordings (Cook and Hartley 2018) result in higher detection probability.
Our results suggest that recordings should be at minimum two minutes long for occupancy modeling because our occupancy estimate for 1-minute recordings was higher than our other estimates, suggesting unstable parameterization.
On the other end of the spectrum, surveys for which the intended objective can accommodate imperfect detection can have much more flexible protocols (Table 2). Although maximizing cumulative probability of detection is desirable because model estimates have higher precision when detectability is maximized (MacKenzie et al. 2002, Diefenbach et al. 2007), there are ways to account for imperfect detection: hierarchical models in occupancy modeling (MacKenzie et al. 2002), detectability covariates in population trend estimation, e.g., observer ability (Link and Sauer 2016), and offsets or correction factors in density estimation (Buckland et al. 1995, Sólymos and Lele 2016, Sólymos et al. 2018), to name a few. If using passive acoustic monitoring, we suggest Eastern Whip-poor-will surveys should move beyond the previous reliance on moon phase and use an unconstrained protocol for most applications. We showed that cumulative probability of detection is maximized by conducting many unconstrained visits. In other words, using all survey data and accounting for imperfect detection with the appropriate covariates results in higher overall detectability than only surveying when conditions are optimal. If using this unconstrained approach, we recommend quantifying poor recording quality with measurements of signal-to-noise ratio and power spectrum density (Metcalf et al. 2020), as we showed both had significant effects on Eastern Whip-poor-will detectability.
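To make concrete how a hierarchical occupancy model separates occupancy from imperfect detection, the sketch below writes out the site-level likelihood of the basic single-season model (MacKenzie et al. 2002) with constant occupancy probability `psi` and constant per-visit detection probability `p`. It is an illustration only: the function names are hypothetical, covariates such as temperature or signal-to-noise ratio are omitted, and it is not the model fitted in this study.

```python
import math


def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 per visit).

    An occupied site yields the observed history with probability
    p^(detections) * (1 - p)^(non-detections). A history with no detections
    can also arise because the site is unoccupied (probability 1 - psi).
    """
    detected = sum(history)
    visits = len(history)
    occupied = psi * p ** detected * (1.0 - p) ** (visits - detected)
    unoccupied = (1.0 - psi) if detected == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods across sites; maximized over psi and p in practice."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


# Hypothetical detection histories for three sites, four recordings each.
histories = [[0, 1, 0, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
print(log_likelihood(histories, psi=0.7, p=0.4))
```

The all-zero history is the key case: more recordings per site shrink the `(1 - p)^visits` term, so a run of non-detections becomes stronger evidence of true absence.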
Using passive acoustic monitoring and an unconstrained approach to survey Eastern Whip-poor-wills and other nocturnal species should be feasible for most applications if equipment is available. ARUs are a valuable tool for surveying nocturnal species like nightjars (Frommolt and Tauchert 2014, Shonfield et al. 2018, Duchac et al. 2020) because nocturnal human surveys for nightjars are often restricted to roadsides for safety considerations (Takats et al. 2001), which can result in a biased understanding of habitat relationships, occupancy, and population size (Pankratz et al. 2017, Yip et al. 2017b). The time required to process ARU recordings can be an obstacle for passive acoustic monitoring; however, we showed here that automated recognition was an effective method for our study. Although our recognizer precision was not particularly high (0.45; compared to > 70% in Knight et al. 2017, Pérez-Granados and Schuchmann 2020, for other nightjar species), we did not find validation an onerous process, and automated processing was much more efficient than manual review for processing our dataset of over 650 hours of audio recordings. Furthermore, the majority (89.6%) of our false positives occurred during dawn chorus, which we sampled to ensure we captured the full range of Eastern Whip-poor-will availability for detection; practitioners using ARUs to monitor Eastern Whip-poor-wills could greatly increase the precision of the recognizer by omitting dawn chorus sampling. Our recognizer recall rate was low for individual calls (0.31; compared to 0.74, 0.85 in Pérez-Granados and Schuchmann 2020, for other nightjar species); however, the recall rate at the five-minute recording level was quite high (0.85), likely due to the high call rate of the Eastern Whip-poor-will. The recording-level recall rate was the evaluation metric of interest for our study because we used detection/non-detection at the recording level as our response variable, and so we are confident in our results. 
Studies that wish to use call rate or time-to-first detection as a response variable for applications like density estimation should explore more cutting-edge algorithms like convolutional neural networks that might yield higher recall values (Stowell et al. 2019).
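The gap between call-level recall (0.31) and recording-level recall (0.85) can be illustrated with a short calculation. The sketch assumes, purely for illustration, that the recognizer's misses are independent from call to call; the function name and the number of calls per recording are hypothetical and are not values measured in this study.

```python
def recording_level_recall(call_recall, calls_per_recording):
    """Probability that at least one call in a recording is detected,
    assuming each call is detected independently with `call_recall`."""
    return 1.0 - (1.0 - call_recall) ** calls_per_recording


# Call-level recall from the text; call counts are hypothetical.
for n_calls in (1, 5, 20):
    print(n_calls, round(recording_level_recall(0.31, n_calls), 3))
```

Under the independence assumption, only a handful of calls per recording is needed for recording-level recall to approach the reported value. Real recognizer errors are likely correlated within a recording (for example, a distant bird is missed repeatedly), so this should be read as an upper-bound intuition, not a prediction.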
On the other hand, surveys by human observers may be advantageous if there is a large survey area to cover and/or repeat visits are not feasible (Klingbeil and Willig 2015). For example, range-wide population monitoring programs like the North American Breeding Bird Survey and the Canadian Nightjar Survey rely on single-visit surveys by citizen scientists to monitor avian population trends (Downes et al. 2016, Hudson et al. 2017). Nightjar detectability during human observer surveys may also be improved by using call playbacks (e.g., Zuberogoitia et al. 2020). If surveys are conducted by human observers, the observation process should be constrained by detectability covariates because more than 10 repeat visits are unlikely for human observer protocols (Table 2). We showed temperature and wind have the strongest effects on Eastern Whip-poor-will detectability, with detections maximized at intermediate values.
Other authors have found effects of weather (wind, rain) on Eastern Whip-poor-will detectability (Farrell et al. 2017, Vala et al. 2020). Cool weather is likely particularly important for nightjar activity levels because many species, including Eastern Whip-poor-wills, are able to undergo partial torpor (Lane et al. 2004). Red-necked Nightjar (Caprimulgus ruficollis) detectability has also been shown to be particularly sensitive to temperature (Camacho 2013). We therefore recommend that the Canadian Nightjar Survey be constrained to between 13 and 20 °C. We also showed time relative to sunset had an effect on Eastern Whip-poor-will detectability, with detectability maximized at 4 hours after sunset; however, the confidence intervals on the effects of time relative to sunset were quite wide. We therefore recommend surveys be conducted anytime when the sun is more than 6 degrees below the horizon, that is, during nautical or astronomical twilight, or night. Finally, we showed that moon altitude had a positive effect on Eastern Whip-poor-will detectability, which is contrary to other studies that have found an effect of moon illumination (Cooper 1981, Mills 1986, Wilson and Watts 2006). The origin of using moon illumination instead of moon altitude to constrain nightjar surveys dates back to Mills (1986), who actually found a stronger effect of altitude but opted to recommend moon illumination because all moons with high illumination also have high altitude. We also recommend surveys continue to be constrained by moon illumination because it is a much easier constraint to interpret and implement, which is important for the success of monitoring programs based around citizen science (Parsons et al. 2011, McKinley et al. 2017).
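For human observer protocols, the temperature and darkness constraints recommended above could be encoded as a simple eligibility check. The sketch below is hypothetical: the function and parameter names are illustrative, the sun-altitude cutoff assumes the standard convention that nautical twilight begins once the sun is 6 degrees below the horizon, and the moon illumination constraint is left out because no threshold is stated here.

```python
def meets_survey_constraints(temperature_c, sun_altitude_deg,
                             min_temp_c=13.0, max_temp_c=20.0,
                             max_sun_altitude_deg=-6.0):
    """Return True when conditions fall inside the recommended window.

    temperature_c    : air temperature at survey time, in degrees Celsius.
    sun_altitude_deg : sun altitude relative to the horizon, in degrees
                       (negative values mean the sun is below the horizon).
    """
    temperature_ok = min_temp_c <= temperature_c <= max_temp_c
    dark_enough = sun_altitude_deg <= max_sun_altitude_deg
    return temperature_ok and dark_enough


# Hypothetical survey conditions.
print(meets_survey_constraints(15.0, -12.0))  # within both constraints
print(meets_survey_constraints(9.0, -12.0))   # too cold
print(meets_survey_constraints(15.0, -3.0))   # still civil twilight
```

Keeping the thresholds as keyword arguments lets a monitoring program adjust them for a different region or species without changing the check itself.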
Revisiting survey protocols for acoustic species is timely because of the rapid proliferation of passive acoustic monitoring, and our results on the context-dependency of survey recommendations suggest that such re-evaluation is warranted. Regular evaluation of survey protocols relative to desired outcomes, particularly for long-term monitoring programs, is also important because it encourages managers to revisit the objectives and design of the program as part of an adaptive management framework (Yoccoz et al. 2001, Nichols and Williams 2006, Lindenmayer and Likens 2009). We showed that constraining the observation process by detectability covariates may no longer be an optimal survey method because of the large volume of information that can be collected if constraints are not applied, as suggested previously by others. We suggest that revisiting survey protocols for other nocturnal species may be warranted because automated recognition is likely to work well for those species, and there are several examples already published (Zwart et al. 2014, Knight et al. 2017, Shonfield et al. 2018). The method we present here of using ARU data from sites of known occupancy to understand detectability trade-offs and refine survey protocols can be applied to other taxa. We suggest this approach could enhance wildlife management for other species because improved survey protocols can provide more efficient use of resources and more precise statistical estimates that inform management decisions.