Factors associated with automated detection of Northern Spotted Owl (Strix occidentalis caurina) four-note location calls



INTRODUCTION
Development and evaluation of automated identification methods can improve monitoring of rare species while advancing understanding of animal communities and nontarget species (Lesmeister et al. 2018, Stowell and Sueur 2020). Technological advances can address this challenge while creating a reproducible archive of data and analytical procedures. Passive acoustic monitoring is a rapidly developing field that is revolutionizing monitoring of vocal species (Potamitis et al. 2014, Pankratz et al. 2017, Shonfield et al. 2018, Stowell et al. 2019). Advances in recording technology support collection of vast amounts of data with minimal disturbance to sensitive species (Suter et al. 2017). However, given the massive quantities of audio collected in full-scale passive acoustic studies, software (in the general sense of computer algorithms) is required to identify target species. Truly fully automated systems are not yet available, and manual scanning of recordings is impractical across large temporal and spatial scales (Swiston and Mennill 2009, Wimmer et al. 2010, Dema et al. 2017). Analysis of long-term field recordings relies on semi-automated approaches because current recognition software is not sufficient for practical use with long field recordings (Kalan et al. 2015, Priyadarshani et al. 2018, Dema et al. 2020). Diversity in sampling design and statistical modeling approaches creates challenges associated with differences in design and modeling assumptions. For example, occupancy modeling assumptions of population closure and independence among sites are easily violated for data derived from passive detectors without careful consideration of the monitoring design (Dawson 2012, Devarajan et al. 2020). As a result, estimates of occupancy may be inaccurate (Efford and Dawson 2009, Neilson et al. 2018, Sollmann 2018).
To use passive approaches for estimating occupancy with minimal bias, the probability of detecting an animal must be known for all distances from the detector (Gibb et al. 2019, Sugai et al. 2020). Direct measurement of the detection distance and area, sometimes referred to as detection space, is needed because of the complex nature of sound transmission through the environment (Darras et al. 2016, Ostashev et al. 2018). Sound transmission through forests is affected by source characteristics, meteorological conditions, atmospheric absorption, terrain type, and vegetation cover (Wiley and Richards 1982, Naguib and Wiley 2001). Failure to account for factors influencing detection distance can bias estimates of the state process and lead to misinterpretations of activity patterns (MacLaren et al. 2018, Payne et al. 2010). Misinterpretations may, in turn, engender management decision errors that can jeopardize threatened and endangered species such as Northern Spotted Owls (NSO, Strix occidentalis caurina; Lesmeister et al. 2018).
Tests of automated software recognizer performance have been limited to determining levels of confidence in species detection (Chambert et al. 2018, Knight et al. 2017, Balantic and Donovan 2019). Effective detection distance has been calculated for human observers (Alldredge et al. 2007, Matsuoka et al. 2012) and for human observers listening to autonomous unit recordings (Yip et al. 2017b, Hingston et al. 2018, Stiffler et al. 2018), but direct measurement of detection distance relying on automated identification of the call of interest has received little attention (Knight and Bayne 2018). In these cases, estimates of detection distance might be derived directly from recordings if placement of microphones is quantified (Marques et al. 2013, K. Darras, B. Kolbrek, A. Knorr, and V. Meyer, unpublished manuscript). Compared to a human observer, automated detection of autonomous recording unit (ARU) recordings may have a smaller detection radius, resulting in different occupancy results if the area sampled is not correctly defined (Knight et al. 2017).
In the Pacific Northwest of the United States, landowners have surveyed NSO with call playback methods for more than three decades. Transitioning NSO surveys from traditional methods to ARU technology requires an understanding of detection distances to assess recordings with and without detections accurately (Duchac et al. 2020, Sugai et al. 2020). State and federal agencies have developed standardized survey protocols to minimize the impact of management activities on NSO. These protocols exploit the territorial nature of NSO by playing calls to elicit territorial responses. Survey station placement is guided by the assumption that a human can nominally hear a NSO up to 400 m. Although this assumption has not been evaluated empirically, it has been inferred from Forsman et al. (1984). Despite recognition that calculating effective sampling area is needed and the increasing effort toward passive acoustic monitoring, automated detector performance as a function of distance is not well understood.
Here, we investigated detection distance and coverage area with automated detectors for an intensely monitored species, NSO. Our objective was to examine the effect of playback distance on the ability of three algorithms to identify NSO four-note location calls at two sites in the Oregon Coast Range. We compared analysis results from two commercially available packages (Song Scope and Kaleidoscope, Wildlife Acoustics, Maynard, MA) and one publicly available deep convolutional neural network (CNN) from the U.S. Forest Service (Ruff et al. 2019). We present detection distance results in terms of site occupancy, coverage area, and analysis method (software).

METHODS

Data collection
We conducted our study in Douglas County, Oregon (Fig. 1) on the Coos Bay Operating Area owned by Weyerhaeuser. We installed single Wildlife Acoustics SM2+ ARUs from 2014-2019 near five previously known NSO nest locations. We chose two of these sites for distance testing to represent the range of vegetative conditions across NSO sites on the Coos Bay Operating Area (Table 1). Lower Cat Creek (LCC) was in a structurally complex 220-year-old stand composed of Douglas-fir (Pseudotsuga menziesii), western hemlock (Tsuga heterophylla), and western red cedar (Thuja plicata), and a hardwood understory of vine maple (Acer circinatum), bigleaf maple (Acer macrophyllum), Pacific madrone (Arbutus menziesii), and Pacific rhododendron (Rhododendron macrophyllum). Yew Ridge (YEW) was in an open 100-year-old stand of smaller, more numerous Douglas-fir and hemlock and lacked the complex, multi-canopy structure of LCC. Although LCC occurred in a stand with much larger trees than YEW, hardwood shrubs often limited visibility to less than 10 m. The shrub layer was sparse at YEW, with visibility occasionally > 100 m (Fig. 1).
To increase the sampled area, we connected two Wildlife Acoustics SMX-II omnidirectional microphones to each SM2+ ARU. We placed microphones at the end of 100 m of cable in opposite directions, North and South at LCC and East and West at YEW. Because of topographic variation, we placed microphones 162 m and 169 m apart at LCC and YEW, respectively, with the SM2+ ARU located between them. To maximize the omnidirectional capabilities of the SMX-II microphones, we affixed the microphones to small-diameter trees 1.5 m above the ground. We placed tear-resistant mesh screens around the microphones to prevent damage from rodents and birds. In early 2018, we added a third microphone at YEW west elevated (west elev) 10 m directly above and facing the same direction as the West microphone. On a quarterly basis, we used an Amprobe SM-CAL1 Sound Meter (Everett, Washington) to verify that the microphones were within factory specifications.

For each trial, we played recorded NSO four-note location calls (Bowles et al. 2002) at 1.5 m height using the same FoxPro ZR2 Electronic Game Caller (FoxPro, Inc., Lewistown, Pennsylvania). We refer to the set of calls as a trial, and all trials performed on a day as a test. The theoretical communication limit of a NSO call broadcast in a forest under these conditions is between 300 and 600 m (Fig. 2).

[Fig. 2 caption fragment: attenuation curves follow Dailey and Redman (1975), with ranges from 330-800 m; the 98 dB at 1 m curve is from Bowles et al. (2002); the 91 dB at 1 m curve is from the first author (personal observation). The horizontal black line is the minimum background noise level from all distance-testing trials; the horizontal gray line is the theoretical discrimination level based on a signal-to-noise ratio 9 dB below the minimum background noise level (Lengagne and Slater 2002).]

Trials prior to 2018 generally consisted of a single recording, with 5 calls played first toward and then 5 calls played away from the microphone.
Trials in 2018 and 2019 consisted of 10 calls played in each direction, with separate recordings for each direction. The number of calls played during each trial varied from these numbers only when a trial was repeated. We repeated trials if the person triggering the SM2+ stopped a recording too soon or if external noise, such as truck traffic or aircraft, had the potential to introduce additional background noise. Given the topography and conditions, it was possible to hear vehicles for several minutes before the traffic sounds exceeded background levels and to complete the trial without excessive noise. However, even if a trial was repeated, we processed all recordings and included them in our analysis, including those calls with increased background noise or when a recording stopped prematurely. Each call lasted 4 seconds with 3 seconds between calls. The SM2+ recorded stereo WAVE files with a sampling frequency of 16,000 Hz, using a 48 dB pre-amp with no additional gain and no high- or low-pass filtering.
No owls of any species were heard during testing. Various corvid species were heard commonly and appear in the recordings. A single NSO was seen at LCC during one test but was never heard vocalizing. The owl flew within visual distance of the observers but did not stop. We suspended testing for a period we deemed sufficient to minimize negative effects of repeated calling on the NSO.

Data processing
Song Scope and Kaleidoscope allow detection of individual calls and, thus, calculation of how probability of detection varies with distance. Both Wildlife Acoustics packages identify calls or events of interest that must be validated by listening to or looking at the event. Events identified by the recognizer were labelled as NSO or left blank. We used the simple clustering algorithm in Kaleidoscope. As with Song Scope, events were labelled as NSO or left blank. Song Scope and Kaleidoscope settings are in Appendix 1. Recall was calculated as the number of calls detected divided by the number of calls broadcast. Precision was calculated as the number of calls detected divided by the sum of calls detected and false positives produced by the software.
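The recall and precision calculations described above can be expressed as follows (a minimal sketch; the function names are ours, not part of either software package):

```python
def recall(n_detected: int, n_broadcast: int) -> float:
    """Recall: confirmed detections divided by calls broadcast."""
    return n_detected / n_broadcast


def precision(n_detected: int, n_false_positive: int) -> float:
    """Precision: confirmed detections divided by detections plus false positives."""
    return n_detected / (n_detected + n_false_positive)


# Illustrative values: 45 of 100 broadcast calls detected, with 5 false positives.
print(recall(45, 100))    # 0.45
print(precision(45, 5))   # 0.9
```

Note that because every broadcast call is known in this study design, recall can be computed directly; in operational recordings the denominator of recall is unknown.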
The CNN method was developed specifically to streamline the task of identifying owl vocalizations in the Pacific Northwest and has the potential to minimize time spent validating detections (Ruff et al. 2019). This method, implemented in Python (Python version 3.7, Python Software Foundation), converts an audio file into a spectrogram and then applies an image classification system trained on labelled data sets (Salamon and Bello 2017) to produce a probability that the clip contains a call of interest. Comparing the probability to an arbitrarily chosen score (threshold) allows transformation of the probability scores into occupancy scores, which can then be evaluated for variation by distance. Used this way and coupled with our study design, the CNN method is fully automated, although its developers did not envision implementation in this manner.
The CNN method does not directly identify events, but rather generates a probability of detection within an audio clip. We modified the CNN process described in Ruff et al. (2019) to apply a 12-second window moving in 1-second increments (the original version moved a 12-second window in 12-second increments). We chose 1-second increments because the 12-second window does not neatly fit our recording lengths, and it gave us the ability to create a more restrictive threshold of six consecutive windows exceeding the chosen score. The CNN process returns a prediction score between 0 and 1 for six owl species and noise. The seven prediction scores for each 12-second window sum to 1.
Because our calls were 4 seconds in length and were played approximately 3 seconds apart, each 12-second window may contain 1 complete call, 2 complete calls, or 1 call and parts of 2 other calls. It is therefore likely that each 12-second moving window contains at least a portion of 1 call and could include parts of up to 3 calls, making the number of calls identified difficult to obtain and interpret for this data set.
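The modified moving-window scheme can be sketched as follows (a simplified illustration; the CNN scoring step itself is omitted, and the function names are ours):

```python
def window_starts(duration_s: int, win_s: int = 12, hop_s: int = 1):
    """Start times (s) of a win_s-second window moved in hop_s-second steps."""
    return list(range(0, duration_s - win_s + 1, hop_s))


def extract_windows(samples, sr: int, win_s: int = 12, hop_s: int = 1):
    """Yield (start_s, window_samples); each window would be scored by the CNN."""
    duration_s = len(samples) // sr
    for start in window_starts(duration_s, win_s, hop_s):
        yield start, samples[start * sr : (start + win_s) * sr]


# A 60-s recording at 16 kHz yields 49 overlapping 12-s windows (starts 0-48 s),
# versus only 5 windows with the original non-overlapping 12-s hop.
sr = 16_000
samples = [0.0] * (60 * sr)
print(len(list(extract_windows(samples, sr))))   # 49
print(len(window_starts(60, hop_s=12)))          # 5
```

The 1-second hop is what makes a "six consecutive windows" criterion meaningful: with the original 12-second hop, consecutive windows share no audio.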

Statistical analysis
Summarizing the Song Scope and Kaleidoscope results to the microphone, trial point, and software level resulted in 190 total data points (five microphones [two at LCC, three at YEW] × 19 trial points × 2 software packages). We calculated slope distance from each trial point to each microphone to account for the direct-line distance travelled by the sound waves between the test point and the microphone. Between 99 and 161 total calls were played at each test point throughout the nine tests. We counted how many of these calls resulted in confirmed detections by each software package from the recording taken on each microphone. Using a generalized linear model with a quasibinomial distribution and logit link, we modeled the number of detected calls (successes) and the number of non-detected calls (failures). This models recall as defined in Priyadarshani et al. (2018). Covariates in the model were microphone, software, and slope distance. The microphone covariate uniquely identifies locations (north and south microphones occur at LCC; east, west, and west elev microphones occur at YEW) and was included to adjust for site-specific differences seen in the data. The quasibinomial distribution allowed an overdispersion parameter to be fit, because we saw more variation in our data than would be expected under a simple binomial model (Dunn and Smyth 2018). With the model output, we estimated the probability of detecting a call based on the microphone, the distance from which the call was played, and the software used to analyse the recording (Table A2.1). We also compared the probability of detecting a call based on whether the broadcast direction of the playback was toward or away from the microphone.
To estimate this probability, we summarized our data to the microphone, test point, software, and broadcast direction level, resulting in 370 total data points (after dropping the pre-2018 trials, in which the toward and away calls could not be separated because they existed in a single recording). We again used a generalized linear model with a quasibinomial distribution and logit link to model the probability that a call was detected, with broadcast direction, software, distance from the microphone, and microphone as covariates. In addition to modeling the probability of successfully detecting calls, we summarized our data to an occupancy level for each unique recording for each software package, with a value of 1 if at least one call in the recording was detected by Kaleidoscope or Song Scope, respectively, and a value of 0 if no calls were detected in the recording. Except when noted, we used R for data processing, summarizing, modeling, and graphical displays (R Core Team 2019).
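The logit link in the fitted GLM implies detection probabilities of the form below. This is a sketch of the inverse-logit transformation only; the intercept and distance coefficient shown are hypothetical placeholders, not the values estimated in this study:

```python
import math


def detection_prob(intercept: float, beta_distance: float, distance_m: float) -> float:
    """Inverse-logit: P(detect) from a logit-link GLM linear predictor.

    The coefficients passed in are hypothetical placeholders, not the
    fitted values reported in this study (which also included microphone
    and software effects).
    """
    eta = intercept + beta_distance * distance_m  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))


# With a hypothetical intercept of 2.0 and slope of -0.02 per metre,
# detection probability declines smoothly with slope distance.
for d in (0, 100, 200):
    print(round(detection_prob(2.0, -0.02, d), 3))
```

Any covariate combination (microphone, software, direction) simply contributes additional terms to the linear predictor `eta` before the inverse-logit is applied.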
The CNN data processing was slightly different because the CNN system was not designed to detect individual calls as broadcast in our trials. Our broadcast calls were ~4 seconds in length and were played in relatively quick succession (~3 seconds between calls), meaning that multiple calls could have occurred within a 12-second moving window. Thus, any single score from the CNN processing represents a probability of detecting at least 1 call within that 12-second window. Given this distinction, we summarized the CNN data to an occupancy level, with a simple 0/1 score for each unique recording, using several threshold values. We examined four threshold levels to assess how areal coverage changed with the threshold. We considered a recording to have occupancy if any single CNN score surpassed a threshold of 0.9 or 0.95. We also considered two rolling thresholds in which occupancy was recorded only if a recording had at least six consecutive scores of 0.9 or greater, or 0.95 or greater. We did not validate occupancy determination for the CNN method. These thresholds represent the most restrictive levels used in Ruff et al. (2019). We compared the performance of Song Scope, Kaleidoscope, and these four CNN score thresholds for their ability to correctly identify occupancy at a recording level.
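The two occupancy rules described above (any single score exceeding the threshold versus six consecutive scores exceeding it) can be sketched as follows; the score series is invented for illustration:

```python
def occupancy_single(scores, threshold):
    """1 if any window score meets or exceeds the threshold, else 0."""
    return int(any(s >= threshold for s in scores))


def occupancy_rolling(scores, threshold, run_length=6):
    """1 only if at least run_length consecutive scores meet the threshold."""
    run = 0
    for s in scores:
        run = run + 1 if s >= threshold else 0
        if run >= run_length:
            return 1
    return 0


# A made-up score series: five (not six) consecutive high scores.
scores = [0.2, 0.96, 0.97, 0.96, 0.95, 0.99, 0.3, 0.96, 0.1]
print(occupancy_single(scores, 0.95))        # 1
print(occupancy_rolling(scores, 0.95, 6))    # 0
```

The rolling rule is stricter by construction: any recording scored 1 by the rolling rule is also scored 1 by the single-score rule at the same threshold, but not vice versa.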

RESULTS
We collected 1300 unique recordings across two sites, five microphones, 19 trial points, and nine tests per site. Most recordings (1273) …

Although the modeled ability of the software programs to detect calls from increasing distances followed a smooth decay, individual microphone results varied with the direction of the trial point to the microphone. For example, the north microphone at LCC had three trial points that were approximately 100 m from the microphone (100.7 m, 101.6 m, and 105.4 m). The naive detection rates were 0.7, 0.3, and 0.5 for these three trial points, respectively (Fig. 3). At YEW, observed Kaleidoscope detection rates at the east microphone were constant for all trial points to the east, whereas the trial points at distances along the south line showed greater detection rates than the corresponding distances along the north line (Fig. 4, top). The LCC north microphone detected a higher proportion of calls from the west 200 m point than from the 50 m, 100 m, and 150 m west points (Fig. 4, bottom).
Model results showed some evidence for a difference in the ability of Song Scope and Kaleidoscope to detect NSO calls when broadcast toward versus away from the microphone (mean effect estimate 0.249; 95% CI: 0.008-0.490). Unsurprisingly, detection was greater for calls played facing the microphone. Excluding calls played directly next to the microphone (distance of 0 m and assigned to the "toward" group), Kaleidoscope and Song Scope correctly identified occupancy in 46% and 48% of tests played toward the microphone, respectively, and 40% and 43% of calls played away from the microphone, respectively. The increase in detection probability because of broadcast direction was an order of magnitude smaller than the effect of slope distance over the range of distances we tested and similar in magnitude to the location effect (Appendix 3).

[Figure caption fragment (see also Figure A2.2), emphasizing differences in detection probabilities for three trial points, Douglas County, Oregon: calls from the east 50 m point are more than two and six times as likely to be detected as calls from the north 100 m and east 100 m points, respectively. Small dots are all other trial points.]
In a specific recording, the CNN method correctly determined occupancy more often than either Kaleidoscope or Song Scope. Occupancy is the main quantity of interest at an operational level for surveys associated with management around threatened or endangered species. We compared relative percentages of nondetection/detection across four CNN threshold options to the Kaleidoscope or Song Scope simple detections (Table 3). For all threshold options, few cases existed where Kaleidoscope or Song Scope achieved a detection that was not also detected by our CNN threshold. The largest discrepancy occurred when comparing Song Scope to the rolling 0.95 threshold (our most stringent CNN threshold): in less than 1% of cases, Song Scope detected at least one call in a recording while the CNN score did not have any spikes exceeding the threshold. The reversed comparison showed that the CNN method correctly identified 13%-25% of recordings that either Kaleidoscope or Song Scope classified as having no calls. The two Wildlife Acoustics packages were consistent in characterizing the file state: Kaleidoscope correctly identified 3% of files that Song Scope misidentified, whereas Song Scope correctly identified 5% of files Kaleidoscope misidentified.

Fig. 4. Proportion of Kaleidoscope detections by distance from trial points at the Yew Ridge East (top) and Lower Cat Creek North (bottom) microphones, Douglas County, OR, USA. The total proportion of calls identified correctly varies with direction. The proportion of detections from any trial point for both microphones was dissimilar to trial points at the same distance in the other three directions. Microphones are indicated by the white cross (+) inside the dot. Trial points are shown in Figure 1.
As a function of distance, the CNN method outperformed both Kaleidoscope and Song Scope. Comparing the rolling 0.95 CNN threshold to Kaleidoscope and Song Scope, the CNN generally had higher rates of detecting occupancy in a recording (Fig. 5). At 50 m, the rolling 0.95 CNN threshold correctly detected occupancy in 95.6% of recordings, whereas Kaleidoscope and Song Scope correctly detected occupancy in 94.1% and 95.6% of recordings, respectively. At 100 m, the rolling 0.95 CNN threshold correctly detected occupancy in 86.3% of recordings, whereas Kaleidoscope and Song Scope correctly detected occupancy in 73.5% and 76.0% of recordings, respectively. At 200 m, the rolling 0.95 CNN threshold correctly detected occupancy in 65.5% of recordings, whereas Kaleidoscope and Song Scope correctly detected occupancy in 44.8% and 48.3% of recordings, respectively. Although relatively low detection rates (less than 25% for points greater than 300 m) occurred at the farthest distances with the CNN method, occupancy rates were still higher at these distances with CNN than with Kaleidoscope or Song Scope. At LCC, the estimated mean detection probability from the model did not reach 90% with either Song Scope or Kaleidoscope, even at a distance of 0 m. For LCC, the maximum estimated detection probability was 0.88, estimated for the south microphone using Song Scope at a distance of 0 m (playback occurred adjacent to the microphone). Areal coverage for all microphones at YEW, when requiring 90% of calls to be identified, was less than 1 ha.

DISCUSSION
If automated signal detection algorithms are available to process the data efficiently and accurately, ARUs have the potential to improve detection of sensitive species in monitoring programs. Our evaluation of NSO detections by ARUs across a range of distances from calling stations, processed by three methods, yielded two important findings. First, probability of detection degraded with distance, but in a manner dependent on the location of the source relative to the microphone. For an estimated detection rate of at least 50%, distances ranged from 70-150 m. Second, detection distance depended on the analysis method. Although our study was limited to two sites and five microphone locations, the results raise important questions regarding areal coverage. Our results are not prescriptive, but they suggest that extended evaluation of ARUs is warranted before they are deployed broadly as replacements for human observers and protocols with well-documented and consistent detection probabilities (Olson et al. 2005, Farber and Kroll 2012, Kroll et al. 2016).

Detection probability at similar distances varied with the path the sound waves travelled to reach the microphone (Fig. 4). The variable effect of a non-homogeneous medium on sound propagation has been demonstrated experimentally (Yip et al. 2017a) and through simulation (Royle 2018). Some of the difference may be explained by minor microclimatic variation because these call broadcasts were often separated by 45-60 minutes and occurred across a range of weather conditions (Larom et al. 1997). We suspect that topography and canopy layers between the LCC east 50 m test point and the north microphone interacted to scatter the sound waves and create positive interference, thereby increasing the sound pressure level (SPL; Balogh et al. 2004, Ostashev et al. 2018). The six-fold variation in detection probability shown in this example was not atypical (Fig. 3) and demonstrates the need for randomized trials to assess detection distance properly.
To detect a call in a recording, the SPL of the arriving call must be of sufficient strength to exceed the background noise level minus a discrimination threshold (Lengagne and Slater 2002). The relationship between relative SPL and distance is a complex interaction between topography and vegetation, with neither calculated nor actual sound pressure levels uniformly decaying with distance (Aylor 1977, Price et al. 1988). Non-uniform variability of detection probability with distance indicates differences in topography and vegetation between individual trial points and the microphone. A properly designed study with random placement of ARUs will average out the shadows. However, such rigor is not always possible. Demographic and project specific monitoring of NSO is transitioning to sampling based on fixed locations within a predetermined grid, which precludes random selection of ARU locations (U.S. Fish and Wildlife Service, unpublished manuscript). In such cases, a detailed inspection of the study area to identify audio "shadows" that are not readily apparent (Fig. 6) should be part of the process. Owls located in these shadows might appear to be calling less frequently, leading to incorrect assumptions of occupancy or nesting status (Duchac et al. 2020). Similarly, relying on the SPL to calculate distance from a target without reliable calibration (Yip et al. 2020) could result in biased estimates from distance sampling techniques.
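The detectability condition described above can be sketched under idealized spherical spreading; the source level, background level, and excess-attenuation constant below are illustrative assumptions, not measured values, and real forest propagation does not decay this uniformly:

```python
import math


def spl_at_distance(spl_1m_db: float, distance_m: float,
                    excess_db_per_100m: float = 2.0) -> float:
    """Idealized received level: spherical spreading (-20*log10(d)) plus a
    linear 'excess attenuation' term standing in for vegetation and
    atmospheric losses. The excess term is a hypothetical constant."""
    if distance_m <= 1:
        return spl_1m_db
    return (spl_1m_db
            - 20 * math.log10(distance_m)
            - excess_db_per_100m * distance_m / 100)


def detectable(spl_1m_db: float, distance_m: float, background_db: float,
               discrimination_db: float = -9.0) -> bool:
    """A call is detectable if its received level exceeds the background
    noise level plus the discrimination threshold (here -9 dB)."""
    return spl_at_distance(spl_1m_db, distance_m) >= background_db + discrimination_db


# A 98 dB (at 1 m) call against a hypothetical 40 dB background:
print(detectable(98, 300, 40))   # True
print(detectable(98, 600, 40))   # False
```

Under these assumed values the detection limit falls between 300 and 600 m, consistent in order of magnitude with the theoretical communication limit noted in the methods; in practice, topographic shadows make the real boundary highly irregular.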
Area covered by all five microphones varied, and close examination of Fig. 4 suggests that the shape also varied. Although our data suggest that the detection space was not circular, calculating coverage in this manner is useful to demonstrate the microphone- and software-level differences. At the microphone level, and assuming a detection rate of 50% was acceptable, the circular area covered at all YEW microphones was twice that at LCC. The areal coverage differences among software were all < 2 ha. Using a more complicated geometry, such as an ellipse, might provide a more realistic descriptor of areal coverage, but remains an incomplete description. For example, the YEW east microphone elliptical area depends not only on the software, but also on the selection of the axes. Areal coverage from the YEW east microphone ranged from 5.2 ha (primary axis east/west, secondary axis determined from north trial points using Kaleidoscope) to 11.8 ha (primary axis east/west, secondary axis determined from south trial points using Song Scope). Even more complicated geometries would be required for microphones with large detection shadows such as LCC north, where the west 200 m test point detection rate approached 50% but the west 100 m and west 150 m detection rates were less than 30%. Such geometries are conceptually possible and serve to illustrate the problem of relying on counts of vocalizations to determine density or occupancy without a detailed analysis of the detection area (Llusia et al. 2011). Studies of detection distance assume a point source emitting a uniform spherically spreading wave front, an assumption that is not likely to be met in a practical application (Marten and Marler 1977, Larom et al. 1997). Our modeling suggests that calls broadcast toward the microphone are more likely to be detected than calls broadcast away from the microphone.
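The circular and elliptical coverage calculations above reduce to simple geometry; the radii and semi-axes below are illustrative values, not the fitted detection distances from this study:

```python
import math


def circle_area_ha(radius_m: float) -> float:
    """Area (ha) of a circular detection space with the given radius."""
    return math.pi * radius_m ** 2 / 10_000


def ellipse_area_ha(semi_major_m: float, semi_minor_m: float) -> float:
    """Area (ha) of an elliptical detection space with the given semi-axes."""
    return math.pi * semi_major_m * semi_minor_m / 10_000


# Hypothetical example: a 150 m circular radius versus an ellipse whose
# secondary axis is constrained by a detection shadow.
print(round(circle_area_ha(150), 1))       # 7.1
print(round(ellipse_area_ha(200, 100), 1)) # 6.3
```

The comparison illustrates why axis choice matters: an ellipse with a longer primary axis can still cover less area than a circle if a shadowed secondary axis is short.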
Calls broadcast with the FoxPro game caller in this study were certainly directional, with more energy projected in the direction of the speaker. Similarly, avian anatomy produces a directional wave with more energy in the direction the bird is facing (Fletcher and Tarnopolsky 1999). The exact details of how this effect is manifested in NSO calls would require direct observation of wild birds (Patricelli et al. 2007). Studies that rely on the number of calls detected to determine distance will perceive birds calling in the direction of the microphone to be closer than birds facing away from the microphone. Although we could assume random selection of call direction, the territorial nature of NSO suggests that they direct calls with a behavioural response. For example, an owl delivering prey to a nest may call facing the nest, whereas its mate will reply in the direction of the returning owl. A microphone in line with the two birds may detect more calls from the more distant bird because those calls are directed at the microphone. A study with a sufficient number of randomly placed ARUs would avoid these biases. Any interpretation of NSO site status based on the number of calls detected from fixed grid designs will need to account for the microphone locations and position relative to the NSO (Wiley and Richards 1978). Further work is needed to develop recognizers for a wide variety of NSO call types, in particular the nest and contact calls used by pairs around nests and during prey exchanges.
In addition to topographic considerations, the area of effective coverage depends on the percentage of correctly identified calls (Table 4 and Appendix 5), which in turn is a function of the choice of classification threshold (Knight and Bayne 2018). Using the Wildlife Acoustics software with fixed score thresholds and accepting that only 10% of broadcast calls need to be identified correctly results in areal coverage roughly half the size of traditional surveys, while noting that the traditional coverage area may be upwardly biased (Zuberogoita et al. 2011, Berigan et al. 2019). Requiring 90% of calls to be identified correctly reduces the areal coverage to less than 1 ha in all YEW cases and to no coverage in both LCC cases. With the CNN method, varying the score threshold demonstrates the same effect: a lower score increases the area covered (Knight and Bayne 2018). A low detection probability may be acceptable given a sufficiently long temporal scale. Pre-deployment planning will be needed to reconcile the trade-offs between the time required to process recordings, areal coverage, and length of recording deployment.
The detection area of NSO by ARUs might be different than we have determined for several reasons. Owls call with variable intensity (Bowles et al. 2002); minor differences may have a significant effect because of the inverse-square law decay of sound pressure level. In addition, calls broadcast higher in the canopy may carry further (Naguib and Wiley 2001, Darras et al. 2016).
We also note that our microphones were placed to maximize the chance of being within range of calling NSOs if they were at their pre-2014 nests, not to maximize the detection space for all possible owl occurrences and call types. Ideal ARU placement locations likely exist that maximize detection range, but choosing locations in such a manner violates the assumption of random placement of sampling points (Buckland et al. 2004). All trials were conducted during the day, but atmospheric and meteorological conditions are more favourable for transmission at night (Wiley and Richards 1982), when owls are more active and traditional surveys are conducted. The lower background noise at night should result in increased detection distances for all methods. Finally, we used the Wildlife Acoustics SM2+ with SMX-II microphones, which have been shown to have shorter detection distances than other ARUs (Darras et al. 2020, Yip et al. 2017b).
Many studies have demonstrated that manual review of recordings is superior to any automated process (Digby et al. 2013, Wilhite et al. 2020). CNN methods in particular are sensitive to the presence of multiple overlapping species (Ruff et al. 2019).
We acknowledge this possibility and explicitly chose not to compare manual performance with the semi-automated and automated processes. Manual review of recordings at an operational scale is impractical and inefficient (Knight et al. 2017), although new approaches using long-duration false-color spectrograms (Towsey et al. 2018) and statistical post-processing methods (Donovan 2020, Knight et al. 2020) may be effective in ameliorating this problem. The semi-automated Song Scope and Kaleidoscope processes make the logistics of reviewing recordings tractable, condensing the 11 hours of recordings reported on in this paper to 2.1 (Song Scope) and 1.5 (Kaleidoscope) hours of analysis per method (Appendix 1). Greater time savings have been reported elsewhere (Digby et al. 2013). It is worth noting that, if the goal of operational surveying is to confirm occupancy of a species, it is unlikely that all recordings would need to be reviewed, because site occupancy can often be determined with a high degree of accuracy from a subset of observations (Sliwinski et al. 2016). The review time needed for CNN methods is trivial in comparison and represents a substantial advance (Ruff et al. 2019). Further reductions in validation time might be achieved by combining multiple algorithms to cross-validate the results (Brooker et al. 2020).
Understanding false-positive detection rates is an important issue that should be examined before these units are used in operational settings (Priyadarshani et al. 2018). Kaleidoscope and Song Scope are both semi-automated processes, with an observer labeling each detection. The simple nature of a single NSO call type and the controlled field conditions used in this study resulted in estimates of precision that likely overstate the true performance of the recognizers in uncontrolled situations (Knight and Bayne 2018). False positives were eliminated from the CNN method by a combination of study design and the length of the processing window. Caution should be taken when translating these results to an operational setting, and further examination of false-positive rates for these methods should be undertaken with a study designed for that purpose. To implement the CNN (or any method) in a fully automated fashion, the false-positive rate needs to be near zero. Minimizing the number of CNN false positives may be possible through augmentation of the training data to focus on NSO calls and careful analysis of score thresholds.
The detection distances shown here may be applicable to a small set of other NSO and Barred Owl (Strix varia) calls, such as the NSO series location call and the Barred Owl 8-note call. Although less is known about the power associated with these calls, both are similar in tone, frequency, and amplitude characteristics to the four-note location call (Forsman et al. 1984, Odom and Mennill 2010). Other NSO call types such as barks, whistles, and contact calls, or the calls of other owl species, have considerably different acoustic characteristics, and as such, the detection distances shown here may not be applicable to those call types (Schieck 1997, Steenweg et al. 2019, Stowell et al. 2019). To calculate occupancy of NSO sites based on detection distance, our results indicate that site characteristics should be considered and appropriate caution taken when evaluating new sites. Testing of detection distance needs to consider how site characteristics vary with direction and distance to properly describe the area being sampled. The area available for deployment is likely a function of the proposed study boundaries or management area and, as such, not controllable. Microphone placement within sites is controllable and warrants detailed pre-deployment planning. Analytical methods are also controllable and are improving through the development of machine learning techniques that can rapidly and efficiently process large datasets (Stowell et al. 2019). Although further advances in machine learning may enable detection of owls across areas similar to those used in traditional surveys, a detection radius of 300 m with the CNN method covers an area only one half as large as that assumed in traditional surveys used to make management decisions. However, biases inherent in our study design and the benefit of long-term ARU deployment suggest that monitoring with ARUs has the potential to meet or exceed the efficacy of traditional survey methods.
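The coverage comparison is simple circular-area arithmetic. Only the 300 m CNN radius comes from this study; the implied traditional-survey radius below is a back-calculation from the stated one-half ratio, not a value reported here.

```python
import math

def circle_area_ha(radius_m):
    """Area of a circular detection zone in hectares."""
    return math.pi * radius_m ** 2 / 10_000.0

# A 300 m detection radius (the CNN result discussed above)
cnn_area = circle_area_ha(300.0)

# If that covers half the area assumed in traditional surveys, the
# implied traditional radius is 300 * sqrt(2), roughly 424 m
# (a back-calculation for illustration).
traditional_radius = 300.0 * math.sqrt(2.0)
traditional_area = circle_area_ha(traditional_radius)

print(f"CNN: {cnn_area:.1f} ha; implied traditional: "
      f"{traditional_radius:.0f} m, {traditional_area:.1f} ha")
```

Because area scales with the square of the radius, a modest increase in detection distance (here, a factor of about 1.4) is enough to double the area covered.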
We have identified several areas that warrant further study, including the effect of directionally broadcast calls, location of receiver relative to caller, and method of analysis.
Responses to this article can be read online at: https://www.ace-eco.org/issues/responses.php/2105

Appendix 1. Wildlife Acoustics (Maynard, Massachusetts, USA) software settings