Detectability of migrating raptors and its effect on bias and precision of trend estimates

Annual counts of migrating raptors at fixed observation points are a widespread practice, and changes in numbers counted over time, adjusted for survey effort, are commonly used as indices of trends in population size. Unmodeled year-to-year variation in detectability may introduce bias, reduce precision of trend estimates, and reduce power to detect trends. We conducted dependent double-observer surveys at the annual fall raptor migration count at Lucky Peak, Idaho, in 2009 and 2010 and applied Huggins closed-capture removal models and information-theoretic model selection to determine the relative importance of factors affecting detectability. The most parsimonious model included effects of observer team identity, distance, species, and day of the season. We then simulated 30 years of counts with heterogeneous individual detectability, a population decline (λ = 0.964), and unexplained random variation in the number of available birds. Imperfect detectability did not bias trend estimation, and increased the time required to achieve 80% power by less than 11%. Results suggested that availability is a greater source of variance in annual counts than detectability; thus, efforts to account for availability would improve the monitoring value of migration counts. According to our models, long-term trends in observer efficiency or migratory flight distance may introduce substantial bias to trend estimates. Estimating detectability with a novel count protocol like our double-observer method is just one potential means of controlling such effects. The traditional approach of modeling the effects of covariates and adjusting the index may also be effective if ancillary data are collected consistently.


INTRODUCTION
Population monitoring is essential to avian conservation (Finch and Martin 1995, Dunn 2002). The North American Breeding Bird Survey (BBS) has proven to be an effective monitoring method for many species, but trend estimates for many raptors (Accipitriformes and Falconiformes) that breed in the remote northern reaches of the continent not covered by the BBS are unreliable (Dunn et al. 2005). Breeding season surveys of North American raptors can be difficult and costly because raptors breed at low densities over large ranges (Fuller and Mosher 1981). Species that breed in forests and do not confront intruders may be easier to observe on migration (Fuller and Mosher 1987).
During migration, concentrations of visible raptors occur at predictable locations (Kerlinger 1989, Zalles and Bildstein 2000, Bildstein 2006). At such locations, termed watch-sites, observers record the numbers of each raptor species that pass overhead hourly or daily during migration (Zalles and Bildstein 2000, Bildstein et al. 2008). HawkCount (http://hawkcount.org/), an online data repository for watch-site counts in North and Central America, lists 203 watch-sites reporting in the last two years, or with more than five years of counts (E.G. Nolte, personal observation, May 2016). Zalles and Bildstein (2000) listed 58 watch-sites outside North America.
A watch-site count is a two-stage sample. In the first stage, raptors must fly past the watch-site while an observer is present; only these raptors are available to count. In the second stage, the available raptors may either be counted by an observer or pass unseen. The count is the product of the number of available raptors and their probability of detection (Dunn and Hussell 1995, Nichols et al. 2009).
Migration counts are an index and cannot be used to estimate population size (Fuller and Mosher 1981, 1987, Dunn 2005); however, migration counts are thought to change roughly in proportion to change in population size, so population trends may be estimated (Dunn and Hussell 1995, Dunn 2005, Farmer et al. 2007, Farmer and Hussell 2008). If, contrary to our assumptions, the index does not change proportionately to change in the population, perhaps because of trends in availability or detection, then trend analysis may be inconclusive or even misleading (Thompson 2002, Johnson 2008). Field workers and analysts collectively spend thousands of hours every year to produce this index, so the underlying assumption deserves rigorous testing.
Two previous studies have examined the factors affecting probability of detection at raptor migration watch-sites. First, Sattler and Bart (1984), working at the Derby Hill watch-site on the shoreline of Lake Ontario in New York, compared the numbers of hawks counted by one observer tasked with watching the entire sky with six times the number counted by a second observer watching only one-sixth of the sky at a time. They found that detectability varied by observer attentiveness, flight density, flight visibility, and species. Specifically, they found that higher birds were less detectable than lower birds and that the observer was more attentive and detected raptors with greater efficiency in dense flights. Furthermore, raptor species that typically soared were detected at higher rates than species that often did not soar.
Second, Berthiaume et al. (2009), at the Observatoire d'oiseaux de Tadoussac, on the shoreline of the Saint-Lawrence estuary in Québec, used a double-observer approach (Nichols et al. 2000) to assess the relative effects of flight behavior and weather. Species affected detectability, with small species having lower detectability than large species. For most species, birds at eye level were most detectable, and detectability decreased with increasing altitude. Cloud cover increased the detectability of high-flying raptors while decreasing the detectability of raptors at lower altitudes. Additionally, the number of raptors migrating in a group had a significant positive effect on detectability. Wind direction and speed, cloud cover, humidity, and hour of the day affected flight altitude, and thereby affected detectability indirectly (Berthiaume et al. 2009).
Some potential factors have not yet been adequately investigated. Detectability may be affected by site-specific factors and the number of observers (Kochenberger and Dunne 1985). The detectability studies of Sattler and Bart (1984) and Berthiaume et al. (2009) were both performed at shoreline watch-sites in the northeast where lone observers made official counts. Interobserver variation in detectability exists in avian point counts (Sauer et al. 1994, Kendall et al. 1996, Cunningham et al. 1999, Nichols et al. 2000, Alldredge et al. 2007, Campbell and Francis 2011), and is likely in raptor migration counts (Dunn and Hussell 1995, Dunn et al. 2008). No previous study at a raptor migration count has tested for interobserver variation in detectability among more than two unique individual observers, nor compared the performance of any number of unique teams of observers.
With the intent to generalize the results of prior raptor detectability research and investigate observer effects, we undertook a new double-observer study of detectability at a western ridgeline watch-site with teamed observers. Subsequently, we used our model in simulations of long-term trend analysis to estimate the effect of detectability on statistical power. We undertook this study to gain a greater understanding of the relative importance of various sources of statistical error in migration counts and to suggest methodological changes and areas of future research to improve the utility of migration counts for population monitoring.

METHODS

Study site
The Lucky Peak raptor count is performed each fall by the Intermountain Bird Observatory (formerly Idaho Bird Observatory), a nonprofit research and public education program of Boise State University. At a single point, at least two observers cooperate to count migrating raptors each day, from 25 August to 31 October, as weather permits. Lucky Peak is situated at the southern end of the Boise Ridge, on the western front of the Rocky Mountains, overlooking the Snake River Plain and Boise, Idaho (43°36'18.7" N, 116°3'40.6" W; Zalles and Bildstein 2000, Ruelas Inzunza 2008). Owing to the elevation of the site (~1000 m above the plain), visible raptors are distributed both laterally and vertically. The watch-site also includes a raptor banding station on the west slope of the mountain, in sight of the observation point. Captured raptors are reported to the migration observers by two-way radio. The watch-site is open to the public, and observers provide interpretation for visitors.

Data collection
We conducted a double-observer survey (Nichols et al. 2000) during the autumn raptor migration count at Lucky Peak in 2009 and 2010. Sampled days were 1-4 days apart (mean = 1.8, SD = 1.0) on 29 weekend days and 36 weekdays. Four observers were grouped in teams of two. One team, designated primary, was positioned at the traditional lookout point and attempted to count all raptors passing the lookout. The primary observers called out the identification and location of raptors they observed to avoid double-counting. The other team, designated secondary, was positioned approximately three meters behind the primary team. The secondary observers worked together to record, on a separate sheet, only those additional raptors that were not counted by the primary team. Secondary observers could ask the primary observers questions to clarify which bird had been counted, but had to be silent when identifying any birds the primary observers had missed. Therefore, detection by the primary observers was assumed to be unaffected by the activities of the secondary observers, while detection by the secondary observers was conditional on nondetection by the primary observers. Birds captured by the banding station were reported to the observers via radio and were removed from the data. We randomly assigned observers to teams each day. The observation teams remained consistent over the course of each day, except on four days in 2010 when an observer was substituted mid-day. The teams rotated between the primary and secondary roles at the end of each hour.
For individual raptors, observers recorded species and, when possible, age, sex, and color morph, as well as a distance and altitude category (Alt.). Observers scored birds by altitude only when within the range of unaided vision (where differences in background color and viewing angle are greatest when altitude varies), and scored more distant birds strictly by visibility with 10X magnification (Table 1). We chose this system because lateral distance affected apparent size in the same way altitude did, so distance and altitude were difficult to measure separately, and their effects on detectability were likely to be similar enough to complicate inference if modeled independently. Observers classified each bird based on its closest approach to the watch-site, even if it was initially detected farther away.

Statistical analyses
Detectability was estimated by fitting a closed-population mark-recapture model (closed-capture model; Otis et al. 1978). A closed-capture model, unlike simpler logistic-regression approaches, accounts for the presence of animals that were undetected. Closed-capture models are based on three key assumptions: (1) each "capture" attempt, in this case the attempt of an observer team to detect migrant raptors, has access to the same pool of animals (a closed population), (2) animals are independent in their detection probabilities, and (3) there is no heterogeneity in detection probabilities among individual animals. To relax assumption 3, we used the conditional likelihood approach developed by Huggins (1989, 1991) to account for heterogeneity. Individually varying detectability was modeled as a linear function of covariates related to the observer, flight, weather, and species of each bird. Observer-specific detectability is only estimable for the primary observer role in the dependent double-observer survey design, so our models require one additional assumption: (4) the detection probability for an observer team was not affected by its role (Nichols et al. 2000). The available migrant raptors were considered a closed population because observer teams were positioned closely enough to view the same extent of sky and the two counts occurred simultaneously. We seldom observed hawks migrating in groups of more than four (approximately 3% of observations), so detection of individuals could be considered independent. We excluded Turkey Vultures (Cathartes aura) from our analysis because they migrated in much larger groups. We likewise removed raptors not identified to genus (n = 100) from the analysis.

Table 1. Ordinal scale used in estimating effects of distance and altitude on detectability. Migratory flights at Lucky Peak are distributed laterally, with relatively few raptors flying high overhead. Thus, altitude was only noted at close distances, where potential differences in background color and viewing angle were substantial, and altitude could be estimated with confidence. We excluded all birds assigned to category 6 from analysis because they represent birds not within the standard search radius at this site, and not available to all observers because only one spotting scope was present. The distance classification scheme is adapted from flight altitude codes on the data form published by the Hawk Migration Association of North America (2009).

Category   Description
3          Difficult, but possible to see without binoculars.
4          Visible only with aid of 10X binoculars (but clearly seen).
5          Raptor sometimes fades out while viewing with 10X binoculars.
6          Visible only with a ≥ 20X spotting scope.
We used an information-theoretic model-selection approach with Akaike's information criterion corrected for small sample size (AICc) as the selection criterion to assess the relative effects of these factors on detection probability (Burnham and Anderson 2002). Model-fitting was performed using the Huggins closed-capture data type in Program MARK (White and Burnham 1999). We coded raptors recorded by the primary observers with encounter history "11," and raptors recorded only by the secondary observers with encounter history "01." We fixed the value of the probability of recapture (c) equal to one because birds detected by the primary observers could not fail to be detected by the secondary observers.
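The encounter-history coding above implies a simple conditional likelihood, sketched here in Python for illustration only. Model fitting in this study was done in Program MARK, so this is not the authors' code; the function names, the logit link, and the example values are assumptions. With c fixed at 1, a bird detected by either team has history "11" with probability p1/p* and history "01" with probability (1 - p1)p2/p*, where p* = p1 + (1 - p1)p2 is the probability of detection by either team.

```python
import math


def inv_logit(x):
    """Map a linear predictor to a probability (logit link is an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_probability(betas, covariates):
    """Detection probability from hypothetical coefficients and covariate values."""
    linear = sum(b * x for b, x in zip(betas, covariates))
    return inv_logit(linear)


def removal_log_likelihood(histories, p_primary, p_secondary):
    """Conditional log-likelihood for a dependent double-observer (removal) design.

    histories   -- list of "11" (seen by primary) or "01" (seen only by secondary)
    p_primary   -- per-bird detection probability for the team in the primary role
    p_secondary -- per-bird detection probability for the team in the secondary role
    """
    total = 0.0
    for history, p1, p2 in zip(histories, p_primary, p_secondary):
        p_star = p1 + (1.0 - p1) * p2  # detected by at least one team
        if history == "11":
            total += math.log(p1 / p_star)
        elif history == "01":
            total += math.log((1.0 - p1) * p2 / p_star)
        else:
            raise ValueError("unexpected encounter history: " + history)
    return total
```

In a real fit, the coefficients passed to `detection_probability` would be chosen to maximize this log-likelihood; the optimizer is omitted here.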
We measured several covariates related to each of the four hypothesized sources of variation in detectability: observers, migratory flight, weather, and species. We examined independent measurable covariates for correlation, and we did not use in combination during model-building any whose correlation coefficients exceeded ± 0.4. Initially, we fit all possible models separately for each of the four hypothetical sources of variation, along with a null model with no covariates, and a model with only the effect of year (42 models). At this stage, we removed from further consideration any variables that reduced model deviance by < 3, and built a general model. From subsets of variables in the general model, we built an all-combinations candidate model set (64 models). In doing so, we kept together any sets of model parameters describing a single covariate (e.g., five genus variables and wingspan, all describing the species).
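As a rough illustration of how an all-combinations set of this size can arise, the Python sketch below enumerates every subset of six variable groups. The six group names are taken from the description of the general model in the results; treating them as the units that were combined is our assumption, although the resulting count (2^6 = 64, including the null model) matches the stated size of the candidate set.

```python
from itertools import combinations

# Variable groups reported for the general model. Each group's parameters
# are kept together, so a group is either entirely in or entirely out.
VARIABLE_GROUPS = ["observer", "distance", "species", "cloud cover", "wind speed", "day"]


def all_combinations(groups):
    """Return every subset of the groups, from the null model to the general model."""
    models = []
    for size in range(len(groups) + 1):
        models.extend(combinations(groups, size))
    return models


candidate_models = all_combinations(VARIABLE_GROUPS)
print(len(candidate_models))  # 2**6 subsets
```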
We modeled the effects of observer teams (combinations of two individual observers) as dichotomous (dummy) variables. Ten teams, representing pair-wise combinations of seven regular observers, participated under a representative range of conditions (> 7 days). We pooled the 17 other observer teams with insufficient samples. The seven regular observers (Tables 2 and 3) were all recent (2004-2010, median = 2009) university graduates with B.Sc. degrees from wildlife and natural resource programs. All had prior professional experience assisting with field studies of wildlife (6-40 months, median = 15), but only one had prior experience observing bird migration (5 months). We used the number of days since the beginning of the season and the hour of the day as covariates to account for possible effects of practice or fatigue. We also modeled a second-order effect to allow for a nonlinear effect of practice (e.g., diminishing returns).
Two variables described the migratory flight. We used the number of birds observed per hour, including vultures (BPH), a naïve estimate of flight density, as a covariate for all birds observed in that hour. We used the distance category (see Table 1) as an individual covariate to model the effect of flight-line. Because we had decided (for the sake of efficient use of model parameters) to model the ordinal categories as a linear covariate, we considered it necessary to include a second-order effect in model-selection to allow for unequal units and nonlinear effects.
Wind speed, wind direction, ambient temperature, and cloud cover category were used to describe the effects of weather. Circular variables cannot be used in linear models, so we used the cosine of wind direction as a linear covariate. This number ranges from -1 (wind from the south, a headwind) to 1 (wind from the north, a tailwind). We also used the product of the cosine of wind direction and the wind speed as a covariate. This number was highest for strong tailwinds, and lowest for strong headwinds, with lighter winds and crosswinds having intermediate values. We chose these transformations because the resulting variables were likely to be correlated with the speed of migrating raptors. We chose to limit the number of wind variable interactions to avoid collinearity and make the effect of migration volume and flight line distinguishable from more proximate effects of wind.
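The two wind transformations can be sketched in a few lines of Python. This is an illustration, not the authors' code: the function name is hypothetical, and we assume direction is recorded in degrees clockwise from north as the direction the wind blows from, with speed in any consistent unit.

```python
import math


def wind_covariates(direction_deg, speed):
    """Return (cosine of wind direction, cosine times wind speed).

    direction_deg -- direction the wind blows FROM, degrees clockwise from
                     north (assumed convention)
    speed         -- wind speed in any consistent unit
    """
    tailwind_index = math.cos(math.radians(direction_deg))
    return tailwind_index, tailwind_index * speed


# North wind (tailwind for southbound migrants), south wind, and a crosswind.
for direction in (0.0, 180.0, 90.0):
    print(direction, wind_covariates(direction, 10.0))
```

A north wind gives a cosine of 1, a south wind gives -1, and crosswinds fall near 0, so the product with speed is largest for strong tailwinds and smallest for strong headwinds.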
We hypothesized that detectability might vary among species because species were of different visible size or flew with different styles. We used an approximate average wingspan for each species (from Sibley 2000) to account for differences in visible size. Raptors of unknown species (n = 417) were assigned an approximate size, based on the information available (e.g., genus, large or small). The second-order effect of wingspan was also considered, in case detectability might increase nonlinearly with size. We used a dichotomous variable for each common genus of raptors observed (Accipiter n = 3088, Buteo n = 1680, Circus n = 331, Falco n = 1525, and Pandion n = 119) to account for differences in flight style among raptors of similar wingspan. Golden Eagles (Aquila chrysaetos, n = 27) and Bald Eagles (Haliaeetus leucocephalus, n = 4) composed the reference (null) category.
Tests of differences in covariates between years were performed with Fisher exact tests for dummy variables and Welch t tests for quantitative variables ($H_0$: $\bar{x}_1 = \bar{x}_2$). Means of detectability estimates were calculated with weights of $1 / [\hat{p}_1 + (1 - \hat{p}_1)\hat{p}_2]$, where $\hat{p}_1$ is the individual raptor's estimated detectability for the primary observers and $\hat{p}_2$ is the individual raptor's estimated detectability for the secondary observers. The denominator is an estimate of the total probability of the individual being detected by either of the observer teams. Weighting observations by the inverse of the detection probability is a necessary correction (Horvitz and Thompson 1952). Descriptive values are presented as means ± SD.

Table 3. Estimates of coefficients from the most parsimonious model (β) with standard errors (SE), odds ratios ($e^{\beta}$), and AICc model-selection importance weights (Σw). All covariates were scaled to range from 0 to 1 to show relative magnitudes of effects. Asterisks indicate informative variables ($H_0$: β ≠ 0, α = 0.15). Unique observer letters represent individuals. Reference categories are: Observer team "Other" (17 teams that participated on < 7 days each) and Genus "Eagles" (Haliaeetus and Aquila pooled).
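The inverse-probability weighting of detectability estimates described in this section can be sketched as follows. This Python fragment is illustrative only; the function name and the example values are hypothetical, and the inputs stand in for the per-bird estimates produced by the fitted model.

```python
def weighted_mean_detectability(values, p_primary, p_secondary):
    """Weighted mean of per-bird values, weighting each observed bird by the
    inverse of its probability of being detected by either team.

    values      -- quantity to average (e.g., each bird's estimated detectability)
    p_primary   -- estimated detectability for the primary observers, per bird
    p_secondary -- estimated detectability for the secondary observers, per bird
    """
    weights = [
        1.0 / (p1 + (1.0 - p1) * p2)
        for p1, p2 in zip(p_primary, p_secondary)
    ]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Hypothetical example: a hard-to-see bird and an easy-to-see bird.
p1 = [0.5, 0.9]
p2 = [0.5, 0.9]
print(weighted_mean_detectability(p1, p1, p2))
```

Because hard-to-detect birds are underrepresented among the birds actually observed, they receive larger weights, which pulls the weighted mean below the simple average of the observed sample.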

Simulation
A simulation of the number of birds passing the watch-site and the number that were detected (Appendix 1) was written in R (Revolution R Community 4.3 build of R 2.12.2, Revolution Analytics, Palo Alto, California). The number of birds available to detect in year k (for k > 1) was $N_k = N_1 \lambda^{k-1} + \varepsilon_k$. We made λ = 0.964, so N would be expected to decline by 50% in 20 years. The Raptor Population Index project adopted a similar benchmark trend (λ = 0.965) to evaluate power (Farmer and Hussell 2008, Smith et al. 2008). The $\varepsilon_k$ were normally distributed with mean 0 and a standard deviation set by CV($N_k$), which represents the square root of the variance in the annual number of birds available, as a proportion of the expected N.
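A minimal Python sketch of this availability model follows. It is not the authors' R code (Appendix 1); the function name is hypothetical, and we assume the error standard deviation in each year equals the coefficient of variation times the expected N for that year, with results rounded to whole birds and truncated at zero.

```python
import random


def simulate_available(n1, lam, cv, years, rng):
    """Simulate the number of available birds in each year.

    n1    -- expected number of available birds in year 1
    lam   -- annual rate of change (0.964 in the benchmark decline)
    cv    -- interannual coefficient of variation in availability
    years -- number of years to simulate
    rng   -- random.Random instance, so runs are reproducible
    """
    available = []
    for k in range(1, years + 1):
        expected = n1 * lam ** (k - 1)
        # Assumption: SD of the annual error scales with the expected N.
        n_k = expected + rng.gauss(0.0, cv * expected)
        available.append(max(0, round(n_k)))  # whole birds, never negative
    return available


series = simulate_available(n1=1000, lam=0.964, cv=0.0, years=30, rng=random.Random(1))
print(series[0], series[19])  # year 1 and year 20 with no random variation
```

With the random term switched off, the year-20 value is close to half of the year-1 value, consistent with the benchmark of a 50% decline in about 20 years.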
We used our data to define the attributes of a statistical population of possible individuals. We weighted the probability of selecting any record (the attributes of an observed bird) to be simulated by $1/p_i$ to make less detectable attribute combinations occur with realistic frequency. We randomly sampled the population (with replacement) in each year to simulate $N_k$ available individuals. We used the attributes of the available individuals (inherited from the original records) in the model to determine their individual probabilities of detection.
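The weighted resampling step might look like the following in Python. This is a sketch under assumptions: each record is represented as a dictionary with a hypothetical key "p" holding that bird's estimated probability of detection, and the function name is ours.

```python
import random


def resample_available(records, n_available, rng):
    """Draw n_available records with replacement, weighting each record by
    the inverse of its detection probability so that less detectable birds
    appear at a realistic frequency among the simulated available birds.
    """
    weights = [1.0 / record["p"] for record in records]
    return rng.choices(records, weights=weights, k=n_available)


# Hypothetical records: one hard-to-detect bird and one always-detected bird.
records = [{"id": "hard", "p": 0.5}, {"id": "easy", "p": 1.0}]
sample = resample_available(records, 3000, random.Random(0))
share_hard = sum(1 for r in sample if r["id"] == "hard") / len(sample)
print(round(share_hard, 2))
```

Here the hard-to-detect record carries twice the weight of the easy one, so it should make up roughly two-thirds of the simulated available birds.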
Unlike all the other attributes, the observer team coefficient was modified for the purpose of simulating a realistic sequence of years.In most years at Lucky Peak the counts are predominantly made by a single pair of observers, and a pair seldom stays the same from one year to the next.Therefore, we applied a single observer team coefficient to all individuals in a year, and changed the coefficient between years.The distribution of observer team effects was a normal distribution with a mean taken from the estimated values for the 10 observer teams and a variance inflated by 20% because the observers in two years were probably more similar than a 30-year sample of observers would be.
We determined whether each bird was detected using a random number generator and the detection probability for the bird. If the random number drawn from a uniform distribution between zero and one was greater than the detection probability, the bird was not detected (otherwise, it was). The sum of birds detected in year k was the annual count $C_k$.
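The detection step amounts to one Bernoulli trial per available bird. A minimal Python sketch, with a hypothetical function name and made-up probabilities, is:

```python
import random


def annual_count(detection_probs, rng):
    """Count the birds detected in one simulated year.

    detection_probs -- detection probability of each available bird
    rng             -- random.Random instance, so runs are reproducible
    """
    count = 0
    for p in detection_probs:
        draw = rng.random()  # uniform on [0, 1)
        if draw > p:
            continue  # draw exceeds the detection probability: bird missed
        count += 1
    return count


# Hypothetical year with 1000 available birds, each detected with p = 0.72.
print(annual_count([0.72] * 1000, random.Random(7)))
```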
Trends were estimated by fitting an exponential curve to the counts. These regressions were performed for each of 3000 trials for every unique combination of simulation parameters tested. We estimated detectability bias as the absolute difference between the means of trend coefficients from the regressions of $C_k$ and $N_k$. We assessed precision with a 90% confidence interval (the fifth to 95th percentiles) of trend estimates. Power to detect the trend was estimated as the proportion of trials in which the upper bound of the 90% confidence interval for the trend estimate parameter (β) was < 0. The effect of variable detection on power was determined using two methods. First, we compared the mean number of years required to achieve 80% power with perfect detection (by regression of $N_k$) and variable detection (by regression of $C_k$). Second, we plotted power with perfect and variable detection as a function of survey duration with durations of five to 30 years. The third $N_1$ of 100 was simulated to show what could be expected when the species is locally rare. The third CV($N_k$) of 0.18 approximates the minimum interannual variation in available birds to be expected for any raptor species at any watch-site (Fuller and Mosher 1981, Lewis and Gould 2000). To examine a possible scenario where trend estimates would be biased in the opposite direction of the true trend, thereby dramatically reducing power, we ran a second pair of simulations in which the mean observer effect trended from low (-0.59) in year 1 to high (0.52) in year 30.
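One common way to fit an exponential curve is ordinary least squares on the log of the counts, sketched below in Python. Whether the original analysis used a log-linear or a nonlinear fit is not stated, so this choice is an assumption, as are the function names. The power helper simply applies the stated rule (proportion of trials whose upper confidence bound is below zero) to bounds computed elsewhere; the confidence-interval calculation itself is omitted.

```python
import math


def fit_exponential_trend(counts):
    """Least-squares fit of log(count) on year index.

    Returns (slope, intercept); exp(slope) estimates the annual rate of
    change. All counts must be positive.
    """
    n = len(counts)
    years = list(range(n))
    logs = [math.log(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def power_from_upper_bounds(upper_bounds):
    """Proportion of trials whose upper confidence bound on the trend is < 0."""
    return sum(1 for u in upper_bounds if u < 0) / len(upper_bounds)


# Noise-free counts declining at the benchmark rate recover lambda = 0.964.
slope, _ = fit_exponential_trend([1000 * 0.964 ** k for k in range(30)])
print(round(math.exp(slope), 3))
```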

RESULTS

Double-observer trials
Observers detected 6773 raptors in 390 hours on 65 days. Primary observers detected 77% of raptors observed, and secondary observers made 23% of detections (effective sample size = 1571). Many variables differed significantly between years (Appendix 2).
Comparison of AICc between the year-effect model and models representing other hypotheses suggested that the other covariates had superior explanatory value, so we did not consider year in any additional model-selection to avoid collinearity (Appendix 3).
The general model (K = 23) included variables for observer, distance, species, cloud cover, wind speed, and day. The most parsimonious model (AICc = 7098.77, selection weight = 0.26, K = 22) included all the terms of the general model except cloud cover (Table 2). Eight models of the 64 candidates were reasonably competitive in model selection (ΔAICc < 4) and every one included all the variables for observer, distance, and species (Table 2). The rest of the models had ΔAICc > 20 (Appendix 2).
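For readers unfamiliar with how ΔAICc and selection weights relate, the Python sketch below applies the standard Akaike-weight formula, in which each model's weight is proportional to exp(-ΔAICc/2). Only the first AICc value (7098.77) comes from this study; the other three values are hypothetical, so the printed weights do not reproduce Table 2.

```python
import math


def akaike_weights(aicc_values):
    """Return (delta_aicc, weights) for a set of candidate models."""
    best = min(aicc_values)
    deltas = [a - best for a in aicc_values]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)
    return deltas, [r / total for r in relative]


# First value is the reported top model; the rest are made up for illustration.
example_aicc = [7098.77, 7100.10, 7102.50, 7120.00]
deltas, weights = akaike_weights(example_aicc)
for d, w in zip(deltas, weights):
    print(round(d, 2), round(w, 3))
```

A model more than 20 AICc units behind the best model receives a weight close to zero, which is why the competitive set can be limited to the models with small ΔAICc.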
Coefficients from the most parsimonious model suggest detection probabilities differed by observer team (Table 3). Detectability greatly decreased with distance beyond the range of unaided vision (Fig. 1). Species with longer wingspans were generally more detectable, but Ospreys (Pandion haliaetus) were unusually difficult to detect for their size (Fig. 2). Weather variables and day had relatively little effect on detectability independent of species, flight, and observers (Table 3), but in four of the top eight models (Table 2) wind speed was estimated to have a negative effect on detectability (Table 3).

Fig. 1. For definitions of distance categories, see Table 1. Points are weighted mean detectability with bars of ± 1 SD, labeled with the number of individuals detected. The curve shows the model prediction for a hypothetical individual with average covariates. The effect of ordinal distance category was modeled as a quadratic function (Table 3).
Estimated detectability of individual raptors observed ranged from 0.36 to 0.94 for the two primary observers and 0.59 to 0.99 for all four observers.Weighted mean detectability was 0.72 ± 0.11 with the two primary observers and 0.92 ± 0.07 with all four observers.

Simulation
Imperfect detection and heterogeneous detectability affected power mainly by increasing count variance. The bias in trend estimation introduced by imperfect detection was minimal in simulations with no predefined trend in detectability ($|\beta_C - \beta_N| \le 2.6 \times 10^{-4}$). Bias was greater when observer ability was simulated to improve over time ($|\beta_C - \beta_N| \approx 0.01$; Fig. 3).
The relative effect of limited detectability on power is inversely related to variation in the number of raptors available, CV($N_k$) (Figs. 4 and 5). The effect of heterogeneous detectability on power was minimal and accounted for ≤ 1 year difference in time to attain the 80% power benchmark with realistic parameters and no trend in observer skill. The effect of imperfect detectability on precision and power in both species was more negative when variation in availability was lower and population size was smaller. When the two species were compared with equal $N_1$ and CV($N_k$) values, the decline in power resulting from imperfect detectability was greater in Sharp-shinned Hawks, the smaller and less perceptible species. We estimate that Lucky Peak Hawk-Watch would require 19 years to achieve 80% power to detect a decline in Sharp-shinned Hawks with 90% confidence when the true trend is -3.5% annually. Twenty-five years of counts would be required in the case of Northern Harriers. Harriers' greater probability of detection did not compensate much for the loss of precision from small population size and highly variable availability. When mean observer skill was simulated to improve over the 30 years, the average number of years necessary to achieve 80% power to detect a decline increased to 26 years for Sharp-shinned Hawks and exceeded 30 years for Northern Harriers. Moreover, the decline detected can be expected to be of lesser magnitude than the true trend (Fig. 3).

DISCUSSION
Detectability of migrant raptors at Lucky Peak depended on the observer team, the distance of the migratory flight, and species characteristics. It is important to note that as much as individual detectability varied, the mean detectability was considerably higher than in some other surveys of raptors (McLeod and Andersen 1998, Ayers and Anderson 1999). Interspecific differences in detectability will not bias species-specific estimates of trend for population monitoring. The sizable effects of observers and distance, however, may be detrimental to data quality and should be addressed at raptor watch-sites. Specifically, the number and ability of observers may change over time (Dunn et al. 2008), and changing weather may change flight distance over time (Berthiaume et al. 2009). Longer term trends in these effects would contribute to bias in trend estimates and greater loss of power (Kéry et al. 2009, Paprocki et al. 2014, Crewe et al. 2015).

Fig. 3. Precision (90% CI) of trend parameter estimates with increasing study duration from 3000 simulation trials. Solid black lines are from the regressions of simulated counts with imperfect detectability. The dashed black lines are from the regressions of the available population. The gray line is the value for the true trend. When no trend in observer skill was simulated (the observer effect varied in each year but the mean did not change) there was little bias, and only a slight loss of precision from imperfect detection. When the mean observer skill was simulated to improve over time the trend in counts was biased high, relative to the population trend. As a result, increased precision with longer study duration ceases to improve the likelihood of detecting the true magnitude of population decline.
Apart from the observer effect, our results were consistent with the findings of Berthiaume et al. (2009), suggesting that double-observer sampling is a robust technique for quantifying relative detectability at many raptor migration watch-sites. In both studies detectability was greatest for raptors within the range of unaided vision viewed against sky, lower for raptors viewed against the ground, and declined with increasing distance or altitude. Likewise, smaller species were considerably less detectable than larger species. Ospreys were an exception to this trend and were less detectable than smaller Buteo species and Northern Harriers. The low detectability of Ospreys was more pronounced in this study than in Berthiaume et al. (2009), but was consistent with the results of Sattler and Bart (1984). Ospreys at Lucky Peak in 2009 and 2010 were relatively uncommon (< 2% of raptors detected), and often flew along very different flight lines than the majority of migrants. Observers seeking to detect the greatest proportion of migrants may pay more attention to heavily populated flight lines than to regions in the field of view with few raptors, making uncommon raptors with atypical migration strategies less consistently detectable. This hypothesis closely resembles the one suggested by Kochenberger and Dunne (1985) to explain low counts of Peregrine Falcons (Falco peregrinus) on busy days at Cape May, New Jersey. If this is true, "specialty" watch-sites with concentrations of species that are rare at high-volume watch-sites may offer excellent monitoring value, e.g., the Florida Keys site for Peregrine Falcons (Lott 2006).
Comparing the results of this study with previously published results (Sattler and Bart 1984, Berthiaume et al. 2009), it appears some factors may predict detectability better at some sites than others. Cloud cover was associated with greater detectability in all three studies, but the effect was of lesser predictive value at Lucky Peak than at Tadoussac (Berthiaume et al. 2009). This might be expected because Lucky Peak is a mountaintop site where raptors are often detected near or below the horizon, whereas Tadoussac is a shoreline site close to sea level, and birds may be detected at higher angles. Sattler and Bart (1984) observed that cloud cover improved visibility at Derby Hill, another low-elevation shoreline watch-site. At Derby Hill, flight density had a significant direct effect on detectability, whereas at Tadoussac and Lucky Peak flight density was of little value in predicting detectability (Sattler and Bart 1984, Berthiaume et al. 2009). This difference may be attributable to the relatively high peak flight densities observed at the Derby Hill watch-site (over 200 raptors in 30 minutes).
Our simulation results support the findings of prior power analyses. Lewis and Gould (2000) estimated the power of trend analysis for seven watch-sites and concluded that an interannual CV of 30% or less was necessary to have 80% power (α = 0.1) to detect a 50% population decline in 25 years, provided the mean number of birds counted per year was at least 20. At their seven watch-sites, among species counted in numbers > 20 per year, only 43% of species-by-site combinations had a CV that met this standard. In our simulations of a slightly faster decline (-50% in 20 years) we estimate that a CV ≤ 38% is necessary to attain 80% power in 25 years (Figs. 4 and 5). Detectability correction alone is unlikely to increase the number of species or watch-sites from which reliable trend estimates may be obtained because detectability had little effect on CV when CV was ≥ 30%.
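The logic of such a power analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reproduction of the simulations reported here: annual availability is drawn from a lognormal distribution around a geometric decline, detections are binomial with a constant probability `p_detect` (hypothetical, with no individual heterogeneity), and the trend is tested by ordinary least squares on log counts. The critical value `t_crit` = 1.714 is the two-tailed α = 0.1 value of the t distribution for 23 degrees of freedom, so it applies only to a 25-year series and must be changed if `years` is changed.

```python
import numpy as np

def simulate_power(n1=450, cv=0.38, p_detect=0.7, lam=0.964, years=25,
                   t_crit=1.714, n_iter=2000, seed=1):
    """Proportion of simulated series showing a significant decline."""
    rng = np.random.default_rng(seed)
    t = np.arange(years)
    expected = n1 * lam ** t                    # expected birds available each year
    sigma = np.sqrt(np.log(1.0 + cv ** 2))      # lognormal SD that yields the requested CV
    # Birds available: lognormal variation around the declining expectation
    avail = rng.lognormal(np.log(expected) - sigma ** 2 / 2, sigma,
                          size=(n_iter, years))
    # Birds counted: each available bird is detected with probability p_detect
    counts = rng.binomial(np.rint(avail).astype(int), p_detect)
    y = np.log(counts + 1.0)                    # +1 guards against log(0)
    # Least-squares slope and standard error for every iteration at once
    tc = t - t.mean()
    sxx = np.sum(tc ** 2)
    slope = (y @ tc) / sxx
    resid = y - y.mean(axis=1, keepdims=True) - np.outer(slope, tc)
    se = np.sqrt(np.sum(resid ** 2, axis=1) / (years - 2) / sxx)
    # A decline is detected when the t statistic falls below -t_crit
    return float(np.mean(slope / se < -t_crit))

for cv in (0.2, 0.38, 0.8):
    print(cv, simulate_power(cv=cv, p_detect=1.0), simulate_power(cv=cv, p_detect=0.7))
```

Comparing the two columns for each CV shows how much power is lost to imperfect detectability relative to the loss caused by variation in availability. Exact values will differ from Figs. 4 and 5 because the sketch omits heterogeneous individual detectability and uses a simpler trend model.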
The results of these simulations provide insight into the conditions in which double-observer or distance-sampling detectability correction may be useful. The most important consideration is the potential for a trend in detectability over time, as we simulated with a trend in observer ability. Policies may be adopted to attempt to keep such trends to a minimum. Watch-site managers should consider adopting staffing policies that produce minimal changes in observer ability at longer time scales, i.e., month to month or year to year. We concur with Dunn et al.'s (2008) recommendation to use teams of two or more observers and rotate a pool of equivalently trained observers from day to day, instead of employing only one or two observers each year who may be exceptionally skilled or unexpectedly mediocre. Methodological drift may be avoided by producing written protocols, training new observers carefully, and periodically having more experienced observers check the work of those less experienced. Official observers can sometimes be isolated from visitors. Alternatively, a detectability-estimating survey method can be used.
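As a rough illustration of how a dependent double-observer count yields a detectability estimate, the sketch below applies the simple two-pass removal estimator. It assumes that both observers share one constant detection probability and that the secondary observer records only birds missed by the primary observer. This is a deliberate simplification: the Huggins models used in this study allow detectability to vary with covariates such as observer team, distance, species, and date. The function name and the example counts are hypothetical.

```python
def removal_estimate(x1, x2):
    """Two-pass removal estimator for a dependent double-observer count.

    x1 : birds detected by the primary observer
    x2 : additional birds detected only by the secondary observer
    Assumes both observers have the same per-bird detection probability.
    """
    if x1 <= 0 or x2 < 0 or x2 >= x1:
        raise ValueError("estimator requires x1 > x2 >= 0")
    p = 1.0 - x2 / x1                 # per-observer detection probability
    n_hat = x1 / p                    # estimated number of birds passing
    p_team = 1.0 - (1.0 - p) ** 2     # probability that at least one observer detects a bird
    return p, n_hat, p_team

# Hypothetical counts: 80 birds seen by the primary observer, 16 more by the secondary
p, n_hat, p_team = removal_estimate(80, 16)
print(p, n_hat, p_team)
```

With these hypothetical counts the per-observer detectability is about 0.8, the estimated number of birds passing is about 100, and the team as a whole detects about 96% of them.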
In our simulations with no trend in detectability and low bias, detectability had a substantial effect on power when the number of available birds was consistent from year to year (CV(N) < 25%), the species was uncommon at the watch-site (20 to a few hundred observed each year), and individuals of the species were relatively difficult to detect. Few combinations of species and watch-sites are likely to meet these qualifications. Raptor migration counts have rates of detection of 66% or higher (Berthiaume et al. 2009), utilize an index approach (Dunn and Hussell 1995), and are primarily useful for long-term monitoring (Fuller and Mosher 1981, 1987), making double-observer or distance-sampling detectability correction potentially less beneficial for this method.
Short-term detectability studies can yield valuable insights into cost-effective ways to improve count protocols. For example, our results show that there may be substantial differences in raptor-counting ability among individuals who are very comparable in terms of bird-watching experience, age, or visual acuity. For this reason, prior screening of observers may not be an effective safeguard against observer variation. Further research into the causes of variation in observer ability may suggest more effective screening criteria, or methods to adjust annual indices for observer ability.
The relative importance of factors affecting availability is in need of further research. Apart from survey effort, the proportion of the population available to count may be affected by changes in migration routes, distances, and timing, as well as rates of fecundity and survival. Temporal data on the rate of passage of raptors at watch-sites are collected at an hourly scale at most watch-sites in North America, providing a rich source of information for availability compensation in trend analyses (Farmer et al. 2007, Farmer and Hussell 2008). Collecting similarly useful spatial datasets should be a priority.
This study and Berthiaume et al. (2009) both used simple visibility-based metrics to model effects of distance on individual raptors and found similar effects. This suggests that visibility-based distance and altitude codes, already in use at most watch-sites, may be useful covariates for adjusting counts to more accurately reflect the number of raptors present. However, at most sites, the code is recorded hourly, and represents a poorly defined central tendency for all the birds observed in that hour. The hourly measure provides no information on the distribution of distances, or how flight lines differ among species. A visibility-based distance (or altitude) code for each individual raptor or flock can be recorded with very little additional effort (E. G. Nolte, personal observation).
Present techniques for trend analysis of raptor migration counts avoid addressing the issues of detectability and availability by making no estimate of total migratory volume at any scale (Farmer and Hussell 2008, Crewe et al. 2013). This approach limits raptor migration counts to detecting long-term, continental trends in populations. Progress in stable hydrogen isotope analysis (Domenech et al. 2015, Hobson et al. 2015, Nelson et al. 2015) may make producing accurate trends at smaller spatial scales especially informative, by linking fluctuations in certain watch-site counts with climate, habitat, or prey base changes in specific regions. Simple innovations in data collection and analysis to account for detectability and availability bias, such as recording individual flight distances, rotating observers, and empirically studying the relative efficiency of observers, may enable watch-site networks to engage in more hypothesis-based research addressing current conservation issues of interest (e.g., Dennhardt et al. 2015).
Responses to this article can be read online at: http://www.ace-eco.org/issues/responses.php/894

Fig. 1. Effect of relative distance and altitude on detectability. For definitions of distance categories, see Table 1. Points are weighted mean detectability with bars of ± 1 SD, labeled with the number of individuals detected. The curve shows the model prediction for a hypothetical individual with average covariates. The effect of ordinal distance category was modeled as a quadratic function (Table 3).

Fig. 4. Sharp-shinned Hawk (Accipiter striatus) trend analysis simulation results. Estimated statistical power (α = 0.1, two-tailed test) to detect a significant declining trend (λ = 0.964) by the number of years of study duration. Dashed lines depict power in simulations with detectability = 1. Solid lines depict power in simulations with imperfect detectability. N1 is the expected available population in the first year and CV(Nk) is the square root of variance in the annual number of birds available as a proportion of N. Each simulated scenario was iterated 3000 times.

Fig. 5. Northern Harrier (Circus cyaneus) trend analysis simulation results. Estimated statistical power (α = 0.1, two-tailed test) to detect a significant declining trend (λ = 0.964) by the number of years of study duration. Dashed lines depict power in simulations with detectability = 1. Solid lines depict power in simulations with heterogeneous detectability < 1. N1 is the expected available population in the first year and CV(Nk) is the square root of variance in the annual number of raptors available as a proportion of N. Each simulated scenario was iterated 3000 times.

Table 2. Best candidate models estimating the detectability of migrating raptors in double-observer counts conducted at Lucky Peak in 2009 and 2010. ΔAICc is the difference in AICc between the model and the model with the lowest AICc. L is the model likelihood, and w is the AICc weight of evidence. K is the number of parameters in the model. For all models, see Appendix 2.

Analyses were carried out for the Sharp-shinned Hawk (Accipiter striatus) and Northern Harrier (Circus cyaneus). Both species were common at Boise Ridge, rarely traveled in groups, and are high priorities for alternative range-wide surveys by Partners in Flight, i.e., are not satisfactorily monitored by the BBS (Dunn et al. 2005). To avoid potentially drastically underestimating the number of distant Sharp-shinned Hawks, we included all birds identified only as Accipiter in the sample. One set of simulations presumed no trend in observer skill over time, included three starting population sizes (N1), and three values for CV(Nk). Historical annual counts at Lucky Peak suggest an N1 of Sharp-shinned Hawks of roughly 2000, with a CV(Nk) of 0.26. Counts of Northern Harriers suggested an N1 of roughly 450, with a CV(Nk) of 0.38 (pers. obs.).

Appendix 2
Descriptive statistics for all covariates by year. Continuous and ordinal variables are presented as mean (SD), and compared with Welch t-tests. Dummy (Boolean) variables are presented as ratios and are compared with Fisher's Exact tests. All models grouped by stage of selection, then ordered by increasing AICc. Lowest AICc was 7098.769. K is the number of model parameters. For relative model likelihoods and AICc weights, see Table 2 (all but the top eight models have zero relative likelihood).
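For readers unfamiliar with the quantities reported in Table 2, the following short Python sketch shows the standard calculation of ΔAICc, relative model likelihood, and AICc weight from a set of AICc values. Only the lowest value (7098.769) is taken from the text; the other two AICc values and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def akaike_weights(aicc_values):
    """Return (delta AICc, relative likelihood, AICc weight) for each model."""
    best = min(aicc_values)
    # Difference between each model and the best-supported model
    deltas = [a - best for a in aicc_values]
    # Relative likelihood of each model given the data: exp(-delta / 2)
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    # Weights are relative likelihoods normalized to sum to 1
    total = sum(rel_lik)
    weights = [l / total for l in rel_lik]
    return deltas, rel_lik, weights

# 7098.769 is the lowest AICc reported; the other two values are hypothetical
deltas, rel_lik, weights = akaike_weights([7098.769, 7100.769, 7108.769])
for d, l, w in zip(deltas, rel_lik, weights):
    print(round(d, 3), round(l, 4), round(w, 4))
```

Because relative likelihood falls off exponentially with ΔAICc, models more than a few tens of AICc units from the best model receive a weight that rounds to zero, which is why only the top models carry weight in Table 2.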