solivp.blogg.se - False discovery rate sequential testing

The analysis of climatological data often involves statistical significance testing at many locations. While the field significance approach determines if a field as a whole is significant, a multiple testing procedure determines which particular tests are significant. Many such procedures are available, most of which control, for every test, the probability of detecting significance that does not really exist. The aim of this paper is to introduce the novel “false discovery rate” approach, which controls the false rejections in a more meaningful way. Specifically, it controls a priori the expected proportion of falsely rejected tests out of all rejected tests; additionally, the test results are more easily interpretable. The paper also investigates the best way to apply a false discovery rate (FDR) approach to spatially correlated data, which are common in climatology. The most straightforward method for controlling the FDR makes an assumption of independence between tests, while other FDR-controlling methods make less stringent assumptions. In a simulation study involving data with correlation structure similar to that of a real climatological dataset, the simple FDR method does control the proportion of falsely rejected hypotheses despite the violation of assumptions, while a more complicated method involves more computation with little gain in detecting alternative hypotheses. A very general method that makes no assumptions controls the proportion of falsely rejected hypotheses but at the cost of detecting few alternative hypotheses. Despite its unrealistic assumption, based on the simulation results, the authors suggest the use of the straightforward FDR-controlling method and provide a simple modification that increases the power to detect alternative hypotheses.

Climate research often involves an assessment of the statistical significance of a quantity, such as an observed correlation or trend. If the quantity is measured at multiple locations, this requires testing many hypotheses simultaneously. Such a situation can arise when correlating an atmospheric field with a nonspatial quantity, such as a teleconnection pattern, or when correlating two atmospheric fields. It can also arise when evaluating time trends or model performance at many locations. The usual setting for such multiple testing in climatological studies involves quantities measured over time, with time providing the replication necessary for calculating the chosen test statistic, such as correlation, trend, or model fit, at each of the locations. This paper addresses the problem of evaluating statistical significance when many hypothesis tests are performed simultaneously.

A single test performed at significance level α has probability α of rejecting the null hypothesis when it is in fact true. Hence if n such tests are performed when all n null hypotheses are true (the collective null hypothesis), then the average number of tests for which the null is falsely rejected is nα. For example, with α = 5%, testing for a trend at 1000 locations at which no change really occurred would yield 50 significant locations on average; this is unacceptably high.

Climatologists have long recognized the problem of accounting for multiple tests; as early as 1914, when conducting multiple tests, Walker adjusted the significance level used for rejection (Katz 2002). More recently, climatologists have taken the alternative approach of testing for field significance. In a seminal paper, Livezey and Chen (1983) proposed a method that determines if a field of individual, or local, hypothesis tests is collectively significant. This approach is popular in the climatological literature, with over 300 citations of Livezey and Chen (1983) since its publication. Statistical climatology texts cover the method as the primary way to deal with multiple hypothesis tests (Wilks 1995; von Storch and Zwiers 1999). The basic approach relies on the fact that if the collective null hypothesis holds, and if the p-values are independent, they can be viewed as a sample from a binomial distribution with sample size n and probability of “success” (correctly accepting the null), 1 − α. Using properties of the binomial distribution, one can calculate the probability p of observing as many or more significant p-values as were actually observed. If p is less than α, the observed field is said to be field, or globally, significant at level α.
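To make the two ideas in this text concrete, the following minimal Python sketch contrasts the binomial field significance calculation with an FDR-controlling procedure. It rests on several assumptions that are not in the text: the function names are purely illustrative, the binomial count is taken over rejections (probability α each) rather than over correct acceptances (probability 1 − α), which gives the same tail probability, and the "straightforward" FDR method is assumed to be the Benjamini-Hochberg step-up procedure, which the text does not name. Both functions treat the p-values as independent, as the text says the simple methods do.

```python
from math import exp, lgamma, log, log1p


def expected_false_rejections(n_tests, alpha):
    """Average number of false rejections when all n null hypotheses are true."""
    return n_tests * alpha


def field_significance_pvalue(n_tests, n_significant, alpha):
    """Binomial upper tail: P(X >= n_significant) for X ~ Binomial(n_tests, alpha).

    X counts locally significant tests under the collective null hypothesis,
    assuming independent tests. Requires 0 < alpha < 1.
    """
    total = 0.0
    for k in range(n_significant, n_tests + 1):
        # Work in log space so large n does not overflow the binomial coefficient.
        log_pmf = (
            lgamma(n_tests + 1) - lgamma(k + 1) - lgamma(n_tests - k + 1)
            + k * log(alpha) + (n_tests - k) * log1p(-alpha)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure (assumed here, not named in the text).

    Returns a list of booleans, True where the corresponding null is rejected,
    with the false discovery rate controlled at level q under independence.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value is <= q * k / n.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / n:
            k_max = rank
    # Reject every hypothesis whose p-value rank is at most k_max.
    rejected = [False] * n
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # The example from the text: 1000 locations tested at alpha = 5%.
    print(expected_false_rejections(1000, 0.05))
    # Field significance if, hypothetically, 70 locations were locally significant.
    print(field_significance_pvalue(1000, 70, 0.05))
    # Which individual tests are rejected at FDR level q = 0.05 (made-up p-values).
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2], 0.05))
```

The field significance function answers only whether the field as a whole is significant, whereas the FDR function returns a decision for each individual test, which is the distinction the text draws between the two approaches.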