In A/B testing sequential tests are gradually becoming the norm due to the increased efficiency and flexibility that they grant practitioners. In most practical scenarios sequential tests offer a balance of risks and rewards superior to that of an equivalent fixed sample test. Sequential monitoring achieves this superiority by trading statistical power for the ability to stop earlier on average under any true value of the primary metric. Not all sequential statistical tests are made equal, however, and comparisons between the different approaches are rare and/or difficult to translate to practice. Hence the motivation for this piece which is to provide an easy to digest comparison between sequential statistical tests popular in online A/B testing, namely the Sequential Probability Ratio Test (SPRT), the group-sequential AGILE, and Always Valid Inference, a type of mixture SPRT. A fixed sample test will serve as an anchoring benchmark.

An understanding of the trade-offs involved in each type of sequential test should assist experimenters in their choice of a sequential testing method suitable for the scenarios they face.

Motivation

An earlier article titled Fully Sequential vs Group Sequential Tests offers a brief history of sequential testing and a comparison of the two major types of such tests. The three major differentiators, in no particular order, are:

- the effect on generalizability (external validity).
- the average sample sizes under various true values of the primary metric.
- the statistical power achieved at any finite sample size.

In general, group sequential tests aim for higher generalizability and power through their monitoring schedule and alpha-spending function while achieving less impressive average sample sizes. Fully sequential tests sacrifice more generalizability as well as more statistical power in order to achieve better average sample size.

A majority of the literature on sequential tests focuses significantly on the reduction in average sample size and often insufficient attention is paid to the trade-off in terms of loss of statistical power. Where power calculations are present in published works in the A/B testing literature, they are often in comparison to other tests of the same class or to fixed sample tests in a sometimes not so straightforward manner, making comparisons to other approaches difficult. This leaves practitioners with little guidance on how to choose an appropriate sequential test based on a trade-off between power and average sample size. While stopping early on average is important, a decrease in statistical power means there is an increase in type II errors, a.k.a. the rate of failing to detect as statistically significant true improvements of practical importance at any given maximum sample size.
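The trade-off between statistical power and average sample size can be illustrated with a small simulation. The sketch below is not the AGILE or Always Valid Inference procedure discussed in this article. It assumes a plain Wald SPRT on normally distributed observations with known unit variance, truncated at the same maximum sample size as a one-sided fixed sample z-test. The function name `simulate` and all parameter values (effect size, error rates, maximum sample size) are hypothetical choices made only for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate(true_effect, n_max=155, alpha=0.05, beta=0.20,
             design_effect=0.2, n_sims=2000, seed=1):
    """Run a fixed-sample z-test and a truncated Wald SPRT on the same data.

    Observations are drawn from N(true_effect, 1). Both tests are one-sided
    tests of H0: mean <= 0. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # fixed-sample critical value
    upper = math.log((1 - beta) / alpha)       # SPRT: reject H0 at or above this
    lower = math.log(beta / (1 - alpha))       # SPRT: stop without rejecting at or below this

    fixed_rejections = 0
    seq_rejections = 0
    seq_total_n = 0

    for _ in range(n_sims):
        data = [rng.gauss(true_effect, 1.0) for _ in range(n_max)]

        # Fixed-sample test: a single look after all n_max observations.
        if sum(data) / math.sqrt(n_max) > z_crit:
            fixed_rejections += 1

        # Truncated SPRT: update the log-likelihood ratio of
        # N(design_effect, 1) versus N(0, 1) after every observation.
        llr = 0.0
        stopped_at = n_max  # no boundary crossed: stop at n_max, do not reject
        for i, x in enumerate(data, start=1):
            llr += design_effect * x - design_effect ** 2 / 2
            if llr >= upper:
                seq_rejections += 1
                stopped_at = i
                break
            if llr <= lower:
                stopped_at = i
                break
        seq_total_n += stopped_at

    return {
        "fixed_reject_rate": fixed_rejections / n_sims,
        "sequential_reject_rate": seq_rejections / n_sims,
        "sequential_avg_n": seq_total_n / n_sims,
    }


if __name__ == "__main__":
    # Rejection rate under a true effect of 0 estimates the type I error rate;
    # under a true effect of 0.2 it estimates statistical power.
    for effect in (0.0, 0.2):
        print(effect, simulate(effect))
```

Under these assumptions the sequential test should stop well before the maximum sample size on average, while rejecting the null less often than the fixed sample test when the true effect equals the design effect. That gap is the loss of power at a given maximum sample size that the comparison in this article is concerned with.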