{"id":596,"date":"2011-10-28T14:37:20","date_gmt":"2011-10-28T22:37:20","guid":{"rendered":"http:\/\/www.deaneckles.com\/blog\/?p=596"},"modified":"2022-01-30T13:03:39","modified_gmt":"2022-01-30T21:03:39","slug":"lossy-better-than-lossless-in-online-bootstrapping","status":"publish","type":"post","link":"https:\/\/www.deaneckles.com\/blog\/596_lossy-better-than-lossless-in-online-bootstrapping\/","title":{"rendered":"Lossy better than lossless in online bootstrapping"},"content":{"rendered":"<p>Sometimes an approximate method is in some important sense better than the &#8220;exact&#8221; one &#8212; and not just because it is easier or faster.<\/p>\n<p>In statistical inference, a standard example here is the Agresti-Coull confidence interval for a binomial proportion: the &#8220;exact&#8221; interval from inverting the binomial test (the Clopper-Pearson interval) is conservative &#8212; giving overly wide intervals with more than the advertised coverage &#8212; but the standard (approximate) Wald interval is too narrow.<sup><a href=\"#footnote_0_596\" id=\"identifier_0_596\" class=\"footnote-link footnote-identifier-link\" title=\"The Wald interval also gives zero-width intervals when observations are all either y=1 or y=0.\">1<\/a><\/sup> The Agresti-Coull confidence interval, which is a modification of the Wald interval that can be justified on Bayesian grounds, has &#8220;better&#8221; performance than either.<sup><a href=\"#footnote_1_596\" id=\"identifier_1_596\" class=\"footnote-link footnote-identifier-link\" title=\"That is, it contains the true value closer to 100 * (1 &amp;#8211; alpha)% of the time than the others. This example is a favorite of Art Owen&amp;#8217;s.\">2<\/a><\/sup><\/p>\n<p>Even knowing this, like many people, I suspect, I am a sucker for the &#8220;exact&#8221; over the approximate. 
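<\/p>\n<p>To make the Agresti-Coull example concrete, here is a minimal coverage simulation &#8212; a sketch in Python using only the standard library; the true proportion, sample size, and number of simulated samples are illustrative choices of mine, not values from Agresti &amp; Coull (1998):<\/p>

```python
import math
import random

random.seed(1)

def wald_interval(x, n, z=1.96):
    # Standard (approximate) Wald interval for a binomial proportion.
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def agresti_coull_interval(x, n, z=1.96):
    # Agresti-Coull: add z*z/2 successes and z*z/2 failures, then
    # compute a Wald-style interval around the adjusted proportion.
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

def coverage(interval, p, n, sims=20000):
    # Fraction of simulated binomial samples whose interval contains the true p.
    hits = 0
    for _ in range(sims):
        # int(u + p) equals 1 with probability p for u uniform on [0, 1),
        # so this sum is a Binomial(n, p) draw.
        x = sum(int(random.random() + p) for _ in range(n))
        lo, hi = interval(x, n)
        # p is covered exactly when clamping p to [lo, hi] leaves it unchanged.
        if max(lo, min(p, hi)) == p:
            hits += 1
    return hits / sims

p, n = 0.1, 50
wald_cov = coverage(wald_interval, p, n)
ac_cov = coverage(agresti_coull_interval, p, n)
print(f'Wald coverage at nominal 95%: {wald_cov:.3f}')
print(f'Agresti-Coull coverage at nominal 95%: {ac_cov:.3f}')
```

<p>In runs like this, the approximate Wald interval&#8217;s realized coverage typically falls well below the nominal 95%, while the Agresti-Coull interval stays close to it.<\/p>\n<p>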
The rest of this post gives another example of &#8220;approximate is better than exact&#8221; that <a href=\"http:\/\/www-stat.stanford.edu\/~owen\/\">Art Owen<\/a> and I recently encountered in <a href=\"http:\/\/arxiv.org\/abs\/1106.2125\">our work on bootstrapping big data with multiple dependencies<\/a>.<\/p>\n<p>The bootstrap is a computational method for estimating the sampling uncertainty of a statistic.<sup><a href=\"#footnote_2_596\" id=\"identifier_2_596\" class=\"footnote-link footnote-identifier-link\" title=\"Hesterberg et al. (PDF) is a good introduction to the bootstrap. Efron (1979) is the first paper on the bootstrap.\">3<\/a><\/sup> When using bootstrap resampling or bagging, one normally draws observations with replacement from the sample to form a bootstrap replicate. Each replicate then consists of zero or more copies of each observation. If one wants to bootstrap online &#8212; that is, one observation at a time &#8212; or generally without synchronization costs in a distributed processing setting, machine learning folks have used the Poisson approximation to the binomial (e.g., Oza, 2001). This approximate bootstrap works as follows: for each observation in the sample, take a Poisson(1) draw and include that many copies of that observation in the replicate.<\/p>\n<p>Since this is a &#8220;lossy&#8221; approximation, investigators have sometimes considered and advocated &#8220;lossless&#8221; alternatives (Lee &amp; Clyde, 2004). The Bayesian bootstrap, in which each replicate is an n-dimensional draw from the Dirichlet, can be done exactly online: for each observation, take an Exp(1) draw and use it as the weight for that observation for this replicate.<sup><a href=\"#footnote_3_596\" id=\"identifier_3_596\" class=\"footnote-link footnote-identifier-link\" title=\"In what sense is this &amp;#8220;exact&amp;#8221; or &amp;#8220;lossless&amp;#8221;? 
This online method is exactly the same as the offline Bayesian bootstrap in which one takes an n-dimensional draw from the Dirichlet. On the other hand, the Poisson(1) method is often seen as an online approximation to the offline bootstrap.\">4<\/a><\/sup> Being a sucker for methods labeled &#8220;lossless&#8221; or &#8220;exact&#8221;, I implemented this method in Hive at Facebook, and used it instead of the already available Poisson method. I even chortled to others, &#8220;Now we have an exact version implemented to use instead!&#8221;<\/p>\n<p>But is this the best of all possible distributions for bootstrap reweighting? Might there be some other, better distribution (nonnegative, with mean 1 and variance 1)? In particular, what distribution minimizes our uncertainty about the variance of the mean, given the same number of bootstrap replicates?<\/p>\n<p>We examined this question (Owen and Eckles, 2011, section 3.3) and found that the Poisson(1) weights give a sharper estimate of the variance than the Exp(1) weights: <em>the lossy approximation to the standard resampling bootstrap is better than the exact Bayesian reweighting bootstrap<\/em>. Interestingly, both of these are beaten by using &#8220;double-or-nothing&#8221; U{0, 2} weights &#8212; that is, something close to half-sampling.<sup><a href=\"#footnote_4_596\" id=\"identifier_4_596\" class=\"footnote-link footnote-identifier-link\" title=\"Why is this? 
See the paper, but the summary is that U{0, 2} has the lowest kurtosis, and Poisson(1) has lower kurtosis than Exp(1).\">5<\/a><\/sup> Furthermore, the Poisson(1) and U{0, 2} versions are more general, since they don&#8217;t require using weighting (observations can simply be duplicated) and, when using them as weights, they don&#8217;t require using floating point numbers.<sup><a href=\"#footnote_5_596\" id=\"identifier_5_596\" class=\"footnote-link footnote-identifier-link\" title=\"This is especially useful if one is doing factorial weighting, as we do in the paper, where multiplication of weights for different grouping factors is required.\">6<\/a><\/sup><\/p>\n<div class=\"references\">\n<p>Agresti, A. and Coull, B. A. (1998). Approximate Is Better than &#8220;Exact&#8221; for Interval Estimation of Binomial Proportions. <em>The American Statistician<\/em>, 52(2): 119\u2013126.<\/p>\n<p>Efron, B. (1979). Bootstrap methods: Another look at the jackknife. <em>Annals of Statistics<\/em>, 7:1\u201326.<\/p>\n<p>Hesterberg, T., et al. <a href=\"http:\/\/bcs.whfreeman.com\/ips5e\/content\/cat_080\/pdf\/moore14.pdf\">Bootstrap Methods and Permutation Tests<\/a>. In: Introduction to the Practice of Statistics.<\/p>\n<p>Lee, H. K. H. and Clyde, M. A. (2004). Lossless online Bayesian bagging. <em>Journal of Machine Learning Research<\/em>, 5:143\u2013151.<\/p>\n<p>Owen, A. B. and Eckles, D. (2011). Bootstrapping data arrays of arbitrary order. <a href=\"http:\/\/arxiv.org\/abs\/1106.2125\">http:\/\/arxiv.org\/abs\/1106.2125<\/a><\/p>\n<p>Oza, N. (2001). Online bagging and boosting. In <em>2005 IEEE International Conference on Systems, Man and Cybernetics<\/em>, volume 3, pages 2340\u20132345. IEEE.<\/p>\n<\/div>\n<ol class=\"footnotes\"><li id=\"footnote_0_596\" class=\"footnote\">The Wald interval also gives zero-width intervals when observations are all either y=1 or y=0. 
[<a href=\"#identifier_0_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><li id=\"footnote_1_596\" class=\"footnote\">That is, it contains the true value closer to 100 * (1 &#8211; alpha)% of the time than the others. This example is a favorite of Art Owen&#8217;s. [<a href=\"#identifier_1_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><li id=\"footnote_2_596\" class=\"footnote\">Hesterberg et al. (<a href=\"http:\/\/bcs.whfreeman.com\/ips5e\/content\/cat_080\/pdf\/moore14.pdf\">PDF<\/a>) is a good introduction to the bootstrap. Efron (1979) is the first paper on the bootstrap. [<a href=\"#identifier_2_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><li id=\"footnote_3_596\" class=\"footnote\">In what sense is this &#8220;exact&#8221; or &#8220;lossless&#8221;? This online method is exactly the same as the offline Bayesian bootstrap in which one takes an n-dimensional draw from the Dirichlet. On the other hand, the Poisson(1) method is often seen as an online approximation to the offline bootstrap. [<a href=\"#identifier_3_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><li id=\"footnote_4_596\" class=\"footnote\">Why is this? See the paper, but the summary is that U{0, 2} has the lowest kurtosis, and Poisson(1) has lower kurtosis than Exp(1). [<a href=\"#identifier_4_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><li id=\"footnote_5_596\" class=\"footnote\">This is especially useful if one is doing factorial weighting, as we do in the paper, where multiplication of weights for different grouping factors is required. [<a href=\"#identifier_5_596\" class=\"footnote-link footnote-back-link\">\u21a9<\/a>]<\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>Sometimes an approximate method is in some important sense better than the &#8220;exact&#8221; one &#8212; and not just because it is easier or faster. 
In statistical inference, a standard example here is the Agresti-Coull confidence interval for a binomial proportion: the &#8220;exact&#8221; interval from inverting the binomial test is conservative &#8212; giving overly wide intervals [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[76],"tags":[],"class_list":["post-596","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/posts\/596","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/comments?post=596"}],"version-history":[{"count":25,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/posts\/596\/revisions"}],"predecessor-version":[{"id":841,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/posts\/596\/revisions\/841"}],"wp:attachment":[{"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/media?parent=596"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/categories?post=596"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.deaneckles.com\/blog\/wp-json\/wp\/v2\/tags?post=596"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}