The Fundamentals of Heavy Tails: Properties, Emergence, & Estimation

By Jayakrishnan Nair, Adam Wierman, and Bert Zwart

Abstract

Heavy tails are a continual source of excitement and confusion across disciplines as they are repeatedly “discovered” in new contexts. This is especially true within computer systems, where heavy tails seemingly pop up everywhere, from degree distributions in the internet and social networks to file sizes and interarrival times of workloads. However, despite nearly a century of work on heavy tails, they are still treated as mysterious, surprising, and even controversial.


The goal of our forthcoming book is to show that heavy-tailed distributions need not be mysterious and should not be surprising or controversial. In particular, we demystify heavy-tailed distributions by showing how to reason formally about their counter-intuitive properties; we highlight that their emergence should be expected (not surprising) by showing that a wide variety of general processes lead to heavy-tailed distributions; and we illustrate that most of the controversy surrounding heavy tails is the result of bad statistics and can be avoided by using the proper tools.
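As a hedged illustration of what a "proper tool" can look like, here is a minimal Python sketch of the classical Hill estimator of the tail index α, one of the estimators listed in Part III, together with a helper that computes the points of a Hill plot. The function names `hill_estimator` and `hill_plot_points` are our own hypothetical choices, not code from the book. The sketch assumes positive samples and a user-chosen number k of upper order statistics; picking k well is exactly the hard part the Hill plot is meant to help with.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple


def hill_estimator(samples: Sequence[float], k: int) -> float:
    """Hill estimate of the tail index alpha from the k largest observations.

    With order statistics X_(1) >= X_(2) >= ... >= X_(n), this computes
        xi_hat = (1/k) * sum_{i=1..k} ln(X_(i) / X_(k+1))
    and returns alpha_hat = 1 / xi_hat.
    """
    n = len(samples)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(samples)")
    ordered = sorted(samples, reverse=True)  # descending order statistics
    threshold = ordered[k]  # X_(k+1): the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    xi_hat = sum(math.log(x / threshold) for x in ordered[:k]) / k
    if xi_hat == 0:
        raise ValueError("top k observations are all equal; estimate undefined")
    return 1.0 / xi_hat


def hill_plot_points(samples: Sequence[float], ks: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (k, alpha_hat) pairs; plotting alpha_hat against k gives a Hill plot."""
    return [(k, hill_estimator(samples, k)) for k in ks]


if __name__ == "__main__":
    rng = random.Random(0)
    # random.paretovariate(a) samples a Pareto law with P(X > x) = x**(-a) for x >= 1.
    data = [rng.paretovariate(2.0) for _ in range(100_000)]
    for k, alpha_hat in hill_plot_points(data, [100, 1_000, 10_000]):
        print(f"k={k:>6}  alpha_hat={alpha_hat:.3f}")
```

For clarity, the helper re-sorts the data for every k; a practical Hill plot would sort once and reuse cumulative sums of the log order statistics.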


The book covers mathematically deep concepts such as the generalized central limit theorem, extreme value theory, and regular variation, but it does so using only elementary mathematical tools in order to make these topics accessible to anyone who has had an introductory probability course.
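For orientation, the standard definitions of two of these notions are sketched below in commonly used notation, where $\bar F(x) = \Pr(X > x)$ denotes the tail of a random variable $X$. The notation is assumed here, and the book's exact formulations may differ. A Pareto tail $\bar F(x) = (x_m/x)^{\alpha}$ for $x \ge x_m$ is the simplest example satisfying all three conditions, with $L$ constant.

```latex
% Heavy-tailed: the tail decays more slowly than every exponential
\limsup_{x \to \infty} e^{\mu x}\, \bar F(x) = \infty \quad \text{for all } \mu > 0

% Slowly varying function L
\lim_{x \to \infty} \frac{L(tx)}{L(x)} = 1 \quad \text{for all } t > 0

% Regularly varying tail with index -\alpha, where \alpha > 0
\bar F(x) = x^{-\alpha} L(x)
\quad\Longleftrightarrow\quad
\lim_{x \to \infty} \frac{\bar F(tx)}{\bar F(x)} = t^{-\alpha} \quad \text{for all } t > 0
```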


A more detailed overview of the topics to be included in the book is below. Additionally, the slides from recent tutorials we have given on heavy tails provide a high-level glimpse into the topics and perspective of the book.


You can download pre-publication versions of completed chapters here.

Table of contents

  1. Introduction
    • Defining heavy-tailed distributions
    • Examples of heavy-tailed distributions
    • How to use this book

Part I: Properties

  2. Scale invariance, power laws, and regular variation
    • Scale invariance and power laws
    • Approximate scale invariance and regular variation
    • Analytic properties of regularly varying functions
    • An example: Closure properties of regularly varying distributions
    • An example: Branching processes
  3. Catastrophes, conspiracies, and subexponential distributions
    • Conspiracies and catastrophes
    • Subexponential distributions
    • An example: Random sums
    • An example: Conspiracies and catastrophes in random walks
  4. Residual lives, hazard rates, and long tails
    • Residual lives and hazard rates
    • Heavy tails and residual lives
    • Long-tailed distributions
    • An example: Random extrema

Part II: Emergence

  5. Additive processes
    • The central limit theorem
    • Generalizing the central limit theorem
    • Understanding stable distributions
    • The generalized central limit theorem
    • A variation: The emergence of heavy tails in random walks
  6. Multiplicative processes
    • The multiplicative central limit theorem
    • Variations on multiplicative processes
    • An example: Preferential attachment and Yule processes
  7. Extremal processes
    • A limit theorem for maxima
    • Understanding max-stable distributions
    • The extremal central limit theorem
    • An example: Extremes of random walks
    • A variation: The time between record breaking events

Part III: Estimation

  8. Estimating power-law distributions: Listen to the body
    • Parametric estimation of power laws using linear regression
    • Maximum likelihood estimation of power-law distributions
    • Properties of the maximum likelihood estimator
    • Visualizing the MLE via regression
    • A recipe for parametric estimation of power-law distributions
  9. Estimating power-law tails: Let the tail do the talking
    • The failure of parametric estimation
    • The Hill estimator
    • Properties of the Hill estimator
    • The Hill plot
    • Beyond the Hill estimator
    • Guidelines for estimating heavy-tailed phenomena
  10. Estimating multivariate power-law tails: Cautionary tales of tails
    • Parametric estimation of multivariate power laws
    • The pitfalls of parametric estimation
    • Semi-parametric estimation for multivariate power laws
    • Guidelines for multivariate estimation of heavy-tailed phenomena