
Bayesian Models: A Statistical Primer for Ecologists

Availability:
Only 1 left!
Original price $56.00
Current price $73.99
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.

Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.

This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.

  • Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians
  • Covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more
  • Deemphasizes computer coding in favor of basic principles
  • Explains how to write out properly factored statistical expressions representing Bayesian models

ISBN-13: 9780691159287

Media Type: Hardcover

Publisher: Princeton University Press

Publication Date: 08-04-2015

Pages: 320

Product Dimensions: 6.10(w) x 9.30(h) x 1.00(d)

N. Thompson Hobbs is senior research scientist at the Natural Resource Ecology Laboratory and professor in the Department of Ecosystem Science and Sustainability at Colorado State University. Mevin B. Hooten is associate professor in the Department of Fish, Wildlife, and Conservation Biology and the Department of Statistics at Colorado State University, and assistant unit leader in the US Geological Survey's Colorado Cooperative Fish and Wildlife Research Unit.

Read an Excerpt

Bayesian Models

A Statistical Primer for Ecologists


By N. Thompson Hobbs, Mevin B. Hooten

PRINCETON UNIVERSITY PRESS

Copyright © 2015 Princeton University Press
All rights reserved.
ISBN: 978-0-691-15928-7



CHAPTER 1

Preview

Art is the lie that tells us the truth. — Pablo Picasso

All models are wrong but some are useful. — George E. P. Box


Pablo Picasso was a contemporary of George Box's, a statistician who had an enormous impact on his field, writing influential papers well into his 90s. Both men sought to express truth about nature, but they used tools that were dramatically different — Picasso brushing strokes on canvas, and Box writing equations on paper. Given the different ways they worked, it is remarkable that Box and Picasso made such similar statements about the central importance of abstraction to insight. Abstraction plays a role in all creative human enterprise — in art, music, literature, engineering, and science. We create abstractions because they allow us to focus on the most important elements of a problem, those relevant to the objectives of our work, without being distracted by elements that are not relevant.

Scientific models are, above all else, abstractions. They are statements about the operation of nature that purposefully omit many details and thus achieve insight that would otherwise be discursively obscured. They provide unambiguous statements of what we believe is important. A key principle in modeling and statistics — in science for that matter — is the need to reduce the dimensions of a problem. A data set may contain a thousand observations. By reducing its dimensions to a model with a few parameters, we are able to gain understanding.

However, because models are abstractions and reduce the dimensions of a problem, we must deal with the elements we choose to omit. These elements create uncertainty in the predictions of models, so it follows that assessing uncertainty is fundamental to science. Scientists, journalists, logicians, and attorneys alike can rightly claim to make statements based on evidence, but only scientific statements include evidence tempered by uncertainty quantified. We know what is certain only to the extent that we can say, with confidence, what is uncertain. Sharpening our thinking about uncertainty and learning how to estimate it properly is a main theme of this book.

Your science will have impact to the extent that you are able to ask important questions and provide compelling answers to them. Doing so depends on establishing a line of inference that extends from current thinking, theory, and questions to new insight qualified by uncertainty (fig. 1.1.1). This book offers a highly general, flexible approach to establishing this line of inference. We cannot help you pose novel, interesting questions, but we can teach an approach to inference applicable to an enormous range of research problems, an approach that can be understood from first principles and that can be unambiguously communicated to other scientists, managers, and policy makers. We emphasize that understanding the principles of this framework allows you to customize your analyses to accommodate the inevitable idiosyncrasies of specific problems in research.

We sketch that framework in this chapter to give a general sense of where this book is headed, a preview we use to motivate the development of concepts and principles in the chapters that follow. There should be details of our approach that are unfamiliar, otherwise you probably don't need this book. Soon enough, we will explain those details fully. For now, we offer a somewhat abstract overview followed by a concrete example as an enticement to read on. It will be rewarding to return to this section after you have worked through the book. We hope you will be pleasantly surprised by your increased understanding of our small preview. The only part of this chapter that is essential for the remainder of the book is understanding our notation, which we describe in section 1.1.1.


1.1 A Line of Inference for Ecology

Virtually all research problems in ecology share a set of features. We want to understand how the state of an ecological system changes over time, across space, or among individuals. We seek to understand why those changes occur. Our understanding usually depends on a sample drawn from all possible instances of the state because we want to make statements about a system that is too large to observe fully. The observations in that sample are often related imperfectly to the true state. In the subsections that follow, we lay out an approach first described by Berliner (1996) for modeling the imperfect observations that arise from a process we want to understand. It does not apply to all research problems, but it is sufficiently general and flexible that it applies to most.


1.1.1 Some Notation

Before we proceed, we must introduce some notation. Boldface lowercase letters will indicate vectors (e.g., $\boldsymbol{\theta}$, $\mathbf{a}$), and lightface lowercase letters, scalars ($\theta$, $a$). Bold capital letters will be used for matrices (e.g., $\mathbf{A}$). The symbol $\boldsymbol{\theta}$ will indicate a vector of parameters, and, of course, $\theta$ will indicate a single parameter. The letter $\mathbf{y}$ will indicate a vector of data, $\mathbf{Y}$ a matrix, and $y$ or $y_i$ a single observation. Corresponding notation using $\mathbf{x}$, $x$, and $x_i$ will be used for predictor variables, also called covariates. The notation $[a \mid b, c]$ will be used for the probability distribution of the random variable $a$ conditional on the parameters $b$ and $c$. Deterministic models will be denoted by $g()$ with arguments necessary for the model within the parentheses. Notation will be added as needed, in context.


1.1.2 Process Models

Process models include a mathematical statement that depicts a process and a way to account for uncertainty about the process. To compose a process model we start by thinking about the true state ($z$) of an ecological system. That state could be the size of a population, the flux of nitrogen from the soils of a grassland, the number of invasive plants in a community, or the area of landscape annually disturbed by fire. We seek to understand influences on that state, the things that cause it to change. We write an equation, a deterministic model that represents our ideas about the behavior of the state of interest and the quantities that influence it. When we say the model is deterministic, we mean that for a given set of parameters and inputs, it will make precisely the same predictions. We use the notation $g(\boldsymbol{\theta}_p, \mathbf{x})$ to represent the deterministic part of a process model, where $g()$ is any mathematical function, $\boldsymbol{\theta}_p$ is a vector of parameters in the model, and $\mathbf{x}$ is one or more explanatory variables that we hypothesize influence the true state.

Our deterministic model is an abstraction, so it follows that we have omitted influences on the true state from the model, and we must deal with our omissions. If we model aboveground net primary production of a grassland as a function of growing season precipitation, we have brushed aside the influence of grazing intensity and precipitation that occurs during the dormant season. If we model reproductive success of individuals as a function of age and genotype, we have ignored variation contributed by their nutritional status. A model of harvest from a fishery based on observations of stock size and sea temperature omits the effect of variation in the food web. We recognize that these neglected influences shape the behavior of the true state by treating them stochastically, by estimating a parameter, $\sigma_p^2$, that subsumes all the unmodeled influences on the true state. Including this stochastic component allows us to estimate a statistical distribution (fig. 1.1.2A) for the true state,

$[z \mid g(\boldsymbol{\theta}_p, \mathbf{x}), \sigma_p^2]$, (1.1.1)

where the bracket notation $[z \mid g(\boldsymbol{\theta}_p, \mathbf{x}), \sigma_p^2]$ means the distribution of $z$ conditional on $g(\boldsymbol{\theta}_p, \mathbf{x})$ and $\sigma_p^2$. If the notation is somewhat unfamiliar at this point, don't worry; equation 1.1.1 simply says that if we know the functional form $g()$ and the values of $\boldsymbol{\theta}_p$, $\mathbf{x}$, and $\sigma_p^2$, we can specify the probability distribution of the true state, $z$ (fig. 1.1.2A).
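As a rough illustration of a process model, the following Python sketch pairs a deterministic function with a stochastic component. The linear form of g, the normal distribution for the true state, and every numeric value are assumptions made for this example only; the text does not specify them.

```python
import random
import statistics


def g(theta_p, x):
    """Deterministic part of the process model (a hypothetical linear function)."""
    intercept, slope = theta_p
    return intercept + slope * x


def draw_true_state(theta_p, x, sigma_p, rng):
    """Draw z from [z | g(theta_p, x), sigma_p^2], assumed normal in this sketch."""
    return rng.gauss(g(theta_p, x), sigma_p)


rng = random.Random(1)
theta_p = (50.0, 0.5)  # hypothetical intercept and slope
x = 300.0              # hypothetical growing-season precipitation
sigma_p = 20.0         # process standard deviation (square root of sigma_p^2)

# The deterministic model returns the same prediction for the same inputs.
prediction = g(theta_p, x)

# The stochastic component turns that single prediction into a distribution for z.
draws = [draw_true_state(theta_p, x, sigma_p, rng) for _ in range(5000)]
mean_z = statistics.fmean(draws)
sd_z = statistics.stdev(draws)
print(prediction, round(mean_z, 1), round(sd_z, 1))
```

The simulated draws center on the deterministic prediction, and their spread reflects the influences the deterministic model leaves out.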

We want to determine the probability distribution of the true state as well as the probability distributions of the parameters in our model. Doing so requires evaluating the predictions of the process model against data. The data can be obtained in experiments or observational studies; they can be measurements we plan to collect or have already collected. This linkage between process models and observations is discussed next.


1.1.3 Sampling Models

We can rarely observe all instances of the true state in the system we study. Instead, we take a sample of $i = 1, \ldots, n$ observations of the true state and we notate the $i$th observation as $u_i$. This sample might be biomass from plots on a grassland landscape where we seek to understand the true state, aboveground productivity. It might be presence or absence of an exotic fungus on trees in a stand where we want to understand infestation of the stand. It might be classifications of zooplankton in aliquots from a stream where we want to estimate the stream's species richness. Uncertainty arises because our sample assuredly will not represent the true state perfectly. Again, we represent this uncertainty stochastically using a probability distribution relating the true state to an observation

$[u_i \mid z, \sigma_s^2]$, (1.1.2)

where $\sigma_s^2$ represents sampling variation (fig. 1.1.2B). Expression 1.1.2 implicitly assumes that we can observe instances of the true state without bias, which simply says that if we collect many observations, then the average of the observations (also called the expected value of the observations, $E(u)$) equals the mean of the distribution of the true state, $E(u) = z$. (Expectation will be treated in detail in chapter 3.) We realize that a sample of the true state is a nuanced concept; soldier on, and things will become clear in the next section.
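The idea of unbiased sampling can be sketched numerically. In the Python example below, the normal distribution, the name sigma_s for the sampling standard deviation, and all values are assumptions chosen for illustration, not taken from the text.

```python
import random
import statistics

rng = random.Random(2)
z = 120.0        # hypothetical true state, e.g. mean biomass per plot
sigma_s = 15.0   # sampling standard deviation (a name assumed for this sketch)


def sample_mean(n):
    """Average of n unbiased observations u_i drawn from a normal centered on z."""
    return statistics.fmean(rng.gauss(z, sigma_s) for _ in range(n))


small = sample_mean(10)
large = sample_mean(100_000)
print(f"n=10: {small:.2f}   n=100000: {large:.2f}   true state: {z}")
```

A small sample can land some distance from the true state, while the average of many unbiased observations settles close to it.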


1.1.4 Observation Models

The assumption that we can observe the true state perfectly may not be reasonable. When we count animals, some are overlooked. When we use Lidar to estimate the heights of 10,000 trees, we do not measure the height of each tree using a ladder and a meter tape (thankfully) but instead observe backscatter from a laser beam. When we measure nitrogen mineralization, we do not follow the fate of individual nitrogen atoms but measure the net change in the extractable soil ammonium pool over time. The mismatch between what we observe and the true state requires a model of the observations, which we notate as $d(\boldsymbol{\theta}_o, u_i)$, where $\boldsymbol{\theta}_o$ are parameters. It is important to understand that $u_i$ is the quantity we would observe if we could perfectly observe the instance of the true state in a draw from all the instances, without any bias injected by our observation process. We use $y_i$ to notate the actual measurements we have in hand, including error resulting from the way we observe the $u_i$. The observation model serves to eliminate the bias found in our observations, $y_i$, relative to an instance ($u_i$) of the true state drawn from the distribution of $z$. The probability distribution of the observations (fig. 1.1.2C) arising from the observed instances of the true state is

$[y_i \mid d(\boldsymbol{\theta}_o, u_i), \sigma_o^2]$, (1.1.3)

where $\sigma_o^2$ represents all the influences on the $y_i$ that are not represented in $d(\boldsymbol{\theta}_o, u_i)$. As a simple example, $\sigma_o^2$ could be the variance of the predictions of a regression model used to calibrate observations against true values. We emphasize that the model $d(\boldsymbol{\theta}_o, u_i)$ is needed to offset bias in the $y_i$. If our observations are unbiased, then there is no need for an observation model (i.e., $y_i = u_i$), and the uncertainty in our data arises solely from sampling variation (i.e., eq. 1.1.2). For simplicity, we have ignored the sampling variation and observation errors that might influence the $\mathbf{x}$, but these could be handled in the same way as we have done for the $\mathbf{y}$ (i.e., we would use probability distributions like eqs. 1.1.2 and 1.1.3).
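To make the role of the observation model concrete, the Python sketch below simulates counts in which only a fraction of animals is detected. The proportional form of d, the detection probability, the normal error, and all values are hypothetical choices for this example.

```python
import random
import statistics


def d(theta_o, u):
    """Hypothetical observation model: only a fraction theta_o of animals is counted."""
    return theta_o * u


rng = random.Random(3)
u_i = 200.0     # what we would record with a perfect observation process
theta_o = 0.7   # hypothetical detection probability
sigma_o = 8.0   # observation standard deviation (square root of sigma_o^2)

# y_i are the measurements in hand: centered on d(theta_o, u_i), not on u_i.
y = [rng.gauss(d(theta_o, u_i), sigma_o) for _ in range(20_000)]

raw_mean = statistics.fmean(y)          # biased low relative to u_i
corrected_mean = raw_mean / theta_o     # inverting d() offsets the bias
print(round(raw_mean, 1), round(corrected_mean, 1))
```

The raw counts systematically understate the quantity of interest, and accounting for the observation process recovers it. In practice theta_o would itself be estimated, which is why it appears among the parameters of the model.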


1.1.5 Parameter Models

Because the approach we sketch is Bayesian, we also require models of the parameters expressing what we knew about the parameters when we began our investigation, that is, our prior knowledge. This knowledge is expressed in probability distributions, one for each parameter we seek to estimate

$[\boldsymbol{\theta}_p][\boldsymbol{\theta}_o][\sigma_p^2][\sigma_s^2][\sigma_o^2]$. (1.1.4)

These distributions must have numeric arguments that specify our current knowledge of the probability distribution of the parameters. The arguments can be chosen to make the distributions informative or vague, but as you will see, we will encourage you to make priors as informative as knowledge and scholarship allow. We might know a lot about a parameter or we might know very little.
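The difference between informative and vague numeric arguments can be seen in a small Python sketch. The choice of normal priors and the specific means and standard deviations are assumptions for illustration only.

```python
from statistics import NormalDist

# Two hypothetical priors for the same parameter. The numeric arguments
# (mean, standard deviation) encode how much we claim to know.
informative = NormalDist(mu=0.5, sigma=0.05)
vague = NormalDist(mu=0.0, sigma=100.0)


def central_interval(prior, mass=0.95):
    """Return the interval holding the central `mass` of the prior's probability."""
    tail = (1.0 - mass) / 2.0
    return prior.inv_cdf(tail), prior.inv_cdf(1.0 - tail)


for name, prior in [("informative", informative), ("vague", vague)]:
    low, high = central_interval(prior)
    print(f"{name:>11}: 95% of prior mass between {low:.2f} and {high:.2f}")
```

The informative prior concentrates nearly all of its probability in a narrow range, whereas the vague prior spreads it across values that differ by orders of magnitude.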


1.1.6 The Full Model

We are now equipped to write a mathematical expression representing our ideas about the operation of an ecological process linked to data in a way that includes all sources of uncertainty — in the process, in our sample of the process, and in the way we observe it (fig. 1.1.2)

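$[z, \boldsymbol{\theta}_p, \boldsymbol{\theta}_o, \sigma_p^2, \sigma_s^2, \sigma_o^2, u_i \mid y_i] \propto [y_i \mid d(\boldsymbol{\theta}_o, u_i), \sigma_o^2] \, [u_i \mid z, \sigma_s^2] \, [z \mid g(\boldsymbol{\theta}_p, \mathbf{x}), \sigma_p^2] \, [\boldsymbol{\theta}_p][\boldsymbol{\theta}_o][\sigma_p^2][\sigma_s^2][\sigma_o^2]$ (1.1.5)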

Equation 1.1.5 is Bayesian and hierarchical. It is Bayesian because it treats the unobserved quantities as random variables. This treatment allows us to make statements about the probability distributions of all of the unobserved quantities based on the observed ones. It is hierarchical because the $z$ and $u_i$ are found on both sides of a conditioning symbol "|", illustrating a powerful tool for simplifying problems that we will discuss more fully soon. The observation model includes our knowledge of the relationship between the true state and our observations of it and the uncertainty that occurs because that relationship is imperfect. The sampling model includes the uncertainty that comes from observing a subset of instances of the true state. The process model represents our hypotheses about the ecological process by specifying a probability distribution defining our knowledge and our uncertainty about the true state and the factors that control its behavior. The parameter models allow us to exploit previous estimates of parameters that we have made ourselves or that have been made by others. Together these models provide a line of inference extending from concepts to insight for a broad range of research problems (fig. 1.1.1). We can use equation 1.1.5 to obtain estimates of unobserved states, parameters, and quantities of interest derived from parameters and states. All of these estimates are properly tempered by uncertainty in a statistically coherent way.
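One way to see how the component models combine is to add their log densities. The Python sketch below is only a schematic under strong assumptions: every distribution is taken to be normal, g and d are hypothetical proportional functions, the three standard deviations are treated as known (so their priors are omitted), and the data values are invented for the example.

```python
import math


def log_normal(value, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (value - mean) ** 2 / (2 * sd**2)


def g(theta_p, x):
    """Hypothetical deterministic process function."""
    return theta_p * x


def d(theta_o, u):
    """Hypothetical observation function."""
    return theta_o * u


def log_unnormalized_posterior(z, u, theta_p, theta_o, y, x, sd_p, sd_s, sd_o):
    """Sum of log factors: observation + sampling + process + parameter models."""
    log_obs = sum(log_normal(y_i, d(theta_o, u_i), sd_o) for y_i, u_i in zip(y, u))
    log_samp = sum(log_normal(u_i, z, sd_s) for u_i in u)
    log_proc = log_normal(z, g(theta_p, x), sd_p)
    # Hypothetical priors on the two remaining parameters.
    log_prior = log_normal(theta_p, 0.5, 0.2) + log_normal(theta_o, 0.7, 0.1)
    return log_obs + log_samp + log_proc + log_prior


y = [68.0, 72.5, 70.1]     # invented measurements
u = [98.0, 103.0, 100.0]   # candidate values for the unobserved u_i
x = 200.0                  # invented covariate value

good = log_unnormalized_posterior(100.0, u, 0.5, 0.7, y, x, 10.0, 5.0, 3.0)
poor = log_unnormalized_posterior(150.0, u, 0.5, 0.7, y, x, 10.0, 5.0, 3.0)
print(good > poor)
```

Comparing two candidate values of the true state shows which one is better supported once all the component models are considered together. Methods for exploring such a function over all unobserved quantities are outside the scope of this sketch.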

In the remainder of this book, we develop the principles needed to understand equation 1.1.5 and to apply it to research problems in ecology. We tailor it to match the needs of the particular problem at hand. But first, we provide an example of its use.


(Continues...)

Excerpted from Bayesian Models by N. Thompson Hobbs, Mevin B. Hooten. Copyright © 2015 Princeton University Press. Excerpted by permission of PRINCETON UNIVERSITY PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.


What People are Saying About This

From the Publisher

"This pitch-perfect exposition shows how Bayesian modeling can be used to quantify our uncertain world. Ecologists—and for that matter, scientists everywhere—are aware of these uncertainties, and this book gives them the understanding to do something about it. Hobbs and Hooten take us on a signposted journey through the culture, construction, and consequences of conditional-probability modeling, readying us to take our own scientific journeys through uncertain landscapes."—Noel Cressie, University of Wollongong, Australia

"Hobbs and Hooten provide a complete guide to Bayesian thinking and statistics. This is a book by ecologists for ecologists. One of the powers of Bayesian thinking is how it enables you to evaluate knowledge accumulated through multiple experiments and publications, and this excellent primer provides a firm grounding in the hierarchical models that are now the standard approach to evaluating disparate data sets."—Ray Hilborn, University of Washington

"In this uniquely well-written and accessible text, Hobbs and Hooten show how to think clearly in a Bayesian framework about data, models, and linking data with models. They provide the necessary tools to develop, implement, and analyze a wide range of ecologically interesting models. There's something new and exciting in this book for every practicing ecologist."—Aaron M. Ellison, Harvard University

"Hobbs and Hooten provide an important bridge between standard statistical texts and more advanced Bayesian books, even those aimed at ecologists. Ecological models are complex. Building from likelihood to simple and hierarchical Bayesian models, the authors do a superb job of focusing on concepts, from philosophy to the necessary mathematical and statistical tools. This practical and understandable book belongs on the shelves of all scientists and statisticians interested in ecology."—Jay M. Ver Hoef, Statistician, NOAA-NMFS Alaska Fisheries Science Center

"Tackling an important and challenging topic, Hobbs and Hooten provide non-statistically-trained ecologists with the skills they need to use hierarchical Bayesian models accurately and comfortably. The combination of technical explanations and practical examples is great. This book is a valuable contribution that will be widely used."—Benjamin Bolker, McMaster University

"This excellent book is one of the best-written and most complete primers on Bayesian hierarchical modeling I have seen. Hobbs and Hooten anticipate many of the common pitfalls and concerns that arise when non-statisticians are introduced to this material. Researchers across a wide range of disciplines will find this book valuable."—Christopher Wikle, University of Missouri

Table of Contents

Preface

I Fundamentals

1 PREVIEW

1.1 A Line of Inference for Ecology

1.2 An Example Hierarchical Model

1.3 What Lies Ahead?

2 DETERMINISTIC MODELS

2.1 Modeling Styles in Ecology

2.2 A Few Good Functions

3 PRINCIPLES OF PROBABILITY

3.1 Why Bother with First Principles?

3.2 Rules of Probability

3.3 Factoring Joint Probabilities

3.4 Probability Distributions

4 LIKELIHOOD

4.1 Likelihood Functions

4.2 Likelihood Profiles

4.3 Maximum Likelihood

4.4 The Use of Prior Information in Maximum Likelihood

5 SIMPLE BAYESIAN MODELS

5.1 Bayes’ Theorem

5.2 The Relationship between Likelihood and Bayes’

5.3 Finding the Posterior Distribution in Closed Form

5.4 More about Prior Distributions

6 HIERARCHICAL BAYESIAN MODELS

6.1 What Is a Hierarchical Model?

6.2 Example Hierarchical Models

6.3 When Are Observation and Process Variance Identifiable?

II Implementation

7 MARKOV CHAIN MONTE CARLO

7.1 Overview

7.2 How Does MCMC Work?

7.3 Specifics of the MCMC Algorithm

7.4 MCMC in Practice

8 INFERENCE FROM A SINGLE MODEL

8.1 Model Checking

8.2 Marginal Posterior Distributions

8.3 Derived Quantities

8.4 Predictions of Unobserved Quantities

8.5 Return to the Wildebeest

9 INFERENCE FROM MULTIPLE MODELS

9.1 Model Selection

9.2 Model Probabilities and Model Averaging

9.3 Which Method to Use?

III Practice in Model Building

10 WRITING BAYESIAN MODELS

10.1 A General Approach

10.2 An Example of Model Building: Aboveground Net Primary Production in Sagebrush Steppe

11 PROBLEMS

11.1 Fisher’s Ticks

11.2 Light Limitation of Trees

11.3 Landscape Occupancy of Swiss Breeding Birds

11.4 Allometry of Savanna Trees

11.5 Movement of Seals in the North Atlantic

12 SOLUTIONS

12.1 Fisher’s Ticks

12.2 Light Limitation of Trees

12.3 Landscape Occupancy of Swiss Breeding Birds

12.4 Allometry of Savanna Trees

12.5 Movement of Seals in the North Atlantic

Afterword

Acknowledgments

A Probability Distributions and Conjugate Priors

Bibliography

Index