Statistics Guide

A Statistics Primer

Thanks to reports from statisticians all over the globe, sites like Geo Hive are able to estimate that the world population will be 9.5 billion in 2050, that the number of people in the world in 1950 was 2.5 billion, that India has the highest daily growth of any country, and that the Tokyo metropolitan area, at 36.6 million individuals, is the largest in the world. Statistics allow lay readers to gain an understanding, however limited, of the scope of their world, the true size of entire industries, the effects of communicable diseases in lost productivity, and much, much more.

The following primer gives a small taste of the kinds of methods that statisticians use in order to convert seemingly overwhelming data into digestible numbers. Throughout the sections below are hyperlinks to tutorials, online textbooks, articles, organizations and other resources that are directed toward readers with a limited or intermediate-level background in mathematics. All of the resources compiled here were selected for their depth and accessibility.


Essentials

The field of statistics uses mathematical principles to convert qualitative phenomena into numbers. Statistical data can at best only give people insight into a few aspects of a problem: standardized test scores, for example, may be able to measure the aptitude of populations, but not always of individuals. Nevertheless, the simplification of complex systems, such as human populations, the economy, computers, and so on, into numbers allows statisticians to give researchers the “big picture” of a population trend over a period of time and to make predictions. The process of generating stats is complicated enough to constitute its own field with its own terms, definitions of which provide an understanding of what statisticians do:

  • Description: one of two main divisions in statistics as a discipline (the other being inference). Descriptive statistics represent current or historical data “as is” without extrapolating additional information from the numbers.
  • Generalizability: how broadly the findings of an experiment, poll, or other research method can be applied beyond the group that was studied. For example, a survey of voting trends in rural Georgia is not generalizable to the entire U.S., whereas a survey of roughly the same size from regions all over the country is more generalizable.
  • Inference or Prediction: use of probability and other methods to understand the implications of descriptive methods. Probability itself is measured from 0 to 1, which can easily be converted into a percent chance of an event occurring (a probability of 0.25, for example, is a 25 percent chance).
  • Population: a group of information, free agents, or variables, which is not necessarily made up of people. Because researchers can rarely assess every member of a population, they limit themselves to samples, or small, hopefully representative, portions.
  • Standard Deviation: the amount of variability that exists around the “average” (the mean) in a survey or experiment. A high standard deviation means that the values are widely spread out, whereas a low standard deviation means that they cluster closely around the average. For example, if consumers in the 18 to 24 age range all spend roughly the same amount on one product, their spending will have a low standard deviation. Standard deviation describes the spread of a single variable; on its own it does not show a correlation between two variables, such as age and purchasing habits.
  • Variables: the aspects of a phenomenon being studied, including nominal (categories with no inherent order, such as “gender”), ordinal (categories with a natural order, such as First Grade, Second Grade, etc.), and interval (numbers measured on a scale, including percentages and times). Variables are often represented symbolically in mathematics. For example, in X = 4, X is a variable.
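
Several of the terms above (population, sample, standard deviation, and probability) can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the population is a set of made-up test scores generated at random, and the helper names describe and as_percent are hypothetical, not part of any resource cited here.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up test scores (not real data).
random.seed(0)
population = [random.gauss(70, 10) for _ in range(10_000)]

# Researchers can rarely measure a whole population, so draw a sample.
sample = random.sample(population, 200)


def describe(values):
    """Return the mean and the sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def as_percent(probability):
    """Convert a probability on the 0-to-1 scale into a percent chance."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return probability * 100


sample_mean, sample_sd = describe(sample)
print(f"sample mean = {sample_mean:.1f}, standard deviation = {sample_sd:.1f}")
print(f"a probability of 0.25 is a {as_percent(0.25):.0f} percent chance")
```

The sample’s mean and standard deviation should land near, but not exactly on, the values used to generate the population, which is the sense in which a sample is only “hopefully representative.”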

An online book about the subject from Colorado State University, Writing Guide: Introduction to Statistics, and a PDF textbook by Keone Hon, An Introduction to Statistics, go into far more detail about the basics. A more concise explanation of the essentials, which focuses heavily on terminology, can be found at Research Methods: Knowledge Base.


Models

In statistics, models are associations between variables. For example, the correlation between the rarity of an element and its value (gold is both rarer and more expensive than, say, copper) is a simple statistical model. Models are organized by the types of variables under consideration. For instance:

  • The linear regression model examines the relationship between two numeric variables. For example, a linear regression model applied to the mileage of a vehicle and its price will show a correlation, as higher-mileage cars tend to be cheaper than low-mileage ones. Other factors, such as the car’s make and model, can complicate this model, which is why the actual work of statisticians is far more complicated.
  • The t-Test is applied to a nominal variable (see above) and a numeric variable. A commonly cited example uses the nominal variable of gender to test for sex-specific differences in spending habits, height, weight, and other variables that can be measured quantitatively.
  • The chi-squared test compares two nominal variables. For example, a survey that records each business executive’s job title (nominal variable) and preferred brand of car (nominal variable) can produce charts and tables that show that, for instance, 45 percent of CFOs prefer to drive a Lexus whereas 18 percent of CEOs would rather drive a Bentley. Neither variable is itself numeric, but numerical data can be extracted from their relationship.
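
As a rough sketch of what each of the three models computes, the Python below works out a least-squares line, a t statistic, and a chi-squared statistic using only the standard library. The function names (linear_regression, welch_t, chi_squared) and all of the numbers are hypothetical and chosen for illustration. The t statistic shown is Welch’s variant, which is one common form of the t-Test and not necessarily the one the sources here have in mind. The sketch stops at the test statistics; turning them into p-values requires the matching probability distributions, which a statistics library would normally supply.

```python
import math
import statistics


def linear_regression(xs, ys):
    """Least-squares slope and intercept for two numeric variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(var_a + var_b)


def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up data: mileage (thousands of miles) and price (thousands of dollars).
mileage = [10, 30, 50, 70, 90]
price = [24, 20, 17, 13, 10]
slope, intercept = linear_regression(mileage, price)
print(f"price = {intercept:.2f} + ({slope:.3f}) * mileage")

# Made-up heights in centimeters for two groups (the nominal variable).
group_a = [178, 182, 175, 180, 185]
group_b = [165, 170, 162, 168, 172]
print(f"t statistic = {welch_t(group_a, group_b):.2f}")

# Made-up counts: rows are job titles, columns are preferred car brands.
counts = [[45, 55], [18, 82]]
print(f"chi-squared statistic = {chi_squared(counts):.2f}")
```

A negative slope matches the mileage example above (price falls as mileage rises), while larger t and chi-squared statistics indicate a bigger gap between what was observed and what would be expected if the variables were unrelated.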

Many other models are used, but the preceding are some of the most basic. The primer to statistical models that informed this section comes from Will G. Hopkins of the online Sports Science journal. Otherwise, a long article about the subject by Peter McCullagh from the Annals of Statistics, What is a Statistical Model?, provides an introduction to the subject for readers with some background in mathematics, as do the first few chapters of the textbook, Statistical Models by A. C. Davison.


Applications

One of the key applications of statistics with which most people are familiar is demography, the field from which the more familiar term “demographics” derives. Populations of people and their distribution across regions are the domain of demographers. The work of these specialists is compiled by sites like Geo Hive, which draws upon various reports to keep track of the world’s population and that of specific regions. Other specialties include econometrics, the discipline that informs economic theory through statistical methodology; epidemiology, which relies on statistics to determine how diseases spread in populations; and statistical computing, which uses methodology made possible with computers to analyze populations. Richard Lowry of Vassar College describes some other practical applications of the field in his online book.