Statistics - Basics and Terminology






What is Statistics?

Statistics is a branch of applied mathematics. It can be defined as the science of creating, developing, and applying techniques for evaluating the uncertainty of inductive inferences. It is concerned with collecting, organizing, summarizing, and analyzing data, and with drawing inferences about a population when only part of its data (a sample) is observed.


There are two major types of statistics:

  1. Descriptive statistics: methods for organizing and summarizing information. These include constructing graphs, charts, and tables, and calculating descriptive measures such as the mean, measures of variation, and percentiles.

  2. Inferential statistics: methods for drawing conclusions about a population from a sample of that population, and for measuring the reliability of those conclusions. Examples include the t-test, ANOVA, correlation, and regression.
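As a rough illustration of the descriptive measures mentioned above, the following sketch computes a mean, a measure of variation, and quartiles with Python's standard `statistics` module. The `heights` values are made up for the example and do not come from any real study.

```python
import statistics

# Hypothetical sample of plant heights in cm (illustrative values only).
heights = [12.0, 15.5, 14.0, 13.5, 16.0, 15.0, 14.5, 13.0]

# Measure of center: the arithmetic mean.
mean_height = statistics.mean(heights)

# Measure of variation: the sample standard deviation (n - 1 denominator).
stdev_height = statistics.stdev(heights)

# Percentiles: the three cut points that split the data into quarters
# (25th, 50th, and 75th percentiles).
quartiles = statistics.quantiles(heights, n=4)

print(f"mean     = {mean_height:.2f} cm")
print(f"st. dev. = {stdev_height:.2f} cm")
print(f"quartiles = {quartiles}")
```

Inferential methods such as the t-test usually rely on additional libraries and are not shown here.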


Variables in Statistics

A variable is a characteristic that takes on different values for different things, persons, or places. It is anything that, when measured, can produce two or more different scores. For example, a person's eye color is a variable, which could have the value (observation) "Brown" for one person and "Blue" for another.


There are several types of variables in statistics. The two major types are qualitative and quantitative variables (Figure 1):

1. Qualitative variables: non-numerically valued variables. Some qualitative variables can be ordered or ranked (ordinal), while others cannot be ordered (nominal), such as eye color.

2. Quantitative variables: numerically valued variables. They are classified as continuous or discrete:

  • Continuous variables: their values form an interval of numbers and can include fractional amounts (e.g. a person's height or weight).

  • Discrete variables: their values are separate, countable values with gaps between them, so they are not observed on a continuous scale (e.g. the number of petals on a flower).


Figure 1: Main types of variables in statistics.
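One way to picture the four variable types is to map them onto ordinary Python types. This is only a sketch: the `Severity` scale and the field names in `observation` are hypothetical and chosen for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordinal scale: categories with a meaningful order."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# One observation (one person or thing), with one value per variable.
observation = {
    "eye_color": "Brown",            # qualitative, nominal (no natural order)
    "severity": Severity.MODERATE,   # qualitative, ordinal (can be ranked)
    "petal_count": 5,                # quantitative, discrete (countable values)
    "height_cm": 171.4,              # quantitative, continuous (any value in an interval)
}

# Ordinal values can be compared and sorted by rank.
ranked = sorted([Severity.SEVERE, Severity.MILD, Severity.MODERATE])
print([level.name for level in ranked])

# Nominal values can only be tested for equality; "Brown" < "Blue" would
# compare the strings alphabetically, which says nothing about eye color.
print(observation["eye_color"] == "Blue")
```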



Distribution

The scores or observations we initially measure in a study are called the raw scores or raw data. Descriptive statistics help us to convert (boil down) the raw data into an interpretable form. There are several ways to do this, but the starting point is to count the number of times each score occurred, which is known as the frequency distribution.


A frequency distribution is simply the number of times each value (or range of values) occurs in a data set (Figure 2). Frequency distributions are usually presented as a table or as a graphic such as a histogram or bar chart.

Suppose we have 182 roses of different colors. The table and figure below show the number (frequency) of roses of each color.



Figure 2: Colors of 182 roses. Source: Samuels, M. L., Witmer, J. A., & Schaffner, A., Statistics for the Life Sciences (4th ed.).
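Counting how often each value occurs can be sketched in a few lines of Python with `collections.Counter`. The `colors` list below is a small invented data set, not the 182 observations from the figure.

```python
from collections import Counter

# Hypothetical raw data: the color recorded for each of 10 flowers.
colors = ["red", "white", "pink", "red", "red",
          "white", "pink", "red", "yellow", "red"]

# Frequency distribution: number of times each value occurs.
frequency = Counter(colors)

# Relative frequency: each frequency divided by the total number of observations.
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}

# Print a simple frequency table, most common value first.
print(f"{'color':<8}{'f':>4}{'rel. f':>8}")
for color, count in frequency.most_common():
    print(f"{color:<8}{count:>4}{relative_frequency[color]:>8.2f}")
```

The same counts could be passed to a plotting library to draw a bar chart.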



Population and Samples

A population, or universe, consists of all possible values of a variable. A sample is a part of a population. Usually the intention is to use sample information to make an inference about a population. Therefore, the sample must be representative of the population if it is to lead to valid inferences.

To obtain a representative sample, we select the sample data according to the principle of randomness, that is, by a procedure that generates data with no recognizable patterns or regularities. As a consequence, the laws of probability apply and can be used in drawing inferences.
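Drawing a random sample from a population can be sketched with Python's standard `random` module. The population of 100 numbered units and the seed value are assumptions made for the example; the seed only makes the draw reproducible.

```python
import random

# Hypothetical population: 100 units identified by the numbers 1 to 100.
population = list(range(1, 101))

# A seeded generator gives the same "random" draw on every run.
rng = random.Random(42)

# Draw 10 units without replacement; every unit has the same chance
# of being selected.
sample = rng.sample(population, k=10)

print(sorted(sample))
```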



Important Statistical Terms

- A distribution is the general name for any organized set of data.

- Frequency (f) is the number of times that a score or observation occurs.

- Frequency distribution is a listing of all classes along with their frequencies.

- Relative frequency is the ratio of the frequency of a class to the total number of observations.

- Simple random sampling is a sampling procedure for which each possible sample of a given size is equally likely to be the one obtained.

- Parent population (Parent distribution) is the population that is originally sampled.

- A parameter is a numerical descriptive measure of a population, e.g. the population mean (μ) and the population variance (σ²) are parameters.

- A point estimate (statistic) is a numerical descriptive measure computed from the observed sample and used as our best guess of the unobserved population parameter, e.g. the sample mean and the sample variance.
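The difference between a parameter and a point estimate can be illustrated with a simulation. In the sketch below the population is generated artificially (normally distributed values with mean 50 and standard deviation 10, both arbitrary choices), so its parameters can be computed directly, which is rarely possible in practice.

```python
import random
import statistics

rng = random.Random(0)  # seeded for reproducibility

# Hypothetical population of 10,000 scores (simulated, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameters: computed from the whole population.
mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)

# Point estimates: computed from a simple random sample of 100 scores.
sample = rng.sample(population, k=100)
x_bar = statistics.fmean(sample)           # sample mean, estimates mu
s2 = statistics.variance(sample)           # sample variance (divides by n - 1), estimates sigma2

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
print(f"population variance = {sigma2:.2f}, sample variance = {s2:.2f}")
```

The estimates will generally be close to, but not equal to, the parameters, and they change from sample to sample.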




References

  • Heiman, G. W. (2011). Basic Statistics for the Behavioral Sciences (6th ed.). Cengage Learning.

  • Samuels, M. L., Witmer, J. A., & Schaffner, A. (2012). Statistics for the Life Sciences (4th ed.). Pearson Education, Inc.

  • Weiss, N. A., & Weiss, C. A. (2012). Introductory Statistics (9th ed.). Pearson Education, Inc.