# Statistics - Basics and Terminology

## What is Statistics?

**Statistics** is a branch of applied mathematics. It can be defined as the science of creating, developing, and applying techniques by which the uncertainty of inductive inferences may be evaluated. More broadly, statistics is concerned with the collection, organization, summarization, and analysis of data, and with drawing inferences about a population when only part of the data (a sample) is available.

There are two major types of statistics:

**Descriptive statistics** consists of methods for organizing and summarizing information. It includes the construction of graphs, charts, and tables and the calculation of descriptive measures such as the mean, measures of variation, and percentiles.

**Inferential statistics** consists of methods for drawing conclusions about a population from information obtained from a sample of that population, and for measuring the reliability of those conclusions. Examples include t-tests, ANOVA, correlation, and regression.
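The descriptive measures named above can be computed with Python's standard `statistics` module. The sketch below summarizes a small data set; the data values and the helper name `describe` are hypothetical, chosen only for illustration. It covers descriptive statistics only; inferential procedures such as t-tests are usually run with a dedicated statistics library and are not shown here.

```python
import statistics

# Hypothetical data: heights (cm) of ten plants; the values are made up for illustration.
heights = [12.1, 14.3, 13.8, 15.0, 12.9, 14.7, 13.2, 16.1, 13.5, 14.0]

def describe(data):
    """Return a few common descriptive measures for a list of numbers (hypothetical helper)."""
    # Quartiles: the 25th, 50th (median), and 75th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),      # measure of center
        "sd": statistics.stdev(data),       # measure of variation (sample SD, n - 1 divisor)
        "range": max(data) - min(data),     # another measure of variation
        "q1": q1,
        "median": median,
        "q3": q3,
    }

summary = describe(heights)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Percentiles can be defined in slightly different ways; `statistics.quantiles` uses its default "exclusive" method here, so other software may report marginally different quartiles.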

**Variables in Statistics**

A **variable** is a characteristic that takes on different values for different things, persons, or places; anything that, when measured, can produce two or more different scores is a variable. For example, a person's *eye color* is a variable, which could have the value (observation) "Brown" for one person and "Blue" for another.

There are several types of variables in statistics. The two major types are qualitative and quantitative variables (**Figure 1**):

**1. Qualitative variables** are *non-numerically valued variables*. Some qualitative variables can be ordered or ranked (**ordinal**), while others cannot be ordered (**nominal**), such as eye color.

**2. Quantitative variables** are numerically valued variables. They are classified as continuous or discrete:

- *Continuous variables*: their values form an interval of numbers and can include fractional amounts (e.g. a person's height or weight).
- *Discrete variables*: their values can be classified into discrete classes. They are not observed on a continuous scale because there are gaps between the possible values (e.g. the number of petals on a flower).

**Figure 1**: Main types of variables in statistics.
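One way to make the four variable types concrete is to record them with different data types. The sketch below assumes a hypothetical flower record: `color` is nominal, `size` is an ordinal grade (the `SizeGrade` scale is invented for illustration), `petals` is discrete, and `stem_cm` is continuous. It shows which operations make sense for each type.

```python
from dataclasses import dataclass
from enum import IntEnum

class SizeGrade(IntEnum):
    """Hypothetical ordinal scale: the categories have a meaningful order."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

@dataclass
class Flower:
    color: str        # qualitative, nominal: categories with no natural order
    size: SizeGrade   # qualitative, ordinal: categories that can be ranked
    petals: int       # quantitative, discrete: whole counts with gaps between values
    stem_cm: float    # quantitative, continuous: any value within an interval

# Made-up observations for illustration only.
flowers = [
    Flower("red", SizeGrade.LARGE, 5, 31.4),
    Flower("white", SizeGrade.SMALL, 6, 24.0),
    Flower("yellow", SizeGrade.MEDIUM, 5, 27.8),
]

# Ordinal values can be ranked.
by_size = sorted(flowers, key=lambda f: f.size)
print([f.size.name for f in by_size])  # ['SMALL', 'MEDIUM', 'LARGE']

# Nominal values are only compared for equality (counted); sorting colors
# alphabetically would be possible but statistically meaningless.
print(sum(f.color == "red" for f in flowers))  # 1

# Quantitative values support arithmetic such as averaging.
print(sum(f.stem_cm for f in flowers) / len(flowers))
```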

**Distribution**

The scores or observations we initially measure in a study are called the raw scores or raw data. __Descriptive statistics__ help us condense the raw data into interpretable summaries. There are several ways to do this, but the starting point is to count the number of times each score occurred, which is known as the frequency distribution.

A frequency distribution is simply the number of times each value (or range of values) occurs in a data set (Figure 2). Frequency distributions are usually presented in a table or in a graph such as a histogram or bar chart.

Suppose we have 182 roses of different colors; the table and figure below show the number (frequency) of roses of each color.

**Figure 2**: Colors of 182 roses. Source: Samuels, Witmer, & Schaffner (2012), *Statistics for the Life Sciences* (4th ed.).
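Building a frequency distribution amounts to counting each distinct value. The sketch below uses Python's `collections.Counter` and also reports the relative frequency (frequency divided by the total number of observations). The color list is made up for illustration and is not the rose data from Figure 2; the helper name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(observations)   # frequency f of each distinct value
    total = len(observations)        # total number of observations
    return [(value, f, f / total) for value, f in counts.most_common()]

# Hypothetical observations; these are NOT the rose data from Figure 2.
colors = ["red", "white", "red", "pink", "red", "white", "pink", "red"]

for value, f, rel in frequency_table(colors):
    print(f"{value:<6} f = {f}  relative f = {rel:.3f}")
```

The frequencies always add up to the number of observations, and the relative frequencies add up to 1, which is a quick check that no observation was lost.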

**Population and Samples**

A **population**, or universe, consists of all possible values of a variable. A **sample** is a part of a population. Usually the intention is to use sample information to make an inference about a population. Therefore, the sample must be *representative* of the population if it is to lead to valid inferences.

To obtain a representative sample, we draw the sample through a random process. Random selection produces data with no recognizable patterns or regularities; as a consequence, the laws of probability apply and can be used in drawing inferences.
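Random selection can be illustrated with Python's `random.Random.sample`, which draws without replacement so that every subset of the requested size is equally likely. The sketch below assumes a hypothetical population of 20 labelled units, draws one sample of 5, and then checks empirically that each unit is selected about n/N = 5/20 = 25% of the time. The helper name `simple_random_sample` and the seed are illustrative choices.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

# Hypothetical population of 20 labelled units.
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the run is reproducible

sample = simple_random_sample(population, 5, rng)
print("sample:", sample)

# Empirical check: over many draws, each unit should be picked
# roughly n / N = 5 / 20 = 25% of the time.
trials = 20_000
hits = {unit: 0 for unit in population}
for _ in range(trials):
    for unit in simple_random_sample(population, 5, rng):
        hits[unit] += 1
rates = {unit: count / trials for unit, count in hits.items()}
print(f"selection rates range from {min(rates.values()):.3f} to {max(rates.values()):.3f}")
```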

**Important Statistical Terms**

**- A distribution** is the general name for any organized set of data.

**- Frequency** (**f**) is the number of times that a score or observation occurs.

**- Frequency distribution** is a listing of all classes along with their frequencies.

**- Relative frequency** is the ratio of the frequency of a class to the total number of observations.

**- Simple random sampling** is a sampling procedure in which each possible sample of a given size is equally likely to be the one obtained, so every member of the population has the same chance of being selected.

**- Parent population (Parent distribution)** is the population that is originally sampled.

**- Parameter** is a numerical descriptive measure of a population, e.g. the population mean (μ) and the population variance (σ²) are parameters.

**- A point estimate (statistic)** is a numerical descriptive measure computed from the observed data sample and used as our best guess of the unobserved population parameter, such as the sample mean (x̄) and the sample variance (s²).
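The difference between a parameter and a point estimate can be seen by computing both. The sketch below simulates a hypothetical finite population of 1,000 values (the normal distribution, its mean of 50 and standard deviation of 10, and the seed are all arbitrary assumptions), computes μ and σ² from every member, then estimates them from a simple random sample of 30. Note that the population variance divides by N, while the sample variance divides by n − 1.

```python
import random
import statistics

# Hypothetical finite population of 1,000 measurements (simulated, not real data).
rng = random.Random(7)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Parameters: computed from every member of the population.
mu = statistics.fmean(population)           # population mean, μ
sigma2 = statistics.pvariance(population)   # population variance, σ² (divides by N)

# Point estimates (statistics): computed from a simple random sample.
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)            # sample mean, x̄
s2 = statistics.variance(sample)            # sample variance, s² (divides by n - 1)

print(f"mu      = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma^2 = {sigma2:.2f}   s^2   = {s2:.2f}")
```

Re-running with a different seed changes x̄ and s² from sample to sample, while μ and σ² stay fixed for a given population; that variability is what inferential statistics sets out to measure.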

**References**

Heiman, G. W. (2011). *Basic statistics for the behavioral sciences* (6th ed.). Cengage Learning.

Samuels, M. L., Witmer, J. A., & Schaffner, A. (2012). *Statistics for the life sciences* (4th ed.). Pearson Education.

Weiss, N. A., & Weiss, C. A. (2012). *Introductory statistics* (9th ed.). Pearson Education.