# Data Analysis in Research: Fundamentals & Steps

**What is Data?**

According to Webster’s New Collegiate Dictionary, data is defined as "factual information". This information can be numbers, characters, images, or other methods of recording, in a form that can be used as a basis for reasoning, discussion, or calculation.

Another definition of data was introduced by Fabio Nelli (2015), the author of the book Python Data Analytics. The author stated that "data are the events recorded in the world. Anything that can be measured or even categorized can be converted into data. Once collected, these data can be studied and analyzed both to understand the nature of the events and very often also to make predictions or at least to make informed decisions."

**Types of Data**

The two primary data types are **quantitative** and **qualitative data** (Figure 1). Scientific research is usually classified into two core types, quantitative and qualitative, according to the type of data collected and the methods used to analyze it. However, some researchers combine quantitative and qualitative data in a single research project (mixed research) to “have the best of both worlds,” so that they can credibly address a particular question and make well-informed decisions.

There are also other types of research, named for the nature of the study and the data it uses, such as exploratory research, cross-sectional research, and longitudinal research.

**Figure 1**: Types of data

**Quantitative Data**

Quantitative data is data that can be quantified or measured in numerical terms (Figure 2). Quantitative variables can tell you "how many," "how much," or "how often".

Quantitative data values can be either *discrete* or *continuous*. Discrete data has clear gaps between values; the values cannot be divided into smaller parts. Examples are the number of test questions answered correctly or the number of kids in a class (you cannot have 1.5 kids). Continuous data can be divided into smaller and smaller units, such as width, height, weight, and temperature. In summary, discrete data is countable while continuous data is measurable.

**Qualitative Data**

Qualitative data is data collected in a verbal or narrative format through interviews, focus groups, open- or closed-ended questionnaire items, etc. A simple way to think of qualitative data is as words rather than measurements. Its values are usually presented as words, names, symbols, or number codes.

Qualitative data values can be either *nominal* or *ordinal*. Nominal data is qualitative data that can be classified or labeled but has no natural order or rank, such as gender, eye color, marital status, or occupation status (Figure 2). Ordinal data is qualitative data whose categories follow a particular order or ranking scale. For example, questionnaire responses on a scale from satisfied to unsatisfied are ordinal data.

**Figure 2**: Quantitative and qualitative data
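The four data types described above call for different summaries. The following is a minimal Python sketch, assuming a small set of invented survey records; the field names (`kids`, `height_cm`, `eye_color`, `satisfaction`) and all values are hypothetical and only illustrate the distinction.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical survey records, invented for illustration only.
records = [
    {"kids": 2, "height_cm": 171.5, "eye_color": "brown", "satisfaction": "satisfied"},
    {"kids": 0, "height_cm": 165.2, "eye_color": "blue", "satisfaction": "neutral"},
    {"kids": 3, "height_cm": 180.1, "eye_color": "brown", "satisfaction": "unsatisfied"},
    {"kids": 1, "height_cm": 158.7, "eye_color": "green", "satisfaction": "satisfied"},
]

# Ordinal data needs an explicit order; the labels alone do not carry it.
SATISFACTION_ORDER = ["unsatisfied", "neutral", "satisfied"]

# Discrete (countable): values are whole numbers, so a total is a whole number.
total_kids = sum(r["kids"] for r in records)

# Continuous (measurable): a mean with decimals is meaningful.
mean_height = mean(r["height_cm"] for r in records)

# Nominal: only frequencies make sense, not an average.
eye_color_counts = Counter(r["eye_color"] for r in records)

# Ordinal: map labels to ranks, then report the (lower) median category.
ranks = [SATISFACTION_ORDER.index(r["satisfaction"]) for r in records]
median_label = SATISFACTION_ORDER[median_low(ranks)]

print(total_kids, mean_height, eye_color_counts.most_common(1), median_label)
```

Note that the ordinal variable is summarized with a median category rather than a mean, because the distances between ranks are not defined.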

**Data analysis in Research**

Data in raw form has no meaning; only when interpreted does it take on meaning and become useful. By analyzing data, we can find patterns that yield information, and that information can then be used to gain insight and enhance knowledge. Generally, data analysis can be defined as the collection, transformation, and organization of data in order to draw conclusions, make predictions, or drive informed decision-making.

Statistics is the essential science behind the data analysis process. It is the science of designing studies or experiments, collecting data, and modeling or analyzing data for the purpose of decision making and scientific discovery. Ott and Longnecker (2015) described statistics as "the science of learning from data".

In this context, we approach the study of statistics by considering the essential seven-step process in conducting a research study and learning from data:

1. Defining the research problem
2. Planning your research design
3. Collecting the data
4. Cleaning the data
5. Analyzing data
6. Visualizing the analysis results
7. Interpreting the results

**1. Defining the Research Problem**

The first step in any data analysis process is to define the research questions and objectives. This step is essential because the data analysis results should answer these questions and attain the objectives.

In most research studies, such as biomedical ones, investigators hypothesize about the relationships between two or more factors. Hypothesis testing is then used as a systematic procedure for deciding whether the results of a research study support a particular theory that applies to a population. In this procedure, we need to transform the research question into a null hypothesis (H0) and an alternative hypothesis (Ha). Generally, the null hypothesis is a negative statement indicating "no relationship or difference between study groups", while the alternative hypothesis is a positive statement indicating "there is a relationship or difference between study groups". For example:

H0: There is no relationship between intelligence and academic results.

Ha: There is a relationship between intelligence and academic results.

**2. Planning your Research Design**

After defining the research problem and formulating the hypothesis, you need to plan your research design. A research design is the framework of research methods and techniques chosen by a researcher to conduct a study. It is your overall strategy for data collection and analysis. It also determines the statistical tests you will later use to test your hypothesis or answer the research questions. While planning or designing your research, you need to include the following essential elements:

- Accurate purpose statement.
- The method applied for analyzing collected details.
- Type of research methodology.
- Techniques to be implemented for collecting and analyzing research.
- Timeline.
- Measurement of analysis.
- Possible limitations or obstacles to the research.
- Settings for the research study.

**3. Collecting the Data**

Data collection is a systematic process of gathering information from relevant sources such as:

- Interviews.
- Questionnaires and surveys.
- Observations.
- Documents and records.
- Focus groups.
- Oral histories.

**4. Cleaning the Data**

Data cleaning is the process of fixing or removing corrupted, incorrectly formatted, duplicate, or incomplete data, as well as outliers. Data cleaning improves data quality and helps provide more accurate, consistent, and reliable results.

Data can be cleaned in a number of ways. Relatively small datasets can be cleaned in a spreadsheet program such as Microsoft Excel or Google Sheets, while big data is usually cleaned with a programming language such as Python, R, or MATLAB.
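As an illustration of what such cleaning can look like in Python, here is a minimal sketch using only the standard library. The rows, the participant IDs, and the helper names `clean` and `remove_outliers` are hypothetical. The 1.5 × IQR rule used to flag outliers is one common convention, assumed here rather than prescribed by this article.

```python
from statistics import quantiles

# Hypothetical raw rows (participant id, age as text); invented for illustration.
raw_rows = [
    ("p01", "34"), ("p02", "29"), ("p02", "29"),  # duplicate record
    ("p03", ""),                                   # incomplete (missing age)
    ("p04", " 41 "),                               # incorrectly formatted (whitespace)
    ("p05", "abc"),                                # corrupted (not a number)
    ("p06", "38"), ("p07", "31"), ("p08", "36"),
    ("p09", "33"), ("p10", "350"),                 # implausible outlier
]

def clean(rows):
    """Strip whitespace, drop missing or non-numeric ages, drop duplicate ids."""
    seen, cleaned = set(), []
    for pid, age_text in rows:
        age_text = age_text.strip()
        if not age_text.isdigit():   # missing or corrupted value
            continue
        if pid in seen:              # duplicate participant
            continue
        seen.add(pid)
        cleaned.append((pid, int(age_text)))
    return cleaned

def remove_outliers(rows, k=1.5):
    """Keep rows whose age lies within k * IQR of the quartiles (assumed rule)."""
    ages = [age for _, age in rows]
    q1, _, q3 = quantiles(ages, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(pid, age) for pid, age in rows if low <= age <= high]

cleaned_rows = clean(raw_rows)
final_rows = remove_outliers(cleaned_rows)
print(len(raw_rows), len(cleaned_rows), len(final_rows))
```

Whether an outlier should be removed, corrected, or kept is a judgment that depends on the study; the sketch only shows the mechanics of detecting one.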

**5. Analyzing Data**

After collecting and cleaning the data, you can organize and summarize the data using descriptive statistics. The fundamental statistics used to summarize and describe the data are the mean, median, standard deviation or standard error, and measures of frequency distribution.
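The descriptive statistics named above can be computed with Python's standard `statistics` module. The following is a minimal sketch; the `scores` list is hypothetical test-score data invented for illustration.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical test scores, invented for illustration.
scores = [72, 85, 90, 66, 85, 78, 92, 85, 70, 77]

n = len(scores)
sd = stdev(scores)  # sample standard deviation (n - 1 in the denominator)

summary = {
    "n": n,
    "mean": mean(scores),
    "median": median(scores),
    "std_dev": sd,
    "std_error": sd / sqrt(n),        # standard error of the mean
    "frequencies": Counter(scores),   # frequency distribution of the values
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard deviation describes the spread of the individual values, while the standard error describes the precision of the estimated mean; which one to report depends on the question being asked.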

Inferential statistics are then used to test hypotheses and make estimates or predictions about the population. The inferential statistics most commonly used in research include Student's t-test, analysis of variance (ANOVA), correlation, regression analysis, and the chi-square test.
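To make the idea of testing a hypothesis concrete, the sketch below returns to the example from step 1 (H0: no relationship between intelligence and academic results). It computes a Pearson correlation and then estimates a p-value with a permutation test. The permutation test is a stand-in chosen so that the example needs only the standard library, not one of the tests listed above. The `iq` and `grades` values, the 0.05 significance level, and the function names are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: how often shuffled data is at least as correlated."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)            # break any real x-y pairing
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical paired observations, invented for illustration.
iq = [95, 100, 105, 110, 115, 120, 125, 130]
grades = [61, 64, 70, 68, 75, 80, 79, 88]

r = pearson_r(iq, grades)
p_value = permutation_p_value(iq, grades)
reject_h0 = p_value < 0.05               # assumed significance level

print(f"r = {r:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

In practice, a statistical package would normally be used to run a t-test, ANOVA, or chi-square test directly; the logic of comparing an observed statistic against what H0 would produce is the same.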

**6. Visualizing the Analysis Results**

Data visualization is the representation of analysis results by using visual elements such as tables, charts, plots, maps and graphs (Figure 3).

After data has been collected, cleaned, and analyzed, it should be visualized to help the researcher find patterns and relationships and, consequently, draw conclusions and make predictions.

**Figure 3**: Common data visualization types

Many statistical software packages and programming languages, such as SPSS, SAS, Python, and R, can be used for the data analysis process and can provide you with all the required results and visualizations.
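As a small illustration of turning analysis results into a visual form, the sketch below draws a text-based bar chart of a frequency distribution. It is a dependency-free stand-in: real projects would typically use a plotting library or the charting features of the tools named above. The `responses` data and the `text_bar_chart` helper are hypothetical.

```python
from collections import Counter

# Hypothetical questionnaire responses, invented for illustration.
responses = ["satisfied", "neutral", "satisfied",
             "unsatisfied", "satisfied", "neutral"]

def text_bar_chart(values, order):
    """Return one line per category: label, a bar of '#' marks, and the count."""
    counts = Counter(values)
    width = max(len(label) for label in order)  # align the bars
    lines = [
        f"{label:<{width}} | {'#' * counts[label]} {counts[label]}"
        for label in order
    ]
    return "\n".join(lines)

chart = text_bar_chart(responses, ["unsatisfied", "neutral", "satisfied"])
print(chart)
```

Passing the categories in their natural order keeps the ordinal scale readable in the chart, which an alphabetical sort would not.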

**7. Interpreting the Results**

After analyzing and presenting the results, you need to explain them, show exactly how they answer your research questions, and make an argument in support of your overall conclusion. While interpreting the results, you should also consider the following:

- Identifying patterns, relationships, and correlations among the data.
- Discussing whether the results met your expectations or supported your hypotheses.
- Showing whether what you found confirms or does not confirm the findings of previous studies in the literature review.
- Explaining unexpected results and evaluating their significance.
- Ending with well-constructed conclusions and recommendations.