About the instructor: Dr. Ghazala Ali Khan – Additional Director – PASTIC-PSF
What is Data?
▶Data is a collection of facts, figures, and statistics used to inform, analyze, and make decisions.
Types of Data:
▶It can be in the form of numbers, words, images, sounds, or videos.
▶Data can be quantitative (numerical) or qualitative (non-numerical).
Examples of data include:
Numbers: sales figures, temperatures, stock prices
Text: customer feedback, social media posts, survey responses
Images and video: photos, videos, medical scans
Audio: voice recordings, podcasts
Other: sensor/satellite readings, GPS locations, website traffic
Data can be used to:
-Identify patterns and trends
-Make predictions and forecasts
-Inform business decisions
-Evaluate performance and progress
-Answer questions and solve problems
In today’s digital age, data is generated and collected at an unprecedented scale, and its effective use has become a key driver of innovation, competitiveness, and success in various fields.
Scientific Data
▶Scientifically, data refers to a collection of quantitative (numerical) or qualitative (non-numerical) values that are obtained through observation, measurement, or experimentation.
▶These values are typically numerical or categorical in nature and are used to describe phenomena, patterns, or relationships.
In a scientific context, data is usually considered to be:
1.Empirical:
Data is based on observation or measurement, rather than on intuition or assumption.
2.Systematic:
Data is collected in a structured and organized way, using a predetermined methodology.
3.Quantifiable:
Data can be represented by numbers or categories, allowing for statistical analysis and mathematical manipulation.
4.Objective:
Data is intended to be free from personal biases and subjective interpretations.
5.Verifiable:
Data can be checked and validated through repetition or cross-validation.
Examples of scientific data include:
▶ Sensor readings from a laboratory instrument
▶ Survey responses from a population sample
▶ Gene expression levels from a biological experiment
▶ Climate patterns from satellite imagery
Data Analysis and Interpretation
Major Steps for Data Analysis:
Data analysis involves three major steps:
1.Cleaning and organizing the data for analysis (Data Preparation / Compiling)
2.Describing the data
3.Testing Hypotheses and Models
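The three steps above can be sketched on a toy dataset. This is only an illustration: the raw values, the treatment of blanks and "n/a" as missing, and the reference mean of 5.0 are all assumptions made for the example, and the one-sample t statistic is just one possible choice for step 3.

```python
import math
import statistics

# Hypothetical raw measurements; "" and "n/a" stand in for missing values.
raw = ["5.1", "4.9", "", "5.4", "n/a", "5.0"]


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Step 1: clean and organize the data (drop entries that are not numbers).
clean = [float(x) for x in raw if is_number(x)]

# Step 2: describe the data with a few indices.
mean = statistics.mean(clean)
sd = statistics.stdev(clean)  # sample standard deviation

# Step 3: test a hypothesis. Here: a one-sample t statistic against an
# assumed reference mean of 5.0 (an illustrative value, not from the notes).
mu0 = 5.0
t_stat = (mean - mu0) / (sd / math.sqrt(len(clean)))

print(f"n={len(clean)} mean={mean:.2f} sd={sd:.3f} t={t_stat:.3f}")
```

In practice the t statistic would then be compared with a critical value (or converted to a p-value) by a statistics package; the sketch stops before that step.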
Statistics
▶A set of mathematical procedures for describing, synthesizing, analyzing, and interpreting quantitative data.
▶The selection of an appropriate statistical technique is determined by the research design, the hypothesis, and the data collected.
Compiling Data for Analysis
▶Data must be accurately recorded and systematically organized to facilitate data analysis:
▶Tabulating: organizing the data in a systematic manner
▶Coding: assigning numerals (e.g., an ID number) to data
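As a rough illustration of tabulating and coding, the sketch below assigns a numeric ID to each response and then builds a frequency table. The survey answers are made up for the example.

```python
from collections import Counter

# Hypothetical survey answers, in the order they were collected.
responses = ["agree", "disagree", "agree", "neutral", "agree"]

# Coding: assign a numeral (here, a respondent ID starting at 1) to each record.
coded = [{"id": i, "response": r} for i, r in enumerate(responses, start=1)]

# Tabulating: organize the data systematically as a frequency table.
table = Counter(row["response"] for row in coded)

for answer, count in table.most_common():
    print(f"{answer:<10}{count}")
```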
Important To Remember:
▶Data analysis is the process of finding the right data to answer your question,
▶Understanding the processes underlying the data,
▶Discovering the important patterns in the data,
▶and then communicating your results to have the biggest possible impact
Parameters
numerical indices (summary values) calculated by the researcher for an entire population
Statistics
indices calculated by the researcher for a sample drawn from a population
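The difference between a parameter and a statistic can be shown with a simulated population. Everything here is an assumption for illustration: the population is generated artificially (10,000 values around 50), and the sample size of 100 is arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population: 10,000 simulated scores centred near 50.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameter: an index calculated for the entire population.
parameter = statistics.mean(population)

# Statistic: the same index calculated for a sample drawn from the population.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two values are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.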
Descriptive statistics
permit the researcher to describe many pieces of data with a few indices (tables, graphs, averages, etc.)
Types of descriptive statistics
1. Graphs
representations of data enabling the researcher to see what the distribution of scores looks like
2. Measures of central tendency
indices enabling the researcher to determine the typical or average score of a group of scores
3. Measures of variability
indices enabling the researcher to describe how spread out a group of scores is
Mode, Median, and Mean (central tendency); Range (variability)
Mode: the value that occurs most often
Median: the middle of the distribution, the number where half of the values are above and half are below
Mean: the average of all values of this variable in the data set
Range: the spread of values, from the minimum value to the maximum value
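These indices can be computed with Python's standard `statistics` module. The scores below are invented for the example.

```python
import statistics

# Hypothetical scores for one variable.
scores = [4, 7, 7, 8, 10, 12, 15]

mode = statistics.mode(scores)      # value that occurs most often
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # sum of the values divided by their count
low, high = min(scores), max(scores)
value_range = high - low            # width of the range

print(f"mode={mode} median={median} mean={mean}")
print(f"range: {low} to {high} (width {value_range})")
```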
Data Analysis Procedure
▶Coding: the process of translating information gathered from questionnaires or other sources into something that can be analyzed
▶Involves assigning a value to the information given; each value is often given a label
▶Coding can make data more consistent:
▶Example: Question = Sex; Answers recorded as Male, Female, M, or F
▶Coding these answers with one value per category avoids such inconsistencies
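A minimal sketch of the Sex example: inconsistent answers are mapped to one code per category. The codes 1 = Male and 2 = Female, and the helper name `code_sex`, are assumptions chosen for the example, not a prescribed scheme.

```python
# Hypothetical coding scheme: 1 = Male, 2 = Female.
SEX_CODES = {"male": 1, "m": 1, "female": 2, "f": 2}


def code_sex(answer):
    """Return the numeric code for an answer, or None if it is unrecognized."""
    return SEX_CODES.get(answer.strip().lower())


# The same two categories written four different ways.
raw_answers = ["Male", "F", "m", "Female", " M "]
coded = [code_sex(a) for a in raw_answers]

print(coded)
```

Unrecognized answers come back as `None` so they can be reviewed by hand rather than silently assigned to a category.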
Coding Systems
▶Common coding systems (code and label) for dichotomous variables:
▶ 0=No 1=Yes
(1 = value assigned, Yes = label of value)
▶OR: 1=No 2=Yes
▶When you assign a value you must also make it clear what that value means
▶In the first example above, 1=Yes, but in the second example, 1=No
▶As long as it is clear how the data are coded, either is fine
▶You can make it clear by creating a data dictionary to accompany the dataset
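One simple way to keep a data dictionary is as a small lookup table stored alongside the dataset. The sketch below assumes a hypothetical variable named `smoker` coded 0 = No, 1 = Yes; the variable name, its description, and the records are illustrative only.

```python
# Hypothetical data dictionary: one entry per variable, listing what each code means.
data_dictionary = {
    "smoker": {
        "description": "Does the respondent currently smoke?",
        "codes": {0: "No", 1: "Yes"},
    },
}

# Hypothetical coded dataset.
records = [
    {"id": 1, "smoker": 1},
    {"id": 2, "smoker": 0},
]


def label_for(variable, value):
    """Look up the label of a coded value in the data dictionary."""
    return data_dictionary[variable]["codes"][value]


for row in records:
    print(row["id"], label_for("smoker", row["smoker"]))
```

If the same variable were coded 1 = No, 2 = Yes instead, only the `codes` entry would change, and anyone reading the dataset could still recover the meaning.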
A Few Frequently Used Software Packages for Data Analysis
Statistix 8.1
State X
SPSS
R
MS Excel