Why do researchers use inferential statistics?
Instead, he suggested that we may provisionally assume a theory, such as that the sun will rise every day, without necessarily proving it; if the sun then fails to rise on some day, the theory is falsified and rejected.
Likewise, we can only reject hypotheses based on contrary evidence but can never truly accept them, because the presence of supporting evidence does not mean that we will not observe contrary evidence later. Because we cannot truly accept the hypothesis of interest (the alternative hypothesis), we formulate a null hypothesis as the opposite of the alternative hypothesis, and then use empirical evidence to reject the null hypothesis, thereby demonstrating indirect, probabilistic support for our alternative hypothesis.
A second problem with testing hypothesized relationships in social science research is that the dependent variable may be influenced by an essentially unlimited number of extraneous variables, and it is not feasible to measure and control for all of these extraneous effects.
Hence, even if two variables may seem to be related in an observed sample, they may not be truly related in the population, and therefore inferential statistics are never certain or deterministic, but always probabilistic. How do we know whether a relationship between two variables in an observed sample is significant, and not a matter of chance? Sir Ronald A. Fisher, one of the most prominent statisticians in history, established the basic guidelines for significance testing.
The significance level is the maximum level of risk that we are willing to accept as the price of our inference from the sample to the population. If the p-value is less than the significance level (conventionally set at 0.05, or 5%), we reject the null hypothesis. We must also understand three related statistical concepts: sampling distribution, standard error, and confidence interval. A sampling distribution is the theoretical distribution of an infinite number of samples drawn from the population of interest in your study.
However, because a sample is never identical to the population, every sample always has some inherent level of error, called the standard error. If this standard error is small, then statistical estimates derived from the sample (such as the sample mean) are reasonably good estimates of the population. The precision of our sample estimates is defined in terms of a confidence interval (CI). Jointly, the p-value and the CI give us a good idea of the probability of our result and how close it is to the corresponding population parameter.
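To make these three concepts concrete, here is a minimal sketch, using only Python's standard library, that computes the standard error of a sample mean and a 95% confidence interval. The data values are invented for illustration, and the interval uses the normal approximation (z = 1.96) rather than a t critical value.

```python
import math
import statistics

sample = [21, 25, 23, 30, 28, 26, 24, 27, 22, 29]  # hypothetical ages

mean = statistics.mean(sample)        # sample mean
sd = statistics.stdev(sample)         # sample standard deviation (n-1 denominator)
se = sd / math.sqrt(len(sample))      # standard error of the mean

# 95% CI via the normal approximation; with only 10 observations, a
# t critical value would give a slightly wider interval.
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean={mean:.2f}, SE={se:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

A smaller standard error narrows the interval, which is what "more precise estimate" means in practice.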
Most inferential statistical procedures in social science research are derived from a general family of statistical models called the general linear model (GLM). A model is an estimated mathematical equation that can be used to represent a set of data, and linear refers to a straight line. Hence, a GLM is a system of equations that can be used to represent linear patterns of relationships in observed data.
The simplest type of GLM is a two-variable linear model that examines the relationship between one independent variable (the cause or predictor) and one dependent variable (the effect or outcome). Let us assume that these two variables are age and self-esteem, respectively. The bivariate scatterplot for this relationship is shown in the figure. From the scatterplot, it appears that individual observations (combinations of age and self-esteem) are generally scattered around an imaginary upward-sloping straight line.
We can estimate parameters of this line, such as its slope and intercept, from the GLM. In the GLM, this equation is represented formally as y = β₀ + β₁x + ε, where β₀ is the intercept, β₁ is the slope, and ε is an error term capturing what the line does not explain. Note that a linear model can have more than two predictors. To visualize a linear model with two predictors, imagine a three-dimensional cube, with the outcome y along the vertical axis and the two predictors (say, x₁ and x₂) along the two horizontal axes at the base of the cube.
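The slope and intercept of the two-variable model can be estimated by ordinary least squares using nothing but the sample means and sums of squares. The sketch below does this with Python's standard library; the age and self-esteem values are invented for illustration.

```python
import statistics

age = [15, 20, 25, 30, 35, 40]           # x: predictor
esteem = [3.1, 3.4, 3.9, 4.2, 4.4, 4.9]  # y: outcome (hypothetical scores)

mean_x = statistics.mean(age)
mean_y = statistics.mean(esteem)

# slope b1 = sum of cross-deviations / sum of squared x-deviations;
# the intercept b0 then follows from the two means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, esteem))
sxx = sum((x - mean_x) ** 2 for x in age)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
print(f"self_esteem = {b0:.2f} + {b1:.3f} * age")
```

The positive slope mirrors the upward-sloping pattern described for the scatterplot.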
The GLM for regression analysis with n predictor variables is y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε. In the above equation, the predictor variables xᵢ may represent independent variables or covariates (control variables).

Descriptive statistics are used to summarize the data, and inferential statistics are used to generalize the results from the sample to the population. In turn, inferential statistics are used to draw conclusions about whether a theory has been supported, refuted, or requires modification.
Descriptive statistics are used to organize or summarize a set of data. Examples include percentages, measures of central tendency (mean, median, mode), measures of dispersion (range, standard deviation, variance), and correlation coefficients. Measures of central tendency describe the typical or central score of a distribution. The mode is the most frequently occurring score in a distribution. The median is the midpoint of a distribution of scores. The mean is the arithmetic average of a distribution of scores.
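The three measures of central tendency can be computed directly with Python's standard library; the score distribution below is invented so that the three measures come out different from one another.

```python
import statistics

scores = [1, 2, 3, 3, 4, 5, 5, 5, 9]

mode = statistics.mode(scores)      # most frequent score -> 5
median = statistics.median(scores)  # midpoint of the sorted scores -> 4
mean = statistics.mean(scores)      # arithmetic average (37 / 9)
print(mode, median, mean)
```

Note how the single large score (9) pulls the mean above the median, which is why skewed distributions are often summarized by the median instead.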
Measures of dispersion are also considered descriptive statistics. They are used to describe the degree of spread in a set of scores.
So are all of the scores similar and clustered around the mean, or is there a lot of variability in the scores? The range is a measure of dispersion that measures the distance between the highest and lowest scores in a distribution.
The standard deviation is a more sophisticated measure of dispersion that measures the average distance of scores from the mean.
The variance is the standard deviation squared, so it also measures the spread of scores around the mean, but in squared units of measure. Typically, means and standard deviations are computed for experimental research studies in which an independent variable was manipulated to produce two or more groups and a dependent variable was measured quantitatively.
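The three measures of dispersion, and the relationship between variance and standard deviation, can be checked in a few lines with the standard library. The scores are invented for illustration.

```python
import statistics

scores = [4, 8, 6, 5, 3, 7, 9]

score_range = max(scores) - min(scores)  # distance between the extremes
sd = statistics.stdev(scores)            # sample standard deviation
var = statistics.variance(scores)        # sample variance

# Variance is exactly the standard deviation squared (up to rounding).
assert abs(var - sd ** 2) < 1e-9
print(score_range, round(sd, 3), round(var, 3))
```

Unlike the range, which depends on only the two extreme scores, the standard deviation uses every score and is therefore the more informative summary of spread.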
Regression tests estimate whether changes in predictor variables are associated with changes in an outcome variable. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.
Most of the commonly used regression tests are parametric. If your data are not normally distributed, you can perform data transformations. Data transformations use mathematical operations, such as taking the square root of each value, to make your data more closely approximate a normal distribution.
Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.
A statistic refers to a measure of the sample, while a parameter refers to a measure of the population. Sampling error is the difference between a population parameter and the corresponding sample statistic.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
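To illustrate the procedure end to end, here is a sketch of a one-sample z-test using only the standard library: it tests the null hypothesis that a population mean equals 25 against an invented sample of ages, using the normal approximation (via math.erfc) for a two-sided p-value.

```python
import math
import statistics

sample = [21, 25, 23, 30, 28, 26, 24, 27, 22, 29]  # hypothetical ages
mu0 = 25.0                                          # null-hypothesis mean

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
z = (mean - mu0) / se                               # standardized difference
p_value = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail

# Here p is well above 0.05, so we fail to reject the null hypothesis:
# the observed difference could easily have arisen by chance.
print(f"z={z:.3f}, p={p_value:.3f}")
```

With a small sample like this a t-test would be more appropriate than the normal approximation, but the logic of comparing the p-value to the significance level is identical.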
An introduction to inferential statistics. Published on September 4 by Pritha Bhandari. Inferential statistics have two main uses: making estimates about populations (for example, the mean SAT score of all 11th graders in the US) and testing hypotheses to draw conclusions about populations.