DSST / Principles of Statistics
Congrats on taking the plunge into Principles of Statistics! This brief guide and 10-question practice test should make your exam prep nearly painless.
The exam covers intro-level topics taught in a math or business statistics course, such as probability, correlation, regression, sampling distributions, and inferential statistics. It has 100 questions and a two-hour time limit.
Use of a non-programmable calculator is allowed, so be sure to use one to help keep to the time limit. A standard normal table (or z table) will be provided as one of the first few items within the exam. Be sure to select “mark for review” when you see the table, so that you can refer back to it throughout the exam.
To simplify your prep, focus on the five study topics below. Each topic is listed with its percentage weighting on the exam. Good luck!
Simply reading published research studies will make this topic somewhat familiar as it closely follows the research process of identifying data, collecting it, interpreting it, and presenting it. Bonus points to you for any experience working on research studies.
Data falls into two broad categories: quantitative and qualitative. To study either, a level of measurement must be applied based on whether the data can be ordered, has equal intervals, or has a true zero. The level of measurement (nominal, ordinal, interval, or ratio) defines how values are assigned for data collection and analysis.

Data also comes from primary and secondary sources. Primary data is collected directly by the researcher, while secondary data has been collected by another source and provided to the researcher. A full population may occasionally be available to collect, but most often a sample is taken from the population. A sampling method must be defined so that the size and characteristics of the sample avoid bias relative to the full population.

Descriptive statistics and visual representations are used to gain insight into the data. Measures of frequency distribution, central tendency, and dispersion are descriptive statistics that assist in understanding collected information. Both the data and its descriptive statistics can then be presented visually in graphs, plots, and histograms.
Probability can be simply defined as the likelihood of an event taking place. The concept is fairly simple, but calculating probability involves considering how different events relate to one another as well as the methods and rules determining their probabilities.
Probability can be theoretical, measuring an expectation of what could occur, or experimental, measuring what actually happens in a test. The set of all possible outcomes from either measure is the sample space. Within the sample space, a defined subset of one or more possible outcomes, called an event, can be assigned a probability. For a coin flip experiment, the only defined events are heads and tails. Events can be categorized as independent, like that coin flip, or dependent on another event, such as one player's hand in a card game. Addition and multiplication rules are applied to calculate probabilities, and conditional probability can be calculated for dependent events. Another factor in counting outcomes is whether they are combinations (order does not matter) or permutations (order matters). A distribution of probabilities can be either discrete (distinct, countable values) or continuous (any value within an interval). Histograms and density curves are used to visualize discrete and continuous distributions, respectively.
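These rules can be demonstrated with Python's standard library. The coin, die, and card numbers below are generic textbook illustrations, not exam content:

```python
import math
from fractions import Fraction

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads               # two heads in a row: 1/4

# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
p_roll_1_or_2 = Fraction(1, 6) + Fraction(1, 6)  # rolling a 1 or a 2: 1/3

# Combinations (order ignored) vs. permutations (order matters)
hands = math.comb(52, 5)     # distinct 5-card hands from a 52-card deck
finishes = math.perm(10, 3)  # ordered podium finishes among 10 runners
```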
Expanding on the concepts from the first study topic — Foundation of Statistics — this topic moves into data analysis. Data analysis is all about finding relationships in the data collected for a research study. It represents the “fun part” of research studies because it leads to conclusions and discoveries.
Calculating and understanding correlations for their strength and direction are key for this section, including visualization of correlated data series using a scatter plot. Linear regressions are covered in this area of the exam with a focus on finding a best fit model, understanding each factor in the model, and using the model for prediction. The factors in a linear regression model are the variables (independent and dependent) and constants (y-intercept and slope). Residuals, or the difference between actual and predicted y-values, can be calculated and visualized on a scatter plot to help find a best fit model.
Distributions have been addressed in other study topics, and this topic goes a bit deeper with sampling distributions. A sampling distribution differs from the distributions covered earlier because it reflects the distribution of values from all possible samples taken from a set population. The distribution can be described by its graphical shape on a histogram, its mean, and its standard deviation. Z-scores can also be calculated to determine the direction and number of standard deviations an observation lies from the mean of the sample. As noted at the beginning of the study guide, a z-score table will be provided in the exam.
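Using the exam-score sample from the practice questions, a z-score calculation looks like this:

```python
import statistics

# Sample data from the practice questions in this guide
sample = [80, 90, 98, 90, 95, 85, 98, 67, 100, 60, 90, 75]
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

def z_score(observation, mean, sd):
    """Signed number of standard deviations an observation lies from the mean."""
    return (observation - mean) / sd

z_high = z_score(100, mean, sd)  # positive: above the mean
z_low = z_score(60, mean, sd)    # negative: below the mean
```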
The central limit theorem is also part of this study topic. Part of probability theory, it essentially states that, for sufficiently large samples, the sampling distribution of the sample mean is approximately normal, even when the population itself is not. It is an important component of the study of probability because it allows normal-distribution calculations to be used on samples drawn from large, non-normal populations.
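A quick simulation illustrates the idea: draw many samples from a skewed (non-normal) population and the sample means still cluster symmetrically around the population mean. The population shape, sample size, and counts below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# A right-skewed population (exponential draws), clearly non-normal
population = [random.expovariate(1.0) for _ in range(100_000)]

# Take many samples of size n and record each sample's mean
n = 50
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# The sample means bunch tightly and symmetrically near the population
# mean, with far less spread than the skewed population itself.
```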
The most significant portion of the exam is on this study topic, which boils down to two things:
• What conclusions can be made based on sample data?
• How much confidence can we have about these conclusions?
Confidence intervals estimate a range within which the population mean is likely to fall, based on the sample mean. Significance testing is a type of inference used to test how well evidence in the sample data describes the population. The alternative hypothesis represents the claim being tested (that there is an effect on the population), while the null hypothesis states that there is no effect. Tests can be done in one direction (one-tailed) or in either direction (two-tailed). To test an alternative hypothesis against the null, P-values measure the probability of observing results at least as extreme as the sample's if the null hypothesis were true, z-tests compare a sample mean against known population parameters, and t-tests compare samples for significant differences.
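As a sketch of how these pieces fit together, here is a one-sample, two-tailed z-test using the standard library's `NormalDist`. All the numbers (null mean, known sigma, sample size, sample mean) are hypothetical:

```python
import math
from statistics import NormalDist

# Hypothetical setup: null hypothesis says the population mean is 100
mu0 = 100           # null-hypothesis mean
sigma = 15          # population standard deviation, assumed known
n = 36              # sample size
sample_mean = 105   # observed sample mean

# Test statistic: how many standard errors the sample mean is from mu0
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Two-tailed P-value: probability of a result at least this extreme,
# assuming the null hypothesis is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject the null when the P-value is at or below the significance level
reject_null = p_value <= 0.05
```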
Type I or Type II errors can be made when testing an alternative hypothesis, and the probabilities of these errors are controlled through the significance level and the test's statistical power. Analysis of variance (or ANOVA) is a method for comparing the means of more than two populations and should be reviewed for both one-way and two-way processes. The final segment of this study topic is non-parametric testing, such as the chi-square test for goodness of fit.
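A chi-square goodness-of-fit test can be sketched in a few lines. The die-roll counts below are invented for illustration, and 11.07 is the standard chi-square critical value for 5 degrees of freedom at the 0.05 significance level:

```python
# Goodness-of-fit sketch: is this die fair? (hypothetical counts, 60 rolls)
observed = [8, 9, 19, 5, 8, 11]
expected = [10] * 6  # a fair die expects 10 of each face in 60 rolls

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for df = 6 - 1 = 5 at the 0.05 level is 11.07
reject_fairness = chi_sq > 11.07
```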
80, 90, 98, 90, 95, 85, 98, 67, 100, 60, 90, 75
Correct Answer: A. -0.05
Explanation: A z-score is a standardized value that reflects the number of standard deviations an observation lies from the mean of its sample. The z-score is positive for observations above the sample mean and negative for observations below it. It is calculated as (observation - mean)/standard deviation. The sample mean of the exam scores is 85.67 (about 86) and the sample standard deviation is about 12.8 (about 13). The score closest to the mean is 85, for a z-score of (85 - 85.67)/12.8 = -0.05.
80, 90, 98, 90, 95, 85, 98, 67, 100, 60, 90, 75
Correct Answer: B. 0.29
Explanation: The probability that a student selected at random is a woman is 6/12 (6 women/12 total students), or 0.50. The probability that a student selected at random scored an A on the exam is 7/12 (7 scores of 90 or more/12 students), or 0.58. Using the multiplication rule for independent events, the combined probability of both events is 0.50 x 0.58 = 0.29.
80, 90, 98, 90, 95, 85, 98, 67, 100, 60, 90, 75
Correct Answer: D. 1
Explanation: The mean of the exam scores is 86 and the standard deviation of the scores is 13. Two standard deviations equal 26, and 86 - 26 = 60, so a student would need a score of 60 or below to take the exam again.
Correct Answer: A. A range that is 95% certain to contain the mean of the population.
Explanation: A confidence interval is an estimate of an unknown, such as the mean of the full population, paired with an indication of the estimate's accuracy. A 95% confidence interval means that if the sampling were repeated many times, the resulting intervals would contain the true population mean about 95% of the time and miss it about 5% of the time.
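For illustration, a 95% confidence interval for a mean can be computed as follows. The sample numbers are hypothetical, and the z critical value is used for simplicity (matching the z-table provided in the exam; a t critical value would be more precise for a sample this small):

```python
import math
from statistics import NormalDist

# Hypothetical sample summary
sample_mean = 86
sd = 13   # sample standard deviation
n = 12    # sample size

# z critical value for 95% confidence (two-tailed), about 1.96
z_star = NormalDist().inv_cdf(0.975)

# Margin of error and the resulting interval around the sample mean
margin = z_star * sd / math.sqrt(n)
interval = (sample_mean - margin, sample_mean + margin)
```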
Correct Answer: C. Both small OR smaller than the significance level
Explanation: A smaller P-value reflects stronger evidence against the null hypothesis. In addition, if the P-value is less than or equal to the significance level, the null hypothesis can be rejected.
Correct Answer: D. Density curve
Explanation: Since a continuous random variable can take any value within some group of intervals, the probability of any event can be described as the area under a density curve. A probability histogram, by contrast, describes a discrete random variable. The remaining answer choices represent types of distributions that can be depicted by a density curve.
Correct Answer: C. Is sample biased AND the null hypothesis accepted?
Explanation: Inferential statistics can only support or reject that a relationship exists in the data. If an alternative hypothesis is accepted, then the relationship is not due to chance. If the null hypothesis cannot be rejected, then it does not necessarily mean it can be accepted. The null hypothesis could still be rejected with a new sample, changes to the research design, or by testing a different alternative hypothesis.
Correct Answer: C. Ratio
Explanation: Nominal is the lowest level, as its values are only labels with no ordering implied. Ordinal is one step higher because its values can be ranked. Interval is the next step higher because the intervals between values have meaning. Ratio has all these attributes plus a true zero point, which makes zero a meaningful value.
Correct Answer: B. Strong relationship; when x increases, y decreases
Explanation: With an absolute value between 0.5 and 1, the relationship between x and y is relatively strong. The negative sign on the correlation coefficient reflects an inverse or negative relationship between x and y, such that the variables rise and fall opposite of one another.
Correct Answer: B. The populations are classified in two categories.
Explanation: One-way ANOVA is used to analyze more than two populations when there is only one way to classify these populations. To analyze more than two populations when there are two ways to classify the populations, two-way ANOVA is used.
Textbooks are great as far as they go, but I’d generally recommend you opt for this exam guide instead. It tends to cut through the confusion and help you accelerate your learning process.
Ok, so the DSST website isn’t the most inviting, but it will give you the best approximation of the real exam experience. Also, the official practice test is quite affordable (currently just $5 per practice exam).
Another website with a very dated design, but as ancient as it looks, this is actually an incredibly valuable resource. Basically, you get a massive set of flashcards that you can use to study statistics and really solidify that knowledge so you're ready for the exam.