a. What is the dependent variable we are interested in?
b. Describe what kind of variable the variable identified in part a is.
c. Select the three (3) categorical variables that you think will best help to understand variation in the variable identified in part a. Ensure that at least one of these variables has 3 or more categories. Explain why you have chosen these variables.
d. Select the three (3) numerical variables that you think will best help to understand variation in the variable identified in part a. Explain why you have chosen these variables.
e. Explain what issues, if any, exist with the observations in the dataset. If any issues were identified, explain how you dealt with them (e.g. excluded the observation, used an average or median, etc.).
a. Construct a frequency distribution of the variable identified in Question 1a. Be careful not to use too many or too few classes.
b. Plot the information in your frequency distribution in a fully labelled histogram.
c. Use appropriate numerical descriptive measures to further summarise/describe the variable identified in Question 1a.
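As a rough illustration of parts a and c, the sketch below groups scores into equal-width classes and then computes a few descriptive measures. The scores, the class width of 10 and the starting point of 50 are all invented for the example and are not taken from the assignment dataset; the helper name `frequency_distribution` is likewise hypothetical.

```python
import math
import statistics

# Hypothetical Year 12 scores, invented purely for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 95]


def frequency_distribution(values, class_width, start):
    """Count values in classes [start, start + width), [start + width, ...)."""
    n_classes = math.floor((max(values) - start) / class_width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // class_width)] += 1
    # Each entry is (lower limit, upper limit, frequency).
    return [
        (start + i * class_width, start + (i + 1) * class_width, c)
        for i, c in enumerate(counts)
    ]


table = frequency_distribution(scores, class_width=10, start=50)
for lower, upper, count in table:
    print(f"{lower} to under {upper}: {count}")

# Numerical descriptive measures for the same variable.
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("sample standard deviation:", round(statistics.stdev(scores), 2))
```

The same class limits can be reused as the bins of the histogram in part b, so that the table and the plot agree.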
a. Use appropriate tables or graphs to help describe the three (3) categorical variables you have chosen in Question 1c. Use only one table or graph for each variable.
b. Use appropriate tables or graphs to help describe the three (3) numerical variables you have chosen in Question 1d. Use only one table or graph for each variable.
Use the variable for which you presented a frequency distribution (Question 2a), keeping the same classes you used for that frequency distribution. Select one other categorical variable from the three selected in Question 1c that has at least three categories.
a. Construct a contingency table using these two variables.
b. Identify all joint and marginal probabilities.
c. Comment on the joint and marginal probabilities shown in your table.
d. Calculate the conditional probabilities for one column/row (whichever is longer) of your contingency table.
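The relationship between joint, marginal and conditional probabilities can be sketched as follows. The observations, the score classes and the "school type" variable are all hypothetical and chosen only to keep the arithmetic easy to follow; they are not from the assignment data.

```python
from collections import Counter

# Hypothetical (score class, school type) pairs, invented for illustration.
observations = [
    ("50-69", "Public"), ("50-69", "Public"), ("50-69", "Catholic"),
    ("70-89", "Public"), ("70-89", "Catholic"), ("70-89", "Independent"),
    ("70-89", "Independent"), ("90-100", "Catholic"),
    ("90-100", "Independent"), ("90-100", "Independent"),
]

n = len(observations)
joint_counts = Counter(observations)                   # cells of the table
row_counts = Counter(row for row, _ in observations)   # row totals
col_counts = Counter(col for _, col in observations)   # column totals

# Joint probability: cell count divided by the grand total.
joint = {cell: count / n for cell, count in joint_counts.items()}
# Marginal probability: row or column total divided by the grand total.
marginal_rows = {row: count / n for row, count in row_counts.items()}
marginal_cols = {col: count / n for col, count in col_counts.items()}

# Conditional probability of each score class given one column:
# P(row | column) = P(row and column) / P(column).
given = "Independent"
conditional = {
    row: joint.get((row, given), 0.0) / marginal_cols[given]
    for row in marginal_rows
}

for row, p in conditional.items():
    print(f"P({row} | {given}) = {p:.2f}")
```

The conditional probabilities for a single column sum to 1, which is a quick check on the calculation.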
From 1980 to 1990, the average Year 12 score for a student intending to go to university was 80.
a. Using the data provided for this assignment, test whether the average score of students planning to go to university in 2015 has:
i. changed OR
ii. increased OR
iii. decreased
(just select ONE of these options to test).
Explain why you have chosen to test i, ii or iii.
When conducting the test, be sure to clearly show your working. Use α = 0.05.
b. Comment on your result above. Do you think it has any implications for universities? For society?
c. Construct a 95% confidence interval for the population mean Year 12 score for students intending to go to university in 2015.
d. Construct a 95% confidence interval for the population mean Year 12 score for students not intending to go to university in 2015.
e. Compare your results for c and d. What do they suggest about the mean scores of students intending to go to university in 2015 and the mean scores of students not intending to go to university in 2015?
f. If (prior to collecting data) we considered that an acceptable sampling error for each of our 95% confidence intervals constructed in parts c and d was 5, what sample size would we require? (Use s_Not_going_uni2015 = 30 and s_Going_uni2015 = 20.)
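For part f, one common planning formula is n = (z × s / e)², rounded up, where e is the acceptable sampling error. The sketch below applies it to the two standard deviations given in part f. It assumes the normal critical value z is used for planning (rather than a t value), and the function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, sampling_error, confidence=0.95):
    """Smallest n for which z * std_dev / sqrt(n) <= sampling_error."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would exceed the allowed error.
    return math.ceil((z * std_dev / sampling_error) ** 2)


# Standard deviations from part f, sampling error of 5.
n_not_going = required_sample_size(std_dev=30, sampling_error=5)
n_going = required_sample_size(std_dev=20, sampling_error=5)
print("not going to university:", n_not_going)
print("going to university:", n_going)
```

Because n grows with the square of s, the group with the larger standard deviation needs more than twice as many observations for the same sampling error.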
a. Use one of the categorical variables selected in Question 1c that has at least 3 categories and conduct a one-way ANOVA on the variable identified in Question 1a.
b. Interpret your results.
c. Do you think there are any problems with your results?
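The arithmetic behind a one-way ANOVA can be sketched as below: the F statistic is the between-group mean square divided by the within-group mean square. The three groups and their scores are invented for illustration, and the function name `one_way_anova` is hypothetical. The sketch stops at the F statistic and its degrees of freedom; the decision at α = 0.05 still requires an F table or statistical software.

```python
from statistics import mean

# Hypothetical scores grouped by a three-category variable (invented data).
groups = {
    "A": [60, 65, 70],
    "B": [70, 75, 80],
    "C": [80, 85, 90],
}


def one_way_anova(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

ANOVA assumes roughly normal scores within each group and similar group variances, which is worth checking when answering part c.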
Using the contingency table you constructed in question 4a, test the hypothesis that the two variables are independent of one another. Use α = 0.05. Be sure to state your conclusions and discuss what they mean.
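A minimal sketch of the chi-square test of independence follows. Under independence, the expected count in each cell is (row total × column total) / grand total, and the test statistic sums (observed − expected)² / expected over all cells. The observed counts are invented, and the function name `chi_square_independence` is hypothetical. The resulting statistic is compared with the chi-square critical value for the stated degrees of freedom at α = 0.05.

```python
# Hypothetical observed counts (rows: score classes, columns: categories).
observed = [
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
```

The test is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell), so classes may need to be combined first.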
a. Conduct a regression analysis on the variable identified in Question 1a and at least one of the variables identified in Question 1d.
b. Plot the data and line of best fit. Explain what the regression line means.
c. Discuss whether there is a significant relationship between the dependent variable and the independent variable(s) and what this means. Use α = 0.05.
d. Explain what R² (simple linear regression) or adjusted R² (multiple linear regression) tells you about the relationship between the dependent and independent variable(s).
e. Explain whether it was appropriate to use linear regression on the variables you have selected. Show evidence to support your case.
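For the simple linear regression case, the least-squares line and R² can be computed directly, as sketched below. The data are invented (x might be something like weekly study hours, which is a hypothetical variable and not necessarily in the dataset), and the function name `simple_linear_regression` is hypothetical. The sketch does not cover the significance test in part c or the residual checks in part e.

```python
from statistics import mean

# Hypothetical (x, y) data: x is an invented numerical predictor,
# y is the Year 12 score.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]


def simple_linear_regression(x, y):
    """Return (slope, intercept, R squared) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # R squared: share of the variation in y explained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


slope, intercept, r_squared = simple_linear_regression(x, y)
print(f"score = {intercept:.2f} + {slope:.2f} * x, R^2 = {r_squared:.3f}")
```

The slope is read as the estimated change in the dependent variable for a one-unit increase in x, and the intercept as the predicted value when x is zero.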
a. Summarise your results.
b. On the basis of your results, explain whether your choice of variables (Questions 1c and 1d) has been useful in explaining variation in the variable identified in Question 1a.
c. Based on the results you obtained, and your response to Question 9b above, what advice would you give to a policy maker interested in improving Year 12 scores?
d. Do you think there were additional variables in the dataset (i.e. variables you did not select for analysis) that would have helped you to better understand variability in the variable identified in Question 1a? Explain.
e. Do you think there are any additional variables for which information should have been collected? Explain your reasoning.