1.True or False: Mark each statement True or False. Please explain.
a)For a given dataset, if the standard deviation is 0, then the mean, the median, and the mode of the dataset are identical.
b)Since the sample is always smaller than the population, the sample mean is always smaller than the population mean.
2.Normal Distribution: At Emerson Insurance, the process of issuing an insurance policy consists of two steps, underwriting the policy (which is the process of evaluation and classification of the policy) and rating the policy (which is the calculation of the premiums). The time to underwrite a policy is approximately normally distributed with a mean of 150 minutes and a standard deviation of 30 minutes. The time to rate a policy is approximately normally distributed with a mean of 75 minutes and a standard deviation of 25 minutes.
a)What is the probability that it takes 120 minutes or less to underwrite a policy?
b)Emerson claims that 95% of the policies are underwritten within s0 minutes. What should s0 be? In other words, what is the 95th percentile of the underwriting time?
c)It is known that the sum of two random variables that follow a Normal distribution follows a Normal distribution. Let U be the time to issue a policy, that is to underwrite (X) and rate (Y) the policy. The mean of U is computed as
μ_U = μ_X + μ_Y
and the variance is computed as
σ_U² = σ_X² + σ_Y² + 2 σ_X σ_Y Correl(X, Y)
It is estimated that the correlation between the variables X (time for underwriting) and Y (time for rating), Correl(X,Y) = 0.37.
Using the above information, calculate the probability that it takes less than 2 hours to issue a policy. What is the 95th percentile of the total time to issue a policy?
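The formulas in Question 2 can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the function and variable names are illustrative and not part of the original question.

```python
from math import sqrt
from statistics import NormalDist

# Parameters stated in Question 2 (all in minutes).
MU_X, SIGMA_X = 150.0, 30.0   # underwriting time X
MU_Y, SIGMA_Y = 75.0, 25.0    # rating time Y
RHO = 0.37                    # Correl(X, Y)


def total_time_distribution(mu_x, sigma_x, mu_y, sigma_y, rho):
    """Distribution of U = X + Y, using the mean and variance formulas in part c)."""
    mu_u = mu_x + mu_y
    var_u = sigma_x**2 + sigma_y**2 + 2 * sigma_x * sigma_y * rho
    return NormalDist(mu_u, sqrt(var_u))


underwriting = NormalDist(MU_X, SIGMA_X)
issuing = total_time_distribution(MU_X, SIGMA_X, MU_Y, SIGMA_Y, RHO)

p_underwrite_120 = underwriting.cdf(120)    # part a): P(X <= 120)
underwrite_p95 = underwriting.inv_cdf(0.95) # part b): 95th percentile of X
p_issue_2h = issuing.cdf(120)               # part c): 2 hours = 120 minutes
issue_p95 = issuing.inv_cdf(0.95)           # part c): 95th percentile of U

print(f"P(X <= 120)        = {p_underwrite_120:.4f}")
print(f"95th pct. of X     = {underwrite_p95:.1f} min")
print(f"mean, sd of U      = {issuing.mean:.1f}, {issuing.stdev:.2f} min")
print(f"P(U < 120)         = {p_issue_2h:.4f}")
print(f"95th pct. of U     = {issue_p95:.1f} min")
```

Because the correlation is positive, the standard deviation of U is larger than it would be if X and Y were independent, which widens the upper percentile of the total time.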
3.Confidence Interval and Hypothesis Testing: Josh Bernstein decides to invest one million dollars in a hedge fund. He is considering two candidates: Arrow Asset Management’s flagship fund Arrow Growth, and BTS Asset Management’s BTS Alpha. The annual returns (net of fees) of the two funds from 2005–2014 are summarised in Q3 Hedge Fund.xls.
a)James Locke, the fund manager of Arrow Growth, is proud of the high return of the fund. In fact, he claims that the average return of his fund will be at least 2% higher than that of BTS Alpha. Based on the data, can you accept James’ claim? Please explain clearly how you set up the statistical test and how you draw your conclusion.
b)If you are the sales manager of BTS, what do you think is the selling point of your fund?
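One possible way to set up the test in part a) is sketched below. It assumes the two funds' returns are paired by year, so the test is run on the yearly differences; a two-sample test that ignores the pairing is an alternative. The function name is illustrative, and the return figures are HYPOTHETICAL placeholders, not the contents of Q3 Hedge Fund.xls.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(returns_a, returns_b, margin):
    """t statistic for H0: mean(A - B) = margin, based on paired yearly differences."""
    if len(returns_a) != len(returns_b):
        raise ValueError("both funds need one return per year")
    diffs = [a - b for a, b in zip(returns_a, returns_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)   # sample sd of differences / sqrt(n)
    t_stat = (mean(diffs) - margin) / standard_error
    return t_stat, n - 1                      # statistic and degrees of freedom


# HYPOTHETICAL annual returns in percent for 10 years (made-up illustration only).
arrow = [12.0, 15.0, 9.0, -20.0, 25.0, 14.0, 3.0, 11.0, 18.0, 8.0]
bts = [7.0, 8.0, 6.0, -2.0, 9.0, 7.0, 5.0, 6.0, 8.0, 6.0]

t_stat, dof = paired_t_statistic(arrow, bts, margin=2.0)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With 10 yearly observations there are 9 degrees of freedom, for which the one-sided 5% critical value of the t distribution is about 1.833. Which side of the hypothesis carries the claim (H0: difference ≥ 2 versus H0: difference ≤ 2) is part of what the question asks you to justify.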
4.Hypothesis Testing: As part of its cost-cutting efforts, a UK bank is considering outsourcing its phone customer service department to an overseas service provider. However, management is not sure whether the service quality will be as good as that of the in-house department. To test the water, management conducted a five-day experiment in which each incoming call was randomly routed to either the in-house customer service department or the overseas service provider. At the end of the phone conversation, 2% of customers were randomly selected and asked whether they were satisfied with the service. The customer satisfaction data are summarised in Q4 customer satisfaction.xls.
a)Use the data to test whether the outsourced service is as good as the in-house one. Please state and explain your hypotheses clearly, carry out the appropriate statistical test, and interpret the result.
b)If the customer service department is outsourced, the direct cost saving is 600,000 pounds per year. However, customer dissatisfaction is costly: it is estimated that every unsatisfied customer will cost the bank 10 pounds in future profit. On average, the bank receives half a million phone calls from customers per year. Given the above information, would you recommend that the bank switch to the overseas service provider? Please base your recommendation on statistical tests.
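The arithmetic behind Question 4 can be sketched as follows. The break-even figure uses only the numbers stated in part b) and assumes that each call corresponds to one customer who is either satisfied or not. The function names are illustrative, and the survey counts are HYPOTHETICAL placeholders, not the contents of Q4 customer satisfaction.xls.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(sat_in, n_in, sat_out, n_out):
    """z statistic for H0: p_in == p_out, using the pooled standard error."""
    p_in, p_out = sat_in / n_in, sat_out / n_out
    pooled = (sat_in + sat_out) / (n_in + n_out)
    se = sqrt(pooled * (1 - pooled) * (1 / n_in + 1 / n_out))
    return (p_in - p_out) / se


def break_even_drop(saving, cost_per_unsatisfied, calls_per_year):
    """Largest drop in the satisfaction rate at which outsourcing still breaks even."""
    return saving / (cost_per_unsatisfied * calls_per_year)


# Figures from part b): 600,000 pounds saved, 10 pounds per unsatisfied customer,
# 500,000 calls per year.
max_drop = break_even_drop(600_000, 10, 500_000)

# HYPOTHETICAL survey counts (satisfied, surveyed) for each provider.
z = two_proportion_z(sat_in=80, n_in=100, sat_out=70, n_out=100)
p_one_sided = 1 - NormalDist().cdf(z)   # H1: in-house satisfaction rate is higher

print(f"break-even drop in satisfaction rate = {max_drop:.2%}")
print(f"z = {z:.3f}, one-sided p-value = {p_one_sided:.4f}")
```

For part b), the relevant comparison is between the estimated drop in the satisfaction rate and the break-even drop, so a test of whether the difference exceeds that margin may be more informative than a test of equality alone.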
5.Linear Regression: You are hired as a consultant by VK Office Supply Ltd., which sells printing consumables to companies (consulting firms, law firms, hedge funds, etc.) in Chicago. Allison Jones, a co-owner of the company, wants to better understand what drives revenue from each client by using available data. As a close friend of Allison’s, you are confident that you can help Allison answer her questions using statistics. After a meeting with the sales manager, you obtained data for 48 key clients. The data are included in Q5_VK.xls. In the data, revenue from each customer is in units of 1,000 dollars.
Table 5.1 Simple regression with revenue as the dependent variable and the no. of printers as the independent variable
a)Allison explains that, as her company sells printing consumables, its revenue from each client should increase with the number of printers the client has. Adam, a previous intern, has already run a simple regression between revenue and the no. of printers. The Excel output is presented in Table 5.1. Can you explain to Allison the meaning of the coefficient of No. of Printers?
b)Allison is puzzled by the result in Table 5.1. She argues that if a client has zero printers, it cannot bring in any revenue, so the intercept of the regression should be zero. Is Allison correct? Please explain.
c)Vaguely remembering regression analysis from her MBA at LBS, Allison believes that in order to correctly quantify the effect of the no. of printers, Adam should run a multiple regression. However, Adam disagrees, arguing that as Allison is only interested in the relationship between revenue and the no. of printers, a simple regression is the way to go. Whose view do you support, and why?
d)Supporting Allison’s point of view, you plan to run a multiple regression with revenue as the dependent variable and all other available data as independent variables. Looking at the data, you find that it also records which sector the client is in (Consulting, Financing or Law), which cannot be included as-is in the regression. How can you incorporate the sector variable in the regression? Please explain exactly what you should do and why.
e)Please carry out the multiple regression, report your regression results, interpret the regression coefficients of the multiple regression, and comment on the overall quality of the model.
f)A potential customer Allison is pitching to is a law firm with 650 employees and 450 printers. Using the model you ran in e), can you suggest what Allison should expect the revenue from this client to be?
g)Based on her experience, Allison claims that the more printers a company has, the lower the usage per printer will be, so the relationship between revenue and no. of printers may not be linear. Please identify and quantify this relationship. Please explain to Allison what the new coefficient means.
h)Finally, Allison believes each printer at a law firm generates more revenue than a printer in other sectors. Can you explain how to incorporate this feature into your regression model? Please carry out the analysis for Allison.
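The modelling steps in parts d), g) and h) can be sketched in code. This is a minimal illustration, assuming NumPy is available; the function names, the choice of Consulting as the baseline sector, the log transform as one candidate nonlinear form, and all numbers below are HYPOTHETICAL and are not taken from Q5_VK.xls.

```python
import numpy as np

SECTORS = ["Consulting", "Financing", "Law"]  # first entry is the baseline (assumption)


def design_matrix(printers, employees, sectors, use_log=False, law_interaction=False):
    """Columns: intercept, printers (or log printers), employees,
    sector dummies (baseline dropped), optional Law x printers interaction."""
    printers = np.asarray(printers, dtype=float)
    employees = np.asarray(employees, dtype=float)
    x_print = np.log(printers) if use_log else printers
    columns = [np.ones_like(printers), x_print, employees]
    # One 0/1 dummy per sector except the baseline, to avoid perfect collinearity.
    dummies = {s: np.array([float(v == s) for v in sectors]) for s in SECTORS[1:]}
    columns.extend(dummies.values())
    if law_interaction:
        # Lets the slope on printers differ for law firms.
        columns.append(dummies["Law"] * x_print)
    return np.column_stack(columns)


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# HYPOTHETICAL clients: made-up values used only to exercise the code.
printers = [10, 20, 35, 15, 40, 25, 30, 50, 12]
employees = [100, 300, 250, 120, 500, 200, 450, 600, 90]
sectors = ["Consulting"] * 3 + ["Financing"] * 3 + ["Law"] * 3

# Made-up "true" coefficients so the fit can be checked against a known answer:
# intercept, printers, employees, Financing, Law, Law x printers.
true_coef = np.array([5.0, 0.2, 0.01, 3.0, 4.0, 0.1])
X = design_matrix(printers, employees, sectors, law_interaction=True)
revenue = X @ true_coef            # noiseless revenue in units of 1,000 dollars

coef = fit_ols(X, revenue)
print(np.round(coef, 4))

# Prediction for a law firm with 450 printers and 650 employees, as in part f).
x_new = design_matrix([450], [650], ["Law"], law_interaction=True)
print(float((x_new @ coef)[0]))
```

Dropping one sector as the baseline means each dummy coefficient is read as the difference in expected revenue relative to that baseline, holding the other variables fixed. Passing `use_log=True` replaces printers with its natural log, under which the coefficient is read as the change in revenue associated with a proportional change in the number of printers.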