MSE 609 Quantitative Data Analysis

Posted by OneDay3 on 2024-11-09

Original URL: https://www.cnblogs.com/comp9321/p/18536476

Midterm 3

Instructions:

Prepare your answers using Jupyter Notebook or R Markdown, and submit as a PDF or HTML document. Ensure your submission is clear, organized, and well-formatted.

Use complete sentences when explaining, commenting, or discussing. Provide thorough answers within the context of the problem for full credit.

Show all work and reasoning in your submission. Your grade will depend on the clarity, detail, and correctness of your answers.

The exam is open book and open notes. You may use textbooks, course notes, and approved coding tools (e.g., Jupyter Notebook or R Studio). However, using generative AI tools (e.g., large language models) is not permitted.
Total points = 100.
The exam duration is 1 week. Submit your completed exam by Thursday, November 14, at 11:59
Late submissions will not be accepted.
Upload your submission to Crowdmark in PDF or HTML format.

1. (42 points total) The data file question1.csv contains information about the economies of 366 metropolitan areas (MSAs) in the United States for the year 2006. The dataset includes variables such as the population, the total value of all goods and services produced for sale in the city that year per person (“per capita gross metropolitan product”, pcgmp), and the share of economic output coming from four selected industries.

a. (1 point) Load the data file and confirm that it contains 366 rows and 7 columns. Explain why there are seven columns when only six variables are described in the dataset.
b. (1 point) Compute summary statistics for the six numerical columns.
c. (4 points) Create univariate exploratory data analysis (EDA) plots for population and per capita GMP. Use histograms and boxplots, and describe the distributions of these variables.
d. (4 points) Generate a bivariate EDA plot showing per capita GMP as a function of population. Describe the relationship observed in the plot.
e. (3 points) Using only basic functions like mean, var, cov, sum, and arithmetic operations, calculate the slope and intercept of the least-squares regression line for predicting per capita GMP based on population.
f. (3 points) Compare the slope and intercept from your calculations to those returned by the regression function in R. Are they the same? Should they be?
g. (3 points) Add both regression lines to the bivariate EDA plot. Comment on the fit and whether the assumptions of the simple linear regression model appear to hold. Are there areas where the fit seems particularly good or poor?
h. (3 points) Identify Pittsburgh in the dataset. Report its population, per capita GMP, the per capita GMP predicted by your model, and the residual for Pittsburgh.
i. (2 points) Calculate the mean squared error (MSE) of the regression model. That is, compute [formula missing in the source].
j. (2 points) Discuss whether the residual for Pittsburgh is large, small, or typical relative to the MSE.
k. (4 points) Create a plot of residuals (vertical axis) against population (horizontal axis). What pattern should you expect if the assumptions of the simple linear regression model are valid? Does the plot you generated align with these assumptions? Explain.
l. (3 points) Create a plot of squared residuals (vertical axis) against population (horizontal axis). What pattern should you expect if the assumptions of the simple linear regression model are valid? Does the plot you generated align with these assumptions? Explain.
m. (3 points) Carefully interpret the estimated slope in the context of the actual variables involved in this problem, rather than using abstract terms like “predictor variable” or “X”.
n. (… points) Using the model, predict the per capita GMP for a city with a population that is 10^5 higher than Pittsburgh’s.
o. (3 points) Discuss what the model predicts would happen to Pittsburgh’s per capita GMP if a policy intervention were to increase its population by 10^5 people.

2. (40 points total) In real-world data analysis, the process goes beyond simply generating a model and reporting the results. It’s essential to accurately frame the problem, select appropriate analytical methods, interpret the findings, and communicate them in a way that is accessible to an audience that may not be familiar with advanced statistical methods.

Research Scenario: Coral shells, known scientifically as Lithoria crusta, are marine mollusks that inhabit rocky coastal areas. Their meat is highly valued as a delicacy, eaten raw or cooked in many cultures. Estimating the age of Lithoria crusta, however, is difficult since their shell size is influenced not only by age but also by environmental factors, such as food supply. The traditional method for age estimation involves applying stain to a shell sample and counting rings under a microscope. A team of researchers is exploring whether certain physical characteristics of Lithoria crusta, particularly their height, might serve as indicators of age. They propose using a simple linear regression model with normally distributed errors to examine the association between shell height and age, positing that taller shells are generally older. The dataset for this research is available at question2.csv.

a. (3 points) Load the data. Describe the research hypothesis.
b. (4 points) Examine the two variables individually (univariate). Find summary measures for each (mean, variance, range, etc.). Graphically display each and describe your graphs. What is the unit of height?
c. (4 points) Generate a labeled scatterplot of the data. Describe interesting features or trends you observe.
d. (… points) Fit a simple linear regression to the data, predicting the number of rings using the height of the Lithoria crusta.
e. (4 points) Generate a labeled scatterplot that displays the data and the estimated regression function line (you may add this to the previous scatterplot). Describe the fit of the line.
f. (5 points) Perform diagnostics to assess whether the model assumptions are met. If not, appropriately transform the height and/or number of rings and re-fit your model. Justify your decisions and re-check your diagnostics.
g. (4 points) Interpret your final parameter estimates in context. Provide 95% confidence intervals for β0 and β1, and interpret these in the context of the problem.
h. (… points) Determine whether there is a statistically significant relationship between the height and the number of rings (and hence, the age) of Lithoria crusta. Explain your findings in the context of the problem.
i. (4 points) Find the point estimate and the 95% confidence interval for the average number of rings for a Lithoria crusta with a height of 0.128 (in the same unit as other observations of height). Interpret this in the context of the problem.
j. (4 points) We are interested in predicting the number of rings for a Lithoria crusta with a height of 0.132 (in the same unit as other observations of height). Find the predicted value and a 99% prediction interval.
k. (3 points) What are your conclusions? Identify a key finding and discuss its validity. Can you come up with any reasons for what you observe? Do you have any suggestions or recommendations for the researchers? How could this analysis be improved? (Provide 6–8 sentences in total.)

3. (18 points total) Load the stackloss data:

    data(stackloss)
    names(stackloss)
    help(stackloss)

a. (3 points) Plot the data and describe any noticeable patterns or trends.
b. (5 points) Fit a multiple regression model to predict stack loss from the three other variables. The model is Y = β0 + β1X1 + β2X2 + β3X3 + ϵ, where Y is stack loss, X1 is airflow, X2 is water temperature, and X3 is acid concentration. Summarize the results of the regression analysis, including the estimated coefficients and their interpretation.
c. (… points) Construct 90 percent confidence intervals for the coefficients of the linear regression model. Interpret these intervals in the context of the problem.
d. (3 points) Construct a 99 percent prediction interval for a new observation when Airflow = 58, Water temperature = 20, and Acid = 86. Interpret the prediction interval.
e. (4 points) Test the null hypothesis H0 : β3 = 0. What is the p-value? Based on a significance level of α = 0.10, what is your conclusion? Explain your reasoning.
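The question that asks for the least-squares slope and intercept from only mean, var, cov, sum, and arithmetic relies on the identities slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below illustrates those identities in Python, which a Jupyter Notebook submission could use. The data are a hypothetical toy sample, not values from question1.csv, and the helper names (fit_line, residuals, mse) are illustrative assumptions. Because the exam's MSE formula is cut off in the source, dividing the sum of squared residuals by n is also an assumption; dividing by n - 2 is another common convention.

```python
# Hypothetical toy data for illustration only; NOT taken from question1.csv.
population = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed predictor values
pcgmp = [2.1, 3.9, 6.2, 7.8, 10.0]       # assumed response values


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def var(values):
    """Sample variance with an n - 1 denominator (as R's var uses)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cov(xs, ys):
    """Sample covariance with an n - 1 denominator (as R's cov uses)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def fit_line(xs, ys):
    """Least-squares slope and intercept from mean, var, and cov."""
    slope = cov(xs, ys) / var(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted value for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def mse(res):
    """Mean of squared residuals; the n denominator is an assumption."""
    return sum(r ** 2 for r in res) / len(res)


slope, intercept = fit_line(population, pcgmp)
res = residuals(population, pcgmp, slope, intercept)
print("slope:", slope, "intercept:", intercept, "MSE:", mse(res))
```

The n - 1 denominators in var and cov cancel in the ratio, so the slope is the same whichever denominator is used, provided both functions use the same one. A single observation's residual can be judged against the square root of the MSE, which is on the same scale as the response.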

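The shell-height question asks for both a confidence interval for the average number of rings at a given height and a prediction interval for a single new shell. Under the simple linear regression model with normally distributed errors, the two intervals differ only by an extra 1 under the square root. The formulas below are the standard textbook forms; the symbols x_0, s, and S_xx are notation introduced here and do not appear in the exam.

```latex
% Fitted value at a new height x_0
\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0,
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
S_{xx} = \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2

% 100(1 - \alpha)% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\, n-2} \; s \sqrt{ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} }

% 100(1 - \alpha)% prediction interval for one new observation at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\, n-2} \; s \sqrt{ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} }
```

The prediction interval is always wider because it also accounts for the variability of an individual shell around the mean. If height or rings are transformed before fitting, these intervals hold on the transformed scale and need to be mapped back for interpretation.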