EBIS4043 Big Data Analysis and Applications

Posted by hellyou on 2024-10-26

Original URL: https://www.cnblogs.com/goodlunn/p/18504353

The purpose of this assignment is to check that you have picked up the R-based analytics skills introduced in this class (please do not use other tools to generate the answers!). (Total 50 marks)
1. Use the dataset available on iSpace.
2. Make sure your submission covers the entire process, from data loading to analysis and interpretation.
3. All your answers, including your identity, code, and interpretation, must be in one file: the HTML generated from an R Markdown file (.Rmd). Submissions consisting of multiple files will receive zero marks.
4. You may discuss the coding for this assignment with your friends. However, any visible overlap in your interpretation will be considered plagiarism.
5. The use of any generative AI tool is strictly prohibited for this assignment. If such use is detected, it will be considered an attempt at plagiarism.
6. There can be more than one correct answer to each question. Use any technique you learned in class.
7. If needed, use 20240614 as the random number seed.

Data Description
This dataset originally comes from Orange Telecom’s churn dataset, which consists of customer information known to the telecom company, along with a churn indicator (“TRUE” = canceled the subscription, “FALSE” = otherwise). The customer information includes each customer’s location, extra service plans (e.g., international roaming and voice mail services), usage (in terms of minutes, number of calls, charged fees, …), and so on. All customers in the dataset are from the United States.

Questions
1. Write and execute R code to build and test the regression equation below for predicting the value of the Churn variable, using 1) a Linear Probability Model (LPM) and 2) a logistic regression model. Transform and create variables as appropriate. Which model has a better fit? (Total 10 marks)

where CS.contacted = 0 if the customer has never contacted customer service and 1 otherwise; Total.all.charge = the sum of all fees charged to the customer for calls, excluding customer service calls; and Total.all.time = the sum of all time (in minutes) the customer spent on calls, excluding customer service calls.
2. Using the LPM estimated in question 1, plot the effect of Total.all.charge on Churn for CS.contacted = 0 and for CS.contacted = 1, holding the other predictors at their mean values. (Total 10 marks)

3. Write and execute R code to build and test the regression equation below for predicting the value of the Churn variable using all predictors in the dataset, with 1) a Linear Probability Model (LPM) and 2) a logistic regression model. Use 5-fold cross-validation for both models. (Total 10 marks)
Hint 1: use the caret package.
Hint 2: use the as.factor() function to convert a variable into a factor variable.

4. Based on the results of question 3, which model is preferred for prediction in terms of accuracy at a threshold of 0.3? (Total 10 marks)
Hint: use the data.frame() function to convert the list output from predict() into a data frame.

5. Do you think the LPM developed in question 3 can be used to predict whether a Canadian customer will churn? Provide at least two reasons for your answer, based on this document and the answers you have generated so far. (Total 10 marks)
