DBA3803: Predictive Analytics in Business
Overview
Analytics is best learned by applying the methods and techniques to real-world data and problems. For this project:
1. Identify a real-world problem or an area where predictive analytics can be applied.
2. Obtain real-world data and conduct exploratory data analysis.
3. Apply a predictive analytics method or a combination of methods to solve your problem / answer your question(s).
4. Analyze the results and present your findings and conclusions through a comprehensive report.
Deadline for submission: Friday, Nov 15, 2024, at 2359
Data Sources
Each team is expected to identify an appropriate data source and gather the data. The project scope will largely be driven by the data you find and use. Some possible datasets and associated real-world applications/problems are linked below, but we encourage you to be creative and look into other data sources as well.
• Toronto Open Data Catalogue: Contains various datasets on everything from ambulance locations to traffic cameras. https://open.toronto.ca/
• HetRec: Hosted by the University of Minnesota, contains several open datasets for recommender engines, such as Delicious, Last.FM, MovieLens, and IMDb. https://grouplens.org/datasets/movielens/
• SQuAD: The Stanford Question Answering Dataset, contains parsed Wikipedia articles and crowdsourced questions for trivia bots and personal assistants. https://rajpurkar.github.io/SQuAD-explorer/
• Million Song Dataset: Hosted by Columbia University, contains several open datasets for music information analytics. http://millionsongdataset.com/
• Singapore Open Data Portal: Datasets collected by Singapore public agencies, made available and accessible to the public. https://data.gov.sg
• Search for datasets on GitHub, Kaggle, Reddit /r/opendata, and elsewhere!
• If you have an interesting project idea and are missing some crucial data, you could collect your own (for example with surveys, scraping websites, or manual collection).
Final Report
Your final report is a crucial component of this module. It is an opportunity for you to showcase your understanding, analysis, and interpretation of the ideas and methods discussed in class. The report should be a comprehensive document that follows a structured format, providing a clear and concise presentation of your work. It should include the following sections: Abstract, Introduction, Data, Methods, Results, Discussion, Conclusion, and References. You can also include an appendix for additional technical details, supplementary data analysis, references, code, etc.
1. Abstract (0.5 page maximum)
The abstract provides a brief summary of the entire project, including the problem statement, data, methods, results, and conclusions. The abstract should be concise but informative, allowing readers to understand the essence of your work.
2. Introduction (1 page maximum)
The introduction should define the context of your analysis, the problem statement, the objectives of the study, and the data and methods used. Clearly define the scope of the project and explain why it is important or relevant. End the introduction with a clear thesis statement that outlines what the reader can expect from the rest of the report. The introduction should include the following subsections:
• Background and context: A brief description of the domain of your problem.
• Problem statement: A clear problem statement identifying the question(s) you will answer.
• Data: A description of the sources and datasets that you plan to use, including key variables.
• Methods: An outline of the predictive analytics methods and models implemented in the study.
3. Data (2 pages maximum)
Detail the data sources used in your analysis. Include information on the type of data, its origin, and any preprocessing steps you performed. Discuss any challenges or considerations related to the data, such as missing values or data quality issues.
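As an illustration only, a first pass at the missing-value discussion can be a per-column count of empty entries. The sketch below uses only the Python standard library; the function name `missing_value_summary`, the set of strings treated as "missing", and the tiny inline CSV are all hypothetical and should be adapted to your own dataset.

```python
import csv
import io


def missing_value_summary(rows, missing_tokens=("", "NA", "N/A", "null")):
    """Count missing entries per column in a list of dict rows.

    Which strings count as 'missing' is an assumption; adjust for your data.
    """
    counts = {}
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            # csv.DictReader yields None for fields absent from a short row
            is_missing = value is None or value.strip() in missing_tokens
            counts[column] = counts.get(column, 0) + int(is_missing)
    # Map each column to (missing count, share of rows missing)
    return {col: (n, n / total if total else 0.0) for col, n in counts.items()}


# Hypothetical sample data standing in for a real dataset file
sample = io.StringIO("id,price,region\n1,10.5,East\n2,,West\n3,NA,\n")
rows = list(csv.DictReader(sample))
for column, (n, share) in missing_value_summary(rows).items():
    print(f"{column}: {n} missing ({share:.0%})")
```

A summary like this helps decide, column by column, whether to drop, impute, or flag missing values, and that decision is worth reporting in the Data section.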
4. Methods (3 pages maximum)
Explain the analytical methods and techniques you employed in your project. This should include a description of any statistical or machine learning models, algorithms, or tools used. Provide enough detail for the reader to understand your approach but avoid unnecessary technical jargon.
5. Results (3 pages maximum)
Present your findings in a clear and organized manner. Use visualizations such as charts, graphs, or tables to aid the interpretation of your results. Highlight key trends, patterns, or insights that emerge from your analysis.
6. Discussion (0.5 page maximum)
Interpret the results in the context of your research questions or objectives. Discuss any unexpected findings and compare your results to existing literature or industry benchmarks. Address the limitations of your analysis and propose potential explanations or areas for further investigation.
7. Conclusion (0.5 page maximum)
Summarize the key takeaways from your analysis, and clearly state the answers to your questions. Finally, discuss the implications of your findings for the broader field or problem area.
8. Appendix (No page limit)
Use the appendix for additional technical details, supplementary data analysis, references, code, etc. Please make sure that it is clearly referenced from the main text.
Additional tips
• Use clear and concise language, avoid lengthy or needless descriptions and paragraphs.
• Ensure a logical flow between sections to aid understanding.
• Include relevant code, algorithms, or technical details in an appendix if necessary.