MLE 5217 : Take-Home Dataset Classification

Posted by hellyou on 2024-10-20

Dept. of Materials Science & Engineering NUS

MLE 5217 : Take-Home Assignments

Lecturer: Sasani Jayawardhana

Objectives

Based on the chemical composition of materials, build a classification model to distinguish metals from non-metals (Model 1), and then build a regression model to predict the bandgap of non-metallic compounds (Model 2).

Please use a separate Jupyter notebook for each of the models.

Data

The data contains the chemical formula and energy band gaps (in eV) of experimentally measured compounds. These measurements have been obtained using a number of techniques such as diffuse reflectance, resistivity measurements, surface photovoltage, photoconduction, and UV-vis measurements. Therefore, a given compound may have more than one measurement value.
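
Since a compound can appear several times with different measured values, one possible preprocessing step is to collapse the repeats into a single value per formula before modelling. The sketch below assumes pandas and uses hypothetical column names (`formula`, `bandgap_eV`), since the actual CSV headers are not stated here. Taking the median is only one reasonable choice, not something the assignment prescribes.

```python
import pandas as pd


def collapse_duplicates(df, formula_col="formula", gap_col="bandgap_eV"):
    """Return one row per formula with the median gap and the measurement count.

    The column names are assumptions; adjust them to match the CSV headers.
    """
    grouped = df.groupby(formula_col)[gap_col]
    return grouped.agg(gap_median="median", n_measurements="count").reset_index()


# Placeholder rows for illustration only (not taken from the assignment data).
toy = pd.DataFrame(
    {
        "formula": ["A2B", "A2B", "CD", "A2B"],
        "bandgap_eV": [1.0, 1.4, 0.0, 1.2],
    }
)
collapsed = collapse_duplicates(toy)
```

Keeping the measurement count alongside the median makes it easy to inspect compounds whose repeated measurements disagree strongly before deciding how to label them.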

Tasks

Model I (30 marks)

Dataset: Classification data.csv

Fit a Support Vector Classification model to separate metals from non-metals in the data. Ensure that you:

  • Follow the usual machine learning process.
  • Use a suitable composition-based feature vector to vectorize the chemical compounds.
  • You may use your judgement on how to differentiate between metals and non-metals. As a guide, two possible options are given below.
      Option 1: for metals Eg = 0, for non-metals Eg > 0
      Option 2: for metals Eg ≤ 0.5, for non-metals Eg > 0.5
  • Use suitable metrics to quantify the performance of the classifier.
  • For added advantage, you may optimize the hyper-parameters of the Support Vector Classifier. Note: Optimization algorithms can require high processing power and may therefore cause your computer to freeze (ensure you have saved all your work before you run such code). In such a case, you may either do a manual optimization or leave the code without execution.
  • Comment on the overall performance of the model.
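
The steps listed for Model I could be sketched roughly as follows, assuming scikit-learn and NumPy. All function names here (`element_fractions`, `featurize`, `label_metals`, `build_classifier`, `evaluate`, `tune`) are hypothetical helpers, not part of the assignment. The element-fraction vector is only one simple composition-based featurization, and the parser handles flat formulas only. The `threshold` argument treats Eg at or below the threshold as metal, which is an assumption about how the two labelling options are meant to be read.

```python
import re

import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def element_fractions(formula):
    """Parse a flat formula such as 'Fe2O3' into {element: atomic fraction}.

    Simplification: parentheses and hydrates are not handled.
    """
    counts = {}
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def featurize(formulas):
    """Stack element fractions into a matrix; columns follow sorted element symbols."""
    parsed = [element_fractions(f) for f in formulas]
    elements = sorted({el for p in parsed for el in p})
    X = np.array([[p.get(el, 0.0) for el in elements] for p in parsed])
    return X, elements


def label_metals(gaps_eV, threshold=0.0):
    """Return 1 for metal (Eg <= threshold) and 0 for non-metal (Eg > threshold)."""
    return (np.asarray(gaps_eV, dtype=float) <= threshold).astype(int)


def build_classifier(C=1.0, gamma="scale"):
    """Scale the features, then fit an RBF-kernel SVC with balanced class weights."""
    return make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", C=C, gamma=gamma, class_weight="balanced"),
    )


def evaluate(formulas, gaps_eV, threshold=0.0, seed=0):
    """Hold out 25% of the data (stratified) and report a few classifier metrics."""
    X, _ = featurize(formulas)
    y = label_metals(gaps_eV, threshold)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    y_pred = build_classifier().fit(X_tr, y_tr).predict(X_te)
    return {
        "confusion_matrix": confusion_matrix(y_te, y_pred, labels=[0, 1]),
        "f1_metal": f1_score(y_te, y_pred, pos_label=1, zero_division=0),
        "balanced_accuracy": balanced_accuracy_score(y_te, y_pred),
    }


def tune(X, y, cv=5):
    """Small grid search over C and gamma; kept deliberately tiny to limit run time."""
    grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
    return GridSearchCV(build_classifier(), grid, scoring="f1", cv=cv, n_jobs=1).fit(X, y)
```

Putting the scaler inside the pipeline means it is refit on each training split, so no statistics from held-out rows leak into training. Reporting a confusion matrix and balanced accuracy next to F1 helps when the two classes are unevenly represented, which plain accuracy can hide.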

Model II (30 marks)

Dataset: Regression data.csv

Fit a Regression Equation to the non-metals to predict the bandgap energies based on their chemical composition.
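
As a rough starting point for such a fit, the sketch below cross-validates two candidate regressors on a feature matrix and reports mean absolute error and R². It assumes scikit-learn. The helper name `compare_regressors` and the choice of Ridge and random forest are illustrative assumptions, not requirements of the assignment, and the demo arrays are synthetic stand-ins for real composition features and measured gaps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_regressors(X, y, n_splits=5, seed=0):
    """Cross-validate candidate regressors; return {name: (mean MAE, mean R^2)}."""
    candidates = {
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in candidates.items():
        scores = cross_validate(
            model, X, y, cv=cv, scoring=("neg_mean_absolute_error", "r2")
        )
        results[name] = (
            -scores["test_neg_mean_absolute_error"].mean(),  # flip sign back to MAE
            scores["test_r2"].mean(),
        )
    return results


# Synthetic stand-in data so the sketch runs on its own; replace with real
# composition features and measured band gaps.
rng = np.random.default_rng(0)
X_demo = rng.random((60, 4))
y_demo = X_demo @ np.array([2.0, 0.5, 0.0, 1.0]) + 0.05 * rng.standard_normal(60)
results = compare_regressors(X_demo, y_demo)
```

Comparing a linear baseline with a non-linear model on identical folds gives a quick indication of whether the added flexibility pays off for a given feature vector. If the same compound appears in several rows, grouping those rows into the same fold avoids an optimistic error estimate.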

  • Use a suitable composition-based feature vector to vectorize the chemical compounds. You may try multiple feature vectors and analyse the outcomes.
  • You may experiment with different models for regression analysis if required.
  • Comment on the overall performance of the model and suggest any shortcomings or potential improvements.

September 2024

Important: Comments

  • Write clear comments in the code so that a user can follow the logic.
  • In instances where you have made decisions, justify them.
  • In instances where you have decided to follow a different analysis path (than what is outlined in the tasks), explain your thinking in the comments.
  • Acknowledge any references used at the bottom of the notebook.

Submission

  • Ensure that each of the cells of code in the final Jupyter notebooks has been run for output (except for the hyper-parameter optimization, if any).
  • The two models (I and II) have been entered in two separate notebooks.
  • Name the files by your name as "YourName 1.ipynb" and "YourName 2.ipynb".
  • It is your responsibility to ensure that the correct files are being submitted, and that the file extensions are in the correct format (.ipynb).
  • Submission will be via Canvas, and late submissions will be penalized.

Evaluation

The primary emphasis will be on the depth and thoroughness of your approach to the problem. Key areas of focus will include:

* Data Exploration: Demonstrating a thorough investigation of the data, exploring different analytical possibilities, and thoughtfully selecting the best course of action.

* Implementation: Translating your chosen approach into clean and efficient code.

* Machine Learning Process: Executing the machine learning process correctly and methodically, ensuring proper data handling, model selection, and evaluation.

* Clarity of Explanation: Providing clear explanations of each step, with logical reasoning for the decisions made.

* Critical Analysis: Identifying any limitations of the approach, suggesting potential improvements, and making relevant statistical inferences based on the results.
