ST635 Final Project
Introduction:
Every year, many applicants are disappointed by the outcomes of their graduate school applications. In research, several factors (variables) have been identified that, together, have a large impact on whether an applicant is admitted to a dream school. It is both interesting and important to find out the role each variable plays in the process, so that students can better understand which parts of their profile to focus on and improve, given their personal situation, in order to be admitted.
Data Description:
We use the Graduate Admission dataset for our analysis. In particular, we want to find the relationship between a student's academic scores and the odds of being admitted to graduate school. In the following steps, we use logistic regression, linear regression, a decision tree model, and cluster analysis, and compare them to find the best model for analyzing the outcome.
The dataset contains several parameters that are considered important when applying to Master's programs. The parameters are:
1. GRE Score (out of 340)
2. TOEFL Score (out of 120)
3. University Rating (out of 5)
4. Statement of Purpose and Letter of Recommendation Strength (out of 5)
5. Undergraduate GPA (out of 10)
6. Research Experience (either 0 or 1)
7. Chance of Admit (ranging from 0 to 1)
Because the number of observations is small, we may use cross-validation to obtain a more stable estimate of prediction accuracy.
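As a rough illustration of the cross-validation idea, the sketch below splits row indices into k folds using only the Python standard library. The function name `k_fold_indices`, the choice of k = 5, and the seed are assumptions made for illustration, and 400 only stands in for the approximate number of observations.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: each row lands in exactly one test fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Assumed size: a round number standing in for the dataset's row count.
splits = list(k_fold_indices(400, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

In use, a model would be fit on each training part and scored on the matching held-out part, and the k scores would then be averaged.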
Methodology:
For the logistic regression model, we split the data into training and test sets, fit a logistic regression on the training data, and predict the admission status of every candidate. We then choose the best model based on the false positive rate of the results. We use both unsupervised and supervised methods, and in particular check whether PCA gives better sensitivity and accuracy. Similarly, for the linear regression model, we fit a linear regression and check whether a predicted chance of admission above 0.5 is a good indicator of admission, based on the positive and negative rates. For the decision tree model, we split the raw data into training and test sets, and then grow basic decision trees on the training data under the specific conditions we set. We then analyze the trees and select the best-fitting one by its number of nodes and number of layers. Last but not least, for clustering, we use hierarchical analysis to decide the number of groups and to categorize individuals with common characteristics, and then conduct non-hierarchical analysis based on the groups we decided on.
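The evaluation step described above (treating a predicted chance of admission above 0.5 as "admit" and comparing models by false positive rate, sensitivity, and accuracy) can be sketched as follows. The function name `classification_rates` and the toy numbers are invented for illustration and are not drawn from the Graduate Admission dataset; the sketch only shows how the rates would be computed from a model's predictions.

```python
def classification_rates(predicted_chance, admitted, threshold=0.5):
    """Compute accuracy, sensitivity, and false positive rate.

    predicted_chance: model outputs between 0 and 1
    admitted: true labels, 1 = admitted, 0 = not admitted
    Hypothetical helper; a prediction above `threshold` counts as "admit".
    """
    tp = fp = tn = fn = 0
    for chance, label in zip(predicted_chance, admitted):
        predicted_admit = chance > threshold
        if predicted_admit and label == 1:
            tp += 1
        elif predicted_admit and label == 0:
            fp += 1
        elif not predicted_admit and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
    }


# Toy values, invented for illustration only.
toy_chance = [0.92, 0.71, 0.48, 0.65, 0.30, 0.55]
toy_admitted = [1, 1, 1, 0, 0, 0]
print(classification_rates(toy_chance, toy_admitted))
```

The same function could be applied to the output of the logistic regression, the linear regression, or the decision tree, so that all three are compared on the same rates.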
Motivation:
We chose this dataset because we are interested both in finding these relationships and in helping students succeed in getting into good schools. Moreover, this is a process we have been through ourselves, so we are familiar with the variables in the dataset. With over 400 observations and 7 variables, the dataset is feasible to analyze and to use for prediction. The intersection of our background and experience led us to focus on an admissions dataset suited to statistical analysis. Finally, it is a unique application, or rather an extension, of what we learned in class.

