**Introduction**

The College Scorecard is a data set that combines data on college admission rates, test scores, average student debt, and salaries six and ten years after graduation. The data files are prepared by the government, and the College Scorecard website provides an extensive data dictionary. The data set was used at the Gartner challenge in 2017. It is also available on Kaggle: https://www.kaggle.com/kaggle/college-scorecard

**Key Business Questions**

- Based on data on test scores, cost to attend, size of the school, and salary, which schools offer a good education?
- Are there certain schools or states where student debt and loan default rates are high compared to earnings?

- What about diversity? Is there a good ratio of male to female enrollees and a good ethnic and racial mix?
- What can we find out about college students' income distribution? E.g., do only rich kids go to Harvard and Yale?

**Which schools are good schools, considering test scores, diversity, and family income:** Good schools are taken to be those that provide a good return in terms of income. High SAT scores are treated as a marker of a good school: student salaries tend to be higher at schools with high SAT scores.

**What about diversity of students:** 65% of students who go to Harvard come from families with an annual household income below $48,000, and 43.4% come from families with an annual household income below $30,000. So the assumption that only rich kids go to Harvard and Yale is rejected.

**Heat map of the number of undergraduate students per college, and the relationship between earnings, debt, and default rate:** Colleges on the West and East Coasts fare better, with higher graduate earnings and lower default rates. However, debt among their graduates is high, which can be attributed to the high cost of education at good schools. By contrast, students at most colleges in the South have lower earnings and higher default rates.

In terms of school size, measured as the number of students per college, Arizona ranks highest: on average, it has more students per college than any other state.
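The per-state comparison of college size can be sketched as a simple group-and-average step. This is a minimal illustration, not the original analysis: the state codes `AA` and `BB`, the enrollment figures, and the helper name `mean_students_per_college` are all invented for the example.

```python
from collections import defaultdict

# Toy (state, undergraduate enrollment) records.
# State codes and numbers are invented for illustration only.
colleges = [
    ("AA", 12000),
    ("AA", 30000),
    ("BB", 5000),
    ("BB", 7000),
    ("BB", 9000),
]

def mean_students_per_college(records):
    """Return {state: average enrollment per college} (hypothetical helper)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for state, enrollment in records:
        totals[state] += enrollment
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

averages = mean_students_per_college(colleges)
# The state with the largest average college size.
largest_state = max(averages, key=averages.get)
print(averages, largest_state)
```

Averaging per college rather than summing enrollment keeps states with many small colleges from dominating the ranking.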

**College Segmentation:** The following variables were used to segment the colleges and get a better understanding of them:

- Cost to attend
- Debt
- Default rate
- Earnings
- Household Income
- Percentage of students with loan
- SAT math & verbal score
- Total students in college

The variables were standardized, and the segmentation was done using k-means clustering. Each line in the plot represents one institution, and the table below displays the average values of the variables for the lines the user selects. For example, clusters 9 and 11 represent high SAT scores and low default rates, whereas clusters 5, 6, and 8 represent the opposite.
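The standardize-then-cluster step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the toy rows (SAT math score, default rate) are invented, `k=2` is chosen only to keep the example small, and the deterministic farthest-first initialization is an assumption, since the initialization actually used is not described here.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (z-scores)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / std

def assign(X, centroids):
    """Label each row with the index of its nearest centroid (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance from every row to its nearest already-chosen centroid.
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(X, centroids), centroids

# Toy rows: [SAT math score, default rate]; values invented for illustration.
toy = [[700, 0.02], [720, 0.03], [710, 0.025],
       [450, 0.15], [470, 0.18], [460, 0.16]]
Z = standardize(toy)
labels, centroids = kmeans(Z, k=2)
print(labels)
```

Standardizing first matters because k-means relies on Euclidean distance: without it, a variable measured in hundreds (SAT score) would swamp one measured in fractions (default rate).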

**Linear Regression Analysis to test variable significance:** A regression analysis was performed to understand how significant the effect of each variable is on the measure chosen to represent college quality. The dependent variable is the median earnings of students 6 years after graduation. We use the p-values in the Table of Coefficients to determine whether a variable has any effect on the college quality measure: a variable is said to have an effect (positive or negative) if its p-value is less than 0.05. Among the significant variables, a larger absolute value of the parameter estimate indicates a more important driver of salary. Debt, SAT math scores, household income, and cost of attendance are more significant and more important drivers of earnings than the other variables.
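The significance test and importance ranking can be sketched with an ordinary least squares fit. This is an illustration under stated assumptions, not the original model: the data are simulated, the predictors `x1`, `x2`, `x3` are placeholders, and the p-values use a large-sample normal approximation to the t distribution so the example needs only NumPy. Comparing absolute coefficient sizes assumes the predictors are on a common (standardized) scale.

```python
import math
import numpy as np

def ols_with_pvalues(X, y):
    """Least squares fit of y on X with an intercept (hypothetical helper).

    Returns (coef, p_values), with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]                  # residual degrees of freedom
    sigma2 = resid @ resid / dof               # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = coef / se
    # Two-sided p-values; normal approximation to the t distribution,
    # reasonable only when dof is large.
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return coef, p

# Simulated, standardized-scale predictors x1, x2, x3 (placeholders).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
# Simulated outcome: x1 and x3 have an effect, x2 has none.
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

coef, p = ols_with_pvalues(X, y)
significant = p[1:] < 0.05                  # skip the intercept
ranking = np.argsort(-np.abs(coef[1:]))     # most important driver first
print(coef, p, significant, ranking)
```

For small samples, exact t-distribution p-values (for instance from a statistics library) would be the safer choice.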

