Insurance - Statistical Learning Project

Data Description: The data at hand contains medical costs of people characterized by certain attributes.

Domain: Healthcare

Context: Leveraging customer information is paramount for most businesses. For an insurance company, customer attributes like the ones listed below can be crucial in making business decisions. Hence, knowing how to explore such data and generate value from it is an invaluable skill.

Attribute Information:
age: age of the primary beneficiary
sex: gender of the insurance contractor (female, male)
bmi: body mass index, an objective index of body weight relative to height, computed as weight divided by height squared (kg / m^2); ideally 18.5 to 24.9
children: number of children covered by health insurance / number of dependents
smoker: smoking status
region: the beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
charges: individual medical costs billed by health insurance

Learning Outcomes:
Exploratory Data Analysis
Practicing statistics using Python
Hypothesis testing

Objective: We want to see if we can dive deep into this data to find some valuable insights.

1. Import the necessary libraries.

2. Read the data as a data frame.
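Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: the file name `insurance.csv` is an assumption, and the inline rows are invented values that only mirror the column names from the attribute list, so the example runs without the real file.

```python
import io

import pandas as pd

# Invented rows for illustration only; they mirror the documented columns
# but are NOT taken from the real dataset.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
21,female,24.5,0,no,northeast,2100.00
47,male,31.0,2,yes,southeast,30500.00
35,female,28.3,1,no,southwest,5400.00
58,male,26.7,3,no,northwest,12000.00
"""


def load_insurance(source):
    """Read the insurance data from a file path or a file-like object."""
    return pd.read_csv(source)


# With the real data this would be load_insurance("insurance.csv"),
# where the file name is an assumption.
df = load_insurance(io.StringIO(SAMPLE_CSV))
print(df.head())
```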

3. Perform basic EDA which should include the following and print out your insights at every step.

3(a). Shape of the data.

3(b). Data type of each attribute.

3(c). Checking the presence of missing values.

3(d). 5 point summary of numerical attributes.
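Steps 3(a) through 3(d) map onto a handful of pandas calls. The sketch below uses a small invented data frame (the values are not from the real dataset) so that it is self-contained. Note that `describe()` only summarizes numeric columns by default, which is what the 5 point summary asks for.

```python
import pandas as pd

# Invented values for illustration; column names follow the attribute list.
df = pd.DataFrame({
    "age": [19, 45, 33, 62, 27, 51],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "bmi": [27.9, 31.2, 22.7, 26.3, 35.1, 29.8],
    "children": [0, 2, 0, 1, 3, 1],
    "smoker": ["yes", "no", "no", "yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest",
               "northeast", "southeast", "southwest"],
    "charges": [16800.0, 8200.5, 4100.0, 27500.0, 5200.75, 9800.0],
})

shape = df.shape                 # 3(a): (rows, columns)
dtypes = df.dtypes               # 3(b): data type of each attribute
missing = df.isnull().sum()      # 3(c): count of missing values per column

# 3(d): minimum, quartiles and maximum of the numeric attributes
five_point = df.describe().loc[["min", "25%", "50%", "75%", "max"]]

print(shape)
print(dtypes)
print(missing)
print(five_point)
```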

3(e). Distribution of ‘bmi’, ‘age’ and ‘charges’ columns.
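One way to look at these distributions is a histogram per column. The sketch below assumes matplotlib is the plotting library (the source does not say which one was used) and draws on randomly generated stand-in data, so the shapes it produces say nothing about the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data, NOT the real insurance data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "bmi": rng.normal(30, 6, 200),
    "age": rng.integers(18, 65, 200),
    "charges": rng.lognormal(9, 0.9, 200),
})

columns = ["bmi", "age", "charges"]
fig, axes = plt.subplots(1, len(columns), figsize=(12, 3))
for ax, col in zip(axes, columns):
    ax.hist(df[col], bins=20)   # one histogram per numeric column
    ax.set_title(col)
    ax.set_ylabel("count")
fig.tight_layout()
# In a notebook the figure renders inline; in a script, fig.savefig(...)
# would write it to disk.
```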

3(f). Measure of skewness of ‘bmi’, ‘age’ and ‘charges’ columns.
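Skewness can be read directly from pandas with `Series.skew()`, which returns a bias-adjusted sample skewness: values near 0 suggest a roughly symmetric column, and positive values a longer right tail. The numbers below are invented to show both cases and are not from the real data.

```python
import pandas as pd

# Invented values: bmi and age are symmetric around their means,
# charges has one very large value that stretches the right tail.
df = pd.DataFrame({
    "bmi": [22.0, 25.0, 28.0, 31.0, 34.0],
    "age": [18, 30, 40, 50, 62],
    "charges": [1000.0, 2000.0, 3000.0, 4000.0, 40000.0],
})

skewness = df[["bmi", "age", "charges"]].skew()
print(skewness)
```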

3(g). Checking the presence of outliers in ‘bmi’, ‘age’ and ‘charges’ columns.
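The source does not say how outliers were identified. A common convention, assumed here, is the 1.5 × IQR rule that box plots use: a value is flagged when it lies more than 1.5 interquartile ranges below the first quartile or above the third. The helper name `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Invented values for illustration only.
charges = pd.Series([1000, 2000, 3000, 4000, 40000], name="charges")
bmi = pd.Series([22.0, 25.0, 28.0, 31.0, 34.0], name="bmi")

charge_outliers = iqr_outliers(charges)
bmi_outliers = iqr_outliers(bmi)
print(charge_outliers)
print(len(bmi_outliers))
```

With these sample values the quartiles of `charges` are 2000 and 4000, so the upper fence is 7000 and only the 40000 entry is flagged.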

3(h). Distribution of categorical columns (include children).
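For the categorical columns, frequency counts per category are a simple starting point. The sketch below treats `children` as categorical, as the step requests, and uses invented rows rather than the real data.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "female", "male"],
    "smoker": ["yes", "no", "no", "yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest",
               "northeast", "southeast", "southwest"],
    "children": [0, 2, 0, 1, 3, 1],
})

categorical = ["sex", "smoker", "region", "children"]
counts = {col: df[col].value_counts() for col in categorical}

for col in categorical:
    print(counts[col], end="\n\n")
```

Passing `normalize=True` to `value_counts` would give proportions instead of raw counts.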

3(i). Pair plot that includes all the columns of the data frame.

4. Answer the following questions with statistical evidence.

4(a). Do charges of people who smoke differ significantly from the people who don't?

Conclusion: We reject the null hypothesis because the p-value is less than 0.05. The charges paid by smokers and non-smokers differ significantly, with smokers paying higher charges than non-smokers.
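The source does not name the test behind this conclusion. One plausible choice, assumed here, is a two-sample t-test on charges with unequal variances (Welch's test), using `scipy.stats.ttest_ind`. The charges below are invented and only show the mechanics; they do not reproduce the project's result.

```python
import pandas as pd
from scipy import stats

# Invented charges for illustration only.
df = pd.DataFrame({
    "smoker": ["yes"] * 5 + ["no"] * 6,
    "charges": [30000.0, 35000.0, 28000.0, 40000.0, 33000.0,
                8000.0, 9000.0, 7000.0, 12000.0, 6500.0, 10000.0],
})

smokers = df.loc[df["smoker"] == "yes", "charges"]
non_smokers = df.loc[df["smoker"] == "no", "charges"]

# H0: mean charges are equal for smokers and non-smokers.
# equal_var=False selects Welch's t-test (an assumption of this sketch).
t_stat, p_value = stats.ttest_ind(smokers, non_smokers, equal_var=False)

alpha = 0.05
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```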

4(b). Does bmi of males differ significantly from that of females?

Conclusion: We fail to reject the null hypothesis because the p-value is greater than 0.05. Hence, there is no significant evidence that BMI differs by gender.
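A comparison like this is often run as an independent two-sample t-test, optionally preceded by Levene's test to decide whether pooling the variances is reasonable. Both choices are assumptions of this sketch, since the source does not state the procedure, and the BMI values are invented.

```python
import pandas as pd
from scipy import stats

# Invented BMI values for illustration only.
df = pd.DataFrame({
    "sex": ["male"] * 5 + ["female"] * 5,
    "bmi": [27.0, 31.5, 29.0, 33.2, 30.1,
            28.1, 30.9, 29.5, 32.8, 30.0],
})

male_bmi = df.loc[df["sex"] == "male", "bmi"]
female_bmi = df.loc[df["sex"] == "female", "bmi"]

# Levene's test: H0 is that both groups have equal variance.
_, levene_p = stats.levene(male_bmi, female_bmi)
pooled = levene_p > 0.05  # pool variances only if equality is not rejected

# H0: mean BMI is the same for males and females.
t_stat, p_value = stats.ttest_ind(male_bmi, female_bmi, equal_var=pooled)

print(f"Levene p = {levene_p:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 means the data give no grounds to reject the null hypothesis, which is weaker than showing the two means are equal.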

4(c). Is the proportion of smokers significantly different in different genders?

Conclusion: We reject the null hypothesis because the p-value is less than 0.05. So, the proportion of smokers differs significantly between genders.
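A question about proportions across two categorical variables is commonly handled with a chi-square test of independence on a contingency table. That choice is an assumption here, as the source does not name the test, and the counts are invented. `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default.

```python
import pandas as pd
from scipy import stats

# Invented counts for illustration only:
# 20 of 100 women and 40 of 100 men are smokers.
df = pd.DataFrame({
    "sex": ["female"] * 100 + ["male"] * 100,
    "smoker": ["yes"] * 20 + ["no"] * 80 + ["yes"] * 40 + ["no"] * 60,
})

table = pd.crosstab(df["sex"], df["smoker"])

# H0: smoking status is independent of sex.
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```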

4(d). Is the distribution of bmi across women with no children, one child and two children, the same?

Conclusion: We fail to reject the null hypothesis because the p-value is greater than 0.05. So, there is no significant evidence that the number of children affects women's bmi.
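Comparing one numeric variable across three groups is typically done with a one-way ANOVA, assumed here because the source does not name the test. The sketch filters to women with zero, one or two children and passes each group's BMI to `scipy.stats.f_oneway`. All values are invented.

```python
import pandas as pd
from scipy import stats

# Invented rows for illustration only. The male rows and the woman with
# three children are included to show that the filter drops them.
df = pd.DataFrame({
    "sex": ["female"] * 13 + ["male"] * 2,
    "children": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 0, 1],
    "bmi": [28.0, 31.0, 30.0, 33.0,
            29.0, 30.5, 32.0, 31.5,
            30.0, 29.5, 33.0, 31.0,
            27.0, 35.0, 24.0],
})

women = df[(df["sex"] == "female") & (df["children"] <= 2)]
groups = [g["bmi"].to_numpy() for _, g in women.groupby("children")]

# H0: mean BMI is the same for women with 0, 1 and 2 children.
f_stat, p_value = stats.f_oneway(*groups)

print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

ANOVA compares group means, so it addresses "same distribution" only under its usual assumptions of normality and equal variances within groups.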

And, Project is over!!!

Completed by: Ganpat Patel Email: ganpat.patel.012@gmail.com