MTH 522 Friday 06 October

Title:  Model Evaluation and Comparison

For model evaluation, I used two key metrics: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). These metrics quantify the accuracy of our predictions and show how well our models are performing. Evaluating our machine learning models this way, I observed an MSE of 0.24 and an RMSE of 0.22.
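As a minimal sketch of how these two metrics are computed (the numbers below are made up for illustration and are not our project results): MSE is the average squared difference between observed and predicted values, and RMSE is its square root. Because of that definition, the two values for one set of predictions always move together; for example, an MSE of 0.25 gives an RMSE of 0.5.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical observed and predicted % diabetic values for four counties.
y_true = [8.0, 9.5, 7.2, 10.1]
y_pred = [8.4, 9.0, 7.5, 9.8]

print("MSE :", round(mse(y_true, y_pred), 4))
print("RMSE:", round(rmse(y_true, y_pred), 4))
```

RMSE is often easier to read than MSE because it is in the same units as the quantity being predicted.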

Looking at the data, we can see that there are very few features available to build a model. Even so, we have tried fitting a linear regression model to predict the % DIABETIC feature.

As the dataset is quite extensive, we need to decide which features to prioritize for our model. Fine-tuning the selected machine learning algorithms for optimal performance is also a complex task.

 

MTH 522 Wednesday 04 October

Title: Feature selection and model selection

I am not just diving into feature selection and model selection but also paying attention to data visualization. Visualization is a powerful tool for uncovering insights and patterns within our dataset.

Over the last few days, I researched and identified a set of machine learning algorithms that could suit our task, then began implementing them and testing their initial performance on our dataset. I also created visualizations that highlight trends and relationships between variables. These visualizations have already provided valuable insights into our dataset’s characteristics.

MTH 522 Monday 02 October

 

Title: Feature Selection

Over the past few days, I focused on feature selection. Feature selection plays a crucial role in building an effective machine-learning model. I need to determine which features from our dataset are the most relevant and informative for predicting diabetes rates in US counties.

I have reviewed the dataset and performed an initial feature analysis to identify potential predictors of diabetes rates. Through data visualization and statistical analysis, I have gained insights into which features might have a strong impact on diabetes rates.

From now on, I am digging deeper into the relationships between features and diabetes rates: identifying correlations and dependencies, and exploring techniques such as mutual information or feature importance scores from tree-based models to rank features objectively.
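To make the idea of ranking features by mutual information concrete, here is a small sketch using a simple histogram (binning) estimate written with the standard library only. The feature names and values are hypothetical, the choice of four equal-width bins is an assumption, and this estimator is biased for small samples; a library routine such as scikit-learn's `mutual_info_regression` would be the more usual choice in practice.

```python
import math
from collections import Counter


def to_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (indices 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    # min() puts the maximum value into the last bin instead of a new one.
    return [min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of the mutual information (in nats) between x and y."""
    n = len(x)
    xb, yb = to_bins(x, n_bins), to_bins(y, n_bins)
    joint, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over the occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Hypothetical county-level values, for illustration only.
diabetes = [6.1, 6.8, 7.4, 8.0, 8.9, 9.5, 10.2, 11.0]
features = {
    "pct_obese": [28.0, 29.5, 31.0, 32.2, 33.9, 35.1, 36.4, 38.0],
    "unrelated": [3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0],
}

scores = {name: mutual_information(col, diabetes) for name, col in features.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Features with higher scores share more information with the target and are candidates to keep.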

MTH 522 Friday 29 September

Title: Data Visualization Preview

I made some scatter plots to see how things are related. One of them compares the percentage of inactive people to the percentage of obese people. It shows that places with more inactive people tend to have more obese people too.

On top of that, I created maps with colors to show how things vary in different parts. These maps will help us see which areas in the United States have high or low diabetes rates, obesity rates, and inactivity levels.

Finally, I made some bars and shapes to see how numbers are spread out. I have used these to understand how diabetes rates, the percentage of inactive people, and the percentage of obese people are distributed, and whether there are any numbers that are very different from the rest. I think these pictures are really important for understanding our data better, and we will make more of them and improve them as we go on.
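One common way to flag "numbers that are very different from the rest" is the 1.5 × IQR rule that box plots use. The sketch below assumes that rule and uses made-up values; it is an illustration, not the exact check used for our plots.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical % diabetic values for eight counties, one of them unusually high.
pct_diabetic = [7.1, 7.4, 7.8, 8.0, 8.2, 8.5, 8.9, 16.0]
print("Possible outliers:", iqr_outliers(pct_diabetic))
```

A flagged value is only a candidate for a closer look; it may be a real extreme rather than an error.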

MTH 522 Wednesday 27 September

Title: Analyzing US health data project progress

I continued to make good progress in our analysis of health data across different parts of the United States. Building on what we started in the first week, we have gone deeper into our analysis and learned more about how diabetes, obesity, and physical inactivity are related.

During this week, we ran some tests to investigate specific questions about how these health issues are connected. For example, we checked whether places with a lot of overweight people also have a lot of diabetes cases. These tests gave us evidence to understand these connections better.

We also expanded our work by looking at maps to see where health issues are more common. This helps us see patterns and areas that might need more attention when it comes to public health.

Furthermore, we started making early recommendations for ways to make things better. In places where we found high rates of both diabetes and obesity, we came up with some ideas to help people get healthier. These early recommendations will be the basis for more detailed planning in the next phases of our project.

We have been keeping detailed records of our progress so we can share our findings and insights with others effectively. This documentation is essential for creating a comprehensive report that will be useful for policymakers, public health experts, and the communities we aim to help.

MTH 522 Monday 25 September

Title: Group project work

Over the past few days, we’ve made significant progress in our project focused on analyzing data from the CDC (Centers for Disease Control and Prevention) related to diabetes, obesity, and physical inactivity rates across different counties in the United States. We started the project with a clear understanding of our goals and gathered essential datasets, including the 2018 CDC data and a list of common FIPS codes for the three variables we’re interested in.

A substantial amount of effort went into cleaning and organizing the data to make sure it’s accurate and dependable. This involved dealing with missing data, fixing inconsistencies, and handling any unusual data points. We also combined the datasets using FIPS codes, which are unique identifiers for counties, to create a unified dataset that we can use for further analysis.
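The merge on FIPS codes can be sketched as an inner join that keeps only the counties present in every table. The table names and values below are hypothetical, and the plain-dictionary layout is an assumption made to keep the example self-contained. FIPS codes are stored as strings so that leading zeros (as in "01001") are not lost.

```python
def merge_on_fips(named_tables):
    """Inner-join {name: {fips: value}} tables into {fips: {name: value}}."""
    tables = list(named_tables.values())
    if not tables:
        return {}
    # Keep only the FIPS codes that appear in every table.
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return {
        fips: {name: table[fips] for name, table in named_tables.items()}
        for fips in sorted(common)
    }


# Hypothetical values keyed by county FIPS code.
diabetes = {"01001": 9.0, "01003": 8.5, "01005": 11.2}
obesity = {"01001": 33.0, "01005": 35.0}
inactivity = {"01001": 17.0, "01003": 16.1, "01005": 19.4}

merged = merge_on_fips(
    {"diabetes": diabetes, "obesity": obesity, "inactivity": inactivity}
)
for fips, row in merged.items():
    print(fips, row)
```

Counties missing from any one table (here "01003") drop out of the unified dataset, which is why the merged table can be much smaller than the largest input.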

Next, we conducted Exploratory Data Analysis (EDA), which means we calculated summary statistics and created various charts and graphs to visualize the data. EDA helped us get a preliminary sense of what the data looks like and allowed us to spot any initial trends or patterns.

We also performed correlation analysis to understand how different health factors are related to each other and, where relevant, looked into spatial patterns to see if there are any geographic trends.
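As an illustration of the correlation step, the Pearson correlation coefficient between two health measures can be computed as below. The values are hypothetical and the function is a plain-Python sketch, not necessarily the tool we used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical county-level percentages.
pct_inactive = [14.0, 15.5, 16.2, 17.8, 19.0, 20.5]
pct_obese = [29.0, 30.1, 31.5, 32.0, 34.2, 35.0]
print("r =", round(pearson_r(pct_inactive, pct_obese), 3))
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.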

 

MTH 522 Wednesday 20 September

 

Title: Crab Shell Sizes

Today in class, we tackled an interesting math problem. We looked at a special type of math model that’s really good at handling data that doesn’t follow the usual rules. This data was about crabs and their shell sizes.

We had two types of measurements: one for the crab’s shell size after it molted (which means shed its old shell), and another for the shell size before molting. Our challenge was to figure out if we could predict the “before molting” size based on the “after molting” size.

What made this exciting was that our data didn’t behave like the numbers we usually work with. It was kind of messy and didn’t follow the typical patterns.

By the end of the class, we learned a lot about using math to make predictions, even when things get a bit tricky. I’m looking forward to using what we’ve learned in our future projects.

 

MTH 522 Monday 18 September

Title: CDC 2018 diabetes data: interaction model

In today’s lecture we worked on an analysis of the CDC 2018 diabetes dataset, delving into the connection between obesity and diabetes. By constructing both linear and quadratic models, we aim to uncover how the given features relate to diabetes risk.

The linear model provides insights into a straightforward relationship, while the quadratic model allows us to explore potential nonlinear patterns in the data. This research holds promise for better understanding and addressing the complex interplay between obesity and diabetes, with implications for public health strategies and interventions.
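A small sketch of comparing a linear and a quadratic fit is shown below. It solves the least-squares normal equations directly in plain Python; the data are synthetic (generated from a known quadratic), and the function names are hypothetical helpers rather than anything from the lecture.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)


def sse(x, y, coeffs):
    """Sum of squared errors of the fitted polynomial on (x, y)."""
    return sum(
        (yi - sum(c * xi ** p for p, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )


# Synthetic data following y = 1 + 2x + 0.5x^2 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 2 * v + 0.5 * v ** 2 for v in xs]

linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
print("linear SSE   :", round(sse(xs, ys, linear), 4))
print("quadratic SSE:", round(sse(xs, ys, quadratic), 4))
```

On the training data the quadratic model can never fit worse than the linear one, so a lower error alone does not prove the curvature is real; comparing on held-out data is a fairer check.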

MTH 522 Wednesday 13 September

Title: Introduction to P-value

Today’s class was all about p-values and heteroskedasticity. A p-value tells us how likely it would be to see a result at least as extreme as ours if there were really no effect, so a small p-value is evidence that the finding is not just chance. Heteroskedasticity is a problem in regression analysis where the spread of the errors isn’t consistent across the data.

To spot heteroskedasticity, we square the leftover errors (residuals) and look at them against the predictors or fitted values. This makes it easier to see whether the spread of the errors changes across different parts of our data.
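The squared-residual idea can be sketched as follows: fit a straight line, square the residuals, and then check whether the squared residuals trend with the predictor. The data are synthetic, with errors deliberately built to grow with x, and this is an informal check rather than a formal test.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def squared_residuals(x, y):
    """Squared differences between y and the fitted line."""
    intercept, slope = fit_line(x, y)
    return [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]


# Synthetic data: y = 2x plus an error whose size grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [0.2, -0.4, 0.6, -0.8, 1.0, -1.2, 1.4, -1.6]
y = [2 * xi + e for xi, e in zip(x, errors)]

sq_res = squared_residuals(x, y)
# Regress the squared residuals on x: a clearly positive slope suggests
# that the spread of the errors increases with x.
_, trend = fit_line(x, sq_res)
print("slope of squared residuals vs x:", round(trend, 3))
```

A slope near zero would be consistent with a constant spread, while a clearly nonzero slope points to heteroskedasticity.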

When we find heteroskedasticity, we have tools like weighted least squares to deal with it. These tools can use the squared residuals to estimate how noisy each observation is and give noisier observations less weight, which makes our regression analysis more accurate.
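A minimal sketch of weighted least squares for a single predictor is shown below. It assumes the error variance is proportional to x squared, so each point is weighted by 1 / x²; that weighting is an assumption for illustration, and in practice the weights would be estimated from the data (for example from the squared residuals). The data are synthetic.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimizing sum(w * (y - a - b * x) ** 2)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic data: y = 2x plus an error whose size grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [0.2, -0.4, 0.6, -0.8, 1.0, -1.2, 1.4, -1.6]
y = [2 * xi + e for xi, e in zip(x, errors)]

ols = weighted_line_fit(x, y, [1.0] * len(x))          # equal weights = OLS
wls = weighted_line_fit(x, y, [1 / xi ** 2 for xi in x])  # assumed weights
print("OLS (intercept, slope):", tuple(round(v, 3) for v in ols))
print("WLS (intercept, slope):", tuple(round(v, 3) for v in wls))
```

With equal weights the function reduces to ordinary least squares, which makes it easy to compare the two fits side by side.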

 

MTH 522 Monday 11 September

Title: Viscosity and Kurtosis

I recently started working on the assignment dataset. Since I am new to machine learning, I am focused on learning the basic concepts and getting started on my first project for the advanced statistics class.

In today’s lecture, we were introduced to the concepts of viscosity and kurtosis. Next, I am going to work on understanding these concepts and how they could be useful for my dataset.
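To get a feel for kurtosis, here is a short sketch that computes the excess kurtosis of a sample (the fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores about 0). It uses population moments with no small-sample correction, and the sample values are made up.

```python
def excess_kurtosis(values):
    """Excess kurtosis from population moments: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                            # evenly spread values
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]   # mostly central, two extremes

print("flat       :", round(excess_kurtosis(flat), 3))
print("heavy tails:", round(excess_kurtosis(heavy_tails), 3))
```

Positive values indicate heavier tails than a normal distribution (more extreme values), and negative values indicate lighter tails.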