Fascinating Data Sets to improve your Data Science skills | Part-1

We have created an archive of data sets that you can use to practise and improve your skills as a data scientist. This is a three-part blog series, so look out for the other parts.

The collection covers a range of themes, difficulty levels, sizes and attributes. Each data set is labelled with its level, recommended use and domain, so there is something suitable for everyone.

They let you challenge your knowledge and get hands-on practice in areas including, but not limited to, exploratory data analysis, data visualisation, data wrangling and machine learning.

We recommend testing yourself on every data set we’ve provided. Feel free to use them in any way you wish.

1) Find out the age of abalone from physical measurements

Level: Beginner

Recommended Use: Regression Models

Domain: Environment

Click here for: Dataset

2) Predict students’ knowledge levels

Level: Beginner

Recommended Use: Classification/Clustering

Domain: Education/Web

Click here for: Dataset

This data set has 403 rows and 6 columns. It is a real data set about the students’ knowledge status on the subject of Electrical DC Machines.
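
Since clustering is one of the recommended uses, here is a minimal k-means sketch in plain Python with no third-party libraries. It assumes the numeric attribute columns have already been loaded and scaled to comparable ranges; the four points in the usage lines are made up for illustration, not rows from the data set. In practice, scikit-learn's KMeans class is the more common choice.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(
    points: Sequence[Sequence[float]], k: int, iterations: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means, seeded with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical, already-scaled attribute values (not rows from the data set).
points = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Seeding with the first k points keeps the sketch deterministic, but the result depends on that choice; random restarts or k-means++ seeding are more robust.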

3) Can you predict the fuel efficiency of a car?

Level: Intermediate

Recommended Use: Regression Models

Domain: Automobiles

Click here for: Dataset

This data set has 398 rows and 9 columns, and provides mileage, horsepower, model year and other technical specifications for cars.
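
A simple first regression baseline is a straight line fitted by ordinary least squares, for example mileage against horsepower. The sketch below uses only the standard library; the three (horsepower, mileage) pairs are hypothetical placeholders rather than values from the data set.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of co-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


# Hypothetical (horsepower, mileage) values, not taken from the data set.
hp = [100.0, 150.0, 200.0]
mileage = [30.0, 25.0, 20.0]
slope, intercept = fit_line(hp, mileage)
preds = [slope * x + intercept for x in hp]
print(slope, intercept, rmse(mileage, preds))
```

A fuller model would use several attributes at once (multiple regression), but a one-variable fit gives an error figure to beat.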

4) Was that chest pain an indicator of heart disease?

Level: Intermediate

Recommended Use: Classification Models

Domain: Health Sciences

Click here for: Dataset

This data set provides health examination data for 303 patients who presented with chest pain and may have been suffering from heart disease. It has 14 attributes, which can be used to determine whether each patient was diagnosed with heart disease.
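
For a diagnosis task like this, accuracy alone can hide patients the model misses, so it is worth reporting precision and recall as well. The sketch assumes the target has been coded as 1 for heart disease and 0 for no heart disease; check how the downloaded file actually encodes it. The labels in the usage line are invented.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for labels coded 1 (disease) / 0 (no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Of the patients predicted to have heart disease, the share who actually do.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the patients who actually have heart disease, the share the model caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented labels for five patients.
metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics)
```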

5) Predict the total daily demand for orders

Level: Intermediate

Recommended Use: Regression Models

Domain: Business

Click here for: Dataset

This intermediate-level data set has 60 rows and 13 columns. The data was collected over 60 days from the real database of a Brazilian logistics company. It has 12 predictive attributes and one target: the total number of orders for daily treatment.
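
With only 60 rows, a single train/test split leaves very few days to test on, so k-fold cross-validation gives a steadier estimate of error: each row is used for testing exactly once. The plain-Python sketch below only produces the fold indices; fitting and scoring a model on each fold is left to you.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each row lands in exactly one test fold."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


# 60 rows, 5 folds: each fold trains on 48 rows and tests on the remaining 12.
folds = list(k_fold_indices(60, 5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

The folds are contiguous and unshuffled; shuffle the row order first if it carries no meaning. scikit-learn's KFold class provides the same kind of split with a shuffle option.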

6) Find out if a donor will give blood in March 2007

Level: Intermediate

Recommended Use: Classification Models

Domain: Business

Click here for: Dataset

This data set has 748 instances and 5 attributes. The data comes from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. Roughly every three months, the centre drives its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood.
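
Before training a classifier on donor data like this, it helps to know what a trivial model scores. If one class is much more common than the other, always predicting that class can already look accurate, so a useful model should beat this baseline. A minimal sketch; the label coding and values in the usage line are invented.

```python
from collections import Counter
from typing import Sequence, Tuple


def majority_baseline(
    train_labels: Sequence[int], test_labels: Sequence[int]
) -> Tuple[int, float]:
    """Always predict the most frequent training label; return it and its test accuracy."""
    majority = Counter(train_labels).most_common(1)[0][0]
    accuracy = sum(1 for label in test_labels if label == majority) / len(test_labels)
    return majority, accuracy


# Hypothetical labels: 1 = donated in March 2007, 0 = did not.
majority, accuracy = majority_baseline([0, 0, 0, 1], [0, 1, 0, 0])
print(majority, accuracy)
```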

7) Forecast the pollution level of a city

Level: Intermediate

Recommended Use: Regression Models

Domain: Environment

Click here for: Dataset

This data set has 43,824 rows and 13 columns. It contains PM2.5 data from the US Embassy in Beijing, together with meteorological data from Beijing Capital International Airport. The data set can be used to forecast pollution levels from the air-quality attributes provided, and it also offers experience in multivariate time series forecasting.
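
Multivariate time series forecasting is often framed as supervised learning with lagged features: the readings from the previous few time steps become the inputs, and the next PM2.5 value becomes the target. This is a minimal sketch assuming the rows are sorted by time, missing values have already been handled, and every column is numeric; target_col (the PM2.5 column's position) and the toy rows are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_windows(
    rows: Sequence[Sequence[float]], n_lags: int, target_col: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) pairs from time-ordered rows.

    The features are the previous n_lags rows flattened into one list (all
    columns), and the target is column target_col of the row that follows.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(rows)):
        X.append([value for row in rows[t - n_lags:t] for value in row])
        y.append(rows[t][target_col])
    return X, y


# Hypothetical hourly rows: [pm25, temperature]; PM2.5 is column 0 here.
rows = [[10.0, 1.0], [12.0, 2.0], [15.0, 3.0], [11.0, 4.0]]
X, y = make_lagged_windows(rows, n_lags=2, target_col=0)
print(X, y)
```

When evaluating such a model, split the windows chronologically (train on earlier periods, test on later ones) rather than shuffling, so the model is never trained on the future it is asked to predict.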

8) Will the patient survive for at least one year after a heart attack?

Level: Intermediate

Recommended Use: Classification Models

Domain: Health Sciences

Click here for: Dataset

This data set has 132 rows and 12 columns. It can be used to classify whether patients will survive for at least one year after a heart attack. Every patient in the data set suffered a heart attack at some point in the past; some are still alive and some are not.
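
With only 132 rows, a purely random split can leave the test set with very few patients from one outcome. A stratified split keeps the share of each class similar in the training and test parts. A minimal plain-Python sketch follows; test_fraction, seed and the label coding in the usage lines are illustrative assumptions. scikit-learn's train_test_split does the same via its stratify argument.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = survived at least one year, 0 = did not.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
print(train_idx, test_idx)
```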

9) Detect Autistic Spectrum Disorder (ASD) Cases

Level: Advanced

Recommended Use: Classification Models

Domain: Healthcare/Social Sciences

Click here for: Dataset

This advanced-level data set contains Autistic Spectrum Disorder (ASD) screening test data for 704 adults, with 21 attributes including the test takers’ demographics and their answers to the 10 screening-test questions. Each test taker’s ASD status is recorded in the Class/ASD variable.
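
Several of the demographic attributes are categorical, so they need to be encoded before most classification models can use them. Below is a minimal one-hot encoding sketch; the categories are learned from the training data only, so a value never seen in training maps to an all-zero row. The values in the usage lines are hypothetical. pandas' get_dummies function is a common library alternative.

```python
from typing import List, Sequence


def fit_categories(train_values: Sequence[str]) -> List[str]:
    """Learn the category list from training data only (sorted for a stable column order)."""
    return sorted(set(train_values))


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as a 0/1 indicator row; unseen categories become all zeros."""
    return [[1 if value == cat else 0 for cat in categories] for value in values]


# Hypothetical values for a categorical attribute such as gender.
categories = fit_categories(["f", "m", "f"])
encoded = one_hot(["m", "x"], categories)  # "x" was never seen in training
print(categories, encoded)
```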

10) Estimate the probability of default

Level: Advanced

Recommended Use: Classification Models

Domain: Business/Finance

Click here for: Dataset

This data set has 30,000 rows and 24 columns. It can be used to estimate the probability that a credit card client will default on a payment.
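
Since the goal is a probability rather than a yes/no label, logistic regression is a natural baseline: it passes a weighted sum of the attributes through the sigmoid function and is usually scored with log loss. The sketch below fits it by plain batch gradient descent and assumes the features are already numeric and standardised; the learning rate, epoch count and the four toy rows are illustrative choices, not values from the data set.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[float]:
    """Estimated probability of the positive class (here: default) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        probs = predict_proba(X, w, b)
        # Gradient of the mean log loss: average of (p - y) * x, and of (p - y) for the bias.
        errors = [p - yi for p, yi in zip(probs, y)]
        w = [wj - lr * sum(err * xi[j] for err, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


def log_loss(y: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for yi, p in zip(y, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


# Hypothetical single standardised feature and default labels (1 = defaulted).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
probs = predict_proba(X, w, b)
print(probs, log_loss(y, probs))
```

On 30,000 rows this pure-Python loop will be slow; a library implementation such as scikit-learn's LogisticRegression, whose predict_proba method returns class probabilities, is the practical choice.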
