How do I perform feature selection in a disease prediction dataset?

+1 vote
For feature selection in a disease dataset, do I need to be familiar with the biology or science behind the disease?
Jul 30, 2018 in Data Analytics by Anmol
• 1,780 points

edited Aug 20, 2018 by Anmol 1,394 views

1 answer to this question.

0 votes

Feature selection relies on both reasoning and trial and error. Select features by reasoning first, then refine the set by trial and error.

Selecting features by reasoning means using the approaches listed below to filter out unneeded features or to pick the most informative ones.

  1. Correlation plot
  2. Checking for collinearity among variables
  3. Selecting variables based on domain knowledge, business insight, or common knowledge
  4. Building a linear model to check the coefficient values assigned to each variable
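
As a rough illustration of items 1 and 2, here is a minimal Python sketch that keeps features correlated with the target and skips any feature that is nearly a duplicate of one already kept. The data is made up, and the column names (`bmi`, `weight_kg`, `shoe_size`) and both thresholds are assumptions chosen only for the example, not recommended values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(features, target, min_target_corr=0.2, max_pair_corr=0.9):
    """Keep features related to the target, skipping collinear near-duplicates."""
    # Rank features by the strength of their correlation with the target.
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_target_corr:
            continue  # weak relationship with the target
        if any(abs(pearson(features[name], features[k])) > max_pair_corr
               for k in kept):
            continue  # collinear with a feature that is already kept
        kept.append(name)
    return kept

# Made-up data: weight_kg almost duplicates bmi, shoe_size is unrelated.
n = 40
bmi = [20 + i % 20 for i in range(n)]
weight_kg = [3 * b + (1 if i % 2 else -1) for i, b in enumerate(bmi)]
shoe_size = [40 + i // 20 for i in range(n)]
disease = [1 if b >= 30 else 0 for b in bmi]

features = {"bmi": bmi, "weight_kg": weight_kg, "shoe_size": shoe_size}
selected = select_features(features, disease)
print(selected)  # prints ['bmi']
```

Correlation only captures linear relationships, so a filter like this is a first pass. Domain knowledge (item 3) can still justify keeping a feature that the filter would drop.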

Once you have logically selected an initial set of predictor variables, you can use trial and error to combine, add, or remove predictors.
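
One common way to make the add-or-remove step systematic is greedy forward selection: try each remaining feature, keep the one that improves the score most, and stop when nothing helps. The sketch below is only an illustration under several assumptions: the data is made up, the feature names are hypothetical, and the score is the accuracy of a simple nearest-centroid rule measured on the same rows it was fitted on. In practice you would score with your actual model on held-out data.

```python
def centroid_accuracy(rows, labels, subset):
    """Accuracy of a nearest-centroid rule that uses only the given features."""
    if not subset:
        # With no features, the best guess is the majority class.
        return max(labels.count(0), labels.count(1)) / len(labels)

    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(r[f] for r in members) / len(members) for f in subset]

    def sqdist(row, center):
        return sum((row[f] - c) ** 2 for f, c in zip(subset, center))

    c0, c1 = centroid(0), centroid(1)
    hits = sum((1 if sqdist(r, c1) < sqdist(r, c0) else 0) == y
               for r, y in zip(rows, labels))
    return hits / len(rows)

def forward_select(candidates, score, min_gain=0.01):
    """Greedily add the feature that most improves score(subset)."""
    selected, best = [], score([])
    remaining = list(candidates)
    while remaining:
        # Trial step: score the current set plus each remaining feature.
        trials = {f: score(selected + [f]) for f in remaining}
        winner = max(trials, key=trials.get)
        if trials[winner] - best < min_gain:
            break  # no remaining feature helps enough
        selected.append(winner)
        remaining.remove(winner)
        best = trials[winner]
    return selected, best

# Made-up, pre-scaled values (0 to 1); names are hypothetical.
rows = [
    {"glucose": 0.9, "bmi": 0.8, "shoe_size": 0.2},
    {"glucose": 0.8, "bmi": 0.4, "shoe_size": 0.8},
    {"glucose": 0.7, "bmi": 0.9, "shoe_size": 0.5},
    {"glucose": 0.4, "bmi": 0.9, "shoe_size": 0.5},
    {"glucose": 0.1, "bmi": 0.2, "shoe_size": 0.2},
    {"glucose": 0.2, "bmi": 0.6, "shoe_size": 0.8},
    {"glucose": 0.3, "bmi": 0.1, "shoe_size": 0.5},
    {"glucose": 0.4, "bmi": 0.1, "shoe_size": 0.5},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

chosen, accuracy = forward_select(
    ["glucose", "bmi", "shoe_size"],
    lambda subset: centroid_accuracy(rows, labels, subset),
)
print(chosen, accuracy)  # prints ['glucose', 'bmi'] 1.0
```

The same loop run in reverse (start with all features and drop the one whose removal hurts least) gives backward elimination.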

Combining can be beneficial when the target variable is binary. For example, being obese, having diabetes, and having irregular blood pressure can all be combined to predict a disease.
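
One possible reading of this suggestion, assuming each condition is stored as a 0/1 flag, is to derive a single combined feature from the flags. The sketch below uses hypothetical column names (`obese`, `diabetic`, `irregular_bp`) and made-up records. Whether a count or an any-of flag works better is something to test by trial and error.

```python
RISK_FLAGS = ["obese", "diabetic", "irregular_bp"]  # hypothetical column names

def add_combined_features(patient):
    """Return a copy of the record with two combined features added."""
    count = sum(1 for flag in RISK_FLAGS if patient.get(flag))
    combined = dict(patient)
    combined["risk_count"] = count         # how many conditions are present
    combined["any_risk"] = int(count > 0)  # 1 if at least one is present
    return combined

# Made-up patient records for illustration only.
patients = [
    {"obese": 1, "diabetic": 0, "irregular_bp": 1},
    {"obese": 0, "diabetic": 0, "irregular_bp": 0},
    {"obese": 1, "diabetic": 1, "irregular_bp": 1},
]
combined = [add_combined_features(p) for p in patients]
print([p["risk_count"] for p in combined])  # prints [2, 0, 3]
```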

answered Aug 20, 2018 by Abhi
• 3,720 points
