Selecting the root node in a decision tree means choosing the attribute whose split produces subsets that are purest with respect to the target variable. The choice is usually based on one of several criteria that evaluate the quality of a split. Some of these criteria include:
Gini Impurity: Used mainly in the CART (Classification and Regression Trees) algorithm, Gini impurity measures how mixed the class labels in a subset are. The attribute that produces the lowest weighted Gini impurity across the resulting subsets is selected as the root node.
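As a minimal sketch of this criterion (the labels and split below are hypothetical, not from any particular dataset), Gini impurity for a set of labels is 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted impurity of its subsets:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A perfectly mixed two-class set scores 0.5; a pure split scores 0.
mixed = ["yes", "yes", "no", "no"]
print(gini(mixed))                                   # 0.5
print(weighted_gini(["yes", "yes"], ["no", "no"]))   # 0.0
```

The root attribute is the one whose split minimizes this weighted score.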
Entropy and Information Gain: In the ID3 algorithm (and its descendants such as C4.5), entropy measures the uncertainty or disorder in a set of labels. Information gain is the reduction in entropy achieved by splitting the dataset on an attribute. The attribute with the maximum information gain is chosen as the root node.
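A small illustration, again with made-up labels: entropy is the Shannon entropy of the class distribution, and information gain is the parent's entropy minus the size-weighted entropy of the subsets produced by the split.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the weighted entropy of the subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

parent = ["yes", "yes", "yes", "no"]
# A hypothetical attribute splits the parent into two pure subsets,
# so the gain equals the parent's entire entropy (about 0.811 bits).
print(information_gain(parent, [["yes", "yes", "yes"], ["no"]]))
```

ID3 evaluates this gain for every candidate attribute and places the highest-gain attribute at the root.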
Chi-Squared Statistic: For categorical attributes (as in the CHAID algorithm), the chi-squared test measures how much the observed class distribution within each attribute value deviates from the distribution expected under independence. The attribute with the highest chi-squared statistic, that is, the one whose split deviates most from independence, is chosen as the root.
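To make the deviation-from-independence idea concrete, here is a sketch that computes the chi-squared statistic for a contingency table of attribute values versus class labels; the counts are invented for illustration:

```python
def chi_squared(table):
    """Chi-squared statistic for a contingency table.

    Rows are attribute values, columns are class labels.
    Compares observed counts to the counts expected if the
    attribute and the class were independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: two attribute values (rows) x two classes (columns).
table = [[20, 10],
         [10, 20]]
print(chi_squared(table))  # about 6.67: the split clearly departs from independence
```

A larger statistic means the attribute's values carry more information about the class, so that attribute is a better candidate for the root split.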