You inquired about the code x = r2_score(y_true, y_pred) in Python.
Note that:
y_pred | y_true
Stands for the "prediction" of the y-variable. | Stands for the "true" value of the y-variable.
Predicted values are not the raw data; they often come from a line of best fit. | The raw numbers collected during an experiment, survey, or scientific study.
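To make the two arguments concrete, here is a minimal sketch that computes R-squared by hand with the usual formula, 1 - SS_res / SS_tot, which is the definition the scikit-learn documentation gives for r2_score. The function name r_squared and the four data points are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    # Squared gaps between each measurement and its prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared gaps between each measurement and the average measurement.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical numbers, for illustration only.
y_true = [52.0, 57.0, 54.0, 59.0]  # raw measurements
y_pred = [53.0, 56.0, 55.0, 58.0]  # model predictions for the same cases

x = r_squared(y_true, y_pred)

# Two reference points: perfect predictions, and always guessing the average.
perfect = r_squared(y_true, y_true)
average = sum(y_true) / len(y_true)
baseline = r_squared(y_true, [average] * len(y_true))

print(x, perfect, baseline)
```

With scikit-learn installed, r2_score(y_true, y_pred) from sklearn.metrics should return the same value as x for these lists.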
Suppose that you have a model of a teenager's height as a function of age.
Age (in years) | Height (inches)
10 | 55
12 | 60
14 | 65
16 | 70
The term "age" refers to the number of years since one was born.
In addition, we round down to a whole number of years. A youngster who is 10.81773 years old is given the age of ten.
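The age rule and the height model can be sketched in a few lines. The function names are hypothetical, and the straight-line formula is an assumption: it simply passes through the four rows of the table (55 inches at age 10, plus 2.5 inches per year).

```python
import math


def age_in_whole_years(exact_age):
    """Round an exact age down to a whole number of years."""
    return math.floor(exact_age)


def predicted_height(exact_age):
    """Predicted height in inches for a given age.

    Assumption: a straight line through the table rows,
    starting at 55 inches for age 10 and adding 2.5 inches per year.
    """
    age = age_in_whole_years(exact_age)
    return 55.0 + 2.5 * (age - 10)


print(age_in_whole_years(10.81773))  # 10
print(predicted_height(10.81773))    # 55.0
print(predicted_height(16))          # 70.0
```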
A predicted value might be your belief that a ten-year-old is 55 inches tall on average.
If you conduct a study in which you measure the heights of 1,038 ten-year-old children, you will discover that they are not all exactly 55 inches tall.
The collection of true y-values refers to the raw data (measured heights of youngsters).
Statisticians frequently calculate error by comparing a child's measured height to the expected height.
Shina, a 10-year-old girl, is 52 inches tall (rounded to the nearest whole inch). Shina's height was predicted to be 55 inches. There is a 3-inch difference between the true and predicted figures. Statisticians often prefer a single number for a data set rather than 1,038 separate ones.
One option is to convert each difference between the predicted and actual height into a positive number. For instance, -5 becomes +5. The average positive difference (in inches) between actual and predicted height is then computed.
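As a rough sketch, the average positive difference (often called the mean absolute error) could be computed like this. The four measured heights are hypothetical; they were chosen so that the differences match the figures mentioned in this answer (-3, -5, -2 and +7).

```python
predicted = 55  # inches, the predicted height for a 10-year-old

# Hypothetical measurements; the first is the 52-inch girl from the text.
measured = [52, 50, 53, 62]

# Signed differences: negative means shorter than predicted.
errors = [height - predicted for height in measured]  # [-3, -5, -2, 7]

# Drop the signs, then average.
abs_errors = [abs(e) for e in errors]                 # [3, 5, 2, 7]
mean_abs_error = sum(abs_errors) / len(abs_errors)

print(mean_abs_error)  # 4.25
```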
It's crucial to consider the absolute difference. Some youngsters are shorter (-2 inches) than expected, while others are taller (+7 inches).
If negative numbers are allowed and 55 inches is the average of the measured heights, the average difference between actual and predicted height is always 0.
- Take the 1,038 actual heights.
- Subtract 55 inches from each actual height.
- Add up the height differences without converting them to positive values.
- The result is always zero when 55 inches is the average of the measured heights.
In fact, one definition of the mean is the number x such that when you calculate the difference between each data point and x, then add the results, the total is zero.
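This property is easy to check on a small example. The five heights below are hypothetical and were picked so that their average is exactly 55; the second calculation shows that the cancellation does not happen for a reference value other than the mean.

```python
# Hypothetical measured heights (inches) whose average is exactly 55.
measured = [52, 50, 53, 62, 58]

mean = sum(measured) / len(measured)  # 55.0

# Signed differences from the mean cancel out.
total_from_mean = sum(height - mean for height in measured)

# Differences from some other value (here 54) do not cancel.
total_from_other = sum(height - 54 for height in measured)

print(mean, total_from_mean, total_from_other)  # 55.0 0.0 5
```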
Statisticians usually square the differences. Shina's squared error is 9 square inches because she is 3 inches shorter than predicted (-3 inches). A negative number multiplied by itself is always positive, never negative.
Squaring removes the negative signs. Taking the absolute value removes them too. In fact, there are many different ways to get rid of the negative signs.
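Here is a small sketch of the squaring approach (mean squared error) on hypothetical heights. Note that the result is in square inches; taking the square root, as an optional extra step, brings it back to inches.

```python
import math

predicted = 55               # inches
measured = [52, 50, 53, 62]  # hypothetical measured heights

errors = [height - predicted for height in measured]  # [-3, -5, -2, 7]

# Squaring makes every term non-negative.
squared_errors = [e ** 2 for e in errors]             # [9, 25, 4, 49]

mean_squared_error = sum(squared_errors) / len(squared_errors)  # square inches
root_mean_squared_error = math.sqrt(mean_squared_error)         # inches

print(mean_squared_error)  # 21.75
print(root_mean_squared_error)
```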
No single formula perfectly decides, for data set A and data set B, which one is more "spread out."
It's hard to say which notion of spread matters to people. In any case, mean squared error is better than nothing. It measures how dispersed a data set is.
Are the data points all far from the average?
What if the true average height of 10-year-old children were 55 inches? Suppose also that the true standard deviation were 4 inches.
Assume you took a random sample of 1,038 youngsters, each 10 years old, in that fictitious universe.
Your sample variance is 7.1091 square inches (based on experimental data), while a true standard deviation of 4 inches implies a variance of 16 square inches.
What are the chances that a group of 1,038 kids would have a sample variance that far from 16?
If your model is true, how likely is it that the data will deviate as much as you observed from the model's prediction?
If the data you get is significantly different from the projected value, your model is most likely flawed.
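One way to approach the "how likely" question is by simulation. The sketch below rests on several assumptions that are not in the original: heights are drawn from a normal distribution, "deviating as much" means the sample variance is at least as far from the model's variance as the observed one (in either direction), and the function name, trial count and seed are arbitrary.

```python
import random
import statistics


def chance_of_variance_gap(observed_var, true_mean, true_sd, n,
                           trials=200, seed=0):
    """Estimate how often a sample of size n drawn from the model has a
    sample variance at least as far from true_sd**2 as observed_var is.

    Assumption: heights follow a normal distribution.
    """
    rng = random.Random(seed)
    true_var = true_sd ** 2
    observed_gap = abs(observed_var - true_var)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        if abs(statistics.variance(sample) - true_var) >= observed_gap:
            hits += 1
    return hits / trials


# Numbers from the text: mean 55, standard deviation 4, 1,038 children,
# observed sample variance 7.1091.
p = chance_of_variance_gap(7.1091, true_mean=55, true_sd=4, n=1038)
print(p)
```

A very small estimate suggests the data would be surprising if the model were true; more trials give a finer estimate at the cost of run time.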
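The R-squared value summarizes the fit as a single number:
100% if the predictions match the data exactly. 0% if the model does no better than always predicting the average of the true values. r2_score can even return a negative number when the model does worse than that. A value below 100% does not by itself mean the model is wrong, since some difference between the data and the prediction is explained by random chance.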
For example, if you toss a fair coin 1,000 times, it is quite plausible that, say, 491 of the results will be heads rather than exactly 500.
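The coin example can be checked exactly with binomial counts. This sketch assumes independent tosses of a fair coin; the function names are made up for illustration.

```python
from math import comb


def prob_heads_exactly(k, n=1000):
    """Probability of exactly k heads in n fair coin tosses."""
    return comb(n, k) / 2 ** n


def prob_at_least_this_far(k, n=1000):
    """Probability that the number of heads is at least as far from n/2
    as k is, in either direction."""
    gap = abs(k - n / 2)
    favorable = sum(comb(n, j) for j in range(n + 1)
                    if abs(j - n / 2) >= gap)
    return favorable / 2 ** n


# Exactly 500 heads is unlikely (roughly 2.5%).
print(prob_heads_exactly(500))

# Landing 9 or more heads away from 500, as 491 does, is quite common.
print(prob_at_least_this_far(491))
```

So a result like 491 heads is no evidence against the coin being fair; only a much larger gap would be.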