However, we were shown one formula that used only AREA to predict market value and a second formula that added ASSESS (the home's tax assessment value). Intuitively, you can tell that AREA and ASSESS are positively correlated to some degree, and this was reflected in the formulas.
MARKET1 = X + a AREA + e
MARKET2 = Y + b AREA + c ASSESS + e
Where 'e' is the error.
However, in the two formulas, 'b' was less than 'a'. What does that mean? Part of what 'a' credited to AREA in the first formula was really ASSESS moving along with it. Once ASSESS is in the model, that shared part is picked up by 'c' instead.
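To make that concrete, here is a rough sketch in Python with NumPy. Everything in it is made up for illustration: the sample size, the units (square feet and dollars), the "true" weights 50 and 0.8, and the helper name `ols` are my assumptions, not numbers from the class example. It only aims to show 'b' coming out smaller than 'a' when AREA and ASSESS are positively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, made-up data: ASSESS rises with AREA plus its own noise.
area = rng.uniform(800, 3000, n)                  # assumed unit: square feet
assess = 100 * area + rng.normal(0, 30_000, n)    # assumed unit: dollars
market = 20_000 + 50 * area + 0.8 * assess + rng.normal(0, 10_000, n)

def ols(y, *columns):
    """Least-squares fit of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

X, a = ols(market, area)              # MARKET1 = X + a AREA + e
Y, b, c = ols(market, area, assess)   # MARKET2 = Y + b AREA + c ASSESS + e

print(f"a = {a:.1f}")
print(f"b = {b:.1f}, c = {c:.2f}")
```

By construction, 'a' should land somewhere near 50 + 0.8 * 100 = 130, because the one-factor model has to absorb the ASSESS effect through AREA, while 'b' should land near 50.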
But what if they are VERY highly correlated? (You could even deliberately choose one factor that is a linear construction of the other, for a correlation of 1.) Then it is impossible to estimate a separate weight for each factor, because the two move in perfect harmony. Imagine AREA is perfectly correlated with ASSESS, or
ASSESS = d AREA
then substituting ASSESS = d AREA into
MARKET2 = Y + b AREA + c ASSESS + e
gives
MARKET2 = Y + b AREA + cd AREA + e
MARKET2 = Y + (b + cd) AREA + e
Since there is only one 'real' factor, this would mean that:
- MARKET1 = MARKET2: the second model makes the same predictions as the first.
- 'a' would be expressed as (b + cd), so only that sum is pinned down; 'b' and 'c' individually are not.
- X = Y
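The points above can be checked numerically. This is a minimal sketch with invented data; the ratio d = 100, the weight 130, and the three (b, c) pairs are assumptions chosen for illustration. It shows that the three-column design matrix loses a rank when ASSESS = d AREA, and that any 'b' and 'c' with b + cd = a give the same predictions as the one-factor model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d = 100.0                                    # assumed ratio: ASSESS = d * AREA

area = rng.uniform(800, 3000, n)
assess = d * area                            # perfectly collinear with AREA
market = 20_000 + 130 * area + rng.normal(0, 10_000, n)

# Intercept, AREA, ASSESS: three columns but only two independent ones.
design2 = np.column_stack([np.ones(n), area, assess])
rank = np.linalg.matrix_rank(design2)
print("rank of two-factor design:", rank)

# Fit the one-factor model to get X and a.
design1 = np.column_stack([np.ones(n), area])
(X, a), *_ = np.linalg.lstsq(design1, market, rcond=None)
pred1 = X + a * area

# Try several (b, c) pairs that all satisfy b + c*d == a.
same = []
for b, c in [(a, 0.0), (0.0, a / d), (a - 50.0, 50.0 / d)]:
    pred2 = X + b * area + c * assess
    same.append(bool(np.allclose(pred1, pred2)))
print("predictions match:", same)
```

The rank should come out as 2 rather than 3, which is the numerical way of saying the data cannot separate 'b' from 'c'.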
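Another interesting result is that if 'b' is not significantly different from 'a', it suggests that the correlation between AREA and ASSESS is probably low (or that ASSESS contributes little), since adding ASSESS barely changed the weight on AREA.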
Also, if any of the 'weight' letters (everything except 'e') is close to zero, it implies that the prediction factor has little bearing on the model. High numbers don't necessarily imply importance, though: the size of a weight depends on what units the factor is measured in, so it only has meaning relative to those units.
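The point about units is easy to see with a quick sketch. The data is again invented, and the helper `fit` and the choice of square feet versus square meters are my own assumptions. Refitting the same model with AREA in different units changes the weight by exactly the conversion factor while the predictions stay the same.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Made-up data: market value driven by AREA measured in square feet.
area_sqft = rng.uniform(800, 3000, n)
market = 20_000 + 130 * area_sqft + rng.normal(0, 10_000, n)

def fit(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    design = np.column_stack([np.ones(len(x)), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    return intercept, slope

SQFT_PER_SQM = 10.7639                      # square feet in one square meter
area_sqm = area_sqft / SQFT_PER_SQM

icpt_sqft, slope_sqft = fit(area_sqft, market)
icpt_sqm, slope_sqm = fit(area_sqm, market)

# The weight is roughly 10.76 times larger in square meters,
# yet the fitted values are unchanged.
ratio = slope_sqm / slope_sqft
same_predictions = bool(np.allclose(icpt_sqft + slope_sqft * area_sqft,
                                    icpt_sqm + slope_sqm * area_sqm))
print(f"ratio of weights: {ratio:.4f}")
print("same predictions:", same_predictions)
```

So a "big" weight on AREA in one set of units says nothing by itself about whether AREA matters more than ASSESS.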