Additional equations for linear regression
The summary statistics such as Σx, Σx², etc., can be used to define the following quantities:
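$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1) \cdot s_x^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1) \cdot s_y^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = (n-1) \cdot s_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$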
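As a sketch (not a calculator procedure), the two forms of each quantity can be compared in Python. The function names and the five data points below are hypothetical, chosen only for illustration. The first function uses the summary-statistic (Σ) form, the second uses deviations about the means; both should give the same values.

```python
from math import fsum

def centered_sums(xs, ys):
    """Sxx, Syy, Sxy from the summary statistics Σx, Σx², Σy, Σy², Σxy."""
    n = len(xs)
    sum_x, sum_y = fsum(xs), fsum(ys)
    sum_x2 = fsum(x * x for x in xs)
    sum_y2 = fsum(y * y for y in ys)
    sum_xy = fsum(x * y for x, y in zip(xs, ys))
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy

def centered_sums_direct(xs, ys):
    """The same quantities computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = fsum(xs) / n, fsum(ys) / n
    s_xx = fsum((x - x_bar) ** 2 for x in xs)
    s_yy = fsum((y - y_bar) ** 2 for y in ys)
    s_xy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

# Hypothetical sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(centered_sums(xs, ys))         # (10.0, 6.0, 6.0)
print(centered_sums_direct(xs, ys))  # (10.0, 6.0, 6.0)
```

The Σ form only needs the accumulated sums, which is why it suits a calculator. With large values that have a small spread, however, the subtraction can cancel most significant digits, so the deviation form is usually the safer choice in software.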
It follows that the standard deviations of x and y, and the covariance of x and y, are given, respectively, by

$$s_x = \sqrt{\frac{S_{xx}}{n-1}}, \qquad s_y = \sqrt{\frac{S_{yy}}{n-1}}, \qquad \text{and} \qquad s_{xy} = \frac{S_{xy}}{n-1}.$$
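A minimal Python sketch of these three formulas follows. It assumes the values S_xx = 10, S_yy = 6 and S_xy = 6, which correspond to the hypothetical data x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5; the function name is illustrative only.

```python
import statistics
from math import isclose, sqrt

def sample_sd_and_cov(s_xx, s_yy, s_xy, n):
    """Return (s_x, s_y, s_xy) using the n - 1 divisor."""
    return sqrt(s_xx / (n - 1)), sqrt(s_yy / (n - 1)), s_xy / (n - 1)

# Sxx = 10, Syy = 6, Sxy = 6 for the hypothetical data x = 1..5, y = 2, 4, 5, 4, 5
s_x, s_y, cov_xy = sample_sd_and_cov(10.0, 6.0, 6.0, n=5)
print(round(s_x, 4), round(s_y, 4), cov_xy)  # 1.5811 1.2247 1.5

# Cross-check: statistics.stdev also divides by n - 1
print(isclose(s_x, statistics.stdev([1, 2, 3, 4, 5])))  # True
```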
Also, the sample correlation coefficient is

$$r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}.$$
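For illustration, the correlation coefficient can be evaluated directly from the three sums. This sketch reuses the hypothetical values S_xx = 10, S_yy = 6 and S_xy = 6; the function name is an assumption, not a calculator command.

```python
from math import sqrt

def correlation(s_xx, s_yy, s_xy):
    """Sample correlation coefficient r_xy = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / sqrt(s_xx * s_yy)

# Sxx = 10, Syy = 6, Sxy = 6 for the hypothetical data x = 1..5, y = 2, 4, 5, 4, 5
r = correlation(10.0, 6.0, 6.0)
print(round(r, 4))  # 0.7746
```

Because the factors of (n − 1) cancel, the same value is obtained as s_xy/(s_x·s_y) = 1.5/(1.5811 · 1.2247) ≈ 0.7746.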
In terms of $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{yy}$, and $S_{xy}$, the solution to the normal equations is:

$$a = \bar{y} - b \cdot \bar{x}, \qquad b = \frac{S_{xy}}{S_{xx}} = \frac{s_{xy}}{s_x^2}$$
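A short Python sketch of this solution, assuming the same hypothetical data x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5. The helper name `fit_line` is illustrative only.

```python
from math import fsum

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = fsum(xs) / n, fsum(ys) / n
    s_xx = fsum((x - x_bar) ** 2 for x in xs)
    s_xy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # b = Sxy / Sxx
    a = y_bar - b * x_bar    # a = y_bar - b * x_bar
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(a, 4), round(b, 4))  # 2.2 0.6
```

For these data Σx = 15, Σx² = 55, Σy = 20 and Σxy = 66, and the pair (a, b) satisfies the normal equations n·a + b·Σx = Σy and a·Σx + b·Σx² = Σxy: 5(2.2) + 15(0.6) = 20 and 15(2.2) + 55(0.6) = 66.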
Prediction error
The regression curve of Y on x is defined as $Y = \mathrm{A} + \mathrm{B} \cdot x + \varepsilon$. If we have a set of n data points $(x_i, y_i)$, then we can write

$$Y_i = \mathrm{A} + \mathrm{B} \cdot x_i + \varepsilon_i, \qquad (i = 1, 2, \ldots, n),$$

where the $Y_i$ are independent, normally distributed random variables with mean $(\mathrm{A} + \mathrm{B} \cdot x_i)$ and common variance $\sigma^2$, and the $\varepsilon_i$ are independent, normally distributed random variables with mean zero and common variance $\sigma^2$.
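To make the model concrete, the sketch below draws data from $Y_i = \mathrm{A} + \mathrm{B} \cdot x_i + \varepsilon_i$ with independent normal errors and then fits a and b by least squares. The parameter values Α = 2.0, Β = 0.5, σ = 1.0, the x grid, and the seed are all hypothetical choices made for illustration; a and b are estimates of Α and Β, so they should land close to, but not exactly on, the true values.

```python
import random
from math import fsum

def simulate(alpha, beta, sigma, xs, rng):
    """Draw Y_i = A + B*x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

def fit_line(xs, ys):
    """Least-squares a and b using b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = fsum(xs) / n, fsum(ys) / n
    s_xx = fsum((x - x_bar) ** 2 for x in xs)
    s_xy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

rng = random.Random(42)                      # fixed seed for reproducibility
xs = [10 * i / 1999 for i in range(2000)]    # 2000 evenly spaced x in [0, 10]
ys = simulate(alpha=2.0, beta=0.5, sigma=1.0, xs=xs, rng=rng)  # hypothetical A, B, sigma
a, b = fit_line(xs, ys)
print(f"a = {a:.3f} (A = 2.0), b = {b:.3f} (B = 0.5)")
```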