Chain Rule of Entropy

Since

H(X, Y) = Σ_i p(xi) log2(1/p(xi)) + Σ_i Σ_j p(xi, yj) log2(1/p(yj|xi))
where the first sum is H(X) and the second is H(Y|X).
Thus,
H(C) = H(X) + H(Y|X)
This is the Chain Rule for Entropy.
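As a quick sanity check, the chain rule can be verified numerically. The sketch below is a minimal illustration (not part of the original notes): it assumes a toy 2×2 joint table p_xy and computes H(X,Y), H(X), and H(Y|X) directly from the definitions above, confirming that H(X,Y) = H(X) + H(Y|X).

```python
import math

# Assumed toy joint distribution p(xi, yj); rows index x, columns index y.
p_xy = [[0.30, 0.20],
        [0.10, 0.40]]

def H_joint(p):
    """H(X,Y) = sum_ij p(xi,yj) log2(1/p(xi,yj))."""
    return sum(pij * math.log2(1.0 / pij) for row in p for pij in row if pij > 0)

def H_X(p):
    """H(X) = sum_i p(xi) log2(1/p(xi)), where p(xi) = sum_j p(xi,yj)."""
    return sum(sum(row) * math.log2(1.0 / sum(row)) for row in p if sum(row) > 0)

def H_Y_given_X(p):
    """H(Y|X) = sum_ij p(xi,yj) log2(1/p(yj|xi)), with p(yj|xi) = p(xi,yj)/p(xi)."""
    total = 0.0
    for row in p:
        px = sum(row)
        for pij in row:
            if pij > 0:
                total += pij * math.log2(px / pij)   # log2(1 / (pij / px))
    return total

print(H_joint(p_xy))                   # H(X,Y)
print(H_X(p_xy) + H_Y_given_X(p_xy))   # H(X) + H(Y|X) -- same value
```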

Why is Σ_j p(yj|xi) = 1?


Proof that Σ_j p(yj|xi) = 1: for a fixed xi, p(yj|xi) = p(xi, yj)/p(xi), so Σ_j p(yj|xi) = Σ_j p(xi, yj)/p(xi) = p(xi)/p(xi) = 1.

Alternatively,

p(xi) = Σ_j p(xi, yj) = Σ_j p(yj|xi) p(xi) = p(xi) Σ_j p(yj|xi)
Thus p(xi) = p(xi) can hold only if Σ_j p(yj|xi) = 1.
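To make the same point concrete, here is a small illustrative check (the joint table p_xy is an assumed example): it forms p(yj|xi) = p(xi, yj)/p(xi) for each fixed xi and confirms that the conditional probabilities sum to 1.

```python
# Assumed toy joint distribution p(xi, yj); any valid joint table works.
p_xy = [[0.30, 0.20],
        [0.10, 0.40]]

for row in p_xy:
    px = sum(row)                       # p(xi) = sum_j p(xi, yj)
    cond = [pij / px for pij in row]    # p(yj|xi) = p(xi, yj) / p(xi)
    print(sum(cond))                    # prints 1.0 for every xi
```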



Heart of Entropy Algebra

Recall that the chain rule

H(C) = H(X) + H(Y|X)
is derived from the fact that
p(xi, yj) = p(yj|xi) p(xi)
But p(xi, yj) is also
p(xi, yj) = p(xi|yj) p(yj)
which therefore leads to
H(C) = H(X,Y) = H(Y) + H(X|Y)
Thus,
H(X) + H(Y|X) = H(Y) + H(X|Y)
This is the heart of entropy algebra.
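Numerically, the two decompositions agree. The sketch below is again only an illustration with an assumed toy joint table; it evaluates H(X) + H(Y|X) and H(Y) + H(X|Y) and shows that both equal H(X,Y).

```python
import math

# Assumed toy joint distribution p(xi, yj); rows index x, columns index y.
p_xy = [[0.30, 0.20],
        [0.10, 0.40]]

def entropy(probs):
    """Entropy of a probability list: sum of p * log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

px = [sum(row) for row in p_xy]          # marginal p(xi)
py = [sum(col) for col in zip(*p_xy)]    # marginal p(yj)

# H(Y|X) = sum_ij p(xi,yj) log2(p(xi)/p(xi,yj)); H(X|Y) is the symmetric expression.
H_Y_given_X = sum(pij * math.log2(px[i] / pij)
                  for i, row in enumerate(p_xy) for pij in row if pij > 0)
H_X_given_Y = sum(pij * math.log2(py[j] / pij)
                  for i, row in enumerate(p_xy) for j, pij in enumerate(row) if pij > 0)

print(entropy(px) + H_Y_given_X)   # H(X) + H(Y|X)
print(entropy(py) + H_X_given_Y)   # H(Y) + H(X|Y) -- same value, namely H(X,Y)
```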

Example Application of the Chain Rule

Let,

cijk = ⟨xi, yj, zk⟩
Then,
H(C) = H(X, Y, Z) = H(X, ⟨Y, Z⟩)
     = H(X) + H(⟨Y, Z⟩|X)          (applying the chain rule)
     = H(X) + H(Y, Z|X)
     = H(X) + H(Y|X) + H(Z|X, Y)   (applying the chain rule again)
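The three-variable decomposition can be checked the same way. In the sketch below, the joint distribution p_xyz over (xi, yj, zk) is an assumed example stored as a dictionary; the code compares H(X,Y,Z) against H(X) + H(Y|X) + H(Z|X,Y).

```python
import math
from collections import defaultdict

# Assumed toy joint distribution p(xi, yj, zk), keyed by the index triple (i, j, k).
p_xyz = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.12, (1, 0, 1): 0.08, (1, 1, 0): 0.18, (1, 1, 1): 0.12,
}

def entropy(probs):
    """Entropy of a probability list: sum of p * log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Marginals p(xi) and p(xi, yj) needed for the conditional terms.
p_x, p_xy = defaultdict(float), defaultdict(float)
for (i, j, k), p in p_xyz.items():
    p_x[i] += p
    p_xy[(i, j)] += p

H_XYZ = entropy(p_xyz.values())
H_X = entropy(p_x.values())
H_Y_given_X = sum(p * math.log2(p_x[i] / p) for (i, j), p in p_xy.items() if p > 0)
H_Z_given_XY = sum(p * math.log2(p_xy[(i, j)] / p) for (i, j, k), p in p_xyz.items() if p > 0)

print(H_XYZ)                              # H(X, Y, Z)
print(H_X + H_Y_given_X + H_Z_given_XY)   # H(X) + H(Y|X) + H(Z|X,Y) -- same value
```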

Generalization of Chain Rule

Applying the chain rule repeatedly,
H(A0, A1, …, An−1) = H(A0) + H(A1|A0) + H(A2|A0, A1) + … + H(An−1|A0, A1, …, An−2)
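The general identity can be checked with a short helper that works for any number of variables. The sketch below is illustrative (the function names and the example joint distribution are assumptions): chain_rule_terms marginalizes the joint over prefixes of the variables to compute each term H(Ak|A0, …, Ak−1), and the terms sum to the joint entropy.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy of a probability list: sum of p * log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def marginal(joint, k):
    """Marginal distribution over the first k coordinates of the outcome tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[:k]] += p
    return m

def chain_rule_terms(joint):
    """Return [H(A0), H(A1|A0), ..., H(A_{n-1}|A0..A_{n-2})] for a joint dict keyed by n-tuples."""
    n = len(next(iter(joint)))
    terms = []
    for k in range(1, n + 1):
        prefix = marginal(joint, k)       # p(a0, ..., a_{k-1})
        shorter = marginal(joint, k - 1)  # p(a0, ..., a_{k-2}); {(): 1.0} when k == 1
        # H(A_{k-1} | A0..A_{k-2}) = sum over prefixes of p * log2(p(shorter) / p(prefix))
        terms.append(sum(p * math.log2(shorter[key[:-1]] / p)
                         for key, p in prefix.items() if p > 0))
    return terms

# Assumed toy joint distribution over three binary variables (same table as above).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.12, (1, 0, 1): 0.08, (1, 1, 0): 0.18, (1, 1, 1): 0.12,
}
print(sum(chain_rule_terms(joint)))   # H(A0) + H(A1|A0) + H(A2|A0, A1)
print(entropy(joint.values()))        # H(A0, A1, A2) -- same value
```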

Next:

Mutual Information (p:4) ➽