Mutual Information

In general,

H(Y|X) ≠ H(X|Y)
but, because of the chain rule:
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
so that
H(X) − H(X|Y) = H(Y) − H(Y|X)
Recall that H(Y|X) measures the uncertainty remaining in Y once X is known. Thus,
H(X) is the original uncertainty in X, and H(X|Y) is the uncertainty remaining in X given Y.
Given the knowledge of Y, the reduction in uncertainty in X is called the mutual information:
I(X; Y) = H(X) − H(X|Y)
By the identity above, H(X) − H(X|Y) = H(Y) − H(Y|X), so
I(Y; X) = I(X; Y)
That is, mutual information is reciprocal: I(X; Y) and I(Y; X) are the same quantity.
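To make this concrete, here is a minimal numerical sketch in Python (the small joint distribution p(x, y) is made up purely for illustration) that computes the entropies above and confirms that I(X; Y) = I(Y; X):

    import numpy as np

    # A made-up joint distribution p(x, y) over two binary variables
    # (rows index x, columns index y); any valid joint table would do.
    p_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])

    p_x = p_xy.sum(axis=1)   # marginal p(x)
    p_y = p_xy.sum(axis=0)   # marginal p(y)

    def entropy(p):
        """Shannon entropy in bits, skipping zero-probability entries."""
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # By the chain rule: H(X|Y) = H(X, Y) - H(Y) and H(Y|X) = H(X, Y) - H(X)
    H_X_given_Y = entropy(p_xy) - entropy(p_y)
    H_Y_given_X = entropy(p_xy) - entropy(p_x)

    I_XY = entropy(p_x) - H_X_given_Y   # I(X; Y) = H(X) - H(X|Y)
    I_YX = entropy(p_y) - H_Y_given_X   # I(Y; X) = H(Y) - H(Y|X)

    print(I_XY, I_YX)   # the two values agree: mutual information is reciprocal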

What is the practical meaning of I(X; Y)?


Mutual information I(X; Y) measures how much we learn about X from the knowledge of Y, i.e. by how much the uncertainty in X is reduced once Y is known.

To further elucidate this, let us consider three cases:

  1. If x = f(y), then knowing y tells us x exactly. In other words, given any y the corresponding x is completely determined, and hence there is no remaining uncertainty: H(X|Y) = 0. Therefore
    I(X; Y) = H(X) − H(X|Y)
    = H(X)      
    which is maximum mutual information.
  2. If x ≠ f(y), i.e. x and y are independent of each other, then knowing y tells us nothing about x, and hence there is maximum remaining uncertainty: H(X|Y) = H(X). Therefore
    I(X; Y) = H(X) − H(X|Y)
    = 0           
    which is zero, the minimum mutual information (both extremes are verified in the sketch after this list).
  3. In general, the knowledge of Y leaves some uncertainty in X, and the uncertainty remaining in X satisfies H(X|Y) ≤ H(X).
Thus, the possible range of mutual information is
0 ≤ I(X; Y) ≤ H(X)
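A minimal numerical check of the two extremes (the joint tables below are made up purely for illustration): a deterministic relation x = f(y) gives I(X; Y) = H(X), while an independent pair gives I(X; Y) = 0.

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def mutual_information(p_xy):
        """I(X; Y) = H(X) + H(Y) - H(X, Y), equivalent to H(X) - H(X|Y)."""
        return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)

    # Case 1: x = f(y) (here simply x = y), so H(X|Y) = 0 and I(X; Y) = H(X) = 1 bit
    deterministic = np.array([[0.5, 0.0],
                              [0.0, 0.5]])

    # Case 2: X and Y independent, so H(X|Y) = H(X) and I(X; Y) = 0
    independent = np.array([[0.25, 0.25],
                            [0.25, 0.25]])

    print(mutual_information(deterministic))   # -> 1.0
    print(mutual_information(independent))     # -> 0.0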

Notice that:

With minimum remaining uncertainty H(X|Y), mutual information is maximum

and

With maximum remaining uncertainty H(X|Y), mutual information is minimum

Application example of mutual information:

Consider a communication system. Here the information source X emits symbols x0, x1, x2, x3, … into an information channel, which in turn emits symbols y0, y1, y2, y3, … into the sink Y.

[Figure: communication system — source X → channel → sink Y]

Therefore,
  1. In an ideal communication system, each symbol yi emitted by the information channel uniquely identifies the symbol xi emitted by the information source. In other words,

    Y should tell us about X

    and

    X should tell us about Y.

    Therefore, given Y, the uncertainty remaining in X is minimum: H(X|Y) = 0. Hence
    I(X; Y) = H(X) − H(X|Y)
    = H(X)      
    which is maximum mutual information.
  2. In a non-ideal communication system, a symbol yi emitted by the information channel may not uniquely identify the symbol xi emitted by the information source. Therefore, given Y, some uncertainty remains in X: H(X|Y) ≠ 0.
    Since the range of equivocation is
    0 ≤ H(X|Y) ≤ H(X)
    but H(X|Y) ≠ 0 thus
    0 < H(X|Y) ≤ H(X)
    Therefore
    I(X; Y) = H(X) − H(X|Y)
    < H(X)      
    which means that information is lost in transmission.
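One way to see this loss numerically is to assume a simple binary symmetric channel that flips each transmitted bit with probability eps (this channel model is an assumption for illustration, not part of the original example). The sketch below shows I(X; Y) falling below H(X) as soon as eps > 0:

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def bsc_mutual_information(eps, p0=0.5):
        """I(X; Y) for a binary symmetric channel that flips each bit with
        probability eps, with input distribution P(x = 0) = p0."""
        p_x = np.array([p0, 1.0 - p0])
        p_y_given_x = np.array([[1 - eps, eps],     # channel matrix P(y|x)
                                [eps, 1 - eps]])
        p_xy = p_x[:, None] * p_y_given_x           # joint p(x, y)
        H_X = entropy(p_x)
        H_X_given_Y = entropy(p_xy) - entropy(p_xy.sum(axis=0))   # chain rule
        return H_X - H_X_given_Y

    print(bsc_mutual_information(0.0))   # ideal channel: I(X; Y) = H(X) = 1 bit
    print(bsc_mutual_information(0.1))   # non-ideal: I(X; Y) < H(X), information is lost
    print(bsc_mutual_information(0.5))   # useless channel: I(X; Y) = 0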

Illustration of "loss in transmission"

Let us assume four binary digits x0, x1, x2 & x3 are fed into an encoder box that computes three check digits c4, c5 & c6.

[Figure: encoder — inputs x0…x3, outputs x0…x3 plus c4, c5, c6]

The seven outputs x0, x1, x2, x3, c4, c5 & c6 together form a code word; if the check digits are computed systematically, the result is a Hamming code.

For instance,

x0, x1, x2, x3, c4, c5 & c6: 7 outputs
x0, x1, x2 & x3: 4 inputs
Thus, this is the (7,4) Hamming code.

Observe that c4, c5 & c6 are redundant data used for error correction. An error in these redundant digits is tolerable, unlike an error in the information digits. Also note that ⟨c4, c5, c6⟩ = f(x0, x1, x2, x3).
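The function f can be sketched as below; the particular parity equations used are one common systematic convention, assumed here for illustration rather than taken from the encoder above.

    def hamming_7_4_encode(x0, x1, x2, x3):
        """Systematic (7,4) Hamming encoding: the four information digits pass
        through unchanged and three check digits are appended, so that
        <c4, c5, c6> = f(x0, x1, x2, x3) as noted above."""
        c4 = x0 ^ x1 ^ x3   # parity over x0, x1, x3
        c5 = x0 ^ x2 ^ x3   # parity over x0, x2, x3
        c6 = x1 ^ x2 ^ x3   # parity over x1, x2, x3
        return (x0, x1, x2, x3, c4, c5, c6)

    # Example: the information word 1 0 1 1 becomes the code word 1 0 1 1 0 1 0
    print(hamming_7_4_encode(1, 0, 1, 1))

With check digits chosen this way, any single-bit error in the 7-digit code word can be detected and corrected at the receiver.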

Other applications of mutual information (other than communication systems):


