In general,
Mutual information I(X; Y) measures how much we learn about X given knowledge of Y.
To further elucidate this, let us consider three cases:

If x = f(y), then knowing y tells us x exactly. In other words, given any y, the corresponding x is known, and hence there is no remaining uncertainty: H(X|Y) = 0. Therefore
I(X; Y) = H(X) − H(X|Y) = H(X),
which is the maximum mutual information.
If x is not a function of y, that is, X and Y are independent of each other, then knowing y tells us nothing about x, and hence the uncertainty is maximum: H(X|Y) = H(X). Therefore
I(X; Y) = H(X) − H(X|Y) = 0,
which is no (= minimum) mutual information.
If Y gives some, but not complete, knowledge of X, then some uncertainty remains in X, and the uncertainty remaining in X satisfies H(X|Y) ≤ H(X).
Notice that:
With minimum uncertainty, mutual information is maximum
and
With maximum uncertainty, mutual information is minimum
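The three cases above can be checked numerically. The sketch below (my own illustration, not from the text) computes I(X; Y) from a joint distribution via the identity I(X; Y) = H(X) + H(Y) − H(X, Y), which equals H(X) − H(X|Y); the three example joint distributions are assumed for illustration.

```python
import math

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), in bits, from a joint pmf p(x, y)."""
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    h = lambda ps: -sum(p * math.log2(p) for p in ps if p > 0)
    hxy = h([p for row in joint for p in row])  # joint entropy H(X, Y)
    return h(px) + h(py) - hxy

# Case 1: x = f(y) (here x == y): I(X; Y) = H(X) = 1 bit (maximum)
det = [[0.5, 0.0],
       [0.0, 0.5]]

# Case 2: X and Y independent: I(X; Y) = 0 (minimum)
ind = [[0.25, 0.25],
       [0.25, 0.25]]

# Case 3: partial knowledge of X given Y: 0 < I(X; Y) < H(X)
par = [[0.4, 0.1],
       [0.1, 0.4]]

print(mutual_information(det))  # 1.0
print(mutual_information(ind))  # 0.0
print(mutual_information(par))  # strictly between 0 and 1
```

Note how minimum uncertainty (case 1) gives maximum mutual information, and maximum uncertainty (case 2) gives zero.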
Consider a communication system. Here the information source X emits symbols x_{0}, x_{1}, x_{2}, x_{3}, … into the information channel, which in turn emits y_{0}, y_{1}, y_{2}, y_{3}, … into the sink Y.

In an ideal communication system, every symbol y_{i} emitted by the information channel uniquely identifies the symbol x_{i} emitted by the information source. In other words,
Y should tell us about X
and
X should tell us about Y.
Therefore, given Y, the uncertainty remaining in X is minimum, H(X|Y) = 0. Hence
I(X; Y) = H(X) − H(X|Y) = H(X),
which is the maximum mutual information.
In a nonideal communication system, a symbol y_{i} emitted by the information channel may not uniquely identify the symbol x_{i} emitted by the information source. Therefore, given Y, some uncertainty remains in X: H(X|Y) ≠ 0.
Since the range of the equivocation is 0 ≤ H(X|Y) ≤ H(X) but H(X|Y) ≠ 0, we have 0 < H(X|Y) ≤ H(X). Therefore
I(X; Y) = H(X) − H(X|Y) < H(X),
which means that information is lost in transmission.
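A concrete way to see both the ideal and nonideal cases is the binary symmetric channel, which I use here as an assumed illustration (it is not specified in the text). With a uniform binary input, the equivocation is H(X|Y) = h2(eps), where eps is the crossover probability, so I(X; Y) = H(X) − H(X|Y) = 1 − h2(eps).

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Assumed binary symmetric channel with uniform input, so H(X) = 1 bit.
for eps in (0.0, 0.1, 0.5):
    hx = 1.0             # H(X) for a uniform binary source
    h_x_given_y = h2(eps)  # equivocation: uncertainty about X after seeing Y
    i_xy = hx - h_x_given_y
    print(eps, i_xy)
```

For eps = 0 the channel is ideal and I(X; Y) = H(X) = 1 bit; for 0 < eps < 0.5 some information is lost in transmission (I(X; Y) < H(X)); for eps = 0.5 the output tells us nothing and I(X; Y) = 0.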
Let us assume four binary digits x_{0}, x_{1}, x_{2} and x_{3} are fed into an encoder box that produces three binary digits c_{4}, c_{5} and c_{6}.
The outputs x_{0}, x_{1}, x_{2}, x_{3}, c_{4}, c_{5} and c_{6} together are called a code word; if the encoding is done systematically, the result is a Hamming code.
Observe that c_{4}, c_{5} and c_{6} are redundant data used for error correction. Unlike an information error, we do not mind an error in this redundant data. Also note that ⟨c_{4}, c_{5}, c_{6}⟩ = f(x_{0}, x_{1}, x_{2}, x_{3}).
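As a sketch of such an encoder, the function below computes c_{4}, c_{5} and c_{6} from x_{0}…x_{3} using one common Hamming(7,4) parity assignment; the text does not give the exact parity equations, so the particular XOR combinations here are an assumption.

```python
def hamming74_encode(x0, x1, x2, x3):
    """Systematic Hamming(7,4) code word: 4 information bits followed by
    3 redundant parity bits, each a function of the information bits.
    The specific parity equations below are one common choice (assumed)."""
    c4 = x0 ^ x1 ^ x2  # parity over x0, x1, x2
    c5 = x1 ^ x2 ^ x3  # parity over x1, x2, x3
    c6 = x0 ^ x1 ^ x3  # parity over x0, x1, x3
    return (x0, x1, x2, x3, c4, c5, c6)

print(hamming74_encode(1, 0, 1, 1))  # (1, 0, 1, 1, 0, 0, 0)
```

Because the code is systematic, the information bits appear unchanged in the code word and ⟨c_{4}, c_{5}, c_{6}⟩ is purely a function of ⟨x_{0}, x_{1}, x_{2}, x_{3}⟩, as noted above.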
Other applications of mutual information (besides communication systems):
• Instrumentation
• Numerical algorithms
• Signal processing (e.g., filters)