# More on Mutual Information (MI)
Recall that, in a communication system, the information source emits X into the information channel, which in turn emits Y.

We therefore observe Y, i.e., we know Y. The question then is:

What does knowing Y tell us about X?

This Y → X relationship is called the mutual information.

In other words, it is the

"reduction in the uncertainty of X given the knowledge of Y".

There are at least two approaches to deriving the mutual information:

1. Shannon's approach (using Entropy Algebra)
I(X; Y) = H(X) − H(X|Y)
2. Statistical approach (using probability distributions), computing I(X; Y) directly from p(xi, yj), p(xi) and p(yj). Note that if X and Y were statistically independent, then p(xi, yj) = p(xi) p(yj) and I(X; Y) = 0.
Shannon's approach was illustrated in lecture 3.
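As a quick numerical illustration of Shannon's entropy-algebra form, here is a minimal Python sketch. The joint distribution `p_xy` is a made-up example (a noisy binary channel), not a distribution from the lecture.

```python
import math

# Hypothetical joint distribution p(xi, yj) for a binary channel
# (rows index x, columns index y); the values are assumptions for illustration.
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

p_x = [sum(row) for row in p_xy]            # marginal p(xi)
p_y = [sum(col) for col in zip(*p_xy)]      # marginal p(yj)

# H(X) = -sum_i p(xi) log2 p(xi)
H_X = -sum(p * math.log2(p) for p in p_x if p > 0)

# H(X|Y) = -sum_i sum_j p(xi, yj) log2 p(xi|yj), with p(xi|yj) = p(xi,yj)/p(yj)
H_X_given_Y = -sum(
    p_xy[i][j] * math.log2(p_xy[i][j] / p_y[j])
    for i in range(2) for j in range(2) if p_xy[i][j] > 0
)

# Shannon's form: I(X;Y) = H(X) - H(X|Y)
I_XY = H_X - H_X_given_Y
print(f"H(X) = {H_X:.4f} bits, H(X|Y) = {H_X_given_Y:.4f} bits, "
      f"I(X;Y) = {I_XY:.4f} bits")
```

For this channel, H(X) = 1 bit and observing Y reduces the uncertainty of X to H(X|Y) ≈ 0.72 bits, so I(X; Y) ≈ 0.28 bits.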

## Statistical approach to deriving I(X; Y), also showing I(X; Y) = I(Y; X)

Mutual information is the relative entropy between the joint distribution p(xi, yj) and the product distribution p(xi) p(yj):

I(X; Y) = Σi Σj p(xi, yj) log [ p(xi, yj) / (p(xi) p(yj)) ]

But,
p(xi, yj) = p(xi|yj) p(yj) = p(yj|xi) p(xi)
Since it is usually easier to find conditional probabilities than joint probabilities, it is more convenient to calculate with the form p(xi|yj) p(yj) or p(yj|xi) p(xi).

Thus,

I(X; Y) = Σi Σj p(xi, yj) log [ p(xi|yj) / p(xi) ] = Σi Σj p(xi, yj) log [ p(yj|xi) / p(yj) ] = I(Y; X)

Therefore, I(X; Y) and I(Y; X) are numerically equal.
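The equality of the three forms can be checked numerically. This sketch uses the same kind of made-up joint distribution as before (an assumption for illustration) and evaluates the relative-entropy form, the conditional form, and the swapped form I(Y; X).

```python
import math

# Hypothetical joint distribution p(xi, yj), for illustration only
# (rows index x, columns index y).
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]
p_x = [sum(row) for row in p_xy]            # p(xi)
p_y = [sum(col) for col in zip(*p_xy)]      # p(yj)

# Relative-entropy form: I(X;Y) = sum_ij p(xi,yj) log2[ p(xi,yj) / (p(xi)p(yj)) ]
I_joint = sum(
    p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
    for i in range(2) for j in range(2) if p_xy[i][j] > 0
)

# Conditional form via p(xi, yj) = p(xi|yj) p(yj):
# I(X;Y) = sum_ij p(xi,yj) log2[ p(xi|yj) / p(xi) ]
I_cond = sum(
    p_xy[i][j] * math.log2((p_xy[i][j] / p_y[j]) / p_x[i])
    for i in range(2) for j in range(2) if p_xy[i][j] > 0
)

# Swapping the roles of X and Y gives I(Y;X); the log argument is
# algebraically unchanged, so the value is identical.
I_YX = sum(
    p_xy[i][j] * math.log2((p_xy[i][j] / p_x[i]) / p_y[j])
    for i in range(2) for j in range(2) if p_xy[i][j] > 0
)

print(I_joint, I_cond, I_YX)  # all three agree
```

All three expressions give the same number, confirming both the algebraic rewriting and the symmetry I(X; Y) = I(Y; X).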

## Partial mutual information

From the above (statistical) expression for I(X; Y), using p(xi, yj) = p(xi) p(yj|xi),

I(X; Y) = Σi p(xi) Σj p(yj|xi) log [ p(yj|xi) / p(yj) ] = Σi p(xi) I(xi; Y)

and, symmetrically,

I(X; Y) = Σj p(yj) Σi p(xi|yj) log [ p(xi|yj) / p(xi) ] = Σj p(yj) I(X; yj)

Thus, we shall call I(xi; Y) and I(X; yj) the partial mutual information.
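A short sketch of the partial mutual information, again on a hypothetical joint distribution (the values are assumptions, not from the lecture). It computes I(xi; Y) and I(X; yj) per symbol and checks that averaging them over the corresponding marginal recovers the full I(X; Y).

```python
import math

# Hypothetical joint distribution p(xi, yj), for illustration only.
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]
p_x = [sum(row) for row in p_xy]            # p(xi)
p_y = [sum(col) for col in zip(*p_xy)]      # p(yj)

def I_xi_Y(i):
    """Partial MI for one input symbol: I(xi; Y) = sum_j p(yj|xi) log2[ p(yj|xi)/p(yj) ]."""
    return sum(
        (p_xy[i][j] / p_x[i]) * math.log2((p_xy[i][j] / p_x[i]) / p_y[j])
        for j in range(2) if p_xy[i][j] > 0
    )

def I_X_yj(j):
    """Partial MI for one output symbol: I(X; yj) = sum_i p(xi|yj) log2[ p(xi|yj)/p(xi) ]."""
    return sum(
        (p_xy[i][j] / p_y[j]) * math.log2((p_xy[i][j] / p_y[j]) / p_x[i])
        for i in range(2) if p_xy[i][j] > 0
    )

# Averaging the partial values over the corresponding marginal
# recovers the full mutual information either way.
I_from_x = sum(p_x[i] * I_xi_Y(i) for i in range(2))
I_from_y = sum(p_y[j] * I_X_yj(j) for j in range(2))
print(I_from_x, I_from_y)
```

Note that a single partial term I(xi; Y) can be negative for some symbol, but the p(xi)-weighted average, the full I(X; Y), is always non-negative.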