More on Mutual Information (MI)
Recall that, in a communication system, the information source emits X into the information channel, which in turn emits Y.

[Figure: observing the output Y of a communication system]
We therefore observe Y, i.e., we know Y. The question then is:

What does knowing Y tell us about X?

This relationship between Y and X is called the mutual information, which is the

"reduction in uncertainty of X given the knowledge of Y".

There are at least two approaches to deriving the mutual information:

  1. Shannon's approach (using entropy algebra)
    $$I(X; Y) = H(X) - H(X \mid Y)$$
  2. Statistical approach (using probability distributions); a numerical sketch of both approaches follows this list.
    $$I(X; Y) = \sum_{i} \sum_{j} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)}$$
    Note that if X and Y were statistically independent, then $p(x_i, y_j) = p(x_i)\, p(y_j)$, so the logarithm vanishes and $I(X; Y) = 0$.
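As a minimal numerical sketch of both approaches (in Python with NumPy; the 2×2 joint distribution here is a hypothetical example, not from the lecture), the entropy-algebra form and the double-sum form give the same value:

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x_i, y_j): rows index x_i, columns index y_j.
# Any valid joint distribution (non-negative, summing to 1) would do.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)  # marginal p(x_i)
p_y = p_xy.sum(axis=0)  # marginal p(y_j)

# 1. Shannon's approach: I(X; Y) = H(X) - H(X|Y).
H_X = -np.sum(p_x * np.log2(p_x))
p_x_given_y = p_xy / p_y                           # p(x_i | y_j) = p(x_i, y_j) / p(y_j)
H_X_given_Y = -np.sum(p_xy * np.log2(p_x_given_y))
I_shannon = H_X - H_X_given_Y

# 2. Statistical approach: sum of p(x, y) * log2( p(x, y) / (p(x) p(y)) ).
I_statistical = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

print(I_shannon, I_statistical)  # both print ~0.1245 bits
```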
Shannon's approach was illustrated in Lecture 3.

Here we take the statistical approach to deriving I(X; Y), and also show that I(X; Y) = I(Y; X).

Since the mutual information is the relative entropy between the joint distribution $p(x_i, y_j)$ and the product distribution $p(x_i)\, p(y_j)$,

$$I(X; Y) = \sum_{i} \sum_{j} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)}.$$
But
$$p(x_i, y_j) = p(x_i \mid y_j)\, p(y_j) = p(y_j \mid x_i)\, p(x_i),$$
and conditional probabilities are usually easier to find than joint probabilities, so for more convenient calculation we use the form $p(x_i \mid y_j)\, p(y_j)$ or $p(y_j \mid x_i)\, p(x_i)$.

Thus,

$$I(X; Y) = \sum_{i} \sum_{j} p(x_i \mid y_j)\, p(y_j) \log_2 \frac{p(x_i \mid y_j)}{p(x_i)} \quad \text{or} \quad I(X; Y) = \sum_{i} \sum_{j} p(y_j \mid x_i)\, p(x_i) \log_2 \frac{p(y_j \mid x_i)}{p(y_j)}.$$
Both forms expand the same symmetric expression, so I(X; Y) and I(Y; X) are numerically equal: $I(X; Y) = I(Y; X)$.
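As a quick numerical check of this symmetry (a sketch reusing the hypothetical joint distribution from the example above), the two conditional forms evaluate to the same number:

```python
import numpy as np

# Same hypothetical joint distribution as in the earlier sketch.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Form 1: sum over i, j of p(x_i|y_j) p(y_j) log2( p(x_i|y_j) / p(x_i) ).
p_x_given_y = p_xy / p_y
I_xy = np.sum(p_x_given_y * p_y * np.log2(p_x_given_y / p_x[:, None]))

# Form 2: sum over i, j of p(y_j|x_i) p(x_i) log2( p(y_j|x_i) / p(y_j) ).
p_y_given_x = p_xy / p_x[:, None]
I_yx = np.sum(p_y_given_x * p_x[:, None] * np.log2(p_y_given_x / p_y))

print(I_xy, I_yx)  # identical values: I(X; Y) = I(Y; X)
```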



Partial Mutual Information

From the above (statistical) expression for I(X; Y), grouping the terms over one received symbol $y_j$ (or one source symbol $x_i$) at a time gives

$$I(X; Y) = \sum_{j} I(X; y_j) \quad \text{or} \quad I(X; Y) = \sum_{i} I(x_i; Y),$$

where, for example,

$$I(X; y_j) = \sum_{i} p(x_i \mid y_j)\, p(y_j) \log_2 \frac{p(x_i \mid y_j)}{p(x_i)}.$$

We shall call $I(x_i; Y)$ and $I(X; y_j)$ the partial mutual information.
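As a sketch of this decomposition (again with the hypothetical joint distribution used above, and with the $p(y_j)$ weight kept inside each partial term, matching $I(X; Y) = \sum_j I(X; y_j)$):

```python
import numpy as np

# Same hypothetical joint distribution as in the earlier sketches.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y

# Partial mutual information for each received symbol y_j:
# I(X; y_j) = sum over i of p(x_i|y_j) p(y_j) log2( p(x_i|y_j) / p(x_i) ).
I_X_yj = np.sum(p_x_given_y * p_y * np.log2(p_x_given_y / p_x[:, None]), axis=0)

print(I_X_yj)        # one partial term per y_j (~0.0585 and ~0.0660 bits here)
print(I_X_yj.sum())  # their sum recovers the total I(X; Y), ~0.1245 bits
```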
