Understanding more about the inequality H(Y) < H(C)
Recall the example:

[Figure: Information with Noise, labelled with H(X), H(N), H(Y)]

Since p(xi, ηj) = p(xi) p(ηj) ≠ 0 (X and N are independent),

H(C) = H(X) + H(N|X) = H(X) + H(N) = 1 + 0.921928 = 1.921928
Hence H(Y) < H(C).
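
These numbers can be checked with a short script. The sketch below is illustrative only: the input values x0 = 1 and x1 = 3 are an assumption, chosen so that (as in the example) the noisy outputs of the two inputs overlap in exactly one symbol.

```python
# A minimal sketch verifying the entropies quoted above.
# Assumption (for illustration only): x0 = 1, x1 = 3, so the noisy outputs
# of the two inputs overlap in exactly one symbol.
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability symbols are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

p_X = {1: 0.5, 3: 0.5}                 # assumed input values x0 = 1, x1 = 3
p_N = {-1: 0.1, 0: 0.8, +1: 0.1}       # noise N = {-1, 0, +1}

# Compound symbol C = (X, N): X and N are independent, so p(x, n) = p(x) p(n)
p_C = {(x, n): px * pn for x, px in p_X.items() for n, pn in p_N.items()}

# Output Y = X + N: accumulate the probability of every reachable output
p_Y = {}
for (x, n), p in p_C.items():
    p_Y[x + n] = p_Y.get(x + n, 0.0) + p

print(entropy(p_X.values()))   # H(X) = 1.0
print(entropy(p_N.values()))   # H(N) ≈ 0.921928
print(entropy(p_C.values()))   # H(C) = H(X) + H(N) ≈ 1.921928
print(entropy(p_Y.values()))   # H(Y) ≈ 1.821928  <  H(C)
```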


Most functions are information lossy

For the system with minimal specifications

[Figure: Information with Noise, labelled with p(X), p(N), p(Y)]

This can be shown in a directed graph as

[Figure: directed graph without arrows]

But we know that p(Y) = transition probability × p(X).
Thus,

p(y0) = p(y0|x0) p(x0), so p(y0|x0) = p(y0) ÷ p(x0) = 0.05 ÷ 0.5 = 0.1

Similarly,

p(y1|x0) = p(y1) ÷ p(x0) = 0.4 ÷ 0.5 = 0.8
p(y2|x0) = 0.1 and p(y2|x1) = 0.1
(y2 is reachable from both x0 and x1, so its conditional probabilities come directly from the noise distribution rather than from the division above)
p(y3|x1) = p(y3) ÷ p(x1) = 0.4 ÷ 0.5 = 0.8
p(y4|x1) = p(y4) ÷ p(x1) = 0.05 ÷ 0.5 = 0.1
Therefore the directed graph is
[Figure: directed graph with arrows]
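
The edge probabilities of this graph can be reproduced directly from p(N). The sketch below is illustrative only and reuses the assumed input values x0 = 1, x1 = 3.

```python
# A minimal sketch reproducing the edge probabilities p(yk | xi) above.
# Assumption (for illustration): x0 = 1, x1 = 3 and yk = xi + ηj.
p_X = {1: 0.5, 3: 0.5}
p_N = {-1: 0.1, 0: 0.8, +1: 0.1}

# p(yk | xi) equals the probability of the noise symbol that maps xi to yk
channel = {x: {x + n: pn for n, pn in p_N.items()} for x in p_X}

for x, row in channel.items():
    print(x, row)
# 1 {0: 0.1, 1: 0.8, 2: 0.1}   i.e. p(y0|x0)=0.1, p(y1|x0)=0.8, p(y2|x0)=0.1
# 3 {2: 0.1, 3: 0.8, 4: 0.1}   i.e. p(y2|x1)=0.1, p(y3|x1)=0.8, p(y4|x1)=0.1
```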
We know that symbol xi combines with noise symbol ηj to give symbol yk, i.e., xi + ηj = yk.
Therefore,

given symbol xi and the noise symbol ηj, we know the output yk,

and conversely,

given symbol xi and the output yk, we know the noise symbol ηj.

Thus p(yk|xi) = p(ηj), where ηj is the noise symbol satisfying xi + ηj = yk.

Hence we compute the mutual information between X and Y after the addition of N:

I(X; Y) = 0.9
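
The value I(X; Y) = 0.9, and the H(X|Y) = 0.1 derived just below, can be checked under the same illustrative assumption (x0 = 1, x1 = 3); a minimal sketch:

```python
# A minimal sketch computing I(X; Y) and H(X | Y) for the noisy system.
# Assumption (for illustration): x0 = 1, x1 = 3 and yk = xi + ηj.
from math import log2

p_X = {1: 0.5, 3: 0.5}
p_N = {-1: 0.1, 0: 0.8, +1: 0.1}

# Channel p(yk | xi) and output distribution p(yk) = Σi p(xi) p(yk | xi)
channel = {x: {x + n: pn for n, pn in p_N.items()} for x in p_X}
p_Y = {}
for x, px in p_X.items():
    for y, pyx in channel[x].items():
        p_Y[y] = p_Y.get(y, 0.0) + px * pyx

# I(X; Y) = Σi Σk p(xi) p(yk|xi) log2( p(yk|xi) / p(yk) )
I_XY = sum(px * pyx * log2(pyx / p_Y[y])
           for x, px in p_X.items()
           for y, pyx in channel[x].items())

H_X = -sum(p * log2(p) for p in p_X.values())

print(round(I_XY, 6))        # 0.9
print(round(H_X - I_XY, 6))  # H(X|Y) = 0.1
```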
Therefore, given X there is still uncertainty about Y, i.e., information is lost in transmission. This can be explained further as follows. For this case H(X) = 1 and

I(X; Y) = H(X) − H(X|Y), so H(X|Y) = H(X) − I(X; Y) = 1 − 0.9 = 0.1

This implies that, given knowledge of Y, some uncertainty about X remains. Hence,

"Entropy of outcome of a function is not necessarily the same as entropy of the compound symbol".

This explains why, in the above example system,

H(Y) < H(C)
or
H(Y) < H(X, N)
Note that
H(Y) ≤ H(C)
but never
H(Y) > H(C)
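
This follows from a standard argument: Y is a deterministic function of the compound symbol, Y = f(X, N), and applying a function can never increase entropy, so

H(Y) = H(f(X, N)) ≤ H(X, N) = H(C)

with equality exactly when f maps distinct (xi, ηj) pairs to distinct outputs yk.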
Therefore (in our example), since xi + ηj = yk = f(xi, ηj), we state

"Most functions are information lossy".

Identifying the confounding cause of information loss and a solution to improve the SNR (signal-to-noise ratio)

In our example the statement "Most functions are information lossy" applies because

[Figure: directed graph with arrows]
The node y2 receives inputs from both x0 and x1, and so confounds the input.

How can this confounding of input be minimized?

One solution is

Normalization to improve SNR (signal-to-noise ratio).

In our example

  N = {−1, 0, +1}
p(N) = {0.1, 0.8, 0.1}
Keeping p(N) = {0.1, 0.8, 0.1} unchanged, N = {−1, 0, +1} is normalized to
N = {−0.5, 0, +0.5}
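
A minimal sketch (again with the assumed input values x0 = 1, x1 = 3) of what this normalization achieves: the sets of outputs reachable from the two inputs no longer overlap.

```python
# A minimal sketch showing that normalized noise removes the overlap between
# the outputs reachable from the two inputs (assumed values x0 = 1, x1 = 3).
inputs = [1, 3]
N_original   = [-1, 0, +1]
N_normalized = [-0.5, 0, +0.5]

for label, noise in [("original", N_original), ("normalized", N_normalized)]:
    outputs = {x: {x + n for n in noise} for x in inputs}
    print(label, outputs, "overlap:", outputs[1] & outputs[3])
# original:   outputs {0, 1, 2} and {2, 3, 4} overlap in {2}
# normalized: outputs {0.5, 1, 1.5} and {2.5, 3, 3.5} do not overlap
```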
Thus the system with normalized noise results in
[Figure: Information with Normalized noise]

whose directed graph is

[Figure: directed graph of Information with Normalized noise]

Therefore we have

[Figure: Fully Labelled Information with Normalized noise]

Notice that H(X, N) = H(Y). This is because, with the normalized noise, every pair (xi, ηj) produces a distinct output yk, so Y carries all of the information in the compound symbol and

H(Y) = H(X, N) = H(C) = H(X) + H(N) = 1.921928
From
I(X; Y) = Σi Σk p(xi) p(yk|xi) log2 [ p(yk|xi) ÷ p(yk) ]

we get, for the above case, I(X; Y) = 1.

Therefore the information about the original symbol X that is lost at the output Y is given by

H(X|Y) = H(X) − I(X;Y) = 1 − 1 = 0
Hence there are no longer any confounding symbols.
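
As a final check, the normalized-noise numbers can be reproduced under the same illustrative assumption (x0 = 1, x1 = 3):

```python
# A minimal sketch verifying the normalized-noise system:
# H(Y) = H(X, N), I(X; Y) = 1 and H(X | Y) = 0.
# Assumption (for illustration): x0 = 1, x1 = 3 and yk = xi + ηj.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p_X = {1: 0.5, 3: 0.5}
p_N = {-0.5: 0.1, 0: 0.8, +0.5: 0.1}   # normalized noise

channel = {x: {x + n: pn for n, pn in p_N.items()} for x in p_X}
p_Y = {}
for x, px in p_X.items():
    for y, pyx in channel[x].items():
        p_Y[y] = p_Y.get(y, 0.0) + px * pyx

I_XY = sum(px * pyx * log2(pyx / p_Y[y])
           for x, px in p_X.items()
           for y, pyx in channel[x].items())

print(round(entropy(p_Y.values()), 6))         # 1.921928 = H(X) + H(N)
print(round(I_XY, 6))                          # 1.0
print(round(entropy(p_X.values()) - I_XY, 6))  # H(X|Y) = 0.0
```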



Next:

Information in terms of usability (useful/useless) (p:2) ➽