## Understanding more about the inequality H(Y) < H(C)
Recall the example,

Since p(xi, ηj) = p(xi) p(ηj) ≠ 0, X and N are independent, and therefore

H(C) = H(X) + H(N|X) = H(X) + H(N) = 1 + 0.921928 = 1.921928
Hence H(Y) < H(C).
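These numbers can be checked with a short sketch (Python here is only an illustration; the probability values are the ones used throughout this example):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

p_X = [0.5, 0.5]          # equiprobable input symbols x0, x1
p_N = [0.1, 0.8, 0.1]     # noise symbols -1, 0, +1

H_X = entropy(p_X)        # 1.0 bit
H_N = entropy(p_N)        # ~0.921928 bits
# X and N are independent, so H(C) = H(X, N) = H(X) + H(N)
H_C = H_X + H_N           # ~1.921928 bits
print(H_X, H_N, H_C)
```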

## Most functions are information lossy

For the system with minimal specifications, X = {x0, x1} with p(X) = {0.5, 0.5}, N = {−1, 0, +1} with p(N) = {0.1, 0.8, 0.1}, and output Y = X + N.

This can be shown in a directed graph as

But we know that p(Y) = transition probability × p(X).
Thus,

p(y0) = p(y0|x0) p(x0), so p(y0|x0) = p(y0) ÷ p(x0) = 0.05 ÷ 0.5 = 0.1
Similarly,
p(y1) = p(y1|x0) p(x0), so p(y1|x0) = p(y1) ÷ p(x0) = 0.4 ÷ 0.5 = 0.8

p(y3) = p(y3|x1) p(x1), so p(y3|x1) = p(y3) ÷ p(x1) = 0.4 ÷ 0.5 = 0.8
p(y4) = p(y4|x1) p(x1), so p(y4|x1) = p(y4) ÷ p(x1) = 0.05 ÷ 0.5 = 0.1
For y2, which is reachable from both inputs,
p(y2) = p(y2|x0) p(x0) + p(y2|x1) p(x1) = 0.1 × 0.5 + 0.1 × 0.5 = 0.1
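As a sanity check, the marginal p(Y) implied by these transition probabilities can be computed directly. The transition matrix below simply restates the graph, with y2 reachable from both inputs:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

p_X = [0.5, 0.5]
# Rows: x0, x1. Columns: y0..y4. Each p(yk|xi) comes from p(N) = {0.1, 0.8, 0.1};
# the y2 column has mass in both rows because both inputs can reach it.
p_Y_given_X = [
    [0.1, 0.8, 0.1, 0.0, 0.0],   # x0 -> y0, y1, y2
    [0.0, 0.0, 0.1, 0.8, 0.1],   # x1 -> y2, y3, y4
]

# p(yk) = sum_i p(yk|xi) p(xi)
p_Y = [sum(p_Y_given_X[i][k] * p_X[i] for i in range(2)) for k in range(5)]
print(p_Y)               # [0.05, 0.4, 0.1, 0.4, 0.05]
print(entropy(p_Y))      # ~1.821928, which is indeed < H(C) ~ 1.921928
```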
Therefore the directed graph is
We know that symbol xi combines with symbol ηj to give symbol yk, i.e., xi + ηj = yk.
Therefore,

Given symbol xi, knowing ηj tells us yk.

Also,

Given symbols xi and yk, we know ηj, namely ηj = yk − xi.

Thus p(yk|xi) = p(ηj | xi + ηj = yk) = p(ηj), since N is independent of X.

Hence we compute the mutual information between X and Y following the addition of N as follows
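A sketch of that computation follows. The numeric placements x0 = 0 and x1 = 2 are hypothetical, chosen only so that x0 + η and x1 + η collide at the middle output y2, as in the graph above:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical symbol values: chosen only so the middle outputs collide at y2.
X = {0: 0.5, 2: 0.5}
N = {-1: 0.1, 0: 0.8, 1: 0.1}

# Joint distribution of (x, y) with y = x + eta; X and N are independent.
joint = {}
for x, px in X.items():
    for eta, pn in N.items():
        y = x + eta
        joint[(x, y)] = joint.get((x, y), 0.0) + px * pn

# Marginalize over x to get p(Y).
p_Y = {}
for (x, y), p in joint.items():
    p_Y[y] = p_Y.get(y, 0.0) + p

H_Y = entropy(p_Y.values())               # ~1.821928
# Given X = x, Y is just a shifted copy of N, so H(Y|X) = H(N).
H_Y_given_X = entropy(N.values())         # ~0.921928
I_XY = H_Y - H_Y_given_X                  # ~0.9
H_X_given_Y = entropy(X.values()) - I_XY  # ~0.1: residual uncertainty about X
print(I_XY, H_X_given_Y)
```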

Therefore, given X there is uncertainty in Y; i.e., information is lost in transmission. This can be explained further as follows. For this case H(X) = 1 and I(X; Y) = 0.9. Since

I(X; Y) = H(X) − H(X|Y)

we have

H(X|Y) = H(X) − I(X; Y) = 1 − 0.90 = 0.1
This implies that, given knowledge of Y, some uncertainty about X still remains. Hence,

"The entropy of the outcome of a function is not necessarily the same as the entropy of the compound symbol."

This explains why in the above example system

H(Y) < H(C)
or
H(Y) < H(X, N)
Note that
H(Y) ≤ H(C)
but never
H(Y) > H(C)
Therefore (in our example), since yk = xi + ηj = f(xi, ηj), we state

"Most functions are information lossy".

## Identifying confounding causes for information loss and solution to improve SNR (signal-to-noise ratio)

In our example the statement "Most functions are information lossy" applies because

The node y2 receives inputs from both x0 and x1, and confounds the input.

How can this confounding of input be minimized?

One solution is

Normalization to improve SNR (signal-to-noise ratio).

In our example

N = {−1, 0, +1}
p(N) = {0.1, 0.8, 0.1}
Keeping p(N) = {0.1, 0.8, 0.1} unchanged, N = {−1, 0, +1} is normalized, i.e., its values are scaled down so that the noisy outputs of x0 and x1 no longer overlap. Thus the system with normalized noise results in

whose directed graph is

Therefore we have

Notice that H(X, N) = H(Y). This is because, after normalization, each (xi, ηj) pair produces a distinct yk, so Y retains all the information of the compound symbol:

H(Y) = H(X, N) = H(C) = H(X) + H(N) = 1.921928
For the above case we get I(X; Y) = H(Y) − H(Y|X) = 1.921928 − 0.921928 = 1.

Therefore the information lost from the original symbol X after observing the output Y is given by

H(X|Y) = H(X) − I(X;Y) = 1 − 1 = 0
Hence there are no longer any confounding symbols.
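The normalized system can be verified the same way. The noise values {−0.5, 0, +0.5} below are an assumed normalization (any rescaling that stops the outputs of x0 and x1 from overlapping behaves identically), and the placements x0 = 0, x1 = 2 are hypothetical as before:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

X = {0: 0.5, 2: 0.5}                 # hypothetical input placements, as before
N = {-0.5: 0.1, 0.0: 0.8, 0.5: 0.1}  # assumed normalization of {-1, 0, +1}

# Joint distribution of (x, y) with y = x + eta; X and N are independent.
joint = {}
for x, px in X.items():
    for eta, pn in N.items():
        y = x + eta
        joint[(x, y)] = joint.get((x, y), 0.0) + px * pn

# Marginalize over x to get p(Y); all six outputs are now distinct.
p_Y = {}
for (x, y), p in joint.items():
    p_Y[y] = p_Y.get(y, 0.0) + p

H_Y = entropy(p_Y.values())               # = H(X) + H(N) ~ 1.921928: no collisions
I_XY = H_Y - entropy(N.values())          # = H(X) = 1.0
H_X_given_Y = entropy(X.values()) - I_XY  # = 0: no confounding remains
print(len(p_Y), H_Y, I_XY, H_X_given_Y)
```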