An alternate view of the state of the channel

In addition to defining the 'state' based on the input symbol set X (the set of all x symbols), we can also define the 'state' of the channel with respect to Y (the set of all y symbols). Hence,

$$p(Y) = \sum_{a=0}^{g-1} \sum_{b=0}^{h-1} p(y_b \mid S_a)\,\mu_a$$
where g is the total number of possible states and h is the total number of possible y symbols.
Figure 1: DMS plus duobinary channel viewed as information source Y

Thus for the above example
$$p(Y) = \sum_{a=0}^{2-1} \sum_{b=0}^{3-1} p(y_b \mid S_a)\,\mu_a$$

Referring to the state diagram

Figure 4: State diagram of the system of Fig. 3

we see that, given state S0, the probability that y = y0 = −2 is p(y0 | S0) = 0.5. Similarly p(y1 | S0) = 0.5 but p(y2 | S0) = 0. Thus
$$p(y_0 \mid S_0) = 0.5,\quad p(y_1 \mid S_0) = 0.5,\quad p(y_2 \mid S_0) = 0$$
$$p(y_0 \mid S_1) = 0,\quad p(y_1 \mid S_1) = 0.5,\quad p(y_2 \mid S_1) = 0.5$$

Let
$$\Pi_0 = p(y_0),\quad \Pi_1 = p(y_1),\quad \Pi_2 = p(y_2)$$
Note that Π0 + Π1 + Π2 = 1. We can therefore write the equation for the output probabilities as
$$\begin{bmatrix} \Pi_0^n \\ \Pi_1^n \\ \Pi_2^n \end{bmatrix} = \begin{bmatrix} p(y_0 \mid S_0) & p(y_0 \mid S_1) \\ p(y_1 \mid S_0) & p(y_1 \mid S_1) \\ p(y_2 \mid S_0) & p(y_2 \mid S_1) \end{bmatrix} \begin{bmatrix} \mu_0^n \\ \mu_1^n \end{bmatrix}$$
Thus
$$\bar{\Pi}^n = F\,\bar{\mu}^n$$
where $\bar{\mu}^n$ is the state probability vector and $F$ is the $3 \times 2$ matrix of transition probabilities
$$F = \begin{bmatrix} f_{00} & f_{10} \\ f_{01} & f_{11} \\ f_{02} & f_{12} \end{bmatrix}$$
such that $f_{ij} = p(y_j \mid S_i)$ is the transition probability. Since, given a particular state, all the probabilities out of that state (the entries in a particular column) must sum to 1, we have $f_{00} + f_{01} + f_{02} = f_{10} + f_{11} + f_{12} = 1$.

Hence for the above example

$$\bar{\Pi}^n = \begin{bmatrix} 0.5 & 0 \\ 0.5 & 0.5 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} \mu_0^n \\ \mu_1^n \end{bmatrix}$$
As with P, the elements of F in each column sum to 1, but the elements in each row need not.
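
As a quick numerical check, the matrix product can be evaluated directly. Below is a minimal Python sketch; the state probabilities μ0 = μ1 = 0.5 are assumed from the earlier steady-state analysis of the equiprobable DMS, and all names are illustrative.

```python
import numpy as np

# Conditional output probabilities f_ab = p(y_b | S_a),
# arranged with one column per state, as in the text.
F = np.array([
    [0.5, 0.0],   # p(y0 | S0), p(y0 | S1)
    [0.5, 0.5],   # p(y1 | S0), p(y1 | S1)
    [0.0, 0.5],   # p(y2 | S0), p(y2 | S1)
])

# Each column must sum to 1 (all probabilities out of one state).
assert np.allclose(F.sum(axis=0), 1.0)

# State probabilities; 0.5 each assumed from the equiprobable DMS.
mu = np.array([0.5, 0.5])

Pi = F @ mu
print(Pi)  # [0.25 0.5  0.25]
```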

The output probability at steady state (ss) is derived from the limit

$$\lim_{n \to \infty} \bar{\Pi}^n = \bar{\Pi}^{ss} = \begin{bmatrix} 0.25 \\ 0.5 \\ 0.25 \end{bmatrix}$$
Note that
$$\bar{\Pi}^{ss} = \begin{bmatrix} 0.25 \\ 0.5 \\ 0.25 \end{bmatrix} = \begin{bmatrix} p(y = -2) \\ p(y = 0) \\ p(y = +2) \end{bmatrix}$$
which was computed at the start of the analysis.
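
The limit itself can be checked by iterating the state-probability update $\bar{\mu}^{n+1} = P\,\bar{\mu}^n$. A sketch under one assumption: with an equiprobable DMS driving the duobinary channel, the next state (the new input bit) is S0 or S1 with probability 0.5 regardless of the current state, so every entry of the assumed P is 0.5.

```python
import numpy as np

# Assumed state transition matrix: with equiprobable inputs, the next
# state is S0 or S1 with probability 0.5 each, from either state.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])

F = np.array([[0.5, 0.0],
              [0.5, 0.5],
              [0.0, 0.5]])

mu = np.array([1.0, 0.0])   # arbitrary initial state distribution
for _ in range(50):
    mu = P @ mu             # mu^{n+1} = P mu^n

print(mu)       # [0.5 0.5]  (steady-state state probabilities)
print(F @ mu)   # [0.25 0.5  0.25] = Pi^ss
```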

Note that the conditional probabilities shown in the state diagram are all equivalent:

$$p(S_1 \mid S_0) \equiv p(x_1^n \mid x_0^{n-1}) \equiv p(y_1 \mid S_0)$$
In our example Y = {y0 = −2, y1 = 0, y2 = +2}.

Why is the system called a Hidden Markov Process?


Since

$$H(X) = 1, \qquad H(Y) = \sum_{b=0}^{3-1} \Pi_b \log_2\frac{1}{\Pi_b} = 1.5$$
Thus
$$H(Y) > H(X)$$
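
A quick sketch verifying the two entropies numerically:

```python
import numpy as np

Pi = np.array([0.25, 0.5, 0.25])           # steady-state output probabilities
H_Y = float(np.sum(Pi * np.log2(1 / Pi)))  # H(Y) = sum Pi_b log2(1/Pi_b)
H_X = 1.0                                  # equiprobable binary DMS

print(H_Y)         # 1.5
print(H_Y > H_X)   # True
```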
If H'(Y) is the entropy rate, then because of the above inequality we should suspect that
$$H'(Y) < H(Y)$$
Note that the entropy H(Y) grows as the output sequence grows, that is
$$H(Y) = H(y_0, y_1, \ldots, y_{n-1})$$
But, by definition of entropy rate
$$H'(Y) = \lim_{n \to \infty} \frac{H(y_0, y_1, \ldots, y_{n-1})}{n}$$
Thus it is fair to assume that the entropy rate is less than the entropy.
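
This can be illustrated by simulation. The sketch below generates a long duobinary output sequence, estimates the per-symbol block entropy H(y0, ..., y_{k-1})/k for increasing k, and shows it falling below H(Y) = 1.5 toward the entropy rate. The ±1 input alphabet and the rule y_n = x_n + x_{n-1} are assumptions consistent with the output levels {−2, 0, +2} in this example.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Assumed duobinary channel: y_n = x_n + x_{n-1}, x_n in {-1, +1} equiprobable.
N = 200_000
x = rng.choice([-1, 1], size=N)
y = x[1:] + x[:-1]                     # outputs in {-2, 0, +2}

def per_symbol_block_entropy(seq, k):
    """Estimate H(y_0, ..., y_{k-1}) / k from k-gram frequencies."""
    grams = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(grams.values())
    probs = np.array([c / total for c in grams.values()])
    return -np.sum(probs * np.log2(probs)) / k

for k in range(1, 7):
    print(k, round(per_symbol_block_entropy(y, k), 3))
# k = 1 reproduces H(Y) of about 1.5; the estimates decrease with k,
# heading toward the entropy rate H'(Y) < H(Y).
```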

Recall that, for an information source with memory, the Markov process was developed to compute entropy rates.

However, for channels with memory, the output sequence $(y_0, y_1, \ldots, y_{n-1})$ is not a Markov process, although the state model is. Thus

$$p(y_n \mid y_{n-1}, \ldots, y_0) \neq p(y_n \mid y_{n-1})$$
and the channel is a function
$$f : (x_{n-1}, S_n) \to y_{n-1}$$
where knowledge of $y_{n-1}$ does not provide knowledge of the state of the channel. The channel states are therefore called hidden, and the system is called a Hidden Markov process. The fact that the channel is such a function is what makes channels with memory difficult to analyze. One way forward is to use Gallagher's diagram.
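
To make the "hidden" part concrete, here is a minimal sketch of the channel as a function of (input, state), under the same assumed ±1 alphabet: the output y = 0 is produced from both states, so observing it does not reveal the state.

```python
# Minimal sketch of the channel as a function of (input, hidden state),
# under the assumed alphabet x in {-1, +1}: y = f(x, S) = x + S, and the
# new state is simply the current input.
def channel(x, state):
    y = x + state
    new_state = x
    return y, new_state

for state in (-1, +1):          # S0 = -1, S1 = +1
    for x in (-1, +1):
        y, _ = channel(x, state)
        print(f"S={state:+d}, x={x:+d} -> y={y:+d}")
# y = -2 occurs only from S0 and y = +2 only from S1, but y = 0 occurs
# from both states: observing y = 0 does not reveal the channel state,
# which is why the states are called hidden.
```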

Next: Gallagher's diagram and Entropy Rate