
In this exercise, you will provide a simpler proof for a special case of corollary 11.6. Assume that each Xi ∈ X is a binary-valued random variable, parameterized with a single parameter q_i = Q(x_i^1).

a. By considering the derivative of F[Φ, Q] with respect to q_i, derive the condition under which Q(Xi) is locally optimal, without using Lagrange multipliers.

b. Now, prove corollary 11.6.
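
One possible outline for part (a) is sketched below. It assumes the energy functional takes the usual mean field form (expected log-potentials plus a sum of per-variable entropies), which this excerpt does not state explicitly. The conditional-expectation notation E_Q[ln φ | x_i] is likewise shorthand introduced here for illustration.

```latex
% Assumed mean field form of the energy functional (Q fully factored):
\[
F[\Phi, Q] = \sum_{\phi \in \Phi} E_{Q}[\ln \phi] + \sum_{i} H_Q(X_i),
\qquad
H_Q(X_i) = -q_i \ln q_i - (1 - q_i) \ln (1 - q_i).
\]
% Each expectation is linear in q_i:
\[
E_{Q}[\ln \phi] = q_i \, E_{Q}[\ln \phi \mid x_i^1] + (1 - q_i) \, E_{Q}[\ln \phi \mid x_i^0].
\]
% Differentiate with respect to q_i:
\[
\frac{\partial F}{\partial q_i}
= \sum_{\phi \in \Phi} \left( E_{Q}[\ln \phi \mid x_i^1] - E_{Q}[\ln \phi \mid x_i^0] \right)
+ \ln \frac{1 - q_i}{q_i}.
\]
% Setting the derivative to zero gives the stationarity condition:
\[
\frac{q_i}{1 - q_i}
= \frac{\exp\left\{ \sum_{\phi} E_{Q}[\ln \phi \mid x_i^1] \right\}}
       {\exp\left\{ \sum_{\phi} E_{Q}[\ln \phi \mid x_i^0] \right\}}.
\]
```

Normalizing the two values so that they sum to one introduces the constant Zi. For part (b), note that a potential whose scope does not contain Xi contributes the same expectation to the numerator and the denominator, so it cancels and only the potentials involving Xi remain.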

Corollary 11.6

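In the mean field approximation, Q(Xi) is locally optimal only if

\[
Q(x_i) = \frac{1}{Z_i} \exp\left\{ \sum_{\phi : X_i \in \mathrm{Scope}[\phi]} E_{(U_\phi - \{X_i\}) \sim Q}\left[ \ln \phi[U_\phi, x_i] \right] \right\}
\]

where Zi is a normalizing constant and U_φ denotes the scope of the potential φ.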

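This representation shows that Q(Xi) has to be consistent with the expectation of the potentials in which it appears. In our grid network example, this characterization implies that Q(Ai,j) is a product of four terms measuring its interaction with each of its four neighbors:

\[
Q(a_{i,j}) = \frac{1}{Z_{i,j}} \exp\Big\{
\sum_{a_{i-1,j}} Q(a_{i-1,j}) \ln \phi(a_{i-1,j}, a_{i,j})
+ \sum_{a_{i,j-1}} Q(a_{i,j-1}) \ln \phi(a_{i,j-1}, a_{i,j})
+ \sum_{a_{i+1,j}} Q(a_{i+1,j}) \ln \phi(a_{i,j}, a_{i+1,j})
+ \sum_{a_{i,j+1}} Q(a_{i,j+1}) \ln \phi(a_{i,j}, a_{i,j+1})
\Big\}
\]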

Each term is a (geometric) average of one of the potentials involving Ai,j. For example, the final term in the exponent represents a geometric average of the potential between Ai,j and Ai,j+1, averaged using the distribution Q(Ai,j+1).
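
The update described above can be sketched numerically. The following is a minimal illustration, not the text's own algorithm listing: it assumes binary variables, a single symmetric pairwise potential shared by every edge of the grid, and no single-node potentials. The names `mean_field_update`, `run_mean_field`, and `log_phi` are hypothetical and chosen only for this example.

```python
import numpy as np


def mean_field_update(Q, i, j, log_phi):
    """Return the updated marginal Q(A_ij) given the neighbors' current marginals.

    Q       : array of shape (rows, cols, 2); Q[i, j, a] = Q(A_ij = a)
    log_phi : array of shape (2, 2); log_phi[a, b] = ln phi(a, b), assumed
              symmetric and shared by every edge (an illustrative assumption)
    """
    rows, cols, _ = Q.shape
    log_q = np.zeros(2)
    for di, dj in ((-1, 0), (0, -1), (1, 0), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            # Expected log-potential: sum_b Q(neighbor = b) * ln phi(a, b)
            log_q += log_phi @ Q[ni, nj]
    log_q -= log_q.max()  # shift for numerical stability before exponentiating
    q = np.exp(log_q)
    return q / q.sum()  # dividing by the sum plays the role of Z_ij


def run_mean_field(rows, cols, log_phi, sweeps=50, seed=0):
    """Sweep the grid repeatedly, updating one marginal at a time."""
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(2), size=(rows, cols))  # random initial marginals
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                Q[i, j] = mean_field_update(Q, i, j, log_phi)
    return Q
```

Nodes on the border of the grid simply have fewer terms in the exponent. Updating one marginal at a time, using the latest values of its neighbors, matches the local nature of the optimality condition: each Q(Ai,j) is made consistent with its neighbors while they are held fixed.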