The island algorithm is an algorithm for performing inference on hidden Markov models, or their generalization, dynamic Bayesian networks. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes.
The island algorithm is a modification of belief propagation. It trades smaller memory usage for longer running time: while belief propagation takes O(n) time and O(n) memory, the island algorithm takes O(n log n) time and O(log n) memory. On a computer with an unlimited number of processors, this can be reduced to O(n) total time, while still taking only O(log n) memory.[1]
The algorithm
For simplicity, we describe the algorithm on hidden Markov models. It can be easily generalized to dynamic Bayesian networks by using a junction tree.
Belief propagation involves sending a message from the first node to the second, then using this message to compute a message from the second node to the third, and so on until the last node (node N). Independently, it performs the same procedure starting at node N and going in reverse order. The i-th message depends on the (i-1)-th, but the messages going in opposite directions do not depend on one another. The messages coming from both sides are required to calculate the marginal distribution for a node. In normal belief propagation, all messages are stored, which takes O(n) memory.
The island begins by passing messages as usual, but it throws away the i-th message after sending the (i+1)-th one. When the two message-passing procedures meet in the middle, the algorithm recurses on each half of the chain.
Since the chain is divided in two at each recursive step, the depth of the recursion is log(N). Since every message must be passed again at each level of depth, the algorithm takes O(n log n) time on a single processor. Two messages must be stored at each recursive step, so the algorithm uses O(log n) space. Given log(N) processors, algorithm can be run in O(n) time by using a separate processor to do each recursive step (thus taking N/2 + N/4 + N/8 ... = N time on a single processor).
References
- ↑ J. Binder, K. Murphy and S. Russell. Space-Efficient Inference in Dynamic Probabilistic Networks. Int'l, Joint Conf. on Artificial Intelligence, 1997.