Implementing a Markov Chain Algorithm in Perl for English Text Generation

Markov Chain Algorithm in Perl
Michael Conway, CS 265, May 4, 2011

Markov Chain Algorithm

Goal: Mimic proper English composition.

1. Populate prefix hash table with suffix lists.
2. Start at the beginning and jump from prefix to prefix, printing suffixes.
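To make step 1 concrete, here is a minimal sketch of what the prefix table holds for a tiny input. The helper name build_table and the sample sentence are illustrative assumptions, not part of the slides; the sketch uses two-word prefixes.

```perl
use strict;
use warnings;

my $NONWORD = "\n";   # sentinel that cannot appear as a real word

# build_table is a hypothetical helper: it maps each two-word prefix
# to the list of words that followed that prefix in the text.
sub build_table {
    my ($text) = @_;
    my %table;
    my ($w1, $w2) = ($NONWORD, $NONWORD);
    for my $word (split ' ', $text) {
        push @{ $table{$w1}{$w2} }, $word;   # record $word as a suffix
        ($w1, $w2) = ($w2, $word);           # slide the prefix window
    }
    push @{ $table{$w1}{$w2} }, $NONWORD;    # mark the end of the input
    return \%table;
}

my $table = build_table("the cat sat on the cat mat");

# The prefix "the cat" was followed by "sat" once and by "mat" once.
print join(",", @{ $table->{the}{cat} }), "\n";   # prints "sat,mat"
```

Step 2 then amounts to repeatedly picking one entry from the current prefix's list and shifting that word into the prefix.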

Perl Implementation

    # markov.pl: markov chain algorithm for 2-word prefixes

    $MAXGEN = 10000;
    $NONWORD = "\n";
    $w1 = $w2 = $NONWORD;                        # initial state
    while (<>) {                                 # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);              # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);      # add tail

    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};              # array reference
        $r = int(rand @$suf);                    # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);                  # advance chain
    }

Hash Generation

    $w1 = $w2 = $NONWORD;                        # initial state
    while (<>) {                                 # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);              # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);      # add tail

• Iterate over the words on stdin (or in the files named on the command line) and store each word as a suffix of the two words before it
• IMPORTANT code segment: @{$statetab{$w1}{$w2}}
  -> %statetab is an implicitly declared hash
  -> $statetab{$w1} is an implicitly created reference to a hash
  -> @{ } gets the array "referenced" by $statetab{$w1}{$w2}
• Note: <>, foreach, push(), multiple assignment
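The implicit creation described above is usually called autovivification. The following minimal sketch isolates that behaviour; the keys "a" and "b" and the values pushed are placeholders chosen for illustration.

```perl
use strict;
use warnings;

my %statetab;   # starts empty: no keys and no nested structures

# Pushing through two levels of keys that do not exist yet creates them:
# $statetab{a} becomes a hash reference and $statetab{a}{b} an array reference.
push @{ $statetab{a}{b} }, "c";
push @{ $statetab{a}{b} }, "d";

print ref($statetab{a}), "\n";              # prints "HASH"
print ref($statetab{a}{b}), "\n";           # prints "ARRAY"
print scalar(@{ $statetab{a}{b} }), "\n";   # prints "2"
```

This is why the slide code needs no explicit setup for a prefix it has not seen before: the first push for that prefix builds the nested hash and array on the way.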

Output Generation

    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};              # array reference
        $r = int(rand @$suf);                    # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);                  # advance chain
    }

• Same $statetab{$w1}{$w2} construction used for array reference
• Note: rand, exit line, ->, interpolated string in print, multiple assignment
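As a sketch of the generation step in isolation, the example below walks a hand-built table in which every prefix has exactly one suffix, so the result is deterministic. The helper name pick_suffix and the table contents are assumptions made for illustration.

```perl
use strict;
use warnings;

my $NONWORD = "\n";

# pick_suffix is a hypothetical helper: it returns one element of the
# referenced array, chosen uniformly at random.
sub pick_suffix {
    my ($suf) = @_;
    my $r = int(rand @$suf);   # rand evaluates @$suf in scalar context: its length
    return $suf->[$r];         # -> dereferences the array reference
}

# Hand-built table; each suffix list has a single entry.
my %statetab = (
    $NONWORD => { $NONWORD => ["to"], "to" => ["be"] },
    "to"     => { "be" => ["or"] },
    "be"     => { "or" => [$NONWORD] },
);

my ($w1, $w2) = ($NONWORD, $NONWORD);
my @out;
while ((my $t = pick_suffix($statetab{$w1}{$w2})) ne $NONWORD) {
    push @out, $t;
    ($w1, $w2) = ($w2, $t);    # advance the chain
}
print "@out\n";                # prints "to be or"
```

With real input, suffix lists hold several words and repeated words appear several times, so frequent continuations are picked proportionally more often.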

Relative Performance

Pros and Cons

Pros:
• Very short source code
• Necessary structures (array, hash) are built in
• Decent performance

Cons:
• Can be confusing, especially to new users
• Outperformed by some other languages (such as C)
• Difficult to extend to different prefix sizes

Extension: Different Prefix Sizes

    # markov_n.pl: markov chain algorithm for n-word prefixes

    $PREFLEN = 5;                                # or whatever
    $MAXGEN = 80;
    $NONWORD = "\n";
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;                   # initial state
    }
    while (<>) {                                 # read each line of input
        foreach (split) {
            push(@{hash_lookup(\@words)}, $_);
            @words = (@words[1..$#words], $_);   # drop oldest word, append newest
        }
    }
    push(@{hash_lookup(\@words)}, $NONWORD);     # add tail

Extension: Different Prefix Sizes

    @words = ();
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;
    }
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = hash_lookup(\@words);             # array reference
        $r = int(rand @$suf);                    # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        @words = (@words[1..$#words], $t);       # advance chain
    }

    # descend one hash level per prefix word; return a reference to the suffix array
    sub hash_lookup {
        my $ref = \%statetab;
        my @wds = @{$_[0]};                      # first argument: reference to the prefix array
        for (my $i = 0; $i < $#wds; $i++) {      # lexical $i, so the caller's loop counter is not clobbered
            $ref = \%{${$ref}{$wds[$i]}};
        }
        $ref = \@{${$ref}{$wds[$#wds]}};
        return $ref;
    }
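A different way to support arbitrary prefix lengths, not the approach taken in the slides, is to join the prefix words into a single hash key so that one flat hash suffices. The sketch below assumes that the separator "\0" never occurs inside a word; prefix_key and build_table are hypothetical helper names.

```perl
use strict;
use warnings;

my $PREFLEN = 3;
my $NONWORD = "\n";
my $SEP     = "\0";   # assumption: never appears inside an input word

# Hypothetical helper: collapse a list of prefix words into one hash key.
sub prefix_key { return join $SEP, @_ }

sub build_table {
    my ($text) = @_;
    my %table;
    my @prefix = ($NONWORD) x $PREFLEN;      # initial state
    for my $word (split ' ', $text) {
        push @{ $table{ prefix_key(@prefix) } }, $word;
        shift @prefix;                       # drop the oldest word
        push @prefix, $word;                 # append the newest
    }
    push @{ $table{ prefix_key(@prefix) } }, $NONWORD;   # add tail
    return \%table;
}

my $table = build_table("a b c d a b c e");
print join(",", @{ $table->{ prefix_key(qw(a b c)) } }), "\n";   # prints "d,e"
```

The trade-off is that each lookup builds a key string, while the nested-hash version walks $PREFLEN hash levels; which is faster in practice would need measuring.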

Questions?
