This document outlines an implementation of the Markov chain algorithm in Perl, aimed at mimicking proper English composition. The algorithm uses a hash table that maps each prefix to a list of suffixes, built from the input text. It first constructs this table, then generates text by repeatedly choosing a random suffix for the current prefix. The document also covers performance considerations and an extension to other prefix sizes. The code serves as a practical example that readers can explore and adapt for their own projects.
Markov Chain Algorithm in Perl
Michael Conway
CS 265
May 4, 2011
Markov Chain Algorithm
Goal: Mimic proper English composition.
1. Populate the prefix hash table with suffix lists.
2. Start at the beginning and jump from prefix to prefix, printing suffixes.
Perl Implementation
    # markov.pl: markov chain algorithm for 2-word prefixes

    $MAXGEN = 10000;
    $NONWORD = "\n";
    $w1 = $w2 = $NONWORD;                    # initial state
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);          # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);  # add tail

    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};          # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);              # advance chain
    }
Hash Generation
    $w1 = $w2 = $NONWORD;                    # initial state
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);          # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);  # add tail
• Iterate over the words in the input (<> reads stdin or the files named as arguments) and store each word as a suffix of the current prefix
• IMPORTANT code segment: @{$statetab{$w1}{$w2}}
  -> %statetab is an implicitly declared hash
  -> $statetab{$w1} is an implicitly created (autovivified) reference to a hash
  -> @{ } dereferences the array reference stored in $statetab{$w1}{$w2}
• Note: <>, foreach, push(), multiple assignment
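The table-building step can be tried on a tiny input. The following is a minimal sketch, not part of the original slides: the sample sentence is made up, and `use strict` with `my` declarations is added so that the implicit creation of the inner hash and the array stands out.

```perl
use strict;
use warnings;

my $NONWORD = "\n";
my %statetab;                       # prefix pair -> list of suffixes
my ($w1, $w2) = ($NONWORD, $NONWORD);

# Hypothetical sample input, chosen so that one prefix gets two suffixes.
my $text = "the cat sat and the cat ran";

foreach my $word (split ' ', $text) {
    # Autovivification: the inner hash and the array are created
    # the first time this prefix pair is seen.
    push @{ $statetab{$w1}{$w2} }, $word;
    ($w1, $w2) = ($w2, $word);
}
push @{ $statetab{$w1}{$w2} }, $NONWORD;   # mark the end of the input

my @suffixes = @{ $statetab{the}{cat} };
print "the cat -> @suffixes\n";     # prints "the cat -> sat ran"
```

The prefix "the cat" occurs twice in the sample, so its suffix list holds both following words in input order.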
Output Generation
    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};          # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);              # advance chain
    }
• The same $statetab{$w1}{$w2} construction is used, this time to fetch the array reference
• Note: rand (which evaluates @$suf in scalar context, giving the element count), the exit line (assignment and test inside a statement modifier), -> (element access through a reference), interpolated string in print, multiple assignment
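The random selection step can be looked at in isolation. This is a small sketch under assumed data: the suffix list below is hypothetical and repeats one word on purpose, to show that a suffix stored twice is picked roughly twice as often.

```perl
use strict;
use warnings;

# Hypothetical suffix list; 'sat' was stored twice for this prefix.
my $suf = [ 'sat', 'ran', 'sat' ];

my $count = @$suf;               # array in scalar context: element count
my %seen;
for (1 .. 1000) {
    my $r = int(rand @$suf);     # rand takes a number, so @$suf yields the count
    $seen{ $suf->[$r] }++;       # -> indexes through the array reference
}
print "$_: $seen{$_}\n" for sort keys %seen;
```

The exact counts differ from run to run, so no expected output is shown; 'sat' should come out near two thirds of the draws.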
Pros and Cons
• Pros:
  • Very short source code
  • Necessary structures (array, hash) are built in
  • Decent performance
• Cons:
  • Can be confusing, especially to new users
  • Outperformed by some languages (such as C)
  • Difficult to extend to different prefix sizes
Extension: Different Prefix Sizes
    # markov_n.pl: markov chain algorithm for n-word prefixes

    $PREFLEN = 5;                            # or whatever
    $MAXGEN = 80;
    $NONWORD = "\n";
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;               # initial state
    }
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{hash_lookup(\@words)}, $_);
            @words = (@words[1..$#words], $_);
        }
    }
    push(@{hash_lookup(\@words)}, $NONWORD); # add tail
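Extension: Different Prefix Sizes
    @words = ();
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;
    }
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = hash_lookup(\@words);         # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        @words = (@words[1..$#words], $t);   # advance chain
    }

    # Walk the nested hashes, one level per prefix word, and return
    # a reference to the suffix array for the full prefix.
    sub hash_lookup {
        my $ref = \%statetab;
        my @wds = @{$_[0]};                  # $_[0], not @_[0]: one element
        # "my $i" keeps this loop from resetting the global $i that
        # counts generated words in the loop above; without it the
        # $MAXGEN limit is never reached.
        for (my $i = 0; $i < $#wds; $i++) {
            $ref = \%{${$ref}{$wds[$i]}};
        }
        $ref = \@{${$ref}{$wds[$#wds]}};
        return $ref;
    }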
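One possible alternative to walking nested hashes in hash_lookup is to join the prefix words into a single hash key. This is a different technique from the one on the slides, sketched here only for comparison. The separator `$SEP`, the sample text, and the prefix length are assumptions made for the example, and the approach relies on the separator never appearing inside a word.

```perl
use strict;
use warnings;

my $PREFLEN = 3;                 # hypothetical prefix length
my $NONWORD = "\n";
my $SEP     = "\0";              # assumed never to occur inside a word
my %statetab;                    # joined prefix -> list of suffixes

my @words = ($NONWORD) x $PREFLEN;   # initial state
my $text  = "a b c d a b c e";       # hypothetical sample input

foreach my $word (split ' ', $text) {
    push @{ $statetab{ join($SEP, @words) } }, $word;
    shift @words;                # slide the prefix window by one word
    push @words, $word;
}
push @{ $statetab{ join($SEP, @words) } }, $NONWORD;   # add tail

my @suffixes = @{ $statetab{ join($SEP, qw(a b c)) } };
print "a b c -> @suffixes\n";    # prints "a b c -> d e"
```

With a flat key, changing the prefix size only means changing `$PREFLEN`; the trade-off is that the table can no longer be explored one prefix word at a time.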