Prolog for Linguists Symbolic Systems 139P/239P

Prolog for Linguists Symbolic Systems 139P/239P John Dowding Week 5, November 5, 2001 jdowding@stanford.edu

Office Hours
• We have reserved 4 workstations in the Unix Cluster in Meyer library, fables 1-4
• 4:30-5:30 on Thursday this week
• Or, contact me and we can make other arrangements

Course Schedule
• Oct. 8
• Oct. 15
• Oct. 22
• Oct. 29
• Nov. 5 (double up)
• Nov. 12
• Nov. 26 (double up)
• Dec. 3
No class on Nov. 19

More about cut!
• Common to distinguish between red cuts and green cuts
• Red cuts change the solutions of a predicate
• Green cuts do not change the solutions, but affect the efficiency
• Most of the cuts we have used so far are red cuts

    % delete_all(+Element, +List, -NewList)
    delete_all(_Element, [], []).
    delete_all(Element, [Element|List], NewList) :-
        !,
        delete_all(Element, List, NewList).
    delete_all(Element, [Head|List], [Head|NewList]) :-
        delete_all(Element, List, NewList).
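To see why the cut in delete_all/3 is red, it can help to look at the same clauses with the cut taken out. The following is a minimal sketch; the name delete_all_nocut/3 is made up for this illustration and is not part of the course code.

```prolog
% delete_all_nocut(+Element, +List, -NewList)
% Hypothetical variant of delete_all/3 with the cut removed.
% Without the cut, the third clause can also apply when Head is
% identical to Element, so extra (wrong) solutions appear on backtracking.
delete_all_nocut(_Element, [], []).
delete_all_nocut(Element, [Element|List], NewList) :-
    delete_all_nocut(Element, List, NewList).
delete_all_nocut(Element, [Head|List], [Head|NewList]) :-
    delete_all_nocut(Element, List, NewList).
```

The query delete_all_nocut(a, [a,b,a], L) gives L = [b] first, but then also [b,a], [a,b], and [a,b,a] on backtracking. The cut in the original removes those extra answers, which is what makes it a red cut.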

Green cuts
• Green cuts can be used to avoid unproductive backtracking

    % identical(?Term1, ?Term2)
    identical(Var1, Var2) :-
        var(Var1),
        var(Var2),
        !,
        Var1 == Var2.
    identical(Atomic1, Atomic2) :-
        atomic(Atomic1),
        atomic(Atomic2),
        !,
        Atomic1 == Atomic2.
    identical(Term1, Term2) :-
        compound(Term1),
        compound(Term2),
        functor(Term1, Functor, Arity),
        functor(Term2, Functor, Arity),
        identical_helper(Arity, Term1, Term2).

Input/Output of Terms
• Input and Output in Prolog takes place on Streams
• By default, input comes from the keyboard, and output goes to the screen.
• Three special streams:
  • user_input
  • user_output
  • user_error
• read(-Term)
• write(+Term)
• nl

Example: Input/Output
• repeat/0 is a built-in predicate that always succeeds, and succeeds again every time it is backtracked into

    % classifying terms
    classify_term :-
        repeat,
        write('What term should I classify? '),
        nl,
        read(Term),
        process_term(Term),
        Term == end_of_file.

Streams
• You can create streams with open/3
    open(+FileName, +Mode, -Stream)
• Mode is one of read, write, or append.
• When finished reading from or writing to a Stream, it should be closed with close(+Stream)
• There are Stream-versions of other Input/Output predicates
  • read(+Stream, -Term)
  • write(+Stream, +Term)
  • nl(+Stream)
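As a rough illustration of open/3, close/1, and the stream versions of read and write, here is a small sketch that writes a list of terms to a file and reads them back. The predicate names (write_terms/2, read_terms/2, and their helpers) are invented for this example, and it assumes every term is ground and can be read back once it is followed by a full stop.

```prolog
% write_terms(+FileName, +Terms)
% Writes each term on its own line, followed by a full stop.
write_terms(FileName, Terms) :-
    open(FileName, write, Stream),
    write_each(Terms, Stream),
    close(Stream).

write_each([], _Stream).
write_each([Term|Terms], Stream) :-
    writeq(Stream, Term),      % writeq/2 quotes atoms where needed
    write(Stream, '.'),
    nl(Stream),
    write_each(Terms, Stream).

% read_terms(+FileName, -Terms)
% Reads terms until read/2 returns the atom end_of_file.
read_terms(FileName, Terms) :-
    open(FileName, read, Stream),
    read_all(Stream, Terms),
    close(Stream).

read_all(Stream, Terms) :-
    read(Stream, Term),
    read_rest(Term, Stream, Terms).

read_rest(end_of_file, _Stream, []) :- !.
read_rest(Term, Stream, [Term|Terms]) :-
    read_all(Stream, Terms).
```

A call such as write_terms('demo.tmp', [foo(bar), 42, hello]) followed by read_terms('demo.tmp', Terms) should bind Terms to the same list. The file name is only a placeholder.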

Characters and character I/O
• Prolog represents characters in two ways:
  • Single-character atoms: 'a', 'b', 'c'
  • Character codes
    • Numbers that represent the character in some character encoding scheme (like ASCII)
• By default, the character encoding scheme is ASCII, but others are possible for handling international character sets.
• Input and Output predicates for characters follow a naming convention:
  • If the predicate deals with single-character atoms, its name ends in _char.
  • If the predicate deals with character codes, its name ends in _code.
• Characters are character codes in traditional “Edinburgh” Prolog, but single-character atoms were introduced in the ISO Prolog Standard.

Special Syntax I
• Prolog has a special syntax for typing character codes:
  • 0'a is an expression that means the character code that represents the character a in the current character encoding scheme.

Special Syntax II
• A sequence of characters enclosed in double quote marks is a shorthand for a list containing those character codes.
  • "abc" = [97, 98, 99]
• It is possible to change this default behavior to one that uses single-character atoms instead of character codes, but we won’t do that here.

Built-in Predicates:
• atom_chars(Atom, CharacterCodes)
  • Converts an Atom to its corresponding list of character codes,
  • Or, converts a list of CharacterCodes to an Atom.
  • (In ISO Prolog the character-code version is called atom_codes/2, and atom_chars/2 works on single-character atoms.)
• put_code(Code) and put_code(Stream, Code)
  • Write the character represented by Code
• get_code(Code) and get_code(Stream, Code)
  • Read a character, and return its corresponding Code
• Checking the status of a Stream:
  • at_end_of_file(Stream)
  • at_end_of_line(Stream)
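The 0'c notation and the atom/code-list conversion can be tried out with a small sketch like the one below. It uses the ISO name atom_codes/2 for the conversion, which is an assumption about the Prolog system in use, since the slides call the code-list version atom_chars/2. The predicate names are invented for this illustration.

```prolog
% code_of_a(-Code): 0'a denotes the character code of the letter a.
code_of_a(Code) :-
    Code = 0'a.

% upper_code(+Code, -UpperCode)
% Maps the codes of a..z to A..Z and leaves every other code unchanged.
upper_code(Code, Upper) :-
    Code >= 0'a,
    Code =< 0'z,
    !,
    Upper is Code - (0'a - 0'A).
upper_code(Code, Code).

% upper_codes(+Codes, -UpperCodes)
upper_codes([], []).
upper_codes([Code|Codes], [Upper|Uppers]) :-
    upper_code(Code, Upper),
    upper_codes(Codes, Uppers).

% atom_to_upper(+Atom, -UpperAtom)
% Converts the atom to a code list, maps each code, and converts back.
atom_to_upper(Atom, UpperAtom) :-
    atom_codes(Atom, Codes),
    upper_codes(Codes, UpperCodes),
    atom_codes(UpperAtom, UpperCodes).
```

In an ASCII-compatible encoding, code_of_a(C) binds C to 97, and atom_to_upper(hello, U) binds U to 'HELLO'.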

Review homework problems: last/2

    % last(?Element, ?List)
    last(Element, [Element]).
    last(Element, [_Head|Tail]) :-
        last(Element, Tail).

Or

    last(Element, List) :-
        append(_EverythingElse, [Element], List).

evenlist/1 and oddlist/1

    % evenlist(?List)
    evenlist([]).
    evenlist([_Head|Tail]) :-
        oddlist(Tail).

    % oddlist(+List)
    oddlist([_Head|Tail]) :-
        evenlist(Tail).

palindrome/1

    % palindrome1(+List)
    palindrome1([]).
    palindrome1([_OneElement]).
    palindrome1([Head|Tail]) :-
        append(Rest, [Head], Tail),
        palindrome1(Rest).

Or, palindrome/1

    % palindrome(+List)
    palindrome(List) :-
        reverse(List, List).

    % reverse(+List, -ReversedList)
    reverse(List, ReversedList) :-
        reverse(List, [], ReversedList).

    % reverse(List, Partial, ReversedList)
    reverse([], Result, Result).
    reverse([Head|Tail], Partial, Result) :-
        reverse(Tail, [Head|Partial], Result).

subset/2

    % subset(?Set, ?SubSet)
    subset([], []).
    subset([Element|RestSet], [Element|RestSubSet]) :-
        subset(RestSet, RestSubSet).
    subset([_Element|RestSet], SubSet) :-
        subset(RestSet, SubSet).

union/3

    % union(+Set1, +Set2, -SetUnion)
    union([], Set2, Set2).
    union([Element|RestSet1], Set2, [Element|SetUnion]) :-
        union(RestSet1, Set2, SetUnion),
        \+ member(Element, SetUnion),
        !.
    union([_Element|RestSet1], Set2, SetUnion) :-
        union(RestSet1, Set2, SetUnion).

intersect/3

    % intersect(+Set1, +Set2, ?Intersection)
    intersect([], _Set2, []).
    intersect([Element|RestSet1], Set2, [Element|Intersection]) :-
        member(Element, Set2),
        !,
        intersect(RestSet1, Set2, Intersection).
    intersect([_Element|RestSet1], Set2, Intersection) :-
        intersect(RestSet1, Set2, Intersection).

split/4

    % split(+List, +SplitPoint, -Smaller, -Bigger)
    split([], _SplitPoint, [], []).
    split([Head|Tail], SplitPoint, [Head|Smaller], Bigger) :-
        Head =< SplitPoint,
        !,                          % green cut
        split(Tail, SplitPoint, Smaller, Bigger).
    split([Head|Tail], SplitPoint, Smaller, [Head|Bigger]) :-
        Head > SplitPoint,
        split(Tail, SplitPoint, Smaller, Bigger).

merge/3

    % merge(+List1, +List2, -MergedList)
    merge([], List2, List2).
    merge(List1, [], List1).
    merge([Element1|List1], [Element2|List2], [Element1|MergedList]) :-
        Element1 =< Element2,
        !,
        merge(List1, [Element2|List2], MergedList).
    merge(List1, [Element2|List2], [Element2|MergedList]) :-
        merge(List1, List2, MergedList).

Sorting: quicksort/2

    % quicksort(+List, -SortedList)
    quicksort([], []).
    quicksort([Head|UnsortedList], SortedList) :-
        split(UnsortedList, Head, Smaller, Bigger),
        quicksort(Smaller, SortedSmaller),
        quicksort(Bigger, SortedBigger),
        append(SortedSmaller, [Head|SortedBigger], SortedList).

Sorting: mergesort/2

    % mergesort(+List, -SortedList)
    mergesort([], []) :- !.         % cut: the general clause must not retry the empty list
    mergesort([One], [One]) :- !.
    mergesort(List, SortedList) :-
        break_list_in_half(List, FirstHalf, SecondHalf),
        mergesort(FirstHalf, SortedFirstHalf),
        mergesort(SecondHalf, SortedSecondHalf),
        merge(SortedFirstHalf, SortedSecondHalf, SortedList).

Merge sort helper predicates

    % break_list_in_half(+List, -FirstHalf, -SecondHalf)
    break_list_in_half(List, FirstHalf, SecondHalf) :-
        length(List, L),
        HalfL is L // 2,            % integer division, so HalfL stays an integer
        first_n(List, HalfL, FirstHalf, SecondHalf).

    % first_n(+List, +N, -FirstN, -Remainder)
    first_n([Head|Rest], L, [Head|Front], Back) :-
        L > 0,
        !,
        NextL is L - 1,
        first_n(Rest, NextL, Front, Back).
    first_n(Rest, _L, [], Rest).

Lexicographic Ordering
• We can extend the sorting predicates to sort all Prolog terms using a lexicographic ordering on terms.
• Defined recursively:
  • Variables @< Numbers @< Atoms @< CompoundTerms
  • Var1 @< Var2 if Var1 is older than Var2
  • Atom1 @< Atom2 if Atom1 is alphabetically earlier than Atom2.
  • Functor1(Arg11, …, Arg1N) @< Functor2(Arg21, …, Arg2M) if
    • N < M, or
    • N = M and Functor1 @< Functor2, or
    • N = M, Functor1 = Functor2, and
      • Arg11 @< Arg21, or
      • Arg11 == Arg21 and Arg12 @< Arg22, or …

Built-in Relations:
• Less than @<
• Greater than @>
• Less than or equal @=<
• Greater than or equal @>=
• Built-in predicate sort/2 sorts Prolog terms on a lexicographic ordering (and removes duplicates).
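One way to extend a sorting predicate to arbitrary terms is to replace the arithmetic comparison with @=<. The sketch below does this for a simple insertion sort rather than for quicksort/2 or mergesort/2; the names term_sort/2 and insert_sorted/3 are made up for this example.

```prolog
% term_sort(+List, -SortedList)
% Insertion sort that compares elements with @=< and so works on
% any mix of numbers, atoms, and compound terms. Duplicates are kept.
term_sort([], []).
term_sort([Head|Tail], Sorted) :-
    term_sort(Tail, SortedTail),
    insert_sorted(Head, SortedTail, Sorted).

% insert_sorted(+Term, +SortedList, -NewSortedList)
insert_sorted(Term, [], [Term]).
insert_sorted(Term, [Head|Tail], [Term,Head|Tail]) :-
    Term @=< Head,
    !.
insert_sorted(Term, [Head|Tail], [Head|Rest]) :-
    insert_sorted(Term, Tail, Rest).
```

For example, term_sort([f(a), b, 2, a], L) gives L = [2, a, b, f(a)]: the number comes first, then the atoms in alphabetical order, then the compound term.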

Tokenizer
• A token is a sequence of characters that constitute a single unit
• What counts as a token will vary
  • A token for a programming language may be different from a token for, say, English.
• We will start to write a tokenizer for English, and build on it in further classes

Homework
• Read the section in the SICStus Prolog manual on Input/Output
• This material corresponds to Ch. 5 in Clocksin and Mellish, but the Prolog manual is more up to date and consistent with the ISO Prolog Standard
• Improve the tokenizer by adding support for contractions
  • can’t, won’t, haven’t, etc.
  • would’ve, should’ve
  • I’ll, she’ll, he’ll
  • He’s, She’s (contracted is, contracted has, and possessive)
• Don’t hand this in, but hold on to it; you’ll need it later.

My tokenizer
• First, I modified it to turn all tokens into lower case
• Then, added support for integer tokens
• Then, added support for contraction tokens

Converting character codes to lower case

    % occurs_in_word(+Code, -LowerCaseCode)
    occurs_in_word(Code, Code) :-
        Code >= 0'a,
        Code =< 0'z.
    occurs_in_word(Code, LowerCaseWordCode) :-
        Code >= 0'A,
        Code =< 0'Z,
        LowerCaseWordCode is Code + (0'a - 0'A).

Converting to lower case

    % case for regular word tokens
    find_one_token([WordCode|CharacterCodes], Token, RestCharacterCodes) :-
        occurs_in_word(WordCode, LowerCaseWordCode),
        find_rest_word_codes(CharacterCodes, RestWordCodes, RestCharacterCodes),
        atom_chars(Token, [LowerCaseWordCode|RestWordCodes]).

    % find_rest_word_codes(+CharacterCodes, -RestWordCodes, -RestCharacterCodes)
    find_rest_word_codes([WordCode|CharacterCodes], [LowerCaseWordCode|RestWordCodes], RestCharacterCodes) :-
        occurs_in_word(WordCode, LowerCaseWordCode),
        !,                          % red cut
        find_rest_word_codes(CharacterCodes, RestWordCodes, RestCharacterCodes).
    find_rest_word_codes(CharacterCodes, [], CharacterCodes).

Adding integer tokens

    % case for integer tokens
    find_one_token([DigitCode|CharacterCodes], Token, RestCharacterCodes) :-
        digit(DigitCode),
        find_rest_digit_codes(CharacterCodes, RestDigitCodes, RestCharacterCodes),
        atom_chars(Token, [DigitCode|RestDigitCodes]).

    % find_rest_digit_codes(+CharacterCodes, -RestDigitCodes, -RestCharacterCodes)
    find_rest_digit_codes([DigitCode|CharacterCodes], [DigitCode|RestDigitCodes], RestCharacterCodes) :-
        digit(DigitCode),
        !,                          % red cut
        find_rest_digit_codes(CharacterCodes, RestDigitCodes, RestCharacterCodes).
    find_rest_digit_codes(CharacterCodes, [], CharacterCodes).

Digits

    % digit(+Code)
    digit(Code) :-
        Code >= 0'0,
        Code =< 0'9.

Contractions
• Turned unambiguous contractions into the corresponding English word
• Left ambiguous contractions contracted.
• Handled 2 cases
  • Simple contractions:
      He's => He + 's
      He'll => He + will
      They've => They + have
  • Exceptions:
      can't => can + not
      won't => will + not

Simple Contractions

    simple_contraction("'re", "are").
    simple_contraction("'m", "am").
    simple_contraction("'ll", "will").
    simple_contraction("'ve", "have").
    simple_contraction("'d", "'d").     % had, would
    simple_contraction("'s", "'s").     % is, has, possessive
    simple_contraction("n't", "not").

handle_contractions/3

    % handle_contractions(+TokenChars, -FrontTokenChars, -RestTokenChars)
    handle_contractions("can't", "can", "not") :- !.
    handle_contractions("won't", "will", "not") :- !.
    handle_contractions(FoundCodes, Front, NewCodes) :-
        simple_contraction(Contraction, NewCodes),
        append(Front, Contraction, FoundCodes),
        Front \== [],
        !.

Modify find_one_token/3

    % case for regular word tokens
    find_one_token([WordCode|CharacterCodes], Token, RestCharacterCodes) :-
        occurs_in_word(WordCode, LowerCaseWordCode),
        find_rest_word_codes(CharacterCodes, RestWordCodes, TempCharacterCodes),
        handle_contractions([LowerCaseWordCode|RestWordCodes], FirstTokenCodes, CodesToAppend),
        append(CodesToAppend, TempCharacterCodes, RestCharacterCodes),
        atom_chars(Token, FirstTokenCodes).

Dynamic predicates and assert
• Add or remove clauses from a dynamic predicate at run time.
• To specify that a predicate is dynamic, add
    :- dynamic predicate/Arity.
  to your program.
• assert/1 adds a new clause
• retract/1 removes one or more clauses
• retractall/1 removes all clauses for the predicate
• Can’t modify compiled predicates at run time
• Modifying a program while it is running is dangerous

assert/1, asserta/1, and assertz/1
• Asserting facts (most common)
    assert(Fact)
• Asserting rules
    assert( (Head :- Body) ).
• asserta/1 adds the new clause at the front of the predicate
• assertz/1 adds the new clause at the end of the predicate
• assert/1 leaves the order unspecified
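The difference between asserta/1 and assertz/1 shows up in the order in which clauses are later found. A small sketch, using a made-up dynamic predicate note/1 and helper names that are not part of the course code:

```prolog
:- dynamic note/1.

% add_notes: start from an empty predicate, then add three facts.
% Two go on the end with assertz/1, one goes on the front with asserta/1.
add_notes :-
    retractall(note(_)),
    assertz(note(first)),
    assertz(note(second)),
    asserta(note(zeroth)).

% all_notes(-Notes): collect the facts in clause order.
all_notes(Notes) :-
    findall(Note, note(Note), Notes).
```

After add_notes, the query all_notes(Notes) gives Notes = [zeroth, first, second], even though note(zeroth) was added last.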

Built-In: retract/1
• retract(Goal) removes the first clause that matches Goal.
• On REDO, it will remove the next matching clause, if any.
• Retract facts:
    retract(Fact)
• Retract rules:
    retract( (Head :- Body) ).
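The REDO behavior of retract/1 can be observed by either committing to the first removal with a cut or forcing backtracking into it. The sketch below uses a hypothetical dynamic predicate item/1; all names are invented for this example.

```prolog
:- dynamic item/1.

% load_items: reset item/1 to exactly three facts.
load_items :-
    retractall(item(_)),
    assertz(item(a)),
    assertz(item(b)),
    assertz(item(c)).

% remove_first(-X): remove one matching clause and commit to it.
remove_first(X) :-
    retract(item(X)),
    !.

% remove_all(-Xs): findall/3 backtracks into retract/1, so each
% REDO removes the next matching clause until none are left.
remove_all(Xs) :-
    findall(X, retract(item(X)), Xs).
```

After load_items, remove_first(X) gives X = a and leaves item(b) and item(c) in place. A following remove_all(Xs) gives Xs = [b, c] and leaves item/1 with no clauses.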

Built-in: retractall/1
• retractall(Head) removes all facts and rules whose head matches.
• Could be implemented with retract/1 as:

    retractall(Head) :-
        retract(Head),
        fail.
    retractall(Head) :-
        retract( (Head :- _Body) ),
        fail.
    retractall(_Head).

Built-In: abolish(Predicate/Arity)
• abolish(Predicate/Arity) is almost the same as retractall(Predicate(Arg1, …, ArgN)), except that abolish/1 removes all knowledge about the predicate, whereas retractall/1 only removes the clauses of the predicate. That is, if a predicate is declared dynamic, that is remembered after retractall/1, but not after abolish/1.

Example: Stacks & Queues

    :- dynamic stack_element/1.

    empty_stack :-
        retractall(stack_element(_Element)).

    % push_on_stack(+Element)
    push_on_stack(Element) :-
        asserta(stack_element(Element)).

    % pop_from_stack(-Element)
    pop_from_stack(Element) :-
        var(Element),
        retract(stack_element(Element)),
        !.

Queues

    :- dynamic queue_element/1.

    empty_queue :-
        retractall(queue_element(_Element)).

    % put_on_queue(+Element)
    put_on_queue(Element) :-
        assertz(queue_element(Element)).

    % remove_from_queue(-Element)
    remove_from_queue(Element) :-
        var(Element),
        retract(queue_element(Element)),
        !.

Example: prime_number.

    :- dynamic known_prime/1.

    find_primes(Prime) :-
        retractall(known_prime(_Prime)),
        find_primes(2, Prime).

    find_primes(Integer, Integer) :-
        \+ composite(Integer),
        assertz(known_prime(Integer)).
    find_primes(Integer, Prime) :-
        NextInteger is Integer + 1,
        find_primes(NextInteger, Prime).

Example: prime_number (cont)

    % composite(+Integer)
    composite(Integer) :-
        known_prime(Prime),
        0 is Integer mod Prime,
        !.

Aggregation: findall/3.
• findall/3 is a meta-predicate that collects values from multiple solutions to a Goal:
    findall(Value, Goal, Values)
    findall(Child, parent(james, Child), Children)
• Prolog has other aggregation predicates, setof/3 and bagof/3, but we’ll ignore them for now.
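A small sketch of findall/3 in use, built around the parent/2 query shown above. The parent/2 facts and the predicate children/2 are invented for this illustration.

```prolog
% Hypothetical family facts: parent(Parent, Child).
parent(james, charles).
parent(james, elizabeth).
parent(charles, catherine).

% children(+Parent, -Children)
% Collects every Child for which parent(Parent, Child) succeeds,
% in the order the solutions are found.
children(Parent, Children) :-
    findall(Child, parent(Parent, Child), Children).
```

Here children(james, Cs) gives Cs = [charles, elizabeth]. When the goal has no solutions, as in children(catherine, Cs), findall/3 still succeeds and gives Cs = [].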

findall/3 and assert/1
• findall/3 and assert/1 both let you preserve information across failure.

    :- dynamic solutions/1.

    findall(Value, Goal, _Solutions) :-
        retractall(solutions(_)),
        assert(solutions([])),
        call(Goal),
        retract(solutions(S)),
        append(S, [Value], NextSolutions),
        assert(solutions(NextSolutions)),
        fail.
    findall(_Value, _Goal, Solutions) :-
        solutions(Solutions).

Special Syntax III: Operators
• Convenience in writing terms
• We’ve seen them all over already:

    union([Element|RestSet1], Set2, [Element|SetUnion]) :-
        union(RestSet1, Set2, SetUnion),
        \+ member(Element, SetUnion),
        !.

  This is just an easier way to write the term:

    ':-'(union([Element|RestSet1], Set2, [Element|SetUnion]),
         ','(union(RestSet1, Set2, SetUnion),
             ','('\+'(member(Element, SetUnion)),
                 !)))

Operators (cont)
• Operators can come before their arguments (prefix)
  • \+, dynamic
• Or between their arguments (infix)
  • , + is <
• Or after their arguments (postfix)
  • Prolog doesn’t use any of these (yet)
• The same operator can be more than one type
  • :-
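Because operator syntax is only a way of writing ordinary terms, a term written with operators unifies with the same term written in functor notation. The sketch below shows this for a built-in infix operator and for a user-defined one declared with op/3. The ===> operator, its priority of 700, and the grammar rules are all made up for this illustration.

```prolog
% Declare ===> as a non-associative infix operator (hypothetical example).
:- op(700, xfx, ===>).

% Two toy rewrite rules written with the new operator.
rule(s ===> [np, vp]).
rule(np ===> [det, n]).

% rule_parts(?LeftSide, ?RightSide)
% The same rules accessed in plain functor notation.
rule_parts(LeftSide, RightSide) :-
    rule('===>'(LeftSide, RightSide)).

% plus_parts(-Left, -Right)
% 1 + 2 * 3 is the term '+'(1, '*'(2, 3)), since * binds tighter than +.
plus_parts(Left, Right) :-
    1 + 2 * 3 = '+'(Left, Right).
```

The query rule_parts(np, RightSide) gives RightSide = [det, n], and plus_parts(L, R) gives L = 1 and R = 2*3.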
