Prolog is a portmanteau word for programming in logic, with the idea being that we get to stand on the shoulders of George Boole, Augustus De Morgan, John Venn, Christine Ladd... and other giants to write better code.

In practice, tying Prolog to classical logic can be tricky, so I've put together some examples based on my translation of Jennifer Widom’s SQL basics course into SWI Prolog on Swish.

Turning Widom's example student and apply tables into a Venn diagram looks like so:

venn_majors.svg

Her example data (translated from SQL into Prolog) looks like so

%! student(?SID:text, ?SName:text, ?GPA:float, ?SizeHS:integer) is nondet 
student(123, 'Amy',    3.9, 1000).
student(234, 'Bob',    3.6, 1500).
student(345, 'Craig',  3.5, 500).
student(456, 'Doris',  3.9, 1000).
student(567, 'Edward', 2.9, 2000).
student(678, 'Fay',    3.8, 200).
student(789, 'Gary',   3.4, 800).
student(987, 'Helen',  3.7, 800).
student(876, 'Irene',  3.9, 400).
student(765, 'Jay',    2.9, 1500).
student(654, 'Amy',    3.9, 1000).
student(543, 'Craig',  3.4, 2000).

%! apply(?SID:integer, ?CName:text, ?Major:text, ?Decision:text) is nondet
apply(123, 'Stanford', 'CS',             'Y').
apply(123, 'Stanford', 'EE',             'N').
apply(123, 'Berkeley', 'CS',             'Y').
apply(123, 'Cornell',  'EE',             'Y').
apply(234, 'Berkeley', 'biology',        'N').
apply(345, 'MIT',      'bioengineering', 'Y').
apply(345, 'Cornell',  'bioengineering', 'N').
apply(345, 'Cornell',  'CS',             'Y').
apply(345, 'Cornell',  'EE',             'N').
apply(678, 'Stanford', 'history',        'Y').
apply(987, 'Stanford', 'CS',             'Y').
apply(987, 'Berkeley', 'CS',             'Y').
apply(876, 'Stanford', 'CS',             'N').
apply(876, 'MIT',      'biology',        'Y').
apply(876, 'MIT',      'marine biology', 'N').
apply(765, 'Stanford', 'history',        'Y').
apply(765, 'Cornell',  'history',        'N').
apply(765, 'Cornell',  'psychology',     'Y').
apply(543, 'MIT',      'CS',             'N').

(Please note the tablename apply is completely unrelated to Prolog's builtin apply(:Goal, +List). I cut 'n pasted Widom's tablename apply without remembering the word already has a common usage. Then again, not getting confused by preconceived ideas about what words mean is a key skill in mastering logic).

I'm going to use Stephen Cole Kleene's 15 page textbook as my template.

Something Kleene doesn't do is tie logic to set theory, which John Venn famously did in his 1881 book Symbolic Logic. (Logic is inherently a beautifully simple, minimalistic subject which came to be ruined by proliferation of pointless symbols, but that's not Venn's fault).

A more modern notation for combining sets and logic is set-builder notation which I'll use here.

p — Student IDs of those who have applied for CS

Looking at the above Venn diagram, what we are looking for is the set of five student IDs P = {123, 345, 543, 876, 987} who applied for 'CS' (several times in some cases).

Basic set builder notation looks like P = {p|Φ(p)} where Φ(p) in this case is the boolean test Major == ‘CS’, a true or false test called a proposition.

A simple way to do this is in Prolog is:

apply(SID, _, 'CS', _).

One of the basic ideas of sets is no duplicates, and with the above query students 123 and 987 get listed twice because they applied for ‘CS’ at both Stanford and Berkeley. SWI Prolog has the handy builtin distinct(?Witness, :Goal), so we get what we want with:

distinct(SID, apply(SID, _, 'CS', _)).

¬p — Student IDs of those who have not applied for CS

Here we want the set of seven student IDs P' = {234, 456, 567, 654, 678, 765, 789} who have not applied for 'CS'. It's probably a good idea to try figure out how to do this yourself before reading on. It's easy when you know how, but several traps await novices figuring it out.

A mistake most people will probably make on their first attempt is:

apply(SID, _, _Major, _), _Major \== 'CS'.

This will include the three students who applied for 'CS' along with alternative majors. Rather than "not p", it's easiest to think in its set equivalent, complement P'. The next trap is to think we can the get complement of apply(SID, _, 'CS', _). by negating it as in

\+apply(SID, _, 'CS', _).

The above will always return false due to a convention called negation as failure, NAF to its friends. All this tells us is the above query is ambiguous and needs to be defined more clearly.

Here the complement P' is the set difference E - P, where the environment set E needs to be clearly defined.

In set-builder notation:

P' = {p ∈ E|p ∉ P}.

Another trap is to think the environment set is apply rather than student, which will lose the four students who haven't applied for anything. We get our desired result with

student(SID, _, _, _), \+apply(SID, _, 'CS', _).

Whereas p ∧ ¬q could be rewritten ¬q ∧ p in boolean algebra, a subtlety of Prolog is the above won't work switched around. Thinking in terms of +,-,?... modes or instantiation patterns, variables which can be looked up when a predicate is used unnegated have to supplied when it is negated.

In this example

apply(?SID:integer, ?CName:text, ?Major:text, ?Decision:text) 

becomes

\+apply(+SID:integer, +CName:text, +Major:text, +Decision:text)

ie the value of SID has to be set by student(SID, _, _, _) before it can be used by \+apply(SID, _, 'CS', _)

p ∧ q — Student IDs of those who applied for CS and EE

A handy thing about the copula notation of ∧ for and and ∨ for or is they are "spiky" versions of their "rounded" set counterparts ∩ for intersection and ∪ for union .

In this example, we want the intersection of CS and EE, which is {123, 345}. We get this by anding two Prolog statements with a comma. I've wrapped them in distinct to weed out duplicates.

distinct(SID, (apply(SID, _, 'CS', _), apply(SID, _, 'EE', _))).

I we wanted to find all majors with shared student applications, the intersection query above could be generalised to:

distinct([P, Q], (apply(_SID, _, P, _), apply(_SID, _, Q, _), P @< Q)).

This gives us the "joined" sets. The P @< Q filter avoids the result getting cluttered with every set being joined with itself along with Q ∩ P for every P ∩ Q.

venn_majors.svg

P

Q

'CS'

'EE'

'CS'

bioengineering

'EE'

bioengineering

'CS'

biology

'CS'

'marine biology'

biology

'marine biology'

history

psychology

Later, I'll use ¬(p ∧ q) to get the disjointed sets.

I'll be using two value algebra which dates back to logic's origins: George Boole started out from probability theory where no chance (ie 0) equates to false and definitely (ie 1) equates to true. Stripping arithmetic down to these two numbers makes logic nearly identical to normal algebra provided we substitute multiplication for and and addition for or.

p

q

p ∧ q

p * q

1

1

1

1

1

0

0

0

0

1

0

0

0

0

0

0

My initial reaction to and not being addition was that seems weird since the two words are associated, so maybe this is an example of what makes logic a metalanguage. As with multiplication, if there are any zeros in a series of propositions getting anded, the result is zero. While the traditional series product symbol ∏ to show this would do fine, the universal quantifier symbol ∀ was invented to make a really simple concept complex.

p ∨ q — Student IDs of those who applied for 'CS' or 'Computer Science'

A real life version of this example database is bound to suffer from the curse of synonyms as in different universities writing majors in different ways. Here Prolog's semicolon or comes in handy.

apply(SID, _, 'CS', _); apply(SID, _, 'Computer Science', _).

The more formal name for or is disjunction, and it introduces complexities we don't get with conjunction (the formal name for and). A common snag is the alternatives to be handled tend to proliferate, in conventional programing languages generating a tangle of if...then...else if... spaghetti (an awful convention Prolog attempts to emulate with the -> operator).

Rather than use semicolons, it's often easier in Prolog to handle ors by writing a separate rule for each case:

cs(SID) :-
    apply(SID, _, 'CS', _).

cs(SID) :-
    apply(SID, _, 'Computer Science', _).

Another snag with or is whether we want to short circuit the ors, ie just use the first, or execute all matching cases. In Prolog, short circuiting can be done with !. These are design decisions for which there's no one-size-fits-all answer.

The analogy between or and addition isn't quite as perfect as and and multiplication in that addition in two value algebra maxes at 1.

p

q

p ∨ q

p + q

1

1

1

1

1

0

1

1

0

1

1

1

0

0

0

0

Again the traditional series sum symbol Σ would do fine, but the existential quantifier symbol ∃ is used instead.

p ⇒ q — Which majors are subsets of others

Logic's arrow operator is confusingly and controversially called implication. Its truth table looks like this:

p

q

p ⇒ q

1

1

1

1

0

0

0

1

1

0

0

1

p ⇒ q is conventionally expanded into ¬p ∨ q, which made little sense until I had a bit of an epiphany while figuring out the subsets in this example.

Firstly, an easier way to figure out the above truth table (which works perfectly but I've never seen in any textbook) looks like so:

p

q

p ≤ q

1

1

1

1

0

0

0

1

1

0

0

1

Another thing that nice about using ≤ instead of ⇒ is that just as ∧ relates to ∩ and ∨ to ∪, ≤ is again the "spiky" relative of the "rounded" subset symbol ⊆.

I'm going to use this to write a query to return all subsets. Here's the Venn diagram from above as a reminder:

venn_majors.svg

My code, which I'll elaborate on below, looks as follows:

implies_(P, Q) :-
    apply(_, _, P, _), 
    apply(_, _, Q, _),
    \+ (    apply(SID, _, P, _),
           \+apply(SID, _, Q, _)
       ).

implies(P, Q) :-
    distinct([P, Q], implies_(P, Q)).

Everything implies itself (ie P ⊆ P always holds), so to strip these out, we run the following query with a P \== Q filter to omit listing every set.

implies(P, Q), P \== Q.

which produces:

P

Q

'EE'

'CS'

bioengineering

'CS'

'marine biology'

'CS'

bioengineering

'EE'

'marine biology'

biology

psychology

history

To explain how I wrote the somewhat convoluted auxiliary rule for implies(P, Q), it started from the query "which students applied for 'EE' but not 'CS'" for which in this example there are none, and Prolog responds to the query with false.

apply(SID, _, 'EE', _), \+apply(SID, _, 'CS', _).

This got me thinking what this false means, and it tells us 'EE' is a subset of 'CS' when it's true the above query is false, ie

\+ (apply(SID, _, 'EE', _), \+apply(SID, _, 'CS', _)).

A quirk of Prolog, or at least SWI Prolog, I only discovered here is that when you negate bracketed statements, you need the space between \+ and the opening bracket.

As explained above in the ¬p section, negated statements need to be supplied with input values, so replacing 'EE' with P and 'CS' with Q means first adding statements to set these values:

apply(_, _, P, _),
apply(_, _, Q, _),
\+ (apply(SID, _, P, _), \+apply(SID, _, Q, _)).

To tie the above into how p ⇒ q is conventionally substituted into ¬p ∨ q is a nice example of De Morgan's law in action: ¬(p ∧ ¬q).

While we tend to think of De Morgan’s laws as the logic substitution rules ¬(p∧q) ⇔ ¬p∨¬q and ¬(p∨q) ⇔ ¬p∧¬q, interestingly he discovered them from writings on set theory in ancient Indian texts, which in our notation would be (P ∩ Q)' ≡ P' ∪ Q' and (P ∪ Q)' ≡ P' ∩ Q'

p ⇔ q — Finding sets containing the same elements

The given example doesn't include any identical sets, so lets create one by having student 678 apply for psychology besides history, so now both majors contain {678, 765}.

apply(678, 'Stanford', 'psychology',     'Y').

In set theory, P ≡ Q means P ⊆ Q and Q ⊆ P. So we can build an equality predicate out of the above implies predicate like so:

equality(P, Q) :-
    implies(P, Q),
    implies(Q, P).

Lets run this without all the "everything equals itself" answers along with reciprocal equalities by using P @< Q as a filter:

equality(P, Q), P @< Q.

which returns

P

Q

history

psychology

While classic logic textbooks usually point out p ⇔ q is equivalent to p ⇒ q ∧ q ⇒ p, they tend to write it as (p ∧ q) ∨ (¬p ∧ ¬q). Getting to there from p ⇒ q ∧ q ⇒ p which expands to (¬p ∨ q) ∧ (¬q ∨ p), and deriving that I found an interesting exercise in boolean algebra.

The easiest way to do the required algebraic substitution is rather than use copula notation, lets replace and with multiplication and or with addition.

So we rewrite (¬p ∨ q) ∧ (¬q ∨ p) as (¬p + q)(¬q + p) which normal algebra expands to

¬p¬q + ¬pp + q¬q + qp

Now multiplying ¬p by p is always going to result in zero, as will multiplying q by ¬q, so we can rewrite it

¬p¬q + 0 + 0 + qp which can be simplified further to qp + ¬p¬q

Replace multiplication with ∧ and addition with ∨, and voila

(q ∧ p) ∨ (¬q ∧ ¬p)

¬(p ∧ q) — Finding all disjointed sets

This query is for all those sets whose intersection is the empty set (the set equivalent of false), which in logic equates to nand (ie not and).

disjoint_(P, Q) :-
    apply(_, _, P, _),
    apply(_, _, Q, _),
    \+ (apply(SID, _, P, _), apply(SID, _, Q, _)).

disjoint(P, Q) :-
    distinct([P, Q], disjoint_(P, Q)).

This query

disjoint(P, Q), P @< Q.

venn_majors.svg

Then gives us all the sets with no common elements:

P

Q

'CS'

history

'CS'

psychology

'EE'

biology

'EE'

history

'EE'

'marine biology'

'EE'

psychology

biology

history

biology

psychology

bioengineering

biology

bioengineering

history

bioengineering

'marine biology'

bioengineering

psychology

history

'marine biology'

'marine biology'

psychology

ClassicLogic (last edited 2021-10-11 10:29:07 by RobertLaing)