Family Relationships Domain

Overview: This example learns rules about family relationships from examples involving Harry Potter characters, then applies those rules to characters from Pride and Prejudice.

from srlearn.datasets import load_toy_father

train, test = load_toy_father()

The training examples in the “Toy Father” dataset describe relationships and facts about Harry Potter characters.

The first positive example, father(harrypotter,jamespotter), means “James Potter is the father of Harry Potter.” Note the argument order: the child comes first and the father second.

The first negative example, father(harrypotter,mrgranger), can be read as “Mr. Granger is not the father of Harry Potter.”
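To make the argument order concrete, here is a minimal sketch that splits a ground literal string such as father(harrypotter,jamespotter). into its predicate and arguments, then reads it back as an English sentence. The helpers parse_literal and describe_father are hypothetical and not part of srlearn; they assume ground literals with no quoting and no nested terms.

```python
import re

# Hypothetical helper (not part of srlearn): split a ground literal such as
# "father(harrypotter,jamespotter)." into ("father", ["harrypotter", "jamespotter"]).
# Assumes plain constants, no quoting, and no nested terms.
_LITERAL = re.compile(r"^\s*(\w+)\((.*)\)\.?\s*$")


def parse_literal(text):
    match = _LITERAL.match(text)
    if match is None:
        raise ValueError(f"not a ground literal: {text!r}")
    predicate, args = match.groups()
    return predicate, [arg.strip() for arg in args.split(",")]


def describe_father(text, positive=True):
    # In this dataset father(Child, Father) reads "Father is the father of Child".
    predicate, (child, parent) = parse_literal(text)
    if predicate != "father":
        raise ValueError(f"expected a father/2 literal, got {predicate!r}")
    verb = "is" if positive else "is not"
    return f"{parent} {verb} the father of {child}."


print(describe_father("father(harrypotter,jamespotter)."))
print(describe_father("father(harrypotter,mrgranger).", positive=False))
```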

print(train.pos[0], "→    James Potter is the father of Harry Potter.")
print(train.neg[0], "  → Mr. Granger is not the father of Harry Potter.")

father(harrypotter,jamespotter). →    James Potter is the father of Harry Potter.
father(harrypotter,mrgranger).   → Mr. Granger is not the father of Harry Potter.

The facts use three additional predicates: male, siblingof, and childof. Here childof(jamespotter,harrypotter). states that Harry Potter is a child of James Potter.

train.facts

['male(mrgranger).', 'male(jamespotter).', 'male(harrypotter).', 'male(luciusmalfoy).', 'male(dracomalfoy).', 'male(arthurweasley).', 'male(ronweasley).', 'male(fredweasley).', 'male(georgeweasley).', 'male(hagrid).', 'male(dumbledore).', 'male(xenophiliuslovegood).', 'male(cygnusblack).', 'siblingof(ronweasley,fredweasley).', 'siblingof(ronweasley,georgeweasley).', 'siblingof(ronweasley,ginnyweasley).', 'siblingof(fredweasley,ronweasley).', 'siblingof(fredweasley,georgeweasley).', 'siblingof(fredweasley,ginnyweasley).', 'siblingof(georgeweasley,ronweasley).', 'siblingof(georgeweasley,fredweasley).', 'siblingof(georgeweasley,ginnyweasley).', 'siblingof(ginnyweasley,ronweasley).', 'siblingof(ginnyweasley,fredweasley).', 'siblingof(ginnyweasley,georgeweasley).', 'childof(mrgranger,hermione).', 'childof(mrsgranger,hermione).', 'childof(jamespotter,harrypotter).', 'childof(lilypotter,harrypotter).', 'childof(luciusmalfoy,dracomalfoy).', 'childof(narcissamalfoy,dracomalfoy).', 'childof(arthurweasley,ronweasley).', 'childof(mollyweasley,ronweasley).', 'childof(arthurweasley,fredweasley).', 'childof(mollyweasley,fredweasley).', 'childof(arthurweasley,georgeweasley).', 'childof(mollyweasley,georgeweasley).', 'childof(arthurweasley,ginnyweasley).', 'childof(mollyweasley,ginnyweasley).', 'childof(xenophiliuslovegood,lunalovegood).', 'childof(cygnusblack,narcissamalfoy).']
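The facts are stored as plain strings. As a minimal sketch, assuming the predicate(arg1,arg2). format shown above, the illustrative helper index_facts below groups a few of these facts into a per-predicate table of argument tuples, the kind of lookup a learner performs when it tests whether a candidate literal holds. It is not an srlearn function.

```python
from collections import defaultdict


def index_facts(facts):
    """Map each predicate name to a set of argument tuples (illustrative helper)."""
    table = defaultdict(set)
    for fact in facts:
        head, _, rest = fact.rstrip(".").partition("(")
        args = tuple(arg.strip() for arg in rest.rstrip(")").split(","))
        table[head].add(args)
    return dict(table)


# A few of the facts listed above.
facts = [
    "male(jamespotter).",
    "male(harrypotter).",
    "siblingof(ronweasley,ginnyweasley).",
    "childof(jamespotter,harrypotter).",
    "childof(lilypotter,harrypotter).",
]

kb = index_facts(facts)
print(sorted(kb))
print(("jamespotter", "harrypotter") in kb["childof"])
```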

Our aim is to learn what a “father” is in terms of the facts we have available. This process is usually called induction, and is often portrayed as “learning a definition of an object.”

from srlearn.rdn import BoostedRDNClassifier
from srlearn import Background

bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name)."
    ],
    number_of_clauses=8,
)
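The strings passed to modes declare which predicates the learner may use and how their arguments are typed. As we understand the BoostSRL mode convention, +name marks an argument of type name that must be an input (already bound) variable, while a - prefix would mark an output variable and # a constant. The hypothetical parser below only illustrates that reading of the syntax; it is not how srlearn processes modes internally.

```python
import re

# Illustrative parser (not an srlearn API) for mode strings like
# "childof(+name,+name)." Each argument is a prefix followed by a type name.
# Assumed prefix meanings, following the BoostSRL mode convention:
#   "+" input variable, "-" output variable, "#" constant.
PREFIXES = {"+": "input", "-": "output", "#": "constant"}
MODE = re.compile(r"^(\w+)\((.*)\)\.$")


def parse_mode(mode):
    match = MODE.match(mode.strip())
    if match is None:
        raise ValueError(f"unrecognized mode: {mode!r}")
    predicate, body = match.groups()
    args = []
    for arg in body.split(","):
        arg = arg.strip()
        args.append((PREFIXES[arg[0]], arg[1:]))
    return predicate, args


modes = [
    "male(+name).",
    "father(+name,+name).",
    "childof(+name,+name).",
    "siblingof(+name,+name).",
]
for mode in modes:
    print(parse_mode(mode))
```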

clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=1,
    n_estimators=5,
)

clf.fit(train)

/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/latest/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
  ", pass one or the other as an argument to suppress this warning.", FutureWarning)

BoostedRDNClassifier(background=setParam: numOfClauses=8.
setParam: numOfCycles=100.
usePrologVariables: true.
setParam: nodeSize=1.
setParam: maxTreeDepth=3.
mode: male(+name).
mode: father(+name,+name).
mode: childof(+name,+name).
mode: siblingof(+name,+name).
, n_estimators=5, neg_pos_ratio=2, solver='BoostSRL', target='father')

It’s important to check whether we actually learn something useful. We’ll visually inspect the relational regression trees to see what they learned.

from srlearn.plotting import plot_digraph
from srlearn.plotting import export_digraph

plot_digraph(export_digraph(clf, 0), format="html")

There is some variance between runs, but the concept the trees pick up on is roughly “A father has a child and is male.”
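In boosted relational dependency networks, each tree is a relational regression tree whose leaves hold real-valued scores; roughly, the scores an example reaches in every tree are summed and passed through a logistic (sigmoid) function to give a probability. The sketch below hand-codes two small trees that mirror “A father has a child and is male.” The leaf values are invented for illustration and are not the values srlearn learned; the facts are a subset of the training facts.

```python
import math

# A subset of the training facts. childof(X, Y) means Y is a child of X.
CHILDOF = {
    ("jamespotter", "harrypotter"),
    ("mrgranger", "hermione"),
    ("mollyweasley", "ronweasley"),
}
MALE = {"jamespotter", "mrgranger", "harrypotter"}


def toy_tree(a, b):
    """Hand-written regression tree for father(A, B); leaf values are made up."""
    if (b, a) in CHILDOF:   # childof(B, A): A is a child of B
        if b in MALE:       # male(B)
            return 0.8
        return -0.4
    return -0.6


def toy_tree_2(a, b):
    """A second, hypothetical tree that only tests male(B)."""
    return 0.3 if b in MALE else -0.2


def probability(a, b, trees):
    # Sum the regression values from every tree, then squash with a sigmoid.
    score = sum(tree(a, b) for tree in trees)
    return 1.0 / (1.0 + math.exp(-score))


trees = [toy_tree, toy_tree_2]
print(round(probability("harrypotter", "jamespotter", trees), 3))
print(round(probability("ronweasley", "mollyweasley", trees), 3))
print(round(probability("harrypotter", "mrgranger", trees), 3))
```

Each additional tree adds one more term to the sum, which is why n_estimators controls how far the probabilities can move away from 0.5.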

plot_digraph(export_digraph(clf, 1), format="html")

The data here is fairly complete, and the concept “A father has a child and is male” seems sufficient for it. Let’s apply the learned model to the test data, which contains facts about characters from Jane Austen’s Pride and Prejudice.

predictions = clf.predict_proba(test)

print("{:<35} {}".format("Predicate", "Probability of being True"))
print("-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))

Predicate                           Probability of being True
------------------------------------------------------------
father(elizabeth,mrbennet).         0.47
father(jane,mrbennet).              0.47
father(charlotte,mrlucas).          0.47
father(charlotte,mrsbennet).        0.07
father(jane,mrlucas).               0.09
father(mrsbennet,mrbennet).         0.09
father(jane,elizabeth).             0.07
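The true father facts sit just below the conventional 0.5 cutoff. A quick sketch, using the probabilities copied from the table above, shows what this means for hard predictions. The labels assume the first three rows come from test.pos (the true father facts) and the rest from test.neg; the 0.3 threshold is an arbitrary value chosen for comparison.

```python
# Probabilities copied from the table above (node_size=1).
# Assumed labels: the first three rows are test.pos, the rest test.neg.
probs = [0.47, 0.47, 0.47, 0.07, 0.09, 0.09, 0.07]
labels = [1, 1, 1, 0, 0, 0, 0]


def accuracy_at(threshold, probs, labels):
    """Fraction of examples classified correctly when predicting True at p >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


print(accuracy_at(0.5, probs, labels))  # all three fathers fall below the cutoff
print(accuracy_at(0.3, probs, labels))  # a lower, arbitrary cutoff separates them
```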

The probabilities assigned to the true fathers are a little low (just under 0.5), which is a good excuse to mention one of the hyperparameters. “Node size,” or node_size, sets the maximum number of literals that can appear in a single node of each relational regression tree. We set node_size=1 above for demonstration, but the concept that seems to be learned, father(A, B) = [childof(B, A), male(B)], has size 2.
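To see why the concept has size 2, here is the same rule written as a plain Python predicate over a handful of the training facts. It is a hand-written sketch, not the model srlearn learns: the body needs both literals to hold, so with node_size=1 a tree has to spend two separate splits to test them, whereas (as we understand the parameter) node_size=2 allows both in a single node.

```python
# Hand-written version of the concept father(A, B) :- childof(B, A), male(B).
# A subset of the training facts; childof(X, Y) means Y is a child of X.
CHILDOF = {
    ("jamespotter", "harrypotter"),
    ("lilypotter", "harrypotter"),
    ("mrgranger", "hermione"),
    ("arthurweasley", "ronweasley"),
}
MALE = {"jamespotter", "harrypotter", "mrgranger", "arthurweasley", "ronweasley"}


def father(a, b):
    # Two literals must both hold: childof(B, A) and male(B).
    return (b, a) in CHILDOF and b in MALE


print(father("harrypotter", "jamespotter"))  # True: positive training example
print(father("harrypotter", "mrgranger"))    # False: negative training example
print(father("harrypotter", "lilypotter"))   # False: childof holds, male does not
```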

We might be able to learn a better model by taking this new information into account:

bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name)."
    ],
    number_of_clauses=8,
)

clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=2,                # <--- Changed from 1 to 2
    n_estimators=5,
)

clf.fit(train)

plot_digraph(export_digraph(clf, 0), format="html")

/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/latest/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
  ", pass one or the other as an argument to suppress this warning.", FutureWarning)

The learned trees appear much more stable, which should also be reflected in the probabilities assigned to the test examples.

predictions = clf.predict_proba(test)

print("{:<35} {}".format("Predicate", "Probability of being True"))
print("-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))

Predicate                           Probability of being True
------------------------------------------------------------
father(elizabeth,mrbennet).         0.74
father(jane,mrbennet).              0.74
father(charlotte,mrlucas).          0.74
father(charlotte,mrsbennet).        0.09
father(jane,mrlucas).               0.09
father(mrsbennet,mrbennet).         0.09
father(jane,elizabeth).             0.09
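It is worth noting that both runs rank the test examples the same way; the improvement is in the gap between positives and negatives. A small sketch, computed from the rounded probabilities printed in the two tables, measures ROC AUC as the fraction of correctly ordered positive/negative pairs, plus the margin between the lowest positive and the highest negative. As before, the labels assume the first three rows come from test.pos.

```python
def roc_auc(probs, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def margin(probs, labels):
    """Lowest positive probability minus highest negative probability."""
    lowest_pos = min(p for p, y in zip(probs, labels) if y == 1)
    highest_neg = max(p for p, y in zip(probs, labels) if y == 0)
    return lowest_pos - highest_neg


labels = [1, 1, 1, 0, 0, 0, 0]  # assumed: first three rows come from test.pos
run_node_size_1 = [0.47, 0.47, 0.47, 0.07, 0.09, 0.09, 0.07]
run_node_size_2 = [0.74, 0.74, 0.74, 0.09, 0.09, 0.09, 0.09]

for name, probs in [("node_size=1", run_node_size_1), ("node_size=2", run_node_size_2)]:
    print(name, "AUC =", roc_auc(probs, labels), "margin =", round(margin(probs, labels), 2))
```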

Total running time of the script: (0 minutes 3.171 seconds)
