Family Relationships Domain

Overview: This example motivates learning about family relationships from examples of Harry Potter characters, then applies those rules to characters from Pride and Prejudice.

from srlearn.datasets import load_toy_father

train, test = load_toy_father()

The training examples in the “Toy Father” dataset describe relationships and facts about Harry Potter characters.

The first positive example: father(harrypotter,jamespotter). means “James Potter is the father of Harry Potter.”

The first negative example: father(harrypotter,mrgranger). can be interpreted as “Mr. Granger is not the father of Harry Potter.”

print(train.pos[0], "→    James Potter is the father of Harry Potter.")
print(train.neg[0], "  → Mr. Granger is not the father of Harry Potter.")

Out:

father(harrypotter,jamespotter). →    James Potter is the father of Harry Potter.
father(harrypotter,mrgranger).   → Mr. Granger is not the father of Harry Potter.
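
Each example is a plain string of the form name(arg1,arg2). with a trailing period. As a minimal sketch of taking one apart, assuming a hypothetical helper named parse_predicate (it is not part of srlearn):

```python
import re

# Hypothetical helper for illustration only; srlearn does not provide it.
_PREDICATE = re.compile(r"^(\w+)\(([^()]*)\)\.$")


def parse_predicate(text):
    """Split 'father(a,b).' into its name and a tuple of arguments."""
    match = _PREDICATE.match(text.strip())
    if match is None:
        raise ValueError(f"not a ground predicate: {text!r}")
    name, args = match.groups()
    return name, tuple(arg.strip() for arg in args.split(","))


# In this dataset the first argument is the child, the second the father.
name, (child, father) = parse_predicate("father(harrypotter,jamespotter).")
print(name, child, father)  # father harrypotter jamespotter
```

Reading the argument order this way matches the interpretation given above: the second argument is the father of the first.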

The facts contain three additional predicates: childof, male, and siblingof. They describe who is a child of whom, who is male, and who is a sibling of whom.

print(train.facts)

Out:

['male(mrgranger).', 'male(jamespotter).', 'male(harrypotter).', 'male(luciusmalfoy).', 'male(dracomalfoy).', 'male(arthurweasley).', 'male(ronweasley).', 'male(fredweasley).', 'male(georgeweasley).', 'male(hagrid).', 'male(dumbledore).', 'male(xenophiliuslovegood).', 'male(cygnusblack).', 'siblingof(ronweasley,fredweasley).', 'siblingof(ronweasley,georgeweasley).', 'siblingof(ronweasley,ginnyweasley).', 'siblingof(fredweasley,ronweasley).', 'siblingof(fredweasley,georgeweasley).', 'siblingof(fredweasley,ginnyweasley).', 'siblingof(georgeweasley,ronweasley).', 'siblingof(georgeweasley,fredweasley).', 'siblingof(georgeweasley,ginnyweasley).', 'siblingof(ginnyweasley,ronweasley).', 'siblingof(ginnyweasley,fredweasley).', 'siblingof(ginnyweasley,georgeweasley).', 'childof(mrgranger,hermione).', 'childof(mrsgranger,hermione).', 'childof(jamespotter,harrypotter).', 'childof(lilypotter,harrypotter).', 'childof(luciusmalfoy,dracomalfoy).', 'childof(narcissamalfoy,dracomalfoy).', 'childof(arthurweasley,ronweasley).', 'childof(mollyweasley,ronweasley).', 'childof(arthurweasley,fredweasley).', 'childof(mollyweasley,fredweasley).', 'childof(arthurweasley,georgeweasley).', 'childof(mollyweasley,georgeweasley).', 'childof(arthurweasley,ginnyweasley).', 'childof(mollyweasley,ginnyweasley).', 'childof(xenophiliuslovegood,lunalovegood).', 'childof(cygnusblack,narcissamalfoy).']
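
To see which predicates a list of facts uses, one option is to count the text before the first parenthesis. This is a small sketch over a handful of facts copied from the output above, not the full dataset:

```python
from collections import Counter

# A few facts copied from the list above; the real dataset contains more.
facts = [
    "male(jamespotter).",
    "male(harrypotter).",
    "siblingof(ronweasley,ginnyweasley).",
    "childof(jamespotter,harrypotter).",
    "childof(lilypotter,harrypotter).",
]

# The predicate name is everything before the first "(".
predicate_counts = Counter(fact.split("(", 1)[0] for fact in facts)
print(sorted(predicate_counts.items()))
# [('childof', 2), ('male', 2), ('siblingof', 1)]
```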

Our aim is to learn what a “father” is in terms of the facts we have available. This process is usually called induction, and is often portrayed as “learning a definition of an object.”

from srlearn.rdn import BoostedRDNClassifier
from srlearn import Background

bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name)."
    ],
    number_of_clauses=8,
)

clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=1,
    n_estimators=5,
)

clf.fit(train)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/stable/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
  ", pass one or the other as an argument to suppress this warning.", FutureWarning)

BoostedRDNClassifier(background=setParam: numOfClauses=8.
setParam: numOfCycles=100.
usePrologVariables: true.
setParam: nodeSize=1.
setParam: maxTreeDepth=3.
mode: male(+name).
mode: father(+name,+name).
mode: childof(+name,+name).
mode: siblingof(+name,+name).
, n_estimators=5, neg_pos_ratio=2, solver='BoostSRL', target='father')

It’s important to check whether we actually learn something useful. We’ll visually inspect the relational regression trees to see what they learned.

from srlearn.plotting import plot_digraph
from srlearn.plotting import export_digraph

plot_digraph(export_digraph(clf, 0), format="html")

Out:

<srlearn.plotting._GVPlotter object at 0x7f040d25e190>

There is some variance between runs, but the concept that the trees pick up on is roughly that “A father has a child and is male.”
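
For intuition, that concept can be written by hand as a rule and checked against a few of the training facts. This is only a sketch of the idea in plain Python, not what srlearn does internally; the function name is_father is made up for illustration, and the third query is an extra one added here rather than an example from the dataset.

```python
# Facts copied from the training data shown earlier, stored as Python sets.
male = {"jamespotter", "mrgranger", "harrypotter"}
childof = {  # childof(parent, child)
    ("jamespotter", "harrypotter"),
    ("lilypotter", "harrypotter"),
    ("mrgranger", "hermione"),
}


def is_father(child, candidate):
    """Hand-written rule: father(A, B) holds if childof(B, A) and male(B)."""
    return (candidate, child) in childof and candidate in male


print(is_father("harrypotter", "jamespotter"))  # True
print(is_father("harrypotter", "mrgranger"))    # False: no childof fact links them
print(is_father("harrypotter", "lilypotter"))   # False: lilypotter is not listed as male
```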

plot_digraph(export_digraph(clf, 1), format="html")

Out:

<srlearn.plotting._GVPlotter object at 0x7f040d25e290>

Here the data is fairly complete, and the concept that “A father has a child and is male” seems sufficient for the purposes of this data. Let’s apply our learned model to the test data, which includes facts about characters from Jane Austen’s Pride and Prejudice.

predictions = clf.predict_proba(test)

print("{:<35} {}".format("Predicate", "Probability of being True"), "\n", "-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))

Out:

Predicate                           Probability of being True
 ------------------------------------------------------------
father(elizabeth,mrbennet).         0.66
father(jane,mrbennet).              0.66
father(charlotte,mrlucas).          0.66
father(charlotte,mrsbennet).        0.08
father(jane,mrlucas).               0.09
father(mrsbennet,mrbennet).         0.09
father(jane,elizabeth).             0.08
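
The probabilities can be turned into yes/no answers by picking a cut-off. The sketch below copies the values printed above and assumes a threshold of 0.5, which is a choice made for this illustration rather than something the example prescribes.

```python
# Probabilities copied from the output above.
predictions = {
    "father(elizabeth,mrbennet).": 0.66,
    "father(jane,mrbennet).": 0.66,
    "father(charlotte,mrlucas).": 0.66,
    "father(charlotte,mrsbennet).": 0.08,
    "father(jane,mrlucas).": 0.09,
    "father(mrsbennet,mrbennet).": 0.09,
    "father(jane,elizabeth).": 0.08,
}

THRESHOLD = 0.5  # assumed cut-off for this sketch

predicted_true = [p for p, prob in predictions.items() if prob >= THRESHOLD]
for predicate in predicted_true:
    print(predicate)
```

With this cut-off, only the three examples scored at 0.66 are predicted to be true.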

The confidence might be a little low, which is a good excuse to mention one of the hyperparameters. “Node Size,” or node_size, corresponds to the maximum number of predicates that can be used as a split in the dependency network. We set node_size=1 above for demonstration, but the concept that seems to be learned, father(A, B) = [childof(B, A), male(B)], is of size 2.
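
A toy comparison can show why a split of size 2 fits this concept better than a split of size 1. The sketch below scores three candidate tests on three examples in plain Python; it does not reproduce how srlearn searches for splits. The first two examples are the ones printed earlier, and the third is a hypothetical query added for illustration.

```python
# Facts copied from the training data shown earlier.
male = {"jamespotter", "mrgranger"}
childof = {("jamespotter", "harrypotter"), ("lilypotter", "harrypotter")}

# (A, B, label) for father(A, B). The third row is a hypothetical query.
examples = [
    ("harrypotter", "jamespotter", True),
    ("harrypotter", "mrgranger", False),
    ("harrypotter", "lilypotter", False),
]

# Two tests with one predicate each, and one test that combines both.
tests = {
    "male(B)": lambda a, b: b in male,
    "childof(B, A)": lambda a, b: (b, a) in childof,
    "childof(B, A), male(B)": lambda a, b: (b, a) in childof and b in male,
}


def count_correct(test):
    """Number of examples whose label matches the outcome of the test."""
    return sum(test(a, b) == label for a, b, label in examples)


scores = {name: count_correct(test) for name, test in tests.items()}
for name, score in scores.items():
    print(f"{name:<24} {score}/{len(examples)} correct")
```

Each single-predicate test gets one of the three examples wrong, while the two-predicate test separates all of them in one step.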

We might be able to learn a better model by taking this new information into account:

bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name)."
    ],
    number_of_clauses=8,
)

clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=2,                # <--- Changed from 1 to 2
    n_estimators=5,
)

clf.fit(train)

plot_digraph(export_digraph(clf, 0), format="html")

Out:

/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/stable/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
  ", pass one or the other as an argument to suppress this warning.", FutureWarning)

<srlearn.plotting._GVPlotter object at 0x7f040d23f450>

This seems to be much more stable, which should also be reflected in the probabilities assigned to the test examples.

predictions = clf.predict_proba(test)

print("{:<35} {}".format("Predicate", "Probability of being True"), "\n", "-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))

Out:

Predicate                           Probability of being True
 ------------------------------------------------------------
father(elizabeth,mrbennet).         0.74
father(jane,mrbennet).              0.74
father(charlotte,mrlucas).          0.74
father(charlotte,mrsbennet).        0.07
father(jane,mrlucas).               0.08
father(mrsbennet,mrbennet).         0.08
father(jane,elizabeth).             0.07
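
One way to put a number on the difference between the two runs is the gap between the mean probability of the positive examples and the mean probability of the negative examples. The sketch below copies the printed values and assumes the first three rows come from test.pos, as the order of test.pos + test.neg suggests; the helper name separation is made up for illustration.

```python
from statistics import mean

# Probabilities copied from the two outputs, in printed order.
node_size_1 = [0.66, 0.66, 0.66, 0.08, 0.09, 0.09, 0.08]
node_size_2 = [0.74, 0.74, 0.74, 0.07, 0.08, 0.08, 0.07]

N_POS = 3  # assumption: the first three rows are the positive test examples


def separation(probs, n_pos=N_POS):
    """Mean probability of positives minus mean probability of negatives."""
    return mean(probs[:n_pos]) - mean(probs[n_pos:])


print(f"node_size=1: {separation(node_size_1):.3f}")
print(f"node_size=2: {separation(node_size_2):.3f}")
```

A larger gap means the model separates true and false examples more clearly, and on these numbers the node_size=2 run has the larger gap.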
