Family Relationships Domain
Overview: This example motivates learning about family relationships from examples of Harry Potter characters, then applies those rules to characters from Pride and Prejudice.
from srlearn.datasets import load_toy_father
train, test = load_toy_father()
The training examples in the "Toy Father" dataset describe relationships and facts about Harry Potter characters.
The first positive example: father(harrypotter,jamespotter).
means
"James Potter is the father of Harry Potter."
The first negative example: father(harrypotter,mrgranger).
can be interpreted as
"Mr. Granger is not the father of Harry Potter."
father(harrypotter,jamespotter). → James Potter is the father of Harry Potter.
father(harrypotter,mrgranger). → Mr. Granger is not the father of Harry Potter.
The facts contain three additional predicates: childof, describing children; male; and siblingof, describing who is a sibling of whom.
['male(mrgranger).', 'male(jamespotter).', 'male(harrypotter).', 'male(luciusmalfoy).', 'male(dracomalfoy).', 'male(arthurweasley).', 'male(ronweasley).', 'male(fredweasley).', 'male(georgeweasley).', 'male(hagrid).', 'male(dumbledore).', 'male(xenophiliuslovegood).', 'male(cygnusblack).', 'siblingof(ronweasley,fredweasley).', 'siblingof(ronweasley,georgeweasley).', 'siblingof(ronweasley,ginnyweasley).', 'siblingof(fredweasley,ronweasley).', 'siblingof(fredweasley,georgeweasley).', 'siblingof(fredweasley,ginnyweasley).', 'siblingof(georgeweasley,ronweasley).', 'siblingof(georgeweasley,fredweasley).', 'siblingof(georgeweasley,ginnyweasley).', 'siblingof(ginnyweasley,ronweasley).', 'siblingof(ginnyweasley,fredweasley).', 'siblingof(ginnyweasley,georgeweasley).', 'childof(mrgranger,hermione).', 'childof(mrsgranger,hermione).', 'childof(jamespotter,harrypotter).', 'childof(lilypotter,harrypotter).', 'childof(luciusmalfoy,dracomalfoy).', 'childof(narcissamalfoy,dracomalfoy).', 'childof(arthurweasley,ronweasley).', 'childof(mollyweasley,ronweasley).', 'childof(arthurweasley,fredweasley).', 'childof(mollyweasley,fredweasley).', 'childof(arthurweasley,georgeweasley).', 'childof(mollyweasley,georgeweasley).', 'childof(arthurweasley,ginnyweasley).', 'childof(mollyweasley,ginnyweasley).', 'childof(xenophiliuslovegood,lunalovegood).', 'childof(cygnusblack,narcissamalfoy).']
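To see how the facts break down by predicate, the strings can be split into a predicate name and its arguments. This is a minimal sketch using only the standard library. `parse_fact` is a hypothetical helper, not part of srlearn, and the list below is a hand-copied subset of the facts printed above.

```python
import re
from collections import Counter

# Hand-copied subset of the facts printed above (not the full list).
facts = [
    "male(jamespotter).",
    "male(harrypotter).",
    "siblingof(ronweasley,ginnyweasley).",
    "childof(jamespotter,harrypotter).",
    "childof(lilypotter,harrypotter).",
]

# Matches strings of the form name(arg1,arg2,...).
FACT_PATTERN = re.compile(r"^(\w+)\(([^)]*)\)\.$")


def parse_fact(fact):
    """Split 'name(arg1,arg2).' into ('name', ('arg1', 'arg2'))."""
    match = FACT_PATTERN.match(fact)
    if match is None:
        raise ValueError(f"not a ground fact: {fact!r}")
    name, args = match.groups()
    return name, tuple(arg.strip() for arg in args.split(","))


# Count how many facts use each predicate.
predicate_counts = Counter(parse_fact(fact)[0] for fact in facts)
print(predicate_counts)
```

Note the argument order this exposes: childof lists the parent first, while the father examples list the child first.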
Our aim is to learn about what a "father" is in terms of the facts we have available. This process is usually called induction, and is often portrayed as "learning a definition of an object."
from srlearn.rdn import BoostedRDNClassifier
from srlearn import Background
bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name).",
    ],
    number_of_clauses=8,
)
clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=1,
    n_estimators=5,
)
clf.fit(train)
/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/latest/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
", pass one or the other as an argument to suppress this warning.", FutureWarning)
BoostedRDNClassifier(background=setParam: numOfClauses=8.
setParam: numOfCycles=100.
usePrologVariables: true.
setParam: nodeSize=1.
setParam: maxTreeDepth=3.
mode: male(+name).
mode: father(+name,+name).
mode: childof(+name,+name).
mode: siblingof(+name,+name).
, n_estimators=5, neg_pos_ratio=2, solver='BoostSRL', target='father')
It's important to check whether we actually learn something useful. We'll visually inspect the relational regression trees to see what they learned.
from srlearn.plotting import plot_digraph
from srlearn.plotting import export_digraph
plot_digraph(export_digraph(clf, 0), format="html")
<srlearn.plotting._GVPlotter object at 0x7effa41d8910>
There is some variance between runs, but the concept that the trees pick up on is roughly that "A father has a child and is male."
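The concept described above can also be written out by hand and applied to the facts. The sketch below is plain Python, not srlearn. It assumes the argument order suggested by the data, where childof lists the parent first and father lists the child first, and it uses a hand-copied subset of the training facts. `derive_fathers` is a hypothetical helper.

```python
# Hand-copied subset of the training facts shown earlier.
male = {"jamespotter", "harrypotter", "mrgranger", "arthurweasley"}
childof = {
    ("jamespotter", "harrypotter"),
    ("lilypotter", "harrypotter"),
    ("mrgranger", "hermione"),
    ("mrsgranger", "hermione"),
    ("arthurweasley", "ronweasley"),
    ("mollyweasley", "ronweasley"),
}


def derive_fathers(childof_pairs, males):
    """Return (child, parent) pairs where the parent has a child and is male."""
    return {
        (child, parent)
        for parent, child in childof_pairs
        if parent in males
    }


fathers = derive_fathers(childof, male)
for child, parent in sorted(fathers):
    print(f"father({child},{parent}).")
```

On this subset the hand-written rule recovers the first positive example and excludes both mothers and unrelated men.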
plot_digraph(export_digraph(clf, 1), format="html")
<srlearn.plotting._GVPlotter object at 0x7effa4058d50>
Here the data is fairly complete, and the concept that "A father has a child and is male" seems sufficient for the purposes of this data. Let's apply our learned model to the test data, which includes facts about characters from Jane Austen's Pride and Prejudice.
predictions = clf.predict_proba(test)
print("{:<35} {}".format("Predicate", "Probability of being True"), "\n", "-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))
Predicate                           Probability of being True
------------------------------------------------------------
father(elizabeth,mrbennet).         0.47
father(jane,mrbennet).              0.47
father(charlotte,mrlucas).          0.47
father(charlotte,mrsbennet).        0.07
father(jane,mrlucas).               0.09
father(mrsbennet,mrbennet).         0.09
father(jane,elizabeth).             0.07
The confidence might be a little low, which is a good excuse to mention one of the hyperparameters. "Node Size," or node_size, corresponds to the maximum number of predicates that can be used as a split in the dependency network. We set node_size=1 above for demonstration, but the concept that seems to be learned, father(A, B) = [childof(B, A), male(B)], is of size 2.
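One way to see why a single-predicate split falls short is to score each candidate test by how many labeled examples it gets right. The following is a plain-Python sketch, not a description of how srlearn scores splits internally. The facts are a hand-copied subset, and apart from the first negative example quoted earlier, the labeled examples are assumptions chosen to be consistent with those facts.

```python
# Hand-copied subset of the facts.
male = {"jamespotter", "mrgranger", "harrypotter"}
childof = {
    ("jamespotter", "harrypotter"),
    ("lilypotter", "harrypotter"),
    ("mrgranger", "hermione"),
}

# (A, B, label) for father(A, B). Assumed labels, consistent with the facts.
examples = [
    ("harrypotter", "jamespotter", True),
    ("hermione", "mrgranger", True),
    ("harrypotter", "mrgranger", False),
    ("harrypotter", "lilypotter", False),
]

# Candidate tests: two of size 1 and one conjunction of size 2.
candidate_tests = {
    "male(B)": lambda a, b: b in male,
    "childof(B, A)": lambda a, b: (b, a) in childof,
    "childof(B, A), male(B)": lambda a, b: (b, a) in childof and b in male,
}


def accuracy(test):
    """Fraction of labeled examples on which the test agrees with the label."""
    correct = sum(test(a, b) == label for a, b, label in examples)
    return correct / len(examples)


scores = {name: accuracy(test) for name, test in candidate_tests.items()}
for name, score in scores.items():
    print(f"{name:<25} {score:.2f}")
```

Each single-predicate test misclassifies one example here, while the two-predicate conjunction separates all four, which is the intuition behind raising node_size.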
We might be able to learn a better model by taking this new information into account:
bk = Background(
    modes=[
        "male(+name).",
        "father(+name,+name).",
        "childof(+name,+name).",
        "siblingof(+name,+name).",
    ],
    number_of_clauses=8,
)
clf = BoostedRDNClassifier(
    background=bk,
    target="father",
    node_size=2,  # <--- Changed from 1 to 2
    n_estimators=5,
)
clf.fit(train)
plot_digraph(export_digraph(clf, 0), format="html")
/home/docs/checkouts/readthedocs.org/user_builds/srlearn/checkouts/latest/srlearn/base.py:70: FutureWarning: solver='BoostSRL' will default to solver='SRLBoost' in 0.6.0, pass one or the other as an argument to suppress this warning.
", pass one or the other as an argument to suppress this warning.", FutureWarning)
<srlearn.plotting._GVPlotter object at 0x7eff9535f990>
This seems to be much more stable, which should also be reflected in the probabilities assigned on test examples.
predictions = clf.predict_proba(test)
print("{:<35} {}".format("Predicate", "Probability of being True"), "\n", "-" * 60)
for predicate, prob in zip(test.pos + test.neg, predictions):
    print("{:<35} {:.2f}".format(predicate, prob))
Predicate                           Probability of being True
------------------------------------------------------------
father(elizabeth,mrbennet).         0.74
father(jane,mrbennet).              0.74
father(charlotte,mrlucas).          0.74
father(charlotte,mrsbennet).        0.09
father(jane,mrlucas).               0.09
father(mrsbennet,mrbennet).         0.09
father(jane,elizabeth).             0.09
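To make the comparison between the two runs concrete, the printed probabilities can be turned into yes/no predictions. This sketch copies the numbers from the two tables above. It assumes a decision threshold of 0.5 and that the first three rows are the positive examples, since the loop iterates over test.pos before test.neg. `accuracy_at` is a hypothetical helper.

```python
# Probabilities copied from the two printed tables (same row order).
probs_node_size_1 = [0.47, 0.47, 0.47, 0.07, 0.09, 0.09, 0.07]
probs_node_size_2 = [0.74, 0.74, 0.74, 0.09, 0.09, 0.09, 0.09]

# Assumption: the first three rows are positives, the remaining four negatives.
labels = [True, True, True, False, False, False, False]


def accuracy_at(probs, labels, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold."""
    predicted = [prob >= threshold for prob in probs]
    correct = sum(pred == label for pred, label in zip(predicted, labels))
    return correct / len(labels)


print("node_size=1:", round(accuracy_at(probs_node_size_1, labels), 2))
print("node_size=2:", round(accuracy_at(probs_node_size_2, labels), 2))
```

Under these assumptions, the node_size=1 model ranks the true fathers highest but never crosses 0.5, so it labels every example negative. The node_size=2 model classifies all seven correctly.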