The Network and Trajectories of Transitions among Sentential Co-Occurrences of Noun Phrases in Plato's Phaedrus

By Moses Boudourides & Sergios Lenis

IMPORTANT: To use this notebook, you'll need to

  1. Install IPython Notebook (easiest way: use Anaconda)
  2. Download this notebook and all other Python scripts used here from https://github.com/mboudour/WordNets/blob/master/Plato_Phaedrus_Network&Trajectories.ipynb
  3. Run ipython notebook from the directory that contains the notebook and the scripts

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Importing Python Modules

In [1]:
import random
import nltk
import codecs
from textblob import TextBlob
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter
import imp  # used further down to load tools.py by path
# Directory that holds tools.py; change this to the location on your machine.
utilsdir='/home/mab/Dropbox/Python Projects/utils/'

%matplotlib inline 
%load_ext autoreload
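
The imp module imported in this cell is deprecated in Python 3 and was removed in Python 3.12. For readers on a newer interpreter, here is a minimal sketch of loading a script by path with importlib instead. It is an assumption about how the imp.load_source call used later in this notebook might be ported, not the authors' code: load_module_from_path is a hypothetical helper name, and the demo writes a throwaway stand-in for tools.py rather than using the authors' script.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Load a Python source file as a module (hypothetical helper, Python 3.5+)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway stand-in for tools.py, written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "tools.py")
with open(demo_path, "w") as handle:
    handle.write("def shout(text):\n    return text.upper()\n")

demo_tool = load_module_from_path("tools", demo_path)
demo_result = demo_tool.shout("phaedrus")
```

With the real script, the call would take os.path.join(utilsdir, 'tools.py') as its path argument.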

I. Importing the Text of Plato's Phaedrus

In [2]:
filename = 'Plato_Phaedrus1.txt'
titlename = "Plato's Phaedrus"

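# Read the whole file into a single unicode string.
f = codecs.open(filename, "r", encoding="utf-8").read()

num_lines = 0
num_words = 0
num_chars = 0
# NOTE: f is a string, so this loop visits one character at a time, not one line.
# num_words therefore counts the non-whitespace characters and num_chars counts all characters.
for line in f:
    words = line.split()
    num_lines += 1
    num_words += len(words)
    num_chars += len(line)
print "%s has number of words = %i and number of characters = %i" %(titlename,num_words,num_chars)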

blob = TextBlob(f)
Plato's Phaedrus has number of words = 102009 and number of characters = 125944
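
Because f holds the whole text as one string, the loop in this cell iterates over characters, so the reported "number of words" is really the number of non-whitespace characters. A minimal sketch of counting lines, words and characters separately is given here; count_text is a hypothetical helper name and the two-line sample is made up for illustration, not taken from the text file.

```python
def count_text(text):
    """Return (lines, words, characters) for a string (hypothetical helper)."""
    lines = text.splitlines()
    num_words = sum(len(line.split()) for line in lines)
    return len(lines), num_words, len(text)


# Made-up two-line sample, used only to exercise the helper.
sample = "Socrates meets Phaedrus.\nThey walk along the Ilissus.\n"
counts = count_text(sample)

# For contrast: iterating over the string itself visits single characters,
# so this tallies non-whitespace characters rather than words.
non_space_chars = sum(len(ch.split()) for ch in sample)
```

Calling count_text(f) on the loaded text would give the line and word counts that the message in this cell appears to intend.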

II. Extracting the Most Frequent Noun Phrases in Plato's Phaedrus

In [3]:
all_sents=blob.sentences
# Count how often each noun phrase occurs across all sentences.
occurdic=Counter()
for sen in all_sents:
    dd=sen.dict
    for phrase in dd['noun_phrases']:
        occurdic[phrase]+=1

# One row per noun phrase, indexed from 1.
df = pd.DataFrame(columns=["%s Noun Phrases" %titlename, "Frequencies"])
u=1
for l,v in occurdic.items():
    df.loc[u]=[l,v]
    u+=1

print "The total number of noun phrases in %s is %i." %(titlename,len(df))

# Keep only the noun phrases that occur more than `cut` times, most frequent first.
cut = 2
df = df[df['Frequencies']>cut].sort_values(by="Frequencies", ascending=False)
print "The total number of noun phrases in %s with frequencies > %i is %i." %(titlename,cut,len(df))
df
The total number of noun phrases in Plato's Phaedrus is 739.
The total number of noun phrases in Plato's Phaedrus with frequencies > 2 is 37.
Out[3]:
     Plato's Phaedrus Noun Phrases  Frequencies
734  socrates                       215
135  phaedrus                       211
678  lysias                         34
192  muses                          11
737  god                            9
299  zeus                           7
677  true                           5
540  well                           5
441  fair youth                     5
351  thamus                         5
13   theuth                         5
392  tisias                         5
497  suppose                        4
352  shall                          4
381  matters stand                  4
193  whole soul                     4
486  homer                          4
574  thrasymachus                   4
169  nymphs                         4
431  enough                         4
174  odysseus                       4
433  nestor                         3
460  ilissus                        3
461  true beauty                    3
350  exactly                        3
496  will                           3
360  isocrates                      3
509  midas                          3
618  hippocrates                    3
190  eros                           3
182  boreas                         3
699  good friend                    3
717  acumenus                       3
293  listen                         3
353  who                            3
254  stesichorus                    3
306  pericles                       3
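
The tally and the frequency cut in cell In [3] can be sketched without TextBlob. The per-sentence lists here are made up and stand in for the noun phrases that TextBlob extracts; only the counting and thresholding logic mirrors the cell.

```python
from collections import Counter

# Hypothetical per-sentence noun phrases, standing in for TextBlob's output.
sentence_phrases = [
    ["socrates", "phaedrus"],
    ["socrates", "lysias"],
    ["phaedrus"],
    ["socrates", "ilissus"],
    ["phaedrus", "lysias"],
    ["socrates"],
]

occurrences = Counter()
for phrases in sentence_phrases:
    occurrences.update(phrases)  # one count per occurrence in a sentence

cut = 2
# Keep phrases that occur more than `cut` times, most frequent first.
frequent = [(p, n) for p, n in occurrences.most_common() if n > cut]
```

Counter.most_common already returns the entries sorted by decreasing count, so no separate sorting step is needed in this variant.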
In [4]:
%autoreload 2

# Map each retained noun phrase to a capitalized display label,
# leaving out extracted words that are not real noun phrases.
selectedTerms={}
excluded = ['who','will','exactly','enough','shall','suppose','well']
for k in df["%s Noun Phrases" %titlename].tolist():
    if k not in excluded:
        selectedTerms[k] = k.capitalize()
# Load the helper script tools.py from utilsdir.
tool= imp.load_source('tools', utilsdir+'tools.py')
create_pandas_dataframe_from_text=tool.create_pandas_dataframe_from_text

dfst,sec_prot,coccurlist,occurlist,dflines=create_pandas_dataframe_from_text(blob,selectedTerms,selectedTerms,titlename)

prot_pol_sub=dflines[['protagonists','#_of_protagonists','polarity','subjectivity']].reset_index()
prot_pol_sub['sentence_id']=prot_pol_sub.index
prot_pol_sub=prot_pol_sub[['sentence_id','protagonists','#_of_protagonists','polarity','subjectivity']]

# Keep only the sentences that contain more than `cuts` selected noun phrases.
cuts = 1
prot_pol_sub = prot_pol_sub[prot_pol_sub['#_of_protagonists']>cuts]
# Flatten the per-sentence lists into one list of noun phrases.
lp = prot_pol_sub['protagonists'].tolist()
lpn = []
for i in lp:
    for j in i:
        lpn.append(j)
print "The total number of sentences in %s with at least %i selected noun phrases in each one of them is %i." %(titlename,cuts+1,len(prot_pol_sub))
prot_pol_sub.rename(columns={'protagonists':'list_of_selected_noun_phrases','#_of_protagonists':'#_of_selected_noun_phrases'},inplace=True)
# Rows stay in sentence order; sentence_id is shown as the index label.
ddff = prot_pol_sub.drop('sentence_id', axis=1)
ddff.index.name = 'sentence_id'
ddff
The total number of sentences in Plato's Phaedrus with at least 2 selected noun phrases in each one of them is 27.
Out[4]:
sentence_id  list_of_selected_noun_phrases     #_of_selected_noun_phrases  polarity   subjectivity
6            [Acumenus, Lysias]                2                           0.150000   0.491667
24           [Fair youth, Socrates, Lysias]    3                           0.550000   0.775000
34           [Phaedrus, Lysias]                2                           0.143750   0.547817
67           [Ilissus, Socrates, Boreas]       3                           0.000000   0.000000
148          [Socrates, Zeus]                  2                           0.206667   0.351667
177          [Phaedrus, Lysias]                2                           0.200000   0.200000
183          [Phaedrus, Lysias]                2                           -0.033333  0.825000
209          [Muses, Good friend]              2                           0.466667   0.500000
310          [Homer, Stesichorus]              2                           0.000000   0.708333
312          [Homer, Stesichorus]              2                           0.294444   0.455556
334          [Fair youth, Phaedrus]            2                           0.350000   0.450000
455          [Phaedrus, Eros]                  2                           0.400000   0.800000
457          [Phaedrus, Lysias]                2                           0.040000   0.406667
591          [Odysseus, Nestor]                2                           0.000000   1.000000
593          [Thrasymachus, Odysseus, Nestor]  3                           -0.250000  0.500000
706          [Nymphs, Lysias]                  2                           0.300000   0.750000
760          [Muses, Eros]                     2                           0.220833   0.330556
786          [Thrasymachus, Lysias]            2                           0.500000   1.000000
864          [Socrates, Phaedrus]              2                           0.068229   0.504167
871          [Thrasymachus, Lysias]            2                           0.100000   1.000000
1015         [Thamus, Theuth]                  2                           -0.062500  0.187500
1016         [Theuth, Thamus]                  2                           -0.025000  0.450000
1018         [Theuth, Thamus]                  2                           0.530000   0.760000
1068         [Phaedrus, True]                  2                           0.350000   0.650000
1099         [Homer, Nymphs, Lysias]           3                           -0.061728  0.545062
1119         [Isocrates, Phaedrus]             2                           0.175000   0.575000
1127         [Isocrates, Lysias]               2                           -0.125000  0.375000
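
The filtering and flattening steps of cell In [4] can be sketched on a plain dictionary. Sentence ids 6, 24 and 67 are copied from the Out[4] table; ids 1 and 2 are made up to show sentences that the filter drops. The variable names are illustrative and not part of tools.py.

```python
from itertools import chain

# sentence_id -> selected noun phrases found in that sentence.
# Ids 6, 24 and 67 come from the Out[4] table; ids 1 and 2 are made up.
protagonists = {
    1: [],
    2: ["Socrates"],
    6: ["Acumenus", "Lysias"],
    24: ["Fair youth", "Socrates", "Lysias"],
    67: ["Ilissus", "Socrates", "Boreas"],
}

cuts = 1
# Keep sentences with more than `cuts` selected noun phrases.
kept = {sid: terms for sid, terms in protagonists.items() if len(terms) > cuts}

# Flatten the nested lists, as the double loop over `lp` does in the cell.
flat_terms = list(chain.from_iterable(kept.values()))
distinct_terms = sorted(set(flat_terms))
```

itertools.chain.from_iterable replaces the nested for loops with a single call and leaves the order of the phrases unchanged.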
In [5]:
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset

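# Sentences that contain at least one selected noun phrase (used for the zoomed insets).
ndfl=dflines[dflines['#_of_protagonists']>0  ]

fig, ax = plt.subplots(figsize=[12, 10])
axes2 = zoomed_inset_axes(ax, 6, loc=5)  # zoom = 6

# Main histogram over all sentences.
dflines['#_of_protagonists'].plot.hist(ax=ax)

ax.set_xlabel('Number of selected noun phrases per sentence')
ax.set_ylabel('Frequency')
ax.set_title('Histogram of the number of selected noun phrases per sentence')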

x1, x2, y1, y2 = 2.9, 3., 0, 25
axes2.set_xlim(x1, x2)
axes2.set_ylim(y1, y2)
ndfl['#_of_protagonists'].plot.hist(ax=axes2)
axes2.set_ylabel('Frequency')

mark_inset(ax, axes2, loc1=2, loc2=4, fc="none", ec="0.5")
axes3 = zoomed_inset_axes(ax, 6, loc=10)

x1, x2, y1, y2 = 2, 2.05, 0, 30
axes3.set_xlim(x1, x2)
axes3.set_ylim(y1, y2)
ndfl['#_of_protagonists'].plot.hist(ax=axes3)
axes3.set_ylabel('Frequency')

mark_inset(ax, axes3, loc1=2, loc2=4, fc="none", ec="0.5")

plt.show()
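
The zoomed insets are needed because sentences with two or three selected noun phrases are rare: the Out[4] table lists only 27 of them, while sentence ids run past 1100. A minimal sketch of the tally behind the two inset bars, using the #_of_selected_noun_phrases column of that table in row order:

```python
from collections import Counter

# Values of #_of_selected_noun_phrases for the 27 rows of Out[4], in table order.
phrases_per_sentence = (
    [2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3]
    + [2] * 9
    + [3, 2, 2]
)

tally = Counter(phrases_per_sentence)
```

The two counts in tally correspond to the bars that the insets around x = 2 and x = 3 magnify.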
In [6]:
%autoreload 2

draw_network_node_color=tool.draw_network_node_color
sstt="%s Two-Mode Network of Sentences and Selected Noun Phrases" %titlename
pos=nx.spring_layout(sec_prot)
nds=[nd for nd in sec_prot.nodes() if isinstance(nd,int)]
prot=[nd for nd in sec_prot.nodes() if nd not in nds]

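# Sentence nodes: first half in a left column (x=-1), second half in a right column (x=1).
for en,nd in enumerate(nds):
    if en<len(nds)/2.:
        pos[nd][0]=-1
        pos[nd][1]=en*2./len(nds)
    else:
        pos[nd][0]=1
        pos[nd][1]=(en-len(nds)/2.)*2./len(nds)
# Noun-phrase nodes: a single middle column (x=0).
for en,nd in enumerate(prot):
    pos[nd][0]=0
    pos[nd][1]=en*1./len(prot)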
    
possit=draw_network_node_color(sec_prot,sstt,pos=pos,with_edgewidth=False,withLabels=True,labfs=12,valpha=0.2,
                               ealpha=0.4,labelfont=15,with_node_weight=False,node_size_fixer=300.,node_col='polarity')
In [7]:
possit=draw_network_node_color(sec_prot,sstt,pos=pos,with_edgewidth=False,withLabels=True,labfs=12,valpha=0.2,
                               ealpha=0.4,labelfont=15,with_node_weight=False,node_size_fixer=300.,
                               node_col='subjectivity',colormat='Greens')
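
The manual layout in cell In [6] puts sentence nodes in two outer columns and noun-phrase nodes in a middle column. A minimal standalone sketch of the same arithmetic follows; two_mode_positions is a hypothetical name chosen for this sketch, and it returns a fresh dictionary instead of overwriting the output of nx.spring_layout.

```python
def two_mode_positions(sentence_ids, terms):
    """Place sentences in two side columns and terms in a middle column."""
    pos = {}
    n = len(sentence_ids)
    half = n / 2.0
    for i, sid in enumerate(sentence_ids):
        if i < half:
            pos[sid] = (-1.0, i * 2.0 / n)          # left column
        else:
            pos[sid] = (1.0, (i - half) * 2.0 / n)  # right column
    for i, term in enumerate(terms):
        pos[term] = (0.0, i * 1.0 / len(terms))     # middle column
    return pos


layout = two_mode_positions([6, 24, 34, 67], ["Lysias", "Socrates"])
```

Each column spans the vertical range from 0 up to, but not including, 1, so the three columns line up regardless of how many nodes they hold.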

III. Constructing the Network of Sententially Co-Occurring Noun Phrases in Plato's Phaedrus

In [8]:
%autoreload 2

plist = prot_pol_sub['list_of_selected_noun_phrases'].tolist()
pplist=prot_pol_sub['polarity'].tolist()
nplist=prot_pol_sub['#_of_selected_noun_phrases'].tolist()
splist=prot_pol_sub['subjectivity'].tolist()

G = tool.make_graph_from_lists(plist,pplist,nplist,splist)
posg=nx.spring_layout(G,scale=50)#,k=0.55)#,iterations=20)

sstt="%s Network of Selected Noun Phrases \n(Sentences colored in polarity)" %titlename
possit=tool.draw_network(G,sstt,pos=posg,with_edgewidth=True,withLabels=True,labfs=15,valpha=0.2,ealpha=0.7,labelfont=15,
                   with_edgecolor=True,edgecolor='polarity',colormat='Blues')
In [9]:
sstt="%s Network of Selected Noun Phrases \n(Sentences colored in subjectivity)" %titlename
possit=tool.draw_network(G,sstt,pos=posg,with_edgewidth=True,withLabels=True,labfs=15,valpha=0.2,ealpha=0.7,labelfont=15,
                   with_edgecolor=True,edgecolor='subjectivity',colormat='Greys')
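
The source of tool.make_graph_from_lists is not shown in this notebook. As an assumption about what such a function plausibly does, the following sketch links every pair of noun phrases that share a sentence, counts the shared sentences as the edge weight, and averages the sentence polarity per edge. The three rows are copied from the Out[4] table (sentence ids 34, 67 and 177); the averaging rule is a choice made for this sketch.

```python
from collections import Counter
from itertools import combinations

# Three rows copied from the Out[4] table: (selected noun phrases, polarity).
rows = [
    (["Phaedrus", "Lysias"], 0.14375),
    (["Ilissus", "Socrates", "Boreas"], 0.0),
    (["Phaedrus", "Lysias"], 0.2),
]

edge_weight = Counter()   # number of sentences in which a pair co-occurs
edge_polarity = {}        # polarities of those sentences, per pair

for phrases, polarity in rows:
    # Sorting makes (a, b) a canonical key for the undirected edge.
    for a, b in combinations(sorted(set(phrases)), 2):
        edge_weight[(a, b)] += 1
        edge_polarity.setdefault((a, b), []).append(polarity)

# Mean sentence polarity per edge (an assumed aggregation rule).
mean_polarity = {pair: sum(v) / len(v) for pair, v in edge_polarity.items()}
```

A sentence with three selected noun phrases contributes three edges, which is why the sentences with three phrases form triangles in the drawn network.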

IV. Centralities of Nodes in the Network of Sententially Co-Occurring Noun Phrases in Plato's Phaedrus

In [10]:
centrali=tool.draw_centralities_subplots(G,pos=posg,withLabels=False,labfs=5,figsi=(15,22),ealpha=1,vals=True)
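
Which centrality indices tool.draw_centralities_subplots computes is not visible here. As one common example, degree centrality can be computed by hand: the degree of a node divided by the number of other nodes, which is the same normalization that networkx.degree_centrality uses. The four edges in this sketch are read off the Out[4] table (sentence ids 6, 34, 786 and 864) and form only a small part of the full network.

```python
from collections import defaultdict

# Undirected co-occurrence edges read off four rows of the Out[4] table.
edges = [
    ("Acumenus", "Lysias"),
    ("Phaedrus", "Lysias"),
    ("Thrasymachus", "Lysias"),
    ("Socrates", "Phaedrus"),
]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

n = len(neighbors)
# Degree centrality: share of the other n - 1 nodes that a node is linked to.
degree_centrality = {node: len(adj) / float(n - 1) for node, adj in neighbors.items()}
most_central = max(degree_centrality, key=degree_centrality.get)
```

On the full graph G built in this notebook, nx.degree_centrality(G) returns the same kind of dictionary.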