The Network of Sentence-Co-Occurring Words of a Document¶

The case of Bruno Latour's An Inquiry into Modes of Existence¶

By Moses Boudourides & Sergios Lenis¶

IMPORTANT: To use this notebook, you'll need to

Install IPython Notebook (easiest way: use Anaconda)
Download this notebook and all other Python scripts used here from https://github.com/mboudour/WordNets
Run ipython notebook in the same directory where notebook and scripts were put

This work is licensed under a Creative Commons Attribution 4.0 International License.

Importing Python modules¶

import random
import urllib
import nltk
import codecs
from textblob import TextBlob
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
# http://localhost:8889/notebooks/LiteratureNetworks/PhilosophyBooks_Networks/Latour_AnInquiryIntoModesOfExistence/WordNetsFromDocuments.ipynb

doc="Bruno Latour's *An Inquiry into Modes of Existence*"
%matplotlib inline 
%load_ext autoreload

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

I. Importing a Document as Text¶

If the document is a text file from your own storage, move it in the same directory where this notebook and the complementary script tools.py are put and make the corresponding changes of the names of next cell.
If the document is a pdf file from your own storage or at a URL, first convert it to text using an online converter like http://document.online-convert.com/convert-to-txt and then follow previous step.
If the document is a text file at a URL, comment out next cell and uncomment the subsequent cell where you should make the corresponding changes of the names of that cell.

ATTENTION: Only one of cells (I1), (I2) and (I3) has to be active (uncommented)!¶

(I1) Importing a Document as Text from a Locally Stored txt File¶

filename = 'Latour_AIME.txt'
titlename = "Latour's AIME"

f = codecs.open(filename, "r", encoding="utf-8")
fTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
fTemp.write('temp.txt')
document=f.read()
blobbook = TextBlob(document)

(I2) Importing a Document as Text from a pdf File at a URL¶

# # URLpdf = u'https://ecomig2014.files.wordpress.com/2014/08/178919402-latour-bruno-an-inquiry-into-modes-of-existence-an-anthropology-of-the-moderns-pdf.pdf'

# filename = 'Latour_AIME.txt'
# titlename = "Latour's AIME"

# f = codecs.open(filename, "r", encoding="utf-8")
# fTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
# fTemp.write('temp.txt')
# document=f.read()
# blobbook = TextBlob(document)

(I3) Importing a Document as Text from a txt File at a URL¶

reload(sys)  
sys.setdefaultencoding('utf8')
urr='http://www.bruno-latour.fr/node/328'
URLname = u'https://www.w3.org/services/html2txt?url=http%3A%2F%2Fwww.bruno-latour.fr%2Fnode%2F328&noinlinerefs=on&nonums=on'
titlename="Summary of Latour's AIME"

f=urllib.urlopen(URLname.encode("UTF-8"))
document=f.read()
blobbook = TextBlob(document)

II. Selecting Terms in the Document as the WordNet Nodes¶

We may select terms in the document as the wordnet nodes in three ways:

Using a default list of terms (called selectedTerms) composed from the Index or the Table of Contents or by reading (parts of) the document.
Inserting a user-defined list of terms.
Extracting the noun phrases of the document through TextBlob.

ATTENTION: Only one of cells (II1), (II2) and (II3) has to be active (uncommented)!¶

(II1) Using a Default List of Selected Terms¶

# For Latour's AIME we are using the following list of 114 selected terms:

selectedTerms=[
'action', 'actor', 'anthropology', 'anxiety', 'art', 'assemble', 'association', 'attachment', 'attention', 'bads', 
'being', 'calculation', 'category', 'circle', 'civilization', 'connection', 'crises', 'crossing', 'delegation', 
'desire', 'destroy', 'detection', 'detour', 'differences', 'disorder', 'displacement', 'distance', 'emotion', 'empire', 
'ends', 'entities', 'enunciation', 'essence', 'existence', 'existent', 'exploration', 'fail', 'fiction', 'figuration', 
'figure', 'force', 'form', 'frame', 'freedom', 'global', 'goods', 'habit', 'hesitation', 'heterogeneous', 'hiatus', 
'horror', 'imitate', 'impossibility', 'information', 'ingenuity', 'inquiry', 'inscription', 'install', 'interest', 
'interpretive', 'invention', 'judgment', 'lacks', 'law', 'link', 'material', 'means', 'metamorphosis', 'mistake', 
'mode', 'modern', 'morality', 'narrative', 'network', 'obstacle', 'occidentalism', 'order', 'organization', 'passion', 
'person', 'politics', 'possibility', 'preposition', 'presence', 'production', 'project', 'protect', 'reason', 
'reconnect', 'reference', 'religion', 'represent', 'reproduction', 'research', 'resistance', 'risk', 'science', 
'script', 'scruple', 'society', 'speak', 'surprise', 'technology', 'template', 'time', 'transformation', 'translation', 
'traverse', 'trope', 'vacillation', 'veil', 'work', 'world', 'zigzag'    
]

# More terms can be manually inserted.

# Since the visualization of networks in **networkx** is rather obscures for more than 100 nodes, we randomly select
# a smaller number (k) of terms among the ones in the above list.

k = 20 # Number of terms to be randomly selected from the above selectedTerms list
selectedTerms = random.sample(selectedTerms, k)

dfst=pd.DataFrame(columns=["%s selected terms" %titlename, "Frequencies"])
u=1
selectedTermsDic={}
for l in selectedTerms: 
    dfst.loc[u]=[l,blobbook.word_counts[l]]
    selectedTermsDic[l]=blobbook.word_counts[l]
    u+=1
dfst.sort([u"Frequencies"],ascending=[0])

/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py:33: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)

(II2) Inserting a User-Defined List of Selected Terms¶

# selectedT=raw_input('Type L for a list of terms or I for terms to be added one by one. Type "Enter" to exit!')
# if selectedT.strip() == 'L':
#     selectedTerms=input('Paste list')
# elif selectedT.strip() == 'I':
#     selectedTerms=[]
#     while True:
#         selec=raw_input('Give the next term')
#         if len(selec.strip()) >1:
#             selectedTerms.append(selec)
#         else:
#             breakDocument
# 
# dfst=pd.DataFrame(columns=["%s selected terms" %titlename, "Frequencies"])
# u=1
# selectedTermsDic={}
# for l in selectedTerms: 
#     dfst.loc[u]=[l,blobbook.word_counts[l]]
#     selectedTermsDic[l]=blobbook.word_counts[l]
#     u+=1
# dfst.sort([u"Frequencies"],ascending=[0])

(II3) Extracting TextBlob Noun Phrases¶

# npbook = blobbook.np_counts
# dfst = pd.DataFrame(columns=["%s noun phrases" %titlename, "Frequencies"])
# u=1
# selectedTermsDic={}
# for l in npbook: 
#     dfst.loc[u]=[l,npbook[l]]
#     selectedTermsDic[l]=blobbook.word_counts[l]
#     u+=1
# print "The total number of noun phrases in %s is %i." %(titlename,len(npbook))
# dfst.sort(["Frequencies"], ascending=[0])

III. Constructing the Network of Sentence-Co-Occurring Words in the Document¶

%autoreload 2
from tools import occurrences, makegraph

documentDict = occurrences(document,selectedTermsDic)
documentGraph = makegraph(documentDict)
pos=nx.spring_layout(documentGraph,scale=50,k=0.8,iterations=20)
# pos=nx.graphviz_layout(documentGraph)

IV. The Degree Histogram of the Network of Sentence-Co-Occurring Words in the Document¶

from tools import dhist

sstth="The Degree Histogram of %s wordnet" %titlename
dhp=dhist(documentGraph,sstth,pos=pos,figsize=(12,10))

V. Plotting the Network of Sentence-Co-Occurring Words in the Document¶

from tools import draw_network

sstt="%s wordnet" %titlename
possit=draw_network(documentGraph,sstt,pos=pos,with_edgewidth=True,withLabels=True,labfs=20,valpha=0.2,ealpha=0.3,labelfont=15,with_node_weight=True,node_size_fixer=300.)

VI. Centralities of Nodes in the Network of Sentence-Co-Occurring Words in the Document¶

from tools import draw_centralities_subplots

centrali=draw_centralities_subplots(documentGraph,pos,withLabels=False,labfs=5,figsi=(15,22),ealpha=1,vals=True)

The table of Centralities of Nodes in the Network of Sentence-Co-Occurring Words in the Document¶

dfc=pd.DataFrame()
u=0
for i,k in centrali.items():
    dfc.insert(u,i,k.values())
    u+=1
dfc.insert(0,'Nodes',centrali[centrali.keys()[0]].keys())
dfc

VII. Communities of Nodes in the Network of Sentence-Co-Occurring Words in the Document¶

%autoreload 2
from tools import draw_comms, modul_arity, print_communities

part,nodper=print_communities(documentGraph,sstt)

d=0.8 codefolding
dd=0.8
c=1.2
cc=1.4
alpha=0.2
ealpha=0.2
vcc={}
sstta="The %s %s Communities" %(max(part.values())+1,sstt)

draw_comms(documentGraph,documentGraph.nodes(),[],[],[] ,part,part,d,dd,c,cc,alpha,ealpha,nodper,sstta,titlefont=20,labelfont=17,valpha=0.5)

Number of communities of Latour's AIME wordnet = 3
Community partition of Latour's AIME wordnet:
[['differences', 'organization', 'technology', 'category', 'network', 'script', 'surprise'], ['form', 'modern', 'delegation', 'destroy', 'assemble', 'protect', 'bads', 'actor', 'impossibility'], ['presence', 'entities', 'install', 'disorder']]
Community modularity of Latour's AIME wordnet = 0.1454

import notebook
print notebook.nbextensions.check_nbextension('usability/python-markdown', user=True)
print notebook.nbextensions.check_nbextension('usability/python-markdown/main.js', user=True)

True
True

import notebook
E = notebook.nbextensions.EnableNBExtensionApp()
E.enable_nbextension('usability/python-markdown/main')

from IPython.html.services.config import ConfigManager
from IPython.display import HTML
ip = get_ipython()
cm = ConfigManager(parent=ip, profile_dir=ip.profile_dir.location)
extensions =cm.get('notebook')
table = ""
for ext in extensions['load_extensions']:
    table += "<tr><td>%s</td>\n" % (ext)

top = """
<table border="1">
  <tr>
    <th>Extension name</th>
  </tr>
"""
bottom = """
</table>
"""
HTML(top + table + bottom)

/usr/local/lib/python2.7/dist-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)

	Nodes	closeness_centrality	katz_centrality	betweenness_centrality	page_rank	eigenvector_centrality	degree_centrality
0	category	0.655172	-0.030215	0.016186	0.042656	0.167740	0.473684
1	script	0.703704	-0.011007	0.018783	0.050317	0.207262	0.578947
2	protect	0.678571	0.077487	0.026518	0.123739	0.428796	0.526316
3	modern	0.950000	-0.133345	0.193630	0.015232	0.048071	0.947368
4	bads	0.558824	0.404119	0.000000	0.069674	0.308302	0.210526
5	network	0.791667	-0.014810	0.037782	0.110328	0.399749	0.736842
6	form	0.904762	0.023409	0.156788	0.069164	0.281147	0.894737
7	presence	0.678571	-0.102249	0.019312	0.023192	0.085143	0.526316
8	disorder	0.542857	-0.223896	0.000000	0.048106	0.172036	0.157895
9	differences	0.612903	0.071062	0.003481	0.030793	0.104435	0.368421
10	actor	0.703704	-0.023509	0.029268	0.069140	0.324336	0.578947
11	delegation	0.513514	0.801103	0.000000	0.009854	0.013646	0.105263
12	entities	0.730769	-0.040676	0.072501	0.061474	0.177420	0.631579
13	impossibility	0.542857	-0.116640	0.000000	0.031857	0.171000	0.157895
14	organization	0.678571	0.019917	0.020642	0.049685	0.210489	0.526316
15	install	0.612903	-0.103167	0.002402	0.042153	0.168851	0.368421
16	surprise	0.678571	0.011514	0.006718	0.053019	0.207205	0.526316
17	assemble	0.558824	0.271903	0.000000	0.019507	0.085937	0.210526
18	destroy	0.527778	0.019062	0.000000	0.031596	0.149053	0.157895
19	technology	0.655172	-0.049214	0.004177	0.048514	0.222783	0.473684