The Network of Sentence-Co-Occurring Words of a Document

The case of Bruno Latour's An Inquiry into Modes of Existence

By Moses Boudourides & Sergios Lenis

IMPORTANT: To use this notebook, you'll need to

  1. Install IPython Notebook (easiest way: use Anaconda)
  2. Download this notebook and all other Python scripts used here from https://github.com/mboudour/WordNets
  3. Run ipython notebook from the directory where the notebook and the scripts were saved

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Importing Python modules

In [31]:
import random
import urllib
import nltk
import codecs
from textblob import TextBlob
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

doc="Bruno Latour's *An Inquiry into Modes of Existence*"
%matplotlib inline 
%load_ext autoreload
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

I. Importing a Document as Text

  1. If the document is a text file in your own storage, move it into the directory that contains this notebook and the complementary script tools.py, and change the file names in the next cell accordingly.
  2. If the document is a pdf file in your own storage or at a URL, first convert it to text with an online converter such as http://document.online-convert.com/convert-to-txt and then follow the previous step.
  3. If the document is a text file at a URL, comment out the next cell and uncomment the corresponding cell below (I3), changing the names in that cell accordingly.

ATTENTION: Only one of cells (I1), (I2) and (I3) has to be active (uncommented)!

(I1) Importing a Document as Text from a Locally Stored txt File

In [2]:
filename = 'Latour_AIME.txt'
titlename = "Latour's AIME"

# Read the whole document as Unicode text and close the file afterwards
with codecs.open(filename, "r", encoding="utf-8") as f:
    document = f.read()
# Wrap the text in a TextBlob to get word counts and noun phrases
blobbook = TextBlob(document)

(I2) Importing a Document as Text from a pdf File at a URL

In [3]:
# # URLpdf = u'https://ecomig2014.files.wordpress.com/2014/08/178919402-latour-bruno-an-inquiry-into-modes-of-existence-an-anthropology-of-the-moderns-pdf.pdf'

# filename = 'Latour_AIME.txt'
# titlename = "Latour's AIME"

# f = codecs.open(filename, "r", encoding="utf-8")
# fTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
# fTemp.write('temp.txt')
# document=f.read()
# blobbook = TextBlob(document)

(I3) Importing a Document as Text from a txt File at a URL

In [4]:
import sys
reload(sys)  # Python 2 only: makes sys.setdefaultencoding available again
sys.setdefaultencoding('utf8')
urr='http://www.bruno-latour.fr/node/328'
URLname = u'https://www.w3.org/services/html2txt?url=http%3A%2F%2Fwww.bruno-latour.fr%2Fnode%2F328&noinlinerefs=on&nonums=on'
titlename="Summary of Latour's AIME"

f=urllib.urlopen(URLname.encode("UTF-8"))
document=f.read()
blobbook = TextBlob(document)

II. Selecting Terms in the Document as the WordNet Nodes

We may select terms in the document as the wordnet nodes in three ways:

  1. Using a default list of terms (called selectedTerms) composed from the Index or the Table of Contents or by reading (parts of) the document.
  2. Inserting a user-defined list of terms.
  3. Extracting the noun phrases of the document through TextBlob.

ATTENTION: Only one of cells (II1), (II2) and (II3) has to be active (uncommented)!

(II1) Using a Default List of Selected Terms

In [5]:
# For Latour's AIME we are using the following list of 114 selected terms:

selectedTerms=[
'action', 'actor', 'anthropology', 'anxiety', 'art', 'assemble', 'association', 'attachment', 'attention', 'bads', 
'being', 'calculation', 'category', 'circle', 'civilization', 'connection', 'crises', 'crossing', 'delegation', 
'desire', 'destroy', 'detection', 'detour', 'differences', 'disorder', 'displacement', 'distance', 'emotion', 'empire', 
'ends', 'entities', 'enunciation', 'essence', 'existence', 'existent', 'exploration', 'fail', 'fiction', 'figuration', 
'figure', 'force', 'form', 'frame', 'freedom', 'global', 'goods', 'habit', 'hesitation', 'heterogeneous', 'hiatus', 
'horror', 'imitate', 'impossibility', 'information', 'ingenuity', 'inquiry', 'inscription', 'install', 'interest', 
'interpretive', 'invention', 'judgment', 'lacks', 'law', 'link', 'material', 'means', 'metamorphosis', 'mistake', 
'mode', 'modern', 'morality', 'narrative', 'network', 'obstacle', 'occidentalism', 'order', 'organization', 'passion', 
'person', 'politics', 'possibility', 'preposition', 'presence', 'production', 'project', 'protect', 'reason', 
'reconnect', 'reference', 'religion', 'represent', 'reproduction', 'research', 'resistance', 'risk', 'science', 
'script', 'scruple', 'society', 'speak', 'surprise', 'technology', 'template', 'time', 'transformation', 'translation', 
'traverse', 'trope', 'vacillation', 'veil', 'work', 'world', 'zigzag'    
]

# More terms can be manually inserted.

# Since the visualization of networks in **networkx** becomes hard to read for more than 100 nodes, we randomly select
# a smaller number (k) of terms from the above list.

k = 20 # Number of terms to be randomly selected from the above selectedTerms list
selectedTerms = random.sample(selectedTerms, k)

dfst=pd.DataFrame(columns=["%s selected terms" %titlename, "Frequencies"])
u=1
selectedTermsDic={}
for l in selectedTerms: 
    dfst.loc[u]=[l,blobbook.word_counts[l]]
    selectedTermsDic[l]=blobbook.word_counts[l]
    u+=1
dfst.sort_values(by="Frequencies", ascending=False)
Out[5]:
    Latour's AIME selected terms  Frequencies
12  form                          160
19  technology                    86
20  category                      70
17  network                       67
15  organization                  64
3   modern                        46
6   presence                      31
13  script                        30
10  entities                      28
4   surprise                      24
7   differences                   15
8   protect                       14
2   bads                          12
1   assemble                      8
9   impossibility                 7
14  install                       4
16  actor                         4
18  delegation                    4
5   destroy                       2
11  disorder                      2
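
The frequencies above are taken from TextBlob's word_counts. As a rough illustration of what such a count involves, the sketch below uses only the standard library. The helper name term_frequencies and the sample text are hypothetical, and the tokenization (lowercased runs of letters) is an assumption that need not match TextBlob's exactly.

```python
import re
from collections import Counter

def term_frequencies(text, terms):
    """Count how often each selected term occurs as a whole word in text."""
    # Assumed tokenization: lowercase runs of letters (apostrophes allowed)
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for terms that never occur
    return {term: counts[term] for term in terms}

# Hypothetical sample text, not taken from the book
sample = "A network is a form. The form of a network is not a script."
freqs = term_frequencies(sample, ["form", "network", "script", "actor"])

# List the terms by decreasing frequency, as in the table above
for term, n in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print("%s %d" % (term, n))
```

A term that is absent from the text (here actor) simply gets frequency 0, which mirrors how the table can contain rarely occurring terms.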

(II2) Inserting a User-Defined List of Selected Terms

In [6]:
# selectedT=raw_input('Type L for a list of terms or I for terms to be added one by one. Type "Enter" to exit!')
# if selectedT.strip() == 'L':
#     selectedTerms=input('Paste list')
# elif selectedT.strip() == 'I':
#     selectedTerms=[]
#     while True:
#         selec=raw_input('Give the next term')
#         if len(selec.strip()) >1:
#             selectedTerms.append(selec)
#         else:
#             break
# 
# dfst=pd.DataFrame(columns=["%s selected terms" %titlename, "Frequencies"])
# u=1
# selectedTermsDic={}
# for l in selectedTerms: 
#     dfst.loc[u]=[l,blobbook.word_counts[l]]
#     selectedTermsDic[l]=blobbook.word_counts[l]
#     u+=1
# dfst.sort_values(by="Frequencies", ascending=False)

(II3) Extracting TextBlob Noun Phrases

In [7]:
# npbook = blobbook.np_counts
# dfst = pd.DataFrame(columns=["%s noun phrases" %titlename, "Frequencies"])
# u=1
# selectedTermsDic={}
# for l in npbook: 
#     dfst.loc[u]=[l,npbook[l]]
#     selectedTermsDic[l]=npbook[l]
#     u+=1
# print "The total number of noun phrases in %s is %i." %(titlename,len(npbook))
# dfst.sort_values(by="Frequencies", ascending=False)

III. Constructing the Network of Sentence-Co-Occurring Words in the Document

In [35]:
%autoreload 2
from tools import occurrences, makegraph

documentDict = occurrences(document,selectedTermsDic)
documentGraph = makegraph(documentDict)
pos=nx.spring_layout(documentGraph,scale=50,k=0.8,iterations=20)
# pos=nx.graphviz_layout(documentGraph)
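
The functions occurrences and makegraph are imported from tools.py and are not shown in this notebook. As a rough illustration of the idea behind a network of sentence-co-occurring words, the following sketch links two selected terms whenever they appear in the same sentence and counts how many sentences they share. It is an assumption about the general technique, not the authors' implementation; the helper name cooccurrence_edges, the regular-expression sentence splitter and the sample text are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(text, terms):
    """Map each pair of terms to the number of sentences containing both."""
    terms = set(terms)
    edges = Counter()
    # Assumed sentence splitter: break on whitespace after '.', '!' or '?'
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        present = sorted(terms & words)
        # Each unordered pair of terms found in the sentence gets one count
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

# Hypothetical sample text, not taken from the book
sample = ("The network has a form. The form follows a script. "
          "A script and a network share a form.")
edges = cooccurrence_edges(sample, ["form", "network", "script"])
for (u, v), w in sorted(edges.items()):
    print("%s -- %s (weight %d)" % (u, v, w))
```

The weighted pairs could then be loaded into a networkx graph, for instance by calling add_edge(u, v, weight=w) on an nx.Graph() for each pair.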

IV. The Degree Histogram of the Network of Sentence-Co-Occurring Words in the Document

In [17]:
from tools import dhist

sstth="The Degree Histogram of %s wordnet" %titlename
dhp=dhist(documentGraph,sstth,pos=pos,figsize=(12,10))
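
The plotting helper dhist also comes from tools.py. The quantity behind a degree histogram is simply the number of nodes having each degree, which the sketch below computes from a list of edges using only the standard library. The function name degree_histogram and the toy edge list are hypothetical; networkx provides a comparable function, nx.degree_histogram.

```python
from collections import Counter

def degree_histogram(edges):
    """Return a list h where h[d] is the number of nodes with degree d.

    Only nodes that appear in at least one edge are counted."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree.values()) if degree else 0
    hist = [0] * (max_degree + 1)
    for d in degree.values():
        hist[d] += 1
    return hist

# Hypothetical toy network of four terms
edges = [("form", "network"), ("form", "script"),
         ("form", "modern"), ("network", "script")]
print(degree_histogram(edges))  # [0, 1, 2, 1]
```

Here one node has degree 1 (modern), two have degree 2 (network, script) and one has degree 3 (form).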

V. Plotting the Network of Sentence-Co-Occurring Words in the Document

In [11]:
from tools import draw_network

sstt="%s wordnet" %titlename
possit=draw_network(documentGraph,sstt,pos=pos,with_edgewidth=True,withLabels=True,labfs=20,valpha=0.2,ealpha=0.3,labelfont=15,with_node_weight=True,node_size_fixer=300.)

VI. Centralities of Nodes in the Network of Sentence-Co-Occurring Words in the Document

In [12]:
from tools import draw_centralities_subplots

centrali=draw_centralities_subplots(documentGraph,pos,withLabels=False,labfs=5,figsi=(15,22),ealpha=1,vals=True)

The table of Centralities of Nodes in the Network of Sentence-Co-Occurring Words in the Document

In [13]:
dfc=pd.DataFrame()
u=0
for i,k in centrali.items():
    dfc.insert(u,i,k.values())
    u+=1
dfc.insert(0,'Nodes',centrali[centrali.keys()[0]].keys())
dfc
Out[13]:
    Nodes          closeness_centrality  katz_centrality  betweenness_centrality  page_rank  eigenvector_centrality  degree_centrality
0   category       0.655172              -0.030215        0.016186                0.042656   0.167740                0.473684
1   script         0.703704              -0.011007        0.018783                0.050317   0.207262                0.578947
2   protect        0.678571              0.077487         0.026518                0.123739   0.428796                0.526316
3   modern         0.950000              -0.133345        0.193630                0.015232   0.048071                0.947368
4   bads           0.558824              0.404119         0.000000                0.069674   0.308302                0.210526
5   network        0.791667              -0.014810        0.037782                0.110328   0.399749                0.736842
6   form           0.904762              0.023409         0.156788                0.069164   0.281147                0.894737
7   presence       0.678571              -0.102249        0.019312                0.023192   0.085143                0.526316
8   disorder       0.542857              -0.223896        0.000000                0.048106   0.172036                0.157895
9   differences    0.612903              0.071062         0.003481                0.030793   0.104435                0.368421
10  actor          0.703704              -0.023509        0.029268                0.069140   0.324336                0.578947
11  delegation     0.513514              0.801103         0.000000                0.009854   0.013646                0.105263
12  entities       0.730769              -0.040676        0.072501                0.061474   0.177420                0.631579
13  impossibility  0.542857              -0.116640        0.000000                0.031857   0.171000                0.157895
14  organization   0.678571              0.019917         0.020642                0.049685   0.210489                0.526316
15  install        0.612903              -0.103167        0.002402                0.042153   0.168851                0.368421
16  surprise       0.678571              0.011514         0.006718                0.053019   0.207205                0.526316
17  assemble       0.558824              0.271903         0.000000                0.019507   0.085937                0.210526
18  destroy        0.527778              0.019062         0.000000                0.031596   0.149053                0.157895
19  technology     0.655172              -0.049214        0.004177                0.048514   0.222783                0.473684
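
The degree_centrality column can be read directly: every value is a multiple of 1/19, which is what one would expect if a node's degree is divided by n - 1 in a network of n = 20 nodes (for example, 0.473684 is approximately 9/19 and 0.947368 is approximately 18/19). The sketch below illustrates that normalization on a toy network using only the standard library. The function name and the toy data are hypothetical, and the other centrality columns are not reproduced here.

```python
def degree_centrality(edges, nodes):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(nodes)
    return {node: len(adj) / float(n - 1) for node, adj in neighbors.items()}

# Hypothetical toy network of four terms
nodes = ["form", "network", "script", "modern"]
edges = [("form", "network"), ("form", "script"),
         ("form", "modern"), ("network", "script")]
centrality = degree_centrality(edges, nodes)
for node in nodes:
    print("%s %.6f" % (node, centrality[node]))

# The same normalization applied to a 20-node network
print(round(9 / 19.0, 6), round(18 / 19.0, 6))
```

In the toy network, form is linked to all three other terms and therefore reaches the maximum value of 1.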

VII. Communities of Nodes in the Network of Sentence-Co-Occurring Words in the Document

In [14]:
%autoreload 2
from tools import draw_comms, modul_arity, print_communities

part,nodper=print_communities(documentGraph,sstt)

d=0.8
dd=0.8
c=1.2
cc=1.4
alpha=0.2
ealpha=0.2
vcc={}
sstta="The %s %s Communities" %(max(part.values())+1,sstt)

draw_comms(documentGraph,documentGraph.nodes(),[],[],[] ,part,part,d,dd,c,cc,alpha,ealpha,nodper,sstta,titlefont=20,labelfont=17,valpha=0.5)
Number of communities of Latour's AIME wordnet = 3
Community partition of Latour's AIME wordnet:
[['differences', 'organization', 'technology', 'category', 'network', 'script', 'surprise'], ['form', 'modern', 'delegation', 'destroy', 'assemble', 'protect', 'bads', 'actor', 'impossibility'], ['presence', 'entities', 'install', 'disorder']]
Community modularity of Latour's AIME wordnet = 0.1454
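
The community detection and the modularity value above are produced by helpers in tools.py, which are not shown here. To indicate what a modularity score measures, the sketch below computes Newman's modularity Q = sum over communities c of [L_c/m - (d_c/(2m))^2] for an unweighted graph, where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. This is an illustrative assumption: the function name and toy graph are hypothetical, and the notebook's own value may be based on edge weights, so it should not be expected to reproduce 0.1454.

```python
def modularity(edges, partition):
    """Newman modularity of an unweighted, undirected graph.

    edges: list of (u, v) pairs; partition: dict mapping node -> community id.
    """
    m = float(len(edges))
    internal = {}    # number of edges inside each community
    degree_sum = {}  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical toy graph: two triangles joined by a single edge
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print("%.4f" % modularity(toy_edges, toy_partition))
```

Putting all nodes into a single community gives Q = 0, so a positive score indicates more within-community edges than the degree sequence alone would suggest.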