Interlocking International Networks from the Panama Papers

The Case of Greece, Cyprus and Russia

By Moses Boudourides & Sergios Lenis

In [35]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
Out[35]:

1. Basic Structure of the Panama Papers Dataset

  • The Panama Papers dataset (downloadable from https://offshoreleaks.icij.org/pages/database) consists of five separate data matrices:
    1. The data matrix of edges containing 1265690 rows and 3 columns.
      • The second column (named 'rel_type') corresponds to the type of relation that binds the node of the first column (called 'node_1') with the node of the third column (called 'node_2'). Each one of the two nodes is identified by a certain number called node id.
      • There are five types of relations (directed edges):
        1. node 1 being officer of node 2.
        2. node 1 being intermediary of node 2.
        3. node 2 being the registered address of node 1.
        4. node 1 being similar to node 2.
        5. node 1 being underlying of node 2.
    2. The data matrix of officers containing 345594 rows and 7 columns.
    3. The data matrix of intermediaries containing 23636 rows and 9 columns.
    4. The data matrix of (offshore) entities containing 319150 rows and 21 columns.
    5. The data matrix of addresses containing 151054 rows and 7 columns.
  • Among all the columns of the last four data matrices, here, we are going to focus on three columns:
    • node_id, countries and country codes appearing in all four data matrices.
    • name appearing in the first three data matrices (i.e., all of them except "addresses").
  • Thus, we can merge the four data matrices ("officers", "intermediaries", "entities" and "addresses") together in such a way that they are indexed by the (unique) node_id.
    • The merged dataset will be called all-nodes dataframe.
    • Each row of this dataframe will be called node.
    • A new column (called type) is added in order to indicate whether a node is an officer or an intermediary or an entity or an address.
    • In particular, the all-nodes dataframe contains
      • 839434 rows (nodes) and 23 columns,
      • 838295 unique node_ids,
      • 345594 officers,
      • 23636 intermediaries,
      • 319150 (offshore) entities,
      • 151054 addresses and
      • 209 countries.
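The merging steps described above can be sketched on toy data (hypothetical miniature data matrices, not actual Panama Papers records; the `toy_` names are illustrative only):

```python
import pandas as pd

# Hypothetical miniature versions of two of the four data matrices,
# indexed by node_id like the real ones.
toy_officers = pd.DataFrame({'node_id': [1, 2],
                             'name': ['KIM SOO IN', 'TIAN YUAN'],
                             'countries': ['South Korea', 'China']}).set_index('node_id')
toy_entities = pd.DataFrame({'node_id': [3],
                             'name': ['SOME ENTITY LTD'],
                             'countries': ['Panama']}).set_index('node_id')

# Tag each row with its origin before stacking, so that every node of the
# merged dataframe records whether it is an officer, intermediary,
# entity or address.
toy_officers['type'] = 'officer'
toy_entities['type'] = 'entity'

toy_all_nodes = pd.concat([toy_officers, toy_entities]).reset_index()
print(len(toy_all_nodes))                      # 3 nodes in total
print(sorted(toy_all_nodes['type'].unique()))  # ['entity', 'officer']
```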
In [1]:
%matplotlib inline 

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
from pandas.tools.plotting import scatter_matrix
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import numpy as np
import math
import random
import os
from lightning import Lightning

edges = pd.read_csv('offshore_leaks_csvs-20160621/all_edges.csv')
rels = edges.rel_type.unique()
edges_officer_of = edges.loc[edges['rel_type']=='officer_of']
edges_intermediary_of = edges.loc[edges['rel_type']=='intermediary_of']
edges_registered_address = edges.loc[edges['rel_type']=='registered_address']
edges_similar = edges.loc[edges['rel_type']=='similar']
edges_underlying = edges.loc[edges['rel_type']=='underlying']
Officers = pd.read_csv('offshore_leaks_csvs-20160621/Officers.csv')
Intermediaries = pd.read_csv('offshore_leaks_csvs-20160621/Intermediaries.csv')
Entities = pd.read_csv('offshore_leaks_csvs-20160621/Entities.csv')
Addresses = pd.read_csv('offshore_leaks_csvs-20160621/Addresses.csv')
print 'The data matrix "edges" contains %i rows and %i columns' %(edges.shape[0],edges.shape[1])
print 'The', edges.shape[1], 'columns of the data matrix "edges" are', list(edges.columns)
print 'The types of relations are', list(rels)
print "The number of edges of type 'officer_of' are", edges_officer_of.shape[0]
print "The number of edges of type 'intermediary_of' are", edges_intermediary_of.shape[0]
print "The number of edges of type 'registered_address' are", edges_registered_address.shape[0]
print "The number of edges of type 'similar' are", edges_similar.shape[0]
print "The number of edges of type 'underlying' are", edges_underlying.shape[0]
print
print 'The data matrix "officers" contains %i rows and %i columns' %(Officers.shape[0],Officers.shape[1])
print 'The', Officers.shape[1], 'columns of the data matrix "officers" are:' 
print list(Officers.columns)
print
print 'The data matrix "intermediaries" contains %i rows and %i columns' %(Intermediaries.shape[0],Intermediaries.shape[1])
print 'The', Intermediaries.shape[1], 'columns of the data matrix "intermediaries" are:' 
print list(Intermediaries.columns)
print
print 'The data matrix "entities" contains %i rows and %i columns' %(Entities.shape[0],Entities.shape[1])
print 'The', Entities.shape[1], 'columns of the data matrix "entities" are:' 
print list(Entities.columns)
print
print 'The data matrix "addresses" contains %i rows and %i columns' %(Addresses.shape[0],Addresses.shape[1])
print 'The', Addresses.shape[1], 'columns of the data matrix "addresses" are:' 
print list(Addresses.columns)
officers = pd.read_csv('offshore_leaks_csvs-20160621/Officers.csv').set_index('node_id')
intermediaries = pd.read_csv('offshore_leaks_csvs-20160621/Intermediaries.csv').set_index('node_id')
addresses = pd.read_csv('offshore_leaks_csvs-20160621/Addresses.csv').set_index('node_id')
entities = pd.read_csv('offshore_leaks_csvs-20160621/Entities.csv').set_index('node_id')
officers["type"] = "officer"
intermediaries["type"] = "intermediary"
addresses["type"] = "address"
entities["type"] = "entity"
all_nodes = pd.concat([officers, intermediaries, addresses, entities])
all_nodes['name'] = all_nodes['name'].str.upper()
all_nodes['name'] = all_nodes['name'].str.strip()
all_nodes['name'].replace(to_replace=[r'MRS?\.\s+', r'\.', r'\s+', 'LIMITED'], 
                          value=['', '', ' ', 'LTD'], inplace=True, 
                          regex=True)
# Ensure that all "Bearers" do not become a single node
alBear=all_nodes[all_nodes.name == 'THE BEARER'].to_dict()
all_nodes.loc[all_nodes.name == 'THE BEARER', 'name'] = np.nan
officers=None
intermediaries=None
addresses=None
entities=None
all_nodes = all_nodes.reset_index()
print 'The dataframe "all_nodes" contains', all_nodes.shape[0], 'rows and', all_nodes.shape[1], 'columns'
print 'The', all_nodes.shape[1], 'columns of the dataframe "all_nodes" are:' 
print list(all_nodes.columns)
print 
print 'The dataframe "all_nodes" contains', len(all_nodes[all_nodes.type == 'officer']), 'officers'
print 'The dataframe "all_nodes" contains', len(all_nodes[all_nodes.type == 'intermediary']), 'intermediaries'
print 'The dataframe "all_nodes" contains', len(all_nodes[all_nodes.type == 'entity']), '(offshore) entities'
print 'The dataframe "all_nodes" contains', len(all_nodes[all_nodes.type == 'address']), 'addresses'
The data matrix "edges" contains 1269796 rows and 3 columns
The 3 columns of the data matrix "edges" are ['node_1', 'rel_type', 'node_2']
The types of relations are ['intermediary of', 'shareholder of', 'Shareholder of', 'Director / Shareholder of', 'Director of', 'Director (Rami Makhlouf) of', 'Power of Attorney of', 'Director / Shareholder / Beneficial Owner of', 'Member / Shareholder of', 'Owner of', 'Beneficial Owner of', 'Power of attorney of', 'Owner, director and shareholder of', 'President - Director of', 'Sole shareholder of', 'President and director of', 'Director / Beneficial Owner of', 'Power of Attorney / Shareholder of', 'Director and shareholder of', 'beneficiary of', 'President of', 'Authorized signatory of', 'Secretary of', 'Member of Foundation Council of', 'Signatory of', 'Grantee of a mortgage of', 'Beneficial owner of', 'Sole signatory of', 'Sole signatory / Beneficial owner of', 'Principal beneficiary of', 'Protector of', 'Beneficiary, shareholder and director of', 'Beneficiary of', 'Connected of', 'Shareholder (through Julex Foundation) of', 'First beneficiary of', 'Authorised Person / Signatory of', 'Co-Trustee of Trust of', 'Partner of', 'Trust Settlor of', 'Officer of', 'General Accountant of', 'Successor Protector of', 'Register of Shareholder of', 'Reserve Director of', 'Auditor of', 'Investment Advisor of', 'Resident Director of', 'Alternate Director of', 'Nominated Person of', 'Register of Director of', 'Tax Advisor of', 'Bank Signatory of', 'Trustee of Trust of', 'Appointor of', 'Legal Advisor of', 'Personal Directorship of', 'Stockbroker of', 'Joint Settlor of', 'Assistant Secretary of', 'Unit Trust Register of', 'Treasurer of', 'Vice President of', 'Auth. Representative of', 'Records & Registers of', 'Safekeeping of', 'Correspondent Addr. 
of', 'Chairman of', 'Board Representative of', 'Custodian of', 'Nominee Name of', 'registered address', 'related entity', 'similar name and address as', 'same name and registration date as', 'same address as', 'Nominee Shareholder of', 'Nominee Director of', 'Nominee Protector of', 'Nominee Investment Advisor of', 'Nominee Trust Settlor of', 'Nominee Beneficiary of', 'Nominee Secretary of', 'Nominee Beneficial Owner of']
The number of edges of type 'officer_of' are 0
The number of edges of type 'intermediary_of' are 0
The number of edges of type 'registered_address' are 0
The number of edges of type 'similar' are 0
The number of edges of type 'underlying' are 0

The data matrix "officers" contains 345594 rows and 7 columns
The 7 columns of the data matrix "officers" are:
['name', 'icij_id', 'valid_until', 'country_codes', 'countries', 'node_id', 'sourceID']

The data matrix "intermediaries" contains 23636 rows and 9 columns
The 9 columns of the data matrix "intermediaries" are:
['name', 'internal_id', 'address', 'valid_until', 'country_codes', 'countries', 'status', 'node_id', 'sourceID']

The data matrix "entities" contains 319150 rows and 21 columns
The 21 columns of the data matrix "entities" are:
['name', 'original_name', 'former_name', 'jurisdiction', 'jurisdiction_description', 'company_type', 'address', 'internal_id', 'incorporation_date', 'inactivation_date', 'struck_off_date', 'dorm_date', 'status', 'service_provider', 'ibcRUC', 'country_codes', 'countries', 'note', 'valid_until', 'node_id', 'sourceID']

The data matrix "addresses" contains 151054 rows and 7 columns
The 7 columns of the data matrix "addresses" are:
['address', 'icij_id', 'valid_until', 'country_codes', 'countries', 'node_id', 'sourceID']
The dataframe "all_nodes" contains 839434 rows and 23 columns
The 23 columns of the dataframe "all_nodes" are:
['node_id', 'address', 'company_type', 'countries', 'country_codes', 'dorm_date', 'former_name', 'ibcRUC', 'icij_id', 'inactivation_date', 'incorporation_date', 'internal_id', 'jurisdiction', 'jurisdiction_description', 'name', 'note', 'original_name', 'service_provider', 'sourceID', 'status', 'struck_off_date', 'type', 'valid_until']

The dataframe "all_nodes" contains 345594 officers
The dataframe "all_nodes" contains 23636 intermediaries
The dataframe "all_nodes" contains 319150 (offshore) entities
The dataframe "all_nodes" contains 151054 addresses
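The per-relation tallies printed above could also be obtained in a single pass with pandas' `value_counts`; a minimal sketch on a hypothetical edge dataframe (illustrative `toy_` names, not the real all_edges.csv):

```python
import pandas as pd

# Hypothetical edges in the same (node_1, rel_type, node_2) layout
# as all_edges.csv.
toy_edges = pd.DataFrame({'node_1': [10, 11, 12],
                          'rel_type': ['officer_of', 'officer_of',
                                       'registered_address'],
                          'node_2': [20, 20, 21]})

# One entry per relation type, with its frequency.
toy_counts = toy_edges['rel_type'].value_counts()
print(toy_counts['officer_of'])          # 2
print(toy_counts['registered_address'])  # 1
```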
In [2]:
fildir='offshore_leaks_csvs'

edges = pd.read_csv(os.path.join(fildir, 'all_edges.csv'))
rels = edges.rel_type.unique()
edges_officer_of = edges.loc[edges['rel_type']=='officer_of']
edges_intermediary_of = edges.loc[edges['rel_type']=='intermediary_of']
edges_registered_address = edges.loc[edges['rel_type']=='registered_address']
edges_similar = edges.loc[edges['rel_type']=='similar']
edges_underlying = edges.loc[edges['rel_type']=='underlying']

Officers = pd.read_csv(os.path.join(fildir, 'Officers.csv'))
Intermediaries = pd.read_csv(os.path.join(fildir, 'Intermediaries.csv'))
Entities = pd.read_csv(os.path.join(fildir, 'Entities.csv'), low_memory=False)
Addresses = pd.read_csv(os.path.join(fildir, 'Addresses.csv'))

print 'The data matrix "edges" contains %i rows and %i columns' %(edges.shape[0],edges.shape[1])
print 'The', edges.shape[1], 'columns of the data matrix "edges" are', list(edges.columns)
print 'The types of relations are', list(rels)
print "The number of edges of type 'officer_of' are", edges_officer_of.shape[0]
print "The number of edges of type 'intermediary_of' are", edges_intermediary_of.shape[0]
print "The number of edges of type 'registered_address' are", edges_registered_address.shape[0]
print "The number of edges of type 'similar' are", edges_similar.shape[0]
print "The number of edges of type 'underlying' are", edges_underlying.shape[0]
print
print 'The data matrix "officers" contains %i rows and %i columns' %(Officers.shape[0],Officers.shape[1])
print 'The', Officers.shape[1], 'columns of the data matrix "officers" are:' 
print list(Officers.columns)
print
print 'The data matrix "intermediaries" contains %i rows and %i columns' %(Intermediaries.shape[0],Intermediaries.shape[1])
print 'The', Intermediaries.shape[1], 'columns of the data matrix "intermediaries" are:' 
print list(Intermediaries.columns)
print
print 'The data matrix "entities" contains %i rows and %i columns' %(Entities.shape[0],Entities.shape[1])
print 'The', Entities.shape[1], 'columns of the data matrix "entities" are:' 
print list(Entities.columns)
print
print 'The data matrix "addresses" contains %i rows and %i columns' %(Addresses.shape[0],Addresses.shape[1])
print 'The', Addresses.shape[1], 'columns of the data matrix "addresses" are:' 
print list(Addresses.columns)
The data matrix "edges" contains 1265690 rows and 3 columns
The 3 columns of the data matrix "edges" are ['node_1', 'rel_type', 'node_2']
The types of relations are ['intermediary_of', 'officer_of', 'registered_address', 'similar', 'underlying']
The number of edges of type 'officer_of' are 581476
The number of edges of type 'intermediary_of' are 319121
The number of edges of type 'registered_address' are 317094
The number of edges of type 'similar' are 46761
The number of edges of type 'underlying' are 1238

The data matrix "officers" contains 345594 rows and 7 columns
The 7 columns of the data matrix "officers" are:
['name', 'icij_id', 'valid_until', 'country_codes', 'countries', 'node_id', 'sourceID']

The data matrix "intermediaries" contains 23636 rows and 9 columns
The 9 columns of the data matrix "intermediaries" are:
['name', 'internal_id', 'address', 'valid_until', 'country_codes', 'countries', 'status', 'node_id', 'sourceID']

The data matrix "entities" contains 319150 rows and 21 columns
The 21 columns of the data matrix "entities" are:
['name', 'original_name', 'former_name', 'jurisdiction', 'jurisdiction_description', 'company_type', 'address', 'internal_id', 'incorporation_date', 'inactivation_date', 'struck_off_date', 'dorm_date', 'status', 'service_provider', 'ibcRUC', 'country_codes', 'countries', 'note', 'valid_until', 'node_id', 'sourceID']

The data matrix "addresses" contains 151054 rows and 7 columns
The 7 columns of the data matrix "addresses" are:
['address', 'icij_id', 'valid_until', 'country_codes', 'countries', 'node_id', 'sourceID']
In [3]:
from IPython.display import Image
Image(filename='figs/oie.png')
Out[3]:
In [33]:
all_nodes.head(20)
Out[33]:
node_id address company_type countries country_codes dorm_date former_name ibcRUC icij_id inactivation_date ... jurisdiction_description name note original_name service_provider sourceID status struck_off_date type valid_until
0 12000001 NaN NaN South Korea KOR NaN NaN NaN E72326DEA50F1A9C2876E112AAEB42BC NaN ... NaN KIM SOO IN NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
1 12000002 NaN NaN China CHN NaN NaN NaN 58287E0FD37852000D9D5AB8B27A2581 NaN ... NaN TIAN YUAN NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
2 12000003 NaN NaN Australia AUS NaN NaN NaN F476011509FD5C2EF98E9B1D74913CCE NaN ... NaN GREGORY JOHN SOLOMON NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
3 12000004 NaN NaN Japan JPN NaN NaN NaN 974F420B2324A23EAF46F20E178AF52C NaN ... NaN MATSUDA MASUMI NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
4 12000005 NaN NaN Viet Nam VNM NaN NaN NaN 06A0FC92656D09F63D966FE7BD076A45 NaN ... NaN HO THUY NGA NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
5 12000006 NaN NaN Australia AUS NaN NaN NaN 14BCB3A8F783A319511E6C5EF5F4BB30 NaN ... NaN RACHMAT ARIFIN NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
6 12000007 NaN NaN Philippines PHL NaN NaN NaN C3912EA62746F395A64FB216BE464F61 NaN ... NaN TAN SUN-HUA NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
7 12000008 NaN NaN Taiwan TWN NaN NaN NaN DB896EE47F60BB1B2E9EA9C10ACBFCD7 NaN ... NaN OU YANG YET-SING AND CHANG KO NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
8 12000009 NaN NaN Taiwan TWN NaN NaN NaN 1B92FDDD451DA8DCA9CD36B0AF797411 NaN ... NaN WU CHI-PING AND WU CHOU TSAN-TING NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
9 12000010 NaN NaN China CHN NaN NaN NaN 0AE47CB442426F2ACF73E42BFA6657FA NaN ... NaN ZHONG LI MING NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
10 12000011 NaN NaN China CHN NaN NaN NaN BB8842DD315BB503CCCE1D3B23575A14 NaN ... NaN LIN PING NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
11 12000012 NaN NaN NaN NaN NaN NaN NaN 5F64E7218A4743275D1098ED6F7C8221 NaN ... NaN BOSHEN LTD/135-77 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
12 12000013 NaN NaN NaN NaN NaN NaN NaN B08B58F2381272DDE80A02220EDF938A NaN ... NaN BOSHEN LTD/133-58 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
13 12000014 NaN NaN NaN NaN NaN NaN NaN 69D406C865AA5A91B290E6EB49A9C430 NaN ... NaN BOSHEN LTD/132-50 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
14 12000015 NaN NaN NaN NaN NaN NaN NaN BC7889636CFCAF8D386459BA10F40C11 NaN ... NaN BOSHEN LTD NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
15 12000016 NaN NaN NaN NaN NaN NaN NaN B7DA7B60A1E6E0DE0B45A31936B78ADB NaN ... NaN ALGONQUIN TRUST LTD NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
16 12000017 NaN NaN NaN NaN NaN NaN NaN E9C108CB220912B70747D23611CC1CEC NaN ... NaN ALGONQUIN TRUST PANAMA NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
17 12000018 NaN NaN NaN NaN NaN NaN NaN DA0E7729252BE7F9A3E53B77485698D7 NaN ... NaN MIRABAUD & CIE/111-40 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
18 12000019 NaN NaN NaN NaN NaN NaN NaN 145B00769D1ECDB2A01DB431D1879010 NaN ... NaN BOSHEN LTD/137-93 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015
19 12000020 NaN NaN NaN NaN NaN NaN NaN 39CB6CE27746A8BA38FB32064563FFD9 NaN ... NaN BOSHEN LTD/136-83 NaN NaN NaN Panama Papers NaN NaN officer The Panama Papers data is current through 2015

20 rows × 23 columns

2. Countries in Panama Papers

In [5]:
from collections import Counter

cc_dict = all_nodes[['country_codes','countries','node_id','type']].to_dict()
mono = {}            # per country code: node type -> Counter of occurrences
countries_dict = {}  # country name -> country code
for k, v in cc_dict['country_codes'].items():
    if isinstance(v, float):   # NaN: node without a country
        continue
    vv = v.split(';')          # a node may carry several countries
    kk = cc_dict['countries'][k].split(';')
    for ik, vk in enumerate(vv):
        if vk not in mono:
            mono[vk] = {}
        if cc_dict['type'][k] not in mono[vk]:
            mono[vk][cc_dict['type'][k]] = Counter()
        mono[vk][cc_dict['type'][k]][vk] += 1
        countries_dict[kk[ik]] = vk
print 'The total number of countries in Panama Papers is', len(countries_dict)
key_lis = []
for key in sorted(countries_dict):
    vv = countries_dict[key]
    sor = {'Country_name': key, 'Country_code': vv}
    for k, v in mono[vv].items():
        sor[k] = v[vv]
    key_lis.append(sor)
countries_pd = pd.DataFrame(key_lis)
The total number of countries in Panama Papers is 209
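The loop above unpacks the semicolon-separated country fields by hand. With a modern pandas (`DataFrame.explode`, available since 0.25), the same cross-tabulation of node types per country can be sketched as follows (hypothetical miniature data with illustrative `toy_` names, not the real all_nodes):

```python
import pandas as pd

# Hypothetical rows mimicking all_nodes[['country_codes', 'type']];
# a node may list several countries separated by ';'.
toy_nodes = pd.DataFrame({'country_codes': ['GRC', 'GRC', 'CYP', 'GRC;CYP'],
                          'type': ['officer', 'entity', 'officer',
                                   'intermediary']})

# Split multi-country cells into one row per country, then cross-tabulate
# countries against node types.
toy_expanded = (toy_nodes
                .assign(country_codes=toy_nodes['country_codes'].str.split(';'))
                .explode('country_codes'))
toy_table = pd.crosstab(toy_expanded['country_codes'], toy_expanded['type'])
print(toy_table.loc['GRC', 'officer'])       # 1
print(toy_table.loc['CYP', 'intermediary'])  # 1
```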
In [6]:
countries_pd
Out[6]:
Country_code Country_name address entity intermediary officer
0 ALB Albania 23 2 NaN 25
1 DZA Algeria 20 NaN NaN 22
2 ASM American Samoa 10 1 NaN 16
3 AND Andorra 41 490 22 55
4 AGO Angola 39 10 1 58
5 AIA Anguilla 115 23 1 254
6 ATG Antigua and Barbuda 39 280 8 229
7 ARG Argentina 773 270 94 1294
8 ARM Armenia 32 NaN NaN 37
9 ABW Aruba 18 27 5 29
10 AUS Australia 1408 118 201 1703
11 AUT Austria 115 76 23 121
12 AZE Azerbaijan 111 8 1 127
13 BHS Bahamas 954 5021 122 1593
14 BHR Bahrain 68 13 12 94
15 BGD Bangladesh 42 2 3 56
16 BRB Barbados 57 35 14 47
17 BLR Belarus 87 35 4 98
18 BEL Belgium 293 61 54 363
19 BLZ Belize 274 1455 11 626
20 BEN Benin 8 NaN NaN 9
21 BMU Bermuda 218 149 29 402
22 BTN Bhutan 1 NaN NaN 1
23 BOL Bolivia 35 95 20 43
24 BIH Bosnia and Herzegovina 5 NaN NaN 6
25 BWA Botswana 82 6 2 105
26 BRA Brazil 1438 1399 406 2056
27 VGB British Virgin Islands 4125 69092 88 15211
28 BRN Brunei 60 5 9 73
29 BGR Bulgaria 102 50 8 117
... ... ... ... ... ... ...
179 SUR Suriname 7 NaN NaN 8
180 SWZ Swaziland 19 NaN NaN 21
181 SWE Sweden 193 84 34 201
182 CHE Switzerland 3208 38077 1339 4595
183 SYR Syria 41 5 5 65
184 TWN Taiwan 14610 2906 1324 19571
185 TJK Tajikistan 12 NaN NaN 12
186 TZA Tanzania 38 3 3 53
187 THA Thailand 1285 1013 362 1413
188 TGO Togo 3 NaN NaN 3
189 TON Tonga 2 NaN NaN 15
190 TTO Trinidad and Tobago 15 6 4 18
191 TUN Tunisia 26 4 3 31
192 TUR Turkey 559 101 21 684
193 TKM Turkmenistan 16 NaN NaN 18
194 TCA Turks and Caicos Islands 53 41 11 75
195 VIR U.S. Virgin Islands 69 13 8 609
196 UGA Uganda 8 1 1 8
197 UKR Ukraine 558 469 20 643
198 ARE United Arab Emirates 2081 7772 182 3397
199 GBR United Kingdom 4839 17973 2106 5676
200 USA United States 6860 6254 1540 7325
201 URY Uruguay 894 4906 300 2016
202 UZB Uzbekistan 98 5 NaN 104
203 VUT Vanuatu 54 34 8 591
204 VEN Venezuela 494 750 188 831
205 VNM Viet Nam 185 19 23 189
206 YEM Yemen 6 1 NaN 9
207 ZMB Zambia 34 2 2 47
208 ZWE Zimbabwe 225 8 6 293

209 rows × 6 columns

In [7]:
import warnings
warnings.filterwarnings("ignore")
ntei='Scatter Matrix Plot of the Distribution of Officers, Intermediaries and Entities over Countries' 
f, ax = plt.subplots(figsize=(15,15))
sss=scatter_matrix(countries_pd[['officer','intermediary','entity']], alpha=0.9, color='black', diagonal='hist',ax=ax)
plt.suptitle(ntei,fontsize=18,fontweight='bold')
corr = countries_pd.corr().as_matrix()
for i, j in zip(*plt.np.triu_indices_from(sss, k=1)):
    sss[i, j].annotate("pearson = %.3f" %corr[i,j], (0.8, 0.93), xycoords='axes fraction', ha='center', va='center')
In [8]:
# Selected countries together with their row index in countries_pd.
# (Kazakhstan and Azerbaijan were considered but left out; note that
# Ireland appears twice, reproducing the original selection.)
selected_countries = [
    ('Greece', 70), ('Cyprus', 46), ('Russia', 154), ('Turkey', 192),
    ('United Kingdom', 199), ('Belgium', 18), ('Austria', 11),
    ('Bulgaria', 29), ('Belarus', 17), ('Czech Republic', 47),
    ('Denmark', 50), ('Estonia', 58), ('Finland', 61), ('France', 62),
    ('Georgia', 66), ('Germany', 67), ('Hungary', 81), ('Iceland', 82),
    ('Italy', 90), ('Ireland', 87), ('Latvia', 100), ('Liechtenstein', 105),
    ('Luxembourg', 107), ('Malta', 115), ('Moldova', 120), ('Monaco', 121),
    ('Netherlands', 130), ('Norway', 139), ('Ireland', 87), ('Poland', 149),
    ('Portugal', 150), ('Romania', 153), ('Serbia', 165), ('Slovakia', 170),
    ('Slovenia', 171), ('Spain', 176), ('Sweden', 181), ('Switzerland', 182),
    ('Ukraine', 197), ('Andorra', 3), ('United States', 200),
]

# For each selected country, collect its numbers of officers,
# intermediaries and entities from countries_pd.
country_counts = []
for name, idx in selected_countries:
    row = countries_pd[countries_pd['Country_name'] == name].to_dict()
    country_counts.append([row['officer'][idx],
                           row['intermediary'][idx],
                           row['entity'][idx]])

# lisl[j] holds, for every selected country, the cumulative count over the
# first j+1 node types: officers; officers + intermediaries;
# officers + intermediaries + entities.
lisl = []
for j in range(3):
    lisl.append([sum(counts[:j+1]) for counts in country_counts])
# #     print lisl,i
# beaker.tot=lisl                             
# # beaker.negr=negr
# # beaker.necy=necy
# # beaker.neru=neru
# beaker.base=[0,lisl[0],lisl[1]]
# beaker.countries=[c1,c2,c3,c4,c5,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c23,c24,c25,c26,c27, c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c6] #c22,,c43
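The per-country blocks above (cc4/ncc4 through cc42/ncc42) and the manual i==0, i==1, i==2 running-sum branches repeat the same two patterns; a sketch of how both collapse, with toy numbers standing in for the real countries_pd counts:

```python
import numpy as np

# One loop over a country -> (officer, intermediary, entity) mapping
# replaces the repeated cc*/ncc* variable blocks (toy counts shown).
country_rows = {
    'Greece': [10, 1, 100],
    'Cyprus': [20, 2, 200],
    'Russia': [30, 3, 300],
}
ncc = dict((c, list(v)) for c, v in country_rows.items())

# np.cumsum along axis 0 replaces the i==0/1/2 cumulative branches.
counts = np.array([[1, 4, 7],
                   [2, 5, 8],
                   [3, 6, 9]])      # rows: steps, columns: countries
lisl = np.cumsum(counts, axis=0)    # lisl[i] = per-country totals up to step i
```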
In [9]:
Image(filename='figs/coie.png')
Out[9]:

3. THE TWO-MODE NETWORK OF GREEK-CYPRIOT-RUSSIAN OFFICERS & ENTITIES

In [10]:
name1 = c1 #Greece
cc1=countries_dict[name1]

c2='Cyprus'
name2=c2
cc2=countries_dict[name2]
# c2=''
# cc2={}
# name2=c2

c3='Russia'
name3=c3
cc3=countries_dict[name3]

names = ", ".join([c1,c2,c3])#'Russian Federation, Greece and Cyprus'
# names

def find_nodes_countries(cc,cc_dict):
    cnodes_dict=[]    
    for k,v in cc_dict['country_codes'].items():
        if isinstance(v,float):
            continue
        vv=v.split(';')
        for ik,vk in enumerate(vv):
            if vk ==cc:
                cnodes_dict.append(cc_dict['node_id'][k])
    return cnodes_dict

ccnodes1=find_nodes_countries(cc1,cc_dict)
print 'Total number of nodes from %s: %i' %(name1,len(ccnodes1)) #,cc1
ccnodes2=find_nodes_countries(cc2,cc_dict)
print 'Total number of nodes from %s: %i' %(name2,len(ccnodes2)) #,cc1
# print 'Total number of nodes from', name2, ':', len(ccnodes2) #,cc2
ccnodes3=find_nodes_countries(cc3,cc_dict)
print 'Total number of nodes from %s: %i' %(name3,len(ccnodes3))
# print 'Total number of nodes from %s: %i' %(name1,len(ccnodes1)+len(ccnodes2)+len(ccnodes3))
# print 'Total number of nodes from all countries (%s, %s, %s): %i' %(name1,name2,name3, 
#                                                                          len(ccnodes1)+len(ccnodes2)+len(ccnodes3))

nodes_rem=[]
for k,v in alBear['status'].items():
    nodes_rem.append(k)
# fildirg='/home/sergios-len/Dropbox/Python Projects (1)/PPs'
# fildirg='/home/mosesboudourides/Dropbox/Python Projects/PPs'
# F1=nx.read_graphml(os.path.join(fildirg, 'graphs/F1.graphml')) 
F1=nx.read_graphml('graphs/F1.graphml') 
    
union_nodes=list(set(ccnodes1).union(set(ccnodes2)).union(set(ccnodes3)))#
union_nodes=[str(i) for i in union_nodes]
# ccnodes1
# print len(union_nodes)
F=F1
graph = nx.subgraph(F, union_nodes)
graph.remove_nodes_from(nx.isolates(graph))

offic = list(Officers['node_id'].unique())
inter = list(Intermediaries['node_id'].unique())
enti = list(Entities['node_id'].unique())
addr = list(Addresses['node_id'].unique())


# print 'Total number of nodes in the (%s,%s,%s) graph: %i' %(name1,name2,name3,len(graph.nodes()))

labels={}
groups={}
noddd={}
deg=nx.degree(graph)
ngroups={}
cgroups={}
for i,nd in enumerate(graph.nodes()):
    noddd[nd]=i
    nd=int(nd) 
    if nd in ccnodes1:
        groups[i]=1
    elif nd in ccnodes2:
        groups[i]=2
    elif nd in ccnodes3:
        groups[i]=3

    if nd in offic:
        labels[i]=Officers.loc[Officers['node_id'] == nd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=1
    elif nd in inter:
        labels[i]= Intermediaries.loc[Intermediaries['node_id'] == nd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=4
    elif nd in enti:
        labels[i]= Entities.loc[Entities['node_id'] == nd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=5

    elif nd in addr:
        labels[i]= Addresses.loc[Addresses['node_id'] == nd]['address'].tolist()[0]#.capitalize()
        ngroups[i]=2
        
colorr={} 
for k,v in ngroups.items():
    if v ==1:      # Officers
        if groups[k]==1:   #Greece
            colorr[k]=(204,204,255)
        elif groups[k]==2:  # Cyprus
            colorr[k]=(204,255,204)
        elif groups[k]==3:  #Russia
            colorr[k]= (255,204,204)
    elif v==5:
        if groups[k]==1:   #Greece
            colorr[k]=(0,0,255)
        elif groups[k]==2:  # Cyprus
            colorr[k]=(0,255,0)
        elif groups[k]==3:  #Russia
            colorr[k]= (255,0,0)
    else:
        colorr[k]= (255,255,255)
lali=[]
grouli=[]
cols=[]
vals=[]
for  v in graph.nodes():
    lali.append(labels[noddd[v]])
    grouli.append(groups[noddd[v]])
    cols.append(colorr[noddd[v]])
    vals.append(deg[v])    
edges=[]
for edd in graph.edges():
    if 'weight' in graph[edd[0]][edd[1]]:
        wei=graph[edd[0]][edd[1]]['weight']
    else:
        wei=1
    edges.append([noddd[edd[0]],noddd[edd[1]],wei])   
print 'Total number of nonisolated nodes in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(cols))
print 'Total number of edges in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(edges))
# print 'Total number of edges in the (%s,%s,%s) graph: %i' %(name1,name2,name3,len(edges))
# 'Number of nodes: %i Number of edges: %i' %(len(cols),len(edges))
Total number of nodes from Greece: 1013
Total number of nodes from Cyprus: 12620
Total number of nodes from Russia: 23475
Total number of nonisolated nodes in the graph of Greece, Cyprus and Russia: 7817
Total number of edges in the graph of Greece, Cyprus and Russia: 5869
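The find_nodes_countries helper above can be restated as a self-contained toy (hypothetical rows standing in for the merged node table; the isinstance(v, float) guard mirrors the original's handling of NaN country codes):

```python
# Toy rows standing in for cc_dict's node_id / country_codes columns.
rows = [
    {'node_id': 1, 'country_codes': 'GRC'},
    {'node_id': 2, 'country_codes': 'CYP;RUS'},
    {'node_id': 3, 'country_codes': float('nan')},   # missing codes
    {'node_id': 4, 'country_codes': 'RUS'},
]

def find_nodes(cc, rows):
    out = []
    for r in rows:
        codes = r['country_codes']
        if isinstance(codes, float):   # NaN cells are skipped, as above
            continue
        if cc in codes.split(';'):     # a cell may hold several ';'-joined codes
            out.append(r['node_id'])
    return out
```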
In [25]:
lgn = Lightning(ipython=True, host='http://public.lightning-viz.org',size='full') # vis at server
# lgn = Lightning(ipython=True,local=True,size='large') # local vis
vis=lgn.force(conn=edges, values=None, labels=lali, color=cols, group=None, colormap=None, size=3, tooltips=True,
              width=1200, brush=True,zoom=True, height=None,
              description=r'''## **The Panama Papers Network of %s**''' %names)
vis.open() # vis at server
# vis # local vis
Lightning initialized
Connected to server at http://public.lightning-viz.org
In [28]:
from IPython.display import IFrame
IFrame('http://public.lightning-viz.org/visualizations/a634166d-2bbe-4133-ac78-8a19a0ee75f4/public/', width=1000, height=1000)
Out[28]:
In [12]:
def get_nat(ed,c_d):
    natt=None
    for nat in c_d:
#         print nat,c_d[nat]
        if int(ed) in c_d[nat]:
            natt=nat
    return natt
            
def count_edges_nat(c_d,edges):
    nat_edgs=Counter()
    for ed in edges:
        edg=get_nat(ed[0],c_d)
        deg=get_nat(ed[1],c_d)
        edd=sorted((edg,deg))
        edde='%s , %s' %(edd[0],edd[1])
#         print ed,edg,deg,sorted(edg,deg)
        nat_edgs[edde]+=1
    return nat_edgs
c_d={c1:ccnodes1,c2:ccnodes2,c3:ccnodes3}
edges_nationalities=count_edges_nat(c_d,graph.edges())
for nat,nat_value in edges_nationalities.items():
    natt=nat.split(' , ')
    print 'There are %i edges between %s and %s' %(nat_value,natt[0],natt[1])

# NB: this indexing assumes the iteration order of edges_nationalities.values()
# matches the printed order above
een=edges_nationalities.values()
r1 = [2*een[4],een[-1],een[1]]
r2 = [een[-1],2*een[3],een[2]]
r3 = [een[1],een[2],2*een[0]]
m=[r1,r2,r3]
import numpy as np
M=np.array(m)
if M.sum() != 1.0:
    M=M/float(M.sum())
M=np.asmatrix(M)
s=(M*M).sum()
t=M.trace()
R=t-s
r=R/(1-s)
ac = float(r)
print 'The Attribute Assortativity Coefficient of the graph of %s, %s and %s is %.4f' %(name1,name2,name3,ac)
There are 2832 edges between Russia and Russia
There are 6 edges between Greece and Russia
There are 863 edges between Cyprus and Russia
There are 2073 edges between Cyprus and Cyprus
There are 27 edges between Greece and Greece
There are 68 edges between Cyprus and Greece
The Attribute Assortativity Coefficient of the graph of Greece, Cyprus and Russia is 0.6826
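The computation above implements Newman's mixing-matrix formula r = (tr e - ||e^2||) / (1 - ||e^2||), where e is the country-pair edge-count matrix normalized to sum to 1. A minimal numeric sketch with toy counts (not the Panama Papers figures):

```python
import numpy as np

# Symmetric toy matrix of edge counts between two country groups.
counts = np.array([[2., 3.],
                   [3., 2.]])
e = counts / counts.sum()        # normalized mixing matrix
s = e.dot(e).sum()               # the ||e^2|| term
r = (e.trace() - s) / (1 - s)    # attribute assortativity coefficient
# More cross-group than within-group edges here, so r comes out negative.
```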

4. THE PROJECTED NETWORK OF GREEK-CYPRIOT-RUSSIAN OFFICERS

In [36]:
import itertools as it
addr=[str(i) for i in addr]
offic=[str(i) for i in offic]
enti=[str(i) for i in enti]
# nodes_no_addr_ent=set(union_nodes)-(set(addr).union(set(offic)))
nodes_no_addr_ent=set(union_nodes)-(set(addr).union(set(enti)))
# print len(union_nodes)
# print len(nodes_no_addr_ent)
pgraph = nx.subgraph(F, nodes_no_addr_ent)
# print len(enti),len(addr)
entil=set(enti).intersection(set(union_nodes))
addrl=set(addr).intersection(set(union_nodes))
# print len(entil),len(addrl)
ll=[enti] 
for ae in ll:
    for nd in ae:
        if nd in graph:
            nnei=nx.all_neighbors(graph,nd)
            nei=list(set(nodes_no_addr_ent).intersection(set(nnei)))
            for ii in it.combinations(nei,2):
                ed=ii[0]
                de=ii[1]
                if pgraph.has_edge(ed,de):
                    if 'weight' in pgraph[ed][de]:
                        wei=pgraph[ed][de]['weight']+1
                    else:
                        wei=1
                else:
                    wei=1
                pgraph.add_edge(ed,de,weight=wei)
pgraph.remove_nodes_from(nx.isolates(pgraph))
# print 'The projected network has', len(pgraph.nodes()), 'and', len(pgraph.edges()), 'edges'
print 'Total number of nonisolated nodes in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(pgraph.nodes()))
print 'Total number of edges in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(pgraph.edges()))

labels={}
groups={}
noddd={}
deg=nx.degree(pgraph)
ngroups={}
for i,nd in enumerate(pgraph.nodes()):
    noddd[nd]=i
    ndd=int(nd)
    if ndd in ccnodes1:
        groups[i]=1
    elif ndd in ccnodes2:
        groups[i]=2
    elif ndd in ccnodes3:
        groups[i]=3
    if nd in offic:
        labels[i]=Officers.loc[Officers['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=1
    elif nd in inter:
        labels[i]= Intermediaries.loc[Intermediaries['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=4
    elif nd in enti:
        labels[i]= Entities.loc[Entities['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=5
    elif nd in addr:
        labels[i]= Addresses.loc[Addresses['node_id'] == ndd]['address'].tolist()[0]#.capitalize()
        ngroups[i]=2
# print groups

for k,v in ngroups.items():
    if v ==1:      # Officers
        if groups[k]==1:   #Greek
            colorr[k]=(204,204,255)
#             colorr[k]= (255,204,204)
        elif groups[k]==2:  # Cypr
            colorr[k]=(204,255,204)
        elif groups[k]==3:  #Rus
            colorr[k]= (255,204,204)
#             colorr[k]=(204,204,255)
    elif v==5:
        if groups[k]==1:   #Greek
            colorr[k]=(0,0,255)
#             colorr[k]= (255,0,0)
        elif groups[k]==2:  # Cypr
            colorr[k]=(0,255,0)
        elif groups[k]==3:  #Rus
            colorr[k]= (255,0,0)
    else:
        colorr[k]= (255,255,255)
plali=[]
pgrouli=[]
pcols=[]
pvals=[]
for v in pgraph.nodes():
    plali.append(labels[noddd[v]])
    pgrouli.append(groups[noddd[v]])
    pcols.append(colorr[noddd[v]])
    pvals.append(deg[v])   
pedges=[]
for edd in pgraph.edges():
    if 'weight' in pgraph[edd[0]][edd[1]]:
        wei=4*pgraph[edd[0]][edd[1]]['weight']
    else:
        wei=4
    pedges.append([noddd[edd[0]],noddd[edd[1]],wei])   
# print 'Number of nodes: %i Number of edges: %i' %(len(cols),len(edges))

ssssi=set()
for edd in pgraph.edges():
    if 'weight' in pgraph[edd[0]][edd[1]]:
        ssssi.add(pgraph[edd[0]][edd[1]]['weight'])
# print ssssi
Total number of nonisolated nodes in the graph of Greece, Cyprus and Russia: 3145
Total number of edges in the graph of Greece, Cyprus and Russia: 4821
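The projection loop above links every pair of officers that share an entity, incrementing the edge weight once per shared entity. A minimal pure-Python sketch of that co-membership step (hypothetical entity membership lists):

```python
from itertools import combinations
from collections import Counter

# Hypothetical entity -> member-officer lists.
entity_members = {
    'E1': ['O1', 'O2', 'O3'],
    'E2': ['O2', 'O3'],
}
weights = Counter()
for members in entity_members.values():
    # every pair of co-members gains one unit of edge weight
    for u, v in combinations(sorted(members), 2):
        weights[(u, v)] += 1
# O2 and O3 co-occur in both entities, so their edge carries weight 2.
```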
In [37]:
lgn = Lightning(ipython=True, host='http://public.lightning-viz.org',size='full') # vis at server
# lgn = Lightning(ipython=True,local=True,size='large') #local
vis=lgn.force(conn=pedges, values=None, labels=plali, color=pcols, group=None, colormap=None, size=3, tooltips=True,
              width=1200, brush=True,zoom=True, height=800,
              description=r'''## **The Projected Panama Papers Network of %s**''' %names)
vis.open() # vis at server
# vis ## local
Lightning initialized
Connected to server at http://public.lightning-viz.org
In [38]:
from IPython.display import IFrame
IFrame('http://public.lightning-viz.org/visualizations/8a06add8-f937-47ea-beb2-3b0649262219/public/', width=1000, height=1000)
Out[38]:
In [15]:
def get_nat(ed,c_d):
    natt=None
    for nat in c_d:
#         print nat,c_d[nat]
        if int(ed) in c_d[nat]:
            natt=nat
    return natt
            
def count_edges_nat(c_d,edges):
    nat_edgs=Counter()
    for ed in edges:
        edg=get_nat(ed[0],c_d)
        deg=get_nat(ed[1],c_d)
        edd=sorted((edg,deg))
        edde='%s , %s' %(edd[0],edd[1])
#         print ed,edg,deg,sorted(edg,deg)
        nat_edgs[edde]+=1
    return nat_edgs
c_d={c1:ccnodes1,c2:ccnodes2,c3:ccnodes3}
edges_nationalities=count_edges_nat(c_d,pgraph.edges())
for nat,nat_value in edges_nationalities.items():
    natt=nat.split(' , ')
    print 'There are %i edges between %s and %s' %(nat_value,natt[0],natt[1])
    
een=edges_nationalities.values()
r1 = [2*een[4],een[-1],een[1]]
r2 = [een[-1],2*een[3],een[2]]
r3 = [een[1],een[2],2*een[0]]
m=[r1,r2,r3]
import numpy as np
M=np.array(m)
if M.sum() != 1.0:
    M=M/float(M.sum())
M=np.asmatrix(M)
s=(M*M).sum()
t=M.trace()
R=t-s
r=R/(1-s)
ac = float(r)
print 'The Attribute Assortativity Coefficient of the graph of %s, %s and %s is %.4f' %(name1,name2,name3,ac)
There are 3148 edges between Russia and Russia
There are 3 edges between Greece and Russia
There are 537 edges between Cyprus and Russia
There are 1040 edges between Cyprus and Cyprus
There are 74 edges between Greece and Greece
There are 19 edges between Cyprus and Greece
The Attribute Assortativity Coefficient of the graph of Greece, Cyprus and Russia is 0.7254
In [16]:
def create_centralities_list(G,maxiter=2000,pphi=5,centList=[]):
    if len(centList)==0:
        centList=['degree_centrality','closeness_centrality','betweenness_centrality',
    'eigenvector_centrality','katz_centrality','page_rank']
    cenLen=len(centList)
    valus={}
    # plt.figure(figsize=figsi)
    for uu,centr in enumerate(centList):
        if centr=='degree_centrality':
            cent=nx.degree_centrality(G)
            sstt='Degree Centralities'
            ssttt='degree centrality'
            valus[centr]=cent
        elif centr=='closeness_centrality':
            cent=nx.closeness_centrality(G)
            sstt='Closeness Centralities'
            ssttt='closeness centrality'
            valus[centr]=cent
        elif centr=='betweenness_centrality':
            cent=nx.betweenness_centrality(G)
            sstt='Betweenness Centralities'
            ssttt='betweenness centrality'
            valus[centr]=cent
        elif centr=='eigenvector_centrality':
            try:
                cent=nx.eigenvector_centrality(G,max_iter=maxiter)
                sstt='Eigenvector Centralities'
                ssttt='eigenvector centrality'
                valus[centr]=cent
            except:
                valus[centr]=None
                continue
        elif centr=='katz_centrality':
            phi = (1+math.sqrt(pphi))/2.0 # golden-ratio heuristic; Katz converges only for alpha < 1/lambda_max of the adjacency matrix
            cent=nx.katz_centrality_numpy(G,1/phi-0.01)
            sstt='Katz Centralities'
            ssttt='Katz centrality'
            valus[centr]=cent
        elif centr=='page_rank':
            try:
                cent=nx.pagerank(G)
                sstt='PageRank'
                ssttt='pagerank'
                valus[centr]=cent
            except:
                valus[centr]=None
                continue
        print '%s done!!!' %sstt
    return valus
centList=['degree_centrality','closeness_centrality','betweenness_centrality',
    'eigenvector_centrality','katz_centrality','page_rank']
centrali=create_centralities_list(pgraph)

dfco=pd.DataFrame()
u=0
for k in centList:
    try:
        v=centrali[k].values()
    except:
        v=None
    dfco.insert(u,k,v)
    u+=1
dfco.insert(0,'Nodes',centrali[centrali.keys()[0]].keys())
Degree Centralities done!!!
Closeness Centralities done!!!
Betweenness Centralities done!!!
Eigenvector Centralities done!!!
Katz Centralities done!!!
PageRank done!!!
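As a sanity check on the first centrality column below, degree centrality is simply degree divided by n-1; a library-free toy on a 4-node path graph:

```python
# Degree centrality by hand on the path 0-1-2-3 (no graph library needed).
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
deg = dict((u, 0) for u in range(n))
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
centrality = dict((u, d / float(n - 1)) for u, d in deg.items())
# Endpoints score 1/3, interior nodes 2/3.
```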
In [17]:
dfco
Out[17]:
Nodes degree_centrality closeness_centrality betweenness_centrality eigenvector_centrality katz_centrality page_rank
0 12144174 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000208
1 13010743 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000127
2 22466 0.001272 0.004049 0.000000e+00 9.961874e-07 0.000433 0.000318
3 12137501 0.000954 0.001272 0.000000e+00 1.219158e-48 0.001445 0.000318
4 12137500 0.000954 0.000954 0.000000e+00 6.904356e-55 0.002511 0.000239
5 12137502 0.000636 0.001566 0.000000e+00 1.993751e-43 0.002074 0.000318
6 13007675 0.000954 0.000636 0.000000e+00 1.082830e-63 0.009577 0.000318
7 12211045 0.000636 0.000954 0.000000e+00 1.082830e-63 0.002128 0.000318
8 12154121 0.001272 0.000636 0.000000e+00 9.617461e-79 0.009577 0.000318
9 23447 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
10 12140152 0.000954 0.000318 0.000000e+00 6.904356e-55 -0.005279 0.000318
11 12140153 0.000318 0.000954 0.000000e+00 9.617461e-79 0.002511 0.000318
12 12171888 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
13 12161023 0.000636 0.000318 0.000000e+00 1.082830e-63 -0.005279 0.000318
14 40998 0.000318 0.000636 0.000000e+00 9.617461e-79 0.009577 0.000318
15 24717 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
16 24716 0.003817 0.000318 0.000000e+00 8.752310e-25 -0.005279 0.000318
17 12137611 0.000318 0.003817 0.000000e+00 9.617461e-79 0.000329 0.000318
18 12137613 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
19 25063 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
20 12118791 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
21 12118790 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
22 12170051 0.000636 0.000318 0.000000e+00 1.082830e-63 -0.005279 0.000318
23 12099861 0.002545 0.000636 0.000000e+00 1.372650e-33 0.009577 0.000318
24 13006617 0.000318 0.002545 0.000000e+00 9.617461e-79 0.000535 0.000318
25 12126479 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000609
26 12197067 0.000954 0.000318 0.000000e+00 6.904356e-55 -0.005279 0.000236
27 13001933 0.000318 0.000954 0.000000e+00 3.227084e-71 0.002511 0.000318
28 23848 0.000636 0.000424 0.000000e+00 1.082830e-63 -0.012769 0.000318
29 21461 0.000636 0.000318 0.000000e+00 1.082830e-63 -0.005279 0.000318
... ... ... ... ... ... ... ...
3115 12172377 0.000954 0.000318 0.000000e+00 1.082830e-63 -0.005279 0.000318
3116 28527 0.000318 0.000954 0.000000e+00 9.617461e-79 0.002511 0.000318
3117 29172 0.000636 0.000674 0.000000e+00 1.082830e-63 -0.005033 0.000318
3118 12024899 0.000318 0.000636 0.000000e+00 1.082830e-63 0.009577 0.000318
3119 21380 0.000318 0.000636 0.000000e+00 9.617461e-79 0.009577 0.000318
3120 12200537 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3121 12200538 0.000318 0.000318 0.000000e+00 5.037595e-41 -0.005279 0.000205
3122 12207968 0.000954 0.000318 0.000000e+00 4.063667e-51 -0.005279 0.000374
3123 23199 0.000636 0.001417 0.000000e+00 4.063667e-51 0.013199 0.000374
3124 23196 0.000636 0.000636 0.000000e+00 8.122939e-26 0.002128 0.000318
3125 23197 0.001272 0.000636 0.000000e+00 6.904356e-55 0.002128 0.000318
3126 40408 0.000954 0.003348 0.000000e+00 6.904356e-55 0.061282 0.000318
3127 12172583 0.000954 0.000954 0.000000e+00 9.617461e-79 0.002511 0.000318
3128 12123225 0.000318 0.000954 0.000000e+00 9.617461e-79 0.002511 0.000318
3129 12212837 0.000318 0.000318 0.000000e+00 8.148765e-67 -0.005279 0.000645
3130 12212836 0.000954 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3131 24403 0.000318 0.000954 6.071896e-07 5.586321e-03 0.053554 0.000318
3132 12123223 0.000318 0.000318 0.000000e+00 2.211266e-58 -0.005279 0.000271
3133 12213836 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3134 12212838 0.001590 0.001590 0.000000e+00 8.542021e-44 0.001014 0.000318
3135 27751 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000239
3136 12128840 0.002545 0.001431 0.000000e+00 7.294223e-48 0.004446 0.000526
3137 12024328 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3138 12137720 0.003817 0.003840 0.000000e+00 9.815694e-23 0.000169 0.000318
3139 12137723 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3140 12196642 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3141 13004991 0.000318 0.000318 0.000000e+00 9.617461e-79 -0.005279 0.000318
3142 25930 0.005407 0.007328 3.500834e-05 9.617461e-79 0.000213 0.001245
3143 20056 0.000636 0.004661 0.000000e+00 8.542021e-44 -0.000538 0.000237
3144 25933 0.004771 0.007185 2.876103e-05 5.101517e-03 -0.000760 0.001047

3145 rows × 7 columns

In [18]:
import warnings
warnings.filterwarnings("ignore")
ntei='Scatter Matrix Plot of Centralities of the Projected Network of Officers from %s, %s and %s' %(name1,name2,name3) #+ names
f, ax = plt.subplots(figsize=(20,20))
sss=scatter_matrix(dfco[centList], alpha=0.9, color='black', diagonal='hist',ax=ax)
plt.suptitle(ntei,fontsize=18,fontweight='bold')
corr = dfco.corr().as_matrix()
for i, j in zip(*plt.np.triu_indices_from(sss, k=1)):
    sss[i, j].annotate("pearson = %.3f" %corr[i,j], (0.8, 0.93), xycoords='axes fraction', ha='center', va='center')
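The "pearson" annotations in the scatter-matrix plot come from the correlation matrix dfco.corr(); the same coefficient can be computed directly (toy vectors, not real centrality scores):

```python
import numpy as np

# Pearson correlation between two toy centrality columns.
x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([0.2, 0.1, 0.4, 0.3])
pearson = np.corrcoef(x, y)[0, 1]
# These toy vectors give pearson == 0.6.
```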

5. THE PROJECTED NETWORK OF GREEK-CYPRIOT-RUSSIAN ENTITIES

In [19]:
import itertools as it
addr=[str(i) for i in addr]
offic=[str(i) for i in offic]
enti=[str(i) for i in enti]
# nodes_no_addr_ent=set(union_nodes)-(set(addr).union(set(offic)))
nodes_no_addr_ent=set(union_nodes)-(set(addr).union(set(offic)))
# print len(union_nodes)
# print len(nodes_no_addr_ent)
pgraph = nx.subgraph(F, nodes_no_addr_ent)
# print len(enti),len(addr)
entil=set(enti).intersection(set(union_nodes))
addrl=set(addr).intersection(set(union_nodes))
# print len(entil),len(addrl)
ll=[offic] 
for ae in ll:
    for nd in ae:
        if nd in graph:
            nnei=nx.all_neighbors(graph,nd)
            nei=list(set(nodes_no_addr_ent).intersection(set(nnei)))
            for ii in it.combinations(nei,2):
                ed=ii[0]
                de=ii[1]
                if pgraph.has_edge(ed,de):
                    if 'weight' in pgraph[ed][de]:
                        wei=pgraph[ed][de]['weight']+1
                    else:
                        wei=1
                else:
                    wei=1
                pgraph.add_edge(ed,de,weight=wei)
pgraph.remove_nodes_from(nx.isolates(pgraph))
# print 'The projected network has', len(pgraph.nodes()), 'and', len(pgraph.edges()), 'edges'
print 'Total number of nonisolated nodes in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(pgraph.nodes()))
print 'Total number of edges in the graph of %s, %s and %s: %i' %(name1,name2,name3,len(pgraph.edges()))

labels={}
groups={}
noddd={}
deg=nx.degree(pgraph)
ngroups={}
for i,nd in enumerate(pgraph.nodes()):
    noddd[nd]=i
    ndd=int(nd)
    if ndd in ccnodes1:
        groups[i]=1
    elif ndd in ccnodes2:
        groups[i]=2
    elif ndd in ccnodes3:
        groups[i]=3
    if nd in offic:
        labels[i]=Officers.loc[Officers['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=1
    elif nd in inter:
        labels[i]= Intermediaries.loc[Intermediaries['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=4
    elif nd in enti:
        labels[i]= Entities.loc[Entities['node_id'] == ndd]['name'].tolist()[0]#.capitalize()
        ngroups[i]=5
    elif nd in addr:
        labels[i]= Addresses.loc[Addresses['node_id'] == ndd]['address'].tolist()[0]#.capitalize()
        ngroups[i]=2
# print groups

for k,v in ngroups.items():
    if v ==1:      # Officers
        if groups[k]==1:   #Greek
            colorr[k]=(204,204,255)
#             colorr[k]= (255,204,204)
        elif groups[k]==2:  # Cypr
            colorr[k]=(204,255,204)
        elif groups[k]==3:  #Rus
            colorr[k]= (255,204,204)
#             colorr[k]=(204,204,255)
    elif v==5:
        if groups[k]==1:   #Greek
            colorr[k]=(0,0,255)
#             colorr[k]= (255,0,0)
        elif groups[k]==2:  # Cypr
            colorr[k]=(0,255,0)
        elif groups[k]==3:  #Rus
            colorr[k]= (255,0,0)
    else:
        colorr[k]= (255,255,255)
plali=[]
pgrouli=[]
pcols=[]
pvals=[]
for v in pgraph.nodes():
    plali.append(labels[noddd[v]])
    pgrouli.append(groups[noddd[v]])
    pcols.append(colorr[noddd[v]])
    pvals.append(deg[v])   
pedges=[]
for edd in pgraph.edges():
    if 'weight' in pgraph[edd[0]][edd[1]]:
        wei=4*pgraph[edd[0]][edd[1]]['weight']
    else:
        wei=4
    pedges.append([noddd[edd[0]],noddd[edd[1]],wei])   
# print 'Number of nodes: %i Number of edges: %i' %(len(cols),len(edges))

ssssi=set()
for edd in pgraph.edges():
    if 'weight' in pgraph[edd[0]][edd[1]]:
        ssssi.add(pgraph[edd[0]][edd[1]]['weight'])
# print ssssi
Total number of nonisolated nodes in the graph of Greece, Cyprus and Russia: 1526
Total number of edges in the graph of Greece, Cyprus and Russia: 4851
In [31]:
lgn = Lightning(ipython=True, host='http://public.lightning-viz.org',size='full') # vis at server
# lgn = Lightning(ipython=True,local=True,size='large') #local
vis=lgn.force(conn=pedges, values=None, labels=plali, color=pcols, group=None, colormap=None, size=3, tooltips=True,
              width=1200, brush=True,zoom=True, height=800,
              description=r'''## **The Projected Panama Papers Network of %s**''' %names)
vis.open() # vis at server
# vis ## local
Lightning initialized
Connected to server at http://public.lightning-viz.org
In [32]:
from IPython.display import IFrame
IFrame('http://public.lightning-viz.org/visualizations/212c22cf-3f51-4492-a976-a4f5e96f949f/public/', width=1000, height=1000)
Out[32]:
In [21]:
def get_nat(ed,c_d):
    natt=None
    for nat in c_d:
#         print nat,c_d[nat]
        if int(ed) in c_d[nat]:
            natt=nat
    return natt
            
def count_edges_nat(c_d,edges):
    nat_edgs=Counter()
    for ed in edges:
        edg=get_nat(ed[0],c_d)
        deg=get_nat(ed[1],c_d)
        edd=sorted((edg,deg))
        edde='%s , %s' %(edd[0],edd[1])
#         print ed,edg,deg,sorted(edg,deg)
        nat_edgs[edde]+=1
    return nat_edgs
c_d={c1:ccnodes1,c2:ccnodes2,c3:ccnodes3}
edges_nationalities=count_edges_nat(c_d,pgraph.edges())
for nat,nat_value in edges_nationalities.items():
    natt=nat.split(' , ')
    print 'There are %i edges between %s and %s' %(nat_value,natt[0],natt[1])
    
een=edges_nationalities.values()
# print een
# print aaaa
# only four country-pair edge types occur in this projection, so the
# Greece-Russia and Greece-Cyprus entries of the mixing matrix are zero
r1 = [2*een[1],0,0]
r2 = [0,2*een[-1],een[2]]
r3 = [0,een[2],2*een[0]]
# r1 = [2*een[4],een[-1],een[1]]
# r2 = [een[-1],2*een[3],een[2]]
# r3 = [een[1],een[2],2*een[0]]
m=[r1,r2,r3]
import numpy as np
M=np.array(m)
if M.sum() != 1.0:
    M=M/float(M.sum())
M=np.asmatrix(M)
s=(M*M).sum()
t=M.trace()
R=t-s
r=R/(1-s)
ac = float(r)
print 'The Attribute Assortativity Coefficient of the graph of %s, %s and %s is %.4f' %(name1,name2,name3,ac)
There are 1350 edges between Russia and Russia
There are 1 edges between Greece and Greece
There are 325 edges between Cyprus and Russia
There are 3175 edges between Cyprus and Cyprus
The Attribute Assortativity Coefficient of the graph of Greece, Cyprus and Russia is 0.8440
In [22]:
centList=['degree_centrality','closeness_centrality','betweenness_centrality',
    'eigenvector_centrality','katz_centrality','page_rank']
centrali=create_centralities_list(pgraph)
# centrali=create_centralities_list(graph_no_addr_ent)
dfce=pd.DataFrame()
u=0
for k in centList:
    try:
        v=centrali[k].values()
    except:
        v=None
    dfce.insert(u,k,v)
    u+=1
dfce.insert(0,'Nodes',centrali[centrali.keys()[0]].keys())
Degree Centralities done!!!
Closeness Centralities done!!!
Betweenness Centralities done!!!
Eigenvector Centralities done!!!
Katz Centralities done!!!
PageRank done!!!
In [23]:
dfce
Out[23]:
Nodes degree_centrality closeness_centrality betweenness_centrality eigenvector_centrality katz_centrality page_rank
0 228055 0.000656 0.012502 0.000000 6.376576e-05 -0.003156 0.000216
1 228051 0.001311 0.002049 0.000000 1.850154e-36 0.001355 0.000482
2 220090 0.004590 0.004663 0.000000 5.789683e-31 -0.000564 0.000658
3 230731 0.003934 0.003934 0.000000 3.413357e-21 -0.000529 0.000655
4 230737 0.004590 0.004590 0.000000 7.574343e-03 -0.000431 0.000655
5 214781 0.001311 0.001311 0.000000 1.378472e-44 -0.006489 0.000655
6 201729 0.002623 0.002623 0.000000 1.410853e-23 -0.000979 0.000655
7 10185432 0.001967 0.001967 0.000000 3.807184e-20 -0.001701 0.000655
8 214894 0.009836 0.010710 0.000000 9.058267e-22 0.003535 0.000691
9 10122014 0.001311 0.001311 0.000000 1.378472e-44 -0.006489 0.000655
10 10147320 0.000656 0.000874 0.000000 1.850154e-36 0.008652 0.000516
11 202299 0.005246 0.002951 0.000000 8.192539e-32 -0.000823 0.000623
12 202351 0.000656 0.021486 0.000000 2.327828e-31 0.000837 0.000626
13 10126531 0.000656 0.000656 0.000000 1.632768e-17 0.003577 0.000655
14 205286 0.001311 0.001311 0.000000 6.697842e-04 -0.006489 0.000655
15 205289 0.003934 0.003934 0.000000 1.378472e-44 -0.000529 0.000655
16 10134051 0.000656 0.000656 0.000000 1.850154e-36 0.003577 0.000655
17 10139568 0.000656 0.000656 0.000000 2.483235e-28 0.003577 0.000655
18 230930 0.011148 0.011148 0.000000 1.378472e-44 -0.000150 0.000655
19 227521 0.001311 0.001311 0.000000 1.305222e-11 -0.006489 0.000655
20 10097787 0.000656 0.000656 0.000000 1.850154e-36 0.003577 0.000655
21 215419 0.001967 0.001967 0.000002 1.129245e-40 -0.001391 0.000799
22 231217 0.015082 0.020175 0.000000 1.850154e-36 -0.000317 0.000667
23 205739 0.015082 0.016066 0.000004 2.483235e-28 -0.001004 0.000855
24 191469 0.019016 0.025268 0.000309 1.378472e-44 -0.000232 0.001002
25 231265 0.012459 0.022357 0.000088 2.483235e-28 0.000801 0.000621
26 227234 0.000656 0.000656 0.000000 1.796697e-02 0.003577 0.000655
27 201548 0.015082 0.020175 0.000000 1.850154e-36 -0.000317 0.000667
28 227237 0.009836 0.015270 0.000275 1.796697e-02 -0.001124 0.000950
29 227014 0.001311 0.001311 0.000000 1.378472e-44 -0.006489 0.000655
... ... ... ... ... ... ... ...
1496 202158 0.000656 0.001475 0.000000 1.309512e-30 -0.004579 0.000728
1497 10149174 0.000656 0.001311 0.000000 1.037811e-30 -0.002360 0.000749
1498 222323 0.001311 0.001475 0.000000 1.406800e-19 -0.002575 0.000655
1499 232655 0.001311 0.011148 0.000000 1.378472e-44 -0.000150 0.000472
1500 226480 0.001311 0.006947 0.000000 4.003143e-23 0.000227 0.000655
1501 232653 0.011148 0.011148 0.000000 1.652733e-12 -0.000150 0.000655
1502 212221 0.002623 0.000656 0.000000 1.378472e-44 0.003577 0.000572
1503 222924 0.011148 0.017726 0.000000 1.850154e-36 -0.000020 0.000225
1504 212705 0.009836 0.010432 0.000000 1.101639e-10 -0.000286 0.000537
1505 190132 0.010492 0.002142 0.000000 1.305222e-11 -0.001033 0.000655
1506 221216 0.001311 0.000656 0.000000 3.298794e-08 -0.006489 0.000655
1507 226260 0.000656 0.002623 0.000000 1.051168e-31 -0.000979 0.001051
1508 216256 0.002623 0.003279 0.000005 1.850154e-36 -0.001433 0.000639
1509 216257 0.003279 0.002342 0.000000 1.364157e-09 -0.002457 0.000693
1510 216254 0.001967 0.002732 0.000000 1.051168e-31 -0.000391 0.000655
1511 10130992 0.002623 0.000656 0.000000 1.850154e-36 0.003577 0.000321
1512 208819 0.000656 0.011803 0.000000 1.850154e-36 0.002745 0.000655
1513 10140969 0.001311 0.001311 0.000000 1.378472e-44 -0.006489 0.000695
1514 195699 0.001311 0.012049 0.000000 1.850154e-36 -0.000035 0.000691
1515 189751 0.011803 0.010710 0.000000 2.483235e-28 0.003535 0.000655
1516 10019193 0.009836 0.001967 0.000000 1.051168e-31 -0.001701 0.000655
1517 223973 0.001967 0.001311 0.000000 2.251473e-21 -0.006489 0.000576
1518 10153834 0.001311 0.001967 0.000000 1.101639e-10 -0.001701 0.000655
1519 208956 0.000656 0.000656 0.000000 1.112606e-10 0.003577 0.000655
1520 10150249 0.002623 0.002623 0.000000 1.850154e-36 -0.000979 0.000655
1521 225661 0.003934 0.007344 0.000006 2.338725e-02 -0.001031 0.000664
1522 207273 0.006557 0.014872 0.000000 3.526217e-11 -0.000047 0.000625
1523 225663 0.000656 0.002472 0.000000 3.140576e-27 0.001313 0.000203
1524 10099507 0.000656 0.000656 0.000000 1.378472e-44 -0.006489 0.000655
1525 208958 0.001311 0.001311 0.000000 1.850154e-36 -0.006489 0.000655

1526 rows × 7 columns
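The truncated table above is the standard pandas display of a DataFrame indexed by node id with one column per centrality measure. As a minimal, self-contained sketch of how such a table can be assembled with networkx (the graph and the column names `degree`, `closeness`, `betweenness` are illustrative assumptions, not the notebook's actual projected network or `centList`):

```python
import networkx as nx
import pandas as pd

# Small illustrative graph; the notebook's actual graph is the projected
# network of entities from the Panama Papers dataset
G = nx.krackhardt_kite_graph()

# Each networkx centrality function returns a dict node -> value, so a
# DataFrame built from them is automatically indexed by node
dfc = pd.DataFrame({
    'degree': nx.degree_centrality(G),
    'closeness': nx.closeness_centrality(G),
    'betweenness': nx.betweenness_centrality(G),
})
dfc.index.name = 'node_id'

print(dfc.shape)  # → (10, 3): one row per node, one column per measure
```

Printing `dfc` on a large graph produces exactly the kind of truncated head-and-tail display shown above.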

In [24]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np

# Scatter-matrix plot of the centrality measures, with each off-diagonal
# panel annotated by its Pearson correlation coefficient
ntei = 'Scatter Matrix Plot of Centralities of the Projected Network of Entities from %s, %s and %s' % (name1, name2, name3)
f, ax = plt.subplots(figsize=(20, 20))
sss = scatter_matrix(dfce[centList], alpha=0.9, color='black', diagonal='hist', ax=ax)
plt.suptitle(ntei, fontsize=18, fontweight='bold')
# Restrict the correlation matrix to the plotted columns so that its indices
# line up with the scatter-matrix panels; .values replaces the deprecated
# DataFrame.as_matrix()
corr = dfce[centList].corr().values
for i, j in zip(*np.triu_indices_from(sss, k=1)):
    sss[i, j].annotate("pearson = %.3f" % corr[i, j], (0.8, 0.93),
                       xycoords='axes fraction', ha='center', va='center')
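The annotation loop works because the upper-triangular indices (with `k=1` to skip the diagonal) enumerate exactly the off-diagonal panels of the square grid of axes returned by `scatter_matrix`, which has the same shape as the correlation matrix. A minimal sketch of the same indexing on synthetic data (the column names and values below are stand-ins for the notebook's `dfce[centList]`):

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the centrality columns, chosen so the expected
# correlations are known exactly
deg = np.arange(1, 51, dtype=float)
df = pd.DataFrame({
    'degree': deg,
    'closeness': 2.0 * deg + 3.0,            # exact linear relation: pearson = 1
    'betweenness': (deg - deg.mean()) ** 2,  # even in (deg - mean): pearson ~ 0
})

# Correlation matrix restricted to the plotted columns, as in the cell above
corr = df.corr().values

# Upper-triangular indices (k=1 excludes the diagonal) address the
# off-diagonal panels of a scatter matrix of these columns
pairs = list(zip(*np.triu_indices_from(corr, k=1)))
for i, j in pairs:
    print('pearson(%s, %s) = %.3f' % (df.columns[i], df.columns[j], corr[i, j]))
```

In the notebook cell the same indices are taken from `sss`, the square array of axes returned by `scatter_matrix`, so each correlation lands on its matching panel.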