# Getting started

In scikit-network, a graph is represented by its adjacency matrix (or biadjacency matrix for a bipartite graph) in the Compressed Sparse Row format of SciPy.

In this tutorial, we present a few methods to instantiate a graph in this format.
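To make the format concrete, here is a minimal sketch of what a CSR matrix stores, using only NumPy and SciPy. The small graph below is an illustrative assumption and not part of the tutorial's data.

```python
import numpy as np
from scipy import sparse

# Illustrative graph (assumption): a triangle 0-1-2 plus an isolated node 3.
dense = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
adjacency = sparse.csr_matrix(dense)

# CSR keeps three arrays: the neighbors of node i are
# indices[indptr[i]:indptr[i + 1]], with the matching values in data.
print(adjacency.indptr)   # row pointers
print(adjacency.indices)  # column index of each stored entry
print(adjacency.data)     # value of each stored entry

neighbors_of_0 = adjacency.indices[adjacency.indptr[0]:adjacency.indptr[1]]
print(neighbors_of_0)
```

Only the non-zero entries are stored, which is why the format suits large, sparse graphs.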

[1]:

from IPython.display import SVG

import numpy as np
from scipy import sparse
import pandas as pd

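from sknetwork.utils import edgelist2adjacency, edgelist2biadjacency
from sknetwork.data import convert_edge_list, load_edge_list
from sknetwork.visualization import svg_graph, svg_digraph, svg_bigraph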


## From a NumPy array

For small graphs, you can instantiate the adjacency matrix as a dense NumPy array and convert it into a sparse matrix in CSR format.

[2]:

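adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(adjacency)  # convert the dense array to CSR format

image = svg_graph(adjacency)
SVG(image)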

[2]:


## From an edge list

Another natural way to build a graph is from a list of edges.

[3]:

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adjacency = edgelist2adjacency(edge_list)

image = svg_digraph(adjacency)
SVG(image)

[3]:


By default, the graph is treated as directed, but you can easily make it undirected.

[4]:

adjacency = edgelist2adjacency(edge_list, undirected=True)

image = svg_graph(adjacency)
SVG(image)

[4]:
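For intuition, the conversion of an edge list into a CSR adjacency matrix can be approximated with plain SciPy. This is a sketch of the idea, not the code of `edgelist2adjacency`; the variable names are illustrative, and it assumes that nodes are indexed from 0 and that no edge is listed in both directions.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
edges = np.array(edge_list)
row, col = edges[:, 0], edges[:, 1]
n = int(edges.max()) + 1  # assumes nodes are indexed 0, ..., n - 1

# Directed graph: one stored entry per edge (source -> target).
data = np.ones(len(edge_list), dtype=int)
directed = sparse.csr_matrix((data, (row, col)), shape=(n, n))

# Undirected graph: add the reverse edges, i.e. symmetrize the matrix.
undirected = sparse.csr_matrix(directed + directed.T)

print(directed.nnz, undirected.nnz)
```

The undirected matrix stores each edge twice, once per direction, so it has twice as many stored entries as the directed one.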


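Edges can also carry weights: use triplets instead of pairs.

[5]:

edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
adjacency = edgelist2adjacency(edge_list)

image = svg_digraph(adjacency)
SVG(image)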

[5]:


You can instantiate a bipartite graph as well.

[6]:

edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
biadjacency = edgelist2biadjacency(edge_list)

image = svg_bigraph(biadjacency)
SVG(image)

[6]:
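In the bipartite case, the two elements of each pair index two different sets of nodes, so the resulting biadjacency matrix is rectangular in general. The following SciPy-only sketch illustrates this under the assumption that both sets are indexed from 0; it is not the library's implementation.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
edges = np.array(edge_list)
row, col = edges[:, 0], edges[:, 1]

# Rows and columns are counted separately: here 3 row nodes and 2 column nodes.
n_row, n_col = int(row.max()) + 1, int(col.max()) + 1
data = np.ones(len(edge_list), dtype=int)
biadjacency = sparse.csr_matrix((data, (row, col)), shape=(n_row, n_col))

print(biadjacency.shape)
print(biadjacency.toarray())
```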


If nodes are identified by names rather than by integer indices, convert the edge list first.

[7]:

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]
graph = convert_edge_list(edge_list)


You get a bunch (a dictionary-like object) containing the adjacency matrix and the name of each node.

[8]:

graph

[8]:

{'adjacency': <4x4 sparse matrix of type '<class 'numpy.bool_'>'
with 10 stored elements in Compressed Sparse Row format>,
'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5')}
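The mapping from names to indices can be sketched with `np.unique`, which returns the sorted distinct names together with the index of each entry. This is an illustration of the idea under the assumption of an undirected, unweighted graph, not the code of `convert_edge_list`.

```python
import numpy as np
from scipy import sparse

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"),
             ("Carey", "David"), ("Bob", "David")]
edges = np.array(edge_list)

# Sorted distinct names, and for each entry of the flattened edge array
# the position of its name in that sorted array.
names, index = np.unique(edges.ravel(), return_inverse=True)
index = index.reshape(edges.shape)
row, col = index[:, 0], index[:, 1]

n = len(names)
data = np.ones(len(edge_list), dtype=int)
adjacency = sparse.csr_matrix((data, (row, col)), shape=(n, n))
adjacency = sparse.csr_matrix(adjacency + adjacency.T)  # undirected

print(names)
print(adjacency.nnz)
```

Node `i` of the adjacency matrix then corresponds to `names[i]`.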

[9]:

adjacency = graph.adjacency
names = graph.names

[10]:

image = svg_graph(adjacency, names=names)
SVG(image)

[10]:


Again, you can make the graph directed:

[11]:

graph = convert_edge_list(edge_list, directed=True)

[12]:

graph

[12]:

{'adjacency': <4x4 sparse matrix of type '<class 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>,
'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5')}

[13]:

adjacency = graph.adjacency
names = graph.names

[14]:

image = svg_digraph(adjacency, names=names)
SVG(image)

[14]:


The graph can also be weighted:

[15]:

edge_list = [("Alice", "Bob", 3), ("Bob", "Carey", 2), ("Alice", "David", 1), ("Carey", "David", 2), ("Bob", "David", 3)]
graph = convert_edge_list(edge_list)

[16]:

adjacency = graph.adjacency
names = graph.names

[17]:

image = svg_graph(adjacency, names=names, display_edge_weight=True, display_node_weight=True)
SVG(image)

[17]:


For a bipartite graph:

[18]:

edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"), ("Carey", "Tennis"), ("Carey", "Football")]
graph = convert_edge_list(edge_list, bipartite=True)

[19]:

biadjacency = graph.biadjacency
names = graph.names
names_col = graph.names_col

[20]:

image = svg_bigraph(biadjacency, names_row=names, names_col=names_col)
SVG(image)

[20]:


## From a dataframe

[21]:

df = pd.read_csv('data/miserables.tsv', sep='\t', names=['character_1', 'character_2'])

[22]:

df.head()

[22]:

  character_1      character_2
0      Myriel         Napoleon
1      Myriel  Mlle Baptistine
2      Myriel     Mme Magloire
3      Myriel   Countess de Lo
4      Myriel         Geborand

[23]:

edge_list = list(df.itertuples(index=False))

[24]:

graph = convert_edge_list(edge_list)

[25]:

graph

[25]:

{'adjacency': <77x77 sparse matrix of type '<class 'numpy.bool_'>'
with 508 stored elements in Compressed Sparse Row format>,
'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17')}
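The `itertuples` step used above turns each row of the dataframe into one edge. The sketch below shows it on a small in-memory dataframe, built here as an illustrative stand-in for the TSV file (an assumption, so that the example runs without the data folder).

```python
import pandas as pd

# Illustrative stand-in for the TSV file (assumption): a few rows only.
df = pd.DataFrame({
    "character_1": ["Myriel", "Myriel", "Myriel"],
    "character_2": ["Napoleon", "Mlle Baptistine", "Mme Magloire"],
})

# Each row becomes one named tuple (source, target); index=False drops the row index.
edge_list = list(df.itertuples(index=False))
print(edge_list[0])

# name=None yields plain tuples instead of named tuples.
plain_edge_list = list(df.itertuples(index=False, name=None))
print(plain_edge_list[0])
```

A third column, if present, would simply appear as a third element of each tuple.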

[26]:

df = pd.read_csv('data/movie_actor.tsv', sep='\t', names=['movie', 'actor'])

[27]:

df.head()

[27]:

                   movie                 actor
0              Inception     Leonardo DiCaprio
1              Inception      Marion Cotillard
2              Inception  Joseph Gordon Lewitt
3  The Dark Knight Rises      Marion Cotillard
4  The Dark Knight Rises  Joseph Gordon Lewitt

[28]:

edge_list = list(df.itertuples(index=False))

[29]:

graph = convert_edge_list(edge_list, bipartite=True)

[30]:

graph

[30]:

{'biadjacency': <15x16 sparse matrix of type '<class 'numpy.bool_'>'
with 41 stored elements in Compressed Sparse Row format>,
'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
dtype='<U28')}


## From a TSV file

You can directly load a graph from a TSV file:

[31]:

graph = load_edge_list('data/miserables.tsv')

[32]:

graph

[32]:

{'adjacency': <77x77 sparse matrix of type '<class 'numpy.bool_'>'
with 508 stored elements in Compressed Sparse Row format>,
'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17')}

[33]:

graph = load_edge_list('data/movie_actor.tsv', bipartite=True)

[34]:

graph

[34]:

{'biadjacency': <15x16 sparse matrix of type '<class 'numpy.bool_'>'
with 41 stored elements in Compressed Sparse Row format>,
'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
dtype='<U20')}
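As a rough picture of what reading such a file involves, the sketch below parses tab-separated lines into an edge list with the standard `csv` module. The in-memory content is an illustrative assumption standing in for a file on disk, and this is not how `load_edge_list` is implemented.

```python
import csv
import io

# Illustrative TSV content (assumption): one edge per line, tab-separated.
tsv = "Inception\tLeonardo DiCaprio\nInception\tMarion Cotillard\n"

# Splitting on tabs only keeps names that contain spaces in one piece.
reader = csv.reader(io.StringIO(tsv), delimiter="\t")
edge_list = [tuple(line) for line in reader if line]

print(edge_list)
```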


## From NetworkX

NetworkX provides functions to export a graph to, and import a graph from, a SciPy sparse matrix in CSR format.
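A minimal sketch of the round trip, assuming NetworkX 2.7 or later, where the functions are named `to_scipy_sparse_array` and `from_scipy_sparse_array` (older releases used `to_scipy_sparse_matrix` and `from_scipy_sparse_matrix`). The path graph is an illustrative choice.

```python
import networkx as nx
from scipy import sparse

# Illustrative graph (assumption): a path 0 - 1 - 2 - 3.
graph_nx = nx.path_graph(4)

# Export: NetworkX graph -> SciPy sparse adjacency in CSR format,
# wrapped in csr_matrix to match the class used in this tutorial.
adjacency = sparse.csr_matrix(nx.to_scipy_sparse_array(graph_nx, format="csr"))

# Import: SciPy sparse adjacency -> NetworkX graph.
graph_back = nx.from_scipy_sparse_array(adjacency)

print(adjacency.shape, adjacency.nnz)
print(graph_back.number_of_edges())
```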

## Other options

• You have a GraphML file

• You want to test our toy graphs

• You want to generate a graph from a model

• You want to load a graph from one of our referenced repositories (see NetSet and KONECT)

Take a look at the tutorials of the data section!