Getting started

In scikit-network, a graph is represented by its adjacency matrix (or its biadjacency matrix for a bipartite graph), stored in the Compressed Sparse Row (CSR) format of SciPy.

In this tutorial, we present a few methods to instantiate a graph in this format.
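As a quick reminder of what the CSR format stores, here is a minimal sketch using SciPy only. The variable names are illustrative; the attributes `indptr`, `indices` and `data` are those of SciPy's `csr_matrix`.

```python
import numpy as np
from scipy import sparse

# Dense adjacency matrix of a small undirected graph (4 nodes, 4 edges).
dense = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])
csr = sparse.csr_matrix(dense)

# CSR keeps only the non-zero entries, row by row:
# - indptr[i]:indptr[i + 1] delimits the entries of row i,
# - indices holds their column indices (the neighbors of node i),
# - data holds their values (the edge weights).
print(csr.indptr)
print(csr.indices)
print(csr.data)

# Neighbors of node 1, read directly from the CSR arrays.
neighbors = csr.indices[csr.indptr[1]:csr.indptr[2]]
print(neighbors)
```

Reading the neighbors of a node is a simple slice, which is one reason why this format suits graph algorithms that iterate over rows.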

[1]:
from IPython.display import SVG

import numpy as np
from scipy import sparse
import pandas as pd

from sknetwork.utils import edgelist2adjacency, edgelist2biadjacency
from sknetwork.data import convert_edge_list, load_edge_list
from sknetwork.visualization import svg_graph, svg_digraph, svg_bigraph

From a NumPy array

For small graphs, you can instantiate the adjacency matrix as a dense NumPy array and convert it into a sparse matrix in CSR format.

[2]:
adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(adjacency)

image = svg_graph(adjacency)
SVG(image)
[2]:
../_images/tutorials_getting_started_4_0.svg
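For larger graphs, the dense intermediate array becomes the bottleneck. The sketch below, in plain NumPy and SciPy, builds a path graph directly in CSR format and compares the memory of the two representations. The path graph and the size `n_nodes` are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import sparse

# Path graph 0 - 1 - 2 - ... - (n_nodes - 1), chosen only for illustration.
n_nodes = 10_000
source = np.arange(n_nodes - 1)
target = source + 1

# List each edge in both directions so that the matrix is symmetric.
row = np.concatenate([source, target])
col = np.concatenate([target, source])
data = np.ones(len(row), dtype=bool)

# Build the CSR matrix from (data, (row, col)) without any dense array.
adjacency = sparse.csr_matrix((data, (row, col)), shape=(n_nodes, n_nodes))

# Memory used by the three CSR arrays versus an equivalent dense array.
sparse_bytes = adjacency.data.nbytes + adjacency.indices.nbytes + adjacency.indptr.nbytes
dense_bytes = n_nodes * n_nodes * adjacency.dtype.itemsize
print(sparse_bytes, dense_bytes)
```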

From an edge list

Another natural way to build a graph is from a list of edges.

[3]:
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adjacency = edgelist2adjacency(edge_list)

image = svg_digraph(adjacency)
SVG(image)
[3]:
../_images/tutorials_getting_started_6_0.svg

By default, the graph is treated as directed, but you can easily make it undirected.

[4]:
adjacency = edgelist2adjacency(edge_list, undirected=True)

image = svg_graph(adjacency)
SVG(image)
[4]:
../_images/tutorials_getting_started_8_0.svg
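In terms of matrices, an undirected graph has a symmetric adjacency matrix. The following sketch reproduces the two cases above in plain SciPy; it is not the implementation used by `edgelist2adjacency`, and the helper `is_symmetric` is a hypothetical name introduced here.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
row, col = np.array(edge_list).T
n_nodes = 4

# Directed graph: one stored entry per edge.
directed = sparse.csr_matrix((np.ones(len(row), dtype=int), (row, col)),
                             shape=(n_nodes, n_nodes))


def is_symmetric(matrix):
    """Return True if the sparse matrix equals its transpose."""
    return (matrix != matrix.T).nnz == 0


# Undirected graph: add the transpose so each edge goes both ways.
# Caveat: an edge listed in both directions would see its weight doubled.
undirected = (directed + directed.T).tocsr()

print(is_symmetric(directed), directed.nnz)
print(is_symmetric(undirected), undirected.nnz)
```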

To add weights to the edges, use triplets (source, target, weight) instead of pairs.

[5]:
edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
adjacency = edgelist2adjacency(edge_list)

image = svg_digraph(adjacency)
SVG(image)
[5]:
../_images/tutorials_getting_started_10_0.svg
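The weights end up as the values of the adjacency matrix. As a sketch under that assumption, the same triplets can be loaded with SciPy alone and the total weight leaving or entering each node read off as row and column sums. The names `out_weights` and `in_weights` are illustrative.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]

# Split the triplets into three arrays: sources, targets and weights.
source, target, weight = (np.array(column) for column in zip(*edge_list))

# Entry (i, j) of the matrix holds the weight of the edge i -> j.
adjacency = sparse.csr_matrix((weight, (source, target)), shape=(4, 4))

# Total weight of outgoing edges (row sums) and incoming edges (column sums).
out_weights = np.asarray(adjacency.sum(axis=1)).ravel()
in_weights = np.asarray(adjacency.sum(axis=0)).ravel()

print(adjacency[0, 2])
print(out_weights)
print(in_weights)
```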

You can instantiate a bipartite graph as well. Each pair is then read as (row node, column node) of the biadjacency matrix.

[6]:
edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
biadjacency = edgelist2biadjacency(edge_list)

image = svg_bigraph(biadjacency)
SVG(image)
[6]:
../_images/tutorials_getting_started_12_0.svg
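To make the link between the biadjacency matrix and an ordinary adjacency matrix explicit, here is a sketch in plain SciPy. A bipartite graph with biadjacency matrix B can be seen as an undirected graph on all nodes, with block adjacency matrix [[0, B], [B^T, 0]]. The construction below uses `scipy.sparse.bmat` and is only an illustration, not the library's own conversion routine.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
row, col = np.array(edge_list).T

# Biadjacency matrix: rows index the first node set, columns the second.
# The shape (3, 2) is inferred from the largest indices.
biadjacency = sparse.csr_matrix((np.ones(len(row), dtype=int), (row, col)))

# Adjacency matrix of the same graph seen as an undirected graph on
# 3 + 2 = 5 nodes; None stands for an all-zero block.
adjacency = sparse.bmat([[None, biadjacency], [biadjacency.T, None]], format='csr')

print(biadjacency.shape)
print(adjacency.toarray())
```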

If nodes are identified by names rather than by integer indices, use convert_edge_list to index them.

[7]:
edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]
graph = convert_edge_list(edge_list)

You get a bunch (a dictionary whose entries can also be read as attributes, such as graph.adjacency) containing the adjacency matrix and the names of the nodes.

[8]:
graph
[8]:
{'adjacency': <4x4 sparse matrix of type '<class 'numpy.bool_'>'
        with 10 stored elements in Compressed Sparse Row format>,
 'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5')}
[9]:
adjacency = graph.adjacency
names = graph.names
[10]:
image = svg_graph(adjacency, names=names)
SVG(image)
[10]:
../_images/tutorials_getting_started_18_0.svg
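The indexing step can be pictured with NumPy alone. The sketch below maps each name to its position in the sorted list of unique names, which matches the order of the names array shown above. It illustrates the idea and is not the code of `convert_edge_list`.

```python
import numpy as np
from scipy import sparse

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"),
             ("Carey", "David"), ("Bob", "David")]

# Flatten the pairs, then replace each name by its rank among unique names.
flat = np.array(edge_list).ravel()
names, index = np.unique(flat, return_inverse=True)
index = index.reshape(-1, 2)
row, col = index[:, 0], index[:, 1]

# Build the directed matrix, then symmetrize it for an undirected graph.
n_nodes = len(names)
directed = sparse.csr_matrix((np.ones(len(row), dtype=int), (row, col)),
                             shape=(n_nodes, n_nodes))
adjacency = (directed + directed.T).tocsr()

print(names)
print(adjacency.nnz)
```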

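Here the graph is undirected by default: the 5 edges are stored in both directions, giving the 10 entries above. To keep the direction of the edges, use directed=True: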

[11]:
graph = convert_edge_list(edge_list, directed=True)
[12]:
graph
[12]:
{'adjacency': <4x4 sparse matrix of type '<class 'numpy.bool_'>'
        with 5 stored elements in Compressed Sparse Row format>,
 'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5')}
[13]:
adjacency = graph.adjacency
names = graph.names
[14]:
image = svg_digraph(adjacency, names=names)
SVG(image)
[14]:
../_images/tutorials_getting_started_23_0.svg

The graph can also be weighted:

[15]:
edge_list = [("Alice", "Bob", 3), ("Bob", "Carey", 2), ("Alice", "David", 1), ("Carey", "David", 2), ("Bob", "David", 3)]
graph = convert_edge_list(edge_list)
[16]:
adjacency = graph.adjacency
names = graph.names
[17]:
image = svg_graph(adjacency, names=names, display_edge_weight=True, display_node_weight=True)
SVG(image)
[17]:
../_images/tutorials_getting_started_27_0.svg

For a bipartite graph:

[18]:
edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"), ("Carey", "Tennis"), ("Carey", "Football")]
graph = convert_edge_list(edge_list, bipartite=True)
[19]:
biadjacency = graph.biadjacency
names = graph.names
names_col = graph.names_col
[20]:
image = svg_bigraph(biadjacency, names_row=names, names_col=names_col)
SVG(image)
[20]:
../_images/tutorials_getting_started_31_0.svg

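From a dataframe

A pandas dataframe with one edge per row can be converted into an edge list and passed to convert_edge_list.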

[21]:
df = pd.read_csv('data/miserables.tsv', sep='\t', names=['character_1', 'character_2'])
[22]:
df.head()
[22]:
character_1 character_2
0 Myriel Napoleon
1 Myriel Mlle Baptistine
2 Myriel Mme Magloire
3 Myriel Countess de Lo
4 Myriel Geborand
[23]:
edge_list = list(df.itertuples(index=False))
[24]:
graph = convert_edge_list(edge_list)
[25]:
graph
[25]:
{'adjacency': <77x77 sparse matrix of type '<class 'numpy.bool_'>'
        with 508 stored elements in Compressed Sparse Row format>,
 'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
        'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
        'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
        'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
        'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
        'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
        'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
        'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
        'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
        'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
        'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
        'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
        'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
        'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
        'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
        'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17')}
[26]:
df = pd.read_csv('data/movie_actor.tsv', sep='\t', names=['movie', 'actor'])
[27]:
df.head()
[27]:
movie actor
0 Inception Leonardo DiCaprio
1 Inception Marion Cotillard
2 Inception Joseph Gordon Lewitt
3 The Dark Knight Rises Marion Cotillard
4 The Dark Knight Rises Joseph Gordon Lewitt
[28]:
edge_list = list(df.itertuples(index=False))
[29]:
graph = convert_edge_list(edge_list, bipartite=True)
[30]:
graph
[30]:
{'biadjacency': <15x16 sparse matrix of type '<class 'numpy.bool_'>'
        with 41 stored elements in Compressed Sparse Row format>,
 'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
        'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
        'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
        'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
        'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
       dtype='<U28')}
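The conversion from a dataframe to an edge list relies on `DataFrame.itertuples`. As a small sketch, the dataframe below is an in-memory stand-in built from the first three rows displayed earlier, so that the example runs without the data files.

```python
import pandas as pd

# In-memory stand-in for the first rows of the TSV file.
df = pd.DataFrame({'character_1': ['Myriel', 'Myriel', 'Myriel'],
                   'character_2': ['Napoleon', 'Mlle Baptistine', 'Mme Magloire']})

# index=False drops the row index, so each item has one field per column.
# By default the items are named tuples with the column names as fields.
edge_list = list(df.itertuples(index=False))
print(edge_list[0].character_1, edge_list[0].character_2)

# name=None yields plain tuples instead of named tuples.
plain_edge_list = list(df.itertuples(index=False, name=None))
print(plain_edge_list)
```

A third column, if present, would become the third field of each tuple, which is the triplet form used for weighted edges.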

From a TSV file

You can directly load a graph from a TSV file:

[31]:
graph = load_edge_list('data/miserables.tsv')
[32]:
graph
[32]:
{'adjacency': <77x77 sparse matrix of type '<class 'numpy.bool_'>'
        with 508 stored elements in Compressed Sparse Row format>,
 'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
        'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
        'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
        'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
        'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
        'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
        'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
        'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
        'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
        'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
        'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
        'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
        'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
        'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
        'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
        'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17')}
[33]:
graph = load_edge_list('data/movie_actor.tsv', bipartite=True)
[34]:
graph
[34]:
{'biadjacency': <15x16 sparse matrix of type '<class 'numpy.bool_'>'
        with 41 stored elements in Compressed Sparse Row format>,
 'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
        'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
        'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
        'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
        'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
       dtype='<U20')}
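For reference, the files used here contain one edge per line, with tab-separated fields and no header row, as suggested by the read_csv calls of the previous section. The sketch below writes a tiny file with that layout to a temporary folder and parses it by hand with the standard library. The file name `toy.tsv` is hypothetical, and the parsing shown is not the code of `load_edge_list`.

```python
import csv
import os
import tempfile

# Three edges taken from the first rows displayed earlier.
rows = [("Myriel", "Napoleon"), ("Myriel", "Mlle Baptistine"), ("Myriel", "Mme Magloire")]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.tsv")  # hypothetical file name

    # Write one edge per line, fields separated by a tab, no header.
    with open(path, "w", newline="") as file:
        csv.writer(file, delimiter="\t").writerows(rows)

    # Read the file back into a list of tuples (an edge list).
    with open(path, newline="") as file:
        edge_list = [tuple(line) for line in csv.reader(file, delimiter="\t")]

print(edge_list)
```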

From NetworkX

NetworkX provides functions to import graphs from, and export them to, SciPy sparse matrices in CSR format.
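A minimal round trip might look as follows. This sketch assumes NetworkX 2.7 or later, where the conversion functions are named `from_scipy_sparse_array` and `to_scipy_sparse_array`; earlier versions used different names.

```python
import networkx as nx
import numpy as np
from scipy import sparse

adjacency = sparse.csr_matrix(np.array([[0, 1, 1, 0],
                                        [1, 0, 1, 1],
                                        [1, 1, 0, 0],
                                        [0, 1, 0, 0]]))

# CSR matrix -> NetworkX graph (undirected by default, nodes 0..n-1).
nx_graph = nx.from_scipy_sparse_array(adjacency)

# NetworkX graph -> CSR matrix; nodelist fixes the order of rows and columns.
exported = nx.to_scipy_sparse_array(nx_graph, nodelist=sorted(nx_graph.nodes), format='csr')
adjacency_back = sparse.csr_matrix(exported)

print(nx_graph.number_of_nodes(), nx_graph.number_of_edges())
print((adjacency_back != adjacency).nnz)
```

Passing `nodelist` explicitly is a cautious choice: it guarantees that row i of the exported matrix corresponds to node i.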

Other options

  • You have a GraphML file

  • You want to test our toy graphs

  • You want to generate a graph from a model

  • You want to load a graph from one of our referenced repositories (see NetSet and KONECT)

Take a look at the tutorials of the data section!