# Tutorial

This notebook gives an overview of scikit-network, a open-source Python package for machine learning on graphs.

## 1. Installation

The easiest way to install scikit-network is through pip.

[1]:

# uncomment to install
# !pip install scikit-network


## 2. Data

Each graph with $$n$$ nodes is represented by a square adjacency matrix $$A$$ of size $$n \times n$$. In its simplest form, the matrix $$A$$ is binary and we have $$A_{ij} =1$$ if and only if there is a link from node $$i$$ to node $$j$$. If the graph is undirected, the matrix $$A$$ is symmetric.

[2]:

import numpy as np

[3]:

adjacency = np.array([[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0]])

[4]:

adjacency

[4]:

array([[0, 1, 1, 1, 0],
[1, 0, 0, 0, 1],
[1, 0, 0, 1, 0],
[1, 0, 1, 0, 1],
[0, 1, 0, 1, 0]])


### Sparse matrix

In scikit-network, the adajcency matrix is represented in the sparse CSR format of scipy.

[5]:

from scipy import sparse

[6]:

adjacency = sparse.csr_matrix(adjacency)

[7]:

adjacency

[7]:

<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 12 stored elements in Compressed Sparse Row format>


### Visualization

Here is the visualization of the above graph.

[8]:

from IPython.display import SVG
from sknetwork.visualization import svg_graph

[9]:

# name nodes by indices
names = np.arange(n_nodes)

[10]:

# visualization

[10]:


### Directed graphs

The adjacency matrix is not necessarily symmetric. There might be a link from node $$i$$ to node $$j$$ but not from node $$j$$ to node $$i$$.

[11]:

adjacency = np.array([[0, 1, 1, 1, 0], [0, 0, 0, 1, 1], [0, 1, 0, 0, 0], [1, 0, 0, 0, 1], [0, 1, 0, 0, 0]])

[12]:

adjacency

[12]:

array([[0, 1, 1, 1, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 0, 0],
[1, 0, 0, 0, 1],
[0, 1, 0, 0, 0]])

[13]:

adjacency = sparse.csr_matrix(adjacency)

[14]:

# visualization

[14]:


### Bipartite graphs

Bipartite graphs are special graphs with edges between two sets of nodes. A bipartite graph can be represented by its biadjacency matrix $$B$$, a rectangular matrix of links between the two sets of nodes. We have $$B_{ij}=1$$ if and only if there is an edge between node $$i$$ of the first set (represented by the rows of $$B$$) and node $$j$$ of the second set (represented by the columns of $$B$$).

[15]:

biadjacency = np.array([[1, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 1, 1, 1, 0], [0, 0, 0, 1, 1]])

[16]:

biadjacency

[16]:

array([[1, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[1, 1, 1, 1, 0],
[0, 0, 0, 1, 1]])

[17]:

biadjacency = sparse.csr_matrix(biadjacency)

[18]:

biadjacency

[18]:

<4x5 sparse matrix of type '<class 'numpy.int64'>'
with 11 stored elements in Compressed Sparse Row format>

[19]:

from sknetwork.visualization import svg_bigraph

[20]:

# name nodes by indices
names_row = np.arange(n_row)
names_col = np.arange(n_col)

[21]:

# visualization

[21]:


### Weighted graphs

Weights on edges can be represented by an adjacency matrix with non-negative entries (or a biadjacency matrix for a bipartite graph).

[22]:

adjacency = np.array([[0, 2, 1, 3, 0], [2, 0, 0, 0, 3], [1, 0, 0, 2, 0], [3, 0, 2, 0, 1], [0, 3, 0, 1, 0]])

[23]:

adjacency

[23]:

array([[0, 2, 1, 3, 0],
[2, 0, 0, 0, 3],
[1, 0, 0, 2, 0],
[3, 0, 2, 0, 1],
[0, 3, 0, 1, 0]])

[24]:

adjacency = sparse.csr_matrix(adjacency)

[25]:

# visualization

[25]:


### Subgraphs

Subgraphs can easily be obtained by slicing the adjacency matrix.

[26]:

index = [0, 2, 3, 4]

[27]:

# visualization

[27]:


## 3. Algorithms

### Basic tools

Some basic tools for loading, processing and visualizing graphs are available as functions.

Module

Description

Functions

Data

from_cs, save, load, load_netset, …

Topology

Connectivity and structure

get_connected_components, is_acyclic, …

Path

Shortest paths and distances

get_distances, get_shortest_path, …

Linear algebra

Matrix operations

normalize, diagonal_pseudo_inverse, …

Utils

Useful tools

directed2undirected, get_degrees, get_neighbors, …

Visualization

Visualization tools

svg_graph, svg_bigraphs, …

[28]:

from sknetwork.data import karate_club
from sknetwork.utils import get_degrees

[29]:

adjacency = karate_club()

[30]:

get_degrees(adjacency)

[30]:

array([16,  9, 10,  6,  3,  4,  4,  4,  5,  2,  3,  1,  2,  5,  2,  2,  2,
2,  2,  3,  2,  2,  2,  5,  3,  3,  2,  4,  3,  4,  4,  6, 12, 17],
dtype=int32)


### Machine learning

Scikit-network is suitable for machine learning on graphs.

Each algorithm is available as an object with some methods among the following:

• fit: fit the algorithm to the graph. This method is mandatory.

• predict: predict the output (e.g., labels of nodes).

• predict_proba: predict the output with probability (e.g., probability distribution over labels).

• transform: transform the graph.

• fit_predict: apply fit + predict.

• fit_predict_proba: apply fit + predict_proba.

• fit_transform: apply fit + transform.

The output is an attribute of the object with an underscore (e.g., labels_).

Module

Description

Algorithms

Output

Clustering

Form clusters of nodes

Louvain, PropagationClustering

labels_

Classification

Classify nodes

DiffusionClassifier, NNClassifier, …

labels_

GNN

Graph neural networks

GNNClassifier

labels_

Regression

Assign values to nodes

Diffusion, Dirichlet

values_

Hierarchy

Get a hierarchical representation of nodes

Paris, LouvainHierarchy

dendrogram_

Embedding

Get a vectorial representation of nodes

Spectral, SVD, LouvainEmbedding, …

embedding_

Ranking

Give importance scores to nodes

PageRank, Katz, Betweenness, …

scores_

NNLinker

links_

[31]:

from sknetwork.data import karate_club
from sknetwork.classification import DiffusionClassifier

[32]:

dataset = karate_club(metadata=True)
position = dataset.position
labels = dataset.labels

[33]:

classifier = DiffusionClassifier()
classifier.fit(adjacency, {node: labels[node] for node in [0, 1, 33]})

[33]:

DiffusionClassifier(n_iter=10, centering=True, scale=5)

[34]:

labels_pred = classifier.predict()

[35]:

SVG(svg_graph(adjacency, position, labels=labels_pred))

[35]:

[36]:

probs = classifier.predict_proba()

[37]:

scores = probs[:, 1]

[38]:

SVG(svg_graph(adjacency, position, scores=scores))

[38]:


## 4. Example

We here show how to use scikit-network to cluster a graph, initially stored in a text file as a list of edges.

### Data

[39]:

# show first lines
with open('miserables.tsv', 'r') as f:
print(''.join(data[:5]))

Myriel  Napoleon
Myriel  Mlle Baptistine
Myriel  Mme Magloire
Myriel  Countess de Lo
Myriel  Geborand



[40]:

from sknetwork.data import from_csv

[41]:

graph = from_csv('miserables.tsv')

[42]:

adjacency = graph.adjacency
names = graph.names

[43]:

adjacency

[43]:

<77x77 sparse matrix of type '<class 'numpy.int64'>'
with 508 stored elements in Compressed Sparse Row format>


### Embedding

We here compute a 2D embedding for visualization.

[44]:

from sknetwork.embedding import Spring

[45]:

algorithm = Spring()

[46]:

position = algorithm.fit_transform(adjacency)

[47]:

SVG(svg_graph(adjacency, position, names=names))

[47]:


### Clustering

Finally, we cluster the graph by the Louvain algorithm.

[48]:

from sknetwork.clustering import Louvain

[49]:

algorithm = Louvain()

[50]:

SVG(svg_graph(adjacency, position, names=names, labels=labels))

[50]:


### Bipartite graphs

Observe that most algorithms apply to bipartite graphs, with the fit method applied to the biadjacency matrix. We show an example with a bipartite graph between movies and actors.

[51]:

# show first lines
with open('movie_actor.tsv', 'r') as f:
print(''.join(data[:5]))

Inception       Leonardo DiCaprio
Inception       Marion Cotillard
Inception       Joseph Gordon Lewitt
The Dark Knight Rises   Marion Cotillard
The Dark Knight Rises   Joseph Gordon Lewitt


[52]:

graph = from_csv('movie_actor.tsv', bipartite=True)

[53]:

biadjacency = graph.biadjacency
movies = graph.names_row
actors = graph.names_col

[54]:

algorithm = Louvain()

[55]:

SVG(svg_bigraph(biadjacency, names_row=movies, names_col=actors, labels_row=labels_row, labels_col=labels_col))

[55]: