# Classification

Node classification algorithms.

The attribute `labels_` gives the label of each node of the graph.

## Diffusion

class sknetwork.classification.DiffusionClassifier(n_iter: int = 10, centering: bool = True, scale: float = 5)

Node classification by heat diffusion.

For each label, the temperature of a node corresponds to the probability that the node has this label.

Parameters:
• n_iter (int) – Number of iterations of the diffusion (discrete time).

• centering (bool) – If `True`, center the temperature of each label to its mean before classification (default).

• scale (float) – Multiplicative factor applied to temperatures before softmax (default = 5). Used only when centering is `True`.

Variables:
• labels_ (np.ndarray, shape (n_labels,)) – Labels of nodes.

• probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.

• labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.

• labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.

• probs_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distributions over labels of rows, for bipartite graphs.

• probs_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Probability distributions over labels of columns, for bipartite graphs.

Example

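```
>>> import numpy as np
>>> from sknetwork.classification import DiffusionClassifier
>>> from sknetwork.data import karate_club
>>> diffusion = DiffusionClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = diffusion.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97
```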
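
The effect of `centering` and `scale` can be illustrated with a small NumPy sketch. It is not the library's implementation: the helper `diffuse_labels`, the path graph, and the update rule (each free node takes the mean temperature of its neighbors while seed nodes keep theirs) are assumptions made for this illustration, and the graph is assumed to have no isolated node.

```python
import numpy as np

def diffuse_labels(adjacency, seeds, n_iter=10, centering=True, scale=5.0):
    """Toy heat diffusion; `seeds` maps node index -> known label (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    label_values = sorted(set(seeds.values()))
    seed_nodes = np.array(list(seeds.keys()))
    # One temperature column per label: 1 on the seeds of that label, 0 on other seeds.
    boundary = np.zeros((n, len(label_values)))
    for node, label in seeds.items():
        boundary[node, label_values.index(label)] = 1.0
    # Row-normalized adjacency: each node averages the temperatures of its neighbors.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    # Free nodes start at the mean temperature of the seeds.
    temperatures = np.tile(boundary[seed_nodes].mean(axis=0), (n, 1))
    temperatures[seed_nodes] = boundary[seed_nodes]
    for _ in range(n_iter):
        temperatures = transition @ temperatures
        temperatures[seed_nodes] = boundary[seed_nodes]  # seeds keep their temperature
    if centering:
        # Center each label on its mean temperature, rescale, then apply a softmax.
        centered = scale * (temperatures - temperatures.mean(axis=0))
        weights = np.exp(centered - centered.max(axis=1, keepdims=True))
    else:
        weights = temperatures
    probs = weights / weights.sum(axis=1, keepdims=True)
    return np.array(label_values)[probs.argmax(axis=1)], probs

# Path graph 0 - 1 - 2 - 3 - 4 - 5 with one seed at each end.
n_nodes = 6
adjacency = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1
labels, probs = diffuse_labels(adjacency, {0: 0, 5: 1})
```

In this toy graph, each node ends up with the label of the closer seed, and a larger `scale` only makes the softmax output sharper.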

References

Zhu, X., Lafferty, J., & Rosenfeld, R. (2005). Semi-supervised learning with graphs (Doctoral dissertation, Carnegie Mellon University, language technologies institute, school of computer science).

fit(input_matrix: csr_matrix | ndarray, labels: ndarray | dict | None = None, labels_row: ndarray | dict | None = None, labels_col: ndarray | dict | None = None, force_bipartite: bool = False)

Compute the solution to the Dirichlet problem (temperatures at equilibrium).

Parameters:
• input_matrix (sparse.csr_matrix, np.ndarray) – Adjacency matrix or biadjacency matrix of the graph.

• labels (dict, np.ndarray) – Known labels (dictionary or vector of int). Negative values ignored.

• labels_row (dict, np.ndarray) – Labels of rows for bipartite graphs. Negative values ignored.

• labels_col (dict, np.ndarray) – Labels of columns for bipartite graphs. Negative values ignored.

• force_bipartite (bool) – If `True`, consider the input matrix as a biadjacency matrix (default = `False`).

Returns:

self

Return type:

`DiffusionClassifier`
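
Known labels can be given either as a dictionary or as a vector in which negative entries mark unlabeled nodes. The sketch below shows how the two forms relate; `labels_to_vector` is a hypothetical helper written for this illustration, not a function of the library.

```python
import numpy as np

def labels_to_vector(labels: dict, n_nodes: int) -> np.ndarray:
    """Return a vector holding the known labels and -1 for unlabeled nodes."""
    vector = -np.ones(n_nodes, dtype=int)
    for node, label in labels.items():
        vector[node] = label
    return vector

seeds = {0: 1, 33: 0}                    # dictionary form: node index -> label
vector = labels_to_vector(seeds, 34)     # vector form: -1 means "unknown"
known = np.flatnonzero(vector >= 0)      # indices of the nodes with a known label
```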

fit_predict(*args, **kwargs) ndarray

Fit algorithm to the data and return the labels. Same parameters as the `fit` method.

Returns:

labels – Labels.

Return type:

np.ndarray

fit_predict_proba(*args, **kwargs) ndarray

Fit algorithm to the data and return the probability distribution over labels. Same parameters as the `fit` method.

Returns:

probs – Probability of each label.

Return type:

np.ndarray

fit_transform(*args, **kwargs) csr_matrix

Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the `fit` method.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

get_params()

Get parameters as dictionary.

Returns:

params – Parameters of the algorithm.

Return type:

dict

predict(columns: bool = False) ndarray

Return the labels predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

labels – Labels.

Return type:

np.ndarray

predict_proba(columns=False) ndarray

Return the probability distribution over labels as predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

np.ndarray

set_params(params: dict) Algorithm

Set parameters of the algorithm.

Parameters:

params (dict) – Parameters of the algorithm.

Returns:

self

Return type:

`Algorithm`

transform(columns=False) csr_matrix

Return the probability distribution over labels in sparse format.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

## Nearest neighbors

class sknetwork.classification.NNClassifier(n_neighbors: int = 3, embedding_method: BaseEmbedding | None = None, normalize: bool = True)

Node classification by K-nearest neighbors in the embedding space.

Parameters:
• n_neighbors (int) – Number of nearest neighbors.

• embedding_method (`BaseEmbedding`) – Embedding method used to represent nodes in vector space. If `None` (default), use identity.

• normalize (bool) – If `True`, apply normalization so that all vectors have norm 1 in the embedding space.

Variables:
• labels_ (np.ndarray, shape (n_labels,)) – Labels of nodes.

• probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.

• labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.

• labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.

• probs_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distributions over labels of rows, for bipartite graphs.

• probs_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Probability distributions over labels of columns, for bipartite graphs.

Example

```
>>> import numpy as np
>>> from sknetwork.classification import NNClassifier
>>> from sknetwork.data import karate_club
>>> classifier = NNClassifier(n_neighbors=1)
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = classifier.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.82
```

fit(input_matrix: csr_matrix | ndarray, labels: ndarray | dict | None = None, labels_row: ndarray | dict | None = None, labels_col: ndarray | dict | None = None)

Node classification by k-nearest neighbors in the embedding space.

Parameters:
• input_matrix (sparse.csr_matrix, np.ndarray) – Adjacency matrix or biadjacency matrix of the graph.

• labels (np.ndarray, dict) – Known labels. Negative values ignored.

• labels_row (np.ndarray, dict) – Known labels of rows, for bipartite graphs.

• labels_col (np.ndarray, dict) – Known labels of columns, for bipartite graphs.

Returns:

self

Return type:

`NNClassifier`
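
The idea of a nearest-neighbor vote in an embedding space can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: `knn_classify` is a hypothetical helper, the embedding is supplied directly as an array, distances are Euclidean, and labeled nodes are assumed to keep their label.

```python
import numpy as np

def knn_classify(embedding, labels, n_neighbors=3, normalize=True):
    """Toy k-NN vote; `labels` holds -1 for unlabeled nodes (hypothetical helper)."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    if normalize:
        # Project every vector onto the unit sphere (norm 1).
        norms = np.linalg.norm(embedding, axis=1, keepdims=True)
        embedding = embedding / np.maximum(norms, 1e-12)
    known = np.flatnonzero(labels >= 0)
    n_labels = labels[known].max() + 1
    # Euclidean distance from every node to every labeled node.
    diff = embedding[:, None, :] - embedding[known][None, :, :]
    distances = np.linalg.norm(diff, axis=2)
    k = min(n_neighbors, len(known))
    nearest = np.argsort(distances, axis=1)[:, :k]  # indices into `known`
    votes = np.zeros((len(labels), n_labels))
    for node, neighbors in enumerate(nearest):
        np.add.at(votes[node], labels[known][neighbors], 1)  # one vote per neighbor
    predicted = votes.argmax(axis=1)
    predicted[known] = labels[known]  # assumption: labeled nodes keep their label
    return predicted

embedding = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.3],
                      [0.0, 1.0], [0.1, 0.9], [0.2, 0.7]])
labels = np.array([0, -1, -1, 1, -1, -1])
predicted = knn_classify(embedding, labels, n_neighbors=1)
```

With normalized vectors, ranking by Euclidean distance gives the same order as ranking by cosine similarity, which is why normalization is a natural default in this setting.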

fit_predict(*args, **kwargs) ndarray

Fit algorithm to the data and return the labels. Same parameters as the `fit` method.

Returns:

labels – Labels.

Return type:

np.ndarray

fit_predict_proba(*args, **kwargs) ndarray

Fit algorithm to the data and return the probability distribution over labels. Same parameters as the `fit` method.

Returns:

probs – Probability of each label.

Return type:

np.ndarray

fit_transform(*args, **kwargs) csr_matrix

Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the `fit` method.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

get_params()

Get parameters as dictionary.

Returns:

params – Parameters of the algorithm.

Return type:

dict

predict(columns: bool = False) ndarray

Return the labels predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

labels – Labels.

Return type:

np.ndarray

predict_proba(columns=False) ndarray

Return the probability distribution over labels as predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

np.ndarray

set_params(params: dict) Algorithm

Set parameters of the algorithm.

Parameters:

params (dict) – Parameters of the algorithm.

Returns:

self

Return type:

`Algorithm`

transform(columns=False) csr_matrix

Return the probability distribution over labels in sparse format.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

## Propagation

class sknetwork.classification.Propagation(n_iter: float = -1, node_order: str | None = None, weighted: bool = True)

Node classification by label propagation.

Parameters:
• n_iter (float) – Maximum number of iterations (-1 for infinity).

• node_order (str) –

• `'random'`: node labels are updated in random order.

• `'increasing'`: node labels are updated by increasing order of (in-) weight.

• `'decreasing'`: node labels are updated by decreasing order of (in-) weight.

• Otherwise, node labels are updated by index order.

• weighted (bool) – If `True`, the vote of each neighbor is proportional to the edge weight. Otherwise, all votes have weight 1.

Variables:
• labels_ (np.ndarray, shape (n_labels,)) – Labels of nodes.

• probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.

• labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.

• labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.

• probs_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distributions over labels of rows, for bipartite graphs.

• probs_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Probability distributions over labels of columns, for bipartite graphs.

Example

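```
>>> import numpy as np
>>> from sknetwork.classification import Propagation
>>> from sknetwork.data import karate_club
>>> propagation = Propagation()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = propagation.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.94
```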
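
The role of the `weighted` option can be seen in a minimal NumPy sketch of label propagation. It is a simplified illustration, not the library's implementation: `propagate_labels` is a hypothetical helper, nodes are updated in index order, ties go to the smallest label, seeds are never updated, and the graph is assumed to be undirected.

```python
import numpy as np

def propagate_labels(adjacency, labels, n_iter=-1, weighted=True):
    """Toy label propagation; `labels` holds -1 for unlabeled nodes (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    if not weighted:
        adjacency = (adjacency > 0).astype(float)  # every vote has weight 1
    labels = np.asarray(labels).copy()
    seeds = labels >= 0
    n_labels = labels.max() + 1
    max_iter = np.inf if n_iter < 0 else n_iter  # -1 means "until stable"
    step = 0
    while step < max_iter:
        previous = labels.copy()
        for node in np.flatnonzero(~seeds):
            # Only neighbors that already carry a label take part in the vote.
            neighbors = np.flatnonzero((adjacency[node] > 0) & (labels >= 0))
            if len(neighbors) == 0:
                continue
            votes = np.bincount(labels[neighbors],
                                weights=adjacency[node, neighbors],
                                minlength=n_labels)
            labels[node] = votes.argmax()
        step += 1
        if np.array_equal(labels, previous):
            break  # stable labeling reached
    return labels

# Two triangles (0, 1, 2) and (3, 4, 5) joined by a weak edge 2 - 3.
adjacency = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.5)]:
    adjacency[i, j] = adjacency[j, i] = w
seeds = np.array([0, -1, -1, -1, -1, 1])
labels_weighted = propagate_labels(adjacency, seeds, weighted=True)
labels_unweighted = propagate_labels(adjacency, seeds, weighted=False)
```

In this example the weak edge decides the outcome: with weighted votes each triangle keeps the label of its own seed, whereas with unit votes the tie at node 3 is resolved in favor of label 0, which then spreads to node 4.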

References

Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 76(3), 036106.

fit(input_matrix: csr_matrix | ndarray, labels: ndarray | dict | None = None, labels_row: ndarray | dict | None = None, labels_col: ndarray | dict | None = None)

Node classification by label propagation.

Parameters:
• input_matrix (sparse.csr_matrix, np.ndarray) – Adjacency matrix or biadjacency matrix of the graph.

• labels (np.ndarray, dict) – Known labels. Negative values ignored.

• labels_row (np.ndarray, dict) – Known labels of rows, for bipartite graphs.

• labels_col (np.ndarray, dict) – Known labels of columns, for bipartite graphs.

Returns:

self

Return type:

`Propagation`

fit_predict(*args, **kwargs) ndarray

Fit algorithm to the data and return the labels. Same parameters as the `fit` method.

Returns:

labels – Labels.

Return type:

np.ndarray

fit_predict_proba(*args, **kwargs) ndarray

Fit algorithm to the data and return the probability distribution over labels. Same parameters as the `fit` method.

Returns:

probs – Probability of each label.

Return type:

np.ndarray

fit_transform(*args, **kwargs) csr_matrix

Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the `fit` method.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

get_params()

Get parameters as dictionary.

Returns:

params – Parameters of the algorithm.

Return type:

dict

predict(columns: bool = False) ndarray

Return the labels predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

labels – Labels.

Return type:

np.ndarray

predict_proba(columns=False) ndarray

Return the probability distribution over labels as predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

np.ndarray

set_params(params: dict) Algorithm

Set parameters of the algorithm.

Parameters:

params (dict) – Parameters of the algorithm.

Returns:

self

Return type:

`Algorithm`

transform(columns=False) csr_matrix

Return the probability distribution over labels in sparse format.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

## PageRank

class sknetwork.classification.PageRankClassifier(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0.0, n_jobs: int | None = None, verbose: bool = False)

Node classification by multiple personalized PageRanks.

Parameters:
• damping_factor (float) – Probability to continue the random walk.

• solver (str) – Which solver to use: `'piteration'`, `'diteration'`, `'bicgstab'`, `'lanczos'`.

• n_iter (int) – Number of iterations for some solvers such as `'piteration'` or `'diteration'`.

• tol (float) – Tolerance for the convergence of some solvers such as `'bicgstab'` or `'lanczos'`.

Variables:
• labels_ (np.ndarray, shape (n_labels,)) – Labels of nodes.

• probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.

• labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.

• labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.

• probs_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distributions over labels of rows, for bipartite graphs.

• probs_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Probability distributions over labels of columns, for bipartite graphs.

Example

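```
>>> import numpy as np
>>> from sknetwork.classification import PageRankClassifier
>>> from sknetwork.data import karate_club
>>> pagerank = PageRankClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = pagerank.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97
```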
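
"Multiple personalized PageRanks" means one PageRank per label, each restarting from the seeds of that label. The NumPy sketch below illustrates the idea with plain power iteration. `pagerank_classify` is a hypothetical helper; the argmax decision rule and the assumption that no node is isolated are choices made for this illustration and may differ from what the library does internally.

```python
import numpy as np

def pagerank_classify(adjacency, seeds, damping_factor=0.85, n_iter=10):
    """Toy classifier: one personalized PageRank per label (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    label_values = sorted(set(seeds.values()))
    # Row-stochastic transition matrix of the random walk.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    scores = np.zeros((n, len(label_values)))
    for j, label in enumerate(label_values):
        # Restart distribution: uniform over the seeds of this label.
        restart = np.zeros(n)
        restart[[node for node, value in seeds.items() if value == label]] = 1
        restart /= restart.sum()
        rank = restart.copy()
        for _ in range(n_iter):  # power iteration
            rank = damping_factor * (transition.T @ rank) + (1 - damping_factor) * restart
        scores[:, j] = rank
    # Each node takes the label whose personalized PageRank scores it highest.
    return np.array(label_values)[scores.argmax(axis=1)], scores

# Path graph 0 - 1 - 2 - 3 - 4 - 5 with one seed at each end.
n_nodes = 6
adjacency = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1
labels, scores = pagerank_classify(adjacency, {0: 0, 5: 1})
```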

References

Lin, F., & Cohen, W. W. (2010). Semi-supervised classification of network data using very few labels. In IEEE International Conference on Advances in Social Networks Analysis and Mining.

fit(input_matrix: csr_matrix | ndarray, labels: ndarray | dict | None = None, labels_row: ndarray | dict | None = None, labels_col: ndarray | dict | None = None) RankClassifier

Fit algorithm to data.

Parameters:

• input_matrix – Adjacency matrix or biadjacency matrix of the graph.

• labels – Known labels (dictionary or array; negative values ignored).

• labels_row – Known labels of rows (for bipartite graphs).

• labels_col – Known labels of columns (for bipartite graphs).

Returns:

self

Return type:

`RankClassifier`

fit_predict(*args, **kwargs) ndarray

Fit algorithm to the data and return the labels. Same parameters as the `fit` method.

Returns:

labels – Labels.

Return type:

np.ndarray

fit_predict_proba(*args, **kwargs) ndarray

Fit algorithm to the data and return the probability distribution over labels. Same parameters as the `fit` method.

Returns:

probs – Probability of each label.

Return type:

np.ndarray

fit_transform(*args, **kwargs) csr_matrix

Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the `fit` method.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

get_params()

Get parameters as dictionary.

Returns:

params – Parameters of the algorithm.

Return type:

dict

predict(columns: bool = False) ndarray

Return the labels predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

labels – Labels.

Return type:

np.ndarray

predict_proba(columns=False) ndarray

Return the probability distribution over labels as predicted by the algorithm.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

np.ndarray

set_params(params: dict) Algorithm

Set parameters of the algorithm.

Parameters:

params (dict) – Parameters of the algorithm.

Returns:

self

Return type:

`Algorithm`

transform(columns=False) csr_matrix

Return the probability distribution over labels in sparse format.

Parameters:

columns (bool) – If `True`, return the prediction for columns.

Returns:

probs – Probability distribution over labels.

Return type:

sparse.csr_matrix

## Metrics

sknetwork.classification.get_accuracy_score(labels_true: ndarray, labels_pred: ndarray) float

Return the proportion of correctly labeled samples. Negative labels ignored.

Parameters:
• labels_true (np.ndarray) – True labels.

• labels_pred (np.ndarray) – Predicted labels.

Returns:

accuracy – A score between 0 and 1.

Return type:

float
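
What "negative labels ignored" means in practice can be sketched as follows. The helper `accuracy_ignoring_negatives` is hypothetical, and the masking rule (a sample is dropped when either label is negative) is an assumption made for this illustration rather than a statement about the library's code.

```python
import numpy as np

def accuracy_ignoring_negatives(labels_true, labels_pred):
    """Accuracy over the samples where both labels are non-negative (hypothetical helper)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    mask = (labels_true >= 0) & (labels_pred >= 0)
    return float(np.mean(labels_true[mask] == labels_pred[mask]))

labels_true = np.array([0, 0, 1, 1, -1])  # the last sample has no known label
labels_pred = np.array([0, 0, 0, 1, 1])
score = accuracy_ignoring_negatives(labels_true, labels_pred)  # 3 correct out of 4
```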

Examples

```
>>> import numpy as np
>>> from sknetwork.classification import get_accuracy_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> get_accuracy_score(labels_true, labels_pred)
0.75
```

sknetwork.classification.get_f1_score(labels_true: ndarray, labels_pred: ndarray, return_precision_recall: bool = False) float | Tuple[float, float, float]

Return the F1 score of binary classification. Negative labels ignored.

Parameters:
• labels_true (np.ndarray) – True labels.

• labels_pred (np.ndarray) – Predicted labels.

• return_precision_recall (bool) – If `True`, also return precision and recall.

Returns:

score, [precision, recall] – F1 score (between 0 and 1). Optionally, also return precision and recall.

Return type:

float, or a tuple of three floats if `return_precision_recall` is `True`
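
The F1 score is the harmonic mean of precision and recall. The sketch below computes the three quantities by hand for the positive label 1; `binary_f1` is a hypothetical helper written for this illustration, and it assumes negative labels have already been filtered out.

```python
import numpy as np

def binary_f1(labels_true, labels_pred):
    """Return (f1, precision, recall) for the positive label 1 (hypothetical helper)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    true_positive = np.sum((labels_true == 1) & (labels_pred == 1))
    # Precision: fraction of predicted positives that are correct.
    precision = true_positive / max(np.sum(labels_pred == 1), 1)
    # Recall: fraction of actual positives that are found.
    recall = true_positive / max(np.sum(labels_true == 1), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return float(f1), float(precision), float(recall)

f1, precision, recall = binary_f1([0, 0, 1, 1], [0, 0, 0, 1])
```

On these inputs the single predicted positive is correct (precision 1) but only one of the two actual positives is found (recall 0.5), which gives an F1 score of 2/3.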

Examples

```
>>> import numpy as np
>>> from sknetwork.classification import get_f1_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_f1_score(labels_true, labels_pred), 2)
0.67
```

sknetwork.classification.get_f1_scores(labels_true: ndarray, labels_pred: ndarray, return_precision_recall: bool = False) ndarray | Tuple[ndarray, ndarray, ndarray]

Return the F1 scores of multi-label classification (one per label). Negative labels ignored.

Parameters:
• labels_true (np.ndarray) – True labels.

• labels_pred (np.ndarray) – Predicted labels.

• return_precision_recall (bool) – If `True`, also return precisions and recalls.

Returns:

scores, [precisions, recalls] – F1 scores (between 0 and 1). Optionally, also return precisions and recalls.

Return type:

np.ndarray

Examples

```
>>> import numpy as np
>>> from sknetwork.classification import get_f1_scores
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_f1_scores(labels_true, labels_pred), 2)
array([0.8 , 0.67])
```

sknetwork.classification.get_average_f1_score(labels_true: ndarray, labels_pred: ndarray, average: str = 'macro') float

Return the average F1 score of multi-label classification. Negative labels ignored.

Parameters:
• labels_true (np.ndarray) – True labels.

• labels_pred (np.ndarray) – Predicted labels.

• average (str) – Averaging method. Can be either `'macro'` (default), `'micro'` or `'weighted'`.

Returns:

score – Average F1 score (between 0 and 1).

Return type:

float
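
The three averaging methods differ in how per-label results are combined. The sketch below uses the usual definitions (macro: plain mean of per-label F1 scores; micro: F1 computed from counts pooled over labels; weighted: mean weighted by the number of true samples per label). `average_f1` is a hypothetical helper, and whether the library uses exactly these definitions is an assumption; negative labels are assumed to be filtered out beforehand.

```python
import numpy as np

def average_f1(labels_true, labels_pred, average="macro"):
    """Average F1 score over labels (hypothetical helper, standard definitions)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    label_values = np.union1d(labels_true, labels_pred)
    # Per-label counts: true positives, true samples, predicted samples.
    tp = np.array([np.sum((labels_true == v) & (labels_pred == v)) for v in label_values])
    n_true = np.array([np.sum(labels_true == v) for v in label_values])
    n_pred = np.array([np.sum(labels_pred == v) for v in label_values])
    if average == "micro":
        # Pool the counts over all labels before computing F1.
        return float(2 * tp.sum() / (n_true.sum() + n_pred.sum()))
    # Per-label F1 = 2 TP / (2 TP + FP + FN) = 2 TP / (n_true + n_pred).
    scores = 2 * tp / (n_true + n_pred)
    if average == "macro":
        return float(scores.mean())
    if average == "weighted":
        return float(np.average(scores, weights=n_true))
    raise ValueError("average must be 'macro', 'micro' or 'weighted'")

labels_true = np.array([0, 0, 1, 1])
labels_pred = np.array([0, 0, 0, 1])
macro = average_f1(labels_true, labels_pred, "macro")
micro = average_f1(labels_true, labels_pred, "micro")
weighted = average_f1(labels_true, labels_pred, "weighted")
```

Here the per-label scores are 0.8 and 2/3, so the macro average is about 0.73; the weighted average coincides with it because both labels have two true samples, while the micro average is 0.75.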

Examples

```
>>> import numpy as np
>>> from sknetwork.classification import get_average_f1_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_average_f1_score(labels_true, labels_pred), 2)
0.73
```

sknetwork.classification.get_confusion_matrix(labels_true: ndarray, labels_pred: ndarray) csr_matrix

Return the confusion matrix in sparse format (true labels on rows, predicted labels on columns). Negative labels ignored.

Parameters:
• labels_true (np.ndarray) – True labels.

• labels_pred (np.ndarray) – Predicted labels.

Returns:

confusion matrix – Confusion matrix.

Return type:

sparse.csr_matrix

Examples

```
>>> import numpy as np
>>> from sknetwork.classification import get_confusion_matrix
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> get_confusion_matrix(labels_true, labels_pred).toarray()
array([[2, 0],
       [1, 1]])
```
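
For intuition, a dense confusion matrix with the same layout (true labels on rows, predicted labels on columns) can be built by hand. `confusion_matrix_dense` is a hypothetical helper for illustration only; it returns a dense array rather than a sparse matrix and drops samples where either label is negative, which is an assumption about how negative labels are ignored.

```python
import numpy as np

def confusion_matrix_dense(labels_true, labels_pred):
    """Dense confusion matrix: rows are true labels, columns are predictions."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    mask = (labels_true >= 0) & (labels_pred >= 0)
    labels_true, labels_pred = labels_true[mask], labels_pred[mask]
    n_labels = max(labels_true.max(), labels_pred.max()) + 1
    matrix = np.zeros((n_labels, n_labels), dtype=int)
    # Count each (true, predicted) pair; np.add.at handles repeated pairs.
    np.add.at(matrix, (labels_true, labels_pred), 1)
    return matrix

matrix = confusion_matrix_dense([0, 0, 1, 1], [0, 0, 0, 1])
accuracy = np.trace(matrix) / matrix.sum()  # the diagonal holds the correct predictions
```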