Classification

Node classification algorithms.

The attribute labels_ assigns a label to each node of the graph.

PageRank

class sknetwork.classification.PageRankClassifier(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0.0, n_jobs: Optional[int] = None, verbose: bool = False)[source]

Node classification by multiple personalized PageRanks.

  • Graphs

  • Digraphs

Parameters
  • damping_factor – Probability to continue the random walk.

  • solver (str) – Which solver to use: ‘piteration’, ‘diteration’, ‘bicgstab’, ‘lanczos’.

  • n_iter (int) – Number of iterations for some of the solvers such as 'piteration' or 'diteration'.

  • tol (float) – Tolerance for the convergence of some solvers such as 'bicgstab' or 'lanczos'.

Variables
  • labels_ (np.ndarray) – Label of each node (hard classification).

  • membership_ (sparse.csr_matrix) – Membership matrix (soft classification, columns = labels).

Example

>>> from sknetwork.classification import PageRankClassifier
>>> from sknetwork.data import karate_club
>>> pagerank = PageRankClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> seeds = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = pagerank.fit_transform(adjacency, seeds)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97

References

Lin, F., & Cohen, W. W. (2010). Semi-supervised classification of network data using very few labels. In IEEE International Conference on Advances in Social Networks Analysis and Mining.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Union[numpy.ndarray, dict]) → sknetwork.classification.base_rank.RankClassifier

Fit algorithm to the data.

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Seed nodes (labels as dictionary or array; negative values ignored).

Returns

self

Return type

RankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

class sknetwork.classification.BiPageRankClassifier(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0.0, n_jobs: Optional[int] = None, verbose: bool = False)[source]

Node classification for bipartite graphs by multiple personalized PageRanks .

  • Bigraphs

Parameters
  • damping_factor – Probability to continue the random walk.

  • solver (str) – Which solver to use: ‘piteration’, ‘diteration’, ‘bicgstab’, ‘lanczos’.

  • n_iter (int) – Number of iterations for some of the solvers such as 'piteration' or 'diteration'.

  • tol (float) – Tolerance for the convergence of some solvers such as 'bicgstab' or 'lanczos'.

Variables
  • labels_ (np.ndarray) – Label of each row.

  • labels_row_ (np.ndarray) – Label of each row (copy of labels_).

  • labels_col_ (np.ndarray) – Label of each column.

  • membership_ (sparse.csr_matrix) – Membership matrix of rows (soft classification, labels on columns).

  • membership_row_ (sparse.csr_matrix) – Membership matrix of rows (copy of membership_).

  • membership_col_ (sparse.csr_matrix) – Membership matrix of columns.

Example

>>> from sknetwork.classification import BiPageRankClassifier
>>> from sknetwork.data import star_wars
>>> bipagerank = BiPageRankClassifier()
>>> biadjacency = star_wars()
>>> seeds = {0: 1, 2: 0}
>>> bipagerank.fit_transform(biadjacency, seeds)
array([1, 1, 0, 0])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Union[numpy.ndarray, dict], seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.classification.base_rank.RankClassifier

Compute labels.

Parameters
  • biadjacency – Biadjacency matrix of the graph.

  • seeds_row – Seed rows (labels as dictionary or array; negative values ignored).

  • seeds_col – Seed columns (optional). Same format.

Returns

self

Return type

BiPageRankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

Diffusion

class sknetwork.classification.DiffusionClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, n_jobs: Optional[int] = None)[source]

Node classification using multiple diffusions.

  • Graphs

  • Digraphs

Parameters
  • n_iter (int) – Number of steps of the diffusion in discrete time (must be positive).

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • n_jobs – If positive, number of parallel jobs allowed (-1 means maximum number). If None, no parallel computations are made.

Variables
  • labels_ (np.ndarray) – Label of each node (hard classification).

  • membership_ (sparse.csr_matrix) – Membership matrix (soft classification, labels on columns).

Example

>>> from sknetwork.data import karate_club
>>> diffusion = DiffusionClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> seeds = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = diffusion.fit_transform(adjacency, seeds)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.94

References

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Union[numpy.ndarray, dict]) → sknetwork.classification.base_rank.RankClassifier

Fit algorithm to the data.

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Seed nodes (labels as dictionary or array; negative values ignored).

Returns

self

Return type

RankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

class sknetwork.classification.BiDiffusionClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, n_jobs: Optional[int] = None)[source]

Node classification using multiple diffusions.

  • Bigraphs

Parameters
  • n_iter (int) – Number of steps of the diffusion in discrete time (must be positive).

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • n_jobs – If positive, number of parallel jobs allowed (-1 means maximum number). If None, no parallel computations are made.

Variables
  • labels_ (np.ndarray) – Label of each row.

  • labels_row_ (np.ndarray) – Label of each row (copy of labels_).

  • labels_col_ (np.ndarray) – Label of each column.

  • membership_ (sparse.csr_matrix) – Membership matrix of rows (soft classification, labels on columns).

  • membership_row_ (sparse.csr_matrix) – Membership matrix of rows (copy of membership_).

  • membership_col_ (sparse.csr_matrix) – Membership matrix of columns.

Example

>>> from sknetwork.classification import BiDiffusionClassifier
>>> from sknetwork.data import star_wars
>>> bidiffusion = BiDiffusionClassifier(n_iter=2)
>>> biadjacency = star_wars()
>>> seeds = {0: 1, 2: 0}
>>> bidiffusion.fit_transform(biadjacency, seeds)
array([1, 1, 0, 0])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Union[numpy.ndarray, dict], seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.classification.base_rank.RankClassifier

Compute labels.

Parameters
  • biadjacency – Biadjacency matrix of the graph.

  • seeds_row – Seed rows (labels as dictionary or array; negative values ignored).

  • seeds_col – Seed columns (optional). Same format.

Returns

self

Return type

BiPageRankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

Dirichlet

class sknetwork.classification.DirichletClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, n_jobs: Optional[int] = None, verbose: bool = False)[source]

Node classification using multiple Dirichlet problems.

  • Graphs

  • Digraphs

Parameters
  • n_iter (int) – If positive, the solution to the Dirichlet problem is approximated by power iteration for n_iter steps. Otherwise, the solution is computed through BiConjugate Stabilized Gradient descent.

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • n_jobs – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If None, no parallel computations are made.

  • verbose – Verbose mode.

Variables
  • labels_ (np.ndarray) – Label of each node (hard classification).

  • membership_ (sparse.csr_matrix) – Membership matrix (soft classification, labels on columns).

Example

>>> from sknetwork.data import karate_club
>>> dirichlet = DirichletClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> seeds = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = dirichlet.fit_transform(adjacency, seeds)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97

References

Zhu, X., Lafferty, J., & Rosenfeld, R. (2005). Semi-supervised learning with graphs (Doctoral dissertation, Carnegie Mellon University, language technologies institute, school of computer science).

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Union[numpy.ndarray, dict]) → sknetwork.classification.base_rank.RankClassifier

Fit algorithm to the data.

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Seed nodes (labels as dictionary or array; negative values ignored).

Returns

self

Return type

RankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

class sknetwork.classification.BiDirichletClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, n_jobs: Optional[int] = None, verbose: bool = False)[source]

Node classification using multiple diffusions.

  • Bigraphs

Parameters
  • n_iter (int) – If positive, the solution to the Dirichlet problem is approximated by power iteration for n_iter steps. Otherwise, the solution is computed through BiConjugate Stabilized Gradient descent.

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • n_jobs – If positive, number of parallel jobs allowed (-1 means maximum number). If None, no parallel computations are made.

  • verbose – Verbose mode.

Variables
  • labels_ (np.ndarray) – Label of each row.

  • labels_row_ (np.ndarray) – Label of each row (copy of labels_).

  • labels_col_ (np.ndarray) – Label of each column.

  • membership_ (sparse.csr_matrix) – Membership matrix of rows (soft classification, labels on columns).

  • membership_row_ (sparse.csr_matrix) – Membership matrix of rows (copy of membership_).

  • membership_col_ (sparse.csr_matrix) – Membership matrix of columns.

Example

>>> from sknetwork.data import star_wars
>>> bidirichlet = BiDirichletClassifier()
>>> biadjacency = star_wars()
>>> seeds = {0: 1, 2: 0}
>>> bidirichlet.fit_transform(biadjacency, seeds)
array([1, 1, 0, 0])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Union[numpy.ndarray, dict], seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.classification.base_rank.RankClassifier

Compute labels.

Parameters
  • biadjacency – Biadjacency matrix of the graph.

  • seeds_row – Seed rows (labels as dictionary or array; negative values ignored).

  • seeds_col – Seed columns (optional). Same format.

Returns

self

Return type

BiPageRankClassifier

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

Propagation

class sknetwork.classification.Propagation(n_iter: int = - 1, node_order: str = None, weighted: bool = True)[source]

Node classification by label propagation.

  • Graphs

  • Digraphs

Parameters
  • n_iter (int) – Maximum number of iterations (-1 for infinity).

  • node_order (str) –

    • ‘random’: node labels are updated in random order.

    • ’increasing’: node labels are updated by increasing order of (in-)weight.

    • ’decreasing’: node labels are updated by decreasing order of (in-)weight.

    • Otherwise, node labels are updated by index order.

  • weighted (bool) – If True, the vote of each neighbor is proportional to the edge weight. Otherwise, all votes have weight 1.

Variables
  • labels_ (np.ndarray) – Label of each node.

  • membership_ (sparse.csr_matrix) – Membership matrix (columns = labels).

Example

>>> from sknetwork.classification import Propagation
>>> from sknetwork.data import karate_club
>>> propagation = Propagation()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> seeds = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = propagation.fit_transform(adjacency, seeds)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.94

References

Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 76(3), 036106.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Union[numpy.ndarray, dict] = None) → sknetwork.classification.propagation.Propagation[source]

Node classification by label propagation.

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Seed nodes. Can be a dict {node: label} or an array where “-1” means no label.

Returns

self

Return type

Propagation

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

class sknetwork.classification.BiPropagation(n_iter: int = - 1)[source]

Node classification by label propagation in bipartite graphs.

  • Bigraphs

Parameters

n_iter – Maximum number of iteration (-1 for infinity).

Variables
  • labels_ (np.ndarray) – Label of each row.

  • labels_row_ (np.ndarray) – Label of each row (copy of labels_).

  • labels_col_ (np.ndarray) – Label of each column.

  • membership_ (sparse.csr_matrix) – Membership matrix of rows.

  • membership_row_ (sparse.csr_matrix) – Membership matrix of rows (copy of membership_).

  • membership_col_ (sparse.csr_matrix) – Membership matrix of columns.

Example

>>> from sknetwork.classification import BiPropagation
>>> from sknetwork.data import movie_actor
>>> bipropagation = BiPropagation()
>>> graph = movie_actor(metadata=True)
>>> biadjacency = graph.biadjacency
>>> seeds_row = {0: 0, 1: 2, 2: 1}
>>> len(bipropagation.fit_transform(biadjacency, seeds_row))
15
>>> len(bipropagation.labels_col_)
16
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Union[numpy.ndarray, dict], seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.classification.propagation.BiPropagation[source]

Node classification by k-nearest neighbors in the embedding space.

Parameters
  • biadjacency – Biadjacency matrix of the graph.

  • seeds_row – Seed rows. Can be a dict {node: label} or an array where “-1” means no label.

  • seeds_col – Seed columns (optional). Same format.

Returns

self

Return type

BiPropagation

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

Nearest neighbors

class sknetwork.classification.KNN(embedding_method: sknetwork.embedding.base.BaseEmbedding = GSVD(n_components=10, regularization=None, relative_regularization=True, factor_row=0.5, factor_col=0.5, factor_singular=0.0, normalized=True, solver='auto'), n_neighbors: int = 5, factor_distance: float = 2, leaf_size: int = 16, p: float = 2, tol_nn: float = 0.01, n_jobs: Optional[int] = None)[source]

Node classification by K-nearest neighbors in the embedding space.

  • Graphs

  • Digraphs

  • Bigraphs

For bigraphs, classify rows only (see BiKNN for joint classification of rows and columns).

Parameters
  • embedding_method – Which algorithm to use to project the nodes in vector space. Default is GSVD.

  • n_neighbors – Number of nearest neighbors to consider.

  • factor_distance – Power weighting factor \(\alpha\) applied to the distance to each neighbor. Neighbor at distance :math:d has weight \(1 / d^\alpha\). Default is 2.

  • leaf_size – Leaf size passed to KDTree.

  • p – Which Minkowski p-norm to use. Default is 2 (Euclidean distance).

  • tol_nn – Tolerance in nearest neighbors search; the k-th returned value is guaranteed to be no further than 1 + tol_nn times the distance to the actual k-th nearest neighbor.

  • n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.

Variables
  • labels_ (np.ndarray) – Label of each node.

  • membership_ (sparse.csr_matrix) – Membership matrix (columns = labels).

Example

>>> from sknetwork.classification import KNN
>>> from sknetwork.embedding import GSVD
>>> from sknetwork.data import karate_club
>>> knn = KNN(GSVD(3), n_neighbors=1)
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> seeds = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = knn.fit_transform(adjacency, seeds)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97
fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Union[numpy.ndarray, dict]) → sknetwork.classification.knn.KNN[source]

Node classification by k-nearest neighbors in the embedding space.

Parameters
  • adjacency – Adjacency or biadjacency matrix of the graph.

  • seeds – Seed nodes. Can be a dict {node: label} or an array where “-1” means no label.

Returns

self

Return type

KNN

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

class sknetwork.classification.BiKNN(embedding_method: sknetwork.embedding.base.BaseEmbedding = GSVD(n_components=10, regularization=None, relative_regularization=True, factor_row=0.5, factor_col=0.5, factor_singular=0.0, normalized=True, solver='auto'), n_neighbors: int = 5, factor_distance: float = 2, leaf_size: int = 16, p: float = 2, tol_nn: float = 0.01, n_jobs: int = 1)[source]

Node classification by K-nearest neighbors in the embedding space.

  • Bigraphs

Parameters
  • embedding_method – Which algorithm to use to project the nodes in vector space. Default is GSVD.

  • n_neighbors – Number of nearest neighbors to consider.

  • factor_distance – Power weighting factor :math:\alpha applied to the distance to each neighbor. Neighbor at distance :math:d has weight :math:1 / d^\alpha. Default is 2.

  • leaf_size – Leaf size passed to KDTree.

  • p – Which Minkowski p-norm to use. Default is 2 (Euclidean distance).

  • tol_nn – Tolerance in nearest neighbors search; the k-th returned value is guaranteed to be no further than 1 + tol_nn times the distance to the actual k-th nearest neighbor.

  • n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.

Variables
  • labels_ (np.ndarray) – Label of each row.

  • labels_row_ (np.ndarray) – Label of each row (copy of labels_).

  • labels_col_ (np.ndarray) – Label of each column.

  • membership_ (sparse.csr_matrix) – Membership matrix of rows.

  • membership_row_ (sparse.csr_matrix) – Membership matrix of rows (copy of membership_).

  • membership_col_ (sparse.csr_matrix) – Membership matrix of columns.

Example

>>> from sknetwork.classification import BiKNN
>>> from sknetwork.data import movie_actor
>>> biknn = BiKNN(n_neighbors=2)
>>> graph = movie_actor(metadata=True)
>>> biadjacency = graph.biadjacency
>>> seeds_row = {0: 0, 1: 2, 2: 1}
>>> len(biknn.fit_transform(biadjacency, seeds_row))
15
>>> len(biknn.labels_col_)
16
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Union[numpy.ndarray, dict], seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.classification.knn.BiKNN[source]

Node classification by k-nearest neighbors in the embedding space.

Parameters
  • biadjacency – Biadjacency matrix of the graph.

  • seeds_row – Seed rows. Can be a dict {node: label} or an array where “-1” means no label.

  • seeds_col – Seed columns (optional). Same format.

Returns

self

Return type

BiKNN

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to the data and return the labels. Same parameters as the fit method.

Returns

labels – Labels.

Return type

np.ndarray

score(label: int)

Classification scores for a given label.

Parameters

label (int) – The label index of the class.

Returns

scores – Classification scores of shape (number of nodes,).

Return type

np.ndarray

Metrics

sknetwork.classification.accuracy_score(y_true: numpy.ndarray, y_pred: numpy.ndarray) → float[source]
Accuracy: number of correctly labeled samples over total number of elements.

In the case of binary classification, this is

\(P = \dfrac{TP + TN}{TP + TN + FP + FN}\).

Parameters
  • y_true (np.ndarray) – True labels.

  • y_pred (np.ndarray) – Predicted labels

Returns

precision – A score between 0 and 1.

Return type

float

Examples

>>> import numpy as np
>>> y_true = np.array([0, 0, 1, 1])
>>> y_pred = np.array([0, 0, 0, 1])
>>> accuracy_score(y_true, y_pred)
0.75