Classification
Node classification algorithms.
The attribute labels_
assigns a label to each node of the graph.
PageRank
- class sknetwork.classification.PageRankClassifier(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0.0, n_jobs: Optional[int] = None, verbose: bool = False)[source]
Node classification by multiple personalized PageRanks.
- Parameters
damping_factor – Probability to continue the random walk.
solver (
str
) – Which solver to use: ‘piteration’, ‘diteration’, ‘bicgstab’, ‘lanczos’.n_iter (int) – Number of iterations for some of the solvers such as
'piteration'
or'diteration'
.tol (float) – Tolerance for the convergence of some solvers such as
'bicgstab'
or'lanczos'
.
- Variables
labels_ (np.ndarray, shape (n_labels,)) – Label of each node.
membership_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix.
labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.
labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.
membership_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix of rows, for bipartite graphs.
membership_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Membership matrix of columns, for bipartite graphs.
Example
>>> from sknetwork.classification import PageRankClassifier >>> from sknetwork.data import karate_club >>> pagerank = PageRankClassifier() >>> graph = karate_club(metadata=True) >>> adjacency = graph.adjacency >>> labels_true = graph.labels >>> seeds = {0: labels_true[0], 33: labels_true[33]} >>> labels_pred = pagerank.fit_transform(adjacency, seeds) >>> np.round(np.mean(labels_pred == labels_true), 2) 0.97
References
Lin, F., & Cohen, W. W. (2010). Semi-supervised classification of network data using very few labels. In IEEE International Conference on Advances in Social Networks Analysis and Mining.
- fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.base_rank.RankClassifier
Fit algorithm to data.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
seeds – Seed nodes (labels as dictionary or array; negative values ignored).
seeds_row – Seed rows and columns (for bipartite graphs).
seeds_col – Seed rows and columns (for bipartite graphs).
- Returns
self
- Return type
RankClassifier
- fit_transform(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the
fit
method.- Returns
labels – Labels.
- Return type
np.ndarray
- score(label: int)
Classification scores for a given label.
- Parameters
label (int) – The label index of the class.
- Returns
scores – Classification scores of shape (number of nodes,).
- Return type
np.ndarray
Diffusion
- class sknetwork.classification.DiffusionClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, centering: bool = True, n_jobs: Optional[int] = None)[source]
Node classification using multiple diffusions.
- Parameters
n_iter (int) – Number of steps of the diffusion in discrete time (must be positive).
damping_factor (float (optional)) – Damping factor (default value = 1).
centering (bool) – Whether to center the temperatures with respect to the mean after diffusion (default = True).
n_jobs (int) – If positive, number of parallel jobs allowed (-1 means maximum number). If
None
, no parallel computations are made.
- Variables
labels_ (np.ndarray, shape (n_labels,)) – Label of each node.
membership_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix.
labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.
labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.
membership_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix of rows, for bipartite graphs.
membership_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Membership matrix of columns, for bipartite graphs.
Example
>>> from sknetwork.data import karate_club >>> diffusion = DiffusionClassifier() >>> graph = karate_club(metadata=True) >>> adjacency = graph.adjacency >>> labels_true = graph.labels >>> seeds = {0: labels_true[0], 33: labels_true[33]} >>> labels_pred = diffusion.fit_transform(adjacency, seeds) >>> np.round(np.mean(labels_pred == labels_true), 2) 0.94
References
de Lara, N., & Bonald, T. (2020). A Consistent Diffusion-Based Algorithm for Semi-Supervised Classification on Graphs. <https://arxiv.org/pdf/2008.11944.pdf> arXiv preprint arXiv:2008.11944.
Zhu, X., Lafferty, J., & Rosenfeld, R. (2005). Semi-supervised learning with graphs <http://pages.cs.wisc.edu/~jerryzhu/machineteaching/pub/thesis.pdf> (Doctoral dissertation, Carnegie Mellon University, language technologies institute, school of computer science).
- fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.base_rank.RankClassifier
Fit algorithm to data.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
seeds – Seed nodes (labels as dictionary or array; negative values ignored).
seeds_row – Seed rows and columns (for bipartite graphs).
seeds_col – Seed rows and columns (for bipartite graphs).
- Returns
self
- Return type
RankClassifier
- fit_transform(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the
fit
method.- Returns
labels – Labels.
- Return type
np.ndarray
- score(label: int)
Classification scores for a given label.
- Parameters
label (int) – The label index of the class.
- Returns
scores – Classification scores of shape (number of nodes,).
- Return type
np.ndarray
Dirichlet
- class sknetwork.classification.DirichletClassifier(n_iter: int = 10, damping_factor: Optional[float] = None, centering: bool = True, n_jobs: Optional[int] = None, verbose: bool = False)[source]
Node classification using multiple Dirichlet problems.
- Parameters
n_iter (int) – If positive, the solution to the Dirichlet problem is approximated by power iteration for n_iter steps. Otherwise, the solution is computed through BiConjugate Stabilized Gradient descent.
damping_factor (float (optional)) – Damping factor (default value = 1).
centering (bool) – Whether to center the temperatures with respect to the mean after diffusion (default = True).
n_jobs (int) – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If
None
, no parallel computations are made.verbose – Verbose mode.
- Variables
labels_ (np.ndarray, shape (n_labels,)) – Label of each node.
membership_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix.
labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.
labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.
membership_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix of rows, for bipartite graphs.
membership_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Membership matrix of columns, for bipartite graphs.
Example
>>> from sknetwork.data import karate_club >>> dirichlet = DirichletClassifier() >>> graph = karate_club(metadata=True) >>> adjacency = graph.adjacency >>> labels_true = graph.labels >>> seeds = {0: labels_true[0], 33: labels_true[33]} >>> labels_pred = dirichlet.fit_transform(adjacency, seeds) >>> np.round(np.mean(labels_pred == labels_true), 2) 0.97
References
Zhu, X., Lafferty, J., & Rosenfeld, R. (2005). Semi-supervised learning with graphs (Doctoral dissertation, Carnegie Mellon University, language technologies institute, school of computer science).
- fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.base_rank.RankClassifier
Fit algorithm to data.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
seeds – Seed nodes (labels as dictionary or array; negative values ignored).
seeds_row – Seed rows and columns (for bipartite graphs).
seeds_col – Seed rows and columns (for bipartite graphs).
- Returns
self
- Return type
RankClassifier
- fit_transform(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the
fit
method.- Returns
labels – Labels.
- Return type
np.ndarray
- score(label: int)
Classification scores for a given label.
- Parameters
label (int) – The label index of the class.
- Returns
scores – Classification scores of shape (number of nodes,).
- Return type
np.ndarray
Propagation
- class sknetwork.classification.Propagation(n_iter: float = - 1, node_order: Optional[str] = None, weighted: bool = True)[source]
Node classification by label propagation.
- Parameters
n_iter (float) – Maximum number of iterations (-1 for infinity).
node_order (str) –
‘random’: node labels are updated in random order.
’increasing’: node labels are updated by increasing order of (in-)weight.
’decreasing’: node labels are updated by decreasing order of (in-)weight.
Otherwise, node labels are updated by index order.
weighted (bool) – If
True
, the vote of each neighbor is proportional to the edge weight. Otherwise, all votes have weight 1.
- Variables
labels_ (np.ndarray, shape (n_labels,)) – Label of each node.
membership_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix.
labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.
labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.
membership_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix of rows, for bipartite graphs.
membership_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Membership matrix of columns, for bipartite graphs.
Example
>>> from sknetwork.classification import Propagation >>> from sknetwork.data import karate_club >>> propagation = Propagation() >>> graph = karate_club(metadata=True) >>> adjacency = graph.adjacency >>> labels_true = graph.labels >>> seeds = {0: labels_true[0], 33: labels_true[33]} >>> labels_pred = propagation.fit_transform(adjacency, seeds) >>> np.round(np.mean(labels_pred == labels_true), 2) 0.94
References
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 76(3), 036106.
- fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.propagation.Propagation [source]
Node classification by label propagation.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
seeds – Seed nodes. Can be a dict {node: label} or an array where “-1” means no label.
seeds_row – Seeds of rows and columns (for bipartite graphs).
seeds_col – Seeds of rows and columns (for bipartite graphs).
- Returns
self
- Return type
- fit_transform(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the
fit
method.- Returns
labels – Labels.
- Return type
np.ndarray
- score(label: int)
Classification scores for a given label.
- Parameters
label (int) – The label index of the class.
- Returns
scores – Classification scores of shape (number of nodes,).
- Return type
np.ndarray
Nearest neighbors
- class sknetwork.classification.KNN(embedding_method: sknetwork.embedding.base.BaseEmbedding = GSVD(n_components=10, regularization=None, factor_row=0.5, factor_col=0.5, factor_singular=0.0, normalized=True, solver='lanczos'), n_neighbors: int = 5, factor_distance: float = 2, leaf_size: int = 16, p: float = 2, tol_nn: float = 0.01, n_jobs: Optional[int] = None)[source]
Node classification by K-nearest neighbors in the embedding space.
For bigraphs, classify rows only (see
BiKNN
for joint classification of rows and columns).- Parameters
embedding_method – Which algorithm to use to project the nodes in vector space. Default is
GSVD
.n_neighbors – Number of nearest neighbors to consider.
factor_distance – Power weighting factor \(\alpha\) applied to the distance to each neighbor. Neighbor at distance :math:
d
has weight \(1 / d^\alpha\). Default is 2.leaf_size – Leaf size passed to KDTree.
p – Which Minkowski p-norm to use. Default is 2 (Euclidean distance).
tol_nn – Tolerance in nearest neighbors search; the k-th returned value is guaranteed to be no further than
1 + tol_nn
times the distance to the actual k-th nearest neighbor.n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.
- Variables
labels_ (np.ndarray, shape (n_labels,)) – Label of each node.
membership_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix.
labels_row_ (np.ndarray) – Labels of rows, for bipartite graphs.
labels_col_ (np.ndarray) – Labels of columns, for bipartite graphs.
membership_row_ (sparse.csr_matrix, shape (n_row, n_labels)) – Membership matrix of rows, for bipartite graphs.
membership_col_ (sparse.csr_matrix, shape (n_col, n_labels)) – Membership matrix of columns, for bipartite graphs.
Example
>>> from sknetwork.classification import KNN >>> from sknetwork.embedding import GSVD >>> from sknetwork.data import karate_club >>> knn = KNN(GSVD(3), n_neighbors=1) >>> graph = karate_club(metadata=True) >>> adjacency = graph.adjacency >>> labels_true = graph.labels >>> seeds = {0: labels_true[0], 33: labels_true[33]} >>> labels_pred = knn.fit_transform(adjacency, seeds) >>> np.round(np.mean(labels_pred == labels_true), 2) 0.97
- fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.knn.KNN [source]
Node classification by k-nearest neighbors in the embedding space.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
seeds – Seed nodes. Can be a dict {node: label} or an array where “-1” means no label.
seeds_row – Seeds of rows and columns (for bipartite graphs).
seeds_col – Seeds of rows and columns (for bipartite graphs).
- Returns
self
- Return type
- fit_transform(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the
fit
method.- Returns
labels – Labels.
- Return type
np.ndarray
- score(label: int)
Classification scores for a given label.
- Parameters
label (int) – The label index of the class.
- Returns
scores – Classification scores of shape (number of nodes,).
- Return type
np.ndarray
Metrics
- sknetwork.classification.accuracy_score(y_true: numpy.ndarray, y_pred: numpy.ndarray) float [source]
- Accuracy: number of correctly labeled samples over total number of elements.
In the case of binary classification, this is
\(P = \dfrac{TP + TN}{TP + TN + FP + FN}\).
- Parameters
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels
- Returns
precision – A score between 0 and 1.
- Return type
float
Examples
>>> import numpy as np >>> y_true = np.array([0, 0, 1, 1]) >>> y_pred = np.array([0, 0, 0, 1]) >>> accuracy_score(y_true, y_pred) 0.75