Classification
Node classification algorithms.
The attribute labels_ gives the label of each node of the graph.
Diffusion
- class sknetwork.classification.DiffusionClassifier(n_iter: int = 10, centering: bool = True, scale: float = 5)
Node classification by heat diffusion.
For each label, the temperature of a node corresponds to its probability of having this label.
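A minimal sketch of the idea in plain Python, under simplifying assumptions (unweighted graph given as neighbor lists, synchronous updates, no centering or softmax). For each label, seeds with that label are held at temperature 1 and the other seeds at 0, while every other node repeatedly takes the mean temperature of its neighbors. Each node then receives the label with the highest temperature. The function diffuse and its arguments are hypothetical; this is not the sknetwork implementation.

```python
def diffuse(neighbors, seeds, n_iter=10):
    """Classify nodes by heat diffusion (sketch, unweighted graph).

    neighbors: dict mapping each node to the list of its neighbors.
    seeds: dict mapping some nodes to their known label (fixed boundary).
    Returns (labels, temperatures), where temperatures[label][node] is in [0, 1].
    """
    nodes = sorted(neighbors)
    label_set = sorted(set(seeds.values()))
    temperatures = {}
    for label in label_set:
        # Boundary condition: temperature 1 on seeds with this label, 0 on other seeds.
        # Free nodes also start at 0.
        temp = {u: 1.0 if seeds.get(u) == label else 0.0 for u in nodes}
        for _ in range(n_iter):
            # One step of discrete-time diffusion: every free node takes the
            # mean temperature of its neighbors; seeds keep their temperature.
            new_temp = dict(temp)
            for u in nodes:
                if u not in seeds and neighbors[u]:
                    new_temp[u] = sum(temp[v] for v in neighbors[u]) / len(neighbors[u])
            temp = new_temp
        temperatures[label] = temp
    # Each node takes the label with the highest temperature.
    labels = {u: max(label_set, key=lambda label: temperatures[label][u]) for u in nodes}
    return labels, temperatures


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels, temperatures = diffuse(path, {0: 0, 3: 1})
print(labels)  # {0: 0, 1: 0, 2: 1, 3: 1}
```

On the path 0-1-2-3 with seeds at both ends, the temperatures of label 0 approach 1, 2/3, 1/3, 0 as n_iter grows (the equilibrium of the Dirichlet problem), so each inner node takes the label of the nearer seed.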
- Parameters
n_iter (int) – Number of iterations of the diffusion (discrete time).
centering (bool) – If True, center the temperature of each label to its mean before classification (default).
scale (float) – Multiplicative factor applied to temperatures before softmax (default = 5). Used only when centering is True.
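The centering and scale options can be illustrated with the following sketch. It takes per-label temperatures (for instance the output of a diffusion), subtracts from each label its mean temperature over all nodes, multiplies by scale and applies a softmax per node. The function centered_softmax is hypothetical and only mirrors the parameter descriptions; the library may differ in details such as the treatment of seed nodes.

```python
import math


def centered_softmax(temperatures, scale=5.0):
    """Turn per-label temperatures into per-node probabilities (sketch).

    temperatures: dict mapping each label to a list of node temperatures,
    all lists using the same node order.
    Returns a list with one dict {label: probability} per node.
    """
    labels = sorted(temperatures)
    n_nodes = len(temperatures[labels[0]])
    # Centering: subtract from each label its mean temperature over all nodes.
    centered = {}
    for label in labels:
        mean = sum(temperatures[label]) / n_nodes
        centered[label] = [t - mean for t in temperatures[label]]
    probs = []
    for i in range(n_nodes):
        # Softmax of the scaled, centered temperatures of node i.
        scores = [scale * centered[label][i] for label in labels]
        top = max(scores)  # subtracting the maximum avoids overflow in exp
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs.append({label: e / total for label, e in zip(labels, exps)})
    return probs


temperatures = {0: [1.0, 2 / 3, 1 / 3, 0.0], 1: [0.0, 1 / 3, 2 / 3, 1.0]}
probs = centered_softmax(temperatures)
print(round(probs[0][0], 3))  # 0.993
```

A larger scale gives sharper distributions: with scale = 5, a gap of 1 between the centered temperatures of two labels yields probabilities of about 0.993 and 0.007.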
- Variables
labels_ (np.ndarray, shape (n_row,)) – Label of each node.
probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.
labels_row_, labels_col_ – Labels of rows and columns, for bipartite graphs.
probs_row_, probs_col_ – Probability distributions over labels for rows and columns (for bipartite graphs).
Example
>>> import numpy as np
>>> from sknetwork.classification import DiffusionClassifier
>>> from sknetwork.data import karate_club
>>> diffusion = DiffusionClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = diffusion.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97
References
Zhu, X., Lafferty, J., & Rosenfeld, R. (2005). Semi-supervised learning with graphs (Doctoral dissertation, Carnegie Mellon University, language technologies institute, school of computer science).
- fit(input_matrix: Union[scipy.sparse._csr.csr_matrix, numpy.ndarray], labels: Optional[Union[numpy.ndarray, dict]] = None, labels_row: Optional[Union[numpy.ndarray, dict]] = None, labels_col: Optional[Union[numpy.ndarray, dict]] = None, force_bipartite: bool = False) sknetwork.classification.diffusion.DiffusionClassifier
Compute the solution to the Dirichlet problem (temperatures at equilibrium).
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
labels – Known labels (dictionary or vector of int). Negative values ignored.
labels_row – Known labels of rows, for bipartite graphs. Negative values ignored.
labels_col – Known labels of columns, for bipartite graphs. Negative values ignored.
force_bipartite – If True, consider the input matrix as a biadjacency matrix (default = False).
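The labels argument of fit accepts a dictionary or a vector. Reading "Negative values ignored" literally, a dictionary {node: label} should carry the same information as a vector with one entry per node and a negative value (here -1) for every node whose label is unknown. The hypothetical helper below builds that vector; it illustrates the convention and is not a function of sknetwork.

```python
def labels_to_vector(labels, n_nodes):
    """Dense label vector from a {node: label} dict, -1 for unknown labels (hypothetical helper)."""
    vector = [-1] * n_nodes  # negative value = unknown label, ignored by the classifier
    for node, label in labels.items():
        vector[node] = label
    return vector


print(labels_to_vector({0: 1, 3: 0}, 5))  # [1, -1, -1, 0, -1]
```

Under this reading, the karate club example labels = {0: labels_true[0], 33: labels_true[33]} corresponds to a vector of length 34 in which only entries 0 and 33 are non-negative.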
- Returns
self
- Return type
DiffusionClassifier
- fit_predict(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the fit method.
- Returns
labels – Labels.
- Return type
np.ndarray
- fit_predict_proba(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the probability distribution over labels. Same parameters as the fit method.
- Returns
probs – Probability of each label.
- Return type
np.ndarray
- fit_transform(*args, **kwargs) scipy.sparse._csr.csr_matrix
Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the fit method.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
- get_params()
Get parameters as dictionary.
- Returns
params – Parameters of the algorithm.
- Return type
dict
- predict(columns=False) numpy.ndarray
Return the labels predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
labels – Labels.
- Return type
np.ndarray
- predict_proba(columns=False) numpy.ndarray
Return the probability distribution over labels as predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
np.ndarray
- set_params(params: dict) sknetwork.base.Algorithm
Set parameters of the algorithm.
- Parameters
params (dict) – Parameters of the algorithm.
- Returns
self
- Return type
Algorithm
- transform(columns=False) scipy.sparse._csr.csr_matrix
Return the probability distribution over labels in sparse format.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
Nearest neighbors
- class sknetwork.classification.NNClassifier(n_neighbors: int = 3, embedding_method: Optional[sknetwork.embedding.base.BaseEmbedding] = None, normalize: bool = True)
Node classification by K-nearest neighbors in the embedding space.
- Parameters
n_neighbors – Number of nearest neighbors.
embedding_method – Embedding method used to represent nodes in vector space. If None (default), use identity.
normalize – If True, apply normalization so that all vectors have norm 1 in the embedding space.
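A plain-Python sketch of k-nearest-neighbor classification in an embedding space. It assumes the node vectors are already computed (the embedding_method step is left out), uses Euclidean distance, and takes an unweighted majority vote among the n_neighbors closest labeled nodes. The name knn_classify and its arguments are hypothetical; the library's distance, vote weighting and tie-breaking may differ.

```python
import math
from collections import Counter


def knn_classify(vectors, seeds, n_neighbors=3, normalize=True):
    """k-nearest-neighbor node classification in an embedding space (sketch).

    vectors: dict mapping each node to its embedding (list of floats).
    seeds: dict mapping some nodes to their known label.
    Returns a dict node -> label; seeds keep their known label.
    """
    def unit(x):
        norm = math.sqrt(sum(v * v for v in x))
        return [v / norm for v in x] if norm > 0 else list(x)

    # Optionally rescale every vector to norm 1.
    points = {u: unit(x) if normalize else list(x) for u, x in vectors.items()}
    labels = dict(seeds)
    for u, x in points.items():
        if u in seeds:
            continue
        # Labeled nodes sorted by Euclidean distance to u; keep the closest ones.
        nearest = sorted(seeds, key=lambda s: math.dist(x, points[s]))[:n_neighbors]
        # Majority vote; on a tie, Counter keeps the label seen first (the closer one).
        votes = Counter(seeds[s] for s in nearest)
        labels[u] = votes.most_common(1)[0][0]
    return labels


vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.0, 1.0], 4: [2.0, 0.1]}
result = knn_classify(vectors, {0: 'a', 3: 'b'}, n_neighbors=1)
print(sorted(result.items()))  # [(0, 'a'), (1, 'a'), (2, 'b'), (3, 'b'), (4, 'a')]
```

Normalizing first means that only the direction of a vector matters: node 4 = (2, 0.1) is classified like (1, 0) even though its norm is larger.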
- Variables
labels_ (np.ndarray, shape (n_row,)) – Label of each node.
probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.
labels_row_, labels_col_ – Labels of rows and columns, for bipartite graphs.
probs_row_, probs_col_ – Probability distributions over labels for rows and columns (for bipartite graphs).
Example
>>> import numpy as np
>>> from sknetwork.classification import NNClassifier
>>> from sknetwork.data import karate_club
>>> classifier = NNClassifier(n_neighbors=1)
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = classifier.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.82
- fit(input_matrix: Union[scipy.sparse._csr.csr_matrix, numpy.ndarray], labels: Optional[Union[numpy.ndarray, dict]] = None, labels_row: Optional[Union[numpy.ndarray, dict]] = None, labels_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.knn.NNClassifier
Node classification by k-nearest neighbors in the embedding space.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
labels – Known labels (dictionary or array). Negative values ignored.
labels_row – Known labels of rows (for bipartite graphs).
labels_col – Known labels of columns (for bipartite graphs).
- Returns
self
- Return type
NNClassifier
- fit_predict(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the fit method.
- Returns
labels – Labels.
- Return type
np.ndarray
- fit_predict_proba(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the probability distribution over labels. Same parameters as the fit method.
- Returns
probs – Probability of each label.
- Return type
np.ndarray
- fit_transform(*args, **kwargs) scipy.sparse._csr.csr_matrix
Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the fit method.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
- get_params()
Get parameters as dictionary.
- Returns
params – Parameters of the algorithm.
- Return type
dict
- predict(columns=False) numpy.ndarray
Return the labels predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
labels – Labels.
- Return type
np.ndarray
- predict_proba(columns=False) numpy.ndarray
Return the probability distribution over labels as predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
np.ndarray
- set_params(params: dict) sknetwork.base.Algorithm
Set parameters of the algorithm.
- Parameters
params (dict) – Parameters of the algorithm.
- Returns
self
- Return type
Algorithm
- transform(columns=False) scipy.sparse._csr.csr_matrix
Return the probability distribution over labels in sparse format.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
Propagation
- class sknetwork.classification.Propagation(n_iter: float = -1, node_order: Optional[str] = None, weighted: bool = True)
Node classification by label propagation.
- Parameters
n_iter (float) – Maximum number of iterations (-1 for infinity).
node_order (str) –
'random': node labels are updated in random order.
'increasing': node labels are updated in increasing order of (in-)weight.
'decreasing': node labels are updated in decreasing order of (in-)weight.
Otherwise, node labels are updated in index order.
weighted (bool) – If True, the vote of each neighbor is proportional to the edge weight. Otherwise, all votes have weight 1.
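The update rule described by these parameters can be sketched as follows. The sketch assumes an undirected graph stored as nested dictionaries of edge weights, keeps seed labels fixed, and takes a node's weight to be the sum of its edge weights (for directed graphs the in-weight would be used instead). Each sweep visits the unlabeled nodes in the chosen order and gives each one the label with the largest vote among its labeled neighbors; labels are updated in place during a sweep, which is why the visiting order can matter. The loop stops after a sweep with no change or after n_iter sweeps. The function propagate and its random_state argument are hypothetical.

```python
import random
from collections import defaultdict


def propagate(weights, seeds, n_iter=-1, node_order=None, weighted=True, random_state=0):
    """Label propagation (sketch).

    weights: dict mapping each node to a dict {neighbor: edge weight} (undirected graph).
    seeds: dict mapping some nodes to their known label; seeds are never updated.
    n_iter: maximum number of sweeps, -1 for no limit.
    Returns a dict node -> label, with None for nodes no label has reached.
    """
    labels = {u: seeds.get(u) for u in weights}
    free = [u for u in weights if u not in seeds]
    if node_order == 'random':
        random.Random(random_state).shuffle(free)
    elif node_order in ('increasing', 'decreasing'):
        # Node weight = sum of the weights of its edges.
        free.sort(key=lambda u: sum(weights[u].values()), reverse=(node_order == 'decreasing'))
    # Otherwise keep the insertion order of `weights` (index order for nodes 0, 1, ...).
    sweep = 0
    while n_iter < 0 or sweep < n_iter:
        changed = False
        for u in free:
            # Each labeled neighbor votes for its label, with the edge weight or with weight 1.
            votes = defaultdict(float)
            for v, w in weights[u].items():
                if labels[v] is not None:
                    votes[labels[v]] += w if weighted else 1.0
            if votes:
                best = max(votes, key=votes.get)
                if best != labels[u]:
                    labels[u] = best
                    changed = True
        sweep += 1
        if not changed:  # stable labeling reached
            break
    return labels


# Two triangles joined by a weak edge (2, 3).
graph = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 0.1},
         3: {2: 0.1, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0}}
print(propagate(graph, {0: 'a', 5: 'b'}))  # {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
```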
- Variables
labels_ (np.ndarray, shape (n_row,)) – Label of each node.
probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.
labels_row_, labels_col_ – Labels of rows and columns, for bipartite graphs.
probs_row_, probs_col_ – Probability distributions over labels for rows and columns (for bipartite graphs).
Example
>>> import numpy as np
>>> from sknetwork.classification import Propagation
>>> from sknetwork.data import karate_club
>>> propagation = Propagation()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = propagation.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.94
References
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.
- fit(input_matrix: Union[scipy.sparse._csr.csr_matrix, numpy.ndarray], labels: Optional[Union[numpy.ndarray, dict]] = None, labels_row: Optional[Union[numpy.ndarray, dict]] = None, labels_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.propagation.Propagation
Node classification by label propagation.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
labels – Known labels (dictionary or array). Negative values ignored.
labels_row – Known labels of rows (for bipartite graphs).
labels_col – Known labels of columns (for bipartite graphs).
- Returns
self
- Return type
Propagation
- fit_predict(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the fit method.
- Returns
labels – Labels.
- Return type
np.ndarray
- fit_predict_proba(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the probability distribution over labels. Same parameters as the fit method.
- Returns
probs – Probability of each label.
- Return type
np.ndarray
- fit_transform(*args, **kwargs) scipy.sparse._csr.csr_matrix
Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the fit method.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
- get_params()
Get parameters as dictionary.
- Returns
params – Parameters of the algorithm.
- Return type
dict
- predict(columns=False) numpy.ndarray
Return the labels predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
labels – Labels.
- Return type
np.ndarray
- predict_proba(columns=False) numpy.ndarray
Return the probability distribution over labels as predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
np.ndarray
- set_params(params: dict) sknetwork.base.Algorithm
Set parameters of the algorithm.
- Parameters
params (dict) – Parameters of the algorithm.
- Returns
self
- Return type
Algorithm
- transform(columns=False) scipy.sparse._csr.csr_matrix
Return the probability distribution over labels in sparse format.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
PageRank
- class sknetwork.classification.PageRankClassifier(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0.0, n_jobs: Optional[int] = None, verbose: bool = False)
Node classification by multiple personalized PageRanks.
- Parameters
damping_factor – Probability to continue the random walk.
solver (str) – Which solver to use: 'piteration', 'diteration', 'bicgstab', 'lanczos'.
n_iter (int) – Number of iterations for some solvers such as 'piteration' or 'diteration'.
tol (float) – Tolerance for the convergence of some solvers such as 'bicgstab' or 'lanczos'.
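One way to read "multiple personalized PageRanks" is sketched below: for each label, run a PageRank whose random walk restarts uniformly on the seeds of that label, then give each node the label with the highest score. The sketch uses plain power iteration on an undirected graph without isolated nodes; the names personalized_pagerank and pagerank_classify are hypothetical, and the library may normalize or combine the per-label scores differently.

```python
def personalized_pagerank(neighbors, sources, damping_factor=0.85, n_iter=10):
    """Power iteration for a PageRank that restarts uniformly on `sources` (sketch).

    neighbors: dict mapping each node to the list of its neighbors (no isolated nodes).
    """
    nodes = sorted(neighbors)
    restart = {u: (1.0 / len(sources) if u in sources else 0.0) for u in nodes}
    scores = dict(restart)
    for _ in range(n_iter):
        # With probability 1 - damping_factor the walk jumps back to the sources ...
        new_scores = {u: (1 - damping_factor) * restart[u] for u in nodes}
        # ... otherwise it moves to a uniformly chosen neighbor.
        for u in nodes:
            share = damping_factor * scores[u] / len(neighbors[u])
            for v in neighbors[u]:
                new_scores[v] += share
        scores = new_scores
    return scores


def pagerank_classify(neighbors, seeds, damping_factor=0.85, n_iter=10):
    """One personalized PageRank per label; each node takes the label with the top score."""
    label_set = sorted(set(seeds.values()))
    ranks = {
        label: personalized_pagerank(
            neighbors, {u for u, x in seeds.items() if x == label}, damping_factor, n_iter)
        for label in label_set
    }
    return {u: max(label_set, key=lambda label: ranks[label][u]) for u in sorted(neighbors)}


# Two triangles joined by the edge (2, 3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(pagerank_classify(graph, {0: 'a', 5: 'b'}, n_iter=50))
# {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
scores = personalized_pagerank(graph, {0}, n_iter=50)
print(round(sum(scores.values()), 6))  # 1.0
```

Each step keeps a fraction damping_factor of the mass on the walk and sends the rest back to the seeds, so the total score stays equal to 1. Here n_iter=50 is used so that the scores are close to convergence.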
- Variables
labels_ (np.ndarray, shape (n_row,)) – Label of each node.
probs_ (sparse.csr_matrix, shape (n_row, n_labels)) – Probability distribution over labels.
labels_row_, labels_col_ – Labels of rows and columns, for bipartite graphs.
probs_row_, probs_col_ – Probability distributions over labels for rows and columns (for bipartite graphs).
Example
>>> import numpy as np
>>> from sknetwork.classification import PageRankClassifier
>>> from sknetwork.data import karate_club
>>> pagerank = PageRankClassifier()
>>> graph = karate_club(metadata=True)
>>> adjacency = graph.adjacency
>>> labels_true = graph.labels
>>> labels = {0: labels_true[0], 33: labels_true[33]}
>>> labels_pred = pagerank.fit_predict(adjacency, labels)
>>> np.round(np.mean(labels_pred == labels_true), 2)
0.97
References
Lin, F., & Cohen, W. W. (2010). Semi-supervised classification of network data using very few labels. In IEEE International Conference on Advances in Social Networks Analysis and Mining.
- fit(input_matrix: Union[scipy.sparse._csr.csr_matrix, numpy.ndarray], labels: Optional[Union[numpy.ndarray, dict]] = None, labels_row: Optional[Union[numpy.ndarray, dict]] = None, labels_col: Optional[Union[numpy.ndarray, dict]] = None) sknetwork.classification.base_rank.RankClassifier
Fit algorithm to data.
- Parameters
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
labels – Known labels (dictionary or array; negative values ignored).
labels_row – Known labels of rows (for bipartite graphs).
labels_col – Known labels of columns (for bipartite graphs).
- Returns
self
- Return type
RankClassifier
- fit_predict(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the labels. Same parameters as the fit method.
- Returns
labels – Labels.
- Return type
np.ndarray
- fit_predict_proba(*args, **kwargs) numpy.ndarray
Fit algorithm to the data and return the probability distribution over labels. Same parameters as the fit method.
- Returns
probs – Probability of each label.
- Return type
np.ndarray
- fit_transform(*args, **kwargs) scipy.sparse._csr.csr_matrix
Fit algorithm to the data and return the probability distribution over labels in sparse format. Same parameters as the fit method.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
- get_params()
Get parameters as dictionary.
- Returns
params – Parameters of the algorithm.
- Return type
dict
- predict(columns=False) numpy.ndarray
Return the labels predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
labels – Labels.
- Return type
np.ndarray
- predict_proba(columns=False) numpy.ndarray
Return the probability distribution over labels as predicted by the algorithm.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
np.ndarray
- set_params(params: dict) sknetwork.base.Algorithm
Set parameters of the algorithm.
- Parameters
params (dict) – Parameters of the algorithm.
- Returns
self
- Return type
Algorithm
- transform(columns=False) scipy.sparse._csr.csr_matrix
Return the probability distribution over labels in sparse format.
- Parameters
columns (bool) – If True, return the prediction for columns.
- Returns
probs – Probability distribution over labels.
- Return type
sparse.csr_matrix
Metrics
- sknetwork.classification.get_accuracy_score(labels_true: numpy.ndarray, labels_pred: numpy.ndarray) float
Return the proportion of correctly labeled samples. Negative labels ignored.
- Parameters
labels_true (np.ndarray) – True labels.
labels_pred (np.ndarray) – Predicted labels.
- Returns
accuracy – A score between 0 and 1.
- Return type
float
Examples
>>> import numpy as np
>>> from sknetwork.classification import get_accuracy_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> get_accuracy_score(labels_true, labels_pred)
0.75
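A pure-Python sketch of the same computation, assuming that a sample is skipped when either its true or its predicted label is negative (the text only says that negative labels are ignored). The name accuracy is hypothetical.

```python
def accuracy(labels_true, labels_pred):
    """Proportion of correctly labeled samples, skipping negative labels (sketch)."""
    # Keep only the samples whose true and predicted labels are both known.
    pairs = [(t, p) for t, p in zip(labels_true, labels_pred) if t >= 0 and p >= 0]
    if not pairs:
        raise ValueError("no sample with non-negative labels")
    return sum(t == p for t, p in pairs) / len(pairs)


print(accuracy([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.75
```

Under that assumption, accuracy([0, -1, 1, 1], [0, 0, -1, 1]) only counts the first and last samples and returns 1.0.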
- sknetwork.classification.get_f1_score(labels_true: numpy.ndarray, labels_pred: numpy.ndarray, return_precision_recall: bool = False) Union[float, Tuple[float, float, float]]
Return the F1 score of binary classification. Negative labels ignored.
- Parameters
labels_true (np.ndarray) – True labels.
labels_pred (np.ndarray) – Predicted labels.
return_precision_recall (bool) – If True, also return precision and recall.
- Returns
score, [precision, recall] – F1 score (between 0 and 1). Optionally, also return precision and recall.
- Return type
float or Tuple[float, float, float]
Examples
>>> import numpy as np
>>> from sknetwork.classification import get_f1_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_f1_score(labels_true, labels_pred), 2)
0.67
- sknetwork.classification.get_f1_scores(labels_true: numpy.ndarray, labels_pred: numpy.ndarray, return_precision_recall: bool = False) Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]
Return the F1 scores of multi-class classification (one per label). Negative labels ignored.
- Parameters
labels_true (np.ndarray) – True labels.
labels_pred (np.ndarray) – Predicted labels.
return_precision_recall (bool) – If True, also return precisions and recalls.
- Returns
scores, [precisions, recalls] – F1 scores (between 0 and 1). Optionally, also return precisions and recalls (one per label).
- Return type
np.ndarray or Tuple[np.ndarray, np.ndarray, np.ndarray]
Examples
>>> import numpy as np
>>> from sknetwork.classification import get_f1_scores
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_f1_scores(labels_true, labels_pred), 2)
array([0.8 , 0.67])
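The per-label scores can be recomputed by hand from true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below does this in plain Python, assuming pairs with a negative label on either side are skipped; f1_scores is a hypothetical name.

```python
def f1_scores(labels_true, labels_pred):
    """Per-label precision, recall and F1 score (sketch).

    Pairs where the true or the predicted label is negative are skipped.
    Returns a dict label -> (precision, recall, f1).
    """
    pairs = [(t, p) for t, p in zip(labels_true, labels_pred) if t >= 0 and p >= 0]
    results = {}
    for label in sorted({t for t, _ in pairs} | {p for _, p in pairs}):
        tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted
        fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
        fn = sum(1 for t, p in pairs if t == label and p != label)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results


for label, (precision, recall, f1) in f1_scores([0, 0, 1, 1], [0, 0, 0, 1]).items():
    print(label, round(precision, 2), round(recall, 2), round(f1, 2))
# 0 0.67 1.0 0.8
# 1 1.0 0.5 0.67
```

On the example above, label 0 has TP = 2, FP = 1, FN = 0 and label 1 has TP = 1, FP = 0, FN = 1, which gives the F1 scores 0.8 and 2/3 ≈ 0.67 returned by get_f1_scores. The binary score of get_f1_score on the same arrays (0.67) coincides with the F1 score of label 1.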
- sknetwork.classification.get_average_f1_score(labels_true: numpy.ndarray, labels_pred: numpy.ndarray, average: str = 'macro') float
Return the average F1 score of multi-class classification. Negative labels ignored.
- Parameters
labels_true (np.ndarray) – True labels.
labels_pred (np.ndarray) – Predicted labels.
average (str) – Averaging method. Can be either 'macro' (default), 'micro' or 'weighted'.
- Returns
score – Average F1 score (between 0 and 1).
- Return type
float
Examples
>>> import numpy as np
>>> from sknetwork.classification import get_average_f1_score
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> np.round(get_average_f1_score(labels_true, labels_pred), 2)
0.73
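The three averaging modes can be sketched with the usual definitions, which are assumed here: 'macro' is the unweighted mean of the per-label F1 scores, 'weighted' weights each label by its number of true samples, and 'micro' pools TP, FP and FN over all labels before computing a single F1. The function average_f1 is hypothetical and skips pairs with a negative label.

```python
def average_f1(labels_true, labels_pred, average='macro'):
    """Average F1 score for single-label classification (sketch)."""
    pairs = [(t, p) for t, p in zip(labels_true, labels_pred) if t >= 0 and p >= 0]
    label_set = sorted({t for t, _ in pairs} | {p for _, p in pairs})
    stats = {}  # label -> (true positives, false positives, false negatives, support)
    for label in label_set:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        stats[label] = (tp, fp, fn, tp + fn)

    def f1(tp, fp, fn):
        # Equivalent form of the harmonic mean of precision and recall.
        return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

    if average == 'micro':
        # Pool the counts of all labels, then compute a single F1.
        tp = sum(s[0] for s in stats.values())
        fp = sum(s[1] for s in stats.values())
        fn = sum(s[2] for s in stats.values())
        return f1(tp, fp, fn)
    scores = {label: f1(*s[:3]) for label, s in stats.items()}
    if average == 'macro':
        # Unweighted mean over labels.
        return sum(scores.values()) / len(scores)
    if average == 'weighted':
        # Mean over labels, weighted by the number of true samples of each label.
        total = sum(s[3] for s in stats.values())
        return sum(scores[label] * stats[label][3] for label in label_set) / total
    raise ValueError("average must be 'macro', 'micro' or 'weighted'")


for mode in ('macro', 'micro', 'weighted'):
    print(mode, round(average_f1([0, 0, 1, 1], [0, 0, 0, 1], mode), 2))
# macro 0.73
# micro 0.75
# weighted 0.73
```

With these definitions, 'macro' and 'weighted' coincide on the example because both labels have two true samples, and micro-averaged F1 equals the accuracy for single-label data (0.75, the value get_accuracy_score returns on the same arrays).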
- sknetwork.classification.get_confusion_matrix(labels_true: numpy.ndarray, labels_pred: numpy.ndarray) scipy.sparse._csr.csr_matrix
Return the confusion matrix in sparse format (true labels on rows, predicted labels on columns). Negative labels ignored.
- Parameters
labels_true (np.ndarray) – True labels.
labels_pred (np.ndarray) – Predicted labels.
- Returns
confusion matrix – Confusion matrix.
- Return type
sparse.csr_matrix
Examples
>>> import numpy as np
>>> from sknetwork.classification import get_confusion_matrix
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 0, 0, 1])
>>> get_confusion_matrix(labels_true, labels_pred).toarray()
array([[2, 0],
       [1, 1]])
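A dense plain-Python equivalent, assuming labels are integers 0, 1, ... so that the matrix size is the largest label plus one, and that pairs with a negative label on either side are skipped; confusion_matrix is a hypothetical name. Entry [i][j] counts the samples with true label i predicted as j, so the diagonal holds the correct predictions.

```python
def confusion_matrix(labels_true, labels_pred):
    """Dense confusion matrix: rows are true labels, columns are predicted labels (sketch)."""
    pairs = [(t, p) for t, p in zip(labels_true, labels_pred) if t >= 0 and p >= 0]
    n_labels = max(max(t, p) for t, p in pairs) + 1 if pairs else 0
    matrix = [[0] * n_labels for _ in range(n_labels)]
    for t, p in pairs:
        matrix[t][p] += 1  # one more sample of true label t predicted as p
    return matrix


print(confusion_matrix([0, 0, 1, 1], [0, 0, 0, 1]))  # [[2, 0], [1, 1]]
```

Row sums give the number of true samples per label and column sums the number of predictions per label; together with the diagonal they determine the per-label precision and recall used by the F1 scores.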