Ranking
Node ranking algorithms.
The attribute scores_
assigns a score of importance to each node of the graph.
PageRank
- class sknetwork.ranking.PageRank(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 1e-06)[source]
PageRank of each node, corresponding to its frequency of visit by a random walk.
The random walk restarts with some fixed probability. The restart distribution can be personalized by the user. This variant is known as Personalized PageRank.
- Parameters:
damping_factor (float) – Probability to continue the random walk.
solver (str) –
'piteration'
, use power iteration for a given number of iterations.'diteration'
, use asynchronous parallel diffusion for a given number of iterations.'lanczos'
, use eigensolver with a given tolerance.'bicgstab'
, use Biconjugate Gradient Stabilized method for a given tolerance.'RH'
, use a Ruffini-Horner polynomial evaluation.'push'
, use push-based algorithm for a given tolerance
n_iter (int) – Number of iterations for some solvers.
tol (float) – Tolerance for the convergence of some solvers.
- Variables:
scores (np.ndarray) – PageRank score of each node.
scores_row (np.ndarray) – Scores of rows, for bipartite graphs.
scores_col (np.ndarray) – Scores of columns, for bipartite graphs.
Example
>>> from sknetwork.ranking import PageRank >>> from sknetwork.data import house >>> pagerank = PageRank() >>> adjacency = house() >>> weights = {0: 1} >>> scores = pagerank.fit_predict(adjacency, weights) >>> np.round(scores, 2) array([0.29, 0.24, 0.12, 0.12, 0.24])
References
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
- fit(input_matrix: csr_matrix | ndarray, weights: dict | ndarray | None = None, weights_row: dict | ndarray | None = None, weights_col: dict | ndarray | None = None, force_bipartite: bool = False) PageRank [source]
Compute the pagerank of each node.
- Parameters:
input_matrix (sparse.csr_matrix, np.ndarray) – Adjacency matrix or biadjacency matrix of the graph.
weights (np.ndarray, dict) – Weights of the restart distribution for Personalized PageRank. If
None
, the uniform distribution is used (no personalization, default).weights_row (np.ndarray, dict) – Weights on rows of the restart distribution for Personalized PageRank. Used for bipartite graphs. If both weights_row and weights_col are
None
(default), the uniform distribution on rows is used.weights_col (np.ndarray, dict) – Weights on columns of the restart distribution for Personalized PageRank. Used for bipartite graphs.
force_bipartite (bool) – If
True
, consider the input matrix as the biadjacency matrix of a bipartite graph.
- Returns:
self
- Return type:
- fit_predict(*args, **kwargs) ndarray
Fit algorithm to data and return the scores. Same parameters as the
fit
method.- Returns:
scores – Scores.
- Return type:
np.ndarray
- get_params()
Get parameters as dictionary.
- Returns:
params – Parameters of the algorithm.
- Return type:
dict
- predict(columns: bool = False) ndarray
Return the scores predicted by the algorithm.
- Parameters:
columns (bool) – If
True
, return the prediction for columns.- Returns:
scores – Scores.
- Return type:
np.ndarray
- set_params(params: dict) Algorithm
Set parameters of the algorithm.
- Parameters:
params (dict) – Parameters of the algorithm.
- Returns:
self
- Return type:
Algorithm
Katz
- class sknetwork.ranking.Katz(damping_factor: float = 0.5, path_length: int = 4)[source]
Katz centrality, defined by:
\(\sum_{k=1}^K\alpha^k(A^k)^T\mathbf{1}\)
where \(A\) is the adjacency matrix, \(\alpha\) is the damping factor and \(K\) is the path length.
- Parameters:
damping_factor (float) – Damping factor for path contributions.
path_length (int) – Maximum length of the paths.
- Variables:
scores (np.ndarray) – Score of each node.
scores_row (np.ndarray) – Scores of rows, for bipartite graphs.
scores_col (np.ndarray) – Scores of columns, for bipartite graphs.
Examples
>>> from sknetwork.data.toy_graphs import house >>> adjacency = house() >>> katz = Katz() >>> scores = katz.fit_predict(adjacency) >>> np.round(scores, 2) array([6.5 , 8.25, 5.62, 5.62, 8.25])
References
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39-43.
- fit(input_matrix: csr_matrix | ndarray | LinearOperator) Katz [source]
Katz centrality.
- Parameters:
input_matrix – Adjacency matrix or biadjacency matrix of the graph.
- Returns:
self
- Return type:
- fit_predict(*args, **kwargs) ndarray
Fit algorithm to data and return the scores. Same parameters as the
fit
method.- Returns:
scores – Scores.
- Return type:
np.ndarray
- get_params()
Get parameters as dictionary.
- Returns:
params – Parameters of the algorithm.
- Return type:
dict
- predict(columns: bool = False) ndarray
Return the scores predicted by the algorithm.
- Parameters:
columns (bool) – If
True
, return the prediction for columns.- Returns:
scores – Scores.
- Return type:
np.ndarray
- set_params(params: dict) Algorithm
Set parameters of the algorithm.
- Parameters:
params (dict) – Parameters of the algorithm.
- Returns:
self
- Return type:
Algorithm
HITS
- class sknetwork.ranking.HITS(solver: str | SVDSolver = 'lanczos')[source]
Hub and authority scores of each node. For bipartite graphs, the hub score is computed on rows and the authority score on columns.
- Parameters:
solver (
'lanczos'
(default, Lanczos algorithm) orSVDSolver
(custom solver)) – Which solver to use.- Variables:
scores (np.ndarray) – Hub score of each node.
scores_row (np.ndarray) – Hub score of each row, for bipartite graphs.
scores_col (np.ndarray) – Authority score of each column, for bipartite graphs.
Example
>>> from sknetwork.ranking import HITS >>> from sknetwork.data import star_wars >>> hits = HITS() >>> biadjacency = star_wars() >>> scores = hits.fit_predict(biadjacency) >>> np.round(scores, 2) array([0.5 , 0.23, 0.69, 0.46])
References
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.
- fit(adjacency: csr_matrix | ndarray) HITS [source]
Compute HITS algorithm with a spectral method.
- Parameters:
adjacency – Adjacency or biadjacency matrix of the graph.
- Returns:
self
- Return type:
- fit_predict(*args, **kwargs) ndarray
Fit algorithm to data and return the scores. Same parameters as the
fit
method.- Returns:
scores – Scores.
- Return type:
np.ndarray
- get_params()
Get parameters as dictionary.
- Returns:
params – Parameters of the algorithm.
- Return type:
dict
- predict(columns: bool = False) ndarray
Return the scores predicted by the algorithm.
- Parameters:
columns (bool) – If
True
, return the prediction for columns.- Returns:
scores – Scores.
- Return type:
np.ndarray
- set_params(params: dict) Algorithm
Set parameters of the algorithm.
- Parameters:
params (dict) – Parameters of the algorithm.
- Returns:
self
- Return type:
Algorithm
Betweenness centrality
- class sknetwork.ranking.Betweenness(normalized: bool = False)
Betweenness centrality, based on Brandes’ algorithm.
- Variables:
scores (np.ndarray) – Betweenness centrality value of each node
Example
>>> from sknetwork.ranking import Betweenness >>> from sknetwork.data.toy_graphs import bow_tie >>> betweenness = Betweenness() >>> adjacency = bow_tie() >>> scores = betweenness.fit_transform(adjacency) >>> scores array([4., 0., 0., 0., 0.])
References
Brandes, Ulrik (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology.
- fit(adjacency: csr_matrix | ndarray) Betweenness
Fit algorithm to data.
- fit_predict(*args, **kwargs) ndarray
Fit algorithm to data and return the scores. Same parameters as the
fit
method.- Returns:
scores – Scores.
- Return type:
np.ndarray
- get_params()
Get parameters as dictionary.
- Returns:
params – Parameters of the algorithm.
- Return type:
dict
- predict(columns: bool = False) ndarray
Return the scores predicted by the algorithm.
- Parameters:
columns (bool) – If
True
, return the prediction for columns.- Returns:
scores – Scores.
- Return type:
np.ndarray
- set_params(params: dict) Algorithm
Set parameters of the algorithm.
- Parameters:
params (dict) – Parameters of the algorithm.
- Returns:
self
- Return type:
Algorithm
Closeness centrality
- class sknetwork.ranking.Closeness(method: str = 'exact', tol: float = 0.1)[source]
Ranking by closeness centrality of each node in a connected graph, corresponding to the average length of the shortest paths from that node to all the other ones.
- Parameters:
method – Denotes if the results should be exact or approximate.
tol (float) – If
method=='approximate'
, the allowed tolerance on each score entry.
- Variables:
scores (np.ndarray) – Closeness centrality of each node.
Example
>>> from sknetwork.ranking import Closeness >>> from sknetwork.data import cyclic_digraph >>> closeness = Closeness() >>> adjacency = cyclic_digraph(3) >>> scores = closeness.fit_predict(adjacency) >>> np.round(scores, 2) array([0.67, 0.67, 0.67])
References
Eppstein, D., & Wang, J. (2001, January). Fast approximation of centrality. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (pp. 228-229). Society for Industrial and Applied Mathematics.
- fit(adjacency: csr_matrix | ndarray) Closeness [source]
Closeness centrality for connected graphs.
- Parameters:
adjacency – Adjacency matrix of the graph.
- Returns:
self
- Return type:
- fit_predict(*args, **kwargs) ndarray
Fit algorithm to data and return the scores. Same parameters as the
fit
method.- Returns:
scores – Scores.
- Return type:
np.ndarray
- get_params()
Get parameters as dictionary.
- Returns:
params – Parameters of the algorithm.
- Return type:
dict
- predict(columns: bool = False) ndarray
Return the scores predicted by the algorithm.
- Parameters:
columns (bool) – If
True
, return the prediction for columns.- Returns:
scores – Scores.
- Return type:
np.ndarray
- set_params(params: dict) Algorithm
Set parameters of the algorithm.
- Parameters:
params (dict) – Parameters of the algorithm.
- Returns:
self
- Return type:
Algorithm
Post-processing
- sknetwork.ranking.top_k(scores: ndarray, k: int = 1, sort: bool = True)[source]
Return the indices of the k elements of highest values.
- Parameters:
scores (np.ndarray) – Array of values.
k (int) – Number of elements to return.
sort (bool) – If
True
, sort the indices in decreasing order of value (element of highest value first).
Examples
>>> top_k([1, 3, 2], k=2) array([1, 2])