Ranking

Node ranking algorithms.

The attribute scores_ assigns a score of importance to each node of the graph.

PageRank

class sknetwork.ranking.PageRank(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 1e-06)[source]

PageRank of each node, corresponding to its frequency of visit by a random walk.

The random walk restarts with some fixed probability. The restart distribution can be personalized by the user. This variant is known as Personalized PageRank.

Parameters:

damping_factor (float) – Probability to continue the random walk.
solver (str) –
- 'piteration', use power iteration for a given number of iterations.
- 'diteration', use asynchronous parallel diffusion for a given number of iterations.
- 'lanczos', use eigensolver with a given tolerance.
- 'bicgstab', use Biconjugate Gradient Stabilized method for a given tolerance.
- 'RH', use a Ruffini-Horner polynomial evaluation.
- 'push', use push-based algorithm for a given tolerance
n_iter (int) – Number of iterations for some solvers.
tol (float) – Tolerance for the convergence of some solvers.

Variables:

scores (np.ndarray) – PageRank score of each node.

Example

>>> from sknetwork.ranking import PageRank
>>> from sknetwork.data import house
>>> pagerank = PageRank()
>>> adjacency = house()
>>> weights = {0: 1}
>>> scores = pagerank.fit_predict(adjacency, weights)
>>> np.round(scores, 2)
array([0.29, 0.24, 0.12, 0.12, 0.24])

References

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.

Compute the pagerank of each node.

Parameters:

input_matrix (sparse.csr_matrix, np.ndarray) – Adjacency matrix or biadjacency matrix of the graph.
weights (np.ndarray, dict) – Weights of the restart distribution for Personalized PageRank. If None, the uniform distribution is used (no personalization, default).
weights_row (np.ndarray, dict) – Weights on rows of the restart distribution for Personalized PageRank. Used for bipartite graphs. If both weights_row and weights_col are None (default), the uniform distribution on rows is used.
weights_col (np.ndarray, dict) – Weights on columns of the restart distribution for Personalized PageRank. Used for bipartite graphs.
force_bipartite (bool) – If True, consider the input matrix as the biadjacency matrix of a bipartite graph.

Returns:

self

Return type:

PageRank

fit_predict(*args, **kwargs) → ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns:: scores – Scores.
Return type:: np.ndarray

get_params()

Get parameters as dictionary.

Returns:: params – Parameters of the algorithm.
Return type:: dict

predict(columns: bool = False) → ndarray

Return the scores predicted by the algorithm.

Parameters:: columns (bool) – If True, return the prediction for columns.
Returns:: scores – Scores.
Return type:: np.ndarray

set_params(params: dict) → Algorithm

Set parameters of the algorithm.

Parameters:: params (dict) – Parameters of the algorithm.
Returns:: self
Return type:: Algorithm

Katz

class sknetwork.ranking.Katz(damping_factor: float = 0.5, path_length: int = 4)[source]

Katz centrality, defined by:

\(\sum_{k=1}^K\alpha^k(A^k)^T\mathbf{1}\)

where \(A\) is the adjacency matrix, \(\alpha\) is the damping factor and \(K\) is the path length.

Parameters:

damping_factor (float) – Damping factor for path contributions.
path_length (int) – Maximum length of the paths.

Variables:

scores (np.ndarray) – Score of each node.

Examples

>>> from sknetwork.data.toy_graphs import house
>>> adjacency = house()
>>> katz = Katz()
>>> scores = katz.fit_predict(adjacency)
>>> np.round(scores, 2)
array([6.5 , 8.25, 5.62, 5.62, 8.25])

References

Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39-43.

fit(input_matrix: csr_matrix | ndarray | LinearOperator) → Katz[source]

Katz centrality.

Parameters:: input_matrix – Adjacency matrix or biadjacency matrix of the graph.
Returns:: self
Return type:: Katz

fit_predict(*args, **kwargs) → ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns:: scores – Scores.
Return type:: np.ndarray

get_params()

Get parameters as dictionary.

Returns:: params – Parameters of the algorithm.
Return type:: dict

predict(columns: bool = False) → ndarray

Return the scores predicted by the algorithm.

Parameters:: columns (bool) – If True, return the prediction for columns.
Returns:: scores – Scores.
Return type:: np.ndarray

set_params(params: dict) → Algorithm

Set parameters of the algorithm.

Parameters:: params (dict) – Parameters of the algorithm.
Returns:: self
Return type:: Algorithm

HITS

class sknetwork.ranking.HITS(solver: str | SVDSolver = 'lanczos')[source]

Hub and authority scores of each node. For bipartite graphs, the hub score is computed on rows and the authority score on columns.

Parameters:: solver ('lanczos' (default, Lanczos algorithm) or SVDSolver (custom solver)) – Which solver to use.
Variables:: scores (np.ndarray) – Hub score of each node.

Example

>>> from sknetwork.ranking import HITS
>>> from sknetwork.data import star_wars
>>> hits = HITS()
>>> biadjacency = star_wars()
>>> scores = hits.fit_predict(biadjacency)
>>> np.round(scores, 2)
array([0.5 , 0.23, 0.69, 0.46])

References

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.

fit(adjacency: csr_matrix | ndarray) → HITS[source]

Compute HITS algorithm with a spectral method.

Parameters:: adjacency – Adjacency or biadjacency matrix of the graph.
Returns:: self
Return type:: HITS

fit_predict(*args, **kwargs) → ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns:: scores – Scores.
Return type:: np.ndarray

get_params()

Get parameters as dictionary.

Returns:: params – Parameters of the algorithm.
Return type:: dict

predict(columns: bool = False) → ndarray

Return the scores predicted by the algorithm.

Parameters:: columns (bool) – If True, return the prediction for columns.
Returns:: scores – Scores.
Return type:: np.ndarray

set_params(params: dict) → Algorithm

Set parameters of the algorithm.

Parameters:: params (dict) – Parameters of the algorithm.
Returns:: self
Return type:: Algorithm

Betweenness centrality

class sknetwork.ranking.Betweenness(normalized: bool = False)

Betweenness centrality, based on Brandes’ algorithm.

Variables:: scores (np.ndarray) – Betweenness centrality value of each node

Example

>>> from sknetwork.ranking import Betweenness
>>> from sknetwork.data.toy_graphs import bow_tie
>>> betweenness = Betweenness()
>>> adjacency = bow_tie()
>>> scores = betweenness.fit_transform(adjacency)
>>> scores
array([4., 0., 0., 0., 0.])

References

Brandes, Ulrik (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology.

fit(adjacency: csr_matrix | ndarray) → Betweenness: Fit algorithm to data.

fit_predict(*args, **kwargs) → ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns:: scores – Scores.
Return type:: np.ndarray

get_params()

Get parameters as dictionary.

Returns:: params – Parameters of the algorithm.
Return type:: dict

predict(columns: bool = False) → ndarray

Return the scores predicted by the algorithm.

Parameters:: columns (bool) – If True, return the prediction for columns.
Returns:: scores – Scores.
Return type:: np.ndarray

set_params(params: dict) → Algorithm

Set parameters of the algorithm.

Parameters:: params (dict) – Parameters of the algorithm.
Returns:: self
Return type:: Algorithm

Closeness centrality

class sknetwork.ranking.Closeness(method: str = 'exact', tol: float = 0.1)[source]

Ranking by closeness centrality of each node in a connected graph, corresponding to the average length of the shortest paths from that node to all the other ones.

Parameters:

method – Denotes if the results should be exact or approximate.
tol (float) – If method=='approximate', the allowed tolerance on each score entry.

Variables:

scores (np.ndarray) – Closeness centrality of each node.

Example

>>> from sknetwork.ranking import Closeness
>>> from sknetwork.data import cyclic_digraph
>>> closeness = Closeness()
>>> adjacency = cyclic_digraph(3)
>>> scores = closeness.fit_predict(adjacency)
>>> np.round(scores, 2)
array([0.67, 0.67, 0.67])

References

Eppstein, D., & Wang, J. (2001, January). Fast approximation of centrality. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (pp. 228-229). Society for Industrial and Applied Mathematics.

fit(adjacency: csr_matrix | ndarray) → Closeness[source]

Closeness centrality for connected graphs.

Parameters:: adjacency – Adjacency matrix of the graph.
Returns:: self
Return type:: Closeness

fit_predict(*args, **kwargs) → ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns:: scores – Scores.
Return type:: np.ndarray

get_params()

Get parameters as dictionary.

Returns:: params – Parameters of the algorithm.
Return type:: dict

predict(columns: bool = False) → ndarray

Return the scores predicted by the algorithm.

Parameters:: columns (bool) – If True, return the prediction for columns.
Returns:: scores – Scores.
Return type:: np.ndarray

set_params(params: dict) → Algorithm

Set parameters of the algorithm.

Parameters:: params (dict) – Parameters of the algorithm.
Returns:: self
Return type:: Algorithm

Post-processing

sknetwork.ranking.top_k(scores: ndarray, k: int = 1, sort: bool = True)[source]

Return the indices of the k elements of highest values.

Parameters:

scores (np.ndarray) – Array of values.
k (int) – Number of elements to return.
sort (bool) – If True, sort the indices in decreasing order of value (element of highest value first).

Examples

>>> top_k([1, 3, 2], k=2)
array([1, 2])