Ranking

Node ranking algorithms.

The attribute scores_ assigns a score of importance to each node of the graph.

PageRank

class sknetwork.ranking.PageRank(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 1e-06)[source]

PageRank of each node, corresponding to its frequency of visit by a random walk.

The random walk restarts with some fixed probability. The restart distribution can be personalized by the user. This variant is known as Personalized PageRank.

Parameters
  • damping_factor (float) – Probability to continue the random walk.

  • solver (str) –

    • 'piteration', use power iteration for a given number of iterations.

    • 'diteration', use asynchronous parallel diffusion for a given number of iterations.

    • 'lanczos', use eigensolver with a given tolerance.

    • 'bicgstab', use Biconjugate Gradient Stabilized method for a given tolerance.

    • 'RH', use a Ruffini-Horner polynomial evaluation.

    • 'push', use push-based algorithm for a given tolerance

  • n_iter (int) – Number of iterations for some solvers.

  • tol (float) – Tolerance for the convergence of some solvers.

Variables
  • scores_ (np.ndarray) – PageRank score of each node.

  • scores_row_ (np.ndarray) – Scores of rows, for bipartite graphs.

  • scores_col_ (np.ndarray) – Scores of columns, for bipartite graphs.

Example

>>> from sknetwork.ranking import PageRank
>>> from sknetwork.data import house
>>> pagerank = PageRank()
>>> adjacency = house()
>>> seeds = {0: 1}
>>> scores = pagerank.fit_transform(adjacency, seeds)
>>> np.round(scores, 2)
array([0.29, 0.24, 0.12, 0.12, 0.24])

References

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.

fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray, scipy.sparse.linalg.interface.LinearOperator], seeds: Optional[Union[numpy.ndarray, dict]] = None, seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None, force_bipartite: bool = False) sknetwork.ranking.pagerank.PageRank[source]

Fit algorithm to data.

Parameters
  • input_matrix – Adjacency matrix or biadjacency matrix of the graph.

  • seeds – Parameter to be used for Personalized PageRank. Restart distribution as a vector or a dict (node: weight). If None, the uniform distribution is used (no personalization, default).

  • seeds_row – Parameter to be used for Personalized PageRank on bipartite graphs. Restart distribution as vectors or dicts on rows, columns (node: weight). If both seeds_row and seeds_col are None (default), the uniform distribution on rows is used.

  • seeds_col – Parameter to be used for Personalized PageRank on bipartite graphs. Restart distribution as vectors or dicts on rows, columns (node: weight). If both seeds_row and seeds_col are None (default), the uniform distribution on rows is used.

  • force_bipartite – If True, consider the input matrix as the biadjacency matrix of a bipartite graph.

Returns

self

Return type

PageRank

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Katz

class sknetwork.ranking.Katz(damping_factor: float = 0.5, path_length: int = 4)[source]

Katz centrality, defined by:

\(\sum_{k=1}^K\alpha^k(A^k)^T\mathbf{1}\).

Parameters
  • damping_factor (float) – Decay parameter for path contributions.

  • path_length (int) – Maximum length of the paths to take into account.

Variables
  • scores_ (np.ndarray) – Score of each node.

  • scores_row_ (np.ndarray) – Scores of rows, for bipartite graphs.

  • scores_col_ (np.ndarray) – Scores of columns, for bipartite graphs.

Examples

>>> from sknetwork.data.toy_graphs import house
>>> adjacency = house()
>>> katz = Katz()
>>> scores = katz.fit_transform(adjacency)
>>> np.round(scores, 2)
array([6.5 , 8.25, 5.62, 5.62, 8.25])

References

Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39-43.

fit(input_matrix: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray, scipy.sparse.linalg.interface.LinearOperator]) sknetwork.ranking.katz.Katz[source]

Katz centrality.

Parameters

input_matrix – Adjacency matrix or biadjacency matrix of the graph.

Returns

self

Return type

Katz

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

HITS

class sknetwork.ranking.HITS(solver: Union[str, sknetwork.linalg.svd_solver.SVDSolver] = 'lanczos')[source]

Hub and authority scores of each node. For bipartite graphs, the hub score is computed on rows and the authority score on columns.

Parameters

solver ('lanczos' (default, Lanczos algorithm) or SVDSolver (custom solver)) – Which solver to use.

Variables
  • scores_ (np.ndarray) – Hub score of each node.

  • scores_row_ (np.ndarray) – Hub score of each row, for bipartite graphs.

  • scores_col_ (np.ndarray) – Authority score of each column, for bipartite graphs.

Example

>>> from sknetwork.ranking import HITS
>>> from sknetwork.data import star_wars
>>> hits = HITS()
>>> biadjacency = star_wars()
>>> scores = hits.fit_transform(biadjacency)
>>> np.round(scores, 2)
array([0.5 , 0.23, 0.69, 0.46])

References

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) sknetwork.ranking.hits.HITS[source]

Compute HITS algorithm with a spectral method.

Parameters

adjacency – Adjacency or biadjacency matrix of the graph.

Returns

self

Return type

HITS

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Betweenness centrality

class sknetwork.ranking.Betweenness(normalized: bool = False)

Betweenness centrality, based on Brandes’ algorithm.

Variables

scores_ (np.ndarray) – Betweenness centrality value of each node

Example

>>> from sknetwork.ranking import Betweenness
>>> from sknetwork.data.toy_graphs import bow_tie
>>> betweenness = Betweenness()
>>> adjacency = bow_tie()
>>> scores = betweenness.fit_transform(adjacency)
>>> scores
array([4., 0., 0., 0., 0.])

References

Brandes, Ulrik (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) sknetwork.ranking.betweenness.Betweenness

Fit Algorithm to the data.

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Closeness centrality

class sknetwork.ranking.Closeness(method: str = 'exact', tol: float = 0.1, n_jobs: Optional[int] = None)[source]

Closeness centrality of each node in a connected graph, corresponding to the average length of the shortest paths from that node to all the other ones.

For a directed graph, the closeness centrality is computed in terms of outgoing paths.

Parameters
  • method – Denotes if the results should be exact or approximate.

  • tol (float) – If method=='approximate', the allowed tolerance on each score entry.

  • n_jobs – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If None, no parallel computations are made.

Variables

scores_ (np.ndarray) – Closeness centrality of each node.

Example

>>> from sknetwork.ranking import Closeness
>>> from sknetwork.data import cyclic_digraph
>>> closeness = Closeness()
>>> adjacency = cyclic_digraph(3)
>>> scores = closeness.fit_transform(adjacency)
>>> np.round(scores, 2)
array([0.67, 0.67, 0.67])

References

Eppstein, D., & Wang, J. (2001, January). Fast approximation of centrality. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (pp. 228-229). Society for Industrial and Applied Mathematics.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) sknetwork.ranking.closeness.Closeness[source]

Closeness centrality for connected graphs.

Parameters

adjacency – Adjacency matrix of the graph.

Returns

self

Return type

Closeness

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Harmonic centrality

class sknetwork.ranking.Harmonic(n_jobs: Optional[int] = None)[source]

Harmonic centrality of each node in a connected graph, corresponding to the average inverse length of the shortest paths from that node to all the other ones.

For a directed graph, the harmonic centrality is computed in terms of outgoing paths.

Parameters

n_jobs – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If None, no parallel computations are made.

Variables

scores_ (np.ndarray) – Score of each node.

Example

>>> from sknetwork.ranking import Harmonic
>>> from sknetwork.data import house
>>> harmonic = Harmonic()
>>> adjacency = house()
>>> scores = harmonic.fit_transform(adjacency)
>>> np.round(scores, 2)
array([3. , 3.5, 3. , 3. , 3.5])

References

Marchiori, M., & Latora, V. (2000). Harmony in the small-world. Physica A: Statistical Mechanics and its Applications, 285(3-4), 539-546.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) sknetwork.ranking.harmonic.Harmonic[source]

Harmonic centrality for connected graphs.

Parameters

adjacency – Adjacency matrix of the graph.

Returns

self

Return type

Harmonic

fit_transform(*args, **kwargs) numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Post-processing

sknetwork.ranking.top_k(scores: numpy.ndarray, k: int = 1)[source]

Index of the k elements with highest value.

Parameters
  • scores (np.ndarray) – Array of values.

  • k (int) – Number of elements to return.

Examples

>>> scores = np.array([0, 1, 0, 0.5])
>>> top_k(scores, k=2)
array([1, 3])

Notes

This is a basic implementation that sorts the entire array to find its top k elements.