Ranking

Node ranking algorithms.

The attribute scores_ assigns a score of importance to each node of the graph.

PageRank

class sknetwork.ranking.PageRank(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 1e-06)[source]

PageRank of each node, corresponding to its frequency of visit by a random walk.

The random walk restarts with some fixed probability. The restart distribution can be personalized by the user. This variant is known as Personalized PageRank.

  • Graphs

  • Digraphs

Parameters
  • damping_factor (float) – Probability to continue the random walk.

  • solver (str) –

    • 'piteration', use power iteration for a given number of iterations.

    • 'diteration', use asynchronous parallel diffusion for a given number of iterations.

    • 'lanczos', use eigensolver with a given tolerance.

    • 'bicgstab', use Biconjugate Gradient Stabilized method for a given tolerance.

    • 'RH', use a Ruffini-Horner polynomial evaluation.

  • n_iter (int) – Number of iterations for some solvers.

  • tol (float) – Tolerance for the convergence of some solvers.

Variables

scores_ (np.ndarray) – PageRank score of each node.

Example

>>> from sknetwork.ranking import PageRank
>>> from sknetwork.data import house
>>> pagerank = PageRank()
>>> adjacency = house()
>>> seeds = {0: 1}
>>> scores = pagerank.fit_transform(adjacency, seeds)
>>> np.round(scores, 2)
array([0.29, 0.24, 0.12, 0.12, 0.24])

References

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray, scipy.sparse.linalg.interface.LinearOperator], seeds: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.ranking.pagerank.PageRank[source]

Fit algorithm to data.

Parameters
  • adjacency – Adjacency matrix.

  • seeds – Parameter to be used for Personalized PageRank. Restart distribution as a vector or a dict (node: weight). If None, the uniform distribution is used (no personalization, default).

Returns

self

Return type

PageRank

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

class sknetwork.ranking.BiPageRank(damping_factor: float = 0.85, solver: str = 'piteration', n_iter: int = 10, tol: float = 0)[source]

Compute the PageRank of each node through a random walk in the bipartite graph.

  • Bigraphs

Parameters
  • damping_factor (float) – Probability to continue the random walk.

  • solver (str) –

    • piteration, use power iteration for a given number of iterations.

    • diteration, use asynchronous parallel diffusion for a given number of iterations.

    • lanczos, use eigensolver for a given tolerance.

    • bicgstab, use Biconjugate Gradient Stabilized method for a given tolerance.

  • n_iter (int) – Number of iterations for some solvers.

  • tol (float) – Tolerance for the convergence of some solvers.

Variables
  • scores_ (np.ndarray) – PageRank score of each row.

  • scores_row_ (np.ndarray) – PageRank score of each row (copy of scores_).

  • scores_col_ (np.ndarray) – PageRank score of each column.

Example

>>> from sknetwork.ranking import BiPageRank
>>> from sknetwork.data import star_wars
>>> bipagerank = BiPageRank()
>>> biadjacency = star_wars()
>>> seeds = {0: 1}
>>> scores = bipagerank.fit_transform(biadjacency, seeds)
>>> np.round(scores, 2)
array([0.45, 0.11, 0.28, 0.17])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None) → sknetwork.ranking.pagerank.BiPageRank[source]

Fit algorithm to data.

Parameters
  • biadjacency – Biadjacency matrix.

  • seeds_row – Parameter to be used for Personalized BiPageRank. Restart distribution as vectors or dicts on rows, columns (node: weight). If both seeds_row and seeds_col are None (default), the uniform distribution on rows is used.

  • seeds_col – Parameter to be used for Personalized BiPageRank. Restart distribution as vectors or dicts on rows, columns (node: weight). If both seeds_row and seeds_col are None (default), the uniform distribution on rows is used.

Returns

self

Return type

BiPageRank

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Diffusion

class sknetwork.ranking.Diffusion(n_iter: int = 3, damping_factor: Optional[float] = None)[source]

Ranking by diffusion along the edges (heat equation).

  • Graphs

  • Digraphs

Parameters
  • n_iter (int) – Number of steps of the diffusion in discrete time (must be positive).

  • damping_factor (float (optional)) – Damping factor (default value = 1).

Variables

scores_ (np.ndarray) – Score of each node (= temperature).

Example

>>> from sknetwork.data import house
>>> diffusion = Diffusion(n_iter=2)
>>> adjacency = house()
>>> seeds = {0: 1, 2: 0}
>>> scores = diffusion.fit_transform(adjacency, seeds)
>>> np.round(scores, 2)
array([0.58, 0.56, 0.38, 0.58, 0.42])

References

Chung, F. (2007). The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, init: Optional[float] = None) → sknetwork.ranking.diffusion.Diffusion[source]

Compute the diffusion (temperatures at equilibrium).

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Temperatures of seed nodes in initial state (dictionary or vector). Negative temperatures ignored.

  • init – Temperature of non-seed nodes in initial state. If None, use the average temperature of seed nodes (default).

Returns

self

Return type

Diffusion

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

class sknetwork.ranking.BiDiffusion(n_iter: int = 3, damping_factor: Optional[float] = None)[source]

Ranking by diffusion along the edges of a bipartite graph (heat equation).

  • Bigraphs

Parameters
  • n_iter (int) – Number of steps of the diffusion in discrete time (must be positive).

  • damping_factor (float (optional)) – Damping factor (default value = 1).

Variables
  • scores_ (np.ndarray) – Scores of rows.

  • scores_row_ (np.ndarray) – Scores of rows (copy of scores_).

  • scores_col_ (np.ndarray) – Scores of columns.

Example

>>> from sknetwork.ranking import BiDiffusion
>>> from sknetwork.data import star_wars
>>> bidiffusion = BiDiffusion(n_iter=2)
>>> biadjacency = star_wars()
>>> scores = bidiffusion.fit_transform(biadjacency, seeds_row = {0: 1, 2: 0})
>>> np.round(scores, 2)
array([0.5 , 0.5 , 0.46, 0.44])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None, init: Optional[float] = None) → sknetwork.ranking.diffusion.BiDiffusion[source]

Compute the diffusion (temperatures at equilibrium).

Parameters
  • biadjacency – Biadjacency matrix, shape (n_row, n_col).

  • seeds_row – Temperatures of seed rows in initial state (dictionary or vector of size n_row). Negative temperatures ignored.

  • seeds_col – Temperatures of seed columns in initial state (dictionary or vector of size n_col). Negative temperatures ignored.

  • init – Temperature of non-seed nodes in initial state. If None, use the average temperature of seed nodes (default).

Returns

self

Return type

BiDiffusion

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Dirichlet

class sknetwork.ranking.Dirichlet(n_iter: int = 10, damping_factor: Optional[float] = None, verbose: bool = False)[source]

Ranking by the Dirichlet problem (heat diffusion with boundary constraints).

  • Graphs

  • Digraphs

Parameters
  • n_iter (int) – If positive, number of steps of the diffusion in discrete time. Otherwise, solve the Dirichlet problem by the bi-conjugate gradient stabilized method.

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • verbose (bool) – Verbose mode.

Variables

scores_ (np.ndarray) – Score of each node (= temperature).

Example

>>> from sknetwork.ranking import Dirichlet
>>> from sknetwork.data import house
>>> dirichlet = Dirichlet()
>>> adjacency = house()
>>> seeds = {0: 1, 2: 0}
>>> scores = dirichlet.fit_transform(adjacency, seeds)
>>> np.round(scores, 2)
array([1.  , 0.54, 0.  , 0.31, 0.62])

References

Chung, F. (2007). The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds: Optional[Union[numpy.ndarray, dict]] = None, init: Optional[float] = None) → sknetwork.ranking.diffusion.Dirichlet[source]

Compute the solution to the Dirichlet problem (temperatures at equilibrium).

Parameters
  • adjacency – Adjacency matrix of the graph.

  • seeds – Temperatures of seed nodes (dictionary or vector). Negative temperatures ignored.

  • init – Temperature of non-seed nodes in initial state. If None, use the average temperature of seed nodes (default).

Returns

self

Return type

Dirichlet

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

class sknetwork.ranking.BiDirichlet(n_iter: int = 10, damping_factor: Optional[float] = None, verbose: bool = False)[source]

Ranking by the Dirichlet problem in bipartite graphs (heat diffusion with boundary constraints).

  • Bigraphs

Parameters
  • n_iter (int) – If positive, number of steps of the diffusion in discrete time. Otherwise, solve the Dirichlet problem by the bi-conjugate gradient stabilized method.

  • damping_factor (float (optional)) – Damping factor (default value = 1).

  • verbose (bool) – Verbose mode.

Variables
  • scores_ (np.ndarray) – Scores of rows.

  • scores_row_ (np.ndarray) – Scores of rows (copy of scores_).

  • scores_col_ (np.ndarray) – Scores of columns.

Example

>>> from sknetwork.ranking import BiDirichlet
>>> from sknetwork.data import star_wars
>>> bidirichlet = BiDirichlet()
>>> biadjacency = star_wars()
>>> scores = bidirichlet.fit_transform(biadjacency, seeds_row = {0: 1, 2: 0})
>>> np.round(scores, 2)
array([1.  , 0.5 , 0.  , 0.29])
fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], seeds_row: Optional[Union[numpy.ndarray, dict]] = None, seeds_col: Optional[Union[numpy.ndarray, dict]] = None, init: Optional[float] = None) → sknetwork.ranking.diffusion.BiDirichlet[source]

Compute the solution to the Dirichlet problem (temperatures at equilibrium).

Parameters
  • biadjacency – Biadjacency matrix, shape (n_row, n_col).

  • seeds_row – Temperatures of seed rows (dictionary or vector of size n_row). Negative temperatures ignored.

  • seeds_col – Temperatures of seed columns (dictionary or vector of size n_col). Negative temperatures ignored.

  • init – Temperature of non-seed nodes in initial state. If None, use the average temperature of seed nodes (default).

Returns

self

Return type

BiDirichlet

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Katz

class sknetwork.ranking.Katz(damping_factor: float = 0.5, path_length: int = 4)[source]

Katz centrality:

\(\sum_{k=1}^K\alpha^k(A^k)^T\mathbf{1}\).

  • Graphs

  • Digraphs

Parameters
  • damping_factor (float) – Decay parameter for path contributions.

  • path_length (int) – Maximum length of the paths to take into account.

Examples

>>> from sknetwork.data.toy_graphs import house
>>> adjacency = house()
>>> katz = Katz()
>>> scores = katz.fit_transform(adjacency)
>>> np.round(scores, 2)
array([6.5 , 8.25, 5.62, 5.62, 8.25])

References

Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39-43.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray, scipy.sparse.linalg.interface.LinearOperator]) → sknetwork.ranking.katz.Katz[source]

Katz centrality.

Parameters

adjacency – Adjacency matrix of the graph.

Returns

self

Return type

Katz

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

class sknetwork.ranking.BiKatz(damping_factor: float = 0.5, path_length: int = 4)[source]

Katz centrality for bipartite graphs.

  • Bigraphs

Parameters
  • damping_factor (float) – Decay parameter for path contributions.

  • path_length (int) – Maximum length of the paths to take into account.

Examples

>>> from sknetwork.data.toy_graphs import star_wars
>>> biadjacency = star_wars()
>>> bikatz = BiKatz()
>>> scores = bikatz.fit_transform(biadjacency)
>>> np.round(scores, 2)
array([6.38, 3.06, 8.81, 5.75])

References

Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39-43.

fit(biadjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) → sknetwork.ranking.katz.BiKatz[source]

Katz centrality.

Parameters

biadjacency – Biadjacency matrix of the graph.

Returns

self

Return type

BiKatz

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

HITS

class sknetwork.ranking.HITS(solver: Union[str, sknetwork.linalg.svd_solver.SVDSolver] = 'auto', **kwargs)[source]

Hub and authority scores of each node. For bipartite graphs, the hub score is computed on rows and the authority score on columns.

  • Graphs

  • Digraphs

  • Bigraphs

Parameters
  • solver ('auto', 'halko', 'lanczos' or SVDSolver) –

    Which singular value solver to use.

    • 'auto' call the auto_solver function.

    • 'halko': randomized method, fast but less accurate than 'lanczos' for ill-conditioned matrices.

    • 'lanczos': power-iteration based method.

    • SVDSolver: custom solver.

  • **kwargs – See sknetwork.linalg.svd_solver.LanczosSVD or sknetwork.linalg.svd_solver.HalkoSVD.

Variables
  • scores_ (np.ndarray) – Hub score of each row.

  • scores_row_ (np.ndarray) – Hub score of each row (copy of scores_row_).

  • scores_col_ (np.ndarray) – Authority score of each column.

Example

>>> from sknetwork.ranking import HITS
>>> from sknetwork.data import star_wars
>>> hits = HITS()
>>> biadjacency = star_wars()
>>> scores = hits.fit_transform(biadjacency)
>>> np.round(scores, 2)
array([0.5 , 0.23, 0.69, 0.46])

References

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5), 604-632.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) → sknetwork.ranking.hits.HITS[source]

Compute HITS algorithm with a spectral method.

Parameters

adjacency – Adjacency or biadjacency matrix of the graph.

Returns

self

Return type

HITS

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Betweenness centrality

class sknetwork.ranking.Betweenness(normalized: bool = False)

Betweenness centrality, based on Brandes’ algorithm.

  • Graphs

Variables

scores_ (np.ndarray) – Betweenness centrality value of each node

Example

>>> from sknetwork.ranking import Betweenness
>>> from sknetwork.data.toy_graphs import bow_tie
>>> betweenness = Betweenness()
>>> adjacency = bow_tie()
>>> scores = betweenness.fit_transform(adjacency)
>>> scores
array([4., 0., 0., 0., 0.])

References

Brandes, Ulrik (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology.

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Closeness centrality

class sknetwork.ranking.Closeness(method: str = 'exact', tol: float = 0.1, n_jobs: Optional[int] = None)[source]

Closeness centrality of each node in a connected graph, corresponding to the average length of the shortest paths from that node to all the other ones.

For a directed graph, the closeness centrality is computed in terms of outgoing paths.

  • Graphs

  • Digraphs

Parameters
  • method – Denotes if the results should be exact or approximate.

  • tol (float) – If method=='approximate', the allowed tolerance on each score entry.

  • n_jobs – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If None, no parallel computations are made.

Variables

scores_ (np.ndarray) – Closeness centrality of each node.

Example

>>> from sknetwork.ranking import Closeness
>>> from sknetwork.data import cyclic_digraph
>>> closeness = Closeness()
>>> adjacency = cyclic_digraph(3)
>>> scores = closeness.fit_transform(adjacency)
>>> np.round(scores, 2)
array([0.67, 0.67, 0.67])

References

Eppstein, D., & Wang, J. (2001, January). Fast approximation of centrality. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (pp. 228-229). Society for Industrial and Applied Mathematics.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) → sknetwork.ranking.closeness.Closeness[source]

Closeness centrality for connected graphs.

Parameters

adjacency – Adjacency matrix of the graph.

Returns

self

Return type

Closeness

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Harmonic centrality

class sknetwork.ranking.Harmonic(n_jobs: Optional[int] = None)[source]

Harmonic centrality of each node in a connected graph, corresponding to the average inverse length of the shortest paths from that node to all the other ones.

For a directed graph, the harmonic centrality is computed in terms of outgoing paths.

  • Graphs

  • Digraphs

Parameters

n_jobs – If an integer value is given, denotes the number of workers to use (-1 means the maximum number will be used). If None, no parallel computations are made.

Variables

scores_ (np.ndarray) – Harmonic centrality of each node.

Example

>>> from sknetwork.ranking import Harmonic
>>> from sknetwork.data import house
>>> harmonic = Harmonic()
>>> adjacency = house()
>>> scores = harmonic.fit_transform(adjacency)
>>> np.round(scores, 2)
array([3. , 3.5, 3. , 3. , 3.5])

References

Marchiori, M., & Latora, V. (2000). Harmony in the small-world. Physica A: Statistical Mechanics and its Applications, 285(3-4), 539-546.

fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray]) → sknetwork.ranking.harmonic.Harmonic[source]

Harmonic centrality for connected graphs.

Parameters

adjacency – Adjacency matrix of the graph.

Returns

self

Return type

Harmonic

fit_transform(*args, **kwargs) → numpy.ndarray

Fit algorithm to data and return the scores. Same parameters as the fit method.

Returns

scores – Scores.

Return type

np.ndarray

Post-processing

sknetwork.ranking.top_k(scores: numpy.ndarray, k: int = 1)[source]

Index of the k elements with highest value.

Parameters
  • scores (np.ndarray) – Array of values.

  • k (int) – Number of elements to return.

Examples

>>> scores = np.array([0, 1, 0, 0.5])
>>> top_k(scores, k=2)
array([1, 3])

Notes

This is a basic implementation that sorts the entire array to find its top k elements.