Link prediction
Link prediction algorithms.
The method predict
assigns a scores to edges.
First order methods
- class sknetwork.linkpred.CommonNeighbors[source]
Link prediction by common neighbors:
\(s(i, j) = |\Gamma_i \cap \Gamma_j|\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> cn = CommonNeighbors() >>> similarities = cn.fit_predict(adjacency, 0) >>> similarities array([2, 1, 1, 1, 1]) >>> similarities = cn.predict([0, 1]) >>> similarities array([[2, 1, 1, 1, 1], [1, 3, 0, 2, 1]]) >>> similarities = cn.predict((0, 1)) >>> similarities 1 >>> similarities = cn.predict([(0, 1), (1, 2)]) >>> similarities array([1, 0])
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.JaccardIndex[source]
Link prediction by Jaccard Index:
\(s(i, j) = \dfrac{|\Gamma_i \cap \Gamma_j|}{|\Gamma_i \cup \Gamma_j|}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> jaccard = JaccardIndex() >>> similarities = jaccard.fit_predict(adjacency, 0) >>> similarities.round(2) array([1. , 0.25, 0.33, 0.33, 0.25]) >>> similarities = jaccard.predict([0, 1]) >>> similarities.round(2) array([[1. , 0.25, 0.33, 0.33, 0.25], [0.25, 1. , 0. , 0.67, 0.2 ]]) >>> similarities = jaccard.predict((0, 1)) >>> similarities.round(2) 0.25 >>> similarities = jaccard.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.25, 0. ])
References
Levandowsky, M., & Winter, D. (1971). Distance between sets. Nature, 234(5323), 34-35.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.SaltonIndex[source]
Link prediction by Salton Index:
\(s(i, j) = \dfrac{|\Gamma_i \cap \Gamma_j|}{\sqrt{|\Gamma_i|.|\Gamma_j|}}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> salton = SaltonIndex() >>> similarities = salton.fit_predict(adjacency, 0) >>> similarities.round(2) array([1. , 0.41, 0.5 , 0.5 , 0.41]) >>> similarities = salton.predict([0, 1]) >>> similarities.round(2) array([[1. , 0.41, 0.5 , 0.5 , 0.41], [0.41, 1. , 0. , 0.82, 0.33]]) >>> similarities = salton.predict((0, 1)) >>> similarities.round(2) 0.41 >>> similarities = salton.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.41, 0. ])
References
Martínez, V., Berzal, F., & Cubero, J. C. (2016). A survey of link prediction in complex networks. ACM computing surveys (CSUR), 49(4), 1-33.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.SorensenIndex[source]
Link prediction by Salton Index:
\(s(i, j) = \dfrac{2|\Gamma_i \cap \Gamma_j|}{|\Gamma_i|+|\Gamma_j|}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> sorensen = SorensenIndex() >>> similarities = sorensen.fit_predict(adjacency, 0) >>> similarities.round(2) array([1. , 0.4, 0.5, 0.5, 0.4]) >>> similarities = sorensen.predict([0, 1]) >>> similarities.round(2) array([[1. , 0.4 , 0.5 , 0.5 , 0.4 ], [0.4 , 1. , 0. , 0.8 , 0.33]]) >>> similarities = sorensen.predict((0, 1)) >>> similarities.round(2) 0.4 >>> similarities = sorensen.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.4, 0. ])
References
Sørensen, T. J. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. I kommission hos E. Munksgaard.
Martínez, V., Berzal, F., & Cubero, J. C. (2016). A survey of link prediction in complex networks. ACM computing surveys (CSUR), 49(4), 1-33.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.HubPromotedIndex[source]
Link prediction by Hub Promoted Index:
\(s(i, j) = \dfrac{2|\Gamma_i \cap \Gamma_j|}{min(|\Gamma_i|,|\Gamma_j|)}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> hpi = HubPromotedIndex() >>> similarities = hpi.fit_predict(adjacency, 0) >>> similarities.round(2) array([1. , 0.5, 0.5, 0.5, 0.5]) >>> similarities = hpi.predict([0, 1]) >>> similarities.round(2) array([[1. , 0.5 , 0.5 , 0.5 , 0.5 ], [0.5 , 1. , 0. , 1. , 0.33]]) >>> similarities = hpi.predict((0, 1)) >>> similarities.round(2) 0.5 >>> similarities = hpi.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.5, 0. ])
References
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L. (2002). Hierarchical organization of modularity in metabolic networks. science, 297(5586), 1551-1555.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.HubDepressedIndex[source]
Link prediction by Hub Depressed Index:
\(s(i, j) = \dfrac{2|\Gamma_i \cap \Gamma_j|}{max(|\Gamma_i|,|\Gamma_j|)}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> hdi = HubDepressedIndex() >>> similarities = hdi.fit_predict(adjacency, 0) >>> similarities.round(2) array([1. , 0.33, 0.5 , 0.5 , 0.33]) >>> similarities = hdi.predict([0, 1]) >>> similarities.round(2) array([[1. , 0.33, 0.5 , 0.5 , 0.33], [0.33, 1. , 0. , 0.67, 0.33]]) >>> similarities = hdi.predict((0, 1)) >>> similarities.round(2) 0.33 >>> similarities = hdi.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.33, 0. ])
References
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L. (2002). Hierarchical organization of modularity in metabolic networks. science, 297(5586), 1551-1555.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.AdamicAdar[source]
Link prediction by Adamic-Adar index:
\(s(i, j) = \underset{z \in \Gamma_i \cap \Gamma_j}{\sum} \dfrac{1}{\log |\Gamma_z|}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> aa = AdamicAdar() >>> similarities = aa.fit_predict(adjacency, 0) >>> similarities.round(2) array([1.82, 0.91, 0.91, 0.91, 0.91]) >>> similarities = aa.predict([0, 1]) >>> similarities.round(2) array([[1.82, 0.91, 0.91, 0.91, 0.91], [0.91, 3.8 , 0. , 2.35, 1.44]]) >>> similarities = aa.predict((0, 1)) >>> similarities.round(2) 0.91 >>> similarities = aa.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.91, 0. ])
References
Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social networks, 25(3), 211-230.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.ResourceAllocation[source]
Link prediction by Resource Allocation index:
\(s(i, j) = \underset{z \in \Gamma_i \cap \Gamma_j}{\sum} \dfrac{1}{|\Gamma_z|}\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> ra = ResourceAllocation() >>> similarities = ra.fit_predict(adjacency, 0) >>> similarities.round(2) array([0.67, 0.33, 0.33, 0.33, 0.33]) >>> similarities = ra.predict([0, 1]) >>> similarities.round(2) array([[0.67, 0.33, 0.33, 0.33, 0.33], [0.33, 1.33, 0. , 0.83, 0.5 ]]) >>> similarities = ra.predict((0, 1)) >>> similarities.round(2) 0.33 >>> similarities = ra.predict([(0, 1), (1, 2)]) >>> similarities.round(2) array([0.33, 0. ])
References
Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. The European Physical Journal B, 71(4), 623-630.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
- class sknetwork.linkpred.PreferentialAttachment[source]
Link prediction by Preferential Attachment index:
\(s(i, j) = |\Gamma_i||\Gamma_j|\).
- Variables
indptr_ (np.ndarray) – Pointer index for neighbors.
indices_ (np.ndarray) – Concatenation of neighbors.
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> pa = PreferentialAttachment() >>> similarities = pa.fit_predict(adjacency, 0) >>> similarities array([4, 6, 4, 4, 6], dtype=int32) >>> similarities = pa.predict([0, 1]) >>> similarities array([[4, 6, 4, 4, 6], [6, 9, 6, 6, 9]], dtype=int32) >>> similarities = pa.predict((0, 1)) >>> similarities 6 >>> similarities = pa.predict([(0, 1), (1, 2)]) >>> similarities array([6, 6], dtype=int32)
References
Albert, R., Barabási, L. (2002). Statistical mechanics of complex networks Reviews of Modern Physics.
- fit(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray])
Fit algorithm to the data.
- Parameters
adjacency – Adjacency matrix of the graph
- Returns
self
- Return type
FirstOrder
- fit_predict(adjacency, query)
Fit algorithm to data and compute scores for requested edges.
- predict(query: Union[int, Iterable, Tuple])
Compute similarity scores.
- Parameters
query (int, list, array or Tuple) –
If int i, return the similarities s(i, j) for all j.
If list or array integers, return s(i, j) for i in query, for all j as array.
If tuple (i, j), return the similarity s(i, j).
If list of tuples or array of shape (n_queries, 2), return s(i, j) for (i, j) in query as array.
- Returns
predictions – The prediction scores.
- Return type
int, float or array
Post-processing
- sknetwork.linkpred.is_edge(adjacency: scipy.sparse.csr.csr_matrix, query: Union[int, Iterable, Tuple]) Union[bool, numpy.ndarray] [source]
Given a query, return whether each edge is actually in the adjacency.
- Parameters
adjacency – Adjacency matrix of the graph.
query (int, Iterable or Tuple) –
If int i, queries (i, j) for all j.
If Iterable of integers, return queries (i, j) for i in query, for all j.
If tuple (i, j), queries (i, j).
If list of tuples or array of shape (n_queries, 2), queries (i, j) in for each line in query.
- Returns
y_true – For each element in the query, returns
True
if the edge exists in the adjacency andFalse
otherwise.- Return type
Union[bool, np.ndarray]
Examples
>>> from sknetwork.data import house >>> adjacency = house() >>> is_edge(adjacency, 0) array([False, True, False, False, True]) >>> is_edge(adjacency, [0, 1]) array([[False, True, False, False, True], [ True, False, True, False, True]]) >>> is_edge(adjacency, (0, 1)) True >>> is_edge(adjacency, [(0, 1), (0, 2)]) array([ True, False])
- sknetwork.linkpred.whitened_sigmoid(scores: numpy.ndarray)[source]
Map the entries of a score array to probabilities through
\(\dfrac{1}{1 + \exp(-(x - \mu)/\sigma)}\),
where \(\mu\) and \(\sigma\) are respectively the mean and standard deviation of x.
- Parameters
scores (np.ndarray) – The input array
- Returns
probas – Array with entries between 0 and 1.
- Return type
np.ndarray
Examples
>>> probas = whitened_sigmoid(np.array([1, 5, 0.25])) >>> probas.round(2) array([0.37, 0.8 , 0.29]) >>> probas = whitened_sigmoid(np.array([2, 2, 2])) >>> probas array([1, 1, 1])