Utils

Various tools for graph analysis.

Convert graphs

sknetwork.utils.bipartite2directed(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

Adjacency matrix of the directed graph associated with a bipartite graph (with edges from one part to the other).

The returned adjacency matrix is:

\(A = \begin{bmatrix} 0 & B \\ 0 & 0 \end{bmatrix}\)

where \(B\) is the biadjacency matrix.

Parameters: biadjacency – Biadjacency matrix of the graph.
Returns: Adjacency matrix (same format as input).
Return type: adjacency

sknetwork.utils.bipartite2undirected(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

Adjacency matrix of a bigraph defined by its biadjacency matrix.

The returned adjacency matrix is:

\(A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}\)

where \(B\) is the biadjacency matrix of the bipartite graph.

Parameters: biadjacency – Biadjacency matrix of the graph.
Returns: Adjacency matrix (same format as input).
Return type: adjacency

sknetwork.utils.directed2undirected(adjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR], weighted: bool = True) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

Adjacency matrix of the undirected graph associated with some directed graph.

The new adjacency matrix becomes either:

\(A+A^T\) (default)

or

\(\max(A,A^T)\)

If the initial adjacency matrix \(A\) is binary, bidirectional edges have weight 2 (first method, default) or 1 (second method).

Parameters

adjacency – Adjacency matrix.
weighted – If True, return the sum of the weights in both directions of each edge.

Returns

New adjacency matrix (same format as input).

Return type

new_adjacency

Get neighbors

sknetwork.utils.get_neighbors(input_matrix: scipy.sparse.csr.csr_matrix, node: int, transpose: bool = False) → numpy.ndarray[source]

Get the neighbors of a node.

Parameters

input_matrix (sparse.csr_matrix) – Adjacency or biadjacency matrix.
node (int) – Target node.
transpose – If True, transpose the input matrix.

Returns

neighbors – Array of neighbors of the target node (successors for directed graphs; set transpose=True for predecessors).

Return type

np.ndarray

Example

>>> from sknetwork.data import house
>>> adjacency = house()
>>> get_neighbors(adjacency, node=0)
array([1, 4], dtype=int32)

Clustering

sknetwork.utils.membership_matrix(labels: numpy.ndarray, dtype=<class 'bool'>, n_labels: Optional[int] = None) → scipy.sparse.csr.csr_matrix[source]

Build a n x k matrix of the label assignments, with k the number of labels. Negative labels are ignored.

Parameters

labels – Label of each node.
dtype – Type of the entries. Boolean by default.
n_labels (int) – Number of labels.

Returns

membership – Binary matrix of label assignments.

Return type

sparse.csr_matrix

Notes

The inverse operation is simply labels = membership.indices.

class sknetwork.utils.KMeansDense(n_clusters: int = 8, init: str = '++', n_init: int = 10, tol: float = 0.0001)[source]

Standard KMeansDense clustering based on SciPy function kmeans2.

Parameters

n_clusters – Number of desired clusters.
init – Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’: * ‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data. * ‘points’: choose k observations (rows) at random from data for the initial centroids. * ‘++’: choose k observations accordingly to the kmeans++ method (careful seeding) * ‘matrix’: interpret the k parameter as a k by M (or length k array for one-dimensional data) array of initial centroids.
n_init – Number of iterations of the k-means algorithm to run.
tol – Relative tolerance with regards to inertia to declare convergence.

Variables

labels_ – Label of each sample.
cluster_centers_ – A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.

References

MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297).
Arthur, D., & Vassilvitskii, S. (2007, January). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027-1035). Society for Industrial and Applied Mathematics.

fit(x: numpy.ndarray) → sknetwork.utils.kmeans.KMeansDense[source]

Fit algorithm to the data.

Parameters: x – Data to cluster.
Returns: self
Return type: KMeansDense

fit_transform(x: numpy.ndarray) → numpy.ndarray[source]

Fit algorithm to the data and return the labels.

Parameters: x – Data to cluster.
Returns: labels
Return type: np.ndarray

class sknetwork.utils.WardDense[source]

Hierarchical clustering by the Ward method based on SciPy.

Variables: dendrogram_ (np.ndarray (n - 1, 4)) – Dendrogram.

References

Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-97.

fit(x: numpy.ndarray) → sknetwork.utils.ward.WardDense[source]

Apply algorithm to a dense matrix.

Parameters: x – Data to cluster.
Returns: self
Return type: WardDense

fit_transform(x: numpy.ndarray) → numpy.ndarray[source]

Apply algorithm to a dense matrix and return the dendrogram.

Parameters: x – Data to cluster.
Returns: dendrogram
Return type: np.ndarray

Nearest-neighbors

class sknetwork.utils.KNNDense(n_neighbors: int = 5, undirected: bool = False, leaf_size: int = 16, p=2, eps: float = 0.01, n_jobs=1)[source]

Extract adjacency from vector data through k-nearest-neighbor search with KD-Tree.

Parameters

n_neighbors – Number of neighbors for each sample in the transformed sparse graph.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to True forces the algorithm to return undirected graphs.
leaf_size – Leaf size passed to KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree.
p – Which Minkowski p-norm to use. 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance infinity is the maximum-coordinate-difference distance. A finite large p may cause a ValueError if overflow can occur.
eps – Return approximate nearest neighbors; the k-th returned value is guaranteed to be no further than (1+tol_nn) times the distance to the real k-th nearest neighbor.
n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.

Variables

adjacency_ – Adjacency matrix of the graph.

References

Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry (Vol. 2, pp. 1-8).

fit(x: numpy.ndarray) → sknetwork.utils.knn.KNNDense[source]

Fit algorithm to the data.

Parameters: x – Data to transform into adjacency.
Returns: self
Return type: KNNDense

fit_transform(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix

Fit algorithm to the data and return the computed adjacency.

Parameters: x (np.ndarray) – Input data.
Returns: adjacency
Return type: sparse.csr_matrix

make_undirected(): Modifies the adjacency to match desired constrains.

class sknetwork.utils.CNNDense(n_neighbors: int = 1, undirected: bool = False)[source]

Extract adjacency from vector data through component-wise k-nearest-neighbor search. KNN is applied independently on each column of the input matrix.

Parameters

n_neighbors – Number of neighbors per dimension.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to True forces the algorithm to return undirected graphs.

Variables

adjacency_ – Adjacency matrix of the graph.

fit(x: numpy.ndarray) → sknetwork.utils.knn.CNNDense[source]

Fit algorithm to the data.

Parameters: x – Data to transform into adjacency.
Returns: self
Return type: CNNDense

fit_transform(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix

Fit algorithm to the data and return the computed adjacency.

Parameters: x (np.ndarray) – Input data.
Returns: adjacency
Return type: sparse.csr_matrix

make_undirected(): Modifies the adjacency to match desired constrains.

Co-Neighborhood

sknetwork.utils.co_neighbor_graph(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], normalized: bool = True, method='knn', n_neighbors: int = 5, n_components: int = 8) → scipy.sparse.csr.csr_matrix[source]

Compute the co-neighborhood adjacency.

Graphs
Digraphs
Bigraphs

\(\tilde{A} = AF^{-1}A^T\),

where F is a weight matrix.

Parameters

adjacency – Adjacency of the input graph.
normalized – If True, F is the diagonal in-degree matrix \(F = \text{diag}(A^T1)\). Otherwise, F is the identity matrix.
method – Either 'exact' or 'knn'. If ‘exact’ the output is computed with matrix multiplication. However, the density can be much higher than in the input graph and this can trigger Memory errors. If 'knn', the co-neighborhood is approximated through KNNDense-search in an appropriate spectral embedding space.
n_neighbors – Number of neighbors for the KNNDense search. Only useful if method='knn'.
n_components – Dimension of the embedding space. Only useful if method='knn'.

Returns

adjacency – Adjacency of the co-neighborhood.

Return type

sparse.csr_matrix

Projection

sknetwork.utils.projection_simplex(x: Union[numpy.ndarray, scipy.sparse.csr.csr_matrix], scale: float = 1.0)[source]

Project each line of the input onto the Euclidean simplex i.e. solve

\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).

Parameters

x – Data to project. Either one or two dimensional. Can be sparse or dense.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Array with the same type and shape as the input.

Return type

np.ndarray or sparse.csr_matrix

Example

>>> X = np.array([[2, 2], [-0.75, 0.25]])
>>> projection_simplex(X)
array([[0.5, 0.5],
       [0. , 1. ]])
>>> X_csr = sparse.csr_matrix(X)
>>> X_proj = projection_simplex(X_csr)
>>> X_proj.nnz
3
>>> X_proj.toarray()
array([[0.5, 0.5],
       [0. , 1. ]])

References

Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008, July). Efficient projections onto the l 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning (pp. 272-279). ACM.

sknetwork.utils.projection_simplex_array(array: numpy.ndarray, scale: float = 1) → numpy.ndarray[source]

Project each line of the input onto the Euclidean simplex i.e. solve

\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).

Parameters

array (np.ndarray) – Data to project. Either one or two dimensional.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Array with the same shape as the input.

Return type

np.ndarray

Example

>>> X = np.array([[2, 2], [-0.75, 0.25]])
>>> projection_simplex_array(X)
array([[0.5, 0.5],
       [0. , 1. ]])

sknetwork.utils.projection_simplex_csr(matrix: scipy.sparse.csr.csr_matrix, scale: float = 1)[source]

Project each line of the input onto the Euclidean simplex i.e. solve

\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).

Parameters

matrix (sparse.csr_matrix) – Matrix whose rows must be projected.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Matrix with the same shape as the input.

Return type

sparse.csr_matrix

Examples

>>> X = sparse.csr_matrix(np.array([[2, 2], [-0.75, 0.25]]))
>>> X_proj = projection_simplex_csr(X)
>>> X_proj.nnz
3
>>> X_proj.toarray()
array([[0.5, 0.5],
       [0. , 1. ]])