Utils
Various tools for graph analysis.
Convert graphs
- sknetwork.utils.bipartite2directed(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR] [source]
Adjacency matrix of the directed graph associated with a bipartite graph (with edges from one part to the other).
The returned adjacency matrix is:
\(A = \begin{bmatrix} 0 & B \\ 0 & 0 \end{bmatrix}\)
where \(B\) is the biadjacency matrix.
- Parameters
biadjacency – Biadjacency matrix of the graph.
- Returns
Adjacency matrix (same format as input).
- Return type
adjacency
- sknetwork.utils.bipartite2undirected(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR] [source]
Adjacency matrix of a bigraph defined by its biadjacency matrix.
The returned adjacency matrix is:
\(A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}\)
where \(B\) is the biadjacency matrix of the bipartite graph.
- Parameters
biadjacency – Biadjacency matrix of the graph.
- Returns
Adjacency matrix (same format as input).
- Return type
adjacency
- sknetwork.utils.directed2undirected(adjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR], weighted: bool = True) Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR] [source]
Adjacency matrix of the undirected graph associated with some directed graph.
The new adjacency matrix becomes either:
\(A+A^T\) (default)
or
\(\max(A,A^T)\)
If the initial adjacency matrix \(A\) is binary, bidirectional edges have weight 2 (first method, default) or 1 (second method).
- Parameters
adjacency – Adjacency matrix.
weighted – If
True
, return the sum of the weights in both directions of each edge.
- Returns
New adjacency matrix (same format as input).
- Return type
new_adjacency
Get neighbors
- sknetwork.utils.get_neighbors(input_matrix: scipy.sparse.csr.csr_matrix, node: int, transpose: bool = False) numpy.ndarray [source]
Get the neighbors of a node.
- Parameters
input_matrix (sparse.csr_matrix) – Adjacency or biadjacency matrix.
node (int) – Target node.
transpose – If
True
, transpose the input matrix.
- Returns
neighbors – Array of neighbors of the target node (successors for directed graphs; set
transpose=True
for predecessors).- Return type
np.ndarray
Example
>>> from sknetwork.data import house >>> adjacency = house() >>> get_neighbors(adjacency, node=0) array([1, 4], dtype=int32)
Clustering
- sknetwork.utils.membership_matrix(labels: numpy.ndarray, dtype=<class 'bool'>, n_labels: Optional[int] = None) scipy.sparse.csr.csr_matrix [source]
Build a n x k matrix of the label assignments, with k the number of labels. Negative labels are ignored.
- Parameters
labels – Label of each node.
dtype – Type of the entries. Boolean by default.
n_labels (int) – Number of labels.
- Returns
membership – Binary matrix of label assignments.
- Return type
sparse.csr_matrix
Notes
The inverse operation is simply
labels = membership.indices
.
- class sknetwork.utils.KMeansDense(n_clusters: int = 8, init: str = '++', n_init: int = 10, tol: float = 0.0001)[source]
Standard KMeansDense clustering based on SciPy function
kmeans2
.- Parameters
n_clusters – Number of desired clusters.
init – Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’: * ‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data. * ‘points’: choose k observations (rows) at random from data for the initial centroids. * ‘++’: choose k observations accordingly to the kmeans++ method (careful seeding) * ‘matrix’: interpret the k parameter as a k by M (or length k array for one-dimensional data) array of initial centroids.
n_init – Number of iterations of the k-means algorithm to run.
tol – Relative tolerance with regards to inertia to declare convergence.
- Variables
labels_ – Label of each sample.
cluster_centers_ – A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.
References
MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297).
Arthur, D., & Vassilvitskii, S. (2007, January). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027-1035). Society for Industrial and Applied Mathematics.
- fit(x: numpy.ndarray) sknetwork.utils.kmeans.KMeansDense [source]
Fit algorithm to the data.
- Parameters
x – Data to cluster.
- Returns
self
- Return type
- class sknetwork.utils.WardDense[source]
Hierarchical clustering by the Ward method based on SciPy.
- Variables
dendrogram_ (np.ndarray (n - 1, 4)) – Dendrogram.
References
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-97.
- fit(x: numpy.ndarray) sknetwork.utils.ward.WardDense [source]
Apply algorithm to a dense matrix.
- Parameters
x – Data to cluster.
- Returns
self
- Return type
Nearest-neighbors
- class sknetwork.utils.KNNDense(n_neighbors: int = 5, undirected: bool = False, leaf_size: int = 16, p=2, eps: float = 0.01, n_jobs=1)[source]
Extract adjacency from vector data through k-nearest-neighbor search with KD-Tree.
- Parameters
n_neighbors – Number of neighbors for each sample in the transformed sparse graph.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to
True
forces the algorithm to return undirected graphs.leaf_size – Leaf size passed to KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree.
p – Which Minkowski p-norm to use. 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance infinity is the maximum-coordinate-difference distance. A finite large p may cause a ValueError if overflow can occur.
eps – Return approximate nearest neighbors; the k-th returned value is guaranteed to be no further than (1+tol_nn) times the distance to the real k-th nearest neighbor.
n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.
- Variables
adjacency_ – Adjacency matrix of the graph.
References
Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry (Vol. 2, pp. 1-8).
- fit(x: numpy.ndarray) sknetwork.utils.knn.KNNDense [source]
Fit algorithm to the data.
- Parameters
x – Data to transform into adjacency.
- Returns
self
- Return type
- fit_transform(x: numpy.ndarray) scipy.sparse.csr.csr_matrix
Fit algorithm to the data and return the computed adjacency.
- Parameters
x (np.ndarray) – Input data.
- Returns
adjacency
- Return type
sparse.csr_matrix
- make_undirected()
Modifies the adjacency to match desired constrains.
- class sknetwork.utils.CNNDense(n_neighbors: int = 1, undirected: bool = False)[source]
Extract adjacency from vector data through component-wise k-nearest-neighbor search. KNN is applied independently on each column of the input matrix.
- Parameters
n_neighbors – Number of neighbors per dimension.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to
True
forces the algorithm to return undirected graphs.
- Variables
adjacency_ – Adjacency matrix of the graph.
- fit(x: numpy.ndarray) sknetwork.utils.knn.CNNDense [source]
Fit algorithm to the data.
- Parameters
x – Data to transform into adjacency.
- Returns
self
- Return type
- fit_transform(x: numpy.ndarray) scipy.sparse.csr.csr_matrix
Fit algorithm to the data and return the computed adjacency.
- Parameters
x (np.ndarray) – Input data.
- Returns
adjacency
- Return type
sparse.csr_matrix
- make_undirected()
Modifies the adjacency to match desired constrains.
Co-Neighborhood
- sknetwork.utils.co_neighbor_graph(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], normalized: bool = True, method='knn', n_neighbors: int = 5, n_components: int = 8) scipy.sparse.csr.csr_matrix [source]
Compute the co-neighborhood adjacency.
Graphs
Digraphs
Bigraphs
\(\tilde{A} = AF^{-1}A^T\),
where F is a weight matrix.
- Parameters
adjacency – Adjacency of the input graph.
normalized – If
True
, F is the diagonal in-degree matrix \(F = \text{diag}(A^T1)\). Otherwise, F is the identity matrix.method – Either
'exact'
or'knn'
. If ‘exact’ the output is computed with matrix multiplication. However, the density can be much higher than in the input graph and this can trigger Memory errors. If'knn'
, the co-neighborhood is approximated through KNNDense-search in an appropriate spectral embedding space.n_neighbors – Number of neighbors for the KNNDense search. Only useful if
method='knn'
.n_components – Dimension of the embedding space. Only useful if
method='knn'
.
- Returns
adjacency – Adjacency of the co-neighborhood.
- Return type
sparse.csr_matrix
Projection
- sknetwork.utils.projection_simplex(x: Union[numpy.ndarray, scipy.sparse.csr.csr_matrix], scale: float = 1.0)[source]
Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
- Parameters
x – Data to project. Either one or two dimensional. Can be sparse or dense.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
- Returns
projection – Array with the same type and shape as the input.
- Return type
np.ndarray or sparse.csr_matrix
Example
>>> X = np.array([[2, 2], [-0.75, 0.25]]) >>> projection_simplex(X) array([[0.5, 0.5], [0. , 1. ]]) >>> X_csr = sparse.csr_matrix(X) >>> X_proj = projection_simplex(X_csr) >>> X_proj.nnz 3 >>> X_proj.toarray() array([[0.5, 0.5], [0. , 1. ]])
References
Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008, July). Efficient projections onto the l 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning (pp. 272-279). ACM.
- sknetwork.utils.projection_simplex_array(array: numpy.ndarray, scale: float = 1) numpy.ndarray [source]
Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
- Parameters
array (np.ndarray) – Data to project. Either one or two dimensional.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
- Returns
projection – Array with the same shape as the input.
- Return type
np.ndarray
Example
>>> X = np.array([[2, 2], [-0.75, 0.25]]) >>> projection_simplex_array(X) array([[0.5, 0.5], [0. , 1. ]])
- sknetwork.utils.projection_simplex_csr(matrix: scipy.sparse.csr.csr_matrix, scale: float = 1)[source]
Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} ||w - x_i||_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
- Parameters
matrix (sparse.csr_matrix) – Matrix whose rows must be projected.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
- Returns
projection – Matrix with the same shape as the input.
- Return type
sparse.csr_matrix
Examples
>>> X = sparse.csr_matrix(np.array([[2, 2], [-0.75, 0.25]])) >>> X_proj = projection_simplex_csr(X) >>> X_proj.nnz 3 >>> X_proj.toarray() array([[0.5, 0.5], [0. , 1. ]])