Utils¶
Various tools.
Build graphs¶

sknetwork.utils.
edgelist2adjacency
(edgelist: list, undirected: bool = False) → scipy.sparse.csr.csr_matrix[source]¶ Build an adjacency matrix from a list of edges.
 Parameters
edgelist (list) – List of edges as pairs (i, j) or triplets (i, j, w) for weighted edges.
undirected (bool) – If
True
, return a symmetric adjacency.
 Returns
adjacency
 Return type
sparse.csr_matrix
Examples
>>> edgelist = [(0, 1), (1, 2), (2, 0)] >>> adjacency = edgelist2adjacency(edgelist) >>> adjacency.shape, adjacency.nnz ((3, 3), 3) >>> adjacency = edgelist2adjacency(edgelist, undirected=True) >>> adjacency.shape, adjacency.nnz ((3, 3), 6) >>> weighted_edgelist = [(0, 1, 0.2), (1, 2, 4), (2, 0, 1.3)] >>> adjacency = edgelist2adjacency(weighted_edgelist) >>> adjacency.dtype dtype('float64')

sknetwork.utils.
edgelist2biadjacency
(edgelist: list) → scipy.sparse.csr.csr_matrix[source]¶ Build a biadjacency matrix from a list of edges.
 Parameters
edgelist (list) – List of edges as pairs (i, j) or triplets (i, j, w) for weighted edges.
 Returns
biadjacency
 Return type
sparse.csr_matrix
Examples
>>> edgelist = [(0, 0), (1, 0), (1, 1), (2, 1)] >>> biadjacency = edgelist2biadjacency(edgelist) >>> biadjacency.shape, biadjacency.nnz ((3, 2), 4) >>> weighted_edgelist = [(0, 0, 0.5), (1, 0, 1), (1, 1, 1), (2, 1, 2)] >>> biadjacency = edgelist2biadjacency(weighted_edgelist) >>> biadjacency.dtype dtype('float64')
Convert graphs¶

sknetwork.utils.
bipartite2directed
(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]¶ Adjacency matrix of the directed graph associated with a bipartite graph (with edges from one part to the other).
The returned adjacency matrix is:
\(A = \begin{bmatrix} 0 & B \\ 0 & 0 \end{bmatrix}\)
where \(B\) is the biadjacency matrix.
 Parameters
biadjacency – Biadjacency matrix of the graph.
 Returns
Adjacency matrix (same format as input).
 Return type
adjacency

sknetwork.utils.
bipartite2undirected
(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]¶ Adjacency matrix of a bigraph defined by its biadjacency matrix.
The returned adjacency matrix is:
\(A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}\)
where \(B\) is the biadjacency matrix of the bipartite graph.
 Parameters
biadjacency – Biadjacency matrix of the graph.
 Returns
Adjacency matrix (same format as input).
 Return type
adjacency

sknetwork.utils.
directed2undirected
(adjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR], weighted: bool = True) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]¶ Adjacency matrix of the undirected graph associated with some directed graph.
The new adjacency matrix becomes either:
\(A+A^T\) (default)
or
\(\max(A,A^T)\)
If the initial adjacency matrix \(A\) is binary, bidirectional edges have weight 2 (first method, default) or 1 (second method).
 Parameters
adjacency – Adjacency matrix.
weighted – If
True
, return the sum of the weights in both directions of each edge.
 Returns
New adjacency matrix (same format as input).
 Return type
new_adjacency
Clustering¶

sknetwork.utils.
membership_matrix
(labels: numpy.ndarray, dtype=<class 'bool'>, n_labels: Optional[int] = None) → scipy.sparse.csr.csr_matrix[source]¶ Build a n x k matrix of the label assignments, with k the number of labels. Negative labels are ignored.
 Parameters
labels – Label of each node.
dtype – Type of the entries. Boolean by default.
n_labels (int) – Number of labels.
 Returns
membership – Binary matrix of label assignments.
 Return type
sparse.csr_matrix
Notes
The inverse operation is simply
labels = membership.indices
.

class
sknetwork.utils.
KMeansDense
(n_clusters: int = 8, init: str = '++', n_init: int = 10, tol: float = 0.0001)[source]¶ Standard KMeansDense clustering based on SciPy function
kmeans2
. Parameters
n_clusters – Number of desired clusters.
init – Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’: * ‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data. * ‘points’: choose k observations (rows) at random from data for the initial centroids. * ‘++’: choose k observations accordingly to the kmeans++ method (careful seeding) * ‘matrix’: interpret the k parameter as a k by M (or length k array for onedimensional data) array of initial centroids.
n_init – Number of iterations of the kmeans algorithm to run.
tol – Relative tolerance with regards to inertia to declare convergence.
 Variables
labels_ – Label of each sample.
cluster_centers_ – A ‘k’ by ‘N’ array of centroids found at the last iteration of kmeans.
References
MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281297).
Arthur, D., & Vassilvitskii, S. (2007, January). kmeans++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACMSIAM symposium on Discrete algorithms (pp. 10271035). Society for Industrial and Applied Mathematics.

class
sknetwork.utils.
WardDense
[source]¶ Hierarchical clustering by the Ward method based on SciPy.
 Variables
dendrogram_ (np.ndarray (n  1, 4)) – Dendrogram.
References
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 8697.
Nearestneighbors¶

class
sknetwork.utils.
KNNDense
(n_neighbors: int = 5, undirected: bool = False, leaf_size: int = 16, p=2, eps: float = 0.01, n_jobs=1)[source]¶ Extract adjacency from vector data through knearestneighbor search with KDTree.
 Parameters
n_neighbors – Number of neighbors for each sample in the transformed sparse graph.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to
True
forces the algorithm to return undirected graphs.leaf_size – Leaf size passed to KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree.
p – Which Minkowski pnorm to use. 1 is the sumofabsolutevalues “Manhattan” distance, 2 is the usual Euclidean distance infinity is the maximumcoordinatedifference distance. A finite large p may cause a ValueError if overflow can occur.
eps – Return approximate nearest neighbors; the kth returned value is guaranteed to be no further than (1+tol_nn) times the distance to the real kth nearest neighbor.
n_jobs – Number of jobs to schedule for parallel processing. If 1 is given all processors are used.
 Variables
adjacency_ – Adjacency matrix of the graph.
References
Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry (Vol. 2, pp. 18).

fit
(x: numpy.ndarray) → sknetwork.utils.knn.KNNDense[source]¶ Fit algorithm to the data.
 Parameters
x – Data to transform into adjacency.
 Returns
self
 Return type

fit_transform
(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix¶ Fit algorithm to the data and return the computed adjacency.
 Parameters
x (np.ndarray) – Input data.
 Returns
adjacency
 Return type
sparse.csr_matrix

make_undirected
()¶ Modifies the adjacency to match desired constrains.

class
sknetwork.utils.
CNNDense
(n_neighbors: int = 1, undirected: bool = False)[source]¶ Extract adjacency from vector data through componentwise knearestneighbor search. KNN is applied independently on each column of the input matrix.
 Parameters
n_neighbors – Number of neighbors per dimension.
undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to
True
forces the algorithm to return undirected graphs.
 Variables
adjacency_ – Adjacency matrix of the graph.

fit
(x: numpy.ndarray) → sknetwork.utils.knn.CNNDense[source]¶ Fit algorithm to the data.
 Parameters
x – Data to transform into adjacency.
 Returns
self
 Return type

fit_transform
(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix¶ Fit algorithm to the data and return the computed adjacency.
 Parameters
x (np.ndarray) – Input data.
 Returns
adjacency
 Return type
sparse.csr_matrix

make_undirected
()¶ Modifies the adjacency to match desired constrains.
CoNeighborhood¶

sknetwork.utils.
co_neighbor_graph
(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], normalized: bool = True, method='knn', n_neighbors: int = 5, n_components: int = 8) → scipy.sparse.csr.csr_matrix[source]¶ Compute the coneighborhood adjacency.
Graphs
Digraphs
Bigraphs
\(\tilde{A} = AF^{1}A^T\),
where F is a weight matrix.
 Parameters
adjacency – Adjacency of the input graph.
normalized – If
True
, F is the diagonal indegree matrix \(F = \text{diag}(A^T1)\). Otherwise, F is the identity matrix.method – Either
'exact'
or'knn'
. If ‘exact’ the output is computed with matrix multiplication. However, the density can be much higher than in the input graph and this can trigger Memory errors. If'knn'
, the coneighborhood is approximated through KNNDensesearch in an appropriate spectral embedding space.n_neighbors – Number of neighbors for the KNNDense search. Only useful if
method='knn'
.n_components – Dimension of the embedding space. Only useful if
method='knn'
.
 Returns
adjacency – Adjacency of the coneighborhood.
 Return type
sparse.csr_matrix
Projection¶

sknetwork.utils.
projection_simplex
(x: Union[numpy.ndarray, scipy.sparse.csr.csr_matrix], scale: float = 1.0)[source]¶ Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} w  x_i_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
 Parameters
x – Data to project. Either one or two dimensional. Can be sparse or dense.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
 Returns
projection – Array with the same type and shape as the input.
 Return type
np.ndarray or sparse.csr_matrix
Example
>>> X = np.array([[2, 2], [0.75, 0.25]]) >>> projection_simplex(X) array([[0.5, 0.5], [0. , 1. ]]) >>> X_csr = sparse.csr_matrix(X) >>> X_proj = projection_simplex(X_csr) >>> X_proj.nnz 3 >>> X_proj.toarray() array([[0.5, 0.5], [0. , 1. ]])
References
Duchi, J., ShalevShwartz, S., Singer, Y., & Chandra, T. (2008, July). Efficient projections onto the l 1ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning (pp. 272279). ACM.

sknetwork.utils.
projection_simplex_array
(array: numpy.ndarray, scale: float = 1) → numpy.ndarray[source]¶ Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} w  x_i_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
 Parameters
array (np.ndarray) – Data to project. Either one or two dimensional.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
 Returns
projection – Array with the same shape as the input.
 Return type
np.ndarray
Example
>>> X = np.array([[2, 2], [0.75, 0.25]]) >>> projection_simplex_array(X) array([[0.5, 0.5], [0. , 1. ]])

sknetwork.utils.
projection_simplex_csr
(matrix: scipy.sparse.csr.csr_matrix, scale: float = 1)[source]¶ Project each line of the input onto the Euclidean simplex i.e. solve
\(\underset{w}{min} w  x_i_2^2\) s.t. \(\sum w_j = z, w_j \ge 0\).
 Parameters
matrix (sparse.csr_matrix) – Matrix whose rows must be projected.
scale (float) – Scale of the simplex i.e. sums of the projected coefficients.
 Returns
projection – Matrix with the same shape as the input.
 Return type
sparse.csr_matrix
Examples
>>> X = sparse.csr_matrix(np.array([[2, 2], [0.75, 0.25]])) >>> X_proj = projection_simplex_csr(X) >>> X_proj.nnz 3 >>> X_proj.toarray() array([[0.5, 0.5], [0. , 1. ]])