# Utils¶

Various tools.

## Build graphs¶

sknetwork.utils.edgelist2adjacency(edgelist: list, undirected: bool = False) → scipy.sparse.csr.csr_matrix[source]

Build an adjacency matrix from a list of edges.

Parameters
• edgelist (list) – List of edges as pairs (i, j) or triplets (i, j, w) for weighted edges.

• undirected (bool) – If True, return a symmetric adjacency.

Returns

Return type

sparse.csr_matrix

Examples

>>> edgelist = [(0, 1), (1, 2), (2, 0)]
((3, 3), 3)
((3, 3), 6)
>>> weighted_edgelist = [(0, 1, 0.2), (1, 2, 4), (2, 0, 1.3)]
dtype('float64')

sknetwork.utils.edgelist2biadjacency(edgelist: list) → scipy.sparse.csr.csr_matrix[source]

Build a biadjacency matrix from a list of edges.

Parameters

edgelist (list) – List of edges as pairs (i, j) or triplets (i, j, w) for weighted edges.

Returns

Return type

sparse.csr_matrix

Examples

>>> edgelist = [(0, 0), (1, 0), (1, 1), (2, 1)]
((3, 2), 4)
>>> weighted_edgelist = [(0, 0, 0.5), (1, 0, 1), (1, 1, 1), (2, 1, 2)]
dtype('float64')


## Convert graphs¶

sknetwork.utils.bipartite2directed(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

Adjacency matrix of the directed graph associated with a bipartite graph (with edges from one part to the other).

$$A = \begin{bmatrix} 0 & B \\ 0 & 0 \end{bmatrix}$$

where $$B$$ is the biadjacency matrix.

Parameters

Returns

Adjacency matrix (same format as input).

Return type

sknetwork.utils.bipartite2undirected(biadjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR]) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

$$A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}$$

where $$B$$ is the biadjacency matrix of the bipartite graph.

Parameters

Returns

Adjacency matrix (same format as input).

Return type

sknetwork.utils.directed2undirected(adjacency: Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR], weighted: bool = True) → Union[scipy.sparse.csr.csr_matrix, sknetwork.linalg.sparse_lowrank.SparseLR][source]

Adjacency matrix of the undirected graph associated with some directed graph.

The new adjacency matrix becomes either:

$$A+A^T$$ (default)

or

$$\max(A,A^T)$$

If the initial adjacency matrix $$A$$ is binary, bidirectional edges have weight 2 (first method, default) or 1 (second method).

Parameters

• weighted – If True, return the sum of the weights in both directions of each edge.

Returns

New adjacency matrix (same format as input).

Return type

## Clustering¶

sknetwork.utils.membership_matrix(labels: numpy.ndarray, dtype=<class 'bool'>, n_labels: Optional[int] = None) → scipy.sparse.csr.csr_matrix[source]

Build a n x k matrix of the label assignments, with k the number of labels. Negative labels are ignored.

Parameters
• labels – Label of each node.

• dtype – Type of the entries. Boolean by default.

• n_labels (int) – Number of labels.

Returns

membership – Binary matrix of label assignments.

Return type

sparse.csr_matrix

Notes

The inverse operation is simply labels = membership.indices.

class sknetwork.utils.KMeansDense(n_clusters: int = 8, init: str = '++', n_init: int = 10, tol: float = 0.0001)[source]

Standard KMeansDense clustering based on SciPy function kmeans2.

Parameters
• n_clusters – Number of desired clusters.

• init – Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’: * ‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data. * ‘points’: choose k observations (rows) at random from data for the initial centroids. * ‘++’: choose k observations accordingly to the kmeans++ method (careful seeding) * ‘matrix’: interpret the k parameter as a k by M (or length k array for one-dimensional data) array of initial centroids.

• n_init – Number of iterations of the k-means algorithm to run.

• tol – Relative tolerance with regards to inertia to declare convergence.

Variables
• labels_ – Label of each sample.

• cluster_centers_ – A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.

References

• MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297).

• Arthur, D., & Vassilvitskii, S. (2007, January). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027-1035). Society for Industrial and Applied Mathematics.

fit(x: numpy.ndarray) → sknetwork.utils.kmeans.KMeansDense[source]

Fit algorithm to the data.

Parameters

x – Data to cluster.

Returns

self

Return type

KMeansDense

fit_transform(x: numpy.ndarray) → numpy.ndarray[source]

Fit algorithm to the data and return the labels.

Parameters

x – Data to cluster.

Returns

labels

Return type

np.ndarray

class sknetwork.utils.WardDense[source]

Hierarchical clustering by the Ward method based on SciPy.

Variables

dendrogram_ (np.ndarray (n - 1, 4)) – Dendrogram.

References

• Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.

• Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-97.

fit(x: numpy.ndarray) → sknetwork.utils.ward.WardDense[source]

Apply algorithm to a dense matrix.

Parameters

x – Data to cluster.

Returns

self

Return type

WardDense

fit_transform(x: numpy.ndarray) → numpy.ndarray[source]

Apply algorithm to a dense matrix and return the dendrogram.

Parameters

x – Data to cluster.

Returns

dendrogram

Return type

np.ndarray

## Nearest-neighbors¶

class sknetwork.utils.KNNDense(n_neighbors: int = 5, undirected: bool = False, leaf_size: int = 16, p=2, eps: float = 0.01, n_jobs=1)[source]

Extract adjacency from vector data through k-nearest-neighbor search with KD-Tree.

Parameters
• n_neighbors – Number of neighbors for each sample in the transformed sparse graph.

• undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to True forces the algorithm to return undirected graphs.

• leaf_size – Leaf size passed to KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree.

• p – Which Minkowski p-norm to use. 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance infinity is the maximum-coordinate-difference distance. A finite large p may cause a ValueError if overflow can occur.

• eps – Return approximate nearest neighbors; the k-th returned value is guaranteed to be no further than (1+tol_nn) times the distance to the real k-th nearest neighbor.

• n_jobs – Number of jobs to schedule for parallel processing. If -1 is given all processors are used.

Variables

References

Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry (Vol. 2, pp. 1-8).

fit(x: numpy.ndarray) → sknetwork.utils.knn.KNNDense[source]

Fit algorithm to the data.

Parameters

x – Data to transform into adjacency.

Returns

self

Return type

KNNDense

fit_transform(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix

Fit algorithm to the data and return the computed adjacency.

Parameters

x (np.ndarray) – Input data.

Returns

Return type

sparse.csr_matrix

make_undirected()

Modifies the adjacency to match desired constrains.

class sknetwork.utils.CNNDense(n_neighbors: int = 1, undirected: bool = False)[source]

Extract adjacency from vector data through component-wise k-nearest-neighbor search. KNN is applied independently on each column of the input matrix.

Parameters
• n_neighbors – Number of neighbors per dimension.

• undirected – As the nearest neighbor relationship is not symmetric, the graph is directed by default. Setting this parameter to True forces the algorithm to return undirected graphs.

Variables

fit(x: numpy.ndarray) → sknetwork.utils.knn.CNNDense[source]

Fit algorithm to the data.

Parameters

x – Data to transform into adjacency.

Returns

self

Return type

CNNDense

fit_transform(x: numpy.ndarray) → scipy.sparse.csr.csr_matrix

Fit algorithm to the data and return the computed adjacency.

Parameters

x (np.ndarray) – Input data.

Returns

Return type

sparse.csr_matrix

make_undirected()

Modifies the adjacency to match desired constrains.

## Co-Neighborhood¶

sknetwork.utils.co_neighbor_graph(adjacency: Union[scipy.sparse.csr.csr_matrix, numpy.ndarray], normalized: bool = True, method='knn', n_neighbors: int = 5, n_components: int = 8) → scipy.sparse.csr.csr_matrix[source]

• Graphs

• Digraphs

• Bigraphs

$$\tilde{A} = AF^{-1}A^T$$,

where F is a weight matrix.

Parameters

• normalized – If True, F is the diagonal in-degree matrix $$F = \text{diag}(A^T1)$$. Otherwise, F is the identity matrix.

• method – Either 'exact' or 'knn'. If ‘exact’ the output is computed with matrix multiplication. However, the density can be much higher than in the input graph and this can trigger Memory errors. If 'knn', the co-neighborhood is approximated through KNNDense-search in an appropriate spectral embedding space.

• n_neighbors – Number of neighbors for the KNNDense search. Only useful if method='knn'.

• n_components – Dimension of the embedding space. Only useful if method='knn'.

Returns

Return type

sparse.csr_matrix

## Projection¶

sknetwork.utils.projection_simplex(x: Union[numpy.ndarray, scipy.sparse.csr.csr_matrix], scale: float = 1.0)[source]

Project each line of the input onto the Euclidean simplex i.e. solve

$$\underset{w}{min} ||w - x_i||_2^2$$ s.t. $$\sum w_j = z, w_j \ge 0$$.

Parameters
• x – Data to project. Either one or two dimensional. Can be sparse or dense.

• scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Array with the same type and shape as the input.

Return type

np.ndarray or sparse.csr_matrix

Example

>>> X = np.array([[2, 2], [-0.75, 0.25]])
>>> projection_simplex(X)
array([[0.5, 0.5],
[0. , 1. ]])
>>> X_csr = sparse.csr_matrix(X)
>>> X_proj = projection_simplex(X_csr)
>>> X_proj.nnz
3
>>> X_proj.toarray()
array([[0.5, 0.5],
[0. , 1. ]])


References

Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008, July). Efficient projections onto the l 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning (pp. 272-279). ACM.

sknetwork.utils.projection_simplex_array(array: numpy.ndarray, scale: float = 1) → numpy.ndarray[source]

Project each line of the input onto the Euclidean simplex i.e. solve

$$\underset{w}{min} ||w - x_i||_2^2$$ s.t. $$\sum w_j = z, w_j \ge 0$$.

Parameters
• array (np.ndarray) – Data to project. Either one or two dimensional.

• scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Array with the same shape as the input.

Return type

np.ndarray

Example

>>> X = np.array([[2, 2], [-0.75, 0.25]])
>>> projection_simplex_array(X)
array([[0.5, 0.5],
[0. , 1. ]])

sknetwork.utils.projection_simplex_csr(matrix: scipy.sparse.csr.csr_matrix, scale: float = 1)[source]

Project each line of the input onto the Euclidean simplex i.e. solve

$$\underset{w}{min} ||w - x_i||_2^2$$ s.t. $$\sum w_j = z, w_j \ge 0$$.

Parameters
• matrix (sparse.csr_matrix) – Matrix whose rows must be projected.

• scale (float) – Scale of the simplex i.e. sums of the projected coefficients.

Returns

projection – Matrix with the same shape as the input.

Return type

sparse.csr_matrix

Examples

>>> X = sparse.csr_matrix(np.array([[2, 2], [-0.75, 0.25]]))
>>> X_proj = projection_simplex_csr(X)
>>> X_proj.nnz
3
>>> X_proj.toarray()
array([[0.5, 0.5],
[0. , 1. ]])