Data

Tools for loading and saving graphs.

Edge list

sknetwork.data.from_edge_list(edge_list: ndarray | List[Tuple], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Dataset | csr_matrix

Load a graph from an edge list.

Parameters:
  • edge_list (Union[np.ndarray, List[Tuple]]) – The edge list to convert, given as a NumPy array of size (n, 2) or (n, 3) or a list of tuples of length 2 or 3.

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix.

  • weighted (bool) – If True, returns a weighted graph.

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset (including node names) or sparse matrix

Examples

>>> from sknetwork.data import from_edge_list
>>> edges = [(0, 1), (1, 2), (2, 0)]
>>> adjacency = from_edge_list(edges)
>>> adjacency.shape
(3, 3)
>>> edges = [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')]
>>> graph = from_edge_list(edges)
>>> adjacency = graph.adjacency
>>> adjacency.shape
(3, 3)
>>> print(graph.names)
['Alice' 'Bob' 'Carol']
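
The effect of reindexing and of sum_duplicates can be illustrated without the library. The sketch below is not scikit-network's implementation: edges_to_weights is a hypothetical helper that assumes node names are indexed in sorted order and stores weights in a dictionary instead of a sparse matrix.

```python
def edges_to_weights(edge_list, directed=False, sum_duplicates=True):
    """Hypothetical helper (not part of scikit-network).

    Returns (names, weights), where weights maps (i, j) index pairs to weights.
    """
    # Reindex: consecutive integers in sorted order of names (assumption).
    names = sorted({node for edge in edge_list for node in edge[:2]})
    index = {name: i for i, name in enumerate(names)}
    weights = {}
    for edge in edge_list:
        i, j = index[edge[0]], index[edge[1]]
        weight = edge[2] if len(edge) == 3 else 1  # unweighted edges count as 1
        # An undirected edge is stored in both directions.
        pairs = [(i, j)] if directed or i == j else [(i, j), (j, i)]
        for pair in pairs:
            if pair not in weights:
                weights[pair] = weight   # first occurrence of this edge
            elif sum_duplicates:
                weights[pair] += weight  # duplicates are summed
    return names, weights


edges = [('Alice', 'Bob', 1), ('Bob', 'Carol', 2), ('Alice', 'Bob', 3)]
names, summed = edges_to_weights(edges)
_, first_only = edges_to_weights(edges, sum_duplicates=False)
print(names)               # ['Alice', 'Bob', 'Carol']
print(summed[(0, 1)])      # 4
print(first_only[(0, 1)])  # 1
```

With sum_duplicates disabled, the second ('Alice', 'Bob') edge is ignored and the weight of the first occurrence is kept.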

Adjacency list

sknetwork.data.from_adjacency_list(adjacency_list: List[List] | Dict[str, List], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Dataset | csr_matrix

Load a graph from an adjacency list.

Parameters:
  • adjacency_list (Union[List[List], Dict[str, List]]) – Adjacency list (neighbors of each node) or dictionary (node: neighbors).

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix.

  • weighted (bool) – If True, returns a weighted graph.

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset or sparse matrix

Example

>>> from sknetwork.data import from_adjacency_list
>>> adjacency_list = [[1, 2], [0, 2, 3], [0, 1]]
>>> adjacency = from_adjacency_list(adjacency_list)
>>> adjacency.shape
(4, 4)
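
The shape (4, 4) for a list of three neighbor lists follows from node 3 appearing only as a neighbor. The sketch below makes this explicit by flattening an adjacency list into edges; adjacency_to_edges is a hypothetical helper written for illustration, not a library function.

```python
def adjacency_to_edges(adjacency_list):
    """Hypothetical helper: flatten an adjacency list or dict into (node, neighbor) pairs."""
    if isinstance(adjacency_list, dict):
        items = adjacency_list.items()     # node: neighbors
    else:
        items = enumerate(adjacency_list)  # position in the list is the node index
    return [(node, neighbor) for node, neighbors in items for neighbor in neighbors]


edges = adjacency_to_edges([[1, 2], [0, 2, 3], [0, 1]])
# The number of nodes is set by the largest index seen on either side of an edge.
n_nodes = 1 + max(max(edge) for edge in edges)
print(len(edges), n_nodes)  # 7 4
```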

Files

Check the tutorial for importing graphs from dataframes.

sknetwork.data.from_csv(file_path: str, delimiter: str | None = None, sep: str | None = None, comments: str = '#%', data_structure: str | None = None, directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Dataset | csr_matrix

Load a graph from a CSV or TSV file. The delimiter can be specified (e.g., ' ' for space-separated values).

Parameters:
  • file_path (str) – Path to the CSV file.

  • delimiter (str) – Delimiter used in the file. Guessed if not specified.

  • sep (str) – Alias for delimiter.

  • comments (str) – Characters for comment lines.

  • data_structure (str) – If ‘edge_list’, consider each row of the file as an edge (tuple of size 2 or 3). If ‘adjacency_list’, consider each row of the file as an adjacency list (list of neighbors, in the order of node indices; an empty line means no neighbor). If ‘adjacency_dict’, consider each row of the file as an adjacency dictionary with key given by the first column (node: list of neighbors). If None (default), data_structure is guessed from the first rows of the file.

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).

  • weighted (bool) – If True, returns a weighted graph (e.g., counts the number of occurrences of each edge).

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset or sparse matrix
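
As an illustration of the 'edge_list' layout and of the comments parameter, the following sketch parses edge rows from CSV text with the standard library. It is not how from_csv is implemented: read_edge_rows is a hypothetical helper, the delimiter must be given explicitly (no guessing), and a missing third column is assumed to mean a weight of 1.

```python
import csv
import io


def read_edge_rows(text, delimiter=',', comments='#%'):
    """Hypothetical helper: parse edge rows from CSV text, skipping comment lines."""
    # Keep non-empty lines whose first non-blank character is not a comment character.
    lines = [line for line in io.StringIO(text)
             if line.strip() and line.lstrip()[0] not in comments]
    edges = []
    for row in csv.reader(lines, delimiter=delimiter):
        source, target = row[0], row[1]
        weight = float(row[2]) if len(row) > 2 else 1.0  # optional third column
        edges.append((source, target, weight))
    return edges


text = "# source,target,weight\nA,B,2\nB,C\n% another comment\nC,A,0.5\n"
print(read_edge_rows(text))
# [('A', 'B', 2.0), ('B', 'C', 1.0), ('C', 'A', 0.5)]
```

Skipping empty lines is only appropriate for edge lists; in the 'adjacency_list' layout an empty line means a node without neighbors.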

sknetwork.data.from_graphml(file_path: str, weight_key: str = 'weight', max_string_size: int = 512) → Dataset

Load graph from GraphML file.

Hyperedges and nested graphs are not supported.

Parameters:
  • file_path (str) – Path to the GraphML file.

  • weight_key (str) – The key used to read edge weights.

  • max_string_size (int) – The maximum size of string features in the data.

Returns:

data – Dataset with the adjacency matrix in CSR format.

Return type:

Dataset
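
To show what weight_key refers to in a GraphML file, the sketch below reads weighted edges with the standard library XML parser. It is an illustration under stated assumptions, not the parser used by from_graphml: read_weighted_edges is a hypothetical helper, the sample document is made up, and the default weight of 1.0 for edges without data is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {'g': 'http://graphml.graphdrawing.org/xmlns'}  # GraphML namespace

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n1"><data key="d0">2.5</data></edge>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>"""


def read_weighted_edges(xml_text, weight_key='weight'):
    """Hypothetical helper: return (source, target, weight) triples from GraphML text."""
    root = ET.fromstring(xml_text)
    # Ids of the edge <key> declarations whose attr.name matches weight_key.
    key_ids = {key.get('id') for key in root.findall('g:key', NS)
               if key.get('for') == 'edge' and key.get('attr.name') == weight_key}
    edges = []
    for edge in root.findall('g:graph/g:edge', NS):
        weight = 1.0  # assumed default when the edge carries no weight data
        for data in edge.findall('g:data', NS):
            if data.get('key') in key_ids:
                weight = float(data.text)
        edges.append((edge.get('source'), edge.get('target'), weight))
    return edges


print(read_weighted_edges(GRAPHML))
# [('n0', 'n1', 2.5), ('n1', 'n2', 1.0)]
```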

Datasets

sknetwork.data.load_netset(name: str | None = None, data_home: str | Path | None = None, verbose: bool = True) → Dataset | None

Load a dataset from the NetSet collection.

Parameters:
  • name (str) – Name of the dataset (all lowercase). Examples include ‘openflights’, ‘cinema’ and ‘wikivitals’.

  • data_home (str or pathlib.Path) – Folder to be used for dataset storage. This folder must be empty or contain other folders (datasets); files will be removed.

  • verbose (bool) – Enable verbosity.

Returns:

dataset – Returned dataset.

Return type:

Dataset

You can also find some datasets on NetRep.

Toy graphs

sknetwork.data.house(metadata: bool = False) → csr_matrix | Dataset

House graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import house
>>> adjacency = house()
>>> adjacency.shape
(5, 5)

sknetwork.data.bow_tie(metadata: bool = False) → csr_matrix | Dataset

Bow tie graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import bow_tie
>>> adjacency = bow_tie()
>>> adjacency.shape
(5, 5)

sknetwork.data.karate_club(metadata: bool = False) → csr_matrix | Dataset

Karate club graph.

  • Undirected graph

  • 34 nodes, 78 edges

  • 2 labels

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (labels, positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import karate_club
>>> adjacency = karate_club()
>>> adjacency.shape
(34, 34)

References

Zachary’s karate club graph https://en.wikipedia.org/wiki/Zachary%27s_karate_club

sknetwork.data.miserables(metadata: bool = False) → csr_matrix | Dataset

Co-occurrence graph of the characters in the novel Les Misérables by Victor Hugo.

  • Undirected graph

  • 77 nodes, 508 edges

  • Names of characters

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import miserables
>>> adjacency = miserables()
>>> adjacency.shape
(77, 77)

sknetwork.data.painters(metadata: bool = False) → csr_matrix | Dataset

Graph of links between some famous painters on Wikipedia.

  • Directed graph

  • 14 nodes, 50 edges

  • Names of painters

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import painters
>>> adjacency = painters()
>>> adjacency.shape
(14, 14)

sknetwork.data.star_wars(metadata: bool = False) → csr_matrix | Dataset

Bipartite graph connecting some Star Wars villains to the movies in which they appear.

  • Bipartite graph

  • 7 nodes (4 villains, 3 movies), 8 edges

  • Names of villains and movies

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

biadjacency or graph – Biadjacency matrix or graph with metadata (names).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import star_wars
>>> biadjacency = star_wars()
>>> biadjacency.shape
(4, 3)

sknetwork.data.movie_actor(metadata: bool = False) → csr_matrix | Dataset

Bipartite graph connecting movies to some actors starring in them.

  • Bipartite graph

  • 32 nodes (15 movies, 17 actors), 43 edges

  • Names of movies (rows) and actors (columns)

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

biadjacency or dataset – Biadjacency matrix or dataset with metadata (names of movies and actors).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import movie_actor
>>> biadjacency = movie_actor()
>>> biadjacency.shape
(15, 17)

sknetwork.data.art_philo_science(metadata: bool = False) → csr_matrix | Dataset

Wikipedia links between 30 articles (10 artists, 10 philosophers, 10 scientists).

  • Directed graph

  • 30 nodes, 240 edges

  • Names of articles

Metadata includes the occurrence of 11 words in the abstracts of these articles.

Parameters:

metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (names, positions, labels, names_labels, biadjacency, names_col).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import art_philo_science
>>> adjacency = art_philo_science()
>>> adjacency.shape
(30, 30)

Models

sknetwork.data.linear_graph(n: int = 3, metadata: bool = False) → csr_matrix | Dataset

Linear graph (undirected).

Parameters:
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_graph
>>> adjacency = linear_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.linear_digraph(n: int = 3, metadata: bool = False) → csr_matrix | Dataset

Linear graph (directed).

Parameters:
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_digraph
>>> adjacency = linear_digraph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_graph(n: int = 3, metadata: bool = False) → csr_matrix | Dataset

Cyclic graph (undirected).

Parameters:
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_graph
>>> adjacency = cyclic_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_digraph(n: int = 3, metadata: bool = False) → csr_matrix | Dataset

Cyclic graph (directed).

Parameters:
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_digraph
>>> adjacency = cyclic_digraph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.grid(n1: int = 10, n2: int = 10, metadata: bool = False) → csr_matrix | Dataset

Grid (undirected).

Parameters:
  • n1 (int) – Grid dimension.

  • n2 (int) – Grid dimension.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import grid
>>> adjacency = grid(10, 5)
>>> adjacency.shape
(50, 50)

sknetwork.data.erdos_renyi(n: int = 20, p: float = 0.3, directed: bool = False, self_loops: bool = False, seed: int | None = None) → csr_matrix

Erdos-Renyi graph.

Parameters:
  • n – Number of nodes.

  • p – Probability of connection between nodes.

  • directed – If True, return a directed graph.

  • self_loops – If True, allow self-loops.

  • seed – Seed of the random generator (optional).

Returns:

adjacency – Adjacency matrix.

Return type:

sparse.csr_matrix

Example

>>> from sknetwork.data import erdos_renyi
>>> adjacency = erdos_renyi(7)
>>> adjacency.shape
(7, 7)

References

Erdős, P., Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae.
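
The model itself is simple to state: each pair of distinct nodes is connected independently with probability p. The pure-Python sketch below illustrates this definition; erdos_renyi_edges is a hypothetical helper that returns a list of edges rather than a sparse matrix, ignores self-loops, and is not the library's implementation.

```python
import random
from itertools import combinations, permutations


def erdos_renyi_edges(n=20, p=0.3, directed=False, seed=None):
    """Hypothetical helper: sample the edges of a G(n, p) graph without self-loops."""
    rng = random.Random(seed)
    # Ordered pairs for a directed graph, unordered pairs otherwise.
    pairs = permutations(range(n), 2) if directed else combinations(range(n), 2)
    # Each candidate pair is kept independently with probability p.
    return [pair for pair in pairs if rng.random() < p]


edges = erdos_renyi_edges(7, p=0.5, seed=42)
print(len(edges), "edges out of", 7 * 6 // 2, "possible pairs")
```

Fixing the seed makes the sample reproducible, which is the purpose of the seed parameter above.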

sknetwork.data.block_model(sizes: Iterable, p_in: float | list | ndarray = 0.2, p_out: float = 0.05, directed: bool = False, self_loops: bool = False, metadata: bool = False, seed: int | None = None) → csr_matrix | Dataset

Stochastic block model.

Parameters:
  • sizes – Block sizes.

  • p_in – Probability of connection within blocks.

  • p_out – Probability of connection across blocks.

  • directed – If True, return a directed graph.

  • self_loops – If True, allow self-loops.

  • metadata – If True, return a Dataset object with labels.

  • seed – Seed of the random generator (optional).

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (labels).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> import numpy as np
>>> from sknetwork.data import block_model
>>> sizes = np.array([4, 5])
>>> adjacency = block_model(sizes)
>>> adjacency.shape
(9, 9)

References

Airoldi, E., Blei, D., Fienberg, S., Xing, E. (2007). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.
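
The roles of sizes, p_in and p_out can be sketched in pure Python: nodes get a block label from sizes, and each pair is connected with probability p_in inside a block and p_out across blocks. block_model_edges is a hypothetical helper, restricted to the undirected case with a scalar p_in and no self-loops; it is not the library's implementation.

```python
import random
from itertools import combinations


def block_model_edges(sizes, p_in=0.2, p_out=0.05, seed=None):
    """Hypothetical helper: sample an undirected stochastic block model."""
    rng = random.Random(seed)
    # labels[i] is the block of node i, e.g. sizes (4, 5) -> [0, 0, 0, 0, 1, 1, 1, 1, 1].
    labels = [block for block, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for i, j in combinations(range(len(labels)), 2):
        # Same block: p_in; different blocks: p_out.
        p = p_in if labels[i] == labels[j] else p_out
        if rng.random() < p:
            edges.append((i, j))
    return labels, edges


labels, edges = block_model_edges([4, 5], seed=0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```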

sknetwork.data.albert_barabasi(n: int = 100, degree: int = 3, directed: bool = False, seed: int | None = None) → csr_matrix

Albert-Barabasi model.

Parameters:
  • n (int) – Number of nodes.

  • degree (int) – Degree of incoming nodes (less than n).

  • directed (bool) – If True, return a directed graph.

  • seed – Seed of the random generator (optional).

Returns:

adjacency – Adjacency matrix.

Return type:

sparse.csr_matrix

Example

>>> from sknetwork.data import albert_barabasi
>>> adjacency = albert_barabasi(30, 3)
>>> adjacency.shape
(30, 30)

References

Albert, R., Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics.
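
The model grows a graph by preferential attachment: each incoming node links to degree existing nodes chosen with probability proportional to their current degree. The sketch below follows this textbook description; preferential_attachment_edges is a hypothetical helper, and the initial clique on the first degree nodes is an assumption of the sketch, so details may differ from the library.

```python
import random


def preferential_attachment_edges(n=100, degree=3, seed=None):
    """Hypothetical helper: grow an undirected graph by preferential attachment."""
    rng = random.Random(seed)
    # Start from a clique on the first `degree` nodes (assumption of this sketch).
    edges = [(i, j) for i in range(degree) for j in range(i + 1, degree)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(degree, n):
        # Fall back to uniform sampling if no edge exists yet (degree == 1).
        pool = stubs if stubs else list(range(new))
        targets = set()
        while len(targets) < degree:
            targets.add(rng.choice(pool))
        for target in sorted(targets):
            edges.append((target, new))
            stubs.extend((target, new))
    return edges


edges = preferential_attachment_edges(30, 3, seed=0)
print(len(edges))  # 84 = 3 clique edges + 27 incoming nodes * 3 edges
```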

sknetwork.data.watts_strogatz(n: int = 100, degree: int = 6, prob: float = 0.05, seed: int | None = None, metadata: bool = False) → csr_matrix | Dataset

Watts-Strogatz model.

Parameters:
  • n – Number of nodes.

  • degree – Initial degree of nodes.

  • prob – Probability of edge modification.

  • seed – Seed of the random generator (optional).

  • metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import watts_strogatz
>>> adjacency = watts_strogatz(30, 4, 0.02)
>>> adjacency.shape
(30, 30)

References

Watts, D., Strogatz, S. (1998). Collective dynamics of small-world networks. Nature.
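
In the textbook version of this model, nodes are placed on a ring, each node is linked to its nearest neighbors, and each edge is then rewired with probability prob. The sketch below implements that procedure in pure Python; watts_strogatz_edges is a hypothetical helper that assumes an even degree smaller than n, and the library's implementation may differ in its details.

```python
import random


def watts_strogatz_edges(n=100, degree=6, prob=0.05, seed=None):
    """Hypothetical helper: ring lattice with random rewiring (textbook procedure)."""
    rng = random.Random(seed)
    # Ring lattice: each node is linked to its degree // 2 nearest neighbors on each side.
    edges = {frozenset((i, (i + k) % n))
             for i in range(n) for k in range(1, degree // 2 + 1)}
    # Visit the initial edges in a fixed order so that results depend only on the seed.
    for edge in sorted(edges, key=sorted):
        if rng.random() < prob:
            i = min(edge)  # endpoint that is kept
            # New endpoints: any node other than i that is not already linked to i.
            candidates = [j for j in range(n)
                          if j != i and frozenset((i, j)) not in edges]
            if candidates:
                edges.remove(edge)
                edges.add(frozenset((i, rng.choice(candidates))))
    return sorted(tuple(sorted(edge)) for edge in edges)


edges = watts_strogatz_edges(30, 4, 0.02, seed=1)
print(len(edges))  # 60: rewiring moves edges but keeps their number
```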

Save

sknetwork.data.save(folder: str | Path, data: csr_matrix | Dataset)

Save a dataset or a CSR matrix in the current directory as a collection of NumPy and Pickle files for faster subsequent loads. Supported attribute types include sparse matrices, NumPy arrays, strings and Dataset objects.

Parameters:
  • folder (str or pathlib.Path) – Name of the bundle folder.

  • data (Union[sparse.csr_matrix, Dataset]) – Data to save.

Example

>>> import numpy as np
>>> from os import listdir
>>> from scipy import sparse
>>> from sknetwork.data import save
>>> from sknetwork.data.base import Dataset
>>> dataset = Dataset()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> 'dataset' in listdir('.')
True

sknetwork.data.load(folder: str | Path)

Load a dataset from a previously created bundle from the current directory (inverse function of save).

Parameters:

folder (str or pathlib.Path) – Name of the bundle folder.

Returns:

data – Data.

Return type:

Dataset

Example

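>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import load, save
>>> from sknetwork.data.base import Dataset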
>>> dataset = Dataset()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> dataset = load('dataset')
>>> print(dataset.names)
['a' 'b' 'c']