Data

Tools for loading and saving graphs.

Edge list

sknetwork.data.from_edge_list(edge_list: ndarray | List[Tuple], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Bunch | csr_matrix

Load a graph from an edge list.

Parameters:

edge_list (Union[np.ndarray, List[Tuple]]) – The edge list to convert, given as a NumPy array of size (n, 2) or (n, 3) or a list of tuples of length 2 or 3.
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix.
weighted (bool) – If True, returns a weighted graph.
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset (including node names) or sparse matrix

Examples

>>> from sknetwork.data import from_edge_list
>>> edges = [(0, 1), (1, 2), (2, 0)]
>>> adjacency = from_edge_list(edges)
>>> adjacency.shape
(3, 3)
>>> edges = [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')]
>>> graph = from_edge_list(edges)
>>> adjacency = graph.adjacency
>>> adjacency.shape
(3, 3)
>>> print(graph.names)
['Alice' 'Bob' 'Carol']
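To make the reindex and sum_duplicates options concrete, here is a minimal pure-Python sketch of how an edge list can be mapped to node names and edge weights. It is an illustration only, not scikit-network's implementation: the helper edge_list_to_weights is hypothetical, and it assumes that names are sorted (consistent with the output above) and that an edge without a weight counts as 1.

```python
def edge_list_to_weights(edge_list, sum_duplicates=True):
    """Hypothetical helper: return (names, weights) for an edge list."""
    # Sort the node labels to get a stable index for each node.
    names = sorted({node for edge in edge_list for node in edge[:2]})
    index = {name: i for i, name in enumerate(names)}
    weights = {}
    for edge in edge_list:
        key = (index[edge[0]], index[edge[1]])
        weight = edge[2] if len(edge) == 3 else 1  # assumed default weight
        if key not in weights:
            weights[key] = weight      # first occurrence of this edge
        elif sum_duplicates:
            weights[key] += weight     # accumulate duplicate edges
        # otherwise the weight of the first occurrence is kept
    return names, weights


edges = [('Alice', 'Bob', 2), ('Bob', 'Carol', 1), ('Alice', 'Bob', 3)]
names, summed = edge_list_to_weights(edges)
_, first = edge_list_to_weights(edges, sum_duplicates=False)
print(names)   # ['Alice', 'Bob', 'Carol']
print(summed)  # {(0, 1): 5, (1, 2): 1}
print(first)   # {(0, 1): 2, (1, 2): 1}
```

With sum_duplicates=True the two ('Alice', 'Bob') edges are merged into a single edge of weight 5; otherwise the weight of the first occurrence (2) is kept.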

Adjacency list

sknetwork.data.from_adjacency_list(adjacency_list: List[List] | Dict[str, List], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Bunch | csr_matrix

Load a graph from an adjacency list.

Parameters:

adjacency_list (Union[List[List], Dict[str, List]]) – Adjacency list (neighbors of each node) or dictionary (node: neighbors).
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix.
weighted (bool) – If True, returns a weighted graph.
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset or sparse matrix

Example

>>> from sknetwork.data import from_adjacency_list
>>> adjacency_list = [[1, 2], [0, 2, 3], [0, 1]]
>>> adjacency = from_adjacency_list(adjacency_list)
>>> adjacency.shape
(4, 4)
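The shape (4, 4) above follows from node 3, which appears only as a neighbor. The sketch below expands an adjacency list, or an adjacency dictionary, into (node, neighbor) pairs to show this. The helper adjacency_list_to_edges is hypothetical and is not part of scikit-network.

```python
def adjacency_list_to_edges(adjacency_list):
    """Hypothetical helper: expand an adjacency list or dict into edges."""
    if isinstance(adjacency_list, dict):
        items = adjacency_list.items()   # node: neighbors
    else:
        items = enumerate(adjacency_list)  # position i holds neighbors of node i
    return [(node, neighbor) for node, neighbors in items for neighbor in neighbors]


as_list = adjacency_list_to_edges([[1, 2], [0, 2, 3], [0, 1]])
as_dict = adjacency_list_to_edges({'Alice': ['Bob', 'Carol'], 'Bob': ['Carol']})

# The smallest compatible number of nodes is the largest index plus one.
n_nodes = max(max(edge) for edge in as_list) + 1
print(n_nodes)  # 4
print(as_dict)  # [('Alice', 'Bob'), ('Alice', 'Carol'), ('Bob', 'Carol')]
```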

Files

Check the tutorial for importing graphs from dataframes.

sknetwork.data.from_csv(file_path: str, delimiter: str | None = None, sep: str | None = None, comments: str = '#%', data_structure: str | None = None, directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = False, shape: tuple | None = None, sum_duplicates: bool = True, matrix_only: bool | None = None) → Bunch | csr_matrix

Load a graph from a CSV or TSV file. The delimiter can be specified (e.g., ' ' for space-separated values).

Parameters:

file_path (str) – Path to the CSV file.
delimiter (str) – Delimiter used in the file. Guessed if not specified.
sep (str) – Alias for delimiter.
comments (str) – Characters for comment lines.
data_structure (str) – If ‘edge_list’, consider each row of the file as an edge (tuple of size 2 or 3). If ‘adjacency_list’, consider each row of the file as an adjacency list (list of neighbors, in the order of node indices; an empty line means no neighbor). If ‘adjacency_dict’, consider each row of the file as an adjacency dictionary with key given by the first column (node: list of neighbors). If None (default), data_structure is guessed from the first rows of the file.
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).
weighted (bool) – If True, returns a weighted graph (e.g., counts the number of occurrences of each edge).
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
shape (tuple) – Shape of the adjacency or biadjacency matrix. If not specified or if nodes are reindexed, the shape is the smallest compatible with node indices.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns:

graph

Return type:

Dataset or sparse matrix
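The three values of data_structure correspond to three ways of reading the same kind of file. The sketch below, using only the standard csv module, shows how each row could be interpreted in each case. It is an assumption-laden illustration, not the parser used by from_csv: the helper rows_to_edges is hypothetical, it keeps node labels as strings, and it ignores weights and comment lines.

```python
import csv
import io


def rows_to_edges(text, data_structure, delimiter=','):
    """Hypothetical helper: interpret delimited rows as (source, target) pairs."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if data_structure == 'edge_list':
        # Each row is one edge; a third column (weight) is ignored here.
        return [(row[0], row[1]) for row in rows]
    if data_structure == 'adjacency_list':
        # Row i lists the neighbors of node i; an empty row means no neighbor.
        return [(str(i), neighbor) for i, row in enumerate(rows) for neighbor in row]
    if data_structure == 'adjacency_dict':
        # The first column is the node, the remaining columns its neighbors.
        return [(row[0], neighbor) for row in rows for neighbor in row[1:]]
    raise ValueError(f'Unknown data structure: {data_structure}')


print(rows_to_edges('a,b\nb,c\n', 'edge_list'))
# [('a', 'b'), ('b', 'c')]
print(rows_to_edges('1,2\n\n0\n', 'adjacency_list'))
# [('0', '1'), ('0', '2'), ('2', '0')]
print(rows_to_edges('a,b,c\nb,c\n', 'adjacency_dict'))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```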

sknetwork.data.from_graphml(file_path: str, weight_key: str = 'weight', max_string_size: int = 512) → Bunch

Load a graph from a GraphML file.

Hyperedges and nested graphs are not supported.

Parameters:

file_path (str) – Path to the GraphML file.
weight_key (str) – The key to be used as a value for edge weights.
max_string_size (int) – The maximum size for string features of the data.

Returns:

data – The dataset, returned as a Dataset object with the adjacency stored as a CSR matrix.

Return type:

Bunch

Datasets

sknetwork.data.load_netset(name: str | None = None, data_home: str | Path | None = None, verbose: bool = True) → Bunch | None

Load a dataset from the NetSet collection.

Parameters:

name (str) – Name of the dataset (all lowercase). Examples include ‘openflights’, ‘cinema’ and ‘wikivitals’.
data_home (str or pathlib.Path) – Folder to be used for dataset storage. This folder must be empty or contain other folders (datasets); files will be removed.
verbose (bool) – Enable verbosity.

Returns:

dataset – Returned dataset.

Return type:

Bunch

sknetwork.data.load_konect(name: str, data_home: str | Path | None = None, auto_numpy_bundle: bool = True, verbose: bool = True) → Bunch

Load a dataset from the Konect database.

Parameters:

name (str) – Name of the dataset as specified on the Konect website (e.g. for the Zachary Karate club dataset, the corresponding name is 'ucidata-zachary').
data_home (str or pathlib.Path) – Folder to be used for dataset storage.
auto_numpy_bundle (bool) – Whether the dataset should be stored in its default format (False) or using NumPy files for faster subsequent access to the dataset (True).
verbose (bool) – Enable verbosity.

Returns:

dataset – Object with the following attributes:

adjacency or biadjacency: the adjacency/biadjacency matrix for the dataset

meta: a dictionary containing the metadata as specified by Konect

each attribute specified by Konect (ent.* file)

Return type:

Bunch

Notes

The meta attribute of the Dataset object stores information about the dataset, if present. In any case, meta has an attribute name which, if not given, is equal to the name of the dataset as passed to this function.

References

Kunegis, J. (2013, May). Konect: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1343-1350).

You can also find some datasets on NetRep.

Toy graphs

sknetwork.data.house(metadata: bool = False) → csr_matrix | Bunch

House graph.

Undirected graph
5 nodes, 6 edges

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (positions).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import house
>>> adjacency = house()
>>> adjacency.shape
(5, 5)

sknetwork.data.bow_tie(metadata: bool = False) → csr_matrix | Bunch

Bow tie graph.

Undirected graph
5 nodes, 6 edges

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (positions).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import bow_tie
>>> adjacency = bow_tie()
>>> adjacency.shape
(5, 5)

sknetwork.data.karate_club(metadata: bool = False) → csr_matrix | Bunch

Karate club graph.

Undirected graph
34 nodes, 78 edges
2 labels

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (labels, positions).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import karate_club
>>> adjacency = karate_club()
>>> adjacency.shape
(34, 34)

References

Zachary’s karate club graph https://en.wikipedia.org/wiki/Zachary%27s_karate_club

sknetwork.data.miserables(metadata: bool = False) → csr_matrix | Bunch

Co-occurrence graph of the characters in the novel Les Misérables by Victor Hugo.

Undirected graph
77 nodes, 508 edges
Names of characters

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (names, positions).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import miserables
>>> adjacency = miserables()
>>> adjacency.shape
(77, 77)

sknetwork.data.painters(metadata: bool = False) → csr_matrix | Bunch

Graph of links between some famous painters on Wikipedia.

Directed graph
14 nodes, 50 edges
Names of painters

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (names, positions).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import painters
>>> adjacency = painters()
>>> adjacency.shape
(14, 14)

sknetwork.data.star_wars(metadata: bool = False) → csr_matrix | Bunch

Bipartite graph connecting some Star Wars villains to the movies in which they appear.

Bipartite graph
7 nodes (4 villains, 3 movies), 8 edges
Names of villains and movies

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: biadjacency or graph – Biadjacency matrix or graph with metadata (names).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import star_wars
>>> biadjacency = star_wars()
>>> biadjacency.shape
(4, 3)

sknetwork.data.movie_actor(metadata: bool = False) → csr_matrix | Bunch

Bipartite graph connecting movies to some actors starring in them.

Bipartite graph
31 nodes (15 movies, 16 actors), 42 edges
9 labels (rows)
Names of movies (rows) and actors (columns)
Names of the movies' production companies (rows)

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: biadjacency or graph – Biadjacency matrix or graph with metadata (names).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import movie_actor
>>> biadjacency = movie_actor()
>>> biadjacency.shape
(15, 16)

sknetwork.data.art_philo_science(metadata: bool = False) → csr_matrix | Bunch

Wikipedia links between 30 articles (10 artists, 10 philosophers, 10 scientists).

Directed graph
30 nodes, 240 edges
Names of articles

Metadata includes the occurrence of 11 words in the abstract of these articles.

Parameters: metadata (bool) – If True, return a Dataset object with metadata.
Returns: adjacency or graph – Adjacency matrix or graph with metadata (names, positions, labels, names_labels, biadjacency, names_col).
Return type: Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import art_philo_science
>>> adjacency = art_philo_science()
>>> adjacency.shape
(30, 30)

Models

sknetwork.data.linear_graph(n: int = 3, metadata: bool = False) → csr_matrix | Bunch

Linear graph (undirected).

Parameters:

n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_graph
>>> adjacency = linear_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.linear_digraph(n: int = 3, metadata: bool = False) → csr_matrix | Bunch

Linear graph (directed).

Parameters:

n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_digraph
>>> adjacency = linear_digraph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_graph(n: int = 3, metadata: bool = False) → csr_matrix | Bunch

Cyclic graph (undirected).

Parameters:

n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_graph
>>> adjacency = cyclic_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_digraph(n: int = 3, metadata: bool = False) → csr_matrix | Bunch

Cyclic graph (directed).

Parameters:

n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_digraph
>>> adjacency = cyclic_digraph(5)
>>> adjacency.shape
(5, 5)
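The four linear and cyclic models differ only in whether the last node links back to the first and whether edges are stored in both directions. The sketch below builds the corresponding edge sets in plain Python. The helpers path_edges and symmetrize are hypothetical, and the edge orientation (from node i to node i + 1) is an assumption rather than something stated in this reference.

```python
def path_edges(n, cyclic=False):
    """Hypothetical helper: directed edges i -> i + 1, optionally closed into a cycle."""
    edges = [(i, i + 1) for i in range(n - 1)]
    if cyclic:
        edges.append((n - 1, 0))  # close the cycle
    return edges


def symmetrize(edges):
    """Hypothetical helper: add the reverse of each edge (undirected graph)."""
    return sorted(set(edges) | {(j, i) for i, j in edges})


linear = path_edges(5)
cyclic = path_edges(5, cyclic=True)
print(linear)                   # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(len(cyclic))              # 5
print(len(symmetrize(linear)))  # 8
print(len(symmetrize(cyclic)))  # 10
```

An undirected graph on these nodes corresponds to a symmetric adjacency matrix, hence twice as many stored entries as edges.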

sknetwork.data.grid(n1: int = 10, n2: int = 10, metadata: bool = False) → csr_matrix | Bunch

Grid (undirected).

Parameters:

n1 (int) – Grid dimension.
n2 (int) – Grid dimension.
metadata (bool) – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import grid
>>> adjacency = grid(10, 5)
>>> adjacency.shape
(50, 50)
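A grid with dimensions n1 and n2 has n1 * n2 nodes, which explains the shape (50, 50) above. The sketch below enumerates the edges of such a grid in plain Python. The helper grid_edges is hypothetical, and the node numbering (node at row i, column j has index i * n2 + j) is an assumption that may differ from the library's.

```python
def grid_edges(n1, n2):
    """Hypothetical helper: undirected edges of an n1 x n2 grid, each listed once."""
    edges = []
    for i in range(n1):
        for j in range(n2):
            node = i * n2 + j  # assumed row-major numbering
            if j + 1 < n2:
                edges.append((node, node + 1))   # neighbor in the same row
            if i + 1 < n1:
                edges.append((node, node + n2))  # neighbor in the next row
    return edges


edges = grid_edges(10, 5)
n_nodes = max(max(edge) for edge in edges) + 1
print(n_nodes)     # 50
print(len(edges))  # 85
```

The edge count is n1 * (n2 - 1) + (n1 - 1) * n2, that is 40 + 45 = 85 for a 10 by 5 grid.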

sknetwork.data.erdos_renyi(n: int = 20, p: float = 0.3, directed: bool = False, self_loops: bool = False, seed: int | None = None) → csr_matrix

Erdős-Rényi graph.

Parameters:

n – Number of nodes.
p – Probability of connection between nodes.
directed – If True, return a directed graph.
self_loops – If True, allow self-loops.
seed – Seed of the random generator (optional).

Returns:

adjacency – Adjacency matrix.

Return type:

sparse.csr_matrix

Example

>>> from sknetwork.data import erdos_renyi
>>> adjacency = erdos_renyi(7)
>>> adjacency.shape
(7, 7)

References

Erdős, P., Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae.
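As a rough illustration of the model, the sketch below draws each possible undirected edge independently with probability p, using only the standard library. It covers the undirected case without self-loops only. The helper erdos_renyi_edges is hypothetical and does not reproduce the random stream of the library function, so the same seed will generally give a different graph.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n=20, p=0.3, seed=None):
    """Hypothetical helper: keep each pair (i, j), i < j, with probability p."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi_edges(7, 0.3, seed=42)
same = erdos_renyi_edges(7, 0.3, seed=42)
print(edges == same)                          # True
print(len(erdos_renyi_edges(7, 1.0, seed=0)))  # 21
```

With p = 1 all 7 * 6 / 2 = 21 pairs are kept, and with p = 0 the graph is empty.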

sknetwork.data.block_model(sizes: Iterable, p_in: float | list | ndarray = 0.2, p_out: float = 0.05, directed: bool = False, self_loops: bool = False, metadata: bool = False, seed: int | None = None) → csr_matrix | Bunch

Stochastic block model.

Parameters:

sizes – Block sizes.
p_in – Probability of connection within blocks.
p_out – Probability of connection across blocks.
directed – If True, return a directed graph.
self_loops – If True, allow self-loops.
metadata – If True, return a Dataset object with labels.
seed – Seed of the random generator (optional).

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (labels).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> import numpy as np
>>> from sknetwork.data import block_model
>>> sizes = np.array([4, 5])
>>> adjacency = block_model(sizes)
>>> adjacency.shape
(9, 9)

References

Airoldi, E., Blei, D., Fienberg, S., Xing, E. (2007). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.
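The roles of sizes, p_in and p_out can be seen in the following standard-library sketch, which assigns a block label to each node and then draws each undirected edge with probability p_in or p_out depending on whether its endpoints share a label. The helper block_model_edges is hypothetical, handles a scalar p_in only, and is not the library's implementation.

```python
import random
from itertools import combinations


def block_model_edges(sizes, p_in=0.2, p_out=0.05, seed=None):
    """Hypothetical helper: return (labels, edges) for an undirected block model."""
    rng = random.Random(seed)
    # Node labels: block 0 repeated sizes[0] times, then block 1, and so on.
    labels = [block for block, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for i, j in combinations(range(len(labels)), 2):
        p = p_in if labels[i] == labels[j] else p_out
        if rng.random() < p:
            edges.append((i, j))
    return labels, edges


labels, edges = block_model_edges([4, 5], p_in=1.0, p_out=0.0, seed=0)
print(labels)      # [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(len(edges))  # 16
```

With p_in = 1 and p_out = 0, each block becomes a clique and no edge crosses blocks, giving 6 + 10 = 16 edges.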

sknetwork.data.albert_barabasi(n: int = 100, degree: int = 3, directed: bool = False, seed: int | None = None) → csr_matrix

Albert-Barabási model.

Parameters:

n (int) – Number of nodes.
degree (int) – Degree of incoming nodes (less than n).
directed (bool) – If True, return a directed graph.
seed – Seed of the random generator (optional).

Returns:

adjacency – Adjacency matrix.

Return type:

sparse.csr_matrix

Example

>>> from sknetwork.data import albert_barabasi
>>> adjacency = albert_barabasi(30, 3)
>>> adjacency.shape
(30, 30)

References

Albert, R., Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics.
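The idea behind the model is preferential attachment: each incoming node links to existing nodes with probability proportional to their current degree. The sketch below is one common way to implement this with the standard library, under assumptions of my own: the graph starts from degree isolated nodes, the first incoming node links to all of them, and 1 <= degree < n. The helper preferential_attachment_edges is hypothetical and the library may initialize and sample differently.

```python
import random


def preferential_attachment_edges(n=100, degree=3, seed=None):
    """Hypothetical helper: edges (new_node, target) of a preferential attachment graph."""
    rng = random.Random(seed)
    edges = []
    endpoints = []                 # each node appears once per incident edge
    targets = list(range(degree))  # the first incoming node links to all initial nodes
    for new in range(degree, n):
        edges.extend((new, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new] * degree)
        # Uniform sampling from `endpoints` is proportional to node degree;
        # repeat until `degree` distinct targets are found for the next node.
        chosen = set()
        while len(chosen) < degree:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


edges = preferential_attachment_edges(30, 3, seed=0)
print(len(edges))  # 81
```

Each of the n - degree incoming nodes adds exactly degree edges, hence (30 - 3) * 3 = 81 edges here.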

sknetwork.data.watts_strogatz(n: int = 100, degree: int = 6, prob: float = 0.05, seed: int | None = None, metadata: bool = False) → csr_matrix | Bunch

Watts-Strogatz model.

Parameters:

n – Number of nodes.
degree – Initial degree of nodes.
prob – Probability of edge modification.
seed – Seed of the random generator (optional).
metadata – If True, return a Dataset object with metadata.

Returns:

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type:

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import watts_strogatz
>>> adjacency = watts_strogatz(30, 4, 0.02)
>>> adjacency.shape
(30, 30)

References

Watts, D., Strogatz, S. (1998). Collective dynamics of small-world networks. Nature.
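The model starts from a ring lattice where each node is linked to its nearest neighbors, then modifies each edge with a small probability. The sketch below follows the classic rewiring variant using only the standard library. It assumes an even degree smaller than n, and the helper small_world_edges is hypothetical: the exact edge modification rule used by the library is not specified in this reference and may differ.

```python
import random


def small_world_edges(n=100, degree=6, prob=0.05, seed=None):
    """Hypothetical helper: ring lattice with random rewiring of edges."""
    rng = random.Random(seed)
    half = degree // 2
    neighbors = {i: set() for i in range(n)}
    # Ring lattice: link each node to its `half` nearest neighbors on each side.
    for i in range(n):
        for k in range(1, half + 1):
            j = (i + k) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Rewire each lattice edge (i, j) with probability `prob`.
    for i in range(n):
        for k in range(1, half + 1):
            j = (i + k) % n
            if j in neighbors[i] and rng.random() < prob:
                candidates = [v for v in range(n) if v != i and v not in neighbors[i]]
                if candidates:
                    new = rng.choice(candidates)
                    neighbors[i].discard(j)
                    neighbors[j].discard(i)
                    neighbors[i].add(new)
                    neighbors[new].add(i)
    return sorted((i, j) for i in neighbors for j in neighbors[i] if i < j)


lattice = small_world_edges(30, 4, 0.0, seed=0)
rewired = small_world_edges(30, 4, 0.5, seed=0)
print(len(lattice), len(rewired))  # 60 60
```

Rewiring replaces one edge by another, so the number of edges stays at n * degree / 2 = 60 whatever the value of prob.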

Save

sknetwork.data.save(folder: str | Path, data: csr_matrix | Bunch)

Save a dataset or a CSR matrix in the current directory to a collection of NumPy and Pickle files for faster subsequent loads. Supported attribute types include sparse matrices, NumPy arrays, strings and Dataset objects.

Parameters:

folder (str or pathlib.Path) – Name of the bundle folder.
data (Union[sparse.csr_matrix, Bunch]) – Data to save.

Example

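>>> from os import listdir
>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import Bunch, save  # import path of Bunch is assumed; it may differ across versions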
>>> dataset = Bunch()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> 'dataset' in listdir('.')
True

sknetwork.data.load(folder: str | Path)

Load a dataset from a previously created bundle from the current directory (inverse function of save).

Parameters: folder (str or pathlib.Path) – Name of the bundle folder.
Returns: data – Data.
Return type: Bunch

Example

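>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import Bunch, load, save  # import path of Bunch is assumed; it may differ across versions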
>>> dataset = Bunch()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> dataset = load('dataset')
>>> print(dataset.names)
['a' 'b' 'c']