Data

Tools for loading and saving graphs.

Edge list

sknetwork.data.from_edge_list(edge_list: Union[numpy.ndarray, List[Tuple]], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) → Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]

Load a graph from an edge list.

Parameters
  • edge_list (Union[np.ndarray, List[Tuple]]) – The edge list to convert, given as a NumPy array of shape (n, 2) or (n, 3) or a list of tuples of length 2 or 3. The optional third entry is the edge weight.

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix.

  • weighted (bool) – If True, returns a weighted graph.

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns

graph

Return type

Dataset (including node names) or sparse matrix

Examples

>>> from sknetwork.data import from_edge_list
>>> edges = [(0, 1), (1, 2), (2, 0)]
>>> adjacency = from_edge_list(edges)
>>> adjacency.shape
(3, 3)
>>> edges = [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')]
>>> graph = from_edge_list(edges)
>>> adjacency = graph.adjacency
>>> adjacency.shape
(3, 3)
>>> print(graph.names)
['Alice' 'Bob' 'Carol']
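
The effect of reindex and sum_duplicates can be illustrated without the library. The sketch below is a plain-Python approximation, not the implementation used by from_edge_list: the helper name edge_list_to_matrix, the dense list-of-lists output and the choice to sort node labels are assumptions made for illustration.

```python
def edge_list_to_matrix(edge_list, directed=False, sum_duplicates=True):
    """Hypothetical helper: dense weighted adjacency matrix from an edge list."""
    # Reindexing: sorted unique labels become the names, their positions the indices.
    names = sorted({node for edge in edge_list for node in edge[:2]})
    index = {name: i for i, name in enumerate(names)}
    weights = {}
    for edge in edge_list:
        source, target = index[edge[0]], index[edge[1]]
        if not directed:
            # (a, b) and (b, a) denote the same undirected edge.
            source, target = min(source, target), max(source, target)
        weight = edge[2] if len(edge) == 3 else 1
        if (source, target) in weights and not sum_duplicates:
            continue  # keep the weight of the first occurrence
        weights[(source, target)] = weights.get((source, target), 0) + weight
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for (source, target), weight in weights.items():
        matrix[source][target] = weight
        if not directed:
            matrix[target][source] = weight
    return names, matrix


edges = [('Bob', 'Alice', 2), ('Alice', 'Bob', 3), ('Bob', 'Carol')]
names, summed = edge_list_to_matrix(edges)
_, first_only = edge_list_to_matrix(edges, sum_duplicates=False)
```

Here the edge between Alice and Bob appears twice: summed gives it weight 5, while first_only keeps the weight 2 of its first occurrence.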

Adjacency list

sknetwork.data.from_adjacency_list(adjacency_list: Union[List[List], Dict[str, List]], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) → Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]

Load a graph from an adjacency list.

Parameters
  • adjacency_list (Union[List[List], Dict[str, List]]) – Adjacency list (neighbors of each node) or dictionary (node: neighbors).

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix.

  • weighted (bool) – If True, returns a weighted graph.

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns

graph

Return type

Dataset or sparse matrix

Example

>>> from sknetwork.data import from_adjacency_list
>>> adjacency_list = [[1, 2], [0, 2, 3], [0, 1]]
>>> adjacency = from_adjacency_list(adjacency_list)
>>> adjacency.shape
(4, 4)
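
The example above lists the neighbors of only three nodes yet yields a (4, 4) matrix, because node 3 appears as a neighbor. A minimal sketch of the expansion from adjacency list to edges makes this visible. The helper adjacency_list_to_edges is hypothetical and is not part of sknetwork.

```python
def adjacency_list_to_edges(adjacency_list):
    """Hypothetical helper: expand an adjacency list or dictionary into edges."""
    if isinstance(adjacency_list, dict):
        items = adjacency_list.items()
    else:
        # In a plain list, the position of each entry is the node index.
        items = enumerate(adjacency_list)
    return [(node, neighbor) for node, neighbors in items for neighbor in neighbors]


edges = adjacency_list_to_edges([[1, 2], [0, 2, 3], [0, 1]])
# Every node that occurs as a source or as a neighbor counts as a node of the graph.
nodes = {node for edge in edges for node in edge}
```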

Files

Check the tutorial for importing graphs from dataframes.

sknetwork.data.from_csv(file_path: str, delimiter: Optional[str] = None, sep: Optional[str] = None, comments: str = '#%', data_structure: Optional[str] = None, directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) → Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]

Load a graph from a CSV or TSV file. The delimiter can be specified (e.g., ' ' for space-separated values).

Parameters
  • file_path (str) – Path to the CSV file.

  • delimiter (str) – Delimiter used in the file. Guessed if not specified.

  • sep (str) – Alias for delimiter.

  • comments (str) – Characters for comment lines.

  • data_structure (str) – If ‘edge_list’, considers each row of the file as an edge (tuple of size 2 or 3). If ‘adjacency_list’, considers each row of the file as an adjacency list (list of neighbors). If ‘adjacency_dict’, considers each row of the file as an adjacency dictionary with key given by the first column (node: list of neighbors). If None (default), data_structure is guessed from the first rows of the file.

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).

  • weighted (bool) – If True, returns a weighted graph (e.g., counts the number of occurrences of each edge).

  • reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.

  • sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.

  • matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.

Returns

graph

Return type

Dataset or sparse matrix
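
The same rows can be read in three ways, which is why data_structure is worth setting explicitly when the file is ambiguous. The following sketch, in plain Python, shows the three readings of one small file. The helpers read_rows and interpret are hypothetical and do not reproduce the parsing or guessing logic of from_csv.

```python
def read_rows(text, delimiter=',', comments='#%'):
    """Hypothetical helper: split a text into integer rows, skipping comment lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in comments:
            continue  # blank line, or line starting with a comment character
        rows.append([int(value) for value in line.split(delimiter)])
    return rows


def interpret(rows, data_structure):
    """Hypothetical helper: view the rows according to data_structure."""
    if data_structure == 'edge_list':
        return [tuple(row) for row in rows]          # each row is one edge
    if data_structure == 'adjacency_list':
        return [list(row) for row in rows]           # row i lists the neighbors of node i
    if data_structure == 'adjacency_dict':
        return {row[0]: row[1:] for row in rows}     # first column is the node
    raise ValueError('Unknown data structure.')


text = "# toy file\n1,2\n% another comment\n0,2\n"
rows = read_rows(text)
as_edges = interpret(rows, 'edge_list')
as_dict = interpret(rows, 'adjacency_dict')
```

Read as an edge list, the file holds the edges (1, 2) and (0, 2). Read as an adjacency list, node 0 has neighbors 1 and 2, and node 1 has neighbors 0 and 2.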

sknetwork.data.from_graphml(file_path: str, weight_key: str = 'weight', max_string_size: int = 512) → sknetwork.data.base.Bunch

Load a graph from a GraphML file.

Hyperedges and nested graphs are not supported.

Parameters
  • file_path (str) – Path to the GraphML file.

  • weight_key (str) – The key to be used as a value for edge weights.

  • max_string_size (int) – The maximum size for string features of the data.

Returns

data – The dataset, returned as a Dataset whose adjacency is a CSR matrix.

Return type

Bunch
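
For orientation, the sketch below reads a minimal GraphML document with the standard library and extracts weighted edges. It assumes that weight_key is matched against the attr.name of an edge key, which this page does not state. The helper read_weighted_edges and the default weight of 1.0 are also assumptions, and the code is unrelated to the parser used by from_graphml.

```python
import xml.etree.ElementTree as ET

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <edge source="a" target="b"><data key="d0">2.5</data></edge>
    <edge source="b" target="c"/>
  </graph>
</graphml>
"""

NS = {'g': 'http://graphml.graphdrawing.org/xmlns'}


def read_weighted_edges(text, weight_key='weight'):
    """Hypothetical helper: list (source, target, weight) triples of a GraphML string."""
    root = ET.fromstring(text)
    # Ids of the edge keys whose attr.name matches weight_key.
    key_ids = {key.get('id') for key in root.findall('g:key', NS)
               if key.get('for') == 'edge' and key.get('attr.name') == weight_key}
    edges = []
    for edge in root.findall('g:graph/g:edge', NS):
        weight = 1.0  # assumed default for edges without a weight
        for data in edge.findall('g:data', NS):
            if data.get('key') in key_ids:
                weight = float(data.text)
        edges.append((edge.get('source'), edge.get('target'), weight))
    return edges


edges = read_weighted_edges(GRAPHML)
```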

Datasets

sknetwork.data.load_netset(name: Optional[str] = None, data_home: Optional[Union[str, pathlib.Path]] = None, verbose: bool = True) → Optional[sknetwork.data.base.Bunch]

Load a dataset from the NetSet collection.

Parameters
  • name (str) – Name of the dataset (all lowercase). Examples include ‘openflights’, ‘cinema’ and ‘wikivitals’.

  • data_home (str or pathlib.Path) – Folder to be used for dataset storage. This folder must be empty or contain only other folders (datasets); any files in it will be removed.

  • verbose (bool) – Enable verbosity.

Returns

dataset – Returned dataset.

Return type

Bunch

sknetwork.data.load_konect(name: str, data_home: Optional[Union[str, pathlib.Path]] = None, auto_numpy_bundle: bool = True, verbose: bool = True) → sknetwork.data.base.Bunch

Load a dataset from the Konect database.

Parameters
  • name (str) – Name of the dataset as specified on the Konect website (e.g. for the Zachary Karate club dataset, the corresponding name is 'ucidata-zachary').

  • data_home (str or pathlib.Path) – Folder to be used for dataset storage.

  • auto_numpy_bundle (bool) – Whether the dataset should be stored in its default format (False) or using NumPy files for faster subsequent access to the dataset (True).

  • verbose (bool) – Enable verbosity.

Returns

dataset

Object with the following attributes:

  • adjacency or biadjacency: the adjacency/biadjacency matrix for the dataset

  • meta: a dictionary containing the metadata as specified by Konect

  • each attribute specified by Konect (ent.* file)

Return type

Bunch

Notes

An attribute meta of the Dataset class is used to store information about the dataset if present. In any case, meta has the attribute name which, if not given, is equal to the name of the dataset as passed to this function.

References

Kunegis, J. (2013, May). Konect: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1343-1350).

You can also find some datasets on NetRep.

Toy graphs

sknetwork.data.house(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

House graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import house
>>> adjacency = house()
>>> adjacency.shape
(5, 5)

sknetwork.data.bow_tie(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Bow tie graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import bow_tie
>>> adjacency = bow_tie()
>>> adjacency.shape
(5, 5)

sknetwork.data.karate_club(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Karate club graph.

  • Undirected graph

  • 34 nodes, 78 edges

  • 2 labels

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (labels, positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import karate_club
>>> adjacency = karate_club()
>>> adjacency.shape
(34, 34)

References

Zachary’s karate club graph https://en.wikipedia.org/wiki/Zachary%27s_karate_club

sknetwork.data.miserables(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Co-occurrence graph of the characters in the novel Les Misérables by Victor Hugo.

  • Undirected graph

  • 77 nodes, 508 edges

  • Names of characters

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import miserables
>>> adjacency = miserables()
>>> adjacency.shape
(77, 77)

sknetwork.data.painters(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Graph of links between some famous painters on Wikipedia.

  • Directed graph

  • 14 nodes, 50 edges

  • Names of painters

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import painters
>>> adjacency = painters()
>>> adjacency.shape
(14, 14)

sknetwork.data.star_wars(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Bipartite graph connecting some Star Wars villains to the movies in which they appear.

  • Bipartite graph

  • 7 nodes (4 villains, 3 movies), 8 edges

  • Names of villains and movies

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

biadjacency or graph – Biadjacency matrix or graph with metadata (names).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import star_wars
>>> biadjacency = star_wars()
>>> biadjacency.shape
(4, 3)

sknetwork.data.movie_actor(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Bipartite graph connecting movies to some actors starring in them.

  • Bipartite graph

  • 31 nodes (15 movies, 16 actors), 42 edges

  • 9 labels (rows)

  • Names of movies (rows) and actors (columns)

  • Names of the movies' production companies (rows)

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

biadjacency or graph – Biadjacency matrix or graph with metadata (names).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import movie_actor
>>> biadjacency = movie_actor()
>>> biadjacency.shape
(15, 16)

sknetwork.data.art_philo_science(metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Wikipedia links between 30 articles (10 artists, 10 philosophers, 10 scientists).

  • Directed graph

  • 30 nodes, 240 edges

  • Names of articles

Metadata includes the occurrence of 11 words in the abstract of these articles.

Parameters

metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (names, positions, labels, names_labels, biadjacency, names_col).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import art_philo_science
>>> adjacency = art_philo_science()
>>> adjacency.shape
(30, 30)

Models

sknetwork.data.linear_graph(n: int = 3, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Linear graph (undirected).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_graph
>>> adjacency = linear_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.linear_digraph(n: int = 3, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Linear graph (directed).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import linear_digraph
>>> adjacency = linear_digraph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_graph(n: int = 3, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Cyclic graph (undirected).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_graph
>>> adjacency = cyclic_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_digraph(n: int = 3, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Cyclic graph (directed).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import cyclic_digraph
>>> adjacency = cyclic_digraph(5)
>>> adjacency.shape
(5, 5)
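
The linear and cyclic models differ by a single closing edge. The sketch below lists the edges of both on n nodes. The helper names and the orientation from node i to node i + 1 are assumptions for illustration, not the library's construction.

```python
def linear_edges(n):
    """Edges of a path on n nodes: 0 -> 1 -> ... -> n - 1."""
    return [(i, i + 1) for i in range(n - 1)]


def cyclic_edges(n):
    """Edges of a cycle on n nodes: the path plus the closing edge n - 1 -> 0."""
    return [(i, (i + 1) % n) for i in range(n)]


path = linear_edges(5)
cycle = cyclic_edges(5)
```

A path on n nodes has n - 1 edges and a cycle has n. In the undirected variants each edge is simply present in both directions.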

sknetwork.data.grid(n1: int = 10, n2: int = 10, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Grid (undirected).

Parameters
  • n1 (int) – Grid dimension.

  • n2 (int) – Grid dimension.

  • metadata (bool) – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import grid
>>> adjacency = grid(10, 5)
>>> adjacency.shape
(50, 50)
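
The shape (50, 50) above follows from the n1 × n2 = 10 × 5 nodes of the grid. A minimal sketch of the edges of such a grid is given below. The helper grid_edges and the node numbering (i, j) → i * n2 + j are assumptions, and the library may order nodes differently.

```python
def grid_edges(n1, n2):
    """Hypothetical helper: edges of an n1 x n2 grid, node (i, j) -> i * n2 + j."""
    edges = []
    for i in range(n1):
        for j in range(n2):
            node = i * n2 + j
            if j + 1 < n2:
                edges.append((node, node + 1))   # neighbor in the next column
            if i + 1 < n1:
                edges.append((node, node + n2))  # neighbor in the next row
    return edges


edges = grid_edges(10, 5)
n_nodes = 1 + max(max(edge) for edge in edges)
```

The number of edges is n1 * (n2 - 1) + n2 * (n1 - 1), that is 85 for a 10 × 5 grid.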

sknetwork.data.erdos_renyi(n: int = 20, p: float = 0.3, directed: bool = False, self_loops: bool = False, seed: Optional[int] = None) → scipy.sparse._csr.csr_matrix

Erdős-Rényi graph.

Parameters
  • n – Number of nodes.

  • p – Probability of connection between nodes.

  • directed – If True, return a directed graph.

  • self_loops – If True, allow self-loops.

  • seed – Seed of the random generator (optional).

Returns

adjacency – Adjacency matrix.

Return type

sparse.csr_matrix

Example

>>> from sknetwork.data import erdos_renyi
>>> adjacency = erdos_renyi(7)
>>> adjacency.shape
(7, 7)

References

Erdős, P., Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae.
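
The model itself is simple: each pair of nodes is connected independently with probability p. The sketch below mirrors the parameters documented above in plain Python. The function erdos_renyi_sketch is hypothetical, returns a dense list of lists rather than a CSR matrix, and will not reproduce the random draws of erdos_renyi for a given seed.

```python
import random


def erdos_renyi_sketch(n=20, p=0.3, directed=False, self_loops=False, seed=None):
    """Hypothetical helper: dense 0/1 adjacency matrix of an Erdős-Rényi graph."""
    rng = random.Random(seed)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        # In the undirected case, only draw pairs with j >= i and mirror them.
        for j in range(0 if directed else i, n):
            if i == j and not self_loops:
                continue
            if rng.random() < p:
                matrix[i][j] = 1
                if not directed:
                    matrix[j][i] = 1
    return matrix


adjacency = erdos_renyi_sketch(7, seed=42)
```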

sknetwork.data.block_model(sizes: Iterable, p_in: Union[float, list, numpy.ndarray] = 0.2, p_out: float = 0.05, directed: bool = False, self_loops: bool = False, metadata: bool = False, seed: Optional[int] = None) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Stochastic block model.

Parameters
  • sizes – Block sizes.

  • p_in – Probability of connection within blocks.

  • p_out – Probability of connection across blocks.

  • directed – If True, return a directed graph.

  • self_loops – If True, allow self-loops.

  • metadata – If True, return a Dataset object with labels.

  • seed – Seed of the random generator (optional).

Returns

adjacency or graph – Adjacency matrix or graph with metadata (labels).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> import numpy as np
>>> from sknetwork.data import block_model
>>> sizes = np.array([4, 5])
>>> adjacency = block_model(sizes)
>>> adjacency.shape
(9, 9)

References

Airoldi, E., Blei, D., Fienberg, S., Xing, E. (2007). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.
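
As a rough illustration of the model, the sketch below draws an undirected block model without self-loops in plain Python. The function block_model_sketch is hypothetical, accepts only a scalar p_in (whereas the signature above also allows one value per block), and does not reproduce the library's random draws.

```python
import random


def block_model_sketch(sizes, p_in=0.2, p_out=0.05, seed=None):
    """Hypothetical helper: labels and edges of an undirected block model."""
    rng = random.Random(seed)
    # Node labels: block 0 repeated sizes[0] times, then block 1, and so on.
    labels = [block for block, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Same block: probability p_in; different blocks: probability p_out.
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


labels, edges = block_model_sketch([4, 5], p_in=1.0, p_out=0.0, seed=0)
```

With p_in = 1 and p_out = 0, the result is two disjoint cliques of 4 and 5 nodes, which makes the role of the two probabilities easy to check.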

sknetwork.data.albert_barabasi(n: int = 100, degree: int = 3, directed: bool = False, seed: Optional[int] = None) → scipy.sparse._csr.csr_matrix

Albert-Barabási model.

Parameters
  • n (int) – Number of nodes.

  • degree (int) – Degree of incoming nodes (less than n).

  • directed (bool) – If True, return a directed graph.

  • seed – Seed of the random generator (optional).

Returns

adjacency – Adjacency matrix.

Return type

sparse.csr_matrix

Example

>>> from sknetwork.data import albert_barabasi
>>> adjacency = albert_barabasi(30, 3)
>>> adjacency.shape
(30, 30)

References

Albert, R., Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics.
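
The model grows a graph by preferential attachment: each incoming node links to existing nodes chosen with probability proportional to their degree. The sketch below is a plain-Python illustration under stated assumptions: it starts from a clique of degree nodes, requires degree ≥ 2, and uses the hypothetical name albert_barabasi_sketch. The initialization used by the library may differ.

```python
import random


def albert_barabasi_sketch(n=30, degree=3, seed=None):
    """Hypothetical helper: edges (old, new) of a preferential attachment graph."""
    rng = random.Random(seed)
    # Assumed starting point: a clique on the first `degree` nodes.
    edges = {(i, j) for i in range(degree) for j in range(i + 1, degree)}
    # Each node appears once per incident edge, so sampling is degree-proportional.
    targets = [node for edge in sorted(edges) for node in edge]
    for new in range(degree, n):
        chosen = set()
        while len(chosen) < degree:
            chosen.add(rng.choice(targets))
        for old in sorted(chosen):
            edges.add((old, new))
            targets.extend((old, new))
    return edges


edges = albert_barabasi_sketch(30, 3, seed=0)
```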

sknetwork.data.watts_strogatz(n: int = 100, degree: int = 6, prob: float = 0.05, seed: Optional[int] = None, metadata: bool = False) → Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]

Watts-Strogatz model.

Parameters
  • n – Number of nodes.

  • degree – Initial degree of nodes.

  • prob – Probability of edge modification.

  • seed – Seed of the random generator (optional).

  • metadata – If True, return a Dataset object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Dataset]

Example

>>> from sknetwork.data import watts_strogatz
>>> adjacency = watts_strogatz(30, 4, 0.02)
>>> adjacency.shape
(30, 30)

References

Watts, D., Strogatz, S. (1998). Collective dynamics of small-world networks, Nature.
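
The usual construction starts from a ring lattice in which every node has the given initial degree, then modifies each edge with probability prob. The sketch below implements one common rewiring variant in plain Python. The function watts_strogatz_sketch is hypothetical, and the exact modification rule used by the library is not described on this page and may differ.

```python
import random


def watts_strogatz_sketch(n=30, degree=4, prob=0.05, seed=None):
    """Hypothetical helper: undirected edges (u, v), u < v, of a small-world graph."""
    rng = random.Random(seed)
    # Ring lattice: each node is linked to its degree // 2 nearest neighbors per side.
    edges = {tuple(sorted((i, (i + k) % n)))
             for i in range(n) for k in range(1, degree // 2 + 1)}
    for u, v in sorted(edges):
        if rng.random() < prob:
            # Rewire the far end of the edge to a node not yet linked to u.
            candidates = [w for w in range(n)
                          if w != u and tuple(sorted((u, w))) not in edges]
            if candidates:
                edges.remove((u, v))
                edges.add(tuple(sorted((u, rng.choice(candidates)))))
    return edges


lattice = watts_strogatz_sketch(30, 4, prob=0.0)
rewired = watts_strogatz_sketch(30, 4, prob=0.5, seed=1)
```

In this variant, rewiring preserves the number of edges, so both graphs have n * degree / 2 = 60 edges. Only the lattice is guaranteed to be regular.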

Save

sknetwork.data.save(folder: Union[str, pathlib.Path], data: Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch])

Save a dataset or a CSR matrix to a bundle folder in the current directory, as a collection of NumPy and pickle files, for faster subsequent loads. Supported attribute types include sparse matrices, NumPy arrays, strings and Dataset objects.

Parameters
  • folder (str or pathlib.Path) – Name of the bundle folder.

  • data (Union[sparse.csr_matrix, Bunch]) – Data to save.

Example

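>>> from os import listdir
>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import save
>>> from sknetwork.data.base import Bunch
>>> dataset = Bunch()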
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> 'dataset' in listdir('.')
True

sknetwork.data.load(folder: Union[str, pathlib.Path])

Load a dataset from a previously created bundle from the current directory (inverse function of save).

Parameters

folder (str or pathlib.Path) – Name of the bundle folder.

Returns

data – Data.

Return type

Bunch

Example

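>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import load, save
>>> from sknetwork.data.base import Bunch
>>> dataset = Bunch()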
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> dataset = load('dataset')
>>> print(dataset.names)
['a' 'b' 'c']