Data
Tools for loading and saving graphs.
Edge list
- sknetwork.data.from_edge_list(edge_list: Union[numpy.ndarray, List[Tuple]], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) -> Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]
Load a graph from an edge list.
- Parameters
edge_list (Union[np.ndarray, List[Tuple]]) – The edge list to convert, given as a NumPy array of size (n, 2) or (n, 3) or a list of tuples of length 2 or 3.
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix.
weighted (bool) – If True, returns a weighted graph.
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.
- Returns
graph
- Return type
Dataset (including node names) or sparse matrix
Examples
>>> from sknetwork.data import from_edge_list
>>> edges = [(0, 1), (1, 2), (2, 0)]
>>> adjacency = from_edge_list(edges)
>>> adjacency.shape
(3, 3)
>>> edges = [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')]
>>> graph = from_edge_list(edges)
>>> adjacency = graph.adjacency
>>> adjacency.shape
(3, 3)
>>> print(graph.names)
['Alice' 'Bob' 'Carol']
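To make the reindex and sum_duplicates options concrete, here is a minimal sketch of the conversion with NumPy and SciPy. The helper edge_list_to_csr is hypothetical and is not how sknetwork implements from_edge_list; in particular, it symmetrizes undirected graphs by adding the transpose, so self-loops and edges listed in both directions end up counted twice.

```python
import numpy as np
from scipy import sparse


def edge_list_to_csr(edges, directed=False, sum_duplicates=True):
    """Hypothetical helper: CSR adjacency and node names from (u, v) or (u, v, w) tuples."""
    sources = [edge[0] for edge in edges]
    targets = [edge[1] for edge in edges]
    # Edges without an explicit weight get weight 1.
    weights = np.array([edge[2] if len(edge) == 3 else 1 for edge in edges], dtype=float)
    # Reindex: map node labels to 0..n-1; the sorted unique labels become the names.
    names, indices = np.unique(np.array(sources + targets), return_inverse=True)
    row, col = indices[: len(edges)], indices[len(edges):]
    if not sum_duplicates:
        # Keep only the first occurrence of each (row, col) pair.
        _, first = np.unique(np.stack([row, col], axis=1), axis=0, return_index=True)
        first = np.sort(first)
        row, col, weights = row[first], col[first], weights[first]
    n = len(names)
    # Converting COO to CSR sums the weights of any remaining duplicate entries.
    adjacency = sparse.coo_matrix((weights, (row, col)), shape=(n, n)).tocsr()
    if not directed:
        # Simplification: store each undirected edge in both directions.
        adjacency = adjacency + adjacency.T
    return adjacency, names


edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice"), ("Alice", "Bob", 2)]
adjacency, names = edge_list_to_csr(edges)
print(names)            # ['Alice' 'Bob' 'Carol']
print(adjacency[0, 1])  # 3.0 (weights 1 and 2 summed)
```

With sum_duplicates=False, the same call keeps weight 1 for the pair (Alice, Bob), the weight of its first occurrence.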
Adjacency list
- sknetwork.data.from_adjacency_list(adjacency_list: Union[List[List], Dict[str, List]], directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) -> Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]
Load a graph from an adjacency list.
- Parameters
adjacency_list (Union[List[List], Dict[str, List]]) – Adjacency list (neighbors of each node) or dictionary (node: neighbors).
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix.
weighted (bool) – If True, returns a weighted graph.
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.
- Returns
graph
- Return type
Dataset or sparse matrix
Example
>>> from sknetwork.data import from_adjacency_list
>>> edges = [[1, 2], [0, 2, 3], [0, 1]]
>>> adjacency = from_adjacency_list(edges)
>>> adjacency.shape
(4, 4)
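A similar sketch for adjacency lists, assuming integer node indices (no reindexing) and unit weights. The helper adjacency_list_to_csr is hypothetical. For undirected graphs it keeps the element-wise maximum of the matrix and its transpose, so an edge listed from both endpoints keeps weight 1; this is our choice, and from_adjacency_list may weight such edges differently.

```python
import numpy as np
from scipy import sparse


def adjacency_list_to_csr(adjacency_list, directed=False):
    """Hypothetical helper: CSR matrix from neighbor lists or a {node: neighbors} dict."""
    # A list means node i -> adjacency_list[i]; a dict gives the nodes explicitly.
    if isinstance(adjacency_list, dict):
        items = adjacency_list.items()
    else:
        items = enumerate(adjacency_list)
    # Flatten into (source, target) pairs.
    row, col = [], []
    for node, neighbors in items:
        for neighbor in neighbors:
            row.append(node)
            col.append(neighbor)
    # Nodes are assumed to be the integers 0..n-1.
    n = max(row + col) + 1
    adjacency = sparse.coo_matrix((np.ones(len(row)), (row, col)), shape=(n, n)).tocsr()
    if not directed:
        # An edge listed in one or both directions becomes a single symmetric edge.
        adjacency = adjacency.maximum(adjacency.T)
    return adjacency


adjacency = adjacency_list_to_csr([[1, 2], [0, 2, 3], [0, 1]])
print(adjacency.shape)  # (4, 4): node 3 only appears as a neighbor
```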
Files
Check the tutorial for importing graphs from dataframes.
- sknetwork.data.from_csv(file_path: str, delimiter: Optional[str] = None, sep: Optional[str] = None, comments: str = '#%', data_structure: Optional[str] = None, directed: bool = False, bipartite: bool = False, weighted: bool = True, reindex: bool = True, sum_duplicates: bool = True, matrix_only: Optional[bool] = None) -> Union[sknetwork.data.base.Bunch, scipy.sparse._csr.csr_matrix]
Load a graph from a CSV or TSV file. The delimiter can be specified (e.g., ‘ ’ for space-separated values).
- Parameters
file_path (str) – Path to the CSV file.
delimiter (str) – Delimiter used in the file. Guessed if not specified.
sep (str) – Alias for delimiter.
comments (str) – Characters for comment lines.
data_structure (str) – If ‘edge_list’, considers each row of the file as an edge (tuple of size 2 or 3). If ‘adjacency_list’, considers each row of the file as an adjacency list (list of neighbors). If ‘adjacency_dict’, considers each row of the file as an adjacency dictionary with key given by the first column (node: list of neighbors). If None (default), data_structure is guessed from the first rows of the file.
directed (bool) – If True, considers the graph as directed.
bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).
weighted (bool) – If True, returns a weighted graph (e.g., counts the number of occurrences of each edge).
reindex (bool) – If True, reindexes nodes and returns the original node indices as names. Reindexing is enforced if nodes are not integers.
sum_duplicates (bool) – If True (default), sums weights of duplicate edges. Otherwise, the weight of each edge is that of the first occurrence of this edge.
matrix_only (bool) – If True, returns only the adjacency or biadjacency matrix. Otherwise, returns a Dataset object with graph attributes (e.g., node names). If not specified (default), selects the most appropriate format.
- Returns
graph
- Return type
Dataset or sparse matrix
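from_csv skips comment lines and guesses the delimiter when it is not given. A minimal sketch of these two steps for the edge-list data structure, in plain Python. The function parse_edge_rows and its delimiter heuristic are hypothetical and not necessarily what from_csv does; fields are kept as strings.

```python
def parse_edge_rows(text, delimiter=None, comments="#%"):
    """Hypothetical sketch: split the content of an edge-list file into tuples of strings."""
    # Keep non-empty lines that do not start with a comment character.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and line[0] not in comments]
    if delimiter is None:
        # Our own heuristic: the candidate occurring most often in the first data line.
        delimiter = max([",", "\t", ";", " "], key=lambda c: lines[0].count(c))
    # Split each line and drop empty fields (e.g., repeated spaces).
    rows = [tuple(f.strip() for f in line.split(delimiter) if f.strip()) for line in lines]
    return rows, delimiter


text = "% source target weight\nAlice,Bob\nBob,Carol,2\n# trailing comment\n"
rows, delimiter = parse_edge_rows(text)
print(repr(delimiter), rows)  # ',' [('Alice', 'Bob'), ('Bob', 'Carol', '2')]
```

A row of length 3 carries a weight; converting that field to a number would be needed before building a weighted graph.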
- sknetwork.data.from_graphml(file_path: str, weight_key: str = 'weight', max_string_size: int = 512) -> sknetwork.data.base.Bunch
Load a graph from a GraphML file.
Hyperedges and nested graphs are not supported.
- Parameters
file_path (str) – Path to the GraphML file.
weight_key (str) – The key whose values are used as edge weights.
max_string_size (int) – The maximum size of string features of the data.
- Returns
data – The dataset as a Dataset object, with the adjacency matrix in CSR format.
- Return type
Bunch
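A minimal sketch of reading nodes and weighted edges from a GraphML string with the standard library's xml.etree.ElementTree. The helper read_graphml_edges is hypothetical; it assumes that weight_key refers to the attr.name of an edge <key> declaration and, like from_graphml, it ignores hyperedges and nested graphs.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml_edges(xml_text, weight_key="weight"):
    """Hypothetical sketch: node ids, (source, target, weight) edges and directedness."""
    root = ET.fromstring(xml_text)
    # Find the id of the <key> declaring the edge attribute named weight_key.
    key_id = None
    for key in root.findall("g:key", NS):
        if key.get("for") in ("edge", "all") and key.get("attr.name") == weight_key:
            key_id = key.get("id")
    graph = root.find("g:graph", NS)
    directed = graph.get("edgedefault") == "directed"
    nodes = [node.get("id") for node in graph.findall("g:node", NS)]
    edges = []
    for edge in graph.findall("g:edge", NS):
        weight = 1.0  # default when the edge has no weight data
        for data in edge.findall("g:data", NS):
            if data.get("key") == key_id:
                weight = float(data.text)
        edges.append((edge.get("source"), edge.get("target"), weight))
    return nodes, edges, directed


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="a"/><node id="b"/><node id="c"/>
    <edge source="a" target="b"><data key="d0">2.5</data></edge>
    <edge source="b" target="c"/>
  </graph>
</graphml>"""
print(read_graphml_edges(SAMPLE))
# (['a', 'b', 'c'], [('a', 'b', 2.5), ('b', 'c', 1.0)], False)
```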
Datasets
- sknetwork.data.load_netset(name: Optional[str] = None, data_home: Optional[Union[str, pathlib.Path]] = None, verbose: bool = True) -> Optional[sknetwork.data.base.Bunch]
Load a dataset from the NetSet collection.
- Parameters
name (str) – Name of the dataset (all lowercase). Examples include ‘openflights’, ‘cinema’ and ‘wikivitals’.
data_home (str or pathlib.Path) – Folder to be used for dataset storage. This folder must be empty or contain other folders (datasets); files will be removed.
verbose (bool) – Enable verbosity.
- Returns
dataset – Returned dataset.
- Return type
Bunch
- sknetwork.data.load_konect(name: str, data_home: Optional[Union[str, pathlib.Path]] = None, auto_numpy_bundle: bool = True, verbose: bool = True) -> sknetwork.data.base.Bunch
Load a dataset from the Konect database.
- Parameters
name (str) – Name of the dataset as specified on the Konect website (e.g., for the Zachary karate club dataset, the corresponding name is 'ucidata-zachary').
data_home (str or pathlib.Path) – Folder to be used for dataset storage.
auto_numpy_bundle (bool) – Whether the dataset should be stored in its default format (False) or as NumPy files for faster subsequent access (True).
verbose (bool) – Enable verbosity.
- Returns
dataset – Object with the following attributes:
adjacency or biadjacency: the adjacency or biadjacency matrix of the dataset;
meta: a dictionary containing the metadata as specified by Konect;
one attribute for each attribute specified by Konect (ent.* files).
- Return type
Bunch
Notes
The attribute meta of the returned Dataset stores information about the dataset, if available. In any case, meta has an attribute name which, if not provided, equals the name of the dataset as passed to this function.
References
Kunegis, J. (2013, May). Konect: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1343-1350).
You can also find some datasets on NetRep.
Toy graphs
- sknetwork.data.house(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
House graph.
Undirected graph
5 nodes, 6 edges
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import house
>>> adjacency = house()
>>> adjacency.shape
(5, 5)
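The node and edge counts listed for the toy graphs (e.g., 5 nodes, 6 edges) can be checked against the stored sparse matrix. A minimal sketch, assuming undirected graphs are stored symmetrically; the helper count_edges is hypothetical and not part of sknetwork, and the house shape below is built by hand, so its node order may differ from house().

```python
import numpy as np
from scipy import sparse


def count_edges(adjacency, directed=False):
    """Hypothetical helper: number of edges stored in a sparse adjacency matrix."""
    adjacency = sparse.csr_matrix(adjacency, copy=True)
    adjacency.eliminate_zeros()  # explicitly stored zeros are not edges
    if directed:
        # Each directed edge is one stored entry.
        return adjacency.nnz
    # An undirected edge (i, j) with i != j is stored twice; a self-loop only once.
    n_loops = np.count_nonzero(adjacency.diagonal())
    return (adjacency.nnz + n_loops) // 2


# House shape built by hand: square 0-1-2-3 plus roof node 4 linked to 0 and 1.
row = [0, 1, 2, 3, 0, 1]
col = [1, 2, 3, 0, 4, 4]
upper = sparse.coo_matrix((np.ones(6), (row, col)), shape=(5, 5))
house_like = (upper + upper.T).tocsr()
print(house_like.shape, count_edges(house_like))  # (5, 5) 6
```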
- sknetwork.data.bow_tie(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Bow tie graph.
Undirected graph
5 nodes, 6 edges
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import bow_tie
>>> adjacency = bow_tie()
>>> adjacency.shape
(5, 5)
- sknetwork.data.karate_club(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Karate club graph.
Undirected graph
34 nodes, 78 edges
2 labels
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (labels, positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import karate_club
>>> adjacency = karate_club()
>>> adjacency.shape
(34, 34)
References
Zachary’s karate club graph https://en.wikipedia.org/wiki/Zachary%27s_karate_club
- sknetwork.data.miserables(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Co-occurrence graph of the characters in the novel Les Misérables by Victor Hugo.
Undirected graph
77 nodes, 508 edges
Names of characters
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (names, positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import miserables
>>> adjacency = miserables()
>>> adjacency.shape
(77, 77)
- sknetwork.data.painters(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Graph of links between some famous painters on Wikipedia.
Directed graph
14 nodes, 50 edges
Names of painters
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (names, positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import painters
>>> adjacency = painters()
>>> adjacency.shape
(14, 14)
- sknetwork.data.star_wars(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Bipartite graph connecting some Star Wars villains to the movies in which they appear.
Bipartite graph
7 nodes (4 villains, 3 movies), 8 edges
Names of villains and movies
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
biadjacency or graph – Biadjacency matrix or graph with metadata (names).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import star_wars
>>> biadjacency = star_wars()
>>> biadjacency.shape
(4, 3)
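A bipartite graph with n_row + n_col nodes (here 4 villains and 3 movies) is stored as a rectangular biadjacency matrix. A minimal sketch of recovering the square adjacency matrix of the whole graph with scipy.sparse.bmat; the helper bipartite_to_adjacency and the 4 x 3 matrix below are hypothetical (not the actual star_wars data).

```python
import numpy as np
from scipy import sparse


def bipartite_to_adjacency(biadjacency):
    """Square adjacency of a bipartite graph from its (n_row, n_col) biadjacency matrix.

    Row nodes get indices 0..n_row-1, column nodes n_row..n_row+n_col-1.
    """
    biadjacency = sparse.csr_matrix(biadjacency)
    # Block matrix [[0, B], [B^T, 0]]: edges only go between the two node sets.
    return sparse.bmat([[None, biadjacency], [biadjacency.T, None]], format="csr")


# Hypothetical 4 x 3 biadjacency with 8 edges.
biadjacency = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 0]])
adjacency = bipartite_to_adjacency(biadjacency)
print(adjacency.shape, adjacency.nnz)  # (7, 7) 16
```

Each bipartite edge appears twice in the square matrix, once in each off-diagonal block.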
- sknetwork.data.movie_actor(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Bipartite graph connecting movies to some actors starring in them.
Bipartite graph
31 nodes (15 movies, 16 actors), 42 edges
9 labels (rows)
Names of movies (rows) and actors (columns)
Names of the production companies of the movies (rows)
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
biadjacency or graph – Biadjacency matrix or graph with metadata (names).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import movie_actor
>>> biadjacency = movie_actor()
>>> biadjacency.shape
(15, 16)
- sknetwork.data.art_philo_science(metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Wikipedia links between 30 articles (10 artists, 10 philosophers, 10 scientists).
Directed graph
30 nodes, 240 edges
Names of articles
Metadata includes the occurrence of 11 words in the abstracts of these articles.
- Parameters
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (names, positions, labels, names_labels, biadjacency, names_col).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import art_philo_science
>>> adjacency = art_philo_science()
>>> adjacency.shape
(30, 30)
Models
- sknetwork.data.linear_graph(n: int = 3, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Linear graph (undirected).
- Parameters
n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import linear_graph
>>> adjacency = linear_graph(5)
>>> adjacency.shape
(5, 5)
- sknetwork.data.linear_digraph(n: int = 3, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Linear graph (directed).
- Parameters
n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import linear_digraph
>>> adjacency = linear_digraph(5)
>>> adjacency.shape
(5, 5)
- sknetwork.data.cyclic_graph(n: int = 3, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Cyclic graph (undirected).
- Parameters
n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import cyclic_graph
>>> adjacency = cyclic_graph(5)
>>> adjacency.shape
(5, 5)
- sknetwork.data.cyclic_digraph(n: int = 3, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Cyclic graph (directed).
- Parameters
n (int) – Number of nodes.
metadata (bool) – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import cyclic_digraph
>>> adjacency = cyclic_digraph(5)
>>> adjacency.shape
(5, 5)
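The linear and cyclic models differ by a single edge, from the last node back to the first. A minimal sketch with SciPy; the helper path_or_cycle is hypothetical, and orienting directed edges from i to i + 1 is our assumption about the convention.

```python
from scipy import sparse


def path_or_cycle(n=3, directed=False, cyclic=False):
    """Hypothetical sketch of linear (path) and cyclic graphs; assumes n >= 3."""
    # Directed edges i -> i + 1 (the direction is an assumption).
    row = list(range(n - 1))
    col = list(range(1, n))
    if cyclic:
        # Close the cycle with the edge n - 1 -> 0.
        row.append(n - 1)
        col.append(0)
    adjacency = sparse.coo_matrix(([1] * len(row), (row, col)), shape=(n, n)).tocsr()
    if not directed:
        # Undirected: store each edge in both directions.
        adjacency = adjacency + adjacency.T
    return adjacency


linear = path_or_cycle(5)
cycle = path_or_cycle(5, cyclic=True)
print(linear.nnz // 2, cycle.nnz // 2)  # 4 5: one more edge in the cycle
```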
- sknetwork.data.grid(n1: int = 10, n2: int = 10, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Grid (undirected).
- Parameters
n1 (int) – First grid dimension.
n2 (int) – Second grid dimension.
metadata (bool) – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import grid
>>> adjacency = grid(10, 5)
>>> adjacency.shape
(50, 50)
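A grid is the Cartesian product of two paths, so its adjacency matrix is the Kronecker sum P(n1) ⊗ I(n2) + I(n1) ⊗ P(n2), where P(n) is the adjacency of a path on n nodes. An n1 x n2 grid has n1(n2 - 1) + n2(n1 - 1) edges, i.e., 85 for the 10 x 5 grid of the example. The sketch below is our own construction; node (i, j) gets index i * n2 + j, which may differ from the ordering used by grid.

```python
from scipy import sparse


def path_adjacency(n):
    """Undirected path on n nodes: ones on the first sub- and super-diagonals."""
    return sparse.diags([[1.0] * (n - 1), [1.0] * (n - 1)], [-1, 1], shape=(n, n))


def grid_adjacency(n1, n2):
    """Hypothetical sketch: n1 x n2 grid, node (i, j) at index i * n2 + j."""
    # Vertical edges link (i, j) and (i + 1, j); horizontal ones (i, j) and (i, j + 1).
    vertical = sparse.kron(path_adjacency(n1), sparse.identity(n2), format="csr")
    horizontal = sparse.kron(sparse.identity(n1), path_adjacency(n2), format="csr")
    return (vertical + horizontal).tocsr()


adjacency = grid_adjacency(10, 5)
print(adjacency.shape, adjacency.nnz // 2)  # (50, 50) 85
```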
- sknetwork.data.erdos_renyi(n: int = 20, p: float = 0.3, directed: bool = False, self_loops: bool = False, seed: Optional[int] = None) -> scipy.sparse._csr.csr_matrix
Erdős-Rényi graph.
- Parameters
n – Number of nodes.
p – Probability of connection between nodes.
directed – If True, return a directed graph.
self_loops – If True, allow self-loops.
seed – Seed of the random generator (optional).
- Returns
adjacency – Adjacency matrix.
- Return type
sparse.csr_matrix
Example
>>> from sknetwork.data import erdos_renyi
>>> adjacency = erdos_renyi(7)
>>> adjacency.shape
(7, 7)
References
Erdős, P., Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae.
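In the Erdős-Rényi model G(n, p), each possible edge is present independently with probability p. A dense minimal sketch with NumPy, suitable for small n only; erdos_renyi_sketch is hypothetical and its random draws will not match erdos_renyi for the same seed.

```python
import numpy as np
from scipy import sparse


def erdos_renyi_sketch(n=20, p=0.3, directed=False, self_loops=False, seed=None):
    """Hypothetical sketch of G(n, p) with a dense n x n random mask."""
    rng = np.random.default_rng(seed)
    # Each ordered pair (i, j) is kept with probability p.
    mask = rng.random((n, n)) < p
    if not self_loops:
        np.fill_diagonal(mask, False)
    if not directed:
        # One draw per unordered pair: keep the upper triangle (incl. diagonal) and mirror it.
        mask = np.triu(mask)
        mask = mask | mask.T
    return sparse.csr_matrix(mask.astype(int))


adjacency = erdos_renyi_sketch(7, seed=0)
print(adjacency.shape)  # (7, 7)
```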
- sknetwork.data.block_model(sizes: Iterable, p_in: Union[float, list, numpy.ndarray] = 0.2, p_out: float = 0.05, directed: bool = False, self_loops: bool = False, metadata: bool = False, seed: Optional[int] = None) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Stochastic block model.
- Parameters
sizes – Block sizes.
p_in – Probability of connection within blocks.
p_out – Probability of connection across blocks.
directed – If True, return a directed graph.
self_loops – If True, allow self-loops.
metadata – If True, return a Dataset object with labels.
seed – Seed of the random generator (optional).
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (labels).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> import numpy as np
>>> from sknetwork.data import block_model
>>> sizes = np.array([4, 5])
>>> adjacency = block_model(sizes)
>>> adjacency.shape
(9, 9)
References
Airoldi, E., Blei, D., Fienberg, S., Xing, E. (2007). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.
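In the stochastic block model, nodes are partitioned into blocks of the given sizes and each pair is connected independently with probability p_in inside a block and p_out across blocks. A minimal undirected sketch without self-loops; block_model_sketch is hypothetical, and it reads a list-valued p_in as one probability per block, which is our interpretation of the Union[float, list, numpy.ndarray] type.

```python
import numpy as np
from scipy import sparse


def block_model_sketch(sizes, p_in=0.2, p_out=0.05, seed=None):
    """Hypothetical sketch of an undirected stochastic block model without self-loops."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes)
    # Block label of each node, e.g. [0, 0, 0, 0, 1, 1, 1, 1, 1] for sizes [4, 5].
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # p_in may be a scalar or one probability per block.
    p_in = np.broadcast_to(np.asarray(p_in, dtype=float), (len(sizes),))
    # Probability of each pair: p_in of the block inside a block, p_out across blocks.
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in[labels][:, None], p_out)
    # Draw the strict upper triangle (no self-loops) and mirror it.
    mask = np.triu(rng.random(probs.shape) < probs, k=1)
    mask = mask | mask.T
    return sparse.csr_matrix(mask.astype(int)), labels


adjacency, labels = block_model_sketch([4, 5], seed=0)
print(adjacency.shape, labels)  # (9, 9) [0 0 0 0 1 1 1 1 1]
```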
- sknetwork.data.albert_barabasi(n: int = 100, degree: int = 3, directed: bool = False, seed: Optional[int] = None) -> scipy.sparse._csr.csr_matrix
Albert-Barabási model.
- Parameters
n (int) – Number of nodes.
degree (int) – Degree of incoming nodes (less than n).
directed (bool) – If True, return a directed graph.
seed – Seed of the random generator (optional).
- Returns
adjacency – Adjacency matrix.
- Return type
sparse.csr_matrix
Example
>>> from sknetwork.data import albert_barabasi
>>> adjacency = albert_barabasi(30, 3)
>>> adjacency.shape
(30, 30)
References
Albert, R., Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics.
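The model grows the graph by preferential attachment: each incoming node links to degree existing nodes, chosen with probability proportional to their current degree. A minimal undirected sketch; albert_barabasi_sketch is hypothetical, and starting from a clique on degree + 1 nodes is our assumption, not necessarily how albert_barabasi initializes the graph.

```python
import numpy as np
from scipy import sparse


def albert_barabasi_sketch(n=100, degree=3, seed=None):
    """Hypothetical sketch of preferential attachment (undirected); assumes 1 <= degree < n."""
    rng = np.random.default_rng(seed)
    # Initial clique on degree + 1 nodes, so every node starts with degree `degree`.
    m0 = degree + 1
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    degrees = np.zeros(n)
    degrees[:m0] = degree
    for node in range(m0, n):
        # Pick `degree` distinct existing nodes, with probability proportional to degree.
        probs = degrees[:node] / degrees[:node].sum()
        targets = rng.choice(node, size=degree, replace=False, p=probs)
        for target in targets:
            edges.append((node, target))
            degrees[target] += 1
        degrees[node] = degree
    row, col = np.array(edges).T
    adjacency = sparse.coo_matrix((np.ones(len(edges)), (row, col)), shape=(n, n)).tocsr()
    return adjacency + adjacency.T


adjacency = albert_barabasi_sketch(30, 3, seed=0)
print(adjacency.shape, adjacency.nnz // 2)  # (30, 30) 84
```

With this initialization the number of edges is deterministic: degree(degree + 1)/2 for the clique plus degree for each of the n - degree - 1 incoming nodes, i.e., 6 + 26 x 3 = 84 here.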
- sknetwork.data.watts_strogatz(n: int = 100, degree: int = 6, prob: float = 0.05, seed: Optional[int] = None, metadata: bool = False) -> Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch]
Watts-Strogatz model.
- Parameters
n – Number of nodes.
degree – Initial degree of nodes.
prob – Probability of edge modification.
seed – Seed of the random generator (optional).
metadata – If True, return a Dataset object with metadata.
- Returns
adjacency or graph – Adjacency matrix or graph with metadata (positions).
- Return type
Union[sparse.csr_matrix, Dataset]
Example
>>> from sknetwork.data import watts_strogatz
>>> adjacency = watts_strogatz(30, 4, 0.02)
>>> adjacency.shape
(30, 30)
References
Watts, D., Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature.
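The Watts-Strogatz model starts from a ring lattice where each node is linked to its degree nearest neighbors, then rewires each edge with probability prob. A minimal undirected sketch; watts_strogatz_sketch is hypothetical and assumes an even degree smaller than n - 1. Rewiring keeps one endpoint and picks a new one that avoids self-loops and duplicate edges, so the number of edges stays n x degree / 2.

```python
import numpy as np
from scipy import sparse


def watts_strogatz_sketch(n=100, degree=6, prob=0.05, seed=None):
    """Hypothetical sketch: ring lattice plus random rewiring (undirected, no self-loops)."""
    rng = np.random.default_rng(seed)
    # Ring lattice: node i linked to the degree // 2 next nodes on the ring.
    edges = {(i, (i + k) % n) for i in range(n) for k in range(1, degree // 2 + 1)}
    neighbors = {i: set() for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    for i, j in sorted(edges):  # sorted for reproducibility with a fixed seed
        if rng.random() < prob:
            # Replace (i, j) by (i, new), new being neither i nor a current neighbor of i.
            candidates = [v for v in range(n) if v != i and v not in neighbors[i]]
            if candidates:
                new = int(rng.choice(candidates))
                neighbors[i].discard(j)
                neighbors[j].discard(i)
                neighbors[i].add(new)
                neighbors[new].add(i)
    row = [i for i in range(n) for j in neighbors[i]]
    col = [j for i in range(n) for j in neighbors[i]]
    return sparse.csr_matrix((np.ones(len(row)), (row, col)), shape=(n, n))


adjacency = watts_strogatz_sketch(30, 4, 0.02, seed=0)
print(adjacency.shape, adjacency.nnz // 2)  # (30, 30) 60
```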
Save
- sknetwork.data.save(folder: Union[str, pathlib.Path], data: Union[scipy.sparse._csr.csr_matrix, sknetwork.data.base.Bunch])
Save a dataset or a CSR matrix in the current directory as a collection of NumPy and Pickle files for faster subsequent loads. Supported attribute types include sparse matrices, NumPy arrays, strings and Dataset objects.
- Parameters
folder (str or pathlib.Path) – Name of the bundle folder.
data (Union[sparse.csr_matrix, Bunch]) – Data to save.
Example
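>>> from os import listdir
>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import save
>>> from sknetwork.data.base import Bunch
>>> dataset = Bunch()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> 'dataset' in listdir('.')
True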
- sknetwork.data.load(folder: Union[str, pathlib.Path])
Load a dataset from a bundle previously created in the current directory (inverse function of save).
- Parameters
folder (str) – Name of the bundle folder.
- Returns
data – Data.
- Return type
Bunch
Example
>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import save, load
>>> from sknetwork.data.base import Bunch
>>> dataset = Bunch()
>>> dataset.adjacency = sparse.csr_matrix(np.random.random((3, 3)) < 0.5)
>>> dataset.names = np.array(['a', 'b', 'c'])
>>> save('dataset', dataset)
>>> dataset = load('dataset')
>>> print(dataset.names)
['a' 'b' 'c']
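The idea of a bundle, one file per attribute, can be sketched with NumPy, SciPy and pickle. The helpers save_bundle and load_bundle are hypothetical: the file names and formats are our assumptions and not the layout written by save, so the two cannot read each other's folders. A temporary directory keeps the example from writing to the current directory.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from scipy import sparse


def save_bundle(folder, data):
    """Hypothetical sketch: write each attribute of a dict to its own file in `folder`."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for key, value in data.items():
        if sparse.issparse(value):
            sparse.save_npz(folder / f"{key}.npz", sparse.csr_matrix(value))
        elif isinstance(value, np.ndarray):
            np.save(folder / f"{key}.npy", value)  # non-object arrays only
        else:
            # Anything else (strings, dicts, ...) is pickled.
            with open(folder / f"{key}.p", "wb") as file:
                pickle.dump(value, file)


def load_bundle(folder):
    """Inverse of save_bundle: rebuild the dict from the files in `folder`."""
    data = {}
    for path in Path(folder).iterdir():
        if path.suffix == ".npz":
            data[path.stem] = sparse.load_npz(path)
        elif path.suffix == ".npy":
            data[path.stem] = np.load(path)
        elif path.suffix == ".p":
            with open(path, "rb") as file:
                data[path.stem] = pickle.load(file)
    return data


dataset = {
    "adjacency": sparse.csr_matrix(np.eye(3)),
    "names": np.array(["a", "b", "c"]),
    "meta": {"name": "toy"},
}
with tempfile.TemporaryDirectory() as tmp:
    save_bundle(tmp, dataset)
    loaded = load_bundle(tmp)
print(loaded["names"], loaded["meta"])  # ['a' 'b' 'c'] {'name': 'toy'}
```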