Data

Tools for importing and exporting data.

Toy graphs

sknetwork.data.house(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

House graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import house
>>> adjacency = house()
>>> adjacency.shape
(5, 5)
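
To make the node and edge counts concrete, the sketch below builds a dense adjacency matrix for a house-shaped graph in plain Python. The edge list (a 5-cycle plus one chord) and the node numbering are assumptions chosen for illustration; the numbering used by house() may differ.

```python
# Hypothetical labelling: nodes 0..4 form a 5-cycle, and the chord (1, 4)
# splits it into a triangle (the roof) and a square (the walls).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
n = 5

# Dense adjacency matrix as a list of lists; an undirected edge sets both entries.
adjacency = [[0] * n for _ in range(n)]
for i, j in edges:
    adjacency[i][j] = 1
    adjacency[j][i] = 1

n_edges = sum(map(sum, adjacency)) // 2  # each edge is counted twice
degrees = [sum(row) for row in adjacency]
print(n_edges, degrees)
```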

sknetwork.data.bow_tie(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Bow tie graph.

  • Undirected graph

  • 5 nodes, 6 edges

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import bow_tie
>>> adjacency = bow_tie()
>>> adjacency.shape
(5, 5)

sknetwork.data.karate_club(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Karate club graph.

  • Undirected graph

  • 34 nodes, 78 edges

  • 2 labels

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (labels, positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import karate_club
>>> adjacency = karate_club()
>>> adjacency.shape
(34, 34)

References

Zachary’s karate club graph https://en.wikipedia.org/wiki/Zachary%27s_karate_club

sknetwork.data.miserables(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Co-occurrence graph of the characters in the novel Les Misérables by Victor Hugo.

  • Undirected graph

  • 77 nodes, 508 edges

  • Names of characters

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import miserables
>>> adjacency = miserables()
>>> adjacency.shape
(77, 77)

sknetwork.data.painters(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Graph of links between some famous painters on Wikipedia.

  • Directed graph

  • 14 nodes, 50 edges

  • Names of painters

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (names, positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import painters
>>> adjacency = painters()
>>> adjacency.shape
(14, 14)

sknetwork.data.star_wars(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Bipartite graph connecting some Star Wars villains to the movies in which they appear.

  • Bipartite graph

  • 7 nodes (4 villains, 3 movies), 8 edges

  • Names of villains and movies

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

biadjacency or graph – Biadjacency matrix or graph with metadata (names).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import star_wars
>>> biadjacency = star_wars()
>>> biadjacency.shape
(4, 3)
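
The shape (4, 3) reflects that a biadjacency matrix has one row per node of the first part and one column per node of the second. The sketch below, in plain Python with made-up edges (not the actual Star Wars data), shows how a biadjacency matrix B relates to the adjacency matrix of the same bipartite graph, which has the block form [[0, B], [B^T, 0]].

```python
# Hypothetical bipartite edges (row node, column node); not the library's data.
n_row, n_col = 4, 3
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2), (3, 1), (3, 2)]

# Biadjacency matrix B of shape (n_row, n_col).
biadjacency = [[0] * n_col for _ in range(n_row)]
for i, j in edges:
    biadjacency[i][j] = 1

# Adjacency matrix of the same graph seen as an undirected graph on
# n_row + n_col nodes: column node j becomes node n_row + j.
n = n_row + n_col
adjacency = [[0] * n for _ in range(n)]
for i, j in edges:
    adjacency[i][n_row + j] = 1
    adjacency[n_row + j][i] = 1

shape = (len(biadjacency), len(biadjacency[0]))
print(shape, (len(adjacency), len(adjacency[0])))
```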

sknetwork.data.movie_actor(metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Bipartite graph connecting movies to some actors starring in them.

  • Bipartite graph

  • 31 nodes (15 movies, 16 actors), 42 edges

  • 9 labels (rows)

  • Names of movies (rows) and actors (columns)

  • Names of the movies' production companies (rows)

Parameters

metadata – If True, return a Bunch object with metadata.

Returns

biadjacency or graph – Biadjacency matrix or graph with metadata (names).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import movie_actor
>>> biadjacency = movie_actor()
>>> biadjacency.shape
(15, 16)

Models

sknetwork.data.linear_graph(n: int = 3, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Linear graph (undirected).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import linear_graph
>>> adjacency = linear_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.linear_digraph(n: int = 3, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Linear graph (directed).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import linear_digraph
>>> adjacency = linear_digraph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_graph(n: int = 3, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Cyclic graph (undirected).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import cyclic_graph
>>> adjacency = cyclic_graph(5)
>>> adjacency.shape
(5, 5)

sknetwork.data.cyclic_digraph(n: int = 3, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Cyclic graph (directed).

Parameters
  • n (int) – Number of nodes.

  • metadata (bool) – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import cyclic_digraph
>>> adjacency = cyclic_digraph(5)
>>> adjacency.shape
(5, 5)
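
The difference between the linear and cyclic models, and between their directed and undirected variants, can be sketched in plain Python. The helper names and the edge orientation (i to i + 1) are assumptions for illustration, not the library's implementation.

```python
def linear_edges(n):
    """Directed edges i -> i + 1 of a path on n nodes."""
    return [(i, i + 1) for i in range(n - 1)]


def cyclic_edges(n):
    """Directed edges i -> (i + 1) mod n of a cycle on n nodes."""
    return [(i, (i + 1) % n) for i in range(n)]


def to_adjacency(n, edges, directed):
    """Dense adjacency matrix; undirected graphs also get the reverse edge."""
    adjacency = [[0] * n for _ in range(n)]
    for i, j in edges:
        adjacency[i][j] = 1
        if not directed:
            adjacency[j][i] = 1
    return adjacency


path = to_adjacency(5, linear_edges(5), directed=False)
cycle = to_adjacency(5, cyclic_edges(5), directed=True)
# Number of nonzero entries: 2 * (n - 1) for the undirected path, n for the directed cycle.
print(sum(map(sum, path)), sum(map(sum, cycle)))
```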

sknetwork.data.grid(n1: int = 10, n2: int = 10, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Grid (undirected).

Parameters
  • n1 (int) – Grid dimension.

  • n2 (int) – Grid dimension.

  • metadata (bool) – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import grid
>>> adjacency = grid(10, 5)
>>> adjacency.shape
(50, 50)
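
The shape (50, 50) follows from the grid having n1 * n2 nodes. A plain-Python sketch of the edge structure is given below; the row-major node indexing and the helper name grid_edges are assumptions for illustration and may not match the library's ordering.

```python
def grid_edges(n1, n2):
    """Undirected edges of an n1 x n2 grid; node (i, j) gets index i * n2 + j."""
    edges = []
    for i in range(n1):
        for j in range(n2):
            node = i * n2 + j
            if j + 1 < n2:  # neighbour in the same row
                edges.append((node, node + 1))
            if i + 1 < n1:  # neighbour in the same column
                edges.append((node, node + n2))
    return edges


edges = grid_edges(10, 5)
n_nodes = 10 * 5
# A grid has n1 * (n2 - 1) + (n1 - 1) * n2 edges.
print(n_nodes, len(edges))
```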

sknetwork.data.erdos_renyi(n: int = 20, p: float = 0.3, random_state: Optional[int] = None) → scipy.sparse.csr.csr_matrix

Erdos-Renyi graph.

Parameters
  • n – Number of nodes.

  • p – Probability of connection between nodes.

  • random_state – Seed of the random generator (optional).

Returns

adjacency – Adjacency matrix.

Return type

sparse.csr_matrix

Example

>>> from sknetwork.data import erdos_renyi
>>> adjacency = erdos_renyi(7)
>>> adjacency.shape
(7, 7)

References

Erdős, P., Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae.
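
As a rough illustration of the model (not the library's implementation), the sketch below samples each unordered pair of nodes independently with probability p, using only the standard library. The function name sample_erdos_renyi and its seed argument are hypothetical stand-ins for erdos_renyi and random_state.

```python
import random


def sample_erdos_renyi(n=20, p=0.3, seed=None):
    """Return the undirected edges of a G(n, p) sample as a list of pairs (i, j), i < j."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, no self-loops
            if rng.random() < p:
                edges.append((i, j))
    return edges


edges = sample_erdos_renyi(7, 0.3, seed=42)
same = sample_erdos_renyi(7, 0.3, seed=42)
# The same seed reproduces the same graph.
print(len(edges), edges == same)
```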

sknetwork.data.block_model(sizes: Iterable, p_in: Union[float, list, numpy.ndarray] = 0.2, p_out: float = 0.05, random_state: Optional[int] = None, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Stochastic block model.

Parameters
  • sizes – Block sizes.

  • p_in – Probability of connection within blocks.

  • p_out – Probability of connection across blocks.

  • random_state – Seed of the random generator (optional).

  • metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (labels).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> import numpy as np
>>> from sknetwork.data import block_model
>>> sizes = np.array([4, 5])
>>> adjacency = block_model(sizes)
>>> adjacency.shape
(9, 9)

References

Airoldi, E., Blei, D., Fienberg, S., Xing, E. (2007). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.
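
The roles of sizes, p_in and p_out can be sketched in plain Python as follows. This is an illustrative sampler, not the library's implementation: the name sample_block_model is hypothetical, it only handles a scalar p_in, and it assumes nodes are numbered block by block.

```python
import random


def sample_block_model(sizes, p_in=0.2, p_out=0.05, seed=None):
    """Sample undirected edges and block labels from a stochastic block model."""
    rng = random.Random(seed)
    # Nodes are numbered block by block; labels[k] is the block of node k.
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs in the same block connect with p_in, other pairs with p_out.
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, labels


edges, labels = sample_block_model([4, 5], seed=0)
print(len(labels), labels)
```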

sknetwork.data.albert_barabasi(n: int = 100, degree: int = 3, undirected: bool = True, seed: Optional[int] = None) → scipy.sparse.csr.csr_matrix

Albert-Barabasi model.

Parameters
  • n (int) – Number of nodes.

  • degree (int) – Degree of incoming nodes (less than n).

  • undirected (bool) – If True, return an undirected graph.

  • seed – Seed of the random generator (optional).

Returns

adjacency – Adjacency matrix.

Return type

sparse.csr_matrix

Example

>>> from sknetwork.data import albert_barabasi
>>> adjacency = albert_barabasi(30, 3)
>>> adjacency.shape
(30, 30)

References

Albert, R., Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics.
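
The sketch below illustrates preferential attachment in plain Python: each incoming node links to degree distinct existing nodes, chosen with probability proportional to their current degree. It is one possible variant under stated assumptions (an initial clique on degree + 1 nodes, a hypothetical function name), and the library may initialise or sample differently.

```python
import random


def sample_preferential_attachment(n=100, degree=3, seed=None):
    """Undirected edges of a preferential-attachment graph (illustrative variant)."""
    if not 0 < degree < n:
        raise ValueError("degree must be positive and less than n")
    rng = random.Random(seed)
    # Assumption: start from a clique on degree + 1 nodes.
    m0 = degree + 1
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` picks a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < degree:
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((target, new))
            stubs.extend((target, new))
    return edges


edges = sample_preferential_attachment(30, 3, seed=0)
print(len(edges))
```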

sknetwork.data.watts_strogatz(n: int = 100, degree: int = 6, prob: float = 0.05, seed: Optional[int] = None, metadata: bool = False) → Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch]

Watts-Strogatz model.

Parameters
  • n – Number of nodes.

  • degree – Initial degree of nodes.

  • prob – Probability of edge modification.

  • seed – Seed of the random generator (optional).

  • metadata – If True, return a Bunch object with metadata.

Returns

adjacency or graph – Adjacency matrix or graph with metadata (positions).

Return type

Union[sparse.csr_matrix, Bunch]

Example

>>> from sknetwork.data import watts_strogatz
>>> adjacency = watts_strogatz(30, 4, 0.02)
>>> adjacency.shape
(30, 30)

References

Watts, D., Strogatz, S. (1998). Collective dynamics of 'small-world' networks. Nature.
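
A plain-Python sketch of the small-world construction is given below: a ring lattice in which every node has degree neighbours, followed by random rewiring of each edge with probability prob. The function name and the handling of rejected rewirings are assumptions; the library's notion of "edge modification" may differ.

```python
import random


def sample_watts_strogatz(n=100, degree=6, prob=0.05, seed=None):
    """Undirected edge set of a small-world graph (illustrative variant)."""
    rng = random.Random(seed)
    half = degree // 2
    # Ring lattice: node i is linked to its `half` nearest neighbours on each side.
    edges = {
        tuple(sorted((i, (i + k) % n)))
        for i in range(n)
        for k in range(1, half + 1)
    }
    for i, j in sorted(edges):
        if rng.random() < prob:
            new = rng.randrange(n)
            candidate = tuple(sorted((i, new)))
            # Skip rewirings that would create a self-loop or a duplicate edge.
            if new != i and candidate not in edges:
                edges.remove((i, j))
                edges.add(candidate)
    return edges


edges = sample_watts_strogatz(30, 4, 0.02, seed=0)
# Rewiring moves edges but never changes their number.
print(len(edges))
```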

Load

You can find some datasets on NetRep.

sknetwork.data.load_edge_list(file: str, directed: bool = False, bipartite: bool = False, weighted: Optional[bool] = None, named: Optional[bool] = None, comment: str = '%#', delimiter: str = None, reindex: bool = True, fast_format: bool = True) → sknetwork.utils.Bunch

Parse tab-separated, comma-separated, space-separated (or otherwise delimited) values datasets in the form of edge lists.

Parameters
  • file (str) – The path to the dataset in TSV format

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).

  • weighted (Optional[bool]) – Retrieves the weights in the third field of the file. None makes a guess based on the first lines.

  • named (Optional[bool]) – Retrieves the names given to the nodes and renumbers them. Returns an additional array. None makes a guess based on the first lines.

  • comment (str) – Set of characters denoting lines to ignore.

  • delimiter (str) – Delimiter used in the file. None makes a guess.

  • reindex (bool) – If True and the graph nodes have numeric values, the size of the returned adjacency will be determined by the maximum of those values. Does not work for bipartite graphs.

  • fast_format (bool) –

    If True, assumes that the file is well-formatted:

    • no comments except for the header

    • only 2 or 3 columns

    • only int or float values

Returns

graph

Return type

Bunch
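
To illustrate the file layout that the comment and delimiter parameters describe, the sketch below parses edge-list lines in plain Python. It is not the library's parser: parse_edge_list is a hypothetical helper, it splits on commas, tabs or spaces rather than guessing a single delimiter, and it assumes a default weight of 1 when the third field is absent.

```python
import re


def parse_edge_list(lines, comment="%#", delimiter=None):
    """Parse edge-list lines into (source, target, weight) triples."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line[0] in comment:  # skip blank and comment lines
            continue
        # With no delimiter given, split on commas, tabs or spaces.
        fields = re.split(r"[,\s]+", line) if delimiter is None else line.split(delimiter)
        source, target = fields[0], fields[1]
        weight = float(fields[2]) if len(fields) > 2 else 1.0
        edges.append((source, target, weight))
    return edges


lines = ["% toy file", "# another comment", "a,b,2.5", "b\tc", "c d 1"]
print(parse_edge_list(lines))
```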

sknetwork.data.load_adjacency_list(file: str, bipartite: bool = False, comment: str = '%#', delimiter: str = None) → sknetwork.utils.Bunch

Parse tab-separated, comma-separated, space-separated (or otherwise delimited) values datasets in the form of adjacency lists.

Parameters
  • file (str) – The path to the dataset in TSV format

  • bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).

  • comment (str) – Set of characters denoting lines to ignore.

  • delimiter (str) – Delimiter used in the file. None makes a guess.

Returns

graph

Return type

Bunch

sknetwork.data.load_graphml(file: str, weight_key: str = 'weight', max_string_size: int = 512) → sknetwork.utils.Bunch

Parse GraphML datasets.

Hyperedges and nested graphs are not supported.

Parameters
  • file (str) – The path to the dataset.

  • weight_key (str) – The key to be used as a value for edge weights.

  • max_string_size (int) – The maximum size for string features of the data.

Returns

data – The dataset in a bunch with the adjacency as a CSR matrix.

Return type

Bunch
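
For readers unfamiliar with GraphML, the sketch below shows how a weight key maps to edge data, using only the standard library's XML parser. It is a simplified illustration, not the library's loader: parse_graphml and the sample document are hypothetical, and the default weight of 1 for edges without data is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n1"><data key="d0">2.5</data></edge>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>
"""


def parse_graphml(text, weight_key="weight"):
    """Return node ids and (source, target, weight) edges from a GraphML string."""
    root = ET.fromstring(text)
    # Find the id of the <key> element whose attr.name matches the weight key.
    weight_id = None
    for key in root.findall("g:key", NS):
        if key.get("for") == "edge" and key.get("attr.name") == weight_key:
            weight_id = key.get("id")
    graph = root.find("g:graph", NS)
    nodes = [node.get("id") for node in graph.findall("g:node", NS)]
    edges = []
    for edge in graph.findall("g:edge", NS):
        weight = 1.0  # assumed default when the edge carries no weight
        for data in edge.findall("g:data", NS):
            if weight_id is not None and data.get("key") == weight_id:
                weight = float(data.text)
        edges.append((edge.get("source"), edge.get("target"), weight))
    return nodes, edges


nodes, edges = parse_graphml(GRAPHML)
print(nodes, edges)
```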

sknetwork.data.load_netset(dataset: Optional[str] = None, data_home: Optional[Union[str, pathlib.Path]] = None) → sknetwork.utils.Bunch

Load a dataset from the NetSet database.

Parameters
  • dataset (str) – The name of the dataset (all lowercase). Examples include ‘openflights’, ‘cinema’ and ‘wikivitals’.

  • data_home (str or pathlib.Path) – The folder to be used for dataset storage.

Returns

graph

Return type

Bunch

sknetwork.data.load_konect(dataset: str, data_home: Optional[Union[str, pathlib.Path]] = None, auto_numpy_bundle: bool = True) → sknetwork.utils.Bunch

Load a dataset from the Konect database.

Parameters
  • dataset (str) – The internal name of the dataset as specified on the Konect website (e.g. for the Zachary Karate club dataset, the corresponding name is 'ucidata-zachary').

  • data_home (str or pathlib.Path) – The folder to be used for dataset storage.

  • auto_numpy_bundle (bool) – Denotes if the dataset should be stored in its default format (False) or using NumPy files for faster subsequent access to the dataset (True).

Returns

graph

An object with the following attributes:

  • adjacency or biadjacency: the adjacency/biadjacency matrix for the dataset

  • meta: a dictionary containing the metadata as specified by Konect

  • each attribute specified by Konect (ent.* file)

Return type

Bunch

Notes

An attribute meta of the Bunch class is used to store information about the dataset if present. In any case, meta has the attribute name which, if not given, is equal to the name of the dataset as passed to this function.

References

Kunegis, J. (2013, May). Konect: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1343-1350).

sknetwork.data.convert_edge_list(edge_list: Union[numpy.ndarray, List[Tuple], List[List]], directed: bool = False, bipartite: bool = False, reindex: bool = True, named: Optional[bool] = None) → sknetwork.utils.Bunch

Turn an edge list into a Bunch.

Parameters
  • edge_list (Union[np.ndarray, List[Tuple], List[List]]) – The edge list to convert, given as a NumPy array of size (n, 2) or (n, 3) or a list of either lists or tuples of length 2 or 3.

  • directed (bool) – If True, considers the graph as directed.

  • bipartite (bool) – If True, returns a biadjacency matrix of shape (n1, n2).

  • reindex (bool) – If True and the graph nodes have numeric values, the size of the returned adjacency will be determined by the maximum of those values. Does not work for bipartite graphs.

  • named (Optional[bool]) – Retrieves the names given to the nodes and renumbers them. Returns an additional array. None makes a guess based on the first lines.

Returns

graph

Return type

Bunch
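
The renumbering of named nodes mentioned for the named parameter can be pictured with the plain-Python sketch below. The helper index_nodes is hypothetical, and the choice of sorted order for the indices is an assumption; the library may order names differently.

```python
def index_nodes(edge_list):
    """Map arbitrary node names to consecutive integers, in sorted order."""
    names = sorted({node for edge in edge_list for node in edge[:2]})
    index = {name: i for i, name in enumerate(names)}
    # Keep any trailing field (such as a weight) unchanged.
    edges = [(index[edge[0]], index[edge[1]]) + tuple(edge[2:]) for edge in edge_list]
    return edges, names


edge_list = [("Alice", "Bob", 3), ("Carey", "Alice", 2), ("Bob", "Carey", 1)]
edges, names = index_nodes(edge_list)
print(names, edges)
```

The returned names list plays the role of the additional array of node names: names[i] is the original name of node i.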

Save

sknetwork.data.save(folder: Union[str, pathlib.Path], data: Union[scipy.sparse.csr.csr_matrix, sknetwork.utils.Bunch])

Save a Bunch or a CSR matrix to a bundle folder in the current directory, as a collection of NumPy and pickle files, for faster subsequent loads.

Parameters
  • folder (str or pathlib.Path) – The name to be used for the bundle folder

  • data (Union[sparse.csr_matrix, Bunch]) – The data to save

Example

>>> from os import listdir
>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import save
>>> from sknetwork.utils import Bunch
>>> graph = Bunch()
>>> graph.adjacency = sparse.csr_matrix(np.random.random((10, 10)) < 0.2)
>>> graph.names = np.array(list('abcdefghij'))
>>> save('random_data', graph)
>>> 'random_data' in listdir('.')
True

sknetwork.data.load(folder: Union[str, pathlib.Path])

Load a Bunch from a bundle previously created in the current directory (inverse of save).

Parameters

folder (str) – The name used for the bundle folder

Returns

data – The original data

Return type

Bunch

Example

>>> import numpy as np
>>> from scipy import sparse
>>> from sknetwork.data import load, save
>>> from sknetwork.utils import Bunch
>>> graph = Bunch()
>>> graph.adjacency = sparse.csr_matrix(np.random.random((10, 10)) < 0.2)
>>> graph.names = np.array(list('abcdefghij'))
>>> save('random_data', graph)
>>> loaded_graph = load('random_data')
>>> loaded_graph.names[0]
'a'