{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Load your data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In scikit-network, a graph is represented by its [adjacency matrix](https://en.wikipedia.org/wiki/Adjacency_matrix) (or biadjacency matrix for a bipartite graph) in the [Compressed Sparse Row](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) format of SciPy.\n",
    "\n",
    "In this tutorial, we present a few methods to instantiate a graph in this format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from IPython.display import SVG\n",
    "\n",
    "import numpy as np\n",
    "from scipy import sparse\n",
    "import pandas as pd\n",
    "\n",
    "from sknetwork.data import from_edge_list, from_adjacency_list, from_graphml, from_csv\n",
    "from sknetwork.visualization import visualize_graph, visualize_bigraph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From a NumPy array\n",
    "For small graphs, you can instantiate the adjacency matrix as a dense NumPy array and convert it into a sparse matrix in CSR format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"440\" height=\"340\">\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 131 320 167 171\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 131 320 420 278\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 167 171 131 320\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 167 171 420 278\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 167 171 20 20\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 278 131 320\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 278 167 171\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 20 20 167 171\"/>\n",
       "<circle cx=\"131\" cy=\"320\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"167\" cy=\"171\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"420\" cy=\"278\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"20\" cy=\"20\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])\n",
    "adjacency = sparse.csr_matrix(adjacency)\n",
    "\n",
    "image = visualize_graph(adjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From an edge list\n",
    "Another natural way to build a graph is from a list of edges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"440\" height=\"340\">\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 204 20 20 247\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 204 20 243 320\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 204 20 420 94\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 20 247 204 20\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 20 247 243 320\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 243 320 204 20\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 243 320 20 247\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 243 320 420 94\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 94 204 20\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 94 243 320\"/>\n",
       "<circle cx=\"204\" cy=\"20\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"20\" cy=\"247\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"243\" cy=\"320\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"420\" cy=\"94\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]\n",
    "adjacency = from_edge_list(edge_list)\n",
    "\n",
    "image = visualize_graph(adjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "By default, the graph is undirected, but you can easily make it directed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"440\" height=\"340\">\n",
       "<defs><marker id=\"arrow-black\" markerWidth=\"10\" markerHeight=\"10\" refX=\"9\" refY=\"3\" orient=\"auto\">\n",
       "<path d=\"M0,0 L0,6 L9,3 z\" fill=\"black\"/></marker></defs>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 184 135 318\" marker-end=\"url(#arrow-black)\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 420 184 26 155\" marker-end=\"url(#arrow-black)\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 129 320 23 160\" marker-end=\"url(#arrow-black)\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 20 155 304 22\" marker-end=\"url(#arrow-black)\"/>\n",
       "<path stroke-width=\"1\" stroke=\"black\" d=\"M 310 20 417 179\" marker-end=\"url(#arrow-black)\"/>\n",
       "<circle cx=\"420\" cy=\"184\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"129\" cy=\"320\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"20\" cy=\"155\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "<circle cx=\"310\" cy=\"20\" r=\"7.0\" style=\"fill:gray;stroke:black;stroke-width:1.0\"/>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adjacency = from_edge_list(edge_list, directed=True)\n",
    "\n",
    "image = visualize_graph(adjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "You might also want to add weights to your edges. Just use triplets instead of pairs!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]\n",
    "adjacency = from_edge_list(edge_list)\n",
    "\n",
    "image = visualize_graph(adjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "You can instantiate a bipartite graph as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]\n",
    "biadjacency = from_edge_list(edge_list, bipartite=True)\n",
    "\n",
    "image = visualize_bigraph(biadjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "If nodes are not indexed, you get an object of type ``Bunch`` with graph attributes (node names)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = [(\"Alice\", \"Bob\"), (\"Bob\", \"Carey\"), (\"Alice\", \"David\"), (\"Carey\", \"David\"), (\"Bob\", \"David\")]\n",
    "graph = from_edge_list(edge_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "By default, the weight of each edge is the number of occurrences of the corresponding link:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list_new = edge_list + [(\"Alice\", \"Bob\"), (\"Alice\", \"David\"), (\"Alice\", \"Bob\")]\n",
    "graph = from_edge_list(edge_list_new)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "You can make the graph unweighted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_edge_list(edge_list_new, weighted=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Again, you can make the graph directed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_edge_list(edge_list, directed=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The graph can also have explicit weights:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = [(\"Alice\", \"Bob\", 3), (\"Bob\", \"Carey\", 2), (\"Alice\", \"David\", 1), (\"Carey\", \"David\", 2), (\"Bob\", \"David\", 3)]\n",
    "graph = from_edge_list(edge_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names, display_edge_weight=True, display_node_weight=True)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "For a bipartite graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = [(\"Alice\", \"Football\"), (\"Bob\", \"Tennis\"), (\"David\", \"Football\"), (\"Carey\", \"Tennis\"), (\"Carey\", \"Football\")]\n",
    "graph = from_edge_list(edge_list, bipartite=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "biadjacency = graph.biadjacency\n",
    "names = graph.names\n",
    "names_col = graph.names_col"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_bigraph(biadjacency, names_row=names, names_col=names_col)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From an adjacency list\n",
    "\n",
    "You can also load a graph from an adjacency list, given as a list of lists or a dictionary of lists:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency_list =[[0, 1, 2], [2, 3]]\n",
    "adjacency = from_adjacency_list(adjacency_list, directed=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency_dict = {\"Alice\": [\"Bob\", \"David\"], \"Bob\": [\"Carey\", \"David\"]}\n",
    "graph = from_adjacency_list(adjacency_dict, directed=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "image = visualize_graph(adjacency, names=names)\n",
    "SVG(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From a dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Your dataframe might consist of a list of edges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('miserables.tsv', sep='\\t', names=['character_1', 'character_2'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = list(df.itertuples(index=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_edge_list(edge_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('movie_actor.tsv', sep='\\t', names=['movie', 'actor'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "edge_list = list(df.itertuples(index=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_edge_list(edge_list, bipartite=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "For categorical data, you can use ``pandas`` to get a bipartite graph between samples and features. We show an example taken from the [Adult Income](https://archive.ics.uci.edu/ml/datasets/adult) dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('adult-income.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>workclass</th>\n",
       "      <th>occupation</th>\n",
       "      <th>relationship</th>\n",
       "      <th>gender</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>40-49</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>Male</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50-59</td>\n",
       "      <td>Self-emp-not-inc</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>Husband</td>\n",
       "      <td>Male</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40-49</td>\n",
       "      <td>Private</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>Male</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>50-59</td>\n",
       "      <td>Private</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>Husband</td>\n",
       "      <td>Male</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>30-39</td>\n",
       "      <td>Private</td>\n",
       "      <td>Prof-specialty</td>\n",
       "      <td>Wife</td>\n",
       "      <td>Female</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     age          workclass          occupation    relationship   gender  \\\n",
       "0  40-49          State-gov        Adm-clerical   Not-in-family     Male   \n",
       "1  50-59   Self-emp-not-inc     Exec-managerial         Husband     Male   \n",
       "2  40-49            Private   Handlers-cleaners   Not-in-family     Male   \n",
       "3  50-59            Private   Handlers-cleaners         Husband     Male   \n",
       "4  30-39            Private      Prof-specialty            Wife   Female   \n",
       "\n",
       "   income  \n",
       "0   <=50K  \n",
       "1   <=50K  \n",
       "2   <=50K  \n",
       "3   <=50K  \n",
       "4   <=50K  "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "df_binary = pd.get_dummies(df, sparse=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_20-29</th>\n",
       "      <th>age_30-39</th>\n",
       "      <th>age_40-49</th>\n",
       "      <th>age_50-59</th>\n",
       "      <th>age_60-69</th>\n",
       "      <th>age_70-79</th>\n",
       "      <th>age_80-89</th>\n",
       "      <th>age_90-99</th>\n",
       "      <th>workclass_ ?</th>\n",
       "      <th>workclass_ Federal-gov</th>\n",
       "      <th>...</th>\n",
       "      <th>relationship_ Husband</th>\n",
       "      <th>relationship_ Not-in-family</th>\n",
       "      <th>relationship_ Other-relative</th>\n",
       "      <th>relationship_ Own-child</th>\n",
       "      <th>relationship_ Unmarried</th>\n",
       "      <th>relationship_ Wife</th>\n",
       "      <th>gender_ Female</th>\n",
       "      <th>gender_ Male</th>\n",
       "      <th>income_ &lt;=50K</th>\n",
       "      <th>income_ &gt;50K</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 42 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   age_20-29  age_30-39  age_40-49  age_50-59  age_60-69  age_70-79  \\\n",
       "0          0          0          1          0          0          0   \n",
       "1          0          0          0          1          0          0   \n",
       "2          0          0          1          0          0          0   \n",
       "3          0          0          0          1          0          0   \n",
       "4          0          1          0          0          0          0   \n",
       "\n",
       "   age_80-89  age_90-99  workclass_ ?  workclass_ Federal-gov  ...  \\\n",
       "0          0          0             0                       0  ...   \n",
       "1          0          0             0                       0  ...   \n",
       "2          0          0             0                       0  ...   \n",
       "3          0          0             0                       0  ...   \n",
       "4          0          0             0                       0  ...   \n",
       "\n",
       "   relationship_ Husband  relationship_ Not-in-family  \\\n",
       "0                      0                            1   \n",
       "1                      1                            0   \n",
       "2                      0                            1   \n",
       "3                      1                            0   \n",
       "4                      0                            0   \n",
       "\n",
       "   relationship_ Other-relative  relationship_ Own-child  \\\n",
       "0                             0                        0   \n",
       "1                             0                        0   \n",
       "2                             0                        0   \n",
       "3                             0                        0   \n",
       "4                             0                        0   \n",
       "\n",
       "   relationship_ Unmarried  relationship_ Wife  gender_ Female  gender_ Male  \\\n",
       "0                        0                   0               0             1   \n",
       "1                        0                   0               0             1   \n",
       "2                        0                   0               0             1   \n",
       "3                        0                   0               0             1   \n",
       "4                        0                   1               1             0   \n",
       "\n",
       "   income_ <=50K  income_ >50K  \n",
       "0              1             0  \n",
       "1              1             0  \n",
       "2              1             0  \n",
       "3              1             0  \n",
       "4              1             0  \n",
       "\n",
       "[5 rows x 42 columns]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_binary.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "biadjacency = df_binary.sparse.to_coo()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "biadjacency = sparse.csr_matrix(biadjacency)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<32561x42 sparse matrix of type '<class 'numpy.uint8'>'\n",
       "\twith 195366 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# biadjacency matrix of the bipartite graph\n",
    "biadjacency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# names of columns\n",
    "names_col = list(df_binary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "42"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(names_col)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['age_20-29',\n",
       " 'age_30-39',\n",
       " 'age_40-49',\n",
       " 'age_50-59',\n",
       " 'age_60-69',\n",
       " 'age_70-79',\n",
       " 'age_80-89',\n",
       " 'age_90-99']"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names_col[:8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From a CSV file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "You can directly load a graph from a CSV or TSV file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_csv('miserables.tsv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_csv('movie_actor.tsv', bipartite=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The graph can also be given in the form of adjacency lists (check the function ``from_csv``)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From a GraphML file\n",
    "\n",
    "You can also load a graph stored in the [GraphML](https://en.wikipedia.org/wiki/GraphML) format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "graph = from_graphml('miserables.graphml')\n",
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Directed graph\n",
    "graph = from_graphml('painters.graphml')\n",
    "adjacency = graph.adjacency\n",
    "names = graph.names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## From NetworkX\n",
    "\n",
    "NetworkX has [import](https://networkx.github.io/documentation/stable/reference/generated/networkx.convert_matrix.from_scipy_sparse_matrix.html#networkx.convert_matrix.from_scipy_sparse_matrix) and [export](https://networkx.github.io/documentation/stable/reference/generated/networkx.convert_matrix.to_scipy_sparse_matrix.html#networkx.convert_matrix.to_scipy_sparse_matrix) functions from and towards the CSR format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Other options\n",
    "\n",
    "* You want to test our toy graphs\n",
    "* You want to generate a graph from a model\n",
    "* You want to load a graph from existing repositories (see [NetSet](http://netset.telecom-paris.fr/) and [KONECT](http://konect.cc))\n",
    "\n",
    "Take a look at the other tutorials of the **data** section!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}