{ "cells": [ { "cell_type": "markdown", "id": "3e4eb758", "metadata": {}, "source": [ "## Connecting a local database" ] }, { "cell_type": "markdown", "id": "2460ecfc", "metadata": {}, "source": [ "In this demo notebook we will showcase connecting a local database" ] }, { "cell_type": "markdown", "id": "8e0dbccf", "metadata": {}, "source": [ "### 1. Create an API session" ] }, { "cell_type": "code", "execution_count": 132, "id": "d96cfcaf", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"res\":\"ok\",\"data\":\"pong 1674727453166\"}\n" ] } ], "source": [ "import pandas as pd\n", "\n", "from weaveapi.records import *\n", "from weaveapi.options import *\n", "from weaveapi.filter import *\n", "from weaveapi.weaveh import *\n", "\n", "WEAVE_CONFIG = \"config/demo_client_local.config\"\n", "nodeApi, session = connect_weave_api(WEAVE_CONFIG)\n", "\n", "data_collection = \"localdb\"\n", "table = \"oncology_data\"" ] }, { "cell_type": "markdown", "id": "ae15388d", "metadata": {}, "source": [ "### 2. Install a local database (if not already having one)" ] }, { "cell_type": "markdown", "id": "adca4620", "metadata": {}, "source": [ "- it can be any database or file storage from the ones [supported](https://www.weavechain.com/integrations)\n", "- for the example we will assume a local postgres server is installed\n", "- or you can start a new postgres instance as a docker following the \"How to use this image\" step from [here](https://hub.docker.com/_/postgres). Sample:\n", "```\n", " docker run --name some-postgres -p 0.0.0.0:5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres\n", "```" ] }, { "cell_type": "markdown", "id": "2c71c819", "metadata": {}, "source": [ "#### Create a table and populate it with data" ] }, { "cell_type": "markdown", "id": "6e0f0073", "metadata": {}, "source": [ "- go to the folder where the node is installed\n", "- download a sample file\n", "```\n", " curl -O https://public.weavechain.com/file/sample.csv\n", "```\n", "- start the psql command prompt\n", "```\n", " psql -U postgres -d postgres -h localhost -p 5432\n", "```\n", "- if psql is not available, install it (see [this](https://www.timescale.com/blog/how-to-install-psql-on-mac-ubuntu-debian-windows/) guide from timescale) or run it from docker\n", "```\n", " docker run -it postgres /bin/bash -c \"psql -U postgres -d postgres -h host.docker.internal -p 5432\"\n", "```\n", "- ideally create a new database and user, but to simplify the steps we'll use the default *postgres* database and user\n", "- in the psql prompt, create a new table\n", "```\n", " CREATE TABLE \"oncology_data\" (\"id\" BIGINT NOT NULL,\"name\" TEXT,\"age\" NUMERIC,\"gender\" NUMERIC,\"air_pollution\" NUMERIC,\"alcohol_use\" NUMERIC,\"dust_allergy\" NUMERIC,\"occupational_hazards\" NUMERIC,\"genetic_risk\" NUMERIC,\"chronic_lung_disease\" NUMERIC,\"balanced_diet\" NUMERIC,\"obesity\" NUMERIC,\"smoking\" NUMERIC,\"passive_smoker\" NUMERIC,\"chest_pain\" NUMERIC,\"coughing_of_blood\" NUMERIC,\"fatigue\" NUMERIC,\"weight_loss\" NUMERIC,\"shortness_of_breath\" NUMERIC,\"wheezing\" NUMERIC,\"swallowing_difficulty\" NUMERIC,\"clubbing_of_fingernails\" NUMERIC,\"frequent_cold\" NUMERIC,\"dry_cough\" NUMERIC,\"snoring\" NUMERIC,\"level\" NUMERIC, CONSTRAINT pk_oncology_data PRIMARY KEY (\"id\"));\n", "```\n", "- and, in the same psql prompt, populate it with data (keep the \\ at the beginning)\n", "```\n", " \\COPY oncology_data FROM 'sample.csv' DELIMITER ',' CSV HEADER;\n", "```" ] }, { 
"cell_type": "markdown", "id": "ac907012", "metadata": {}, "source": [ "### 3. Add a new connection in the configuration file" ] }, { "cell_type": "markdown", "id": "7b4a5e29", "metadata": {}, "source": [ "- go to the node installation folder and edit config/demo.config\n", "- add a new item in the **databases** section, right before the existing **shared** subsection:\n", "```\n", " 'localdb': {\n", " 'connectionAdapterType': 'pgsql',\n", " 'replication': {\n", " 'type': 'none',\n", " 'allowedCachingIntervalSec': 604800\n", " },\n", " 'jdbcConfig': {\n", " 'host': 'host.docker.internal',\n", " 'port': 5432,\n", " 'schema': 'public',\n", " 'database': 'postgres',\n", " 'user': 'postgres',\n", " 'pass': 'mysecretpassword'\n", " }\n", " },\n", "```\n", "- If the node was started from docker, the address must point to the docker host machine, **host.docker.internal** will work on MacOS and windows. Use **localhost** if the node was started as standalone rather than a docker. Or the internal IP obtained via ipconfig/ifconfig. On linux 172.17.0.1 can also be used most of the time (usually assigned for the host machine to be visible from the docker if it's the single docker network interface)" ] }, { "cell_type": "markdown", "id": "512c99f6", "metadata": {}, "source": [ "#### Flag the node to reload the configuration file from disk" ] }, { "cell_type": "code", "execution_count": 133, "id": "17b212cf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'res': 'ok', 'data': 'weaveconfig/demo.config'}\n" ] } ], "source": [ "reply = nodeApi.resetConfig(session).get()\n", "print(reply)" ] }, { "cell_type": "markdown", "id": "1e9b4f93", "metadata": {}, "source": [ "#### Restart the node" ] }, { "cell_type": "markdown", "id": "317e474b", "metadata": {}, "source": [ "- run from the command prompt\n", "```\n", " docker stop weave_node\n", " docker start weave_node\n", "```" ] }, { "cell_type": "markdown", "id": "b37753f1", "metadata": {}, "source": [ "### 4. Read data from the newly added table" ] }, { "cell_type": "code", "execution_count": 138, "id": "b6ebc27d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"res\":\"ok\",\"data\":\"pong 1674727467726\"}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagegenderair_pollutionalcohol_usedust_allergyoccupational_hazardsgenetic_riskchronic_lung_disease...fatigueweight_lossshortness_of_breathwheezingswallowing_difficultyclubbing_of_fingernailsfrequent_colddry_coughsnoringlevel
01Lorenzo Rasmussen331245432...3422312341
12Zechariah Gallegos171315342...1378621722
23Lukas Jenkins351456554...8792146723
34Trey Holden371777767...4231456753
45Branson Rivera461687776...3241424233
\n", "

5 rows × 26 columns

\n", "
" ], "text/plain": [ " id name age gender air_pollution alcohol_use \\\n", "0 1 Lorenzo Rasmussen 33 1 2 4 \n", "1 2 Zechariah Gallegos 17 1 3 1 \n", "2 3 Lukas Jenkins 35 1 4 5 \n", "3 4 Trey Holden 37 1 7 7 \n", "4 5 Branson Rivera 46 1 6 8 \n", "\n", " dust_allergy occupational_hazards genetic_risk chronic_lung_disease \\\n", "0 5 4 3 2 \n", "1 5 3 4 2 \n", "2 6 5 5 4 \n", "3 7 7 6 7 \n", "4 7 7 7 6 \n", "\n", " ... fatigue weight_loss shortness_of_breath wheezing \\\n", "0 ... 3 4 2 2 \n", "1 ... 1 3 7 8 \n", "2 ... 8 7 9 2 \n", "3 ... 4 2 3 1 \n", "4 ... 3 2 4 1 \n", "\n", " swallowing_difficulty clubbing_of_fingernails frequent_cold dry_cough \\\n", "0 3 1 2 3 \n", "1 6 2 1 7 \n", "2 1 4 6 7 \n", "3 4 5 6 7 \n", "4 4 2 4 2 \n", "\n", " snoring level \n", "0 4 1 \n", "1 2 2 \n", "2 2 3 \n", "3 5 3 \n", "4 3 3 \n", "\n", "[5 rows x 26 columns]" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nodeApi, session = connect_weave_api(WEAVE_CONFIG)\n", "\n", "reply = nodeApi.read(session, data_collection, table, None, READ_DEFAULT_NO_CHAIN).get()\n", "#print(reply)\n", "df = pd.DataFrame(reply[\"data\"])\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "id": "03d82eca", "metadata": {}, "source": [ "### 5. Make the table private" ] }, { "cell_type": "code", "execution_count": 145, "id": "18be11b4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'res': 'ok', 'target': {'operationType': 'UPDATE_LAYOUT', 'organization': 'weavedemo', 'account': 'weaveyh5R1ytoUCZnr3JjqMDfhUrXwqWC2EWnZX3q7krKLPcg', 'scope': 'localdb', 'table': 'oncology_data'}}\n" ] } ], "source": [ "reply = nodeApi.getTableDefinition(session, data_collection, table).get()\n", "#print(reply)\n", "layout = json.loads(reply[\"data\"])[\"layout\"]\n", "layout[\"isLocal\"] = True\n", "reply = nodeApi.updateLayout(session, data_collection, table, json.dumps({ \"layout\": layout})).get()\n", "print(reply)" ] }, { "cell_type": "markdown", "id": "3142dc08", "metadata": {}, "source": [ "#### and fail to read the data" ] }, { "cell_type": "markdown", "id": "55c4b878", "metadata": {}, "source": [ "- we expect a **Not authorized** reply here" ] }, { "cell_type": "code", "execution_count": 146, "id": "4d5b6988", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'res': 'err', 'target': {'operationType': 'READ', 'organization': 'weavedemo', 'account': 'weaveyh5R1ytoUCZnr3JjqMDfhUrXwqWC2EWnZX3q7krKLPcg', 'scope': 'localdb', 'table': 'oncology_data'}, 'message': 'Not authorized'}\n" ] } ], "source": [ "reply = nodeApi.read(session, data_collection, table, None, READ_DEFAULT_NO_CHAIN).get()\n", "print(reply)" ] }, { "cell_type": "markdown", "id": "0b45141e", "metadata": {}, "source": [ "### 6. 
Compute a Merkle Tree from a subset of columns in the private table" ] }, { "cell_type": "code", "execution_count": 154, "id": "1e344c4e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generated at 1674727577162\n", "Root Hash iax9Vpupq3bXG3DnBTQyKdWr2s3Uh4Q6qcZAPNVJAgb\n", "Signature 4RdBCQgKWfJqCvD7PrEkYyWSR3KtT8S8eYTdyveDyLFA6RNiwp28R1LvuEtMPAw8ceRzjNddQvQhKFQiRFy9nZoL\n", "\n", "iax9Vpupq3bXG3DnBTQyKdWr2s3Uh4Q6qcZAPNVJAgb;SH3Ch1Jwxcs363PwaCYvREgmPVfqr8NcPovZNgXa4js,FE8mTPPJ3uDXTJNxppNa4CeHvfu2Zeo9sjeL71YS12zK;HKgnz9cLoWn4nDapvnGm6K6gH6H5Y2rrgMguaMoKgyek,7s52zKzog488LkgMdEKrkFRoF8opVB9AdrDKtSdSF92j,9tnJTNhSKpZmDViGXtPB2SpoiozEpGkFVNk7FgtLtBQv,A5tNHAHQs61pSzqLJbT9v2ca8ix53kNzzDXKSZ9b6HFy;7oyQ7nq5bLX4J9Es6GQdZcxUpc8FetPZLeyJVFpEnydd,BybqudmZBw54NJhebTWt8yJMf4ar6jnxLRhD2MFsZ7...DhvjP86uY7x4eK7QRrcocBct5XJNGsVUWWREc6zd,E9QLrZqb5M6oXoEQpQ3B5KMvmHJ13Xdb9jUxRgJ3FaJA,F7dfae6vFLs3a2a2w9uET87Euw2uxE2qssvqYcmHPUeP,5Am75Eu1NYQbLuEfTTGDUiH5sWhPcumC3deC4yVYpYkn,HrsDMUrxxgjoXrfeMfSaks8mGK6hbDnyDQjhZwuXnCT7,DL96tTVtfEyRCZCv3EeNd4cStYqX4hR8r5anUi8eAMcC,Gaocem33txtM2KMJbUaMGrEDFrYH1H1oJB25Qb565Ao2,7u4EXgwGPmoqgz6cvAFyfvjAP7cvJcBeHwrK9XpYz9RZ,FpT72W53qoU3YXAZg1DKMCUVSpyLmjA6j52nSXYR9yRn\n" ] } ], "source": [ "salt = \"salt1234\" # Same salt used for record hashes; this can be improved by using a different salt for *each distinct writer*\n", "\n", "filter = Filter(None, None, None, None, [ \"name\", \"age\" ])\n", "reply = nodeApi.merkleTree(session, data_collection, table, filter, salt, READ_DEFAULT_NO_CHAIN).get()\n", "tree = reply[\"data\"][\"tree\"]\n", "rootHash = reply[\"data\"][\"rootHash\"]\n", "ts = reply[\"data\"][\"timestamp\"]\n", "rootHashSignature = reply[\"data\"][\"signature\"]\n", "\n", "print(\"Generated at\", ts)\n", "print(\"Root Hash\", rootHash)\n", "print(\"Signature\", rootHashSignature)\n", "print(\"\")\n", "print(tree[:400] + \"...\" + tree[-400:])" ] }, { "cell_type": "markdown", "id": "d226393a", "metadata": {}, "source": [ "#### Check root hash signature" ] }, { "cell_type": "code", "execution_count": 156, "id": "be428c83", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Check signature: True\n" ] } ], "source": [ "toSign = rootHash + \" \" + ts\n", "check = nodeApi.verifySignature(rootHashSignature, toSign)\n", "print(\"Check signature:\", check)" ] }, { "cell_type": "markdown", "id": "c22aee9b", "metadata": {}, "source": [ "#### Verify the presence of a known record in the dataset" ] }, { "cell_type": "code", "execution_count": 157, "id": "bee9abaf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ARMe28cMdZvxBCYgBSALyncPnec4ijERrn2cDJgwgEHA\n" ] } ], "source": [ "row = [ 'Lorenzo Rasmussen', 33.0 ]\n", "recordHash = nodeApi.hashRecord(row, salt)\n", "print(recordHash)" ] }, { "cell_type": "code", "execution_count": 158, "id": "6cb83e9c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "true\n" ] } ], "source": [ "reply = nodeApi.verifyMerkleHash(session, tree, recordHash).get()\n", "print(reply[\"data\"])" ] }, { 
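"cell_type": "markdown", "id": "b2c3d4e5", "metadata": {}, "source": [ "#### Verify several known records at once\n", "\n", "The same presence check can be repeated for other rows. A minimal sketch reusing the *hashRecord* and *verifyMerkleHash* calls above (the sample values come from the data read in step 4, restricted to the same *name* and *age* columns used to build the tree):" ] }, { "cell_type": "code", "execution_count": null, "id": "c3d4e5f6", "metadata": {}, "outputs": [], "source": [ "# hash each known (name, age) pair with the same salt and check it against the Merkle tree\n", "rows = [\n", "    [ 'Lorenzo Rasmussen', 33.0 ],\n", "    [ 'Zechariah Gallegos', 17.0 ],\n", "    [ 'Lukas Jenkins', 35.0 ]\n", "]\n", "for row in rows:\n", "    recordHash = nodeApi.hashRecord(row, salt)\n", "    reply = nodeApi.verifyMerkleHash(session, tree, recordHash).get()\n", "    print(row, \"->\", reply[\"data\"])" ] }, { "cell_type": "markdown", "id": "f7adfcf7", "metadata": {}, "source": [ "### 7. 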
Train a ML model on the private data hosted in our local database" ] }, { "cell_type": "markdown", "id": "d49d0dde", "metadata": {}, "source": [ "- we do it similarly how it is done in the [Compute Sample](sample-compute.md), where we also check the model lineage\n", "- run on the node machine\n", "```\n", " docker pull gcr.io/weavechain/oncology_xgboost:latest\n", "```\n", "- use latest-arm64 if your machine is ARM\n", "- the data owner needs to purposely enable running a certain image\n", "- the node needs to be able to connect to the local docker instance\n", "- in the default configuration file installed with the node, the sample script is pre-authorized with the following line\n", "```\n", " 'allowedImages': [ 'gcr.io/weavechain/oncology_xgboost' ]\n", "```\n", "- in case of error, uncomment the #print(reply) below to see details\n", "- (compute to data is just one of the patterns of confidential computing supported, MPC and Homomorphic Encryption could also be used)" ] }, { "cell_type": "code", "execution_count": 159, "id": "3a276beb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"model\": \"YmluZgAAAD8XAAAAAwAAAAEAAAAAAAAAAQAAAAcAAAABAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOAAAAAAAAAG11bHRpOnNvZnRwcm9iBgAAAAAAAABnYnRyZWUsAQAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAsAAAAAAAAAAAAAABcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP////8BAAAAAgAAAAIAAIAAAJBAAAAAgAMAAAAEAAAACgAAgAAAYEAAAAAABQAAAAYAAAAJAACAAADwQAEAAIAHAAAACAAAABYAAIAAAKBAAQAAAAkAAAAKAAAACQAAgAAAIEACAACA//////////8AAAAA1mVlvgIAAAD//////////wAAAADmFLw+AwAAgP//////////AAAAAEGE5D4DAAAA//////////8AAAAA5RlPvgQAAID//////////wAAAADkGc8+BAAAAP//////////AAAAAG0+Y779nEdD4zjeQ1p2i70AAAAASfgoQ3EcU0P6hyI/AAAAAGRhoUFUVWlDjUE0vwAAAADk9SNC4zgCQycHqz8AAAAAz+gWQhzHoUJlQ/6+AAAAAAAAAACN42RDMio/vwAAAAAAAAAA4ziOQBW8nD8AAAAAAAAAAKqq8kI2br4/AAAAAAAAAADjOA5BlJUsvwAAAAAAAAAA4zgOQZOVrD8AAAAAAAAAAP//j0KwXj2/AAAAAAEAAAAVAAAAAAAAA...\n" ] } ], "source": [ "reply = nodeApi.compute(session, \"gcr.io/weavechain/oncology_xgboost\", COMPUTE_DEFAULT).get()\n", "#print(reply)\n", "print(reply[\"data\"][\"output\"][:1200] + \"...\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 }