# How-To Guides

Here you can find a collection of recipes for various tasks and settings you might want to try out.

What do you want to do?
## 1. Dependencies and Environment

- Create a `conda` environment and install all dependencies to run our code and reproduce our results:
    - Create a new `conda` environment with Python 3.7.1. Other Python versions are not supported!
    - Run `make install` to recreate the environment. `make update` can be used to update the conda environment from the `.yml` file.
    - The environment will be called `torch4p`.
    - Disclaimer: We have only tested our code on Ubuntu 18.04 and 20.04. We cannot guarantee compatibility with any other operating system.
    - macOS users: PriMIA should work on macOS, however only using `pip` as described below.
- Use `pip` to set up an environment:
    - We discourage this method and do not support it, so proceed at your own risk.
    - Make a virtual environment using the tool of your choice and Python 3.7.1. Other Python versions are not supported!
    - Run `pip install -r requirements.txt` to install the basic requirements.
    - After these have installed, run `pip install -U syft==0.2.9` to install PySyft. `pip` will complain (quite loudly and in red as of September 2020) about inconsistencies with `torchdp`. These can be ignored.
- Clean up everything and restore the environment to its default state:
    - Run `make clean_all`. Please be careful when running this, as it DELETES folders, including the `model_weights` folder which may store your training weights! For a detailed list of what this command destroys, check the `Makefile` under `Cleanup`.
## 2. Work with the Paediatric Pneumonia Dataset

- Distribute the data to individual worker folders:
    - Run `make server_folders`. This will actually copy the files to `data/server_simulation/worker<i>`. It can be useful if you want to use the data on remote machines instead of locally.
    - Run `make symbolic_server_folders` if you intend to work locally only. This creates symbolic links, which saves a lot of space and is faster.
    - Run `make minimal_server_folders` to create a minimal dataset of 4 images per worker (this can be modified in the `Makefile`), times the `repetitions_dataset` parameter from the `config.ini` file, for quickly trying something out after e.g. making changes to the code.
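The guide only confirms the `data/server_simulation/worker<i>` pattern and that the split produces three worker folders plus a validation set; the exact validation-folder name below is an assumption. The resulting layout looks roughly like:

```
data/server_simulation/
├── worker1/
├── worker2/
├── worker3/
└── validation/
```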
## 3. Work with the MNIST dataset

This does not happen from the `Makefile` but rather is passed as a flag to the training scripts.
## 4. Training using VirtualWorkers

- Train using VirtualWorkers on the Paediatric Pneumonia dataset (quick way):
    - Run `make symbolic_server_folders` to randomly split the Paediatric Pneumonia Dataset into three worker folders and a validation set. You can also run `make minimal_server_folders` if you are in a real hurry, but results will be predictably poor.
    - Run `make federated_secure` to train a model using federated learning with secure aggregation. `make federated_insecure` can be used to suppress secure aggregation.
    - The model weights and a `.csv` file with metadata will be saved under `model_weights`.
- Train using VirtualWorkers on the Paediatric Pneumonia dataset (slow way):
    - Make whichever modifications you need to the configuration file (documentation here).
    - Run `python train.py --config <path/to/your/config.ini> --train_federated --data_dir data/server_simulation`
    - Pass the `--unencrypted_aggregation` flag to suppress secure aggregation.
- Train on your own paediatric pneumonia data:
    - Pass `--data_dir <path/to/your/data>` to the CLI.
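The only configuration key this guide names explicitly is `repetitions_dataset`; consult the configuration documentation for the real schema. A purely illustrative `config.ini` fragment (the section name and comment are placeholders, not PriMIA's actual schema) might look like:

```ini
; Illustrative only -- see the configuration documentation for the real schema.
[config]
repetitions_dataset = 4
```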
## 5. Training using PyGrid Nodes

- Set up PyGrid Nodes on your local machine and run training with them:
    - Run `make gridnode`. This assumes you are using the Paediatric Pneumonia Dataset and the pre-made `pneumonia-resnet-pretrained.ini` file and is a convenience function. Make adjustments to the `Makefile` or directly run the following if you require more flexibility: `python torchlib/run_websocket_server.py --data_dir data/server_simulation --config <path/to/your/config.ini>`
    - Run `make federated_gridnode_secure` to train on the GridNodes with secure aggregation or `make federated_gridnode_insecure` to eschew secure aggregation.
- Set up a PyGrid Node on a local or remote server for federated training:
    - Run `python -m Node --id <desired id> --port <desired port> --data_dir <path/to/data> --config <path/to/config.ini>`. The configuration file must be identical on all remote servers and the central server. The name of the node, the IP address and the port must be changed in `websetting/config.csv`.
    - When training, the training coordinator/central server must pass the `--websockets` flag to `train.py`, which will read the settings from `websetting/config.csv` and configure the connections automatically.

Sidenote: The number of workers can be changed by omitting workers from the configuration `csv` file.
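The exact column layout of `websetting/config.csv` is not shown in this guide; it holds the node name, IP address and port mentioned above. A hypothetical three-worker example (all values are illustrative) might look like:

```csv
worker1,192.168.1.10,8777
worker2,192.168.1.11,8777
worker3,192.168.1.12,8777
```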
## 6. Run training locally using GPUs

Run `make local`. Alternatively, run `python train.py --config <path/to/your/config.ini> --data_dir <path/to/data> --cuda`.

Note that macOS has no CUDA support; in this case, `--cuda` will do nothing. It can also be omitted if training on CPU is desired.
## 7. Miscellaneous Training

- If you want to adapt PriMIA to another use-case altogether, go here.
- Monitor your training with Visdom:
    - From the command line, run `visdom` to start a visdom server. It will be located on `localhost:xxxx`. Navigate to this page with your browser.
    - Add the `--visdom` flag to `train.py`.
    - For more information on Visdom, see here. Scroll down to see configuration options, e.g. port selection or authentication for Visdom, and many tutorials!
- Use MNIST, VGG16 etc.:
    - These are handled using command line arguments. `--data_dir mnist` will use MNIST from `torchvision`. The model can be switched in the configuration file. Check this page for details.
- Run a hyperparameter optimisation trial:
    - Run `python torchlib/find_config.py`. This assumes the system is set up for training (as described above). PriMIA uses Optuna. The system defaults to local training. If VirtualWorkers are required, pass the `--federated` flag. If PyGrid nodes are running, you can pass `--websockets` (which will be passed on to `train.py`). A database file can be specified here, otherwise a default SQLite file will be used.
    - Results can be visualised by running the script with the `--visualize` flag, which will read the database file and open an Optuna server to show the results.
    - The results of the hyperparameter run will be located inside `model_weights`. If running many trials, make sure you have enough space available, since this folder will become very large.
- Differential privacy:
    - PriMIA includes bindings for the `torchdp` library (now called Opacus). Differential privacy is only implemented for simple models at the moment and is in an experimental stage.
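Conceptually, a hyperparameter optimisation trial like the one `find_config.py` runs samples parameters, scores them, and keeps the best configuration. The self-contained random-search sketch below illustrates that loop only; the objective function and the search ranges are invented for this example and are not PriMIA's (Optuna's actual samplers are considerably smarter than plain random search):

```python
import random

# Illustrative random-search loop, conceptually similar to a hyperparameter
# optimisation trial. The objective and ranges are made up for this sketch.
random.seed(42)

def objective(lr, batch_size):
    # Stand-in for "train a model and return the validation loss":
    # a toy surface with its minimum near lr=1e-3, batch_size=32.
    return (lr - 1e-3) ** 2 * 1e6 + abs(batch_size - 32) / 32

best = None  # (loss, lr, batch_size)
for _ in range(50):
    lr = 10 ** random.uniform(-5, -1)       # sample the learning rate log-uniformly
    bs = random.choice([16, 32, 64, 128])   # sample a categorical batch size
    loss = objective(lr, bs)
    if best is None or loss < best[0]:
        best = (loss, lr, bs)

print(best)
```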
## 8. Inference

- Run inference with VirtualWorkers:
    - Put the data to classify in a directory.
    - Have a trained model ready (in `.pt` format).
    - Run `python inference.py --data_dir <path/to/data> --model_weights <path/to/model> --encrypted_inference`. The `Makefile` also provides some premade recipes, which need to be adapted to your data and models.
    - CAUTION: Encrypted inference is extremely resource intensive and can cause your computer to become unresponsive or the process to be killed by your operating system. Omit the `--encrypted_inference` flag to perform regular remote inference.
    - On compatible systems, inference can be accelerated with `--cuda`.
    - Do not confuse inference with using the `test.py` file. This is a convenience script that will only work with the pneumonia dataset used in our publication.
- Run inference over the network:
    - If the `--websockets_config` flag is passed alongside the path to a `configuration.ini` file (a template can be found in `configs/websetting`), inference will be performed over the network. The ports and IP addresses must match the ports and IP addresses of your machines.
    - This requires PyGrid nodes to be set up as a data owner, a model owner and a crypto provider. The `Makefile` provides some templates for this. If you want to simulate this process locally, you can run `make inference_setup` in one terminal, then run `inference.py` in a different terminal.
    - For best results, you should pass a `mean_std_file` for inference, which contains the mean and standard deviation of the training data; this is used for re-scaling the incoming data and is generated automatically during the training process. If it is omitted, sensible defaults are used.
    - Encrypted inference over the network is non-trivial, since the underlying WebSocket implementation has an issue where kernel TCP buffers can overflow (see issue here). Provided you really know what you are doing, you can tune the buffers using this guide. If you experience lag, delays or performance degradation, this is likely a problem with either your network settings or hardware. PriMIA does not interact directly with any networking layer.
    - Alternatively, we provide the option of using HTTP exclusively for inference. This is slightly slower (as HTTP is not full duplex) and requires more I/O, as it uses base-64 encoding. It is rock-stable, though, and can be enabled with the `--http_protocol` flag.
    - TLS is handled by PyGrid, not PriMIA. If you have certificates and want to use WSS or HTTPS, these need to be loaded onto the Nodes manually. Furthermore, the Nodes produce a warning related to the secret key, which should not be left at its default setting for security purposes, but passed as an environment variable. More info can be found here.
    - Encrypted inference is very resource and I/O intensive.
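The steps above mention a `mean_std_file` holding the training data's mean and standard deviation, used to re-scale incoming data. The pure-Python sketch below only illustrates that normalisation idea; it is not PriMIA's actual implementation or file format, and the random values stand in for real pixel data:

```python
import random
import statistics

# Hypothetical sketch of the statistics a mean_std_file would hold, and how
# they re-scale incoming data. NOT PriMIA's implementation or file format.
random.seed(0)
train_pixels = [random.random() for _ in range(10_000)]  # stand-in training data

mean = statistics.fmean(train_pixels)   # saved at training time
std = statistics.pstdev(train_pixels)   # saved at training time

# At inference time, re-scale an incoming sample with the training statistics:
incoming = [random.random() for _ in range(100)]
rescaled = [(x - mean) / std for x in incoming]

print(round(mean, 3), round(std, 3))
```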