Introduction to MSI, TensorFlow, and PyTorch

Outline

Below is a short introduction to some tools used in deep learning applications. We will go over:

  • Connecting and running jobs at MSI
  • Using custom kernels in Jupyter
  • Construction of a basic neural network with TensorFlow
  • Saving/Restoring models
  • Basic example of PyTorch

z.umn.edu/colab-5980

Introduction to MSI

Connecting to MSI

Storage

  • Home directory ~/
  • Global scratch /scratch.global
  • Local scratch /scratch.local
  • S3 storage (s3.msi.umn.edu)
    ssh login.msi.umn.edu
    s3cmd ls 
    s3cmd mb s3://$USER
    s3cmd put some.file s3://$USER/some.file

Software

  • environment modules
    • used to control environment variables and prevent conflicts between the hundreds of installed software packages
      module avail python
      module load python/3.6.3
      module unload python

Compute Hardware

MSI Servers

  • Mesabi (2015)

    • Cores: 19,040 Intel Haswell
    • Memory: 83 TB
    • Accelerators: 80 Nvidia K40 GPGPUs
    • Peak: 860 TFlops
  • Mangi (2019 Mesabi upgrade)

    • Cores: 20,888 AMD Rome
    • Memory: 56 TB
    • Accelerators: 40 Nvidia V100s
    • Peak: 1150 TFlops

Singularity

  • Run most containers on MSI resources without sudo permission
    module load singularity

Singularity can be used to run most Docker containers. You can create a container on an Ubuntu laptop, transfer it to MSI, and then execute it using Singularity on the CentOS7 compute nodes.

Note: you currently CANNOT build Singularity images on the MSI login nodes.
You can:

  • create images elsewhere
  • download images from trusted sources
  • build your containers remotely using the Singularity public build servers (if you trust them), e.g.:
blynch@ln0006 [~/] cat test.spec
Bootstrap: docker
From: ubuntu:xenial-20191108

%post
apt-get update
apt-get upgrade -y

blynch@ln0006 [~/] singularity build --remote mytest.img test.spec

Batch Computing

  1. Create a batch script
  2. Submit script to a queue
  3. The scheduler runs the script on the requested resources once they become available

A PBS submission script has two components:

  1. #PBS directives to tell the scheduler what resources you want
  2. a set of commands to run

An example script would look like:

#!/usr/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=5:00:00
#PBS -l mem=60gb
#PBS -e myjob.e
#PBS -o myjob.o
#PBS -q mesabi

module load python
source activate myenvironment
cd some/directory
python something.py

and can be submitted like:

qsub myscript.sh
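To make the two components explicit, the script above can also be generated from Python. This is a minimal sketch: the `make_pbs_script` helper is hypothetical (not an MSI tool), and its defaults simply mirror the example script.

```python
def make_pbs_script(commands, nodes=1, ppn=24, walltime="5:00:00",
                    mem="60gb", queue="mesabi", name="myjob"):
    """Return the text of a PBS batch script (hypothetical helper).

    Part 1 is the #PBS directives (resource requests for the scheduler);
    part 2 is the list of shell commands to run on the allocated node.
    """
    directives = [
        "#!/usr/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l mem={mem}",
        f"#PBS -e {name}.e",
        f"#PBS -o {name}.o",
        f"#PBS -q {queue}",
    ]
    # A blank line separates the directives from the commands
    return "\n".join(directives + [""] + list(commands)) + "\n"


script = make_pbs_script([
    "module load python",
    "source activate myenvironment",
    "cd some/directory",
    "python something.py",
])

# Write the script so it can be submitted with: qsub myscript.sh
with open("myscript.sh", "w") as fh:
    fh.write(script)
print(script)
```

Changing a keyword argument (for example `queue="k40"`) changes only the directive part; the commands stay the same.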

Job Queues

https://www.msi.umn.edu/queues

Interactive computing

  • Jupyter (https://notebooks.msi.umn.edu)
  • qsub
    From the command line:
    ssh login.msi.umn.edu
    ssh mesabi
    qsub -l nodes=1:ppn=2,mem=2gb,walltime=1:00:00 -q interactive -I

Resources in the interactive queue are usually available sooner than in other queues, but you can also request interactive sessions on other queues, such as the GPU queues, e.g.:

qsub -l nodes=1:ppn=24,gpus=2,walltime=1:00:00 -q k40 -I

or

qsub -l nodes=1:ppn=24,gpus=2,walltime=1:00:00 -q v100 -I

Using Custom Kernels in Jupyter

  1. ssh into mesabi
  2. load python module
  3. create a new python environment
  4. customize environment
ssh login.msi.umn.edu
ssh mesabi

module load python
conda create -y --name myproject python ipykernel
source activate myproject
mkdir -p ~/.local/share/jupyter/kernels/mynewkernel
vi ~/.local/share/jupyter/kernels/mynewkernel/kernel.json
{
 "argv": [
  "/home/support/blynch/.conda/envs/myproject/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3 - My Special Kernel",
 "language": "python"
}
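Instead of editing kernel.json by hand, the same file can be written from Python. A minimal sketch, assuming it is run from inside the activated environment so that `sys.executable` is that environment's interpreter; `write_kernel_spec` is a hypothetical helper, not part of Jupyter.

```python
import json
import os
import sys


def write_kernel_spec(kernel_dir, display_name, python=sys.executable):
    """Write a kernel.json that launches `python` with ipykernel (hypothetical helper)."""
    os.makedirs(kernel_dir, exist_ok=True)
    spec = {
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as fh:
        json.dump(spec, fh, indent=1)  # same layout as the hand-written file
    return path
```

For example, `write_kernel_spec(os.path.expanduser("~/.local/share/jupyter/kernels/mynewkernel"), "Python 3 - My Special Kernel")`. With ipykernel installed in the environment, `python -m ipykernel install --user --name myproject --display-name "Python 3 - My Special Kernel"` generates an equivalent kernel spec for you.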

You can customize your kernels further. For example, this kernel.json runs Python inside a Singularity container:

{
 "argv": [
  "/opt/singularity/singularity",
  "exec",
  "-B",
  "/panfs/roc/groups/2/support/blynch:/panfs/roc/groups/2/support/blynch",
  "/home/support/blynch/singularity/tf.simg",
  "/opt/anaconda3/bin/python",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3.6 Singularity Tensorflow r1.12",
 "language": "python"
}
In [0]:
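import warnings

# Example from the Python warnings docs: suppress a DeprecationWarning
# raised inside the with-block only
def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()

# Mount Google Drive (Colab only) so results can be copied somewhere persistent
from google.colab import drive
drive.mount('/content/gdrive')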
Mounted at /content/gdrive
TensorFlow

  • TensorFlow was started by Google and released in November 2015
  • Core written in C++
  • Typically used from Python, either directly or through the Keras API

TensorFlow basics

  1. Define a directed graph of operations
  2. Execute the graph in a session


In [0]:
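import tensorflow as tf

# Building inside an explicit tf.Graph and running it with tf.compat.v1.Session
# keeps this TF 1.x-style example working under TensorFlow 2.x as well.
graph = tf.Graph()
with graph.as_default():
    # 1. Define the graph: nothing is computed yet
    x = tf.Variable(3, name="x")
    y = tf.Variable(7, name="y")
    f = x*100 + y*3 - 7
    print(f)

# 2. Execute: initialize the variables, then evaluate f
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run([x.initializer, y.initializer])
    result = sess.run(f)
print(result)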


Tensor("sub:0", shape=(), dtype=int32)
314
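In TensorFlow 2.x, operations run eagerly by default and `tf.function` takes the place of explicit graphs and sessions: it traces the Python function into a graph on the first call. A sketch of the same computation, assuming TensorFlow 2.x:

```python
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(7, name="y")

# Traced into a TensorFlow graph on the first call; later calls reuse it
@tf.function
def f():
    return x*100 + y*3 - 7

result = f()
print(result.numpy())  # 314
```

No initializer or session is needed: eager variables hold their values as soon as they are created.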

Keras

  • developed as a high-level interface to create neural networks with Tensorflow and Theano.
  • now it also supports Microsoft CNTK


In [0]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 6s 93us/sample - loss: 0.2860 - acc: 0.9172
Epoch 2/5
60000/60000 [==============================] - 4s 66us/sample - loss: 0.1411 - acc: 0.9582
Epoch 3/5
60000/60000 [==============================] - 4s 65us/sample - loss: 0.1061 - acc: 0.9674
Epoch 4/5
60000/60000 [==============================] - 4s 65us/sample - loss: 0.0880 - acc: 0.9731
Epoch 5/5
60000/60000 [==============================] - 4s 65us/sample - loss: 0.0740 - acc: 0.9771
10000/10000 [==============================] - 0s 47us/sample - loss: 0.0703 - acc: 0.9786
Out[0]:
[0.07031472590686753, 0.9786]
In [0]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
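The parameter counts in the summary follow directly from the layer sizes: a Dense layer with n_in inputs and n_out units has an n_in by n_out weight matrix plus n_out biases, while Flatten and Dropout have no parameters. A quick check in plain Python (the `dense_params` helper is only for illustration):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out

flatten_out = 28 * 28                   # Flatten: 28x28 image -> 784 values
layer_params = [
    dense_params(flatten_out, 128),     # dense:   784*128 + 128 = 100480
    dense_params(128, 10),              # dense_1: 128*10  + 10  = 1290
]
print(layer_params, sum(layer_params))  # [100480, 1290] 101770
```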
In [0]:
from tensorflow.keras.callbacks import ModelCheckpoint

!mkdir -p /content/scratch
output_basename   = 'blynch-job1.hdf5'
output_model_name = '/content/scratch/' + output_basename

# No validation data is passed to fit(), so monitor the training loss;
# without save_best_only=True the model is saved after every epoch anyway.
checkpointer = ModelCheckpoint(output_model_name, monitor='loss', verbose=1, mode='auto')

model.fit(x_train, y_train, epochs=5, callbacks=[checkpointer])

!ls -l /content/scratch

# To restore the saved model later:
# model = tf.keras.models.load_model(output_model_name)
Train on 60000 samples
Epoch 1/5
59712/60000 [============================>.] - ETA: 0s - loss: 0.0648 - acc: 0.9796
Epoch 00001: saving model to /content/scratch/blynch-job1.hdf5
60000/60000 [==============================] - 4s 65us/sample - loss: 0.0648 - acc: 0.9796
Epoch 2/5
59648/60000 [============================>.] - ETA: 0s - loss: 0.0590 - acc: 0.9806
Epoch 00002: saving model to /content/scratch/blynch-job1.hdf5
60000/60000 [==============================] - 4s 64us/sample - loss: 0.0590 - acc: 0.9806
Epoch 3/5
59296/60000 [============================>.] - ETA: 0s - loss: 0.0524 - acc: 0.9825
Epoch 00003: saving model to /content/scratch/blynch-job1.hdf5
60000/60000 [==============================] - 4s 64us/sample - loss: 0.0528 - acc: 0.9823
Epoch 4/5
59584/60000 [============================>.] - ETA: 0s - loss: 0.0469 - acc: 0.9849
Epoch 00004: saving model to /content/scratch/blynch-job1.hdf5
60000/60000 [==============================] - 4s 65us/sample - loss: 0.0469 - acc: 0.9849
Epoch 5/5
59328/60000 [============================>.] - ETA: 0s - loss: 0.0452 - acc: 0.9849
Epoch 00005: saving model to /content/scratch/blynch-job1.hdf5
60000/60000 [==============================] - 4s 64us/sample - loss: 0.0452 - acc: 0.9850
total 1224
-rw-r--r-- 1 root root 1249616 Feb 18 21:30 blynch-job1.hdf5
In [0]:
# copy a trained model to somewhere more permanent (create the Drive folder first)

!mkdir -p '/content/gdrive/My Drive/Tensorflow'
!cp /content/scratch/blynch-job1.hdf5 '/content/gdrive/My Drive/Tensorflow/'
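Restoring works the same way for a model saved by the checkpoint callback: `tf.keras.models.load_model` rebuilds the architecture, weights and optimizer state from the single HDF5 file. A self-contained round-trip sketch; the tiny model and the temporary path are stand-ins, not part of the original notebook.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in model so the example runs on its own
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(3,)),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer='adam', loss='mse')

# Save architecture, weights and optimizer state to one HDF5 file...
path = os.path.join(tempfile.mkdtemp(), 'example.h5')
model.save(path)

# ...and rebuild the model from that file alone
restored = tf.keras.models.load_model(path)

x = np.random.rand(5, 3).astype('float32')
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print("Predictions match:", same)
```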

TensorBoard

  • TensorBoard is a tool to visualize TensorFlow graphs and output.
  • TensorFlow can log metrics that track the optimization process for a model, for example with the Keras TensorBoard callback:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
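A sketch of the callback in use, assuming TensorFlow 2.x. The log directory, the random stand-in data and the small model are all assumptions made so the example runs on its own; on MSI you might point `logdir` at scratch space instead.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical log directory
logdir = os.path.join(tempfile.mkdtemp(), "logs")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

# Random stand-in data: 256 samples, 20 features, 2 classes
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256,))

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The callback writes loss/accuracy summaries under logdir during training
model.fit(x, y, epochs=2, callbacks=[tensorboard_callback], verbose=0)
print(sorted(os.listdir(logdir)))
```

The logs can then be viewed with `tensorboard --logdir <logdir>`, or inside Jupyter/Colab with `%load_ext tensorboard` followed by `%tensorboard --logdir <logdir>`.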


Stacking layers

Instead of defining the entire model in one statement, we can start from an empty model:

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()

Then we can add layers

In [0]:
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))

and then compile the model with a loss function and optimizer

In [0]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
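The compiled model still needs data. Because it uses `categorical_crossentropy`, the targets must be one-hot vectors, unlike the integer MNIST labels used with `sparse_categorical_crossentropy` earlier. A sketch with random stand-in data (an assumption, not from the original):

```python
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Random stand-in data: 1000 samples with 100 features and integer labels 0-9
x_train = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000,))
y_train = to_categorical(labels, num_classes=10)  # one-hot, shape (1000, 10)

model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
loss, acc = model.evaluate(x_train, y_train, verbose=0)
print(y_train.shape, loss, acc)
```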

Drawing

from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Cropping2D, Input,
                                     Lambda, MaxPooling2D, concatenate)
from tensorflow.keras.models import Model

# Assumed settings (not given in the original). The Cropping2D edges below only
# line up with 'valid' padding; with U_DEPTH = 4, a 572x572 input gives 388x388.
IMG_WIDTH    = 572
U_DEPTH      = 4
MIN_FEATURES = 64
padding      = 'valid'

inputs = Input((IMG_WIDTH, IMG_WIDTH, 1))

layers = [0] * (U_DEPTH*4+2)
crops  = [0] *  U_DEPTH
layers[0] = Lambda(lambda x: x / 255) (inputs)   # scale 8-bit pixels to [0, 1]

# Contracting path: two 3x3 convolutions per level, then 2x2 max pooling
for i in range(U_DEPTH):
    features = MIN_FEATURES*2**i
    layers[2*i+1] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[2*i])
    layers[2*i+1] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[2*i+1])
    layers[2*i+2] = MaxPooling2D((2, 2)) (layers[2*i+1])

# Bottleneck
features = MIN_FEATURES*2**U_DEPTH
layers[U_DEPTH*2+1] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[U_DEPTH*2])
layers[U_DEPTH*2+1] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[U_DEPTH*2+1])

# Expanding path: upsample, concatenate with the cropped encoder output of the
# same level, then two 3x3 convolutions
for i in range(U_DEPTH):
    edge = 2**(i+2) + 2**(i+3) - 2**3
    features = MIN_FEATURES*2**(U_DEPTH-i-1)
    crops[i] = Cropping2D((edge, edge))(layers[U_DEPTH*2-1-2*i])
    layers[U_DEPTH*2+2+i*2] = Conv2DTranspose(features, (2, 2), strides=(2, 2), padding=padding) (layers[U_DEPTH*2+1+i*2])
    layers[U_DEPTH*2+2+i*2] = concatenate([layers[U_DEPTH*2+2+i*2], crops[i]], axis=3)
    layers[U_DEPTH*2+3+i*2] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[U_DEPTH*2+2+i*2])
    layers[U_DEPTH*2+3+i*2] = Conv2D(features, (3, 3), activation='elu', kernel_initializer='he_normal', padding=padding) (layers[U_DEPTH*2+3+i*2])

# 1x1 convolution to a single-channel probability map
outputs = Conv2D(1, (1, 1), activation='sigmoid') (layers[U_DEPTH*4+1])
model = Model(inputs=[inputs], outputs=[outputs])
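Because the crop edges assume 'valid' padding, not every IMG_WIDTH works: each pooling step must see an even width. The sketch below traces the width through a U-Net built as above, using the same crop formula, so a candidate input size can be checked before building the model. The `unet_output_width` helper is hypothetical and pure Python.

```python
def unet_output_width(img_width, depth):
    """Trace the spatial width through a 'valid'-padded U-Net like the one above.

    Each pair of 3x3 'valid' convolutions trims 4 pixels, each 2x2 pooling
    halves the width, and each 2x2 stride-2 transposed convolution doubles it.
    Raises ValueError if a pooling step sees an odd or non-positive width, or
    if a crop edge does not match the upsampled width.
    """
    width = img_width
    skip_widths = []                       # encoder widths that get cropped
    for _ in range(depth):
        width -= 4                         # two 3x3 valid convolutions
        if width <= 0 or width % 2:
            raise ValueError(f"width {width} cannot be max-pooled evenly")
        skip_widths.append(width)
        width //= 2                        # 2x2 max pooling
    width -= 4                             # bottleneck convolutions
    if width <= 0:
        raise ValueError("input too small for this depth")
    for i in range(depth):
        width *= 2                         # transposed convolution, stride 2
        edge = 2**(i+2) + 2**(i+3) - 2**3  # same crop edge as the model code
        if skip_widths[depth - 1 - i] - 2 * edge != width:
            raise ValueError(f"crop edge {edge} does not match width {width}")
        width -= 4                         # two 3x3 valid convolutions
    if width <= 0:
        raise ValueError("input too small for this depth")
    return width


print(unet_output_width(572, 4))  # 388
```

For example, 570 is rejected at depth 4 because the third pooling step would see a width of 279.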

PyTorch

In [0]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # its gradient is available as param.grad.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
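The manual update at the end of the loop can also be delegated to an optimizer from `torch.optim`. A sketch of the same training loop using `torch.optim.SGD`; the fixed seed and the `history` list are additions for illustration.

```python
import torch

torch.manual_seed(0)  # make the random data reproducible

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Plain SGD (no momentum or weight decay) applies the same update as the
# manual loop above: param -= lr * param.grad
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

history = []
for t in range(500):
    loss = loss_fn(model(x), y)
    history.append(loss.item())
    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()  # clear gradients from the previous iteration
    loss.backward()        # compute gradients of the loss for every parameter
    optimizer.step()       # apply the update to every parameter
```

Switching to another update rule, such as `torch.optim.Adam`, only changes the line that constructs the optimizer.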

Using GPUs and Parallel Training

Tensorflow

When you use a GPU-enabled build of TensorFlow on hardware with GPUs, TensorFlow automatically places supported operations on a GPU. This is a good place to start until you have a thorough understanding of your problem. After that, you can look into the tf.distribute.Strategy API:

tf.distribute.MirroredStrategy - use multiple GPUs on one node
tf.distribute.MultiWorkerMirroredStrategy - use multiple nodes (tf.distribute.experimental.MultiWorkerMirroredStrategy in older releases)
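A sketch of single-node multi-GPU training with `MirroredStrategy`, assuming TensorFlow 2.x. The model mirrors the MNIST example earlier, but the random stand-in data is an assumption so the sketch runs without a download; on a node without GPUs it falls back to a single CPU replica.

```python
import numpy as np
import tensorflow as tf

# Replicates the model on every visible GPU and averages gradients across replicas
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Model weights and optimizer state must be created inside the scope
with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Random stand-in data shaped like MNIST; each global batch of 64 is split
# across the replicas
x = np.random.rand(512, 28, 28).astype('float32')
y = np.random.randint(0, 10, size=(512,))
model.fit(x, y, epochs=1, batch_size=64, verbose=0)
```

The multi-node strategy is used the same way, but each node additionally needs a `TF_CONFIG` environment variable describing the cluster.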

Pytorch

torch.nn.DataParallel - use multiple GPUs
torch.distributed - use multiple nodes
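A minimal `DataParallel` sketch, reusing the shapes from the PyTorch example above. It wraps the model only when more than one GPU is visible, so it also runs on a single GPU or on a CPU.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # Splits each input batch along dim 0, runs a replica on each GPU,
    # and gathers the outputs on the first GPU
    model = torch.nn.DataParallel(model)
model.to(device)

x = torch.randn(64, 1000, device=device)
y_pred = model(x)
print(y_pred.shape, "using", torch.cuda.device_count(), "GPU(s)")
```

The PyTorch documentation recommends `torch.nn.parallel.DistributedDataParallel` over `DataParallel` even on a single node, at the cost of launching one process per GPU.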

Horovod

Horovod is another framework for distributed training of TensorFlow or PyTorch models across multiple GPUs and nodes. To copy and submit the example job:

 cp -r /home/dhp/public/deep_learning/horovad/use_8_gpus .
 cd use_8_gpus
 qsub pbs_run