PyTorch Operator

Kubeflow PyTorch-Job Training Operator

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration

  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed. More information is available at https://github.com/kubeflow/pytorch-operator or on the PyTorch site at https://pytorch.org/

Quick Start

As usual, let's deploy PyTorch with a single command.

If you want CPU-only support:

curl -sfL https://get.k3ai.in | bash -s -- --cpu  --plugin_pytorch-operator

If you want to use PyTorch with GPU support:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_pytorch-operator

Test Your PyTorch-Job Installation

We will use the MNIST example from the Kubeflow PyTorch-Job repo at https://github.com/kubeflow/pytorch-operator/tree/master/examples/mnist

As usual, we want to avoid complexity, so we reworked the sample a bit to make it much easier to run.

Step 1

You'll see that in the original example a container needs to be built before running the sample. We merged the container build commands directly into the YAML file, so it's now a one-click job.

For CPU only

If you have GPU enabled you may run it this way
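As a sketch of this step, the reworked manifests can be submitted with `kubectl apply`. The file names below are illustrative assumptions, not the actual manifest names shipped with k3ai; substitute the YAML files provided with your installation:

```shell
# CPU only: submit the PyTorchJob using the CPU manifest
# (file name is hypothetical -- use the manifest from your k3ai setup)
kubectl apply -f pytorch-mnist-cpu.yaml

# GPU: submit the PyTorchJob using the GPU-enabled manifest
kubectl apply -f pytorch-mnist-gpu.yaml
```

Both commands create a PyTorchJob custom resource; the PyTorch operator then spawns the master and worker pods for you.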

Step 2

Check that the pods are deployed correctly with
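A minimal sketch of the check, assuming the job runs in the default namespace and carries "mnist" in its name (as the upstream example does):

```shell
# List the pods and filter for the MNIST training job;
# the operator names pods <job-name>-master-0, <job-name>-worker-0, etc.
kubectl get pods | grep mnist
```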

It should output something like this

Step 3

Check the logs of your training job
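For example, assuming the job is named `pytorch-dist-mnist-gloo` (the name used in the upstream Kubeflow MNIST sample; adjust to match your manifest), the master's logs can be followed with:

```shell
# Stream the training logs from the master replica pod
kubectl logs -f pytorch-dist-mnist-gloo-master-0
```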

You should observe output similar to the following (since in this case we are using one master and one worker)
