As usual, we want to avoid complexity, so we reworked the sample a bit to make it much easier to run.
Step 1
You'll see that in the example a container needs to be created before running the sample. We merged the container commands directly into the YAML file, so it is now a one-click job.
For CPU only
If you have a GPU enabled, you can run it this way
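As a minimal sketch, assuming the job is submitted with `kubectl` against the cluster your current context points at, deploying either variant could look like the following. The two manifest file names and the `deploy_job` helper are hypothetical placeholders, not names taken from the sample, so substitute the YAML files that ship with it.

```shell
# Hypothetical manifest names: replace them with the YAML files from the sample.
CPU_MANIFEST="job-cpu.yaml"
GPU_MANIFEST="job-gpu.yaml"

# Submit a job manifest to the cluster that kubectl currently points at.
# (deploy_job is an illustrative helper, not part of the sample.)
deploy_job() {
  kubectl apply -f "$1"
}
```

Calling `deploy_job "$CPU_MANIFEST"` would submit the CPU-only variant and `deploy_job "$GPU_MANIFEST"` the GPU one. If the manifest relies on `generateName` rather than a fixed name, `kubectl create -f` is the usual alternative to `kubectl apply -f`.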
Step 2
Check that the pods are deployed correctly with
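One way to do this, sketched here under the assumption that you know which namespace the job was created in, is to list the pods with `kubectl get pods`. The `list_job_pods` helper name is ours for illustration, and the namespace defaults to `default`, which may not match your setup.

```shell
# List the pods in a namespace (defaults to "default").
# -o wide adds the node and pod IP columns to the listing.
# (list_job_pods is an illustrative helper, not part of the sample.)
list_job_pods() {
  kubectl get pods -n "${1:-default}" -o wide
}
```

Run `list_job_pods` for the default namespace, or pass another namespace as the first argument, and look at the `STATUS` column to see whether the job's pods have started.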
It should output something like this
Step 3
Check the logs of your training job
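A minimal sketch of reading those logs, assuming you take the pod name from the pod listing in Step 2, is shown below. The `show_training_logs` helper is hypothetical, and the pod name is deliberately left as an argument because it depends on how the job is named in your YAML file.

```shell
# Stream the logs of a single pod until interrupted (-f follows the log).
# $1: pod name as reported by `kubectl get pods`
# $2: namespace (defaults to "default")
# (show_training_logs is an illustrative helper, not part of the sample.)
show_training_logs() {
  kubectl logs -f "$1" -n "${2:-default}"
}
```

Call it once per pod, for example first with the master pod's name and then with the worker's, to compare what each replica reports.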
You should see output similar to this (since we are using 1 master and 1 worker in this case)