TensorFlow Operator

Kubeflow TensorFlow Job (TFJob) Training Operator

TFJob provides a Kubernetes custom resource that makes it easy to run distributed or non-distributed TensorFlow jobs on Kubernetes.

More on the TensorFlow Operator: https://github.com/kubeflow/tf-operator

Quick Start

To install with CPU support, run:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_tf-operator

To install with GPU support, run:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_tf-operator

Test your installation

This walkthrough uses a sample from the TensorFlow Operator repository at https://github.com/kubeflow/tf-operator

Step 1

First, create a PersistentVolume and a PersistentVolumeClaim. Each needs its own YAML manifest; copy and paste the commands in order.

k3s kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tfevent-volume
  labels:
    type: local
    app: tfjob
spec:
  capacity:
    storage: 10Gi
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data
EOF

Now add the PVC.

Note: Because we use local-path as the storage class on a single-node cluster, we can't use ReadWriteMany. See the Rancher local-path provisioner issue: https://github.com/rancher/local-path-provisioner/issues/70#issuecomment-574390050
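A minimal sketch of a matching claim, under stated assumptions: the claim name mirrors the PersistentVolume name (`tfevent-volume`), the `kubeflow` namespace is a guess that you should replace with the namespace your job runs in, and the 10Gi request simply equals the volume's capacity. The file name `tfevent-pvc.yaml` is hypothetical.

```shell
# Assumption: the namespace is not stated in this guide; "kubeflow" is a placeholder.
NAMESPACE="${NAMESPACE:-kubeflow}"

# Write the claim to a file so it can be reviewed before it is applied.
cat > tfevent-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfevent-volume
  namespace: ${NAMESPACE}
  labels:
    type: local
    app: tfjob
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

# Apply only where k3s is installed.
if command -v k3s >/dev/null 2>&1; then
  k3s kubectl apply -f tfevent-pvc.yaml
fi
```

The access mode stays ReadWriteOnce to match the volume defined in Step 1. The claim name must also match whatever `claimName` the job manifest references.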

Step 2

Now we deploy the example

You can observe the result of the example with

It should output something similar to this (we show just partially the output here)
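As a rough sketch of this step, the helpers below apply a TFJob manifest and then list the resulting resources. The manifest path, the `kubeflow` namespace, the `KUBECTL` override, and both function names are assumptions made for illustration and are not taken from this guide. The output will depend on the sample you deploy.

```shell
# Assumption: jobs live in the "kubeflow" namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-kubeflow}"
# KUBECTL is a hypothetical override, useful for dry runs (KUBECTL=echo).
KUBECTL="${KUBECTL:-k3s kubectl}"

deploy_tfjob() {
  # $1: path or URL of the TFJob manifest; its namespace comes from the manifest.
  $KUBECTL apply -f "$1"
}

show_tfjob_status() {
  # List the TFJob resources, then the pods created for them.
  $KUBECTL get tfjobs -n "$NAMESPACE"
  $KUBECTL get pods -n "$NAMESPACE"
}

# Example (placeholder path; point it at the sample manifest you want to run):
# deploy_tfjob ./tfjob-sample.yaml
# show_tfjob_status
```

Setting `KUBECTL=echo` prints the commands instead of running them, which is a quick way to check them before touching the cluster.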
