TensorFlow Operator
Kubeflow TensorFlow-Job Training Operator
TFJob provides a Kubernetes custom resource that makes it easy to run distributed or non-distributed TensorFlow jobs on Kubernetes.
More on the TensorFlow Operator at https://github.com/kubeflow/tf-operator
Quick Start
To install with CPU support, run:
curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_tf-operator
To install with GPU support, run:
curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_tf-operator
Test your installation
The sample below comes from the TensorFlow Operator repository at https://github.com/kubeflow/tf-operator
Step 1
We first need to add a persistent volume and a matching claim. To do so, apply the two YAML manifests below: copy and paste each command in order.
k3s kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tfevent-volume
  labels:
    type: local
    app: tfjob
spec:
  capacity:
    storage: 10Gi
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data
EOF
Now we add the PVC.
Note: Because we use local-path as the storage class and run on a single-node cluster, we can't use ReadWriteMany. See the Rancher local-path provisioner issue at https://github.com/rancher/local-path-provisioner/issues/70#issuecomment-574390050
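A claim matching the volume from Step 1 might look like the following sketch. The claim name, labels, and requested size are assumptions chosen to mirror the PersistentVolume above, not values confirmed by the upstream example. The storage class and the ReadWriteOnce access mode follow from the note above.

```yaml
# Sketch only: name, labels, and requested storage are assumptions
# that mirror the PersistentVolume defined in Step 1.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfevent-volume
  labels:
    type: local
    app: tfjob
spec:
  storageClassName: local-path   # same class as the PersistentVolume
  accessModes:
    - ReadWriteOnce              # ReadWriteMany is not available with local-path
  resources:
    requests:
      storage: 10Gi              # assumed to match the volume capacity
```

It can be applied with the same `k3s kubectl apply -f - << EOF ... EOF` pattern used for the volume. A PersistentVolumeClaim is namespaced, so it should be created in the namespace where the TFJob will run.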
Step 2
Now we deploy the example
You can observe the result of the example with
The output should look similar to this (only part of it is shown here)