
K3ai V1 Docs

K3ai (keɪ3ai)

This is Version 1 of K3ai. This version is now deprecated and should no longer be used.

K3ai is a lightweight infrastructure-in-a-box specifically built to install and configure AI tools and platforms to quickly experiment and/or run in production over edge devices.

Ready to experiment?

curl -sfL https://get-core.k3ai.in | bash -

Looking for more interaction? Join our Slack channel.

Components of K3ai

Currently, we install the following components (the list is changing and growing):

Kubernetes based on K3s from Rancher: https://k3s.io/
Kubeflow pipelines: https://github.com/kubeflow/pipelines
Argo Workflows: https://github.com/argoproj/argo
Kubeflow: https://www.kubeflow.org/ - (coming soon)
NVIDIA GPU support: https://docs.nvidia.com/datacenter/cloud-native/index.html
NVIDIA Triton inference server: https://github.com/triton-inference-server/server/tree/master/deploy/single_server (coming soon)
Tensorflow Serving: https://www.tensorflow.org/tfx/serving/serving_kubernetes:
- ResNet
- Mnist (coming soon)

What we are trying to solve (a.k.a Our Goals)

"The great danger for most of us lies not in setting our aim too high and falling short, but in setting our aim too low, and achieving our mark." –Michelangelo

Identify the problem

Artificial Intelligence platforms are complex. They combine a multitude of tools and frameworks that help Data Scientists and Data Engineers solve the problem of building end-to-end pipelines.

But those AI platforms, by inheritance, have a degree of complexity. Let's look at how one of them describes its use case:

The end goal of every organization that utilizes machine learning (ML) is to have their ML models successfully run in production and generate value to the business. But what does it take to reach that point?
Before a model ends up in production, there are potentially many steps required to build and deploy an ML model: data loading, verification, splitting, processing, feature engineering, model training and verification, hyperparameter tuning, and model serving.
In addition, ML models can require more observation than traditional applications, because your data inputs can drift over time. Manually rebuilding models and data sets is time consuming and error prone.
(Source: Kubeflow project)

See the elephant in the room? We all have to struggle with the complexity of a process that looks like the one below

So here is the first problem we identified (yes, I said first): remove the complexity and give you a straightforward solution.

Now there are plenty of alternatives when it comes to the infrastructure (local infrastructure) like:

Minikube
Kind
Docker for Windows (Kubernetes)
MicroK8s

And some of them even allow you to install platforms like Kubeflow. But can you cherry-pick AI tools and/or solutions and run them on top of an infrastructure that does not eat up your entire laptop's RAM? Let's say you start by learning the basics of training a model on different platforms and later move on to serving models. You won't have everything running at once, but you can move from one configuration to the other quickly.

Identify the other problem

If experimentation is one side of the coin, the other is using K3ai in the context of CI/CD.

Data Engineers, DevOps or, to use a fancier term, AIOps have to face the challenge of building infrastructure pipelines that satisfy the following requirements:

Must be FAST to be built and EASY to be destroyed
Must be AVAILABLE everywhere no matter if it's on-prem, on-cloud, or in the remote universe
Must be REPRODUCIBLE: you want to be able to replicate the scenario again and again without having to re-configure things from scratch every time

Solving the problem

K3ai's goal is to provide a micro-infrastructure that removes the complexity of the installation, configuration, and execution of any AI platform so that the user may focus on experimentation.

We want to satisfy the need of AI citizens and Corporate Scientists to focus on what matters to them and forget the complexity attached to it.

To do so we have to satisfy a few requirements:

Everything we code has to be SIMPLE enough that anybody can contribute back
Everything must live within ONE single command. This way it may easily be integrated within any automation script
Everything must be MODULAR. We want to provide the greatest list of AI tools/solutions ever so people may cherry-pick and create their own AI infrastructure combinations
We DO NOT install anything client-side (aka we don't want to be invasive) other than the minimal tools needed to run the solution (i.e.: k3s)
We want to be FAST
We want to be LIGHTWEIGHT

K3ai is for the community, by the community. We want to be the reference for AI professionals, students, and researchers to learn and grow.

Quick Start

First things First

Start by installing K3ai from the V1 channel with this:

curl -sfL https://get-core.k3ai.in | bash -

Note: sometimes things take longer than expected resulting in the error below:

error: timed out waiting for the condition on xxxxxxx

Don't worry! Sometimes the installation takes a few minutes, especially the Vagrant version or if you have limited bandwidth.

Still curious? Here's a short demo:

Remove k3ai

In order to uninstall k3ai we provide a simple command to remove all components. All you have to do is launch the following command:

k3s-uninstall.sh

Contributing

Welcome to the K3ai project! We took the liberty of borrowing these rules from other great OSS projects like Kubeflow, Kubernetes, and so on.

Getting started as a K3ai contributor‌

This document is the single source of truth for how to contribute to the code base. We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.‌

As you will notice, we do not currently require any CLA signature. This may change in the future, but if so, that change will itself follow the contributing guidelines and processes.

Follow the code of conduct‌

Please make sure to read and observe our Code of Conduct and inclusivity document.‌

Joining the community‌

Follow these instructions if you want to:

Become a member of the K3ai GitHub org (see below)
Be recognized as an individual or organization contributing to K3ai

‌

Joining the K3ai GitHub Org‌

Before asking to join the community, we ask that you first make a small number of contributions to demonstrate your intent to continue contributing to K3ai.‌

There are a number of ways to contribute to K3ai‌

Submit PRs
File issues reporting bugs or providing feedback
Answer questions on Slack or GitHub issues

‌

When you are ready to join‌

Send a PR adding yourself as a member in org.yaml
After the PR is merged an admin will send you an invitation
- This is a manual process and we are a very small team, so please be patient
- If a week passes without receiving an invitation, reach out on k3ai#community

‌

Your first contribution‌

Find something to work on‌

Help is always welcome! For example, documentation (like the text you are reading now) can always use improvement. There's always code that can be clarified and variables or functions that can be renamed or commented on. There's always a need for more test coverage. You get the idea - if you ever see something you think should be fixed, you should own it. Here is how you get started.‌

Starter issues‌

To find K3ai issues that make good entry points:‌

Start with issues labeled good first issue.
For issues that require deeper knowledge of one or more technical aspects, look at issues labeled help wanted.
Examine the issues in any of the K3ai repositories.

‌Owners files and PR workflow‌

The goal for our PR workflow is to become nearly identical to Kubernetes'. Most of these instructions are a modified version of Kubernetes' contributors and owners guides.

Overview of OWNERS files‌

Note (Nov. 2020): we have not yet reached the point where we use OWNERS and/or REVIEWERS files, but we plan things in advance, so the text below represents the idea of future workflows.

OWNERS files are used to designate responsibility for different parts of the K3ai codebase. Today, we use them to assign the reviewer and approver roles used in our two-phase code review process.‌

The velocity of a project that uses code review is limited by the number of people capable of reviewing code. The quality of a person's code review is limited by their familiarity with the code under review. Our goal is to address both of these concerns through the prudent use and maintenance of OWNERS files.

OWNERS‌

Each directory that contains a unit of independent code or content may also contain an OWNERS file. This file applies to everything within the directory, including the OWNERS file itself, sibling files, and child directories.‌

OWNERS files are in YAML format and support the following keys:‌

approvers: a list of GitHub usernames or aliases that can /approve a PR
labels: a list of GitHub labels to automatically apply to a PR
options: a map of options for how to interpret this OWNERS file, currently only one:
- no_parent_owners: defaults to false if not present; if true, exclude parent OWNERS files.
  Allows the use case where a/deep/nested/OWNERS file prevents a/OWNERS file from having any
  effect on a/deep/nested/bit/of/code
reviewers: a list of GitHub usernames or aliases that are good candidates to /lgtm a PR
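
To make the effect of no_parent_owners more concrete, here is a minimal Python sketch of how approvers might be collected for a changed file. It is only an illustration: the OWNERS dictionary is a hypothetical in-memory stand-in for parsed OWNERS files, and the function name effective_approvers is an assumption, not part of any real tool.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for parsed OWNERS files, keyed by directory.
OWNERS = {
    "a": {"approvers": ["alice"]},
    "a/deep/nested": {
        "approvers": ["bob"],
        "options": {"no_parent_owners": True},
    },
}


def effective_approvers(changed_file, owners=OWNERS):
    """Collect approvers from the nearest OWNERS file upward.

    The walk stops at the first OWNERS file that sets no_parent_owners,
    so OWNERS files above it have no effect on the changed file.
    """
    approvers = []
    directory = PurePosixPath(changed_file).parent
    while True:
        entry = owners.get(str(directory))
        if entry is not None:
            for name in entry.get("approvers", []):
                if name not in approvers:
                    approvers.append(name)
            if entry.get("options", {}).get("no_parent_owners", False):
                break
        if directory == directory.parent:  # reached the top of the tree
            break
        directory = directory.parent
    return approvers


print(effective_approvers("a/deep/nested/bit/of/code"))
print(effective_approvers("a/other/file.go"))
```

In this sketch the first call only sees the nested OWNERS file, while the second falls back to a/OWNERS.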

‌All users are expected to be assignable. In GitHub terms, this means they are either collaborators of the repo, or members of the organization to which the repo belongs.‌

A typical OWNERS file looks like:

approvers:  
    - alice  
    - bob     
# this is a comment
reviewers:  
    - alice  
    - carol   
# this is another comment  
    - sig-foo # this is an alias

‌OWNERS_ALIASES‌

Each repo may contain at its root an OWNERS_ALIASES file.

OWNERS_ALIASES files are in YAML format and support the following keys:

aliases: a mapping of alias name to a list of GitHub usernames

‌We use aliases for groups instead of GitHub Teams, because changes to GitHub Teams are not publicly auditable.‌

A sample OWNERS_ALIASES file looks like:

aliases:  
    sig-foo:    
        - david    
        - erin  
    sig-bar:    
        - bob    
        - frank
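
As a rough illustration of how such aliases could be expanded into individual usernames, the sketch below uses plain Python dictionaries in place of parsed YAML. The function name expand_owners is hypothetical, and the sketch assumes that names are compared case-insensitively.

```python
def expand_owners(names, aliases):
    """Replace alias names with their members, comparing case-insensitively."""
    lowered = {alias.lower(): members for alias, members in aliases.items()}
    expanded = []
    for name in names:
        # A name that is not an alias is treated as a plain username.
        members = lowered.get(name.lower(), [name])
        for member in members:
            key = member.lower()
            if key not in expanded:
                expanded.append(key)
    return expanded


aliases = {"sig-foo": ["david", "erin"], "sig-bar": ["bob", "frank"]}
reviewers = ["alice", "carol", "SIG-Foo"]
print(expand_owners(reviewers, aliases))
```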

‌GitHub usernames and aliases listed in OWNERS files are case-insensitive.‌

The code review process

‌The author submits a PR

[FUTURE] ~~Phase 0: Automation suggests reviewers and approvers for the PR~~
- ~~Determine the set of OWNERS files nearest to the code being changed~~
- ~~Choose at least two suggested reviewers, trying to find a unique reviewer for every leaf OWNERS file, and request their reviews on the PR~~
- ~~Choose suggested approvers, one from each OWNERS file, and list them in a comment on the PR~~
Phase 1: Humans review the PR
- Reviewers look for general code quality, correctness, sane software engineering, style, etc.
- Anyone in the organization can act as a reviewer with the exception of the individual who
  opened the PR
- If the code changes look good to them, a reviewer types /lgtm in a PR comment or review;
  if they change their mind, they /lgtm cancel
- [FUTURE] ~~Once a reviewer has /lgtm'ed, prow (@k8s-ci-robot) applies an lgtm label to the PR~~
Phase 2: Humans approve the PR
- The PR author /assign's all suggested approvers to the PR, and optionally notifies
  them (eg: "pinging @foo for approval")
- Only people listed in the relevant OWNERS files, either directly or through an alias, can act
  as approvers, including the individual who opened the PR
- Approvers look for holistic acceptance criteria, including dependencies with other features,
  forwards/backwards compatibility, API and flag definitions, etc
- If the code changes look good to them, an approver types /approve in a PR comment or
  review; if they change their mind, they /approve cancel
- ~~prow (@k8s-ci-robot) updates its comment in the PR to indicate which approvers still need to approve~~
- ~~Once all approvers (one from each of the previously identified OWNERS files) have approved, prow (@k8s-ci-robot) applies an approved label~~
Phase 3: Automation merges the PR:
- If all of the following are true:
  - All required labels are present (eg: lgtm, approved)
  - Any blocking labels are missing (eg: there is no do-not-merge/hold, needs-rebase)
- And if any of the following are true:
  - there are no presubmit prow jobs configured for this repo
  - there are presubmit prow jobs configured for this repo, and they all pass after automatically
    being re-run one last time
- Then the PR will automatically be merged
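
The Phase 3 conditions can be read as a single predicate. The Python sketch below is only an illustration of that logic, not the actual automation: the function name can_auto_merge and its parameters are assumptions, and presubmit results are modelled as a simple list of booleans (empty when no presubmit jobs are configured).

```python
def can_auto_merge(labels, presubmit_results,
                   required=("lgtm", "approved"),
                   blocking=("do-not-merge/hold", "needs-rebase")):
    """Return True when a PR meets the Phase 3 merge conditions."""
    labels = set(labels)
    has_required = all(label in labels for label in required)
    has_blocking = any(label in labels for label in blocking)
    # all([]) is True, which covers the "no presubmit jobs configured" case.
    presubmits_ok = all(presubmit_results)
    return has_required and not has_blocking and presubmits_ok


print(can_auto_merge(["lgtm", "approved"], []))
print(can_auto_merge(["lgtm", "approved", "do-not-merge/hold"], [True]))
```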

‌

Quirks of the process

‌There are a number of behaviors we've observed that while possible are discouraged, as they go against the intent of this review process. Some of these could be prevented in the future, but this is the state of today.‌

An approver's /lgtm is simultaneously interpreted as an /approve
- While a convenient shortcut for some, it can be surprising that the same command is interpreted
  in one of two ways depending on who the commenter is
- Instead, explicitly write out /lgtm and /approve to help observers, or save the /lgtm for
  a reviewer
- This goes against the idea of having at least two sets of eyes on a PR, and may be a sign that
  there are too few reviewers (who aren't also approvers)
Technically, anyone who is a member of the K3ai GitHub organization can drive-by /lgtm a PR
- Drive-by reviews from non-members are encouraged as a way of demonstrating experience and
  intent to become a collaborator or reviewer
- Drive-by /lgtm's from members may be a sign that our OWNERS files are too small, or that the
  existing reviewers are too unresponsive
- This goes against the idea of specifying reviewers in the first place, to ensure that
  the author is getting actionable feedback from people knowledgeable with the code
Reviewers and approvers are unresponsive
- This causes a lot of frustration for authors who often have little visibility into why their
  PR is being ignored
- Many reviewers and approvers are so overloaded by GitHub notifications that @mention'ing
  is unlikely to get a quick response
- If an author /assign's a PR, reviewers and approvers will be made aware of it on
  their PR dashboard
- An author can work around this by manually reading the relevant OWNERS files,
  /unassign'ing unresponsive individuals, and /assign'ing others
- This is a sign that our OWNERS files are stale; pruning the reviewers and approvers lists
  would help with this
- It is the PR author's responsibility to drive a PR to resolution. This means that if the PR reviewers are unresponsive, the author should escalate as noted below
  - e.g. ping reviewers in a timely manner to get it reviewed
  - If the reviewers don't respond, look at the OWNERS file in root and ping the approvers listed there
Authors are unresponsive
- This costs a tremendous amount of attention as context for an individual PR is lost over time
- This hurts the project in general as its general noise level increases over time
- Instead, close PR's that are untouched after too long (we currently have a bot do this after 30
  days)

‌

~~Automation using OWNERS files~~‌

prow

‌~~Prow receives events from GitHub, and reacts to them. It is effectively stateless. The following pieces of prow are used to implement the code review process above.~~‌

~~cmd: tide~~
- ~~per-repo configuration:~~
  - ~~labels: list of labels required to be present for merge (eg: lgtm)~~
  - ~~missingLabels: list of labels required to be missing for merge (eg: do-not-merge/hold)~~
  - ~~reviewApprovedRequired: defaults to false; when true, require that there must be at least one approved pull request review present for merge~~
  - ~~merge_method: defaults to merge; when squash or rebase, use that merge method instead when clicking a PR's merge button~~
- ~~merges PR's once they meet the appropriate criteria as configured above~~
- ~~if there are any presubmit prow jobs for the repo the PR is against, they will be re-run one final time just prior to merge~~
~~plugin: assign~~
- ~~assigns GitHub users in response to /assign comments on a PR~~
- ~~unassigns GitHub users in response to /unassign comments on a PR~~
~~plugin: approve~~
- ~~per-repo configuration:~~
  - ~~issue_required: defaults to false; when true, require that the PR description link to an issue, or that at least one approver issues a /approve no-issue~~
  - ~~implicit_self_approve: defaults to false; when true, if the PR author is in relevant OWNERS files, act as if they have implicitly /approve'd~~
- ~~adds the approved label once an approver for each of the required OWNERS files has /approve'd~~
- ~~comments as required OWNERS files are satisfied~~
- ~~removes outdated approval status comments~~
~~plugin: blunderbuss~~
- ~~determines reviewers and requests their reviews on PR's~~
~~plugin: lgtm~~
- ~~adds the lgtm label when a reviewer comments /lgtm on a PR~~
- ~~the PR author may not /lgtm their own PR~~
~~pkg: k8s.io/test-infra/prow/repoowners~~
- ~~parses OWNERS and OWNERS_ALIASES files~~
- ~~if the no_parent_owners option is encountered, parent owners are excluded from having any influence over files adjacent to or underneath of the current OWNERS file~~

‌

Maintaining OWNERS files

‌OWNERS files should be regularly maintained.‌

We encourage people to self-nominate or self-remove from OWNERS files via PR's. Ideally in the future we could use metrics-driven automation to assist in this process.‌

We should strive to:‌

grow the number of OWNERS files
add new people to OWNERS files
ensure OWNERS files only contain org members and repo collaborators
ensure OWNERS files only contain people who are actively contributing to or reviewing the code they own
remove inactive people from OWNERS files

‌

Bad examples of OWNERS usage:‌

directories that lack OWNERS files, resulting in too many hitting root OWNERS
OWNERS files that have a single person as both approver and reviewer
OWNERS files that haven't been touched in over 6 months
OWNERS files that have non-collaborators present

‌Good examples of OWNERS usage:‌

there are more reviewers than approvers
the approvers are not i

Plugins

GPU support

Quick Start Guide

To install a GPU-enabled cluster there are a few mandatory steps to prepare in advance.

Please follow this guide from NVIDIA to install the pre-requisites:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide

Once you have completed the prerequisites you may install everything with the following command:

curl -sfL https://get.k3ai.in | bash -s -- --gpu

Kubeflow Pipelines

Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.

Quick Start Guide

You only have to decide if you want CPU support:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_kfpipelines

or if you prefer GPU support:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_kfpipelines

What is Kubeflow Pipelines?

The Kubeflow Pipelines platform consists of:

A user interface (UI) for managing and tracking experiments, jobs, and runs.
An engine for scheduling multi-step ML workflows.
An SDK for defining and manipulating pipelines and components.
Notebooks for interacting with the system using an SDK.

The following are the goals of Kubeflow Pipelines:

End-to-end orchestration: enabling and simplifying the orchestration of machine learning pipelines.
Easy experimentation: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments.
Easy re-use: enabling you to re-use components and pipelines to quickly create end-to-end solutions without having to rebuild each time.

Learn more on the Kubeflow website: https://www.kubeflow.org/docs/pipelines/

Kubeflow SDK library

To help K3ai users interact with Kubeflow we are introducing support for the Kubeflow SDK library (kfp).

The Kubeflow Pipelines SDK provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other - https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/

We offer two different ways to consume kfp within k3ai:

by a virtual environment on the local computer of the user
by one of our Jupyter notebooks directly within k3ai

KFP with VirtualEnv

virtualenv is a tool to create isolated Python environments. Since Python 3.3, a subset of it has been integrated into the standard library under the venv module. -https://virtualenv.pypa.io/en/latest/

Step 1

Please check you have virtualenv installed on your machine. Depending on the OS you are using you may use different approaches. Please follow the official guides to install virtualenv at https://virtualenv.pypa.io/en/latest/installation.html

Step 2

Run the following command:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_kfp_sdk

Once the installer has finished please proceed to the "How to use KFP SDK" section.

KFP within K3ai environment

We leverage Jupyter Notebooks to provide a pre-installed kfp environment so that one may immediately experiment with this.

curl -sfL https://get.k3ai.in | bash -s --  --cpu --pipelines --plugin_jupyter-minimal

If you are using WSL

curl -sfL https://get.k3ai.in | bash -s --  --wsl --pipelines --plugin_jupyter-minimal

If you already deployed the pipelines simply run:

curl -sfL https://get.k3ai.in | bash -s -- --skipk3s --plugin_jupyter-minimal

Once the installer has finished please proceed to the "How to use KFP SDK" section.
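
Since every install variant above follows the same pattern, the command line can also be assembled programmatically, for example from an automation script. The Python sketch below is a hypothetical helper, not part of K3ai: it only combines flags that appear in this guide (--cpu, --gpu, --wsl, --skipk3s and --plugin_<name>) and does not run anything.

```python
import shlex

INSTALL_URL = "https://get.k3ai.in"
# Target flags that appear in this guide; any other value is rejected.
TARGETS = {"cpu", "gpu", "wsl", "skipk3s"}


def install_command(target, plugins=()):
    """Build the one-line installer command as a string (nothing is executed)."""
    if target not in TARGETS:
        raise ValueError("unknown target: {}".format(target))
    flags = ["--" + target] + ["--plugin_" + name for name in plugins]
    quoted = " ".join(shlex.quote(flag) for flag in flags)
    return "curl -sfL {} | bash -s -- {}".format(INSTALL_URL, quoted)


print(install_command("cpu", ["kfp_sdk"]))
print(install_command("skipk3s", ["jupyter-minimal"]))
```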

How to use KFP SDK

We present here a simple example to explain how the KFP SDK may be used. More examples may be found at https://www.kubeflow.org/docs/pipelines/tutorials/sdk-examples/

Testing KFP SDK from virtualenv

This procedure requires the K3ai IP address. In case you forgot it, the simplest way to grab it is to execute the following command:

k3s kubectl get service/traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}' -n kube-system

In your terminal create a file called demo.py, use your favorite IDE to copy and paste the example below, and adapt it to your K3ai environment.

import kfp
import json

# 'host' is your Kubeflow Pipelines API server's host address.

host="http://<K3AI IP>/"

# 'pipeline_name' is the name of the pipeline you want to list. We provide you
#  here a pre-set name to test immediately

pipeline_name = "[Demo] TFX - Iris classification pipeline"

client = kfp.Client(host)

# To filter on pipeline name, you can use a predicate indicating that the pipeline
# name is equal to the given name.
# A predicate includes 'key', 'op' and 'string_value' fields.
# The 'key' specifies the property you want to apply the filter to. For example,
# if you want to filter on the pipeline name, then 'key' is set to 'name' as
# shown below.
# The 'op' specifies the operator used in a predicate. The operator can be
# EQUALS, NOT_EQUALS, GREATER_THAN, etc. The complete list is at [filter.proto](https://github.com/kubeflow/pipelines/blob/master/backend/api/filter.proto#L32)
# When using the operator in a string-typed predicate, you need to use the
# corresponding integer value of the enum. For Example, you can use the integer
# value 1 to indicate EQUALS as shown below.
# The 'string_value' specifies the value you want to filter with.

# 'name_filter' avoids shadowing Python's built-in filter() function.
name_filter = json.dumps({'predicates': [{'key': 'name', 'op': 1, 'string_value': pipeline_name}]})
pipelines = client.pipelines.list_pipelines(filter=name_filter)

# The pipeline with the given pipeline_name, if it exists, is in pipelines.pipelines[0].

print(pipelines)

Save the file and execute it with:

python demo.py

You should get a result like in the "Checking the results" section.
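
The filter string is the only non-obvious part of the script, so it may help to look at it in isolation. The sketch below builds the same JSON with a small helper and needs only the standard library; the helper name name_filter and the constant EQUALS are illustrative, and only the operator value 1 (EQUALS) is taken from the example above.

```python
import json

# Operator code used in the example above; other codes are listed in filter.proto.
EQUALS = 1


def name_filter(pipeline_name):
    """Return the JSON filter that matches a pipeline by its exact name."""
    predicate = {"key": "name", "op": EQUALS, "string_value": pipeline_name}
    return json.dumps({"predicates": [predicate]})


print(name_filter("[Demo] TFX - Iris classification pipeline"))
```

The returned string is what the demo script passes as the filter argument.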

Testing with Jupyter Notebooks

Open your Jupyter Notebook at the address provided during the plugin installation. If you forgot the IP you may retrieve it this way:

IP=$(kubectl get service/traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}' -n kube-system) \
&& echo "http://"$IP":8888"

Once the Notebook is open, click on the top right of the notebook to create a new IPython environment.

In the first cell paste the following script:

import kfp
import json

# 'host' is your Kubeflow Pipelines API server's host address.

host="http://ml-pipeline-ui.kubeflow/"

# 'pipeline_name' is the name of the pipeline you want to list. We provide you
#  here a pre-set name to test immediately

pipeline_name = "[Demo] TFX - Iris classification pipeline"

client = kfp.Client(host)

# To filter on pipeline name, you can use a predicate indicating that the pipeline
# name is equal to the given name.
# A predicate includes 'key', 'op' and 'string_value' fields.
# The 'key' specifies the property you want to apply the filter to. For example,
# if you want to filter on the pipeline name, then 'key' is set to 'name' as
# shown below.
# The 'op' specifies the operator used in a predicate. The operator can be
# EQUALS, NOT_EQUALS, GREATER_THAN, etc. The complete list is at [filter.proto](https://github.com/kubeflow/pipelines/blob/master/backend/api/filter.proto#L32)
# When using the operator in a string-typed predicate, you need to use the
# corresponding integer value of the enum. For Example, you can use the integer
# value 1 to indicate EQUALS as shown below.
# The 'string_value' specifies the value you want to filter with.

# 'name_filter' avoids shadowing Python's built-in filter() function.
name_filter = json.dumps({'predicates': [{'key': 'name', 'op': 1, 'string_value': pipeline_name}]})
pipelines = client.pipelines.list_pipelines(filter=name_filter)

# The pipeline with the given pipeline_name, if it exists, is in pipelines.pipelines[0].

print(pipelines)

Press CTRL+ENTER to execute the cell.

Checking the results

If everything went well, both the virtualenv and the Jupyter Notebook samples should give a result similar to this:

{'next_page_token': None,
 'pipelines': [{'created_at': datetime.datetime(2020, 10, 14, 13, 27, 18, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2020, 10, 14, 13, 27, 18, tzinfo=tzlocal()),
                                    'id': '8a53981e-7c3e-4897-8c75-26f710c20f7a',
                                    'name': '[Demo] TFX - Iris classification '
                                            'pipeline',
                                    'package_url': None,
                                    'parameters': [{'name': 'pipeline-root',
                                                    'value': 'gs://{{kfp-default-bucket}}/tfx_iris/{{workflow.uid}}'},
                                                   {'name': 'data-root',
                                                    'value': 'gs://ml-pipeline/sample-data/iris/data'},
                                                   {'name': 'module-file',
                                                    'value': '/tfx-src/tfx/examples/iris/iris_utils_native_keras.py'}],
                                    'resource_references': [{'key': {'id': '8a53981e-7c3e-4897-8c75-26f710c20f7a',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': '[source '
                               'code](https://github.com/kubeflow/pipelines/tree/c84f4da0f7b534e1884f6696f161dc1375206ec2/samples/core/iris). '
                               'Example pipeline that classifies Iris flower '
                               'subspecies and how to use native Keras within '
                               'TFX.',
                'error': None,
                'id': '8a53981e-7c3e-4897-8c75-26f710c20f7a',
                'name': '[Demo] TFX - Iris classification pipeline',
                'parameters': [{'name': 'pipeline-root',
                                'value': 'gs://{{kfp-default-bucket}}/tfx_iris/{{workflow.uid}}'},
                               {'name': 'data-root',
                                'value': 'gs://ml-pipeline/sample-data/iris/data'},
                               {'name': 'module-file',
                                'value': '/tfx-src/tfx/examples/iris/iris_utils_native_keras.py'}],
                'url': None}],
 'total_size': 1}
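
A result of this shape is easier to work with once reduced to the fields of interest. The sketch below assumes the response is available as a plain Python dictionary with the same keys as the output above; if the client returns a model object instead, it would first need to be converted, which is not shown here. The function name summarize_pipelines is hypothetical, and the sample is abbreviated from the output above.

```python
def summarize_pipelines(response):
    """Reduce a list-pipelines result (as a dict) to id, name and parameters."""
    rows = []
    for pipeline in response.get("pipelines") or []:
        params = {p["name"]: p["value"] for p in pipeline.get("parameters") or []}
        rows.append({"id": pipeline["id"],
                     "name": pipeline["name"],
                     "parameters": params})
    return rows


# Abbreviated copy of the result shown above.
sample = {
    "next_page_token": None,
    "pipelines": [{
        "id": "8a53981e-7c3e-4897-8c75-26f710c20f7a",
        "name": "[Demo] TFX - Iris classification pipeline",
        "parameters": [
            {"name": "data-root",
             "value": "gs://ml-pipeline/sample-data/iris/data"},
        ],
    }],
    "total_size": 1,
}

for row in summarize_pipelines(sample):
    print(row["id"], row["name"], row["parameters"])
```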

Tensorflow Operator

Kubeflow Tensorflow-Job Training Operator

TFJob provides a Kubernetes custom resource that makes it easy to run distributed or non-distributed TensorFlow jobs on Kubernetes.

More on the Tensorflow Operator at https://github.com/kubeflow/tf-operator

Quick Start

To run with CPU support, all you have to run is:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_tf-operator

To run with GPU support:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_tf-operator

Test your installation

We present here a sample from the Tensorflow Operator repository at https://github.com/kubeflow/tf-operator

Step 1

We first need to add a persistent volume and a claim. To do so, let's add the two YAML files we need: copy and paste each command in order.

k3s kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tfevent-volume
  labels:
    type: local
    app: tfjob
spec:
  capacity:
    storage: 10Gi
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data
EOF

Now we add the PVC.

k3s kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfevent-volume
  namespace: kubeflow 
  labels:
    type: local
    app: tfjob
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

Note: Because we are using local-path as the storage volume and we are on a single-node cluster, we can't use ReadWriteMany, as per Rancher local-path provisioner issue https://github.com/rancher/local-path-provisioner/issues/70#issuecomment-574390050
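
The access-mode constraint can be illustrated with a deliberately simplified model of how a claim is matched to a volume. The Python sketch below is not how Kubernetes is implemented: it checks only access modes and capacity, ignores storage classes and selectors, handles only the Gi suffix, and the names can_bind and parse_gi are hypothetical.

```python
def parse_gi(quantity):
    """Parse a quantity such as '10Gi' into whole GiB (only 'Gi' is handled)."""
    if not quantity.endswith("Gi"):
        raise ValueError("only Gi quantities are supported in this sketch")
    return int(quantity[:-2])


def can_bind(pv_spec, pvc_spec):
    """Simplified check: the volume must offer every requested access mode
    and be at least as large as the request."""
    modes_ok = set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"])
    size_ok = (parse_gi(pv_spec["capacity"]["storage"])
               >= parse_gi(pvc_spec["resources"]["requests"]["storage"]))
    return modes_ok and size_ok


pv = {"capacity": {"storage": "10Gi"}, "accessModes": ["ReadWriteOnce"]}
claim_once = {"accessModes": ["ReadWriteOnce"],
              "resources": {"requests": {"storage": "10Gi"}}}
claim_many = {"accessModes": ["ReadWriteMany"],
              "resources": {"requests": {"storage": "10Gi"}}}

print(can_bind(pv, claim_once))
print(can_bind(pv, claim_many))
```

Under this model a claim that asks for ReadWriteMany cannot be satisfied by a volume that only offers ReadWriteOnce.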

Step 2

Now we deploy the example:

kubectl apply -f https://raw.githubusercontent.com/kubeflow/tf-operator/master/examples/v1/mnist_with_summaries/tf_job_mnist.yaml

You can observe the result of the example with

kubectl logs -l tf-job-name=mnist -n kubeflow --tail=-1

It should output something similar to this (only part of the output is shown here):

...
Adding run metadata for 799
Accuracy at step 800: 0.957
Accuracy at step 810: 0.9698
Accuracy at step 820: 0.9676
Accuracy at step 830: 0.9676
Accuracy at step 840: 0.9677
Accuracy at step 850: 0.9673
Accuracy at step 860: 0.9676
Accuracy at step 870: 0.9654
Accuracy at step 880: 0.9694
Accuracy at step 890: 0.9708
Adding run metadata for 899
Accuracy at step 900: 0.9737
Accuracy at step 910: 0.9708
Accuracy at step 920: 0.9721
Accuracy at step 930: 0.972
Accuracy at step 940: 0.9639
Accuracy at step 950: 0.966
Accuracy at step 960: 0.9654
Accuracy at step 970: 0.9683
Accuracy at step 980: 0.9685
Accuracy at step 990: 0.9666
Adding run metadata for 999
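
If you want to track the training progress rather than read it by eye, the log lines can be parsed with a few lines of Python. This is only a sketch: it assumes the log format shown above ("Accuracy at step N: value"), the function name parse_accuracy is hypothetical, and the sample is a short excerpt of that output.

```python
import re

ACCURACY_LINE = re.compile(r"^Accuracy at step (\d+): ([0-9.]+)$")


def parse_accuracy(log_text):
    """Return (step, accuracy) pairs found in the job log, in order."""
    points = []
    for line in log_text.splitlines():
        match = ACCURACY_LINE.match(line.strip())
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


sample = """Adding run metadata for 899
Accuracy at step 900: 0.9737
Accuracy at step 910: 0.9708
Accuracy at step 920: 0.9721"""

points = parse_accuracy(sample)
best_step, best_accuracy = max(points, key=lambda point: point[1])
print(best_step, best_accuracy)
```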

PyTorch Operator

Kubeflow PyTorch-Job Training Operator

PyTorch is a Python package that provides two high-level features:

Tensor computation (like NumPy) with strong GPU acceleration
Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed. More information is available on the PyTorch site.

Quick Start

As usual, let's deploy PyTorch with one single line command

If you use CPU only:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_pytorch-operator

If you would like to use PyTorch with GPU:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_pytorch-operator

Test You PyTorch-Job installation

We will use the MNISE example from the Kubeflow PyTorch-Job repo at ****

As usual, we want to avoid complexity so we re-worked a bit the sample and make it way much more easier.

Step 1

You'll see tha in the example a container need to be created before running the sample, we merged the container commands directly in the YAML file so now it's one-click job.

For CPU only

k3s kubectl apply -f - << EOF
apiVersion: "kubeflow.org/v1"
kind: "PyTorchJob"
metadata:
  name: "pytorch-dist-mnist-gloo"
  namespace: kubeflow
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
              command: ['sh','-c','pip install tensorboardX==1.6.0 && mkdir -p /opt/mnist/src && cd /opt/mnist/src && curl -O https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mnist/mnist.py && chgrp -R 0 /opt/mnist && chmod -R g+rwX /opt/mnist && python /opt/mnist/src/mnist.py']
              args: ["--backend", "gloo"]

    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
              command: ['sh','-c','pip install tensorboardX==1.6.0 && mkdir -p /opt/mnist/src && cd /opt/mnist/src && curl -O https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mnist/mnist.py && chgrp -R 0 /opt/mnist && chmod -R g+rwX /opt/mnist && python /opt/mnist/src/mnist.py']
              args: ["--backend", "gloo"]
EOF

If you have GPU enabled you may run it this way

k3s kubectl apply -f - << EOF
apiVersion: "kubeflow.org/v1"
kind: "PyTorchJob"
metadata:
  name: "pytorch-dist-mnist-gloo"
  namespace: kubeflow
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
              command: ['sh','-c','pip install tensorboardX==1.6.0 && mkdir -p /opt/mnist/src && cd /opt/mnist/src && curl -O https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mnist/mnist.py && chgrp -R 0 /opt/mnist && chmod -R g+rwX /opt/mnist && python /opt/mnist/src/mnist.py']
              args: ["--backend", "gloo"]
              # Change the value of nvidia.com/gpu based on your configuration
              resources:
                limits:
                  nvidia.com/gpu: 1 
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
              command: ['sh','-c','pip install tensorboardX==1.6.0 && mkdir -p /opt/mnist/src && cd /opt/mnist/src && curl -O https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mnist/mnist.py && chgrp -R 0 /opt/mnist && chmod -R g+rwX /opt/mnist && python /opt/mnist/src/mnist.py']
              args: ["--backend", "gloo"]
              # Change the value of nvidia.com/gpu based on your configuration
              resources:
                limits:
                  nvidia.com/gpu: 1 
EOF
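
The two manifests above differ only in the nvidia.com/gpu limit. As an illustration, here is a minimal Python sketch that builds the same structure as a dictionary and adds the GPU limit on request. The function names are ours, and the container command is a shortened placeholder rather than the full command used above.

```python
import json


def replica_spec(gpus=0):
    """Build one replica spec; gpus=0 yields the CPU-only variant."""
    container = {
        "name": "pytorch",
        "image": "pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime",
        # Shortened placeholder: the manifests above use a longer command
        # that also installs tensorboardX and downloads mnist.py.
        "command": ["sh", "-c", "python /opt/mnist/src/mnist.py"],
        "args": ["--backend", "gloo"],
    }
    if gpus > 0:
        # Same key as in the GPU manifest above.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "replicas": 1,
        "restartPolicy": "OnFailure",
        "template": {
            "metadata": {"annotations": {"sidecar.istio.io/inject": "false"}},
            "spec": {"containers": [container]},
        },
    }


def pytorch_job(gpus=0):
    """Build the whole PyTorchJob with one Master and one Worker."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": "pytorch-dist-mnist-gloo", "namespace": "kubeflow"},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica_spec(gpus),
                "Worker": replica_spec(gpus),
            }
        },
    }


cpu_job = pytorch_job()
gpu_job = pytorch_job(gpus=1)
print(json.dumps(gpu_job, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed document could be piped to k3s kubectl apply -f - once the placeholder command is replaced with the real one.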

Step 2

Check that the pods are deployed correctly with:

kubectl get pod -l pytorch-job-name=pytorch-dist-mnist-gloo -n kubeflow

It should output something like this:

NAME                               READY   STATUS    RESTARTS   AGE
pytorch-dist-mnist-gloo-master-0   1/1     Running   0          2m26s
pytorch-dist-mnist-gloo-worker-0   1/1     Running   0          2m26s
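
If you script this check, the table can be parsed in a few lines. The following is only a sketch under the assumption that the output has the columns shown above; pods_not_running is a hypothetical helper name.

```python
def pods_not_running(table_text):
    """Return the names of pods whose STATUS column is not 'Running'."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    name_index = header.index("NAME")
    status_index = header.index("STATUS")
    not_running = []
    for line in lines[1:]:
        columns = line.split()
        if columns[status_index] != "Running":
            not_running.append(columns[name_index])
    return not_running


# Sample copied from the output above.
sample = """NAME                               READY   STATUS    RESTARTS   AGE
pytorch-dist-mnist-gloo-master-0   1/1     Running   0          2m26s
pytorch-dist-mnist-gloo-worker-0   1/1     Running   0          2m26s
"""

print(pods_not_running(sample))
```

An empty list means both the Master and the Worker pods are up.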

Step 3

Check the logs of your training job:

kubectl logs -l pytorch-job-name=pytorch-dist-mnist-gloo -n kubeflow

You should see output similar to this (one block per replica, since we are using 1 Master and 1 Worker in this case):

Train Epoch: 1 [55680/60000 (93%)]      loss=0.0341
Train Epoch: 1 [56320/60000 (94%)]      loss=0.0357
Train Epoch: 1 [56960/60000 (95%)]      loss=0.0774
Train Epoch: 1 [57600/60000 (96%)]      loss=0.1186
Train Epoch: 1 [58240/60000 (97%)]      loss=0.1927
Train Epoch: 1 [58880/60000 (98%)]      loss=0.2050
Train Epoch: 1 [59520/60000 (99%)]      loss=0.0642

accuracy=0.9660

Train Epoch: 1 [55680/60000 (93%)]      loss=0.0341
Train Epoch: 1 [56320/60000 (94%)]      loss=0.0357
Train Epoch: 1 [56960/60000 (95%)]      loss=0.0774
Train Epoch: 1 [57600/60000 (96%)]      loss=0.1186
Train Epoch: 1 [58240/60000 (97%)]      loss=0.1927
Train Epoch: 1 [58880/60000 (98%)]      loss=0.2050
Train Epoch: 1 [59520/60000 (99%)]      loss=0.0642

accuracy=0.9660

Argo Workflows

Quick Start Guide

You only have to decide if you want CPU support:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_argo_workflow

or GPU support:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_argo_workflow

What is Argo Workflows?

Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD.

Define workflows where each step in the workflow is a container.
Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG).
Easily run compute intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes.
Run CI/CD pipelines natively on Kubernetes without configuring complex software development products.
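
To make the "each step is a container" and DAG ideas concrete, here is a minimal sketch of a two-step workflow built as a Python dictionary. The task names, the alpine image and the echo commands are assumptions chosen for illustration; only the overall Workflow layout (entrypoint, templates, dag tasks with dependencies) follows the Argo Workflows resource format.

```python
import json


def container_template(name, message):
    """A workflow step that runs a single container."""
    return {
        "name": name,
        "container": {
            "image": "alpine:3.12",  # assumption: any small image works here
            "command": ["echo", message],
        },
    }


# "train" depends on "prepare", so Argo would run them in that order.
tasks = [
    {"name": "prepare", "template": "prepare"},
    {"name": "train", "template": "train", "dependencies": ["prepare"]},
]

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "dag-example-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {"name": "main", "dag": {"tasks": tasks}},
            container_template("prepare", "preparing data"),
            container_template("train", "training model"),
        ],
    },
}


def execution_order(dag_tasks):
    """Return task names in an order that respects their dependencies."""
    remaining = {t["name"]: set(t.get("dependencies", [])) for t in dag_tasks}
    order = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= set(order))
        if not ready:
            raise ValueError("dependency cycle detected")
        order.extend(ready)
        for name in ready:
            del remaining[name]
    return order


print(execution_order(tasks))
print(json.dumps(workflow, indent=2))
```

The execution_order helper is not part of Argo; it only illustrates how the dependencies determine the order of the steps.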

Learn more on the Argo website: https://argoproj.github.io/projects/argo

Tensorflow Serving - ResNet

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. Learn more about TensorFlow Serving on its site: https://www.tensorflow.org/tfx/guide/serving

Quick Start Guide

Running TensorFlow Serving to serve the TensorFlow ResNet model is, as usual, a single line trick.

CPU support:

curl -sfL https://get.k3ai.in | bash -s -- --cpu --plugin_tfs-resnet

GPU support:

curl -sfL https://get.k3ai.in | bash -s -- --gpu --plugin_tfs-resnet

Test the installation

For a full explanation of how to use TensorFlow Serving, please take a look at its documentation site.

Step 1 - Prepare your client environment

To run any experiment against a remote inference server, you need tensorflow-serving-api installed on your machine, as described in the official documentation: https://www.tensorflow.org/tfx/serving/setup#tensorflow_serving_python_api_pip_package

For reference:

pip install tensorflow-serving-api

Step 2

Clone the TensorFlow Serving repository, which contains the test scripts:

git clone https://github.com/tensorflow/serving
cd serving

Step 3

Find the IP address where the TensorFlow Serving service is exposed:

kubectl describe service tf-server-service -n tf-serving

You should have a similar output:

Name:                     tf-server-service
Namespace:                tf-serving
Labels:                   <none>
Annotations:              Selector:  app=tf-serv-resnet
Type:                     LoadBalancer
IP:                       10.43.200.139
LoadBalancer Ingress:     172.21.190.98
Port:                     grpc  8500/TCP
TargetPort:               8500/TCP
NodePort:                 grpc  30525/TCP
Endpoints:                10.42.0.139:8500
Port:                     rest  8501/TCP
TargetPort:               8501/TCP
NodePort:                 rest  30907/TCP
Endpoints:                10.42.0.139:8501
Session Affinity:         None
External Traffic Policy:  Cluster

Take note of LoadBalancer Ingress IP
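
If you prefer not to copy the address by hand, it can be extracted from the describe output. This is a minimal sketch assuming the "LoadBalancer Ingress:" line looks like the one above; loadbalancer_ingress is a hypothetical helper name and the sample is a shortened copy of that output.

```python
def loadbalancer_ingress(describe_text):
    """Return the LoadBalancer Ingress address, or None if it is absent."""
    for line in describe_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "LoadBalancer Ingress":
            return value.strip()
    return None


# A few lines copied from the output above.
sample = """Name:                     tf-server-service
Type:                     LoadBalancer
IP:                       10.43.200.139
LoadBalancer Ingress:     172.21.190.98
Port:                     grpc  8500/TCP
Port:                     rest  8501/TCP
"""

ingress = loadbalancer_ingress(sample)
grpc_endpoint = "{}:8500".format(ingress)
rest_endpoint = "{}:8501".format(ingress)
print(grpc_endpoint, rest_endpoint)
```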

Step 4

We can now query the service at its external address from our local host.

Using gRPC:

python \
  tensorflow_serving/example/resnet_client_grpc.py \
  --server=<LOADBALANCER INGRESS>:8500

Using the REST API:

python \
  tensorflow_serving/example/resnet_client.py \
  --server=<LOADBALANCER INGRESS>:8501

You should have an output similar to this:

#REST Api
Prediction class: 286, avg latency: 87.9074 ms

#gRPC

...

    float_val: 2.1751149688498117e-05
    float_val: 4.679726407630369e-05
    float_val: 6.22767993263551e-06
    float_val: 2.4046405087574385e-05
    float_val: 0.00013994085020385683
    float_val: 5.0004531658487394e-05
    float_val: 1.670094752626028e-05
    float_val: 2.148277962987777e-05
    float_val: 0.0004090495640411973
    float_val: 3.3705742680467665e-05
    float_val: 3.318636345284176e-06
    float_val: 8.649761730339378e-05
    float_val: 3.984206159657333e-06
    float_val: 3.7564968806691468e-06
    float_val: 3.2912407732510474e-06
    float_val: 3.6244309740141034e-06
    float_val: 2.5648103019193513e-06
    float_val: 2.7759107979363762e-05
    float_val: 1.5157910638663452e-05
    float_val: 1.8459862758390955e-06
    float_val: 8.704301990292151e-07
    float_val: 2.724335217862972e-06
    float_val: 3.3186615837621503e-06
    float_val: 1.455540314054815e-06
    float_val: 8.736999006941915e-06
    float_val: 2.299477728229249e-06
    float_val: 2.0985182800359325e-06
    float_val: 0.00026371944113634527
    float_val: 1.0347321222070605e-05
    float_val: 3.660013362605241e-06
    float_val: 2.0003653844469227e-05
    float_val: 6.355750429065665e-06
    float_val: 2.255582785437582e-06
    float_val: 1.5940782986945123e-06
    float_val: 1.2315674666751875e-06
    float_val: 1.1781222610807163e-06
    float_val: 1.4636576452176087e-05
    float_val: 5.812105996483297e-07
    float_val: 6.599811604246497e-05
    float_val: 0.0012952699325978756
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}
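
For readers curious about what a REST query looks like on the wire, the sketch below builds a predict request for the TensorFlow Serving REST API without sending it. The model name resnet and port 8501 come from the outputs above; the function name, the fake image bytes, and the assumption that the deployed model accepts base64-encoded images are ours.

```python
import base64
import json


def build_predict_request(host, image_bytes, model="resnet", port=8501):
    """Return the (url, json_body) pair for a TensorFlow Serving predict call."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model)
    # Binary inputs are sent as {"b64": "<base64 string>"} in the JSON body.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    body = json.dumps({"instances": [{"b64": encoded}]})
    return url, body


# The host is the LoadBalancer Ingress address from Step 3; the bytes are a
# stand-in for the contents of a real JPEG file.
url, body = build_predict_request("172.21.190.98", b"not-a-real-jpeg")
print(url)
```

The resulting body could then be sent as an HTTP POST with any client library. The bundled resnet_client.py script remains the reference way to test the installation.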

WSL (Windows Subsystem for Linux)

Yes, you read that right: we also have WSL support!

Note: GPU is not currently supported in k3ai within WSL. The reason is simply that GPU capability is still in development by NVIDIA and Microsoft, so we will wait for it to reach a more stable state.

Quick Start

The Windows Subsystem for Linux lets developers run a GNU/Linux environment -- including most command-line tools, utilities, and applications -- directly on Windows, unmodified, without the overhead of a traditional virtual machine or the dual-boot setup.

Step 1

Install any Linux distro supported in WSL. For a quick how-to please follow the guide at https://docs.microsoft.com/en-us/windows/wsl/install-win10

Step 2

Once ready simply run the following command:

curl -sfL https://get.k3ai.in | bash -s -- --wsl --pipelines

Note: the command above is slightly different from the other commands we typically use. It will change to the usual form once the feature is merged into the main code.

(Optional) Step 3

Once the installation is finished you may run any other plugin as usual, but with the --skipk3s flag. As an example the Tensorflow Serving - ResNet:

curl -sfL https://get.k3ai.in | bash -s -- --skipk3s --plugin_tfs-resnet

The following plugins are immediately supported:

Argo
Kubeflow pipelines
Tensorflow Serving - Resnet

Troubleshooting

Restart WSL

If you log back in to your WSL session after a "wsl --shutdown", or simply because you restarted or shut down your computer, the k3ai environment will not restart automatically.

We created a utility for you to restart k3ai each time. To use it, run the following in WSL:

startk3s

Wait a couple of minutes for the cluster to restart all the pods and you're good to go.

The connection to the server localhost:8080 was refused

If the above error happens please use the following command to install k3ai

curl -sfL https://get.k3ai.in | INSTALL_K3S_BIN_DIR=/usr/bin bash -s -- --wsl --pipelines

Jupyter Notebook

Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages. - https://jupyter.org/

We support the current list of Jupyter Docker Stacks, as listed at:

https://jupyter-docker-stacks.readthedocs.io/

In order to run Jupyter Notebooks just run the following command:

curl -sfL https://get.k3ai.in | bash -s -- --plugin_jupyter-minimal

The following notebooks plugins are available:

--plugin_jupyter-minimal: (jupyter-minimal)
--plugin_jupyter-r: (jupyter-r-notebook)
--plugin_jupyter-scipy: (jupyter-scipy-notebook)
--plugin_jupyter-tf: (jupyter-tensorflow-notebook)
--plugin_jupyter-datascience: (jupyter-datascience-notebook)
--plugin_jupyter-pyspark: (jupyter-pyspark-notebook)
--plugin_jupyter-allspark: (jupyter-all-spark-notebook)

Note: all Jupyter Notebooks contain the Kubeflow SDK (kfp) library to interact with Kubeflow pipelines. In order to install pipelines and Jupyter at the same time, use:

curl -sfL https://get.k3ai.in | bash -s -- --plugin_jupyter-<YOUR SELECTED FLAVOR> --plugin_kfpipelines
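
If you generate install commands from a script (for example in a setup helper), the flavor can be validated against the list above first. This is just a sketch; the helper name install_command is ours, while the flags are the ones listed in this section.

```python
# Flavors taken from the plugin list above.
FLAVORS = ["minimal", "r", "scipy", "tf", "datascience", "pyspark", "allspark"]


def install_command(flavor, with_pipelines=False):
    """Compose the one-line installer for the chosen Jupyter flavor."""
    if flavor not in FLAVORS:
        raise ValueError("unknown Jupyter flavor: {}".format(flavor))
    flags = ["--plugin_jupyter-" + flavor]
    if with_pipelines:
        flags.append("--plugin_kfpipelines")
    return "curl -sfL https://get.k3ai.in | bash -s -- " + " ".join(flags)


print(install_command("scipy", with_pipelines=True))
```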

Other guides

Civo Cloud

Civo was born when our small team first came together to create an OpenStack-based cloud for a shared hosting provider. Read their story here: https://www.civo.com/blog/kube100-so-far

In 2019 we went all in and took Civo in a new direction, launching the world’s first k3s-powered, managed Kubernetes service into beta.

As easy as can be, K3ai works perfectly on Civo. Here is the simplest guide ever to running k3ai on Civo: three steps and your k3ai is ready!

Installing k3ai on Civo

Ready? It takes less than 5 minutes!

You'll need an account on Civo.com. To get one, simply register here:

https://www.civo.com/signup

Step 1

Launch your k3s cluster using the default options (Traefik and Metric-server selected)

Wait for the instance to finish the deployment

Step 2

Download the kubeconfig file, move it to your preferred location, and set your environment to use it:

export KUBECONFIG="civo-k3ai-kubeconfig"

Step 3

One last thing and then we're done:

curl -sfL https://get.k3ai.in | bash -s -- --skipk3s --plugin_civo_kfpipelines

Enjoy your k3ai on https://civo.com

Contributing

Welcome to the K3ai project! We took the liberty of borrowing these rules from other great OSS projects such as Kubeflow and Kubernetes.

Getting started as a K3ai contributor

As you will notice, we do not currently require any CLA signature. This may change in the future, but if so, that change will itself follow the contributing guidelines and processes.

Follow the code of conduct‌

Please make sure to read and observe our Code of Conduct and inclusivity document.‌

Joining the community‌

Follow these instructions if you want to:

Become a member of the K3ai GitHub org (see below)
Be recognized as an individual or organization contributing to K3ai

Joining the K3ai GitHub Org‌

Before asking to join the community, we ask that you first make a small number of contributions to demonstrate your intent to continue contributing to K3ai.

There are a number of ways to contribute to K3ai:

Submit PRs
File issues reporting bugs or providing feedback
Answer questions on Slack or GitHub issues

When you are ready to join

Send a PR adding yourself as a member in org.yaml
After the PR is merged, an admin will send you an invitation
- This is a manual process and we are a very small team, so please be patient
- If a week passes without receiving an invitation, reach out on k3ai#community

Your first contribution‌

Find something to work on‌

Starter issues‌

To find K3ai issues that make good entry points:

Start with issues labeled good first issue.
For issues that require deeper knowledge of one or more technical aspects, look at issues labeled help wanted.
Examine the issues in any of the K3ai repositories.

‌Owners files and PR workflow‌

Our goal is for the PR workflow to become nearly identical to Kubernetes'. Most of these instructions are a modified version of Kubernetes' contributors and owners guides.

Overview of OWNERS files

Nov. 2020: We are not yet at the point where we use OWNERS and/or REVIEWERS, but we plan ahead, so what follows represents the idea of future workflows.

OWNERS‌

OWNERS files are in YAML format and support the following keys:‌

approvers: a list of GitHub usernames or aliases that can /approve a PR
labels: a list of GitHub labels to automatically apply to a PR
options: a map of options for how to interpret this OWNERS file, currently only one:
- no_parent_owners: defaults to false if not present; if true, exclude parent OWNERS files.
  Allows the use case where a/deep/nested/OWNERS file prevents a/OWNERS file from having any
  effect on a/deep/nested/bit/of/code
reviewers: a list of GitHub usernames or aliases that are good candidates to /lgtm a PR
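
To illustrate how no_parent_owners could affect which approvers apply to a file, here is a minimal sketch. It assumes that approvers are collected by walking from the file's directory up to the repository root, stopping at the first OWNERS file that sets no_parent_owners; the function name and the in-memory owners_by_dir mapping are ours and do not mirror any real tool's API.

```python
import posixpath


def applicable_approvers(path, owners_by_dir):
    """Collect approvers for a file, walking from its directory up to the root.

    owners_by_dir maps a directory ("" is the repo root) to a dict with an
    "approvers" list and an optional "options" dict.
    """
    approvers = []
    directory = posixpath.dirname(path)
    while True:
        owners = owners_by_dir.get(directory)
        if owners is not None:
            approvers.extend(owners.get("approvers", []))
            # Stop climbing when this OWNERS file excludes its parents.
            if owners.get("options", {}).get("no_parent_owners", False):
                break
        if directory == "":
            break
        directory = posixpath.dirname(directory)
    return approvers


owners_by_dir = {
    "a": {"approvers": ["alice"]},
    "a/deep/nested": {
        "approvers": ["bob"],
        "options": {"no_parent_owners": True},
    },
}

print(applicable_approvers("a/deep/nested/bit/of/code", owners_by_dir))
print(applicable_approvers("a/other/file", owners_by_dir))
```

In this example a/deep/nested/OWNERS prevents a/OWNERS from having any effect on a/deep/nested/bit/of/code, which is the use case described for no_parent_owners.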

‌All users are expected to be assignable. In GitHub terms, this means they are either collaborators of the repo, or members of the organization to which the repo belongs.‌

A typical OWNERS file looks like:

approvers:  
    - alice  
    - bob     
# this is a comment
reviewers:  
    - alice  
    - carol   
# this is another comment  
    - sig-foo # this is an alias

‌OWNERS_ALIASES‌

Each repo may contain at its root an OWNERS_ALIASES file.

OWNERS_ALIASES files are in YAML format and support the following keys:

aliases: a mapping of alias name to a list of GitHub usernames

‌We use aliases for groups instead of GitHub Teams, because changes to GitHub Teams are not publicly auditable.‌

A sample OWNERS_ALIASES file looks like:

aliases:  
    sig-foo:    
        - david    
        - erin  
    sig-bar:    
        - bob    
        - frank
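
As an illustration of how aliases could be resolved, the sketch below expands alias names into GitHub usernames and compares everything in lowercase, since usernames and aliases in OWNERS files are treated as case-insensitive. The expand function is a hypothetical helper and the data mirrors the sample files in this section.

```python
def expand(names, aliases):
    """Replace alias names with their members, lowercasing everything."""
    lookup = {alias.lower(): members for alias, members in aliases.items()}
    expanded = set()
    for name in names:
        key = name.lower()
        if key in lookup:
            expanded.update(member.lower() for member in lookup[key])
        else:
            expanded.add(key)
    return sorted(expanded)


# Data taken from the sample OWNERS and OWNERS_ALIASES files.
aliases = {"sig-foo": ["david", "erin"], "sig-bar": ["bob", "frank"]}
reviewers = ["alice", "carol", "sig-foo"]

print(expand(reviewers, aliases))
```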

‌GitHub usernames and aliases listed in OWNERS files are case-insensitive.‌

The code review process

‌The author submits a PR

[FUTURE] ~~Phase 0: Automation suggests reviewers and approvers for the PR~~
- ~~Determine the set of OWNERS files nearest to the code being changed~~
- ~~Choose at least two suggested reviewers, trying to find a unique reviewer for every leaf OWNERS file, and request their reviews on the PR~~
- ~~Choose suggested approvers, one from each OWNERS file, and list them in a comment on the PR~~
Phase 1: Humans review the PR
- Reviewers look for general code quality, correctness, sane software engineering, style, etc.
- Anyone in the organization can act as a reviewer with the exception of the individual who
  opened the PR
- If the code changes look good to them, a reviewer types /lgtm in a PR comment or review;
  if they change their mind, they /lgtm cancel
- [FUTURE] ~~Once a reviewer has /lgtm'ed, prow (@k8s-ci-robot) applies an lgtm label to the PR~~
Phase 2: Humans approve the PR
- The PR author /assign's all suggested approvers to the PR, and optionally notifies
  them (eg: "pinging @foo for approval")
- Only people listed in the relevant OWNERS files, either directly or through an alias, can act
  as approvers, including the individual who opened the PR
- Approvers look for holistic acceptance criteria, including dependencies with other features,
  forwards/backwards compatibility, API and flag definitions, etc
- If the code changes look good to them, an approver types /approve in a PR comment or
  review; if they change their mind, they /approve cancel
- ~~prow (@k8s-ci-robot) updates its comment in the PR to indicate which approvers still need to approve~~
- ~~Once all approvers (one from each of the previously identified OWNERS files) have approved, prow (@k8s-ci-robot) applies an approved label~~
Phase 3: Automation merges the PR:
- If all of the following are true:
  - All required labels are present (eg: lgtm, approved)
  - Any blocking labels are missing (eg: there is no do-not-merge/hold, needs-rebase)
- And if any of the following are true:
  - there are no presubmit prow jobs configured for this repo
  - there are presubmit prow jobs configured for this repo, and they all pass after automatically
    being re-run one last time
- Then the PR will automatically be merged
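
The Phase 3 conditions can be read as a simple predicate. The following is a minimal sketch of that logic, not the actual automation: the function name is ours, and presubmit results are modelled as a list of booleans where an empty list means no presubmit jobs are configured.

```python
def can_merge(labels, required, blocking, presubmit_results):
    """Return True when a PR satisfies the Phase 3 merge conditions."""
    labels = set(labels)
    has_required = set(required) <= labels        # e.g. lgtm, approved
    no_blocking = labels.isdisjoint(blocking)     # e.g. do-not-merge/hold
    # all() of an empty list is True, which covers "no presubmit jobs".
    tests_pass = all(presubmit_results)
    return has_required and no_blocking and tests_pass


required = ["lgtm", "approved"]
blocking = ["do-not-merge/hold", "needs-rebase"]

print(can_merge(["lgtm", "approved"], required, blocking, []))
print(can_merge(["lgtm", "approved", "needs-rebase"], required, blocking, [True]))
```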

‌

Quirks of the process

An approver's /lgtm is simultaneously interpreted as an /approve
- While a convenient shortcut for some, it can be surprising that the same command is interpreted
  in one of two ways depending on who the commenter is
- Instead, explicitly write out /lgtm and /approve to help observers, or save the /lgtm for
  a reviewer
- This goes against the idea of having at least two sets of eyes on a PR, and may be a sign that
  there are too few reviewers (who aren't also approvers)
Technically, anyone who is a member of the K3ai GitHub organization can drive-by /lgtm a PR
- Drive-by reviews from non-members are encouraged as a way of demonstrating experience and
  intent to become a collaborator or reviewer
- Drive-by /lgtm's from members may be a sign that our OWNERS files are too small, or that the
  existing reviewers are too unresponsive
- This goes against the idea of specifying reviewers in the first place, to ensure that
  the author is getting actionable feedback from people knowledgeable with the code
Reviewers and approvers are unresponsive
- This causes a lot of frustration for authors who often have little visibility into why their
  PR is being ignored
- Many reviewers and approvers are so overloaded by GitHub notifications that @mention'ing
  is unlikely to get a quick response
- If an author /assign's a PR, reviewers and approvers will be made aware of it on
  their PR dashboard
- An author can work around this by manually reading the relevant OWNERS files,
  /unassign'ing unresponsive individuals, and /assign'ing others
- This is a sign that our OWNERS files are stale; pruning the reviewers and approvers lists
  would help with this
- It is the PR author's responsibility to drive a PR to resolution. This means that if the PR reviewers are unresponsive, the author should escalate as noted below
  - e.g. ping reviewers in a timely manner to get it reviewed
  - If the reviewers don't respond, look at the OWNERS file in root and ping approvers listed there
Authors are unresponsive
- This costs a tremendous amount of attention as context for an individual PR is lost over time
- This hurts the project in general as its general noise level increases over time
- Instead, close PR's that are untouched after too long (we currently have a bot do this after 30
  days)

‌

~~Automation using OWNERS files~~

prow

~~Prow receives events from GitHub, and reacts to them. It is effectively stateless. The following pieces of prow are used to implement the code review process above.~~

~~cmd: tide~~
- ~~per-repo configuration:~~
  - ~~labels: list of labels required to be present for merge (eg: lgtm)~~
  - ~~missingLabels: list of labels required to be missing for merge (eg: do-not-merge/hold)~~
  - ~~reviewApprovedRequired: defaults to false; when true, require that there must be at least one approved pull request review present for merge~~
  - ~~merge_method: defaults to merge; when squash or rebase, use that merge method instead when clicking a PR's merge button~~
- ~~merges PR's once they meet the appropriate criteria as configured above~~
- ~~if there are any presubmit prow jobs for the repo the PR is against, they will be re-run one final time just prior to merge~~
~~plugin: assign~~
- ~~assigns GitHub users in response to /assign comments on a PR~~
- ~~unassigns GitHub users in response to /unassign comments on a PR~~
~~plugin: approve~~
- ~~per-repo configuration:~~
  - ~~issue_required: defaults to false; when true, require that the PR description link to an issue, or that at least one approver issues a /approve no-issue~~
  - ~~implicit_self_approve: defaults to false; when true, if the PR author is in relevant OWNERS files, act as if they have implicitly /approve'd~~
- ~~adds the approved label once an approver for each of the required OWNERS files has /approve'd~~
- ~~comments as required OWNERS files are satisfied~~
- ~~removes outdated approval status comments~~
~~plugin: blunderbuss~~
- ~~determines reviewers and requests their reviews on PR's~~
~~plugin: lgtm~~
- ~~adds the lgtm label when a reviewer comments /lgtm on a PR~~
- ~~the PR author may not /lgtm their own PR~~
~~pkg: k8s.io/test-infra/prow/repoowners~~
- ~~parses OWNERS and OWNERS_ALIASES files~~
- ~~if the no_parent_owners option is encountered, parent owners are excluded from having any influence over files adjacent to or underneath of the current OWNERS file~~

Maintaining OWNERS files

‌OWNERS files should be regularly maintained.‌

We encourage people to self-nominate or self-remove from OWNERS files via PR's. Ideally in the future we could use metrics-driven automation to assist in this process.‌

We should strive to:‌

grow the number of OWNERS files
add new people to OWNERS files
ensure OWNERS files only contain org members and repo collaborators
ensure OWNERS files only contain people who are actively contributing to or reviewing the code they own
remove inactive people from OWNERS files

Bad examples of OWNERS usage:‌

directories that lack OWNERS files, resulting in too many hitting root OWNERS
OWNERS files that have a single person as both approver and reviewer
OWNERS files that haven't been touched in over 6 months
OWNERS files that have non-collaborators present

‌Good examples of OWNERS usage:‌

there are more reviewers than approvers
the approvers are not i