The Polyaxon chart is a Helm chart for creating reproducible and maintainable deployments of Polyaxon on Kubernetes.
Introduction
This chart bootstraps a Polyaxon deployment on a Kubernetes cluster using the Helm package manager.
It also packages some required dependencies for Polyaxon (PostgreSQL, Redis, and RabbitMQ).
Note: It's possible to provide your own managed version of each of these dependencies.
This chart can be installed on a single-node or multi-node cluster;
in the multi-node case you need to provide volumes with the ReadWriteMany access mode
or cloud buckets.
An NFS provisioner can be used when you want to try the platform on a multi-node cluster
without creating your own volumes. Please see polyaxon-nfs-provisioner.
Tip: The full list of default options can be found in the chart's values.yaml.
Prerequisites
- Kubernetes >= 1.5.0
- helm >= v2.5.0
Add polyaxon charts
helm repo add polyaxon https://charts.polyaxon.com
helm repo update
Installing the Chart
To install the chart with the release name <RELEASE_NAME>:
helm install --name=<RELEASE_NAME> --namespace=<NAMESPACE> polyaxon/polyaxon
Please do not use the --wait flag, otherwise the deployment will not succeed.
The command deploys Polyaxon on the Kubernetes cluster in the default configuration.
The configuration section lists the parameters that can be configured during installation.
Tip: List all releases using helm list
Uninstalling the Chart
To uninstall/delete the <RELEASE_NAME> deployment:
helm delete <RELEASE_NAME>
or with the --purge flag:
helm delete <RELEASE_NAME> --purge
The command removes all the Kubernetes components associated with the chart and deletes the release.
Warning: Jobs are only deleted if they succeeded; if you cancel a deployment you might end up with undeleted jobs that you have to remove manually:
kubectl delete job ...
Note: You can delete the chart and skip cleaning the hooks:
helm del --purge <RELEASE_NAME> --no-hooks
This can be particularly useful if your deployment is not working, because the hooks will most probably fail.
How to set the configuration
Specify each parameter using the --set key=value[,key=value] argument to helm install. For example:
helm install --name=<RELEASE_NAME> \
    --namespace=<NAMESPACE> \
    --set persistence.enabled=false,email.host=email \
    polyaxon/polyaxon
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example:
helm install --name my-release -f values.yaml polyaxon/polyaxon
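As a sketch, the same overrides used in the --set example above could be expressed in a values file (the email host value is just the placeholder used there):

```yaml
# Hypothetical values.yaml mirroring the --set example.
persistence:
  enabled: false
email:
  host: email
```

Pass it to the install command with -f values.yaml.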
DeploymentType
Parameter | Description | Default |
---|---|---|
DeploymentType | The deployment type tells Polyaxon how to show instructions ('kubernetes', 'minikube', 'microk8s', 'docker-compose') | kubernetes |
deploymentVersion
Parameter | Description | Default |
---|---|---|
deploymentVersion | The deployment version to use; this is important if you are using polyaxon-cli, to avoid accidentally deploying/upgrading to a different version without noticing | latest |
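For example, you can pin a deployment to a specific chart version in your config file (the version number below is purely illustrative):

```yaml
deploymentVersion: 0.4.4
```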
Namespace
Parameter | Description | Default |
---|---|---|
namespace | The namespace that will be used by Polyaxon to create operations and communicate with other services | polyaxon |
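For instance, to have Polyaxon create operations in a dedicated namespace (the name below is a placeholder):

```yaml
namespace: polyaxon-prod
```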
Ingress, RBAC, and API service
This chart provides support for an Ingress resource with a custom ingress controller, polyaxon-ingress.
You can also provide different annotations for the ingress, in which case it will not use the polyaxon-ingress class (ingress.annotations.kubernetes.io/ingress.class).
Parameter | Description | Default |
---|---|---|
rbac.enabled | Use Kubernetes role-based access control (RBAC) | true |
ingress.enabled | Use Kubernetes ingress | true |
ingress.path | Kubernetes ingress path | / |
ingress.hostName | Kubernetes ingress hostName | "" |
ingress.annotations | Ingress annotations | {} |
ingress.tls | Use Ingress TLS | [] |
api.service.annotations | API Service annotations | {} |
Note: using TLS requires either:
- a preconfigured secret with the TLS certificates in it,
- or the use of cert-manager to auto-request certificates from Let's Encrypt and store them in a secret.
It's also possible to use a service like externalDNS to auto-create the DNS entry for the Polyaxon API service.
Securing API server with TLS
If you have your own certificate you can create a new secret with the tls.crt and the tls.key,
then set the secret name in the values file.
Automating TLS certificate creation and DNS setup
To automate the creation and registration of a new domain name you can use the following services:
- cert-manager
- externalDNS (Route53 / Google CloudDNS)
Once installed, you can set the values for ingress.tls:
ingress:
  enabled: true
  hostName: polyaxon.acme.com
  tls:
  - secretName: polyaxon.acme-tls
    hosts:
    - polyaxon.acme.com
TLS can have more than one host.
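For example, a single TLS secret can cover several hosts (both hostnames below are placeholders):

```yaml
ingress:
  enabled: true
  hostName: polyaxon.acme.com
  tls:
  - secretName: polyaxon.acme-tls
    hosts:
    - polyaxon.acme.com
    - api.polyaxon.acme.com
```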
In order to get the domain registration to work you need to set the value of api.service.annotations
to the annotation needed for your domain, e.g.:
annotations:
  domainName: polyaxon.my.domain.com
SSL
Parameter | Description | Default |
---|---|---|
ssl | To set SSL and serve HTTPS with Polyaxon deployed with a NodePort service | {} |
NGINX acts as a reverse proxy for Polyaxon’s front-end server, meaning NGINX proxies external HTTP (and HTTPS) requests to the Polyaxon API.
NGINX ingress
The recommended way to use HTTPS with Polyaxon on Kubernetes is to set up ingress-nginx for the Polyaxon cluster.
Polyaxon's Helm chart comes with an ingress resource that you can use with an ingress controller; you should use TLS so that all traffic is served over HTTPS.
- Create a TLS secret that contains your TLS certificate and private key:
kubectl create secret tls polyaxon-tls --key $PATH_TO_KEY --cert $PATH_TO_CERT
- Add the TLS configuration to Polyaxon's ingress values (do not use ClusterIP on GKE):
serviceType: ClusterIP
ingress:
  enabled: true
  hostName: polyaxon.acme.com
  tls:
  - secretName: polyaxon.acme-tls
    hosts:
    - polyaxon.acme.com
For more information visit the Nginx Ingress Integration
NGINX for NodePort service
To enable SSL for the Polyaxon API running with a NodePort service on Kubernetes, you need to provide an SSL certificate and an SSL certificate key.
You can provide a self-signed certificate or a browser-trusted certificate.
- Create a secret for your certificate:
kubectl create -n polyaxon secret generic polyaxon-cert --from-file=/path/to/certs/polyaxon.com.crt --from-file=/path/to/certs/polyaxon.com.key
- Make sure to update your deployment config with a reference to the certificate:
ssl:
  enabled: true
  secretName: 'polyaxon-cert'
- Set the service type to NodePort and update the API's service port to 443.
N.B. By default, Polyaxon mounts the SSL certificate and key to /etc/ssl; this value can be updated using .Values.ssl.path.
CLI setup
If you are serving Polyaxon over HTTPS, be aware that the CLI needs a different config:
polyaxon config set --host=IP/Host [--verify_ssl]
dns
Parameter | Description | Default |
---|---|---|
dns | DNS configuration for a cluster running with a custom DNS setup | {} |
Several users deploy Polyaxon on Kubernetes clusters created with Kubespray, and by default Kubespray creates Kubernetes with CoreDNS as the cluster DNS server.
Update DNS backend
Although we could provide logic to detect the DNS used in the cluster, this would require cluster-wide RBAC that we think is unnecessary. The default DNS backend used by Polyaxon is KubeDNS; to set a different DNS, provide this value in your Polyaxon deployment config:
dns:
  backend: "coredns"
Update the DNS prefix for a different system
Since the DNS service is generally deployed in the kube-system namespace, the default DNS prefix is kube-dns.kube-system, or coredns.kube-system if you set the previous option.
You can also provide the complete DNS prefix, and not use the DNS backend options:
dns:
  prefix: "kube-dns.other-kube-system"
Update the DNS prefix for OpenShift
OpenShift has a different DNS configuration; the default prefix is:
dns:
  prefix: "dns-default.openshift-dns"
Update DNS cluster
The default DNS cluster used by Polyaxon to resolve routes is cluster.local; you can provide a custom cluster DNS by setting:
dns:
  customCluster: "custom.cluster.name"
Time zone
To set a different time zone for the application (convenient for the dashboard and admin interface) you can provide a valid time zone value.
Parameter | Description | Default |
---|---|---|
timezone | The timezone to use | UTC |
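For example, to use a non-UTC time zone (any valid tz database name should work):

```yaml
timezone: "Europe/Berlin"
```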
Root user
The default superuser/root user for Polyaxon.
A default password is provided: rootpassword.
You can also remove the value, and Polyaxon will generate a random password that you can retrieve later.
Parameter | Description | Default |
---|---|---|
user.username | Default superuser username | root |
user.email | Default superuser email | [email protected] |
user.password | Default superuser password (could be null) | rootpassword |
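A sketch of overriding the root user in your config file; setting the password to null lets Polyaxon generate a random one, as noted above:

```yaml
user:
  username: "root"
  password: null  # Polyaxon will generate a random password
```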
Nvidia GPU
In order to use GPUs for your experiments you need nodes with GPUs, and you need to expose the NVIDIA paths to your jobs.
This is how you specify where the NVIDIA libraries are on your hosts:
Parameter | Description | Default |
---|---|---|
dirs.nvidia.lib | The nvidia lib, e.g. "/usr/lib/nvidia-384" | "" |
dirs.nvidia.bin | The nvidia bin, e.g. "/usr/lib/nvidia-384/bin" | "" |
dirs.nvidia.libcuda | The nvidia libcuda.so.1, e.g. "/usr/lib/x86_64-linux-gnu/libcuda.so.1" | "" |
This is where you want to mount these libraries on the pods; by default Polyaxon will use the same values from dirs if these are not provided.
Parameter | Description | Default |
---|---|---|
mountPaths.nvidia.lib | The nvidia lib, e.g. "/usr/lib/nvidia-384" | dirs.nvidia.lib |
mountPaths.nvidia.bin | The nvidia bin, e.g. "/usr/lib/nvidia-384/bin" | dirs.nvidia.bin |
mountPaths.nvidia.libcuda | The nvidia libcuda.so.1, e.g. "/usr/lib/x86_64-linux-gnu/libcuda.so.1" | dirs.nvidia.libcuda |
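Putting the example paths from the table together, a dirs section might look like this (the paths are the examples above; adjust them to your hosts):

```yaml
dirs:
  nvidia:
    lib: "/usr/lib/nvidia-384"
    bin: "/usr/lib/nvidia-384/bin"
    libcuda: "/usr/lib/x86_64-linux-gnu/libcuda.so.1"
```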
Persistence
Polyaxon provides options to enable or disable persistence, or to connect existing claims. You can set the persistence options for Polyaxon yourself, or enable the NFS provisioner to provision all or some of the volumes.
Tip: If you are using a multi-node cluster and need ReadWriteMany volumes for trying out the platform, you can use the NFS provisioner provided by the platform. See Persistence with NFS below.
For logs and repos, Polyaxon by default uses the host node. In many cases this is a sufficient default;
in cases where Polyaxon is deployed on multiple nodes and is replicated,
the usage of ReadWriteMany PVCs is recommended for a stable deployment.
logs: logs generated by experiments/jobs/builds.
If you don't provide a claim to use, Polyaxon will use the host.
Parameter | Description | Default |
---|---|---|
persistence.logs.existingClaim | Name of an existing PVC | "" |
persistence.logs.mountPath | Path where to mount the volume | /polyaxon-logs |
persistence.logs.hostPath | A directory on the host node | /tmp/logs |
repos: code used for creating builds, training experiments, running jobs.
If you don't provide a claim to use, Polyaxon will use the host.
Parameter | Description | Default |
---|---|---|
persistence.repos.existingClaim | Name of an existing PVC | "" |
persistence.repos.mountPath | Path where to mount the volume | /polyaxon-repos |
persistence.repos.hostPath | A directory on the host node | /tmp/repos |
upload: a temporary volume where Polyaxon uploads data, code, files, …
If you don't provide a claim to use, Polyaxon will use the host. A volume claim is not very important here if your host node has sufficient storage.
Parameter | Description | Default |
---|---|---|
persistence.upload.existingClaim | Name of an existing PVC | "" |
persistence.upload.mountPath | Path where to mount the volume | /polyaxon-upload |
persistence.upload.hostPath | A directory on the host node | /tmp/upload |
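As a sketch, connecting existing ReadWriteMany claims for all three volumes (the claim names below are hypothetical):

```yaml
persistence:
  logs:
    existingClaim: "polyaxon-pvc-logs"
  repos:
    existingClaim: "polyaxon-pvc-repos"
  upload:
    existingClaim: "polyaxon-pvc-upload"
```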
data: data used for training experiments.
You can mount multiple claims and host paths for data. This should be a dictionary mapping volume names to the respective volumes.
Every definition should follow the structure:
Parameter | Description | Default |
---|---|---|
persistence.data.dataName.existingClaim | Name of an existing PVC | "" |
persistence.data.dataName.mountPath | Path where to mount the volume | "" |
persistence.data.dataName.hostPath | A directory on the host node | "" |
persistence.data.dataName.readOnly | Whether to mount as read-only | |
persistence.data.dataName.store | To mount a cloud storage (s3, gcs, azure) | |
persistence.data.dataName.bucket | The bucket to mount | |
persistence.data.dataName.secret | The secret containing access to the bucket | |
persistence.data.dataName.secretKey | The key name to get the value from the secret | |
The default value is based on a path on the host node:
persistence:
  data:
    default-data:
      mountPath: "/data"
      hostPath: "/data"
Example of different data persistence definitions:
persistence:
  data:
    data1:
      mountPath: "/data/1"
      hostPath: "/path/to/data"
      readOnly: true
    data2:
      mountPath: "/data/2"
      existingClaim: "data-2-pvc"
    data-foo:
      mountPath: "/data/foo"
      existingClaim: "data-foo-pvc"
    data-gcs3:
      store: gcs
      bucket: gs://data-bucket
      secret: secret-name
      secretKey: secret-key
    data-s3:
      store: s3
      bucket: s3://data-bucket
      secret: secret-name
      secretKey: secret-key
    data-azure:
      store: azure
      bucket: wasbs://data-[email protected]/
      secret: secret-name
      secretKey: secret-key
outputs: outputs generated from experiments and jobs.
You can mount multiple claims and host paths for outputs. This should be a dictionary mapping volume names to the respective volumes.
Every definition should follow the structure:
Parameter | Description | Default |
---|---|---|
persistence.outputs.outputsName.existingClaim | Name of an existing PVC | "" |
persistence.outputs.outputsName.mountPath | Path where to mount the volume | "" |
persistence.outputs.outputsName.hostPath | A directory on the host node | "" |
persistence.outputs.outputsName.readOnly | Whether to mount as read-only | |
persistence.outputs.outputsName.store | To mount a cloud storage (s3, gcs, azure) | |
persistence.outputs.outputsName.bucket | The bucket to mount | |
persistence.outputs.outputsName.secret | The secret containing access to the bucket | |
persistence.outputs.outputsName.secretKey | The key name to get the value from the secret | |
The default value is based on a path on the host node:
persistence:
  outputs:
    default-outputs:
      mountPath: "/outputs"
      hostPath: "/outputs"
N.B. Multi-outputs is not supported in the CE version.
Example of a multi-outputs persistence definition:
persistence:
  outputs:
    outputs1:
      mountPath: "/outputs/1"
      hostPath: "/path/to/outputs"
      readOnly: true
    outputs2:
      mountPath: "/outputs/2"
      existingClaim: "outputs-2-pvc"
    outputs-foo:
      mountPath: "/outputs/foo"
      existingClaim: "outputs-foo-pvc"
    outputs-gcs3:
      store: gcs
      bucket: gs://outputs-bucket
      secret: secret-name
      secretKey: secret-key
    outputs-s3:
      store: s3
      bucket: s3://outputs-bucket
      secret: secret-name
      secretKey: secret-key
    outputs-azure:
      store: azure
      bucket: wasbs://outputs-[email protected]/
      secret: secret-name
      secretKey: secret-key
Node and Deployment manipulation
Parameter | Description | Default |
---|---|---|
nodeSelector | Node selector for core pod assignment | {} |
tolerations | Tolerations for core pod assignment | [] |
affinity | Affinity for core | Please check the values |
Dependent charts can also have values overwritten. Preface values with postgresql.*, redis.*, or rabbitmq.*.
Polyaxon provides a list of options to select which nodes should be used for the core platform, for the dependencies, and for the runs.
Node Selectors
Polyaxon comes with a couple of node selector options to assign pods to nodes for Polyaxon's core platform.
Additionally, every dependency in the Helm package exposes a node selector option.
By providing these values, or some of them, you can constrain the pods belonging to that category to only run on particular nodes, or to prefer to run on particular nodes.
nodeSelector:
  ...
postgresql:
  nodeSelector:
    ...
redis:
  master:
    nodeSelector:
      ...
  slave:
    nodeSelector:
      ...
rabbitmq:
  nodeSelector:
    ...
Tolerations
If one or more taints are applied to a node and you want to make sure some pods do not deploy on it, Polyaxon provides a tolerations option for the core platform; all dependencies, i.e. database, broker, etc., also expose their own tolerations option.
tolerations:
  ...
postgresql:
  tolerations:
    ...
redis:
  master:
    tolerations:
      ...
  slave:
    tolerations:
      ...
rabbitmq:
  tolerations:
    ...
Affinity
Affinity allows you to constrain which nodes your pods are eligible to be scheduled on, based on the nodes' labels.
Polyaxon has default Affinity values for its core components to ensure that they deploy on the same node.
Polyaxon’s default affinity:
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: type
            operator: In
            values:
            - "polyaxon-core"
        topologyKey: "kubernetes.io/hostname"
You can update your config deployment file to set affinity for each dependency:
affinity:
  ...
postgresql:
  affinity:
    ...
redis:
  master:
    affinity:
      ...
  slave:
    affinity:
      ...
rabbitmq:
  affinity:
    ...
Resources discovery
Parameter | Description | Default |
---|---|---|
resourcesDaemon.enabled | resourcesDaemon enabled | true |
resourcesDaemon.tolerations | Tolerations for resourcesDaemon pod assignment | [] |
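For example, to turn off the resources daemon entirely in your deployment config:

```yaml
resourcesDaemon:
  enabled: false
```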
IPs/Hosts White list
In order to restrict the IP addresses and hosts that can communicate with the API, set allowedHosts:
allowedHosts:
- 127.0.0.1
- 159.203.150.212
- .mysite.com  # (will match every subdomain of mysite.com)
API Host
In order to receive emails and notifications with a clickable link to the objects on the platform, set hostName:
hostName: 159.203.150.212
Or
hostName: polyaxon.foo.com
Admin view
Polyaxon ships with an admin interface:
ui:
  adminEnabled: true
Security context
Parameter | Description | Default |
---|---|---|
enabled | Enable security context | false |
runAsUser | Security context UID | 2222 |
runAsGroup | Security context GID | 2222 |
fsGroup | Security context fsGroup | 2222 |
allowPrivilegeEscalation | Controls whether a process can gain more privileges than its parent process | false |
runAsNonRoot | Indicates that the container must run as a non-root user | true |
fsGroupChangePolicy | Defines the behavior of changing ownership and permissions of the volume before it is exposed inside the Pod | |
Polyaxon runs all containers as root by default. This configuration is often fine for many deployments; however, in some use cases it can pose a compliance issue for some teams.
Polyaxon provides a simple way to enable a security context for all core components.
Default configuration:
securityContext:
  enabled: false
  runAsUser: 2222
  runAsGroup: 2222
  fsGroup: 2222
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  fsGroupChangePolicy:
Enable security context
securityContext:
  enabled: true
Or enable with a custom UID/GID other than 2222/2222:
securityContext:
  enabled: true
  runAsUser: 1111
  runAsGroup: 1111
Define the behavior of changing ownership and permissions of the volume before it is exposed inside the Pod:
securityContext:
  enabled: true
  fsGroupChangePolicy: OnRootMismatch
The custom UID/GID example above will run all containers with UID/GID 1111/1111.
N.B. If you are using a host path or a volume for the artifacts store, make sure to allow the UID/GID to access it.
Port forwarding
You can use port forwarding to access the API and dashboard on localhost:
kubectl port-forward svc/polyaxon-polyaxon-api 31811:80 31812:1337 -n polyaxon
Upgrade Polyaxon
To upgrade Polyaxon to a newer version, you can simply run:
helm repo update
helm upgrade polyaxon polyaxon/polyaxon -f config.yaml