Developing machine learning models requires many iterations; some of these iterations involve:

  • changing the parameters
  • running the experiments on GPU nodes
  • running the experiments in a distributed fashion

The following sections give high-level examples of how to create templates to either update the parameters or the nodes used for running your experiments. A similar approach can be used to override polyaxonfiles used for jobs, notebooks, tensorboards, etc.

Standard and simple polyaxonfiles

To rapidly develop experiments, we recommend creating a standard polyaxonfile with your main command and parameters:

version: 1

kind: experiment

declarations:
  lr: 0.01
  batch_size: 128

build:
  image: tensorflow/tensorflow:1.4.1-py3
  build_steps:
    - pip install -r polyaxon_requirements.txt

run:
  cmd: python3 train.py --batch-size={{ batch_size }} --lr={{ lr }}

Then run this file:

polyaxon run -f polyaxonfile.yaml
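
With the default declarations above, the templated cmd resolves to:

```shell
python3 train.py --batch-size=128 --lr=0.01
```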

Overriding the parameters

To override the default parameters, create a second polyaxonfile (e.g. polyaxonfile_params.yaml) containing only the declarations you want to change:

version: 1

kind: experiment

declarations:
  lr: 0.2
  batch_size: 64

Using this override file:

polyaxon run -f polyaxonfile.yaml -f polyaxonfile_params.yaml

Running this command starts your experiment using the declarations from the override file.
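
With the override applied, the same templated cmd now resolves to:

```shell
python3 train.py --batch-size=64 --lr=0.2
```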

Overriding the node scheduling

To control where your experiment runs and which resources it requests, create an override file (e.g. polyaxonfile_node_scheduling.yaml) with an environment section:

version: 1

kind: experiment

environment:
  node_selector:
    node_label: node_value
  
  resources:
    cpu:
      requests: 2
      limits: 4
    gpu:
      requests: 1
      limits: 1
    memory:
      requests: 512
      limits: 2048

Using this override file:

polyaxon run -f polyaxonfile.yaml -f polyaxonfile_node_scheduling.yaml

Running this command schedules your experiment on a node matching the node selector and requests the resources defined in the environment section.
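
Override files can also be combined in a single command, with later files taking precedence over earlier ones. Assuming the file names used above, you could apply both the parameter and the node-scheduling overrides at once:

```shell
polyaxon run -f polyaxonfile.yaml -f polyaxonfile_params.yaml -f polyaxonfile_node_scheduling.yaml
```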