Variables
=========

A Merlin input ``.yaml`` file can contain a number of variables that control workflow execution, for example through string expansion and control flow.

.. note::

   Only user variables and ``OUTPUT_PATH`` may be reassigned or overridden from the command line.

Directory structure context
---------------------------

The directory structure of Merlin output looks like this:

.. code::

   SPECROOT
       ...

       OUTPUT_PATH
           MERLIN_WORKSPACE
               MERLIN_INFO
                   .orig.yaml
                   .partial.yaml
                   .expanded.yaml
               .workspace
               WORKSPACE

Reserved variables
------------------

.. list-table:: Study variables that Merlin uses. They may be referenced within a specification file but not reassigned or overridden, except for ``OUTPUT_PATH``.
   :widths: 25 50 25
   :header-rows: 1

   * - Variable
     - Description
     - Example Expansion

   * - ``$(SPECROOT)``
     - Directory path of the specification file.
     - ::

          /globalfs/user/merlin_workflows

   * - ``$(OUTPUT_PATH)``
     - Directory path the study output will be written to. If not defined, it defaults to the current working directory. May be reassigned or overridden.
     - ::

          ./studies

   * - ``$(MERLIN_TIMESTAMP)``
     - The time a study began. May be used as a unique identifier.
     - ::

          "YYYYMMDD-HHMMSS"

   * - ``$(MERLIN_WORKSPACE)``
     - Output directory generated by a study at ``OUTPUT_PATH``. Ends with ``MERLIN_TIMESTAMP``.
     - ::

          $(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)

   * - ``$(WORKSPACE)``
     - The workspace directory for a single step.
     - ::

          $(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)/step_name/

   * - ``$(MERLIN_INFO)``
     - Directory within ``MERLIN_WORKSPACE`` that holds the provenance specs and sample generation results. Commonly used to hold ``samples.npy``.
     - ::

          $(MERLIN_WORKSPACE)/merlin_info/

   * - ``$(MERLIN_SAMPLE_ID)``
     - Sample index in an ensemble.
     - ::

          0
          1
          2
          3

   * - ``$(MERLIN_SAMPLE_PATH)``
     - Path in the sample directory tree to a sample's directory, i.e. where the task is actually run.
     - ::

          /0/0/0/
          /0/0/1/
          /0/0/2/
          /0/0/3/

   * - ``$(MERLIN_GLOB_PATH)``
     - All of the directories in the sample directory tree, as a glob (``*``) string.
     - ::

          /*/*/*/*

   * - ``$(MERLIN_PATHS_ALL)``
     - A space-delimited string of all of the sample paths. It can be used as-is in a bash ``for`` loop, for instance:

       .. code-block:: bash

          # Merlin substitutes $(MERLIN_PATHS_ALL) before bash runs the loop
          for path in $(MERLIN_PATHS_ALL)
          do
              ls "$path"
          done

     - ::

          0/0/0 0/0/1 0/0/2 0/0/3

   * - ``$(MERLIN_SAMPLE_VECTOR)``
     - Vector of Merlin sample values.
     - ::

          $(SAMPLE_COLUMN_1) $(SAMPLE_COLUMN_2) ...

   * - ``$(MERLIN_SAMPLE_NAMES)``
     - Names of Merlin sample values.
     - ::

          SAMPLE_COLUMN_1 SAMPLE_COLUMN_2 ...

   * - ``$(MERLIN_SPEC_ORIGINAL_TEMPLATE)``
     - Copy of the original YAML file passed to ``merlin run``.
     - ::

          $(MERLIN_INFO)/*.orig.yaml

   * - ``$(MERLIN_SPEC_EXECUTED_RUN)``
     - Parsed and processed YAML file, with command-line variable substitutions included.
     - ::

          $(MERLIN_INFO)/*.partial.yaml

   * - ``$(MERLIN_SPEC_ARCHIVED_COPY)``
     - Archived version of ``MERLIN_SPEC_EXECUTED_RUN`` with all variables and paths fully resolved.
     - ::

          $(MERLIN_INFO)/*.expanded.yaml

User variables
--------------

User variables are defined in the ``env`` section of a specification file, as in this example:

.. code-block:: yaml

   env:
       variables:
           ID: 42
           EXAMPLE_VAR: hello

As long as they are defined in order, you can nest user variables like this:

.. code-block:: yaml

   env:
       variables:
           EXAMPLE_VAR: hello
           WORKER_NAME: $(EXAMPLE_VAR)_worker

Like all other Merlin variables, user variables may be used anywhere within a specification, as a YAML key or as a value:

.. code-block:: yaml

   cmd: echo "$(EXAMPLE_VAR), world!"
   ...
   $(WORKER_NAME):
       args: ...

If you want to define the study name programmatically, you can include variables in the ``description.name`` field, as long as the result is a valid filename:

.. code-block:: yaml

   description:
       name: my_$(EXAMPLE_VAR)_study_$(ID)
       description: example of programmatic study name

The above would produce a study called ``my_hello_study_42``.
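
The fragments in this section can be combined into one specification. The sketch below is a minimal, hypothetical example that assumes the usual ``description`` / ``env`` / ``study`` / ``merlin`` layout of a Merlin spec; the step name ``step1``, the worker ``args`` value, and the output file ``greeting.txt`` are illustrative placeholders, not values Merlin requires.

.. code-block:: yaml

   description:
       # expands to my_hello_study_42
       name: my_$(EXAMPLE_VAR)_study_$(ID)
       description: example of programmatic study name

   env:
       variables:
           ID: 42
           EXAMPLE_VAR: hello
           # listed after EXAMPLE_VAR, so it may reference it; expands to hello_worker
           WORKER_NAME: $(EXAMPLE_VAR)_worker

   study:
       - name: step1
         description: write a greeting into this step's workspace (hypothetical step)
         run:
             # $(WORKSPACE) is the reserved per-step workspace directory
             cmd: echo "$(EXAMPLE_VAR), world!" > $(WORKSPACE)/greeting.txt

   merlin:
       resources:
           workers:
               # a user variable used as a YAML key
               $(WORKER_NAME):
                   args: -l INFO
                   steps: [step1]

Because ``WORKER_NAME`` is declared after ``EXAMPLE_VAR``, it satisfies the ordering requirement for nested user variables described above.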
Environment variables
---------------------

Merlin expands Unix environment variables for you. The values of the user variables below would be expanded:

.. code-block:: yaml

   env:
       variables:
           MY_HOME: ~/
           MY_PATH: $PATH
           USERNAME: ${USER}

However, Merlin leaves environment variables that appear in shell scripts (such as ``cmd`` and ``restart``) alone, so the shell resolves them when the step runs. Merlin variables in the same script are still expanded. So this step:

.. code-block:: yaml

   - name: step1
     description: an example
     run:
         cmd: echo $PATH ; echo $(MY_PATH)

...would be expanded as:

.. code-block:: yaml

   - name: step1
     description: an example
     run:
         cmd: echo $PATH ; echo /an/example/:/path/string/

Step return variables
---------------------

.. list-table:: Special return code variables for task steps.
   :widths: 25 50 25
   :header-rows: 1

   * - Variable
     - Description
     - Example Usage

   * - ``$(MERLIN_SUCCESS)``
     - This step was successful; keep going to the next task. This is the default step behavior if no exit code is given.
     - ::

          echo "hello, world!"
          exit $(MERLIN_SUCCESS)

   * - ``$(MERLIN_RESTART)``
     - Run this step's ``restart`` command, or re-run ``cmd`` if ``restart`` is absent. Issues a warning. The default maximum number of retries plus restarts for any given step is 30; you can override this by adding a ``max_retries`` field under the step's ``run`` field in the specification. By default, the step is retried after 1 second; to override the delay time, specify ``retry_delay``.
     - ::

          run:
              cmd: |
                  touch my_file.txt
                  echo "hi mom!" >> my_file.txt
                  exit $(MERLIN_RESTART)
              restart: |
                  echo "bye, mom!" >> my_file.txt
              max_retries: 23
              retry_delay: 10

   * - ``$(MERLIN_RETRY)``
     - Retry this step's ``cmd`` command. Issues a warning. The default maximum number of retries for any given step is 30; you can override this by adding a ``max_retries`` field under the step's ``run`` field in the specification. By default, the step is retried after 1 second; to override the delay time, specify ``retry_delay``.
     - ::

          run:
              cmd: |
                  touch my_file.txt
                  echo "hi mom!" >> my_file.txt
                  exit $(MERLIN_RETRY)
              max_retries: 23
              retry_delay: 10

   * - ``$(MERLIN_SOFT_FAIL)``
     - Mark this step as a failure and note it in the warning log, but keep going. Unknown return codes are translated to soft fails so that they can be logged.
     - ::

          echo "Uh-oh, this sample didn't work"
          exit $(MERLIN_SOFT_FAIL)

   * - ``$(MERLIN_HARD_FAIL)``
     - Something went terribly wrong and the whole workflow must stop. Raises a ``HardFailException`` and stops all workers connected to that step. Workers stop after a 60-second delay so that the step can be acknowledged by the server.

       .. note::

          Workers in isolated parts of the workflow that do not consume from the bad step will continue. You can stop all workers with ``$(MERLIN_STOP_WORKERS)``.

     - ::

          echo "Oh no, we've created skynet! Abort!"
          exit $(MERLIN_HARD_FAIL)

   * - ``$(MERLIN_STOP_WORKERS)``
     - Launch a task to stop all active workers. The shutdown happens after 60 seconds, which allows the current task to finish and acknowledge its results to the server.
     - ::

          # send a signal to all workers to stop
          exit $(MERLIN_STOP_WORKERS)
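
Because these variables are ordinary exit codes, a step can choose among them with normal shell logic. The following is a hypothetical sketch: ``run_sim.sh`` and ``result.out`` are placeholders for your own program and its output, not part of Merlin. The step succeeds when the output is non-empty and otherwise asks Merlin to retry ``cmd``, using the ``max_retries`` and ``retry_delay`` fields described in the table above.

.. code-block:: yaml

   study:
       - name: simulate
         description: run a (hypothetical) simulation and report the outcome
         run:
             cmd: |
                 # run_sim.sh and result.out are placeholders for your own program and output
                 ./run_sim.sh > result.out
                 if [ -s result.out ]; then
                     # non-empty output: continue to the next task
                     exit $(MERLIN_SUCCESS)
                 else
                     # empty output: re-run cmd, up to max_retries times
                     exit $(MERLIN_RETRY)
                 fi
             max_retries: 3
             retry_delay: 5

Swapping ``$(MERLIN_RETRY)`` for ``$(MERLIN_SOFT_FAIL)`` in the ``else`` branch would instead log the failure and let the rest of the workflow proceed.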