Proper Usage of Multiline YAML Strings in GitLab CI: From Misconceptions to Practice

Keywords: GitLab CI | YAML | multiline strings

Abstract: This article delves into common issues and solutions for using multiline YAML strings in GitLab CI's .gitlab-ci.yml files. By analyzing the nature of YAML scalars, it explains why traditional multiline string syntax leads to parsing errors and details two effective approaches: multiline plain scalars and folded scalars. The discussion covers YAML parsing rules, GitLab CI limitations, and practical considerations to help developers write clearer and more maintainable CI configurations.

When configuring GitLab CI/CD pipelines, developers often need to write complex script commands in the .gitlab-ci.yml file. To improve readability, many attempt to use multiline strings for these commands, but frequently encounter parsing errors. For example, the following configuration seems reasonable but actually causes issues:

script:
  - |
    echo -e "
        echo 'hi';
        echo 'bye';
    "

When the GitLab Runner tries to execute this, it may only recognize echo -e " instead of the full command. The root cause of this problem lies in a misunderstanding of YAML scalars.

The Nature of YAML Scalars

First, it's important to clarify a common misconception: there is no such thing as a "multiline string" in YAML. YAML uses scalars to represent data, which can be loaded as strings, integers, floats, etc. In the context of GitLab CI, we focus on scalars that are loaded as strings, since these strings are later interpreted as command-line instructions.
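As a small illustration of how scalars are loaded, the fragment below shows similar-looking notation resolving to different types under common YAML loaders. The key names are hypothetical and are not part of any GitLab CI schema.

```yaml
# Hypothetical keys, used only to show how a loader resolves each scalar.
count: 42            # plain scalar, loaded as an integer
ratio: 0.5           # plain scalar, loaded as a float
command: echo 'hi'   # plain scalar, loaded as the string "echo 'hi'"
quoted: "42"         # double-quoted scalar, always loaded as a string
```

Script entries in .gitlab-ci.yml belong to the string cases, which is why only the string-loading rules matter for the rest of the discussion.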

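However, the GitLab Runner versions this discussion is based on did not reliably execute script entries containing embedded newlines, whereas current GitLab documentation does describe both | and > block scalars for splitting long commands. The approach here is therefore the conservative one: leverage YAML's multiline scalar capabilities to write commands that visually span multiple lines but are loaded as single-line strings, which sidesteps the question of how a given runner handles newlines while still enhancing readability.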

Analysis of Incorrect Examples

In the described problem, the developer used a literal scalar (via the | indicator), which causes YAML to preserve newlines in the string. When GitLab CI attempts to execute this, it may not handle these newlines correctly, leading to truncated or failed command parsing. For example:

script:
  - |
    echo -e "
        echo 'hi';
        echo 'bye';
    "

Here, YAML loads a string containing newlines (the | indicator also keeps one final newline), and a runner that expects a single-line command might only execute up to the first newline.
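To make the loaded value concrete, the sketch below pairs a literal scalar with a double-quoted scalar that loads to the identical string. The job name and commands are placeholders chosen for illustration.

```yaml
# Hypothetical job; both script entries load to exactly the same string.
example-job:
  script:
    # Literal scalar: each line break is kept, plus one final newline.
    - |
      echo one
      echo two
    # Equivalent double-quoted scalar with explicit \n escapes.
    - "echo one\necho two\n"
```

Seen this way, the literal form is only a more readable way of writing a string with embedded newlines. It does not turn the lines into one command.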

Correct Approach: Multiline Plain Scalars

A multiline plain scalar is an unquoted scalar that can be written across multiple lines, but during loading, newlines are replaced by spaces. This allows commands to be readable in the YAML file while becoming single-line strings when executed. For example:

script:
  - echo -e 
     "echo 'hi';
      echo 'bye';"

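In this example, echo -e is the initial line, and the continuation lines must be indented more than the - sequence indicator (by at least one space); here they are indented further purely for readability. The YAML parser discards the leading whitespace of each continuation line and replaces each newline with a single space, so the loaded string is: echo -e "echo 'hi'; echo 'bye';". This avoids issues caused by newlines.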
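A brief sketch of the folding rule, assuming a hypothetical job name: the two entries below use different continuation indentation yet load to the same single-line string, because leading whitespace on continuation lines is discarded.

```yaml
# Hypothetical job; both entries load as: echo one two three
example-job:
  script:
    - echo one
      two
      three
    - echo one
          two
        three
```

The extra indentation in the second entry is purely visual, so it can be used to align arguments without changing the command.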

Note that multiline plain scalars have some restrictions. For instance, they cannot contain a colon followed by a space (which YAML reads as the separator of a key-value pair) or a space followed by # (which starts a comment). Additionally, since newlines are replaced by spaces, care must be taken if the command itself requires newlines (e.g., for output formatting). For example, a line break placed inside a quoted shell string becomes a space:

script:
  - echo -e 
     "echo 'hi';
      echo '
     bye';"

The loaded command is echo -e "echo 'hi'; echo ' bye';", so a visible space appears before bye.
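The colon restriction mentioned above can be seen in a short sketch. The job name and command are hypothetical; the point is only that quoting the whole entry keeps it a string.

```yaml
# Hypothetical job illustrating the colon-plus-space restriction.
example-job:
  script:
    # As a plain scalar, the next line would load as a mapping
    # ({"echo key": "value"}) rather than a command string:
    # - echo key: value
    # Single-quoting the whole entry keeps it a string:
    - 'echo "key: value"'
```

The single-quoted entry loads as echo "key: value", so the shell still sees the inner double quotes.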

Correct Approach: Folded Scalars

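Folded scalars use the > indicator and are similar to multiline plain scalars in that a single newline between two lines of equal indentation is replaced by a space during loading. Unlike plain scalars, they keep one trailing newline by default (>- strips it), and blank lines or more-indented lines keep their line breaks. Their syntax is more flexible and not subject to some of the plain scalar restrictions. For example: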

script:
  - >
    echo -e
    "echo 'hi';
    echo 'bye';"

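Here, the command lines must be indented more than the - sequence indicator and must all share the same indentation, since a line indented further than the first content line keeps its line break instead of being folded. Folded scalars are safer for commands containing special characters (like colons) because their content is never interpreted as YAML syntax.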
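As a sketch of why folded scalars suit commands with colons, the entry below would not be valid as a plain scalar but loads cleanly when folded. The job name and the TARGET_URL variable are assumptions made for illustration.

```yaml
# Hypothetical job; TARGET_URL is an assumed variable, not defined by GitLab.
example-job:
  script:
    # ">-" folds the lines into one and strips the final newline.
    - >-
      curl --header "Content-Type: application/json"
      --data '{"status": "ok"}'
      "$TARGET_URL"
```

The loaded value is a single line: curl --header "Content-Type: application/json" --data '{"status": "ok"}' "$TARGET_URL". Using plain > instead of >- would load the same line with one trailing newline.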

Practical Applications and Considerations

In practice, if multiline readability is not needed, single-line commands can be used directly, such as:

script:
  - echo -e "\n    echo 'hi';\n    echo 'bye';\n"

This avoids the complexity of YAML multiline scalars. However, for long commands, multiline scalars can significantly improve maintainability.

The GitLab Runner version may also affect the handling of multiline commands. For example, the following has been reported to work with Shell Runner 1.11.0:

script:
  - az sql server create -n ${variable} -g ${variable} -l ${variable}
    --administrator-login ${variable} --administrator-login-password ${variable}

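This works because of YAML's multiline plain scalar rules: the continuation line is folded into the first with a single space, so the runner receives one single-line az command. Version compatibility should still be verified.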

Conclusion

When using multiline YAML strings in GitLab CI, the key is to understand the loading behavior of YAML scalars. Literal scalars (|) preserve newlines, so where a command must reach the runner as a single line, use multiline plain scalars or folded scalars instead. Pay attention to indentation rules and newline replacement to prevent unintended spaces or parsing errors. By correctly applying these concepts, developers can write clear and reliable CI/CD configurations.
