
pyyaml-include

An extending constructor of PyYAML: include other YAML files into current YAML document

Rank: #2221 · Downloads: 3,085,685 (30 days) · Stars: 90 · Forks: 23

Description


An extending constructor of PyYAML: include other YAML files into the current YAML document.

Version 2.0 introduced fsspec, which makes it possible to include files over HTTP, SFTP, S3, and more.

⚠️ Warning
“pyyaml-include” 2.0 is NOT compatible with 1.0

Install

pip install "pyyaml-include"

Since v2.0, fsspec is used to open included files. To include remote files, install the corresponding fsspec extras:

  • for files on website:

    pip install "pyyaml-include" "fsspec[http]"
    
  • for files on S3:

    pip install "pyyaml-include" "fsspec[s3]"
    
  • see fsspec's documentation for other backends

🔖 Tip
“pyyaml-include” depends on fsspec, so fsspec is always installed, whether you include local or remote files.

Basic usages

Suppose we have the following YAML files:

├── 0.yml
└── include.d
    ├── 1.yml
    └── 2.yml
  • 1.yml's content:

    name: "1"
    
  • 2.yml's content:

    name: "2"
    

To include 1.yml and 2.yml in 0.yml:

  1. Register a yaml_include.Constructor with PyYAML's loader class, using !inc (or any other tag starting with !) as its tag:

    import yaml
    import yaml_include
    
    # add the tag
    yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/your/conf/dir'))
    
  2. Use !inc tag(s) in 0.yml:

    file1: !inc include.d/1.yml
    file2: !inc include.d/2.yml
    
  3. Load 0.yml in your Python program:

    with open('0.yml') as f:
        data = yaml.full_load(f)
    print(data)
    

    we'll get:

    {'file1': {'name': '1'}, 'file2': {'name': '2'}}
    
  4. (Optional) Unregister the constructor:

    del yaml.Loader.yaml_constructors["!inc"]
    del yaml.UnsafeLoader.yaml_constructors["!inc"]
    del yaml.FullLoader.yaml_constructors["!inc"]
    

Include in Mapping

If 0.yml was:

file1: !inc include.d/1.yml
file2: !inc include.d/2.yml

We'll get:

file1:
  name: "1"
file2:
  name: "2"

Include in Sequence

If 0.yml was:

files:
  - !inc include.d/1.yml
  - !inc include.d/2.yml

We'll get:

files:
  - name: "1"
  - name: "2"

Advanced usages

Wildcards

File names can contain shell-style wildcards. Data loaded from the files matched by the wildcards is placed in a sequence.

That is, a list is returned when the included file name contains wildcards, and its length equals the number of matched files:

If 0.yml was:

files: !inc include.d/*.yml

We'll get:

files:
  - name: "1"
  - name: "2"
  • when only one file matches, the list has length 1
  • when no files match, an empty list is returned

The patterns **, ? and [..] are supported; ^ for pattern negation is not. The maxdepth option applies to the first ** found in the path.
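The list semantics can be illustrated with a stdlib-only sketch. It uses Python's glob module rather than fsspec, so fsspec-specific details such as maxdepth are not modelled; the helper include_wildcard and the plain-text loader read_text are hypothetical, chosen only to show that every match contributes one list element and that a pattern with no match yields an empty list.

```python
import glob
import os
import tempfile
from typing import Any, Callable, List


def include_wildcard(base_dir: str, pattern: str,
                     load: Callable[[str], Any]) -> List[Any]:
    """Hypothetical helper: load every file under base_dir matching pattern.

    One list element per matched file; an empty list when nothing matches.
    Uses the stdlib glob module, not fsspec, so maxdepth is not modelled.
    """
    full_pattern = os.path.join(base_dir, pattern)
    # sorted() makes the order deterministic; glob itself does not promise one
    paths = sorted(glob.glob(full_pattern, recursive=True))
    return [load(path) for path in paths]


def read_text(path: str) -> str:
    # Stand-in loader: a real include would parse the file as YAML here
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as base:
    # Recreate the include.d/1.yml and include.d/2.yml layout from above
    os.makedirs(os.path.join(base, "include.d"))
    for n in ("1", "2"):
        with open(os.path.join(base, "include.d", f"{n}.yml"), "w",
                  encoding="utf-8") as f:
            f.write(f'name: "{n}"\n')

    all_files = include_wildcard(base, "include.d/*.yml", read_text)
    one_file = include_wildcard(base, "include.d/1.*", read_text)
    no_files = include_wildcard(base, "include.d/*.json", read_text)

print(all_files)  # ['name: "1"', 'name: "2"']
print(one_file)   # ['name: "1"']
print(no_files)   # []
```

The real library parses each file as YAML and resolves patterns through fsspec; only the "one element per match" shape is the point here.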

Important

  • Using the ** pattern in large directory trees or on remote file systems (S3, HTTP, ...) may take an inordinate amount of time.
  • There is no lazy loading or iteration: data from all matched files is fully loaded into memory as part of the YAML document tree, so a large amount of memory may be needed when there are many or large files.

Work with fsspec

Since v2.0, fsspec is used to open included files, so files can be included from many different sources, such as the local file system, S3, HTTP, SFTP, ...

For example, we can include a file from a website in YAML:

conf:
  logging: !inc http://domain/etc/app/conf.d/logging.yml

In such situations, an fsspec filesystem object should be passed as the fs argument when creating the Constructor.

For example, to include files from a website, we shall:

  1. Create a Constructor with an fsspec HTTP filesystem object as its fs:

    import yaml
    import fsspec
    import yaml_include
    
    HOST, PORT = "localhost", 8080  # placeholders: replace with your server's address
    http_fs = fsspec.filesystem("http", client_kwargs={"base_url": f"http://{HOST}:{PORT}"})
    
    ctor = yaml_include.Constructor(fs=http_fs, base_dir="/foo/baz")
    yaml.add_constructor("!inc", ctor, yaml.Loader)
    
  2. Then, write a YAML document that includes files from http://${HOST}:${PORT}:

    key1: !inc doc1.yml    # relative path to "base_dir"
    key2: !inc ./doc2.yml  # also relative to "base_dir"
    key3: !inc /doc3.yml   # absolute path, "base_dir" has no effect
    key4: !inc ../doc4.yml # relative path, one level above "base_dir"
    
  3. Load it with PyYAML:

    data = yaml.load(yaml_string, yaml.Loader)  # yaml_string holds the document above
    

The YAML snippet above will be loaded as:

  • key1: parsed YAML of http://${HOST}:${PORT}/foo/baz/doc1.yml
  • key2: parsed YAML of http://${HOST}:${PORT}/foo/baz/doc2.yml
  • key3: parsed YAML of http://${HOST}:${PORT}/doc3.yml
  • key4: parsed YAML of http://${HOST}:${PORT}/foo/doc4.yml
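The resolution rules above behave like a POSIX-style path join followed by normalization. The sketch below reproduces the four results with the standard library; resolve_include, BASE_DIR and the placeholder HOST/PORT values are hypothetical, and this is an illustration of the documented behaviour, not pyyaml-include's actual code.

```python
import posixpath


def resolve_include(base_dir: str, urlpath: str) -> str:
    """Hypothetical helper mirroring the documented base_dir rules.

    posixpath.join keeps relative paths under base_dir but restarts from "/"
    when urlpath is absolute; normpath then collapses "." and ".." segments.
    """
    return posixpath.normpath(posixpath.join(base_dir, urlpath))


HOST, PORT = "localhost", 8080  # placeholder server address
BASE_DIR = "/foo/baz"

includes = {"key1": "doc1.yml", "key2": "./doc2.yml",
            "key3": "/doc3.yml", "key4": "../doc4.yml"}
urls = {key: f"http://{HOST}:{PORT}{resolve_include(BASE_DIR, path)}"
        for key, path in includes.items()}

for key, url in urls.items():
    print(key, url)
# key1 http://localhost:8080/foo/baz/doc1.yml
# key2 http://localhost:8080/foo/baz/doc2.yml
# key3 http://localhost:8080/doc3.yml
# key4 http://localhost:8080/foo/doc4.yml
```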

🔖 Tip
Check fsspec's documentation for more.


ℹ️ Note
If the fs argument is omitted, a "file"/"local" fsspec filesystem object is used automatically. That is to say:

data: !inc foo/baz.yaml

is equivalent to (if no base_dir was set in Constructor()):

data: !inc file://foo/baz.yaml

and

yaml.add_constructor("!inc", Constructor())

is equivalent to:

yaml.add_constructor("!inc", Constructor(fs=fsspec.filesystem("file")))

Parameters in YAML

As a callable object, Constructor passes the YAML tag's parameters on to fsspec for finer-grained control.

The first argument, urlpath, is fixed and required, and can be given either positionally or by name. Normally we write it as a string after the tag (e.g. !inc), as in the examples above.

However, more parameters can be passed:

  • As a sequence, parameters are passed to Python as positional arguments, like *args in a Python function, e.g.:

    files: !inc [include.d/**/*.yaml, {maxdepth: 1}, {encoding: utf16}]
    
  • As a mapping, parameters are passed to Python as named arguments, like **kwargs in a Python function, e.g.:

    files: !inc {urlpath: /foo/baz.yaml, encoding: utf16}
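The first level of this mapping, from the tag's YAML value to a Python call, can be sketched as follows. PyYAML constructs YAML sequences as Python lists and mappings as dicts, which is what the isinstance checks rely on. call_with_params and fake_include are hypothetical stand-ins that only echo their arguments; the further splitting of positional values in the wildcard case, described later, is not modelled.

```python
from typing import Any, Callable


def call_with_params(func: Callable[..., Any], params: Any) -> Any:
    """Hypothetical helper showing how tag parameters map onto a call.

    - scalar   -> func(params)   (just the urlpath)
    - sequence -> func(*params)  (positional arguments)
    - mapping  -> func(**params) (named arguments)
    """
    if isinstance(params, list):
        return func(*params)
    if isinstance(params, dict):
        return func(**params)
    return func(params)


def fake_include(urlpath, *args, **kwargs):
    # Stand-in for the real include call; echoes what it received
    return urlpath, args, kwargs


# !inc include.d/1.yml
print(call_with_params(fake_include, "include.d/1.yml"))
# !inc [include.d/**/*.yaml, {maxdepth: 1}, {encoding: utf16}]
print(call_with_params(fake_include,
                       ["include.d/**/*.yaml", {"maxdepth": 1}, {"encoding": "utf16"}]))
# !inc {urlpath: /foo/baz.yaml, encoding: utf16}
print(call_with_params(fake_include,
                       {"urlpath": "/foo/baz.yaml", "encoding": "utf16"}))
```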
    

However, how these parameters are interpreted depends on the case, and varies between fsspec implementation backends.
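The cases listed next boil down to two questions about urlpath: does it carry a scheme, and does it contain a wildcard? The stdlib-only sketch below summarizes that decision. choose_strategy and its return labels are hypothetical, and the branch for a path with neither a scheme nor a wildcard is an assumption (presumably the Constructor's own fs is used), since this section does not spell it out.

```python
from typing import Optional


def has_wildcard(urlpath: str) -> bool:
    # Glob magic characters: *, ? and [
    return any(ch in urlpath for ch in "*?[")


def get_scheme(urlpath: str) -> Optional[str]:
    # "http://host/x.yml" -> "http"; no "://" -> None
    scheme, sep, _ = urlpath.partition("://")
    return scheme if sep else None


def choose_strategy(urlpath: str) -> str:
    """Hypothetical summary of the documented dispatch; labels are descriptive."""
    scheme = get_scheme(urlpath)
    if has_wildcard(urlpath):
        if scheme:
            return "fsspec.open_files"   # search, open and load; results in a list
        return "fs.glob + fs.open"       # backend glob, then open each match
    if scheme:
        return "fsspec.open"             # Constructor's fs is ignored
    # Assumption: not stated in this section; presumably the Constructor's fs
    return "fs.open"


for path in ["http://example.com/conf/logging.yml", "s3://bucket/conf/*.yml",
             "etc/app/**/*.yml", "foo/baz.yaml"]:
    print(path, "->", choose_strategy(path))
```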

  • If a scheme/protocol (“http://”, “sftp://”, “file://”, etc.) is specified and urlpath contains no wildcard, Constructor invokes fsspec.open directly to open it. This means the Constructor's fs is ignored, and a new standalone fs is created implicitly.

    In this situation, urlpath is passed as the first argument of fsspec.open, and all other parameters are passed to that function as well.

    For example,

    • the YAML snippet

      files: !inc [file:///foo/baz.yaml, r]
      

      results in Python code like

      with fsspec.open("file:///foo/baz.yaml", "r") as f:
          yaml.load(f, Loader)
      
    • and the YAML snippet

      files: !inc {urlpath: file:///foo/baz.yaml, encoding: utf16}
      

      results in Python code like

      with fsspec.open("file:///foo/baz.yaml", encoding="utf16") as f:
          yaml.load(f, Loader)
      
  • If urlpath contains both a wildcard and a scheme, Constructor will:

    Invoke fsspec's open_files function to search for, open and load the files, returning the results in a list. The include statement's parameters are passed to open_files.

  • If urlpath contains a wildcard but no scheme, Constructor will:

    1. invoke the corresponding fsspec backend's glob method to search for files,
    2. then call its open method to open each file found.

    urlpath is passed as the first argument to the glob method of the corresponding fsspec backend, each file found is passed as the first argument to its open method, and the other parameters are passed to glob and open as their following arguments.

    With wildcards, pay special attention to the fact that there are two separate parameters after urlpath: the first is for the glob method, and the second is for the open method. Each of them can be a scalar, a sequence or a mapping, corresponding to a single argument, positional arguments or named arguments in Python respectively. For example:

    • To include every .yml file in directory etc/app recursively with a maximum depth of 2, and open them with the UTF-16 codec, write the YAML as below:

      files: !inc ["etc/app/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf16}]
      

      This results in Python code like:

      for file in local_fs.glob("etc/app/**/*.yml", maxdepth=2):
          with local_fs.open(file, encoding="utf16") as f:
              yaml.load(f, Loader)
      
    • Since maxdepth is the seconde