Give Jinja2 superpowers with Loren
... a POC of an awesome python pattern
Loren is a GitHub repository, but it is also a proof of concept of an idea. In fact, it is the fortunate blending of two ideas that turned out to be a really powerful coding pattern.
I find it quite hard to pin down the value Loren brings to your coding toolbox; for me it has been an efficiency booster and a boilerplate killer. But given its generic nature, it can be many different things for different people and use cases. And while Loren is written in Python, and useful as a Python package, it also provides a CLI that makes it a nice general-purpose tool.
A note of caution, too: this library is intended to read and generate files locally or in a controlled environment like a CI/CD pipeline. It can do a few unsafe things, like executing arbitrary .py code from files you pass to it, so be a little careful.
So … what is it?
Loren is short for Load & Render. The Load part refers to its ability to easily read and parse files from a file system. The Render part is just Jinja2 with a slight extension.
There are great benefits to both the Load and Render parts, and they can be used individually; however, the real magic happens when they are combined.
Load
The Load part of Loren addresses a minor inconvenience in Python: the fact that loading (configuration) files into Python objects requires a few lines of code.
It also addresses an issue I’ve had with massive configuration files, that would be much easier to work with if they were spread out into different files.
Below is an example of using Loren to load a JSON file into a dictionary:
```python
# Old way, single file
import json

with open("some_dir/some_json_file.json") as json_file:
    data = json.load(json_file)

# With Loren
from loren.load import LorenDict

data = LorenDict("some_dir")["some_json_file"]
```

This super basic example does very little to highlight the value of the LorenDict, which, by the way, behaves mostly like a Python dict object. What Loren does here is index the files in "some_dir" and make them dictionary keys. When one of those keys is accessed, it infers the file type from the extension and parses the file into a suitable Python object. In this case, "some_json_file" was simply loaded with json.load.
Note how the file extension is omitted when getting the key; that behaviour is optional. If there were several files with the same name but different extensions, Loren would have loaded all of them and merged them into one blob.
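To make that behaviour concrete, here is a minimal, hypothetical sketch of the pattern (not Loren's actual implementation): a dict-like object that indexes file stems as keys up front, but only reads and parses a file when its key is accessed, picking a parser from the extension.

```python
import json
from pathlib import Path


class LazyFileDict:
    """Hypothetical sketch of a LorenDict-style lazy loader (not Loren's real code)."""

    def __init__(self, root):
        self.root = Path(root)
        # Index file stems -> paths up front; parse only on access.
        self._files = {p.stem: p for p in self.root.iterdir() if p.is_file()}

    def __getitem__(self, key):
        path = self._files[key]
        if path.suffix == ".json":
            return json.loads(path.read_text())
        # Fallback: treat unknown extensions as plain text.
        return {"file_contents": path.read_text()}

    def keys(self):
        return self._files.keys()
```

With a `conf.json` and a `notes.txt` in a directory, `LazyFileDict(the_dir)["conf"]` would return the parsed JSON, while `["notes"]` would return the text-file fallback dict.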
You can also access files in sub-directories with Loren, for instance:

```python
data = LorenDict("some_root_dir")["some_sub_dir"]["some_file"]
```

This loading of files is lazy by default, i.e. files are read and folders parsed when their keys are accessed. This can be overridden, in which case Loren will do a recursive parse of all files and folders starting in "some_root_dir".
Since the LorenDict works like a Python dict, it can also be iterated over, e.g.

```python
for file_name, file_contents in LorenDict("some_dir").items():
    ...  # do something with the file
```

I find this iteration easier to work with than the alternative ways of walking through a file tree in Python.
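For contrast, this is roughly the kind of standard-library walk that such iteration replaces; a small sketch that collects every file under a root, keyed by relative path:

```python
from pathlib import Path


def walk_files(root):
    """Read every file under root into a dict keyed by its relative path."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.read_text()
        for p in root.rglob("*")
        if p.is_file()
    }
```

It works, but you deal with paths, recursion and file handles yourself instead of just indexing into a dict.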
Render
I’m not done with Load; there’s more to discover. But I figured this was a good point to introduce Render. The true beauty of the LorenDict is that it can be passed to a Jinja2 template file, which allows that template to load various configurations from disk and generate whatever content it was designed to generate.
For instance, let’s say you have a really boilerplate-heavy configuration file that you need to replicate for a bunch of use cases. This was the first major use case I found for Loren: generating DAG files for Airflow.
I won’t outline an entire Airflow DAG here, but essentially it is a Python file that looks something like this:
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG('dummy_dag_name', description='bigquery DAG',
         schedule_interval='0 * * * *',
         start_date=datetime(2017, 3, 20), catchup=False) as dag:
    query_a = BigQueryInsertJobOperator(
        task_id="query_a",
        configuration={
            "query": {
                "query": "SELECT abc",
                "useLegacySql": False,
            }
        },
        location="EU",
    )
    query_b = BigQueryInsertJobOperator(
        task_id="query_b",
        configuration={
            "query": {
                "query": "SELECT abc2",
                "useLegacySql": False,
            }
        },
        location="EU",
    )
    query_a.set_downstream(query_b)
```

The code above is simplified and may not run as-is, but it illustrates defining a DAG, which is a group of tasks that we want to run on a schedule. The tasks are instantiated and set up with a dependency, where query_a has to run before query_b.
This file can grow insanely large, as Airflow users often have hundreds of DAGs with tens or hundreds of tasks each. You can also see that there are a lot of repeated defaults.
We could replace the above with the following configuration files:

config/dummy_dag_name/query_a.yaml

```yaml
query: |
  SELECT abc
```

config/dummy_dag_name/query_b.yaml

```yaml
upstreams:
  - query_a
query: |
  SELECT abc2
```

… and pass them to a Jinja2 template:
```jinja
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

{% for dag_name, tasks in config.items() %}
with DAG('{{ dag_name }}', description='bigquery DAG',
         schedule_interval='0 * * * *',
         start_date=datetime(2017, 3, 20), catchup=False) as dag:
{% for task_name, task_config in tasks.items() %}
    {{ task_name }} = BigQueryInsertJobOperator(
        task_id="{{ task_name }}",
        configuration={
            "query": {
                "query": "{{ task_config.query }}",
                "useLegacySql": False,
            }
        },
        location="EU",
    )
{% endfor %}
{% for task_name, task_config in tasks.items() %}
{% for upstream in task_config.get("upstreams", []) %}
    {{ upstream }}.set_downstream({{ task_name }})
{% endfor %}
{% endfor %}
{% endfor %}
```

In the templated solution, inserting a new query is just a matter of adding another yaml file to the config/dummy_dag_name folder. This allows generating a potentially massive DAG file with thousands of tasks, while keeping the benefit of small, separate config files per task that are easy to manage with e.g. source control. In this example on Github I went even further and generated separate files with SQL queries. There’s nothing stopping you from extending this template to handle things like cross-DAG task dependencies or custom DAG schedules (and if that last sentence didn’t make sense, it doesn’t matter; it was aimed at Airflow users).
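If you want to trace the whole loop end to end without installing Airflow or Loren, the toy sketch below mimics the render step with plain string formatting. The `config` dict stands in for what LorenDict would load from the two yaml files above; the function name is made up for illustration.

```python
def render_dag(dag_name, tasks):
    """Toy stand-in for the Jinja2 template: emit one task block per config entry."""
    lines = [f"with DAG('{dag_name}') as dag:"]
    for task_name, task_config in tasks.items():
        lines.append(f"    {task_name} = BigQueryInsertJobOperator(")
        lines.append(f"        task_id={task_name!r},")
        lines.append(f"        configuration={{'query': {{'query': {task_config['query']!r}}}}},")
        lines.append("    )")
    # Second pass: wire up dependencies, just like the template's second for-loop.
    for task_name, task_config in tasks.items():
        for upstream in task_config.get("upstreams", []):
            lines.append(f"    {upstream}.set_downstream({task_name})")
    return "\n".join(lines)


# Mirrors the contents of query_a.yaml and query_b.yaml.
config = {
    "query_a": {"query": "SELECT abc"},
    "query_b": {"query": "SELECT abc2", "upstreams": ["query_a"]},
}
print(render_dag("dummy_dag_name", config))
```

The two-pass structure matters: all tasks must be defined before any `set_downstream` call references them.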
I mentioned that Loren also makes a slight extension to Jinja2. In the above case, it might be nice to split the output into separate files instead of generating one massive one. That can be done by moving the outer for-loop like this:
```jinja
{% for dag_name, tasks in config.items() %}
=>{{ dag_name }}.py
import ...
...
...
{% endfor %}
```

The => operator tells Loren to split the output into separate files.
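The article doesn't show how the splitter works internally, but conceptually it is a small post-processing step over the rendered text. Here is one way it could be done (an illustrative sketch, not Loren's actual implementation): scan the output for lines starting with `=>` and collect everything that follows under that file name.

```python
def split_rendered_output(rendered):
    """Split rendered text into {filename: contents} on lines starting with '=>'.

    Hypothetical illustration of the idea; not Loren's real code.
    """
    files = {}
    current_name = None
    buffer = []
    for line in rendered.splitlines():
        if line.startswith("=>"):
            if current_name is not None:
                files[current_name] = "\n".join(buffer)
            current_name = line[2:].strip()
            buffer = []
        elif current_name is not None:
            buffer.append(line)
    if current_name is not None:
        files[current_name] = "\n".join(buffer)
    return files
```

Each chunk could then be written to disk under its name, giving one generated .py file per DAG.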
So, to summarize: giving Jinja2 the superpower of easily accessing files and walking through file trees is incredibly useful when generating complex, config-heavy files. But this power can be used for a lot of things! You can see more examples, with corresponding CLI calls, here.
Anyway, that was most of what Render does. Now back to Load and some additional functionality.
Back to Load
So, we just saw that traversing a file tree as if it were a dict is really powerful in Jinja2, but it is quite convenient elsewhere too.
Some more functionality of the Load side that I’d like to outline:
.loren.yml
When a LorenDict is initiated, it will look for a .loren.yml dotfile in the supplied root folder. If this file exists, it does two things:
Instruct Loren which methods to use to parse which file extensions
Provide a file-ignore section similar to e.g. .gitignore or .dockerignore
You can see an example here. You can generate a .loren.yml file using the CLI:

```shell
python -m loren init --configuration-path .
```

You can implement your own file handlers and include them in the Loren configuration, or contribute them back to the source project!
Non-json
Loren will currently load any file, but it has special handling for py, json, yaml, csv, tsv and j2 files. Other files are, by default, read as text files. When loading a text file, Loren returns a dict with a file_contents key that holds the content.
For files with multiple suffixes, Loren handles them in order. For instance, if you have a .yml.j2 file, i.e. a Jinja2-templated yaml file, Loren will first render the j2 template and then parse the resulting yaml into a Python object.
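That chaining can be pictured as applying one handler per suffix, rightmost first. Below is a small illustrative sketch using pathlib's suffixes; the handler registry and its entries are made up for the example (a toy `.upper` stage stands in for the .j2 render step), not Loren's API.

```python
import json
from pathlib import Path

# Made-up handler registry for illustration; Loren configures its own in .loren.yml.
HANDLERS = {
    ".json": json.loads,
    ".upper": str.upper,  # toy "template" stage standing in for rendering .j2
}


def load_chained(path, text):
    """Apply a handler per suffix, outermost (rightmost) suffix first."""
    result = text
    for suffix in reversed(Path(path).suffixes):
        result = HANDLERS[suffix](result)
    return result
```

For a file named `conf.json.upper`, the `.upper` stage runs first and `.json` parses its output, mirroring how a `.yml.j2` file would be rendered and then parsed.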
Python Files
As noted in the previous section, Loren will read .py files, and for those it executes the contents and stores a dict with the resulting user namespace. For instance:
some_file.py

```python
import requests

result = requests.get("https://dummyjson.com/products/1").json()
```

would be translated to
```python
{
    ...
    "some_file": {
        "result": {
            "id": 1,
            "title": "iPhone 9",
            ...
        }
    }
    ...
}
```

Note that only the user namespace is passed, so if you want to expose a function, you will need to bind it to a name at the top level, and if it needs external libraries, they will need to be imported inside the function. For instance:
some_file.py

```python
def get_json(url):
    import requests
    return requests.get(url).json()
```

would be translated to
```python
{
    ...
    "some_file": {
        "get_json": <function __main__.get_json(url)>
    }
    ...
}
```

The above file could give Jinja2 another superpower: fetching external resources into the template. This pattern has more or less endless possibilities, but of course, use it with some care.
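Capturing a module's user namespace like this is something Python's built-in exec makes straightforward. The sketch below shows one way such a .py handler could work; this is an assumption about the mechanism, not necessarily Loren's exact code.

```python
def load_python_namespace(source):
    """Execute Python source and return the resulting user-level names."""
    namespace = {}
    # Caution: this runs arbitrary code, exactly as the article warns.
    exec(source, namespace)
    # Drop dunder entries like __builtins__ so only user-defined names remain.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

Both plain variables and top-level functions end up as keys, which matches the two examples above.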
Some CLI goodies
The CLI does a few nice things apart from just loading and rendering files.
For instance, you can tell it to read a large file structure and dump it into a single JSON file.
It can also compare that JSON file to a JSON Schema, and hence be used to validate the contents of a set of files. I found this useful in unit tests that validate user configurations.
To summarise
Loren allows you to traverse a file system as if the folder structure were a (giant) Python dictionary; files are automatically opened and parsed into the dictionary based on their file extensions.
Making a LorenDict available to Jinja2 templates supercharges their ability to generate pretty much anything. I’ve found it super useful for breaking large configurations into smaller files, or when I’ve had to translate e.g. a CSV file into JSON.
Finally, using .py files as configuration input allows you to supercharge your Jinja2 rendering even further, including reading data from an API or database.
