Loading Dangerously: PyYAML and Safety by Design

This story is about code, but I hope the analogy holds if you think in terms of woodworking tools too. Losing a finger to a saw may be more memorable than picturing a computer blue-screening.

Part 1: Power, Safety, and Good Design

“When your hammer is C++, everything begins to look like a thumb.” - Steve Haflich

Good designs behave the way users expect. However, there is no generic answer for what users expect from a tool. People tend to focus on what a tool enables, and rarely consider what it restricts them from doing. Do users expect tools to let them hurt themselves?

Tools with more power have greater potential to have negative effects. Tools with limited power are able to avoid entire classes of error that more powerful tools are exposed to. For example, Python doesn’t let users manage memory, making it less “powerful” than C. However, this tradeoff protects users from accidentally dereferencing null pointers. In exchange for letting programmers make memory management errors, C users can write programs that run orders of magnitude faster. Power comes with risks, which users learn to manage with experience.

Effective and popular designs restrict power sensibly, guiding users towards good default choices. As Cheng Lou said at React Europe 2016:

(As developers) We’re often not seeking more power. We’re seeking more principled ways to use our existing power.

Many popular projects thrive because they provide sensible defaults that users of all experience levels can start with. Users may override the defaults as their needs and knowledge grow. Apollo Boost does this for the Apollo GraphQL Client, and Create React App does this for Webpack and the React ecosystem.

Few would argue that all tools should be released with maximum power, with the expectation that all users go through advanced training before doing anything. This mindset is exclusionary, and unlikely to build much of a community beyond a very small group of devout users. From a responsibility perspective, it would be like opening up a machine shop with laser cutters and buzzsaws, and hoping that only veteran mechanics (but not teenagers) would ever use them.

In contrast, there is a balanced approach to toolmaking, one that enables a powerful tool to be used in a community with mixed experience levels. We can provide safety (restrict power) by default, and add power only if the user explicitly asks for it. Through the default effect, this ensures proper usage in most cases. In practice, this means:

Setting safe defaults for optional arguments in functions as requests does for the SSL verify flag, or
Making the user type more to choose the risky option, as React does with dangerouslySetInnerHTML.
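Both ideas can be combined in a single function signature. The sketch below is only an illustration: parse_settings and its dangerously_eval flag are hypothetical names invented for this example, not part of any library mentioned in this post.

```python
import json


def parse_settings(text, *, dangerously_eval=False):
    """Parse settings text (hypothetical helper, for illustration only).

    Safe by default: the text is treated as JSON data and nothing is executed.
    """
    if dangerously_eval:
        # Explicit opt-in: eval() runs arbitrary Python expressions,
        # so this path must never see untrusted input.
        return eval(text)
    return json.loads(text)


# Default path: plain data in, plain data out.
print(parse_settings('{"retries": 3}'))

# By default, anything that is not plain JSON data is rejected.
try:
    parse_settings("1 + 1")
except ValueError as exc:
    print("rejected by default:", type(exc).__name__)

# The risky behavior exists, but only when spelled out at the call site.
print(parse_settings("1 + 1", dangerously_eval=True))
```

The keyword-only flag means the risky path cannot be reached by accident through a positional argument, and the long name makes it easy to spot in code review.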

While this idea may seem simple, it can be nontrivial to do in practice, even in projects with very high visibility and usage. (See the first Related Reading for a story of API design challenges with the command line tool curl).

Inspired by a recent conversation with my coworker Charlie, what follows is a story about how the difficulty of managing this idea led to over half a year of heated discussion on a real Python project.

Part 2: Changing to Safe by Default: A PyYAML Story

Introducing PyYAML and YAML

PyYAML is a Python library for reading and writing YAML files. It is extremely popular: with over 1 million downloads daily, it is consistently one of the top 10 most downloaded libraries on the Python Package Index (PyPI).

YAML is a terser human-readable superset of the JSON data format. While JSON is common as a raw data format sent over HTTP, YAML files are very popular for storing configuration.
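As a rough illustration of that relationship, the sketch below parses the same small document with the standard library's json module and with PyYAML's safe_load, then parses an equivalent block-style YAML version. It assumes PyYAML is installed; the sample data is made up, and a simple document like this one says nothing about edge cases where the two formats differ.

```python
import json

import yaml  # PyYAML

# A small JSON document (made-up sample data).
text = '{"name": "demo", "tags": ["a", "b"], "retries": 3}'

from_json = json.loads(text)
from_yaml = yaml.safe_load(text)  # the YAML parser accepts the JSON text as-is
print(from_json == from_yaml)

# The same data written in YAML's terser block style.
block_style = """
name: demo
tags:
  - a
  - b
retries: 3
"""
print(yaml.safe_load(block_style) == from_json)
```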

Hidden Powers in YAML, and a Security Hole

The Python standard library's json.load does not have “side effects” besides reading a stream of text input. Because I thought of YAML as equivalent to JSON and had not read the 23,000+ word spec, I assumed that PyYAML’s yaml.load had the same property. Last June, I learned that this was incorrect.

In tip #7 of 10 Common Security Gotchas in Python, I learned that using yaml.load could run arbitrary code. While the danger of this possibility is limited only by your imagination, the article provided the very plausible example of having your passwords emailed to a hacker.

In fairness to the library authors, there was a warning on their documentation page about this danger. Unfortunately, I had not seen this before. Upon learning this fact, I opened a PR to replace instances of yaml.load with yaml.safe_load in my projects at work.
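To get a feel for what safe_load protects against, here is a minimal sketch, assuming PyYAML is installed. The tagged document uses a Python-specific tag that asks the loader to call a function while loading; os.getcwd is a harmless stand-in chosen for this example, and try_safe_load is a hypothetical helper. Only safe_load is called, so nothing is executed.

```python
import yaml  # PyYAML

# Plain data: only strings and numbers.
PLAIN = "name: demo\nretries: 3\n"

# A Python-specific tag that asks the loader to call os.getcwd() during
# loading. A real attack would name something far less friendly.
TAGGED = "cwd: !!python/object/apply:os.getcwd []\n"


def try_safe_load(text):
    """Return the parsed data, or None if safe_load refuses the document."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        print("safe_load refused the document:", type(exc).__name__)
        return None


print(try_safe_load(PLAIN))
print(try_safe_load(TAGGED))
```

The safe loader only knows how to build plain data types, so it raises an error when it meets a tag it has no constructor for, instead of calling the function.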

Intermission: Try both loaders, live

You can experience the difference between the two loading methods with this online code sandbox. The sandbox has 2 configuration YAML files, one of which only has plain text data, while the other has a line that executes a potentially malicious script. Change the value of isSafe to switch which loader is used.

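import yaml

def process_case(case, isSafe=False):
  # safe_load only builds plain data types; load was the unsafe
  # default loader at the time of writing and could run code.
  if isSafe:
    loader = yaml.safe_load
  else:
    loader = yaml.load

  # Parse the YAML file named in the test case
  with open(case['filename']) as fp:
    config = loader(fp)

  print(config)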

You will find that the default loader opens both files without complaining, whereas the safe loader throws an exception if you try to open the YAML file that contains code. Do not worry: since this Python code is running on a remote server, your computer is safe.

Making safety the default

Coincidentally, after reading about this vulnerability I was discussing the concept of restricting APIs in a software design course with coach Jimmy Koppel. When prompted to find examples in the wild, I realized I was curious about why safe_load wasn’t the default behavior, as it implied that regular load was dangerous. Luckily, since the last 3.12 release, a PyYAML project maintainer had the same idea!

On June 26, 2018, a dependency checker opened a pull request against my open source Python library, notifying me that PyYAML had released a new version, 4.1, after 2 years at 3.12! Since this was a potentially breaking change, I read the changelog.

In my opinion, the most important change was this addition from August 2017: making yaml.load/yaml.dump “safe” by default, and renaming the old vanilla load/dump to danger_load and danger_dump.

This change would have made the updates to my work project unnecessary, since simply upgrading the library version would have had the same effect.

Python’s YAML Parsers are not Alone

Upon sharing this example, I learned that several other languages also had popular YAML loaders that were “unsafe” by default:

PHP from Prescott Murphy (another course participant)
Ruby/Rails via Jimmy Koppel

How could this happen in so many places? As Martin Tournoij put it in the PHP article:

YAML may seem ‘simple’ and ‘obvious’ when glancing at a basic example, but turns out it’s not. The YAML spec is 23,449 words; for comparison, TOML is 3,339 words, JSON is 1,969 words, and XML is 20,603 words.

The longer the spec, the harder it is to write a parser that covers all the edge cases.

Defining “Safety” is Hard

Because of work, I didn’t merge the automated pull request right away. However, I anticipated that the 4.1 release wouldn’t break my project, and was prepared to merge it after the US holiday.

In a stunning turn of events, 1 week after I received the automated pull request for this innocent-sounding set of changes, a series of increasingly intense discussions was launched:

A PyYAML lead maintainer was thinking of reverting this “safe by default” change in the next 4.x release because it broke backwards compatibility with many other widely used libraries, such as vcrpy.
The definition of “safe” is not straightforward, and the new safe_load could be considered less safe than it was before.
There was a long thread about whether to prefix the “not-safe” methods with the word danger or python.
The whole situation was so draining that one of the two core maintainers chose to leave the project.

The team ultimately ended up removing PyYAML 4.1 from PyPI. This was a highly unusual move given that PyPI is viewed as immutable. I ended up rejecting the PR, because 4.1’s absence from PyPI meant that it was not a valid version to pin my app to.

At the time of this writing, the remaining team is working towards the roadmap for a new stable 4.2 release.

In the meantime, GitHub’s new vulnerability indicator has been pushing people to switch their usage of load to safe_load, as this vector was flagged as a high severity vulnerability by NIST. 6+ months later, people are still learning about this change for the first time.

As of February 21, 2019, based on the 4.2 roadmap, the only changes made in PyYAML 3.13, released on July 5, 2018, are the ones necessary for compatibility with Python 3.7. I can’t say for sure because the changes are not in the changelog.

Takeaways

If you’re not concerned with performance and just want a stricter Python YAML parser, consider StrictYAML.
It is much easier to design APIs with safety (restricted power) from the beginning. You can always add power in future versions because people have to opt-in to using it. In contrast, taking power away is tricky because people are used to having it, and doing so will break backwards compatibility.
Maintaining open source software is a challenging and often thankless task. The way people choose to communicate can drive even the most dedicated people away. While you can’t change the conduct of other people, letting the maintainers of projects you use know that you appreciate their work may give them the motivation to persist.
Reading documentation for new libraries is generally important. However, it is especially critical for functions that process data from untrusted sources, or operate at the outer boundaries of your application surface in general.

With thanks to Charly Fontaine for encouraging me to turn my notes from last summer into a blog post, and Nazim Saouli for brainstorming ideas around projects with good defaults.

Notes

I was notified of the release of PyYAML 4.1 thanks to having an open source project monitored by PyUP. I highly recommend this free service, as it keeps my projects up to date, and because I would not have noticed the PyYAML situation without it.
Regarding constraints and “power” in design: With algorithmic sound generation or sonification, if the user has too much power (access to all the notes), there is a risk that “noise” rather than “music” is produced. Some engines work only with notes that are guaranteed to sound “nice” when played together. This sound may not be viewed as “good” if it’s too predictable or repetitive, so lifting this “guaranteed harmony” restriction may be a necessary step to produce compelling tracks. Music is complicated and different from the code in this scenario, as there isn’t “dangerous” sound, and because dissonant sound can be used artistically.

Published 22 Feb 2019
