I’ve spent a bit of time recently working on a tool and, yes, a language too. It’s something I’ve often needed, and I haven’t found anything quite like it.

Declarative formats like JSON, YAML and TOML are ubiquitous. Each has its own strengths, weaknesses and sweet spots.

  • JSON is great on the wire and supremely interoperable.
  • TOML is quite natural for configuration.
  • YAML seems particularly strong in contexts where you’re describing what there is or what there should be, as in AWS CloudFormation or Kubernetes resource descriptions.

Under the covers, YAML and JSON have a similar data model: whether you call them mappings and sequences or objects and arrays, they’re essentially the same thing. The universality of this scheme means that it’s easy to map other formats like TOML, XML and EDN into it (although clearly you need a bit of wiggle room when the source format is richer; more on this some other time).

A lot of the tooling out there applies to only one format (like the awesome jq), does ad hoc conversion (json2yaml…) or is intended for really different things (pandoc).

And then there’s a whole lot of text-based templating going on as well. Take Helm, which needs to generate YAML for Kubernetes: it falls back to Go text templates for hacking up and spitting out YAML.

Templating, or processing of some form, is necessary because data formats are simply that. They provide little leverage or expressiveness to a programmer for describing repetition and relationships.

Often declarative DSLs are found wanting because they lack loops or conditionals or other concepts that developers find useful (often for good reason). This is why we apply tools or processing to these formats at all: to generate an artefact from a more structured or concise specification.

I think when working with a semi-structured data format, we should have tools available that work with that data model, not tools which dissolve it utterly into text and then reconstitute it again.
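To make the contrast concrete, here is a small Python sketch (Python only because it is widely read; the key and value are made up for illustration). Substituting a value into text can silently break the structure of the result, whereas placing it into the data model and serialising afterwards cannot.

```python
import json

# A hypothetical value that happens to contain quote characters.
title = 'say "hello"'

# Text templating: substitute into a string, then hope it still parses.
as_text = '{"title": "%s"}' % title


def parses(text):
    """Return True if text is well-formed JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Structural approach: put the value into the data model, then serialise.
as_data = json.dumps({"title": title})

text_ok = parses(as_text)                        # the embedded quotes break the document
data_ok = json.loads(as_data)["title"] == title  # the value round-trips intact
```

The same failure mode applies to YAML generated from string templates, where indentation and colons in substituted values play the role the quotes play here.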

I think we should have a powerful and expressive language to define, generate and process these formats.

This doesn’t necessarily mean adherence to all the minutiae of the fearsome YAML 1.2 spec. It doesn’t mean reproducing XML, XSD and IDE tooling. Practical over pedantic.

A good tool would make JSON and YAML elements more liquid rather than evaporating everything to a gaseous stream of characters.

So that’s what I’ve been sketching out in the Eucalypt project.

It should make simple things really simple. So if I have a document with values in it…

foo: 3
bar: 4

…and a document fulfilling the role of a “template”:

description: this is like a template
number: !eu foo + bar
text: !eu "{foo}-and-{bar}"

… I can trivially smudge them together (using eu values.yaml template.yaml) to get a filled-in template.

description: this is like a template
number: 7
text: 3-and-4
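
One way to picture this in terms of the data model rather than text: the values document supplies names, and each tagged entry in the template is evaluated against them. The Python below is only an analogy under that assumption, not how eu is implemented; the Expr wrapper and the render function are hypothetical stand-ins for the !eu tag and the tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Expr:
    """Hypothetical stand-in for a value tagged !eu in the template."""
    fn: Callable[[Dict[str, Any]], Any]


# The two documents from above, already parsed into mappings.
values = {"foo": 3, "bar": 4}
template = {
    "description": "this is like a template",
    "number": Expr(lambda env: env["foo"] + env["bar"]),
    "text": Expr(lambda env: "{foo}-and-{bar}".format(**env)),
}


def render(template, values):
    """Evaluate tagged entries against the values; pass plain entries through."""
    return {
        key: item.fn(values) if isinstance(item, Expr) else item
        for key, item in template.items()
    }


result = render(template, values)
```

At no point does the template pass through a string that has to be re-parsed: plain entries are copied as data and only the tagged ones are computed.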

It should also make complicated things possible. So the Eucalypt tool is actually a lazy, functional programming language with a library of useful functions for manipulating strings, sequences and blocks, and import / export capability for the main formats it supports.

The native eucalypt syntax allows you to express functions and combine blocks and lists in all sorts of interesting ways.

azs: ["a", "b", "c"] map("eu-west-1{}")
cidrs: [1, 2, 3] map("10.0.{}.0/24")

subnet(vpc, az, cidr): {
  Type: "AWS::EC2::Subnet"
  Properties: {
    VpcId: { Ref: vpc }
    AvailabilityZone: az
    CidrBlock: cidr
  }
}

public-subnet(vpc, az, cidr):
  subnet(vpc, az, cidr) << { Properties: { MapPublicIpOnLaunch: true } }

`:main
subnets: zip-with(public-subnet("my-vpc"), azs, cidrs)

…expands (via eu subnets.eu) to:

- Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: my-vpc
    AvailabilityZone: eu-west-1a
    CidrBlock: 10.0.1.0/24
    MapPublicIpOnLaunch: true
- Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: my-vpc
    AvailabilityZone: eu-west-1b
    CidrBlock: 10.0.2.0/24
    MapPublicIpOnLaunch: true
- Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: my-vpc
    AvailabilityZone: eu-west-1c
    CidrBlock: 10.0.3.0/24
    MapPublicIpOnLaunch: true
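
For readers more at home in a mainstream language, here is a rough Python analogue of the same computation. It assumes that << behaves as a deep merge of blocks (which is what the output above suggests) and that calling public-subnet with a single argument yields a partially applied function; deep_merge is a hypothetical helper written for this sketch, not part of any library.

```python
from functools import partial


def deep_merge(base, overlay):
    """Merge overlay into base, recursing where both sides are mappings."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def subnet(vpc, az, cidr):
    return {
        "Type": "AWS::EC2::Subnet",
        "Properties": {
            "VpcId": {"Ref": vpc},
            "AvailabilityZone": az,
            "CidrBlock": cidr,
        },
    }


def public_subnet(vpc, az, cidr):
    # Analogue of: subnet(...) << { Properties: { MapPublicIpOnLaunch: true } }
    return deep_merge(
        subnet(vpc, az, cidr),
        {"Properties": {"MapPublicIpOnLaunch": True}},
    )


azs = ["eu-west-1{}".format(s) for s in ["a", "b", "c"]]
cidrs = ["10.0.{}.0/24".format(n) for n in [1, 2, 3]]

# Fix the first argument, then walk the two lists in step (the zip-with role).
subnets = list(map(partial(public_subnet, "my-vpc"), azs, cidrs))
```

The Python version needs the merge helper spelled out and is noticeably longer, which is part of the argument for a language built around this data model.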

I have some explaining to do.

There is a lot going on in the Eucalypt language that’s unorthodox or controversial and some of the design decisions are worth describing. I’ve tried some things out and discarded them and moved some way from the initial concept in some areas. There are plenty of things I’d like to implement that aren’t in there yet. And the current implementation is crude and error messages are predictably horrible at this stage. But it’s useful and getting more powerful quite quickly.

If you’re interested:

The implementation is at https://github.com/curvelogic/eucalypt-hs. It’s accidentally in Haskell. (Hey! Seriously not a problem. Get the repo then brew install stack and stack install. Though one day I may get around to a Go / Rust / C++ / Clojure implementation…)

A repo at https://github.com/curvelogic/eucalypt contains (very few) docs and (many more) sample files for a test harness.

Update (02/2019):

  • these two repos have now merged and everything is at https://github.com/curvelogic/eucalypt
  • you can install on macOS with brew install curvelogic/homebrew-tap/eucalypt