Advent 2021: XML & JSON
This post is part of the 24-post series "Advent 2021":
- Advent 2021: Intro (December 01, 2021)
- Advent 2021: C++ (December 02, 2021)
- Advent 2021: C# (December 03, 2021)
- Advent 2021: Python (December 04, 2021)
- Advent 2021: Go (December 05, 2021)
- Advent 2021: TypeScript (December 06, 2021)
- Advent 2021: CMake (December 07, 2021)
- Advent 2021: Django (December 08, 2021)
- Advent 2021: Angular (December 09, 2021)
- Advent 2021: Flask (December 10, 2021)
- Advent 2021: gRPC (December 11, 2021)
- Advent 2021: GraphQL (December 12, 2021)
- Advent 2021: XML & JSON (December 13, 2021)
- Advent 2021: Matplotlib, Pandas & Numpy (December 14, 2021)
- Advent 2021: Linux (December 15, 2021)
- Advent 2021: Ansible (December 16, 2021)
- Advent 2021: SQLite (December 17, 2021)
- Advent 2021: Catch2 (December 18, 2021)
- Advent 2021: Zstandard (December 19, 2021)
- Advent 2021: ZFS (December 20, 2021)
- Advent 2021: Thunderbird (December 21, 2021)
- Advent 2021: Visual Studio Code (December 22, 2021)
- Advent 2021: Blender (December 23, 2021)
- Advent 2021: Open source (December 24, 2021)
No, I’m not joking today: I actually do like boring file formats, and in fact I like both XML and JSON. It’s a bit difficult today to imagine how the world looked before XML and JSON, but they haven’t been around since the dawn of time (both were defined around the year 2000). It also took a while for both to gain popularity, and in the case of XML, I think it’s fair to say that it took some time to settle in, as it was hyped a lot at the beginning. As a programmer these days it may be a bit hard to appreciate their existence, but let’s look at how we use them today, and hopefully it becomes clearer.
These days, if you have simple structured data, JSON will be your go-to format. It’s so ubiquitous, and has such high-quality support in any language you could care about (Go, C++, Python, …), that it has become a no-brainer to use. The way JSON represents data maps very well to most programming languages, and even serialization to JSON is usually fairly simple. It’s also so deeply integrated into most modern languages that serializing to JSON is often less work than writing something out as ad-hoc text and reading it back – which is quite amazing if you think about it!
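As a small illustration of how little work this is, here is a minimal sketch using Python's built-in `json` module. The `post` record and its field names are made up for the example.

```python
import json

# Hypothetical record, invented purely for illustration.
post = {"title": "XML & JSON", "day": 13, "tags": ["xml", "json"], "draft": False}

# Dictionary -> JSON text.
text = json.dumps(post)

# JSON text -> dictionary again.
restored = json.loads(text)

print(text)
print(restored == post)  # the round trip preserves the data
```

Strings, numbers, booleans, lists, and dictionaries all have a direct JSON counterpart, which is why no mapping code is needed here.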
XML, on the other hand, has always been a bit more heavy-handed, but it comes with a few features that JSON didn’t have at first and is slowly adopting. In my opinion, the XML ecosystem originally came with three big “features”: schemas, XPath, and XSL(T). XSL(T) was promising but never really took off, but schemas and XPath were good ideas, and their value can be seen in the fact that JSON is following suit. JSON Schema is the JSON counterpart of XML schemas, and it’s established enough that you can assume you’ll find a tool or library supporting it. On the query side, there is JSONPath, but unfortunately it’s not standard, so you’ll see different ways to access JSON documents across languages and tools – MongoDB has one way of doing projections, PostgreSQL has another. That said, JSON is simple enough that in most cases it maps directly onto a dictionary-style data structure in your language, and you probably don’t need much of a query language to start with. I do like seeing some convergence here because ultimately, structured data requires some way to define the structure, and some way to access the data in a structured fashion.
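To make the query side concrete, here is a rough sketch in Python comparing the two styles. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, and plain dictionary access for JSON. Both documents and their element and field names are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical documents describing the same data.
xml_doc = """
<series name="Advent 2021">
  <post day="13"><title>XML &amp; JSON</title></post>
  <post day="17"><title>SQLite</title></post>
</series>
"""
json_doc = """
{"series": "Advent 2021",
 "posts": [{"day": 13, "title": "XML & JSON"},
           {"day": 17, "title": "SQLite"}]}
"""

# XML: an XPath-style expression selects the title of the post with day="13".
root = ET.fromstring(xml_doc)
xml_title = root.find("./post[@day='13']/title").text

# JSON: no query language, just ordinary dictionary and list access.
data = json.loads(json_doc)
json_title = next(p["title"] for p in data["posts"] if p["day"] == 13)

print(xml_title)
print(json_title)
```

The XML query is a single declarative expression, while the JSON version is regular code in the host language, which is often all you need.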
There have also been some lessons learned over the years, which I think are interesting to see. XML is really a format meant for computers to generate and consume. We can see this in the popularity of things like SVG – I don’t think that any human would enjoy writing SVGs by hand. The analogy I like to make is that XML is to JSON what C++ is to Python. If you need something strongly typed with structure, XML is there, has rock-solid tooling, and will get the job done, but it will take you a little bit longer. If you need something quickly, JSON is the simplest imaginable solution. That flexibility comes at a cost: later down the line you might want to enforce a structure and have to duct-tape it on.
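Here is a sketch of what that duct tape can look like in Python. This is not JSON Schema, just a hand-rolled check; the `EXPECTED` table and the `check_shape` helper are hypothetical names made up for this example.

```python
import json

# Hypothetical expected shape: field name -> required Python type.
EXPECTED = {"title": str, "day": int}

def check_shape(obj, expected=EXPECTED):
    """Return a list of problems; an empty list means the object matches."""
    problems = []
    for key, typ in expected.items():
        if key not in obj:
            problems.append(f"missing field: {key}")
        elif not isinstance(obj[key], typ):
            found = type(obj[key]).__name__
            problems.append(f"{key}: expected {typ.__name__}, got {found}")
    return problems

good = json.loads('{"title": "XML & JSON", "day": 13}')
bad = json.loads('{"title": "XML & JSON", "day": "13"}')

print(check_shape(good))
print(check_shape(bad))
```

Nothing in the JSON itself stops `day` from being a string, so the check has to live in your own code or in a schema validator bolted on afterwards.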
I still think it’s great to see those formats around, ultimately serving nearly all the needs we have these days for structured data. They’re verbose and textual, which means they’re not great for storing large amounts of binary data, but on the other hand it also means that if you get hold of a 10-year-old, undocumented JSON or XML file, you’re more likely to make sense of it than of any binary encoding. I hope that sheds some light on why I like the current state of affairs. It wasn’t always like this, and it’s refreshing to see a world in which we can just use some standards and get work done!