Converting JSON Keys to Snake Case in PostgreSQL
Well, it’s that time of the decade again when I dust off the old technical blog and write about a recent random and esoteric task that I completed for my job.
This time the problem I had involved transforming the keys of a
jsonb object stored in PostgresQL from camelCase to snake_case. The reason why isn’t super important to the details, but imagine a couple tables like so:
This schema uses the PostgreSQL-recommended case-insensitive snake_case naming conventions. Now imagine
my_table has a row with
pk_id = 1, and
my_table_patches has a row referencing that with something like this:
The idea is that something would happen in the system that would “resolve” the row in
my_table_patches, and a trigger would fire that should update the row in
my_table with the new
some_value value specified in the JSON patch under the
You might be asking yourself a couple of very reasonable questions right about now. Firstly, why store the patch as JSON instead of discreet columns in
my_table_patches? Because of reasons, that’s why. Actually in the model I’m working with, the “patch” table can reference any of several other tables and the patch data can reference columns in any of those, so a one-to-one column mapping for updated data would introduce a lot of ungainly overhead into the model.
The next very reasonable question you might be asking is why is the patch stored as camelCase anyway? Why not just store the patches with snake_case keys to match the schema and vastly simplify the dynamic SQL needed to update
my_table? Well, the answer to that involves a Postgraphile API that automatically translates snake_cased PostgresQL schemas into camelCased Graphile models, and a front-end that writes subsets of those Graphile models back into the patch table directly.
If we just neededed to apply the patch in a single mutation (that’s the GraphAPI term for an API endpoint that modifies data, I think), it wouldn’t be a problem because we could write that custom mutation as a Postgraphile extension and let it handle the conversions between camelCase and snake_case for us the same way it does with everything else. In this case, however, it made much more sense to put the patch logic into an SQL trigger, meaning it had to be written in pure SQL (or PL/pgSQL) without the benefit of Postgraphile’s serialization magic.
And so began my odyssey into the wonderful world of PostgreSQL’s json manipulation and introspection capabilities. As a starting point, here was (roughly) what I threw down as an initial pass for the proof of concept:
This function converts the entire JSON value to a string, then uses a regular expression to find everything in that string that looks like a JSON property name (a quote followed by some string of letters followed by another quote and a colon). For each of those property-looking things it extracts the name from between the quotes, runs another regular expression that inserts an underscore before any capital letters in the name, and then swaps the lowercase version of that in for the old property before converting the whole thing back into a JSON object.
This actually seems to work, but is horrifying and inelegant for a number of reasons, the primary one being that there’s almost certainly a way to craft a valid JSON string that, having passed through this function, would result in at best an invalid JSON string or at worst something that would result in malicious SQL injection issues down the road when we try to apply this payload as a patch.
The latest versions of PostgreSQL actually have a number of useful JSON functions, so when it came time to revisit this procedure I wanted to replace it with something more robust that didn’t rely on regular expressions to extract keys, and didn’t rely on converting the entire payload to text and back to JSON. After many attempts and many visits back to the PostgreSQL JSON documentation, this is what I ended up with:
The comments inline should do a fairly good job of explaining how it works. This new function uses
jsonb_each instead of a sketchy regular expression to extract the keys and values. It still uses a regular expression to do the actual conversion from camelCase to snake_case, which could probably be improved upon, but at least we know it’s running on an actual property name here and not something that just looks like a property name based on another regular expression. The function also checks every value in the JSON data using
jsonb_typeof and makes recursive calls to itself so that property names in child objects and object arrays will also get converted to snake_case. At no point is any data converted between text and JSON.
My main takeaway from this exercise was discovering just how overinflated my sense of accomplishment can get, even for achieving relatively simple and mundane tasks, when the language I’m working with is an obtuse pseudo-language like PL/pgSQL. I mean I felt really, really good about this one. Like I remember feeling back in college when, usually late at night in some computer lab, I would come up with what I thought was a really clever way to solve some homework assignment. Of course, in hindsight, the solution was not so much “clever” but more “correct” and “expected.”
Still, it is nice to occasionally feel that way again, even now decades later. So thank you for that, PL/pgSQL.
Short Permalink for Attribution: rdsm.ca/nt53g