Abstract Nonsense

Posts tagged with "Software-Engineering"

Writing in Future Tense: Machine Time

I published a blog post last night but it never appeared on the site. My GitHub Actions workflow kicked in, my commit hit the server, my Cloudflare build completed with no warnings or errors - everything looked good.

The culprit? Timezone mismatch. I’m writing from AEST (+10, I’m in Melbourne), but Cloudflare Pages Workers builds in UTC (“server time”). Hugo saw my future timestamp and politely ignored the post.

The fix: Use hugo --buildFuture as the build command in Cloudflare Pages settings to include posts “in the future”. I’ll consider this a cautionary tale … it’s not the first time timezones have caused me havoc in production.

⚡️Apache Spark 4.0 released

Apache Spark 4.0 has been released. It’s the first major version update since Spark 3.0 in 2020.

Here’s some of the highlights I’m excited about:

  • A new SQL pipe syntax. It seems to be a trend with modern SQL engines to include “pipe” syntax support now (e.g. BigQuery). I’m a fan of functional programming inspired design patterns and the excellent work by the prql team, so I’m glad to see this next evolution of SQL play out.
  • A structured logging framework. Spark logs are notoriously lengthy and this means you can now use Spark to consume Spark logs! Coupled with improvements to stacktraces in PySpark, hopefully this will mean less grepping tortuously long stack traces.
  • A new DESCRIBE TABLE AS JSON option. I really dislike unstructured command line outputs that you have to parse with awkward bashisms. JSON input/outputs and manipulation with jq is a far more expressive consumption pattern that I feel captures the spirit of command line processing.
  • A new PySpark Plotting API! It’s interesting to see it supports plotly on the backend as an engine. I’ll be curious to see how this plays out going forward… Being able to do #BigData ETL as well as visualisation and analytics within the one tool is a very powerful combination.
  • A new lightweight python-only Spark Connect PyPi package. Now that Spark Connect is getting more traction, it’s nice to be able to pip install Spark on small clients without having to ship massive jars around.
  • A bug fix for inaccurate Decimal arithmetic. This is interesting only insofar as it reminds me that even well-established, well-tested, correctness-first, open-source software with industry backing can still be subject to really nasty correctness bugs!

Databricks has some excellent coverage on the main release and the new pipe syntax specifically.

Things rewrites their server architecture in Swift

I’ve been a long time user of Cultured Code’s Things to-do app. It’s slick, has well designed ergonomics, and is perfectly minimalistic. Things’ Markdown support is tasteful and its approach to task management structured but pared back.

They’ve just announced a rewrite of their existing server-side infrastructure stack in Swift, the linked post and blog post are worth a read.

From a technical perspective, I’ve always appreciated its rock-solid proprietary Things Cloud syncing service. In particular, I find it interesting the app asks for Local Network access to enable faster syncing:

To uv or not to uv

uv is a blazing fast Python package manager that aims to displace pip. It’s a really slick tool that lets you go from git clone to executing code with all the dependencies seamlessly. All the standard accolades apply: written in Rust, beautiful terminal UI, well thought-out user ergonomics … all written by Astral, the same company that gave the Python community ruff.

But what makes it so damn fast? Under the rusty hood, the magic is even more impressive. Charlie Marsh, the project lead, presented at Jane Street and revealed some of the inner workings. The whole talk is super interesting, but some standout highlights are:

Surely you're `jq`ing

Today I read through the jq manual cover-to-cover. For those unaware, jq is a popular CLI tool to query and manipulate JSON. It’s also a Turing-complete mini-language with nice functional semantics that fits well into the ethos of composable CLI tools.

It was an exemplar of well-written technical documentation. Concise, well-written, littered with examples, and linking to an interactive playground to test-and-learn.

Some learnings:

  1. It’s surprisingly functional! You can implement recursive functions and use higher-order functions! For example, here’s factorial in jq:
$ jq '[.,1]|until(.[0] < 1; [.[0] - 1, .[1] * .[0]])|.[1]'
  1. It supports string interpolation - this is really nice if you’re piping stuff from JSON into a string. Coupled with format strings this becomes frictionless:
$ echo '{"search":"hello; world"}' | jq -r '@uri "https://www.google.com/search?q=\(.search)"'
# https://www.google.com/search?q=hello%3B%20world
  1. You can define functions that accept functions1, and control structures that allow labelling.
$ echo '[[1,2],[10,20]]' | jq -r 'def addvalue(f): . + [f]; map(addvalue(.[0]))'
#[[1,2,1], [10,20,10]]
  1. You can traverse complex data structures with first-class pathing support. And you can easily modify nested structures to extend objects.
  2. For the category theorists/polyglots, there’s a denotational semantics paper written about jq.
  3. Bonus: You can build a Brainfuck interpreter in jq; and you can build a jq interpreter in jq - how’s that for bootstrapping!

Side note: This is one of my goals for 2025 - read through documentation end-to-end to develop mastery over tools. I’m trying to prioritise selectively depth over breadth.

Developer Ergonomics

“I wonder how much it is insightful to watch someone doing a workflow and to note when discomfort kicks in. That’s a really insightful thing to realize what matters from bitter experience, right? … Experience tells you when to worry about something and when not to worry about it” - Ben Sparks

That is - the rising discomfort of a programmer when employing a new tool, framework, or library is a good window into the ergonomics of how one uses your tool, framework, or library. Source: How I animate 3Blue1Brown | A Manim demo with Ben Sparks. The whole video is worth checking out! It’s a masterclass on how to construct a programatic-animation library and demonstrate how to work within it.