Posts tagged with "Software-Engineering"

13.07.2025 Code minimisation for ethical learning from LLMs

Like many people nowadays, I find LLMs an invaluable tool for learning new concepts or vibecoding in unfamiliar stacks and languages. They’re undeniably a massive accelerator when it comes to quickly iterating and learning.

But they have their downsides. There’s a strong argument to be made that by offloading the challenge of learning something new, your core critical thinking atrophies and your ability to focus slides down the slippery slope of instant gratification.

11.07.2025 Better Python path handling with Pathlib and Git

One of the most frustrating problems in ad-hoc data science projects is broken file paths.

You write a script that loads a model, grabs data, or instantiates a config from your local disk. It works perfectly on your machine, then someone else runs it and… catastrophe: FileNotFoundError: No such file or directory. Uh oh, looks like someone just got bit by hard-coded paths or assumptions about where the script is being run from.
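One way to avoid depending on the current working directory is to anchor every path at the repository root. The sketch below is a minimal illustration, not necessarily the approach the full post takes: `find_repo_root` and `repo_path` are hypothetical helper names, and it assumes the project lives inside a Git checkout that has a `.git` entry at its top level.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Path:
    """Walk upwards from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"No .git entry found at or above {start}")


def repo_path(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path relative to the repository root, not the working directory."""
    root = find_repo_root(start if start is not None else Path.cwd())
    return root.joinpath(*parts)
```

A script could then call something like `repo_path("data", "train.csv")` (an illustrative file name) and get the same absolute path no matter which subdirectory it was launched from.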

10.06.2025 Mapbox documentation uses real tokens

I’ve been working on a small side project that requires embedding an interactive map. I spent some time evaluating different mapping providers - initially OpenStreetMap with Leaflet for annotations. OpenStreetMap is great for rapid prototyping since it’s free and doesn’t require any API key, but it doesn’t look great (by default, at least) and it doesn’t have the same rich POI data integration.

I dabbled with Mapbox next and was really impressed. It has end-to-end API coverage for heaps of mapping use cases, and provides lots of examples of how to integrate Mapbox into your choice of language/platform/framework. However, the thing that really stood out to me was their onboarding design. Once you create an account and log in, Mapbox injects your personal API key directly into the standard example code on the webpage, so that you can copy/paste and immediately start working.

09.06.2025 Internationali(z|a)tion is hard

I came across a UI glitch today in my Uber app. At first glance it appears to be a preposterous oversight: the s in Favourites has been orphaned!

I live in Australia, where we closely follow British English spelling - meaning it’s “Favourites” and not “Favorites”. In the world of UI/UX, it’s common to use localisation dictionaries to map strings to locale-appropriate versions. I suspect some UI designer carefully crafted this screen for US English and mapped over to AU English, accidentally committing a tiny typographic crime.
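To make the idea of a localisation dictionary concrete, here is a minimal sketch. The table contents, the key name `favourites_title`, and the `localise` helper are all made up for illustration; this is not how Uber's app is known to work.

```python
# Hypothetical string tables keyed by locale, then by message key.
STRINGS = {
    "en-US": {"favourites_title": "Favorites"},
    "en-AU": {"favourites_title": "Favourites"},
}
DEFAULT_LOCALE = "en-US"


def localise(key: str, locale: str) -> str:
    """Look up `key` for `locale`, falling back to the default locale."""
    table = STRINGS.get(locale, {})
    if key in table:
        return table[key]
    return STRINGS[DEFAULT_LOCALE][key]


us = localise("favourites_title", "en-US")
au = localise("favourites_title", "en-AU")

# The AU string is one character longer, which is all it takes to overflow
# a label that was sized by hand for the US spelling.
print(us, len(us))  # Favorites 9
print(au, len(au))  # Favourites 10
```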

04.06.2025 Gitting things done

GitHub has a fascinating interview with Linus Torvalds, the creator of both git and Linux, for the 20th anniversary of git.

03.06.2025 Writing in Future Tense: Machine Time

I published a blog post last night but it never appeared on the site. My GitHub Actions workflow kicked in, my commit hit the server, my Cloudflare build completed with no warnings or errors - everything looked good.

The culprit? Timezone mismatch. I’m writing from AEST (+10, I’m in Melbourne), but Cloudflare Pages Workers builds in UTC (“server time”). Hugo saw my future timestamp and politely ignored the post.

The fix: Use hugo --buildFuture as the build command in Cloudflare Pages settings to include posts “in the future”. I’ll consider this a cautionary tale … it’s not the first time timezones have caused me havoc in production.
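The mismatch is easy to reproduce outside of Hugo. The sketch below is only an illustration of the arithmetic: `looks_future_dated` is a hypothetical helper, and it assumes the post date was written without a UTC offset and therefore read as UTC on the build server.

```python
from datetime import datetime, timedelta, timezone

# Fixed +10:00 offset standing in for AEST (no daylight saving).
AEST = timezone(timedelta(hours=10), name="AEST")


def looks_future_dated(front_matter_date: str, build_time: datetime) -> bool:
    """Return True if the post date is later than the build time.

    A date with no offset is interpreted as UTC (an assumption of this sketch).
    """
    post_date = datetime.fromisoformat(front_matter_date)
    if post_date.tzinfo is None:
        post_date = post_date.replace(tzinfo=timezone.utc)
    return post_date > build_time


# Written at 21:00 in Melbourne, which is 11:00 UTC on the same day.
local_now = datetime(2025, 6, 2, 21, 0, tzinfo=AEST)
build_time = local_now.astimezone(timezone.utc)

print(looks_future_dated("2025-06-02T21:00:00", build_time))        # True
print(looks_future_dated("2025-06-02T21:00:00+10:00", build_time))  # False
```

Under that assumption, writing the date with an explicit `+10:00` offset would be an alternative to building future-dated posts.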

29.05.2025 ⚡️Apache Spark 4.0 released

Apache Spark 4.0 has been released. It’s the first major version update since Spark 3.0 in 2020.

Here are some of the highlights I’m excited about:

A new SQL pipe syntax. It seems to be a trend with modern SQL engines to include “pipe” syntax support now (e.g. BigQuery). I’m a fan of functional programming inspired design patterns and the excellent work by the prql team, so I’m glad to see this next evolution of SQL play out.
A structured logging framework. Spark logs are notoriously lengthy and this means you can now use Spark to consume Spark logs! Coupled with improvements to stacktraces in PySpark, hopefully this will mean less grepping tortuously long stack traces.
A new DESCRIBE TABLE AS JSON option. I really dislike unstructured command line outputs that you have to parse with awkward bashisms. JSON input/outputs and manipulation with jq is a far more expressive consumption pattern that I feel captures the spirit of command line processing.
A new PySpark Plotting API! It’s interesting to see it supports plotly on the backend as an engine. I’ll be curious to see how this plays out going forward… Being able to do #BigData ETL as well as visualisation and analytics within the one tool is a very powerful combination.
A new lightweight Python-only Spark Connect PyPI package. Now that Spark Connect is getting more traction, it’s nice to be able to pip install Spark on small clients without having to ship massive jars around.
A bug fix for inaccurate Decimal arithmetic. This is interesting only insofar as it reminds me that even well-established, well-tested, correctness-first, open-source software with industry backing can still be subject to really nasty correctness bugs!

Databricks has some excellent coverage on the main release and the new pipe syntax specifically.

23.05.2025 Things rewrites their server architecture in Swift

I’ve been a long time user of Cultured Code’s Things to-do app. It’s slick, has well designed ergonomics, and is perfectly minimalistic. Things’ Markdown support is tasteful and its approach to task management structured but pared back.

They’ve just announced a rewrite of their existing server-side infrastructure stack in Swift; the linked post and blog post are worth a read.

From a technical perspective, I’ve always appreciated its rock-solid proprietary Things Cloud syncing service. In particular, I find it interesting the app asks for Local Network access to enable faster syncing:

13.04.2025 To uv or not to uv

uv is a blazing fast Python package manager that aims to displace pip. It’s a really slick tool that lets you go from git clone to executing code with all the dependencies seamlessly. All the standard accolades apply: written in Rust, beautiful terminal UI, well thought-out user ergonomics … all written by Astral, the same company that gave the Python community ruff.

But what makes it so damn fast? Under the rusty hood, the magic is even more impressive. Charlie Marsh, the project lead, presented at Jane Street and revealed some of the inner workings. The whole talk is super interesting, but some standout highlights are:

18.02.2025 Surely you're `jq`ing

Today I read through the jq manual cover-to-cover. For those unaware, jq is a popular CLI tool to query and manipulate JSON. It’s also a Turing-complete mini-language with nice functional semantics that fits well into the ethos of composable CLI tools.

It was an exemplar of well-written technical documentation: concise, littered with examples, and linked to an interactive playground to test-and-learn.

Some learnings:

It’s surprisingly functional! You can implement recursive functions and use higher-order functions! For example, here’s factorial in jq:

$ echo 4 | jq '[.,1]|until(.[0] < 1; [.[0] - 1, .[1] * .[0]])|.[1]'
# 24

It supports string interpolation - this is really nice if you’re piping stuff from JSON into a string. Coupled with format strings this becomes frictionless:

$ echo '{"search":"hello; world"}' | jq -r '@uri "https://www.google.com/search?q=\(.search)"'
# https://www.google.com/search?q=hello%3B%20world

You can define functions that accept functions, and control structures that allow labelling.

$ echo '[[1,2],[10,20]]' | jq -c 'def addvalue(f): . + [f]; map(addvalue(.[0]))'
# [[1,2,1],[10,20,10]]

You can traverse complex data structures with first-class pathing support. And you can easily modify nested structures to extend objects.
For the category theorists/polyglots, there’s a denotational semantics paper written about jq.
Bonus: You can build a Brainfuck interpreter in jq; and you can build a jq interpreter in jq - how’s that for bootstrapping!

Side note: This is one of my goals for 2025 - read through documentation end-to-end to develop mastery over tools. I’m trying to selectively prioritise depth over breadth.

15.02.2025 Shellshocked? Brace yourselves!

I just discovered that to capture the stdout of several commands in a shell script and redirect it all to a single file, you can simply wrap the commands in braces!

For example, my “Create a blog post via a GitHub Action triggered on an Issue creation” workflow uses this snippet:

{
  echo "---"
  jq 'del(.content)' "parsed_issue.json" | yq -P
  echo "---"
  echo ''
  
  # Inline "content" key for the body
  jq -r '.content' "parsed_issue.json"
} > content/micro-blog/"$FILENAME"

22.01.2025 Developer Ergonomics

“I wonder how much it is insightful to watch someone doing a workflow and to note when discomfort kicks in. That’s a really insightful thing to realize what matters from bitter experience, right? … Experience tells you when to worry about something and when not to worry about it” - Ben Sparks

That is - the rising discomfort of a programmer when employing a new tool, framework, or library is a good window into the ergonomics of that tool, framework, or library. Source: How I animate 3Blue1Brown | A Manim demo with Ben Sparks. The whole video is worth checking out! It’s a masterclass on how to construct a programmatic-animation library and demonstrate how to work within it.