A place for musings, observations, design notes, code snippets - my thought gists.
Internationali(z|a)tion is hard
I came across a UI glitch today in my Uber app. At first glance it appears to be a preposterous oversight: the “s” in “Favourites” has been orphaned!
I live in Australia, where we closely follow British English spelling - meaning it’s “Favourites” and not “Favorites”. In the world of UI/UX, it’s common to use localisation dictionaries to map strings to locale-appropriate versions. I suspect some UI designer carefully crafted this screen for US English and mapped over to AU English, accidentally committing a tiny typographic crime.
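To make that suspicion concrete, here is a minimal sketch of a localisation dictionary with locale fallback. The string tables, keys, and the `localise` helper are all hypothetical and not taken from any real app; the point is only that the AU string is one character longer than the US one, which is enough to break a layout sized for the shorter label.

```python
# Hypothetical string tables keyed by locale (illustrative only).
STRINGS = {
    "en": {"favourites_title": "Favorites", "home_title": "Home"},
    "en-AU": {"favourites_title": "Favourites"},
}


def localise(key: str, locale: str, default_locale: str = "en") -> str:
    """Look up a UI string, falling back from e.g. 'en-AU' to 'en'."""
    candidates = [locale, locale.split("-")[0], default_locale]
    for candidate in candidates:
        table = STRINGS.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(f"no translation for {key!r}")


for loc in ("en-US", "en-AU"):
    label = localise("favourites_title", loc)
    # The AU label needs room for one extra character.
    print(f"{loc}: {label!r} ({len(label)} characters)")
```

A screen that only just fits the 9-character label has nowhere to put the tenth character, so the trailing letter wraps onto its own line.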
There’s a fantastic site literally called grumpy.website that aggregates many such UI/UX whoopsies. I’d definitely recommend checking it out!
Gitting things done
GitHub has a fascinating interview with Linus Torvalds, the creator of both Git and Linux, to mark the 20th anniversary of Git.
Writing in Future Tense: Machine Time
I published a blog post last night but it never appeared on the site. My GitHub Actions workflow kicked in, my commit hit the server, my Cloudflare build completed with no warnings or errors - everything looked good.
The culprit? Timezone mismatch. I’m writing from AEST (+10, I’m in Melbourne), but Cloudflare Pages Workers builds in UTC (“server time”). Hugo saw my future timestamp and politely ignored the post.
The fix: Use `hugo --buildFuture` as the build command in Cloudflare Pages settings to include posts “in the future”. I’ll consider this a cautionary tale … it’s not the first time timezones have caused me havoc in production.
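Here is a rough sketch of the mismatch in plain Python. It assumes the post's date was written as local Melbourne time without a UTC offset and was therefore read as UTC by the build; the dates and the `is_future` helper are made up for illustration and are not Hugo's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Fixed +10:00 offset for AEST (ignores daylight saving for simplicity).
AEST = timezone(timedelta(hours=10), "AEST")


def is_future(post_date: datetime, build_time: datetime) -> bool:
    """Return True if the post is dated after the moment of the build."""
    return post_date > build_time


# Hypothetical post written at 21:00 Melbourne time, built five minutes later.
written_local = datetime(2025, 6, 1, 21, 0, tzinfo=AEST)
build_time = written_local.astimezone(timezone.utc) + timedelta(minutes=5)

# Same wall-clock digits, but interpreted as UTC (assumed: no offset given).
read_as_utc = written_local.replace(tzinfo=timezone.utc)

print(is_future(read_as_utc, build_time))    # True: looks ~10 hours ahead
print(is_future(written_local, build_time))  # False: offset makes it the past
```

Under that assumption, writing the offset into the date (for example `+10:00`) is an alternative to building future posts, since the timestamp then compares correctly against a UTC build clock.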
⚡️Apache Spark 4.0 released
Apache Spark 4.0 has been released. It’s the first major version update since Spark 3.0 in 2020.
Here are some of the highlights I’m excited about:
- A new SQL pipe syntax. It seems to be a trend with modern SQL engines to include “pipe” syntax support now (e.g. BigQuery). I’m a fan of functional programming inspired design patterns and the excellent work by the prql team, so I’m glad to see this next evolution of SQL play out.
- A structured logging framework. Spark logs are notoriously lengthy and this means you can now use Spark to consume Spark logs! Coupled with improvements to stack traces in PySpark, hopefully this will mean less `grep`ping tortuously long stack traces.
- A new `DESCRIBE TABLE AS JSON` option. I really dislike unstructured command line outputs that you have to parse with `awk`ward bashisms. JSON input/outputs and manipulation with `jq` is a far more expressive consumption pattern that I feel captures the spirit of command line processing.
- A new PySpark Plotting API! It’s interesting to see it supports plotly on the backend as an engine. I’ll be curious to see how this plays out going forward… Being able to do #BigData ETL as well as visualisation and analytics within the one tool is a very powerful combination.
- A new lightweight Python-only Spark Connect PyPI package. Now that Spark Connect is getting more traction, it’s nice to be able to `pip install` Spark on small clients without having to ship massive jars around.
- A bug fix for inaccurate Decimal arithmetic. This is interesting only insofar as it reminds me that even well-established, well-tested, correctness-first, open-source software with industry backing can still be subject to really nasty correctness bugs!
Databricks has some excellent coverage on the main release and the new pipe syntax specifically.
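For a feel of why pipe-style queries appeal to the functional programming crowd, here is a small analogy in plain Python. This is not Spark's API or its SQL pipe syntax; `pipe`, `where`, `select`, `order_by`, and the sample rows are all hypothetical. It only shows the idea of steps that read top to bottom, each consuming the previous step's output.

```python
from functools import reduce


def pipe(rows, *steps):
    """Feed rows through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, rows)


def where(predicate):
    return lambda rows: (r for r in rows if predicate(r))


def select(*columns):
    return lambda rows: ({c: r[c] for c in columns} for r in rows)


def order_by(column):
    return lambda rows: sorted(rows, key=lambda r: r[column])


# Made-up sample data.
trips = [
    {"city": "Melbourne", "km": 12.5, "driver": "a"},
    {"city": "Sydney", "km": 3.0, "driver": "b"},
    {"city": "Perth", "km": 7.2, "driver": "c"},
]

result = list(
    pipe(
        trips,
        where(lambda r: r["km"] > 5),
        select("city", "km"),
        order_by("km"),
    )
)
print(result)
```

Each step can be added, removed, or reordered without rewriting the rest of the query, which is much of the attraction over deeply nested subqueries.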
Answering the Unasked
I’m not sure exactly where this originated, but I’m quite delighted by this exam question:
State some substantive question which you thought might appear on this exam, but did not. Answer this question (correctly).
As an interview question, I’ll sometimes ask: “Tell me something interesting you’ve discovered or learned recently.” I find it goes a long way towards understanding the way the candidate thinks, how they convey technical knowledge to others, and how real their passion and interest for the domain is.
Paper: Step-by-Step Diffusion: An Elementary Tutorial
I’ve been reading through Step-by-Step Diffusion: An Elementary Tutorial by Apple (arXiv). The mathematics underpinning diffusion language or image models is quite complex, but this walkthrough strikes a nice balance between concrete mathematical grounding and intuition.