A place for musings, observations, design notes, code snippets - my thought gists.
Gitting things done
GitHub has a fascinating interview with Linus Torvalds, the creator of both git and Linux, for the 20th anniversary of git.
Writing in Future Tense: Machine Time
I published a blog post last night but it never appeared on the site. My GitHub Actions workflow kicked in, my commit hit the server, my Cloudflare build completed with no warnings or errors - everything looked good.
The culprit? A timezone mismatch. I'm writing from AEST (UTC+10, I'm in Melbourne), but Cloudflare Pages builds run in UTC ("server time"). Hugo saw my future-dated timestamp and politely ignored the post.
The fix: use hugo --buildFuture as the build command in the Cloudflare Pages settings to include posts dated "in the future". I'll consider this a cautionary tale … it's not the first time timezones have caused me havoc in production.
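A minimal sketch of how this can happen, assuming the post's front matter date carried no explicit UTC offset, so the build read it in the build machine's zone (UTC). This mimics a "skip future-dated posts" check in plain Python rather than reproducing Hugo's code; the helper names and timestamps are hypothetical, and AEST is modelled as a fixed +10:00 offset (no daylight saving).

```python
from datetime import datetime, timedelta, timezone

# Fixed +10:00 offset standing in for Melbourne in winter (ignores daylight saving).
AEST = timezone(timedelta(hours=10), "AEST")


def parse_front_matter_date(value: str, assumed_tz: timezone) -> datetime:
    """Parse an ISO 8601 date; attach assumed_tz only if the string has no offset."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=assumed_tz)
    return dt


def is_future(post_date: datetime, build_time: datetime) -> bool:
    """Hypothetical publish check: a post is skipped if dated after the build clock."""
    return post_date > build_time


# Hypothetical post written at 08:00 on 24 May in Melbourne, with no offset recorded.
front_matter_date = "2025-05-24T08:00:00"
# The build runs five minutes later in absolute terms: 08:05 AEST == 22:05 UTC on 23 May.
build_time = datetime(2025, 5, 23, 22, 5, tzinfo=timezone.utc)

as_utc = parse_front_matter_date(front_matter_date, timezone.utc)  # how a UTC build reads it
as_aest = parse_front_matter_date(front_matter_date, AEST)         # what the author meant

print(is_future(as_utc, build_time))   # True: read as 08:00 UTC, about 10 hours ahead
print(is_future(as_aest, build_time))  # False: 08:00 AEST has already passed
```

In this sketch, writing the date with an explicit offset (e.g. 2025-05-24T08:00:00+10:00) makes the result independent of the assumed zone, whereas --buildFuture sidesteps the check entirely.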
⚡️Apache Spark 4.0 released
Apache Spark 4.0 has been released. It’s the first major version update since Spark 3.0 in 2020.
Here are some of the highlights I'm excited about:
- A new SQL pipe syntax. It seems to be a trend for modern SQL engines to support "pipe" syntax now (e.g. BigQuery). I'm a fan of design patterns inspired by functional programming, and of the excellent work by the PRQL team, so I'm glad to see this next evolution of SQL play out.
- A structured logging framework. Spark logs are notoriously lengthy, and structured logs mean you can now use Spark to consume Spark logs! Coupled with improvements to stack traces in PySpark, hopefully this will mean less grepping through tortuously long stack traces.
- A new DESCRIBE TABLE AS JSON option. I really dislike unstructured command-line output that you have to parse with awkward bashisms. JSON input/output, manipulated with jq, is a far more expressive consumption pattern that I feel captures the spirit of command-line processing.
- A new PySpark Plotting API! It's interesting to see it supports plotly as a backend engine. I'll be curious to see how this plays out going forward… Being able to do #BigData ETL as well as visualisation and analytics within the one tool is a very powerful combination.
- A new lightweight, Python-only Spark Connect PyPI package. Now that Spark Connect is getting more traction, it's nice to be able to pip install Spark on small clients without having to ship massive jars around.
- A bug fix for inaccurate Decimal arithmetic. This is interesting only insofar as it reminds me that even well-established, well-tested, correctness-first, open-source software with industry backing can still be subject to really nasty correctness bugs!
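The appeal of pipe syntax is that a query reads top to bottom as a chain of steps, each consuming the previous step's result. As a plain-Python illustration of that idea only (this is not Spark's implementation, and the helpers pipe, where, count_by and order_by are hypothetical), here is a tiny pipeline over lists of dicts, with a roughly equivalent Spark 4.0 pipe query in a comment for comparison:

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def pipe(rows: Iterable[Row], *steps: Step) -> List[Row]:
    """Apply each step to the previous step's output, left to right (like |>)."""
    return reduce(lambda acc, step: step(acc), steps, list(rows))


def where(predicate: Callable[[Row], bool]) -> Step:
    """Keep rows matching the predicate, like |> WHERE ..."""
    return lambda rows: [r for r in rows if predicate(r)]


def count_by(key: str) -> Step:
    """Count rows per key value, roughly |> AGGREGATE COUNT(*) AS n GROUP BY key."""
    def step(rows: List[Row]) -> List[Row]:
        counts: Dict[object, int] = defaultdict(int)
        for r in rows:
            counts[r[key]] += 1
        return [{key: k, "n": n} for k, n in counts.items()]
    return step


def order_by(key: str, descending: bool = False) -> Step:
    """Sort rows on one column, like |> ORDER BY key [DESC]."""
    return lambda rows: sorted(rows, key=lambda r: r[key], reverse=descending)


orders = [
    {"region": "AU", "status": "shipped"},
    {"region": "AU", "status": "pending"},
    {"region": "NZ", "status": "shipped"},
    {"region": "AU", "status": "shipped"},
]

# Roughly:
#   FROM orders
#   |> WHERE status = 'shipped'
#   |> AGGREGATE COUNT(*) AS n GROUP BY region
#   |> ORDER BY n DESC
result = pipe(
    orders,
    where(lambda r: r["status"] == "shipped"),
    count_by("region"),
    order_by("n", descending=True),
)
print(result)  # [{'region': 'AU', 'n': 2}, {'region': 'NZ', 'n': 1}]
```

Each step is a plain function from rows to rows, so steps can be reordered, added or removed without restructuring the query, which is much the same ergonomics the |> operator brings to SQL.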
Databricks has some excellent coverage on the main release and the new pipe syntax specifically.
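On the structured logging point: once each log line is a standalone JSON object, you can query fields instead of grepping text. A minimal sketch in plain Python, assuming JSON-lines records loosely modelled on Spark 4.0's structured logs; the field names (ts, level, msg, logger) and sample records are hypothetical, so check the Spark logging docs for the exact schema.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical JSON-lines log; field names are assumptions for illustration.
SAMPLE_LOG = """\
{"ts": "2025-05-24T00:00:01Z", "level": "INFO", "msg": "Starting job 0", "logger": "DAGScheduler"}
{"ts": "2025-05-24T00:00:02Z", "level": "WARN", "msg": "Slow task 3", "logger": "TaskSetManager"}
{"ts": "2025-05-24T00:00:03Z", "level": "ERROR", "msg": "Task 7 failed", "logger": "Executor"}
{"ts": "2025-05-24T00:00:04Z", "level": "ERROR", "msg": "Task 9 failed", "logger": "Executor"}
"""


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line; each line is a standalone JSON object."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


records = list(parse_json_lines(SAMPLE_LOG.splitlines()))

# Filter and aggregate on fields rather than pattern-matching text: errors per logger.
errors_by_logger = Counter(r["logger"] for r in records if r["level"] == "ERROR")
print(errors_by_logger)  # Counter({'Executor': 2})
```

With Spark itself, the same JSON-lines files could be loaded with spark.read.json(path) and filtered as a DataFrame, which is the "Spark consuming Spark logs" idea at scale.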
Answering the Unasked
I'm not sure exactly where this originated, but I'm quite delighted by this exam question:
State some substantive question which you thought might appear on this exam, but did not. Answer this question (correctly).
As an interview question, I'll sometimes ask: "Tell me something interesting you've discovered or learned recently." I find it goes a long way towards understanding how the candidate thinks, how they convey technical knowledge to others, and how real their passion and interest in the domain is.
Paper: Step-by-Step Diffusion: An Elementary Tutorial
I've been reading through Step-by-Step Diffusion: An Elementary Tutorial by Apple (arXiv). The mathematics underpinning diffusion language and image models is quite complex, but this walkthrough strikes a nice balance between concrete mathematical grounding and intuition.
View Transition Web API
The (relatively) new View Transition API is really slick! Simply adding the following CSS to my blog enabled cross-document view transitions (animated transitions between page navigations) - no JavaScript required!
Go ahead and give it a try now! Simply click a link to another page on this site and you should observe a seamless transition occur.
@view-transition {
  navigation: auto;
}
If you want to add even more pizzazz, you can declare CSS keyframe animations:
/* Create a custom animation */
@keyframes move-out {
  from {
    transform: translateX(0%);
  }
  to {
    transform: translateX(-100%);
  }
}

@keyframes move-in {
  from {
    transform: translateX(100%);
  }
  to {
    transform: translateX(0%);
  }
}

/* Apply the custom animation to the old and new page states */
::view-transition-old(root) {
  animation: 0.4s ease-in both move-out;
}

::view-transition-new(root) {
  animation: 0.4s ease-in both move-in;
}
For a blog like this there's no real need, but for more complex web applications the View Transition API is a really seamless way to add smooth transitions.
As of writing, it's supported by the major browsers except Firefox 😔.