An Afternoon Reading the Dictionary
I’m a heavy user of Apple’s Look Up for words on macOS. With a quick force click you can instantly summon the dictionary or thesaurus for a selected word, which removes the friction of following up on “hmm, that’s a new word, what does it mean?”
I’ve been collecting new words I discover in an Apple Note for years and recently started migrating my list to a living blog page. Rather than manually collecting the macOS built-in Oxford Dictionary definition for each word (of which I have hundreds), I thought I’d script it. It turns out that Oxford Dictionaries has a very neat API, but unfortunately it’s exorbitantly expensive.
Poking around some more, I noticed an Apple Shortcut called “Show definition of”, but it just opens Dictionary.app with a deep link to the word entry.
After some more searching, I uncovered the path to the Oxford dictionary database on my Mac, along with a wonderfully detailed blog post and GitHub repo explicating how to extract the entries from the encoded dictionary files.
After munging around with XPath queries on the resulting dictionary.xml file, I settled on the following xmlstarlet snippet, which extracts the definition and example for each headword in new-word-list.txt and formats them as a YAML list:
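```shell
# Construct an XPath filter expression matching every headword in the list
xpath_expr="//d:entry["
first=1
# IFS= and -r keep each line verbatim; the extra test also catches a
# final line that has no trailing newline
while IFS= read -r word || [ -n "$word" ]; do
  [ -z "$word" ] && continue  # skip blank lines
  # XPath 1.0 has no escape syntax, so wrap words containing an
  # apostrophe in double quotes instead of single quotes
  case "$word" in
    *"'"*) literal="\"${word}\"" ;;
    *)     literal="'${word}'" ;;
  esac
  if [ "$first" -eq 1 ]; then
    xpath_expr="${xpath_expr}@d:title=${literal}"
    first=0
  else
    xpath_expr="${xpath_expr} or @d:title=${literal}"
  fi
done < new-word-list.txt
xpath_expr="${xpath_expr}]"

# and extract the definition and example from every matching block that
# carries a definition (class="df"), emitting one YAML list item each.
# Note: a value containing a double quote would need escaping to stay valid YAML.
xmlstarlet sel -N d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" \
  -t \
  -m "$xpath_expr//span[starts-with(@class, 'msDict') and .//span[@class='df']]" \
  -o "- word: \"" \
  -v "ancestor::d:entry/@d:title" \
  -o "\"" -n \
  -o "  definition: \"" \
  -v "normalize-space(.//span[@class='df'])" \
  -o "\"" -n \
  -o "  example: \"" \
  -v "normalize-space(.//span[@class='eg']//span[@class='ex'])" \
  -o "\"" -n \
  -n \
  dictionary.xml > vocabulary.yaml
```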
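Because the XPath predicate compares @d:title exactly (and case-sensitively), a word listed in an inflected form or with different capitalisation than the dictionary’s headword will silently produce no entry. A minimal sketch of a sanity check, assuming vocabulary.yaml uses the `- word: "…"` line layout produced above; the `missing_words` function name is hypothetical, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: print each word from a word list that has no
# matching "- word:" line in the generated YAML file.
missing_words() {
  list_file=$1
  yaml_file=$2
  while IFS= read -r word || [ -n "$word" ]; do
    [ -z "$word" ] && continue  # skip blank lines
    # -F: fixed string, -x: whole-line match, -q: quiet; -- ends options
    if ! grep -Fxq -- "- word: \"${word}\"" "$yaml_file"; then
      printf '%s\n' "$word"
    fi
  done < "$list_file"
}
```

Run it as `missing_words new-word-list.txt vocabulary.yaml` after the extraction; anything it prints needs its headword spelling checked against Dictionary.app.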
This YAML definition list gets popped into a data/vocabulary.yaml file, and a layouts/_shortcodes/vocabulary.html Hugo shortcode constructs the entries in the blog post at build time. From here on, adding a new word is simply a matter of appending the word and its definition to the vocabulary list by hand.
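Adding a word by hand means writing another entry in the same three-key shape. A small sketch of a helper for that, assuming the word/definition/example layout from the extraction script; `add_word` and `yaml_escape` are hypothetical names. It escapes backslashes and double quotes so each value stays a valid YAML double-quoted scalar, which also covers the quoting caveat noted in the script above.

```shell
#!/bin/sh
# Hypothetical helper: escape backslashes and double quotes so a value
# can sit inside a YAML double-quoted scalar.
yaml_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: append one entry (plus a separating blank line)
# to a vocabulary YAML file, mirroring the extraction script's layout.
add_word() {
  file=$1
  word=$2
  definition=$3
  example=${4:-}  # the example is optional
  {
    printf '%s\n' "- word: \"$(yaml_escape "$word")\""
    printf '%s\n' "  definition: \"$(yaml_escape "$definition")\""
    printf '%s\n' "  example: \"$(yaml_escape "$example")\""
    printf '\n'
  } >> "$file"
}
```

For example, `add_word data/vocabulary.yaml 'someword' 'its definition' 'an example sentence'` appends one entry that Hugo picks up on the next build.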
I probably spent more time working out how to automate this process (relevant XKCD) than it would’ve taken to do it manually, but I had good fun learning how to read the dictionary this afternoon.
An aside on copyright and fair use: I went down an interesting little rabbit hole understanding how copyright and dictionaries relate. I am most definitely not a lawyer, but my understanding is that, whilst you cannot copyright the definition of a word, the broader composition of headwords with word senses, parts of speech, IPA transcription, examples, and other components commonly found in a dictionary is certainly copyrightable. However, considering I am replicating a vanishingly small proportion of the dictionary, and I am attributing appropriately, I believe this usage satisfies fair use.