Adopting Colemak
A year ago, I reflected that my typing accuracy was lower than I wanted. I’d learnt to type autodidactically decades prior and had developed an idiosyncratic technique. My speed was satisfactory, but my consistency wasn’t, and I had to make regular corrections. Furthermore, I used my right hand to press keys on the left side of the keyboard, so my hands moved from side to side, which was tiring during long typing sessions.
Visibility and monitoring in deployed ML systems
Modern machine learning has ushered in an era of unparalleled system capabilities, exemplified by self-driving cars and synthetic speech indistinguishable from human speech. However, these techniques bring with them the challenge of monitoring and understanding the behaviour of live ML systems.
Viewing Jupyter notebooks at the command line
The Jupyter notebook is a literate programming environment that has become ubiquitous in machine learning. While the standard tools for interacting with notebooks are web applications, it’s often useful to be able to view notebooks at the command line. This is convenient when logged into a training workstation via SSH, where configuring SSH to forward a port, starting a Jupyter server, and navigating to it in a web browser is a chore just to view a notebook for a few seconds.
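As a rough illustration of the idea, and not the tool the full post describes, a notebook file is a JSON document, so its cells can be printed in a terminal using only the Python standard library. The sketch below assumes the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"); the function name render_notebook and the sample notebook are hypothetical.

```python
import json

def render_notebook(nb: dict) -> str:
    """Render the cells of a parsed notebook (assumed nbformat 4) as plain text."""
    lines = []
    for index, cell in enumerate(nb.get("cells", []), start=1):
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"--- [{index}] {cell.get('cell_type', 'unknown')} ---")
        lines.append(source)
    return "\n".join(lines)

# Hypothetical in-memory notebook standing in for a file on disk.
sample = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Training run\\n", "Notes on the run."]},
    {"cell_type": "code", "source": "print(1 + 1)"}
  ]
}
""")

print(render_notebook(sample))
```

For a real file, the same function could be applied to the result of json.load on an opened .ipynb file. Cell outputs, such as embedded images, are deliberately ignored here.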
Representation learning for audio data
The application of classical machine learning methods to complex data formats, such as audio of human speech, typically necessitates extensive feature engineering. This requires significant domain knowledge to extract the key components of the data.
Deep learning can allow models to learn their own data representations, obviating the need for feature engineering. However, as the quality of the learned representations strongly influences performance on downstream tasks, how can we ensure that these representations are appropriate?
Jupyter notebooks and collaboration
The adoption of Git as the primary means of collaborating on code, and Jupyter notebooks as the standard environment for data exploration and interactive modelling, is widespread. However, a problem arises in that Git was designed to version plain text files, such as those containing source code, and not structured data like JSON documents or binary data such as images embedded in Jupyter notebooks.