My career change from UX to AI
A 10-year journey focused on one question: how could I make a machine that thinks?
I started learning to write code when I was 12 but by the time I finished high school I decided I didn’t want to be a programmer. I liked the problem solving but the actual process of architecting, writing, debugging, and testing code felt like a chore. I would occasionally write code for a personal project and in 2012 I built a start-up with some friends. Part of that process involved designing what the screens would do and while I wasn’t the main designer I did as much design as code.
I remember the first time I watched someone use our app and became frustrated at how they were doing it wrong. Of course it wouldn’t work right if you tried to do it that way. That’s when I learned about UX. I liked the process of working out what to build and how to build it in a way that humans would intuitively understand. From this experience I got my first UX job in 2013 – technically I was a “Digital Producer” with half my time managing projects and the other half doing UX design.
Meanwhile (for the umpteenth time) I became interested in the nature of consciousness. During a particularly poignant meditation session I realised that, from a first person perspective, thoughts are no different to our other senses. Thoughts pop into conscious perception on their own. And that feeling that you are authoring thoughts is itself a blip on the radar of consciousness. I wondered how much we understood about how the brain generates thoughts and how much of that could be transferred to a machine. I asked myself a question that drives me to this day: what would it take to design a computer that could think?
In my spare time I learned what I could about the brain. An amazing little site called The Brain From Top To Bottom showed me exactly what I was looking for: an understanding of how the brain worked from the molecular level up to the psychological. As I learned about the brain I started writing code to simulate hand-crafted neural networks. My code involved independent nodes on individual threads exchanging messages with each other in real time.
I picked up computer science and relevant math as I went. I had been forced to learn some things already – e.g. that gradually increasing the size of an array or indexing a linked list were slow operations. I had taught myself computational complexity so that I wouldn’t make mistakes like that on my personal projects. Other things, like calculus and linear algebra, I had to go out of my way to learn. There are so many free and accessible sources for learning math online that the biggest problem is choosing the one that’s most suited to your needs. Self-learning math was helpful but early on I overlooked linear algebra as “not useful”. I didn’t notice that my message passing approach could be massively sped up with matrix multiplication and it would have helped to learn it earlier on.
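To make that last point concrete, here is a minimal sketch (not my original code, which ran nodes on separate threads) of why message passing and matrix multiplication are the same computation. The three-node network, its weights, and the function names are made up for illustration.

```python
# Hypothetical toy network: weights[i][j] is the strength of the link
# from node j to node i. The values are arbitrary illustration data.
weights = [
    [0.0, 0.5, -1.0],
    [1.0, 0.0, 0.25],
    [0.5, -0.5, 0.0],
]
activations = [1.0, 2.0, 3.0]


def step_by_messages(weights, activations):
    """Each node sends one message per link; receivers sum what arrives."""
    inbox = [0.0] * len(activations)
    for sender, value in enumerate(activations):
        for receiver in range(len(activations)):
            inbox[receiver] += weights[receiver][sender] * value
    return inbox


def step_by_matmul(weights, activations):
    """The same update written as a single matrix-vector product."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


messages = step_by_messages(weights, activations)
matmul = step_by_matmul(weights, activations)
print(messages, matmul)
```

Both functions return the same numbers. In pure Python neither is fast, but once the update is written as a matrix product it can be handed to an optimised linear algebra library instead of being carried out one message at a time.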
By 2015 I decided that my little hobby was becoming an obsession. I finally looked into the field of Artificial Intelligence and saw a lot of what I was looking for. That’s also when I realised the price I would have to pay to actually contribute to this field: I would have to study. A lot. I caught up on all the maths I’d forgotten since high school and then some. I filled in my knowledge of computer science. I taught myself data science. This time I neglected statistics – I couldn’t see myself ever estimating population statistics, and even then would I need anything more than a t-test? Yet statistics comes up in surprising ways even in deep learning.
In 2016 I felt I’d gained enough background knowledge to dive into neural networks (what was starting to be called Deep Learning). If I thought I’d been obsessed before then this pulled me in deeper. I discovered that LSTM networks used for Natural Language Processing were almost identical to an idea I came up with for message passing networks. This is when I became particularly interested in language modelling. I thought that the path to general AI was through generating human-sounding text. Side note: not that making human-sounding text requires intelligence, just that this was a rhetorical path towards AGI.
Meanwhile in 2016 I was developing my UX skills to the point where I was earning good money as a contractor / consultant. I was passionate about UX as a craft, but I became interested in AI as a career. What I didn’t want was to take a giant pay cut to be a junior in some low-level data analysis role. I also didn’t want to do further formal study as I felt that I’d come really far on my own without the hefty price tag of a formal education.
I came up with a plan: slowly transition my role from UX to Data Science. I changed my focus from qualitative research to quantitative research. If my clients wanted to understand customers I recommended analytics and quantitative studies. I requested customer data from every client. I used Natural Language Processing techniques to analyse unstructured data. I learned more SQL so I could query data myself. I did everything I could to turn my UX focus towards Data Science.
In the background I took a detour down a less-useful path. I was convinced that long-range dependencies couldn’t be solved with gradient descent. Vanishing gradients would prevent a model from ever learning to pay attention to information in the distant past. So I became focused on genetic algorithms and how I could evolve the architecture and weights of a network. It’s an idea I still think has merit but the advances of deep learning kept pushing gradient descent to its limits. In retrospect I should have gone into reinforcement learning instead of genetic algorithms but I think the bet I made at the time made sense.
In 2020 I had feedback from potential employers: I didn’t have the right “background” for a Data Science role. By which they meant that I didn’t have an appropriate degree. I found ways to include machine learning in my projects. I used NLP tools to analyse unstructured text data. I had another go at a start-up and built machine learning features into it. And through all of that I was still told I was unqualified. With the COVID lockdowns I decided to get a Masters in Data Science and found a suitable program I could do online.
I won’t say the entire degree was an expensive waste of time. Although a couple of my classmates and I did a “speed run” of one course, which made all assessments available in week 1, and we raced to see who could finish the fastest. (Brag: I won for speed. Less brag: I only scored 98% when one of my classmates got 100%).
I did learn some interesting things I wouldn’t have taken the time to study otherwise – Game Theory and Bayesian Inference. And I have since made extensive use of Monte Carlo simulations.
There were two standout courses that I took. The first was Neural Networks which seemed so basic that I almost cruised through it without paying much attention. Among the many interesting exercises was one that had us visualise the hidden states of a network on 2D data. I learned at a fundamental level what the matrix multiply + non-linearity were actually doing. I learned some fundamental ideas about width vs depth and the effects of different activation functions. That “basic” class helped me build such a strong intuition that I still think about it when designing neural networks.
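As a rough illustration of what the matrix multiply + non-linearity does to 2D data (this is not the course exercise, and the weights are hand-picked rather than learned), here is a tiny network that solves XOR. The names `hidden_w`, `hidden_b`, `output_w` and `output_b` are my own for this sketch.

```python
def relu(values):
    """Element-wise non-linearity: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def layer(weights, biases, inputs):
    """Matrix-vector product plus bias, written out with plain lists."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights (an assumption for illustration, not trained values).
hidden_w = [[1.0, 1.0], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -2.0]]
output_b = [0.0]

# The four XOR inputs, which no single straight line can separate.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# Hidden states: where each 2D point lands after multiply + ReLU.
hidden_states = [relu(layer(hidden_w, hidden_b, p)) for p in points]

# A purely linear read-out of the hidden states now recovers XOR.
outputs = [layer(output_w, output_b, h)[0] for h in hidden_states]

for point, hidden, out in zip(points, hidden_states, outputs):
    print(point, "->", hidden, "->", out)
```

Printing or plotting `hidden_states` shows the trick: the two "true" inputs collapse onto the same hidden point, and the hidden layer has bent the space so that a straight line can now separate the classes.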
The second standout course was Inferential Statistics. We were forced to prove, in intricate mathematical detail, WHY different statistical tests even worked. It was hard. It challenged many concepts I thought I already understood. For instance, in deep learning we often hear of the bias-variance trade-off along with the cute illustration below. Except that picture doesn’t accurately cover the underlying mathematics. Nor does it properly explain what variance really means (it misses the fundamental aspects of variance as a function of the sample data).
Source: http://scott.fortmann-roe.com/docs/BiasVariance.html
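One way to see variance as a function of the sample data is to simulate it. The sketch below is my own toy setup, not course material: the true function, noise level, sample size, and the two models (a constant and a straight line) are all assumptions chosen for illustration. It redraws the training set many times, refits each model, and measures how the prediction at a single point `x0` behaves across those refits.

```python
import random
import statistics


def true_function(x):
    # Assumed ground truth for this toy example.
    return x * x


def sample_training_set(rng, size=10, noise_sd=0.3):
    """Draw a fresh noisy training sample from the true function."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(size)]
    ys = [true_function(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """Simplest model: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares for a straight line (closed form)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def bias_and_variance(fit, x0=1.0, trials=2000, seed=0):
    """Refit on many independent samples and inspect the prediction at x0."""
    rng = random.Random(seed)
    predictions = [fit(*sample_training_set(rng))(x0) for _ in range(trials)]
    # Bias: how far the average prediction is from the truth.
    bias_sq = (statistics.fmean(predictions) - true_function(x0)) ** 2
    # Variance: how much the prediction moves when the sample changes.
    variance = statistics.pvariance(predictions)
    return bias_sq, variance


const_bias_sq, const_var = bias_and_variance(fit_constant)
line_bias_sq, line_var = bias_and_variance(fit_line)
print("constant:", const_bias_sq, const_var)
print("line:    ", line_bias_sq, line_var)
```

In this setup the constant model has the larger squared bias and the line has the larger variance. The variance term never looks at the true function at all; it only measures how much the fitted prediction changes from one training sample to the next.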
In 2022 I was working in a largely strategic role as a contractor and my boss said “there’s always work for you here. As long as I’m here your contract will be extended.” Once I finished my degree I went from regularly using data to formally becoming a Data Scientist for the team. I had a lot of liberty to explore what I thought could improve the customer experience through use of data.
For a personal project I was writing a Python library to switch out gradient descent for different kinds of genetic algorithms. And I was experimenting with fine-tuning GPT-2 and learning about what would one day be called prompt engineering. I was happy with my job and could pursue AI research in my spare time.
Then I stumbled upon a job listing from a start-up actually using Deep Learning for Natural Language Processing. In Australia. I had become certain that very few organisations in my country were doing that kind of work. At first I was reluctant to apply because I thought they would want someone with lots of professional experience in Deep Learning. My professional experience was largely with Data Science, with tiny bits of deep learning on unstructured customer feedback. All my AI work was for personal projects and largely in an unfinished state. On a whim I applied for the job and had a few conversations with the team.
After a decade of independent AI research I was excited to talk to someone using modern deep learning techniques to solve real problems. And it turned out they really valued my knowledge and expertise. If anything the fact that I had learned most of these things in my spare time showed them how strong my passion was.
Now to some people document automation probably doesn’t sound interesting. When people think of AI they think of chat-bots and self-driving cars. But every day I work at a problem that’s surprisingly intricate: humans design these documents and fill them with information. At a glance we can look at a document and understand what that information is and how it’s structured. What does it take to make a machine have that same level of understanding?
In what seemed like a couple of years I went from UX Consultant to Lead AI Engineer. How did that happen? Over 10 years of study, tinkering, a plan to gradually transition my career, and eventually some serendipity. I had the good luck to come by like-minded people looking for exactly my skills at the right time. In the end my story isn’t that different to anyone else’s… a combination of hard work, patience, and luck.
