I’ve long been interested in Artificial Intelligence, going back to David M. Skapura’s 1996 book, https://www.amazon.com/dp/B002R47XAW (Amazon helpfully tells me I bought it a few days after I finished high school). I read it, but with no code samples to work from, I was too inexperienced to turn the algorithms and theory into anything useful. When I started university a few months later, I didn’t pursue the mystical arts of the neural network.
OpenAI’s release of ChatGPT woke the world up to a revolution that had been quietly under way in the once-small field of machine learning over the past decade, and it has spawned a new fad: everything needs to have AI!
So I dug up the PyTorch documentation, sourced a large, unused dataset, and wrote an AI! (Let’s leave aside the terminology quibbles for now: artificial intelligence, machine learning, natural language processing, and so on. In this case, I was doing natural language processing with TorchText.) I spent some time working through the theory, preprocessing the data, building a dataset class that transformed the input correctly, and batching it all up.
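To give a flavour of what that plumbing involves, here’s a minimal sketch in plain PyTorch rather than my actual TorchText code; the toy samples, whitespace tokenizer, and vocabulary are invented purely for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

class TextDataset(Dataset):
    """Turns raw (text, label) pairs into tensors of token indices."""
    def __init__(self, samples, vocab):
        self.samples = samples
        self.vocab = vocab

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        text, label = self.samples[idx]
        ids = [self.vocab.get(tok, self.vocab["<unk>"])
               for tok in text.lower().split()]
        return torch.tensor(ids), torch.tensor(label)

def collate(batch):
    """Pad variable-length sequences so they stack into one batch tensor."""
    seqs, labels = zip(*batch)
    return pad_sequence(seqs, batch_first=True), torch.stack(labels)

# Toy data and vocabulary, purely for illustration.
samples = [("the film was great", 1), ("what a waste of time", 0)]
vocab = {"<pad>": 0, "<unk>": 1}
for text, _ in samples:
    for tok in text.lower().split():
        vocab.setdefault(tok, len(vocab))

loader = DataLoader(TextDataset(samples, vocab), batch_size=2,
                    shuffle=True, collate_fn=collate)
```

Padding matters because sentences come in different lengths but tensors don’t; most of my early bugs lived in exactly this layer.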
After all that work, the actual neural network was pretty simple: an LSTM-based RNN. With bated breath, I ran the system: it appeared to be working! The loss was going down!
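In sketch form, the model looked something like this (layer sizes and training-loop details are placeholders, not my exact code; `vocab` and `loader` carry over from the sketch above):

```python
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embedding -> LSTM -> linear head over the final hidden state."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        embedded = self.embed(x)              # (batch, seq, embed_dim)
        _, (hidden, _) = self.lstm(embedded)  # hidden: (1, batch, hidden_dim)
        return self.head(hidden[-1])          # logits: (batch, num_classes)

model = LSTMClassifier(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

That shrinking loss number is exactly what had me so optimistic.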
But there was a wrinkle: it wasn’t actually producing usable results. I tried a new neural network model, with the same results. I rewrote the dataset-building code and the batching code; I added new layers; I tweaked batch sizes, weights, learning rates, and the number of layers. No dice.
So I went back to the drawing board and started with the step I probably should have taken first anyway: using someone else’s (acclaimed) teaching code. But I discovered the same thing: the loss went down, but the results were no better than random chance.
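This failure mode is sneaky, and worth spelling out: training loss can fall smoothly while held-out accuracy sits at the random-chance baseline. A quick sanity check along these lines is what exposed it (reusing the `model` and `loader` names from the sketches above; a real check would use a separate held-out loader):

```python
def accuracy(model, loader):
    """Fraction of correct predictions on a (held-out) loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# With num_classes == 2, random chance is 0.5; a model stuck there
# has learned nothing, however nicely its training loss declines.
print(f"held-out accuracy: {accuracy(model, loader):.3f}")
```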
Faced with results like this, it’s easy to start wondering whether the secret to OpenAI’s success is simply a warehouse of underpaid teenagers. Even basic machine learning is clearly not easy, to say nothing of generative AI. You’re reliant on time and money: GPUs are expensive and scarce, and you need days’ or even weeks’ worth of computation to evaluate even small changes to your model.
So when I see small new companies offering “AI for X” (where X is anything you can charge money for, from healthcare to writing), I wonder to myself: do investors understand this about AI? They’re putting in the money, but do they have the patience to wait for results?
When I see existing companies bolting on AI and rewriting job descriptions to require AI experience (“10 years’ experience in AI” or “PhD in Machine Learning”; yeah, those are common qualifications), I wonder to myself: isn’t the trajectory of such a job just ever-increasing pressure from on high to keep up with a fad that has little to do with the core business, ending in inevitable failure? The AI department gets shut down, and you get fired.
If I were a company wanting to explore AI, unless I had money and time to spare, I’d just buy someone else’s solution.