Set up high-end AI environments with Azure Data Science VMs


Machine Learning and AI models, particularly neural networks, require huge amounts of computing power, especially when the model is being "trained". This is because, during training, a neural network runs tens of millions of calculations on a large number of observations (e.g. 60,000 images), repeated over a number of cycles (epochs), say 100. The end result is often billions of individual calculations.
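To see how quickly these numbers multiply, here is a rough back-of-the-envelope estimate. It assumes, purely for illustration, that processing one observation costs 10 million calculations per cycle; real per-observation costs vary widely from model to model.

```python
def total_calculations(per_observation: int, observations: int, epochs: int) -> int:
    """Rough count of calculations for a full training run.

    Assumes every observation is processed once per epoch and that one
    pass over one observation costs `per_observation` calculations.
    """
    return per_observation * observations * epochs


# Illustrative figures: 10 million calculations per image (an assumption),
# 60,000 images and 100 training cycles (epochs).
estimate = total_calculations(10_000_000, 60_000, 100)
print(f"{estimate:,}")  # → 60,000,000,000,000
```

Because the three factors multiply, doubling the data or the number of epochs doubles the total work, which is why the choice of hardware matters so much.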

These calculations run most efficiently on a computer's Graphics Processing Unit (GPU) rather than its CPU. Out of the box, however, Keras and TensorFlow will run on a PC's CPU unless the GPU drivers and CUDA libraries have been installed and configured. Even a very high-end PC running Keras on the CPU can take more than ten hours to train a cutting-edge model on enough data, which makes development too slow to be practical.
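A quick way to see which case you are in is to ask TensorFlow which GPUs it can see. This is a minimal sketch assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available); an empty list means training will fall back to the CPU.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if
    TensorFlow is not installed in this environment."""
    try:
        import tensorflow as tf  # imported lazily so the check never crashes
    except ImportError:
        return None
    # Each PhysicalDevice has a .name such as "/physical_device:GPU:0".
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed")
elif not gpus:
    print("TensorFlow will run on the CPU: no GPU is visible")
else:
    print("TensorFlow can use:", ", ".join(gpus))
```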

However, whilst a high-end PC or laptop will have a graphics card with a GPU on it, these aren't the best GPUs for Machine Learning and AI. And whilst it is possible to buy and install a GPU that is purpose-built for AI, configuring Keras and TensorFlow to run on it is a time-consuming and complicated task. Then, whenever one of the components (CUDA, TensorFlow, Keras, Python, Jupyter, etc.) changes version, the configuration may have to be redone.
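Because upgrades to any one of these components can break the setup, it helps to record exactly which versions you have. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the package names in the list are assumptions to adjust for your own stack. CUDA itself is not a Python package, so it is not covered here.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Assumed package names; older setups may use "tensorflow-gpu" instead.
PACKAGES = ["tensorflow", "keras", "torch", "jupyter"]


def stack_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in stack_versions().items():
    print(f"{name:12} {ver or 'not installed'}")
```

Saving this output before and after an upgrade makes it easy to see which component changed when something stops working.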

The alternative is to use a cloud provider such as Amazon or Microsoft, both of which offer Virtual Machine (VM) templates with a high-end GPU purpose-built for AI, and with software such as CUDA, Python, Jupyter Notebooks, Keras, TensorFlow and PyTorch already installed and configured. This means you can have a cutting-edge AI environment up and running in about 5 minutes.

Microsoft have created the Azure Data Science Virtual Machines specifically for this purpose. I chose the Ubuntu 18.04 template. With the Ubuntu VM, when you aren't using it you can switch it off (stop it from the Azure portal, which deallocates it) and you are no longer charged for compute, although the VM's disks still incur a small storage charge. TensorFlow also used to run faster on Linux than on Windows, although I haven't tested whether that is still the case.
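One detail worth knowing: shutting the VM down from inside Ubuntu, or with `az vm stop`, powers it off but leaves it allocated, so compute is still billed. To stop the compute charges the VM has to be deallocated. The sketch below builds the Azure CLI commands from Python; it assumes the Azure CLI is installed and you have already run `az login`, and the resource group and VM names are hypothetical.

```python
import subprocess


def az_vm_command(action, resource_group, vm_name):
    """Build an Azure CLI command that starts or deallocates a VM.

    'deallocate' releases the compute hardware, which is what stops the
    hourly compute charge; the VM's disks are still billed for storage.
    """
    if action not in ("start", "deallocate"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["az", "vm", action, "--resource-group", resource_group, "--name", vm_name]


def run_az(command):
    """Run the command, raising CalledProcessError if the CLI reports failure."""
    subprocess.run(command, check=True)


# Hypothetical resource group and VM names, for illustration only.
cmd = az_vm_command("deallocate", "my-dsvm-rg", "my-dsvm")
print(" ".join(cmd))
# run_az(cmd)  # uncomment to actually deallocate the VM
```

Only "start" and "deallocate" are allowed, so the helper cannot accidentally issue a `stop` that keeps the meter running.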

The Ubuntu template worked out of the box, apart from one slight hitch. The first time, I provided a user name that contained a capital letter, and this caused an error in the Jupyter Notebook installation. Rather than fix the problem, I simply deleted the first VM and all its associated resources, and created another one with an all-lowercase user name. Five minutes later I was connected and using the environment.
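To avoid the same hitch, you could check a user name before creating the VM. This is a minimal sketch of a conservative rule, an assumption based on common Linux user name conventions rather than Azure's documented validation: a lowercase letter first, then lowercase letters, digits, underscores or hyphens, up to 32 characters.

```python
import re

# Conservative rule (an assumption, not Azure's documented rule): lowercase
# letter first, then lowercase letters, digits, "_" or "-", 32 chars at most.
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_-]{0,31}")


def is_safe_username(name: str) -> bool:
    """Return True if `name` follows the conservative lowercase convention."""
    return USERNAME_PATTERN.fullmatch(name) is not None


for candidate in ["datascience", "DataScience", "ds_user1"]:
    print(candidate, is_safe_username(candidate))
```

Here "DataScience" is rejected because of its capital letters, which is exactly the case that broke the Jupyter installation.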

The machines come with Jupyter Notebooks installed and ready to go, so you can connect to a machine by browsing to https://:8000 (with your VM's IP address inserted before the :8000), and the Jupyter screen will come up.
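If the Jupyter screen doesn't come up, a simple first check is whether port 8000 on the VM is reachable at all. The sketch below uses only Python's `socket` module; the IP address is a hypothetical placeholder from the documentation address range, to be replaced with your VM's address.

```python
import socket


def jupyter_url(host: str, port: int = 8000) -> str:
    """Build the address to open in a browser for the given VM address."""
    return f"https://{host}:{port}"


def port_is_open(host: str, port: int = 8000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    A False result usually means the VM is stopped or the port is blocked
    by the VM's network security rules.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


# Hypothetical address; replace it with your VM's IP address.
host = "203.0.113.10"
print(jupyter_url(host))
# print(port_is_open(host))  # uncomment to test the real VM
```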

The main problem I have found is that, depending on your Service Plan, Azure will sometimes disconnect the VM because of capacity restrictions, even while it is in use. So after uploading data or training a large model, make sure you back up the VM.
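Within your own code, it also helps to checkpoint long training runs so that a disconnect only loses the current epoch rather than the whole run. Keras provides `tf.keras.callbacks.ModelCheckpoint` for this; the framework-free sketch below just illustrates the underlying idea with a toy JSON state, where the "training" step is a placeholder and the file name is hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to `path` atomically: write a temporary file in the same
    directory, flush it to disk, then rename it over the old checkpoint, so an
    interruption mid-write cannot corrupt the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def train(path, epochs):
    """Toy training loop that resumes from the last completed epoch."""
    state = load_checkpoint(path, {"epoch": 0, "loss": None})
    for epoch in range(state["epoch"], epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for one real epoch of training
        state = {"epoch": epoch + 1, "loss": loss}
        save_checkpoint(path, state)  # progress survives a disconnect
    return state


# Hypothetical checkpoint file; running this again resumes where it stopped.
print(train("checkpoint.json", 3))
```

Copying the checkpoint files off the VM as part of your backup means a disconnect costs minutes rather than hours.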

All the best
