This article continues the Docker-related series and shows how to create an Ubuntu 20.04 LTS based Docker container for Machine Learning. We’ll install the following into the environment: Python 3, Jupyter, Keras, Tensorflow, TensorBoard, Pandas, Sklearn, Matplotlib, Seaborn, pyyaml, h5py. Prepare your personal ML environment in 3 minutes, excluding Docker image building time!
I’ve updated the container to an Ubuntu 20.04 LTS base and sped up the Docker build process. Now we’re not building OpenCV from source but installing it from
Environment setup is a common question when you start learning Machine Learning (ML). In this article, I’ll show you how to create your own Docker container, including the following frameworks for a comfortable start:
Those are the TOP 10 widely used Python frameworks for Data Science, and you’ll find most of them in any how-to article on the Internet. In the next article (How to build Python Data Science Docker container based on Anaconda), I’ll show how to build the same image on top of Anaconda distribution.
All you need is Docker and a text editor installed on your system.
Here’s the final project structure:
$ tree -a python_data_science_container
python_data_science_container
├── Dockerfile
├── conf
│   └── .jupyter
│       └── jupyter_notebook_config.py
└── run_jupyter.sh

2 directories, 3 files
All you need to do is create a project folder and a file named Dockerfile:
$ mkdir python_data_science_container
$ cd python_data_science_container
$ vim Dockerfile
After that, put the following content into the Dockerfile:
FROM ubuntu:20.04
# LABEL replaces the deprecated MAINTAINER instruction
LABEL maintainer="Andrei Maksimov"
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    libopencv-dev \
    python3-pip \
    python3-opencv && \
    rm -rf /var/lib/apt/lists/*

# The PyPI package is named scikit-learn; the old "sklearn" alias is deprecated
RUN pip3 install tensorflow && \
    pip3 install numpy \
    pandas \
    scikit-learn \
    matplotlib \
    seaborn \
    jupyter \
    pyyaml \
    h5py && \
    pip3 install keras --no-deps && \
    pip3 install opencv-python && \
    pip3 install imutils

RUN ["mkdir", "notebooks"]

COPY conf/.jupyter /root/.jupyter
COPY run_jupyter.sh /

# Jupyter and Tensorboard ports
EXPOSE 8888 6006

# Store notebooks in this mounted directory
VOLUME /notebooks

CMD ["/run_jupyter.sh"]
Now that we have declared the container and its components, it’s time to prepare a configuration for Jupyter. Create a file conf/.jupyter/jupyter_notebook_config.py with the following content:
# get the config object
c = get_config()

# in-line figure when using Matplotlib
c.IPKernelApp.pylab = 'inline'
c.NotebookApp.ip = '*'
c.NotebookApp.allow_remote_access = True

# do not open a browser window by default when using notebooks
c.NotebookApp.open_browser = False

# No token. Always use jupyter over ssh tunnel
c.NotebookApp.token = ''
c.NotebookApp.notebook_dir = '/notebooks'

# Allow to run Jupyter from root user inside Docker container
c.NotebookApp.allow_root = True
As you can guess from the Dockerfile, this file is copied into the /root/.jupyter/ folder during the container build process.
Creating startup script
The last thing we need to do is create a script, run_jupyter.sh, which launches the Jupyter server inside our container when it starts. Create the file with the following content:
#!/usr/bin/env bash
jupyter notebook "$@"
And make this file executable:
$ chmod +x run_jupyter.sh
This script runs by default inside the container each time you start a new one.
Creating container image
The last stage is creating the container image. Run the following command from the project directory to build it:
$ docker build -f Dockerfile -t python_data_science_container .
Docker will install all necessary libraries and frameworks inside your container image during the build process and make it available for use.
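To confirm that the build installed everything, a small smoke test like the one below can be run in a notebook cell or with python3 inside the container. This is a minimal sketch and not part of the original project: the helper name check_imports and the PACKAGES list are illustrative. The list uses import names, which differ from the pip names in a few cases (scikit-learn imports as sklearn, pyyaml as yaml, opencv-python as cv2).

```python
import importlib

# Import names for the packages installed in the Dockerfile (illustrative list).
PACKAGES = [
    "tensorflow", "keras", "numpy", "pandas", "sklearn",
    "matplotlib", "seaborn", "yaml", "h5py", "cv2", "imutils",
]


def check_imports(names):
    """Try to import each module; map its name to a version string or None."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None  # module is missing or failed to import
        else:
            # Not every module defines __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


def print_report(names=PACKAGES):
    """Print one line per package with its version or a failure marker."""
    for name, version in check_imports(names).items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

Calling print_report() prints one line per package. Any line marked NOT INSTALLED points at a pip3 install step in the Dockerfile that is worth re-checking.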
Now you have a working container, and it’s time to start it. Create a folder inside your project’s folder where we’ll store all our Jupyter Notebooks with the source code of our projects:
$ mkdir notebooks
And start the container with the following command:
$ docker run -it -p 8888:8888 \
    -p 6006:6006 \
    -d \
    -v $(pwd)/notebooks:/notebooks \
    python_data_science_container
This starts the container and exposes Jupyter on port 8888 and TensorBoard on port 6006 of your local computer or server, depending on where you executed the command.
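A quick way to check that the services answer on the published ports is to send an HTTP request from the host. The sketch below assumes the container runs on the same machine (127.0.0.1); the function name http_status is illustrative. It sends a real HTTP request because a plain TCP connection test can succeed on a published Docker port even when nothing is listening inside the container.

```python
import http.client
import urllib.error
import urllib.request

# Ignore any system-wide HTTP proxy: we are talking to a local address.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # a server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # refused, reset, malformed reply, or timed out


def check_container(host="127.0.0.1", ports=(8888, 6006)):
    """Map each published port to the status code returned on it."""
    return {port: http_status(f"http://{host}:{port}/") for port in ports}
```

Note that run_jupyter.sh starts only Jupyter, so port 6006 should report None until TensorBoard is started inside the container.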
Please be aware that this container was created only for local development purposes. I removed authentication on Jupyter in this container, so anybody who can reach port 8888 or 6006 can connect and, of course, execute Python code.
If you’re just looking for a working solution
If you don’t want to create and maintain your own container and the components listed above are sufficient for you, feel free to use my personal container, which I keep updated:
$ docker run -it -p 8888:8888 \
    -p 6006:6006 \
    -d \
    -v $(pwd)/notebooks:/notebooks \
    amaksimov/python_data_science
I hope this article will be helpful for you. If you like the article, please repost it using any social media you’d like. See you soon!