Quick Introduction to Machine Learning

Bashir Alam


Machine Learning (ML) is a rapidly evolving technology that allows computers to learn automatically from previous data. It employs a variety of algorithms to build mathematical models that make predictions based on past data or knowledge. Common use cases for ML include image and video analysis, speech recognition, email filtering, recommendation and forecasting systems, and many more. This article is a simple introduction to Machine Learning that covers its basic concepts, including the different types of learning, frameworks for building ML systems, and popular Python packages that you can use in the Machine Learning space. Finally, we’ll cover the machine learning services that the AWS cloud provides for developers, data scientists, and expert practitioners.

What is Machine Learning

Machine Learning (ML) is a branch of computer science that evolved from pattern recognition and computational learning theory in Artificial Intelligence (or simply AI).

Machine Learning can be defined in different ways. One of the ways to define Machine Learning is:

Field of study that gives computers the ability to learn without being explicitly programmed

Arthur Samuel, an American pioneer in the fields of ML and AI

In other words, Machine Learning is a subfield of computer science that involves using statistical approaches to develop computer systems that can either automatically improve performance over time or spot patterns in large volumes of data that people would be unlikely to notice.

Learning algorithms work on the basis that strategies, algorithms, and inferences that worked well in the past are likely to continue working well in the future. These algorithms build a model based on sample data, known as training data (the data known from the past), to make predictions or decisions without being explicitly programmed to do so.
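To make the idea of building a model from training data concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of made-up data points with the least-squares method and then uses that line to predict a value it has not seen. The data and the function names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: "learn" a straight line from training data, then predict.
# The data points and function names are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training data (made-up numbers): hours studied -> exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)   # the "learning" step
print(predict(model, 6))          # prediction for an input not in the training data
```

Nobody wrote the rule "score = 2 × hours + 50" into the program. The slope and intercept were derived from the sample data.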

A subset of Machine Learning is closely related to computational statistics, which focuses on making predictions using computers, but not all ML is statistical learning.

While handling the basics, we need to cover several other terms related to Machine Learning:

  • Artificial Intelligence – the broadest field of study, whose goal is to enable machines to behave like humans. For example, by “rewarding” and “punishing” computers for correct or incorrect actions, people can teach machines how to play computer games or drive cars.
  • Deep Learning – the subclass of Machine Learning algorithms that uses multiple layers to progressively extract higher-level features from raw data. Here we’re speaking about Neural Networks, or Artificial Neural Networks (ANN), which are applied to problems that are hard for humans to solve or to write explicit algorithms for.

Here’s how Machine Learning is related to Artificial Intelligence and Deep Learning:

[Figure: how Machine Learning relates to Artificial Intelligence and Deep Learning]

The mathematical part of Machine Learning deals with statistics, calculus, linear algebra, and probability.

When do we need to use Machine Learning

For simple tasks, it is possible to program an algorithm that tells the machine exactly which steps to execute to solve the problem. For example, you may develop a program that applies fixed rules to the price history of one stock to estimate where the stock might go (up or down) and at what price to buy or sell a share. In that case, there’s no learning needed for the computer itself.

For more advanced tasks, it might be challenging for a human to develop the needed algorithm. In practice, it can turn out to be more effective to help the machine develop its own algorithm, rather than having human programmers specify every single step. A good example is a trading algorithm for a stock whose value is influenced by several other stocks. It is hard to understand how this influence works and to describe it in a programming language. So, you can use Deep Learning algorithms to “learn” this relationship for you and estimate the stock’s value from the prices of the other stocks.

Types of Machine Learning algorithms

A Machine Learning system learns from existing data, constructs prediction models, and predicts the result whenever fresh data is received. In general, the more relevant, good-quality data you use to train the model, the more accurate its predictions tend to be.

Machine Learning uses three classes of algorithms:

  • Supervised learning – these algorithms build a mathematical model from the set of historical data that contains both inputs and desired outputs.
  • Unsupervised learning – algorithms that focus on identifying patterns in data sets for data that are neither classified nor labeled.
  • Reinforcement learning – focuses on algorithms that enable the machine to learn in an interactive environment by trial and error, using feedback from its own actions.
[Figure: Machine Learning categories: supervised, unsupervised, and reinforcement learning]
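The three classes can be contrasted with a small, dependency-free sketch. Everything here is a deliberately simplified stand-in rather than a production algorithm: a nearest-neighbor lookup for supervised learning, a two-center clustering loop for unsupervised learning, and a greedy trial-and-error loop with fixed rewards between 0 and 1 for reinforcement learning. All names and numbers are hypothetical.

```python
# Supervised learning: the training data contains inputs AND desired outputs.
def nearest_neighbor_predict(train, x):
    """Return the label of the training example whose input is closest to x."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label


labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbor_predict(labeled, 7.2))


# Unsupervised learning: no labels, the algorithm looks for structure itself.
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups around two moving centers."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups]
    return groups


unlabeled = [1.0, 1.5, 8.0, 9.0, 1.2, 8.5]
print(two_means(unlabeled))


# Reinforcement learning: learn from the feedback produced by your own actions.
def learn_best_action(rewards, steps=20):
    """Find the best-paying action by trial and error (rewards assumed in [0, 1))."""
    estimates = [1.0] * len(rewards)   # optimistic start, so every action gets tried
    counts = [0] * len(rewards)
    for _ in range(steps):
        action = estimates.index(max(estimates))   # act on current beliefs
        reward = rewards[action]                   # feedback from the environment
        counts[action] += 1
        # Move the running-average estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates.index(max(estimates))


print(learn_best_action([0.2, 0.8, 0.5]))
```

The difference lies in what each function receives: labeled pairs, bare values, or nothing but the rewards that follow its own choices.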

Data mining vs Machine Learning

Data mining is the process of finding anomalies, patterns, and correlations within large data sets. It uses a broad range of techniques that allow you to use “mined” information to increase revenues, cut costs, improve customer relationships, reduce risks, and solve other business problems. Data mining intersects with Machine Learning. Here are some major differences:

Data mining | Machine Learning
Semi-automatically extracts knowledge and rules from a large amount of existing data for use in future decision-making | Uses existing data to create new algorithms that can automatically make accurate decisions and predictions in the future
The term “knowledge discovery in databases” dates to 1989 | The term “machine learning” dates to 1959
Extracts rules and patterns from the existing data | Teaches machines to learn and understand these rules
Uses traditional databases with unstructured data | Uses existing data from any sources and algorithms
Involves manual human intervention | Fully automated once the model design is implemented
Used in cluster analysis | Used by web search engines, spam filters, fraud detection, credit scoring, image recognition, stock price predictions, etc.
More like data research or data analysis, which can rely on Machine Learning | A self-learning and self-trained system that performs the intelligent task
You can apply it to a limited problem set | You can use it to solve a potentially unlimited set of problems

Data mining frameworks for Machine Learning

Data mining is commonly used as a part of the Machine Learning process during data management, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating phases. Let’s look at three major data mining frameworks that you can use to get more insights from your data.

Knowledge Discovery in Databases (KDD)

Knowledge Discovery in Databases (KDD) is the process of discovering useful knowledge from a data set. This commonly used data mining technique involves data preparation and selection, data cleaning, adding prior knowledge to the data sets, and accurately interpreting the observed results. Here’s a visualization of KDD that shows the steps involved in this process.

[Figure: Knowledge Discovery in Databases (KDD) process]

Cross-Industry Standard Process for Data Mining (CRISP-DM)

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It was created under the European Strategic Program on Research in Information Technology (ESPRIT) initiative to provide an unbiased, domain-independent methodology. It works as a series of guidelines to assist you in planning, organizing, and implementing your data science (or machine learning) project. It’s a six-phase process model of the data science life cycle: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The following diagram shows those six phases.

[Figure: Cross-Industry Standard Process for Data Mining (CRISP-DM)]

Sample, Explore, Modify, Model and Assess (SEMMA)

The SAS Institute developed a standard data mining process called SEMMA. SEMMA stands for Sample, Explore, Modify, Model, and Assess, the actual five steps of the data mining process:

  • The Sample step entails choosing, from the vast dataset provided, a subset that is large enough to be representative but small enough to process efficiently when building the model.
  • During the Explore step, univariate and multivariate analysis is conducted to study interconnected relationships between data elements and identify gaps in the data.
  • In the Modify step, the insights gained during exploration are combined with business logic to clean and transform the sampled data.
  • After the variables have been refined and the data has been cleaned, the Model step uses several data mining techniques to create a predictive model of how this data leads to the process’s final, desired output.
  • In the final Assess stage, the model is evaluated for its usefulness and reliability for the studied topic.
[Figure: SEMMA process: Sample, Explore, Modify, Model, Assess]
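The five SEMMA steps can be walked through in a few lines of standard-library Python. This is only an illustrative sketch under simple assumptions: the data set is synthetic (sales that depend linearly on advertising spend, with two missing values), "Modify" is reduced to dropping incomplete rows, and the model is a least-squares line. None of these choices are prescribed by SEMMA itself.

```python
import random
import statistics

# Hypothetical raw data: (advertising spend, sales); None marks a missing value.
raw = [(x, 3 * x + 5) for x in range(1, 41)]
raw[4] = (5, None)
raw[17] = (18, None)

# Sample: draw a reproducible subset of the larger data set.
rng = random.Random(0)
sample = rng.sample(raw, 20)

# Explore: summarize the sample and look for gaps in the data.
missing = sum(1 for _, y in sample if y is None)
mean_sales = statistics.mean(y for _, y in sample if y is not None)
print(f"rows={len(sample)} missing={missing} mean_sales={mean_sales:.1f}")

# Modify: clean the data (here, simply drop rows with missing values).
clean = [(x, y) for x, y in sample if y is not None]

# Model: fit a least-squares line on part of the cleaned sample.
train, holdout = clean[:12], clean[12:]
mean_x = statistics.mean(x for x, _ in train)
mean_y = statistics.mean(y for _, y in train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

# Assess: measure the error on rows the model has not seen.
mae = statistics.mean(abs(slope * x + intercept - y) for x, y in holdout)
print(f"mean absolute error on holdout: {mae:.3f}")
```

Because the synthetic data is perfectly linear, the holdout error is essentially zero. Real data would not be this clean, which is exactly what the Assess step is meant to reveal.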

Python in Machine Learning

Python is very popular in the ML space because it is a simple language to learn and develop in, which lets Machine Learning engineers validate their ideas quickly. Another reason why Python is the preferred language for ML is its extensive ecosystem of modules for mathematical optimization, probability, and statistics. Before taking any course on Machine Learning with Python, make sure that you have Python installed on your system.

The following are some of the main reasons Python is preferred for Machine learning:

  • It’s simple to use and enables quick data evaluation.
  • It has a great library ecosystem. Python libraries make it easier for data scientists to conduct numerous studies because machine learning relies largely on mathematical optimization, probability, and statistics.
  • Python is easy to learn.
  • It is very flexible. Developers can use Python in conjunction with other programming languages, and because it is interpreted, the source code does not need to be recompiled after a change.
  • It is versatile. Python for machine learning runs on all major platforms, including Windows, macOS, Linux, and Unix.
  • It is easy to read. When a change in the code is required, Python developers can easily implement, duplicate, or distribute it.
  • Python is one of the most widely used programming languages in the world.
  • Python has a large community of supporters, and it’s good to know that if you have an issue, someone will be able to assist you.

There are a variety of open-source libraries available that make Machine Learning in Python practical. These are commonly referred to as scientific Python libraries and are used to execute basic machine learning tasks. Based on their usage and purpose, we can separate these libraries at a high level into data analysis libraries and core machine learning libraries.

Data analysis packages in Python

Some of the popular data analysis packages in Python are:

  • NumPy – a Python module that offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more
  • Pandas – a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language
  • SciPy – a well-known Python module that provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, etc.
  • Matplotlib – a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn – a Python data visualization library based on Matplotlib

Check out our How to build Anaconda Python Data Science Docker container article to see what other libraries you may be interested in.

These packages give us the mathematical and scientific capabilities we need to execute data preprocessing, transformation, and visualization.
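As a small taste of what these packages look like in practice, the sketch below uses only NumPy (assumed to be installed, for example with `pip install numpy`) to standardize two columns of made-up measurements and compute their correlation. Pandas, SciPy, Matplotlib, and Seaborn are left out to keep the example short.

```python
import numpy as np

# Hypothetical measurements: rows are samples, columns are features.
data = np.array([[1.0, 200.0],
                 [2.0, 220.0],
                 [3.0, 240.0],
                 [4.0, 260.0]])

# Preprocessing: rescale each column to zero mean and unit variance.
mean = data.mean(axis=0)
std = data.std(axis=0)
scaled = (data - mean) / std

# Analysis: correlation matrix between the two features (columns).
correlation = np.corrcoef(scaled, rowvar=False)

print(mean)
print(correlation.round(2))
```

Operations such as `data - mean` work on whole columns at once, which is one of the main reasons NumPy arrays are preferred over plain Python lists for numerical work.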

Core Machine Learning packages in Python

Some of the important and popular Machine Learning packages are:

  • Scikit-learn – a free software machine learning library for Python
  • PyTorch – An open source machine learning framework that accelerates the path from research prototyping to production deployment
  • MXNet – is a fast and scalable training and inference framework with an easy-to-use, concise API for machine learning and artificial intelligence
  • Keras – an open-source software library that provides a Python interface for artificial neural networks
  • TensorFlow – an end-to-end open source platform for machine learning

Scikit-learn is one of the most widely used Python machine learning libraries. The sklearn toolkit covers classification, regression, clustering, and dimensionality reduction, among other capabilities for machine learning and statistical modeling. Together, the packages above provide the machine learning techniques and functionality needed to extract patterns from a given dataset. Keras is a neural network library, while TensorFlow is an open-source machine learning framework that you may use for various tasks. TensorFlow has both high-level and low-level APIs, whereas Keras has only high-level APIs.
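A typical scikit-learn workflow is short: load data, split it, fit a model, and measure its accuracy. The following sketch assumes scikit-learn is installed (for example with `pip install scikit-learn`) and uses the small Iris dataset that ships with the library. The choice of logistic regression and of the split parameters is arbitrary and only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small labeled dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows to test the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a classifier on the training rows only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-back rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share the same `fit` and `predict` methods, so swapping `LogisticRegression` for another classifier usually changes only one line.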

Machine Learning in AWS cloud

AWS cloud provides many cloud-based services and frameworks that allow developers and engineers of all skill levels to leverage Machine Learning technologies. You can split these services into three major categories:

  • Frameworks and Infrastructure – this category provides you with low-level ML frameworks and cloud infrastructure for ML tasks
  • Amazon SageMaker – a fully managed Machine Learning service, with SageMaker Studio as its IDE, that assists us in developing advanced machine learning models. Data scientists and developers can use SageMaker to create and train machine learning models, compare ML models’ performance, and then deploy the best working model into a production-ready hosted environment.
  • AI Services – these are high-level services based on AWS ML experience that help you solve specific business problems without any knowledge of ML. For example, you don’t have to train your own ML model to use Amazon Rekognition to analyze images or videos and identify objects, people, text, scenes, and activities in them.
[Figure: AWS ML stack]
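To illustrate how an AI Service is consumed without training a model, here is a hedged sketch that calls Amazon Rekognition’s `detect_labels` operation through the boto3 SDK. It assumes boto3 is installed and that AWS credentials and a region are already configured. The helper names, the image path, and the hand-written sample response are hypothetical. The last lines run offline against that sample response, so no AWS call is made unless you invoke `detect_labels` yourself.

```python
def detect_labels(image_path, max_labels=10, min_confidence=80.0):
    """Send a local image to Amazon Rekognition and return the raw response."""
    import boto3  # imported here so the helper below works without boto3 installed

    client = boto3.client("rekognition")
    with open(image_path, "rb") as image_file:
        return client.detect_labels(
            Image={"Bytes": image_file.read()},
            MaxLabels=max_labels,
            MinConfidence=min_confidence,
        )


def summarize_labels(response):
    """Return (name, confidence) pairs, highest confidence first."""
    labels = [(label["Name"], label["Confidence"])
              for label in response.get("Labels", [])]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Hand-written stand-in for a Rekognition response, so the example runs offline.
sample_response = {
    "Labels": [
        {"Name": "Dog", "Confidence": 91.2},
        {"Name": "Pet", "Confidence": 97.5},
    ]
}
print(summarize_labels(sample_response))

# With credentials configured, a real call would look like this (path is hypothetical):
# print(summarize_labels(detect_labels("photo.jpg")))
```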

Basic concepts of Amazon Machine Learning

The Amazon Machine Learning (Amazon ML) service is built around the following key concepts:

  • Datasources – store metadata about the data that we’ve provided to the ML service. Amazon ML analyzes your input data, computes descriptive statistics on its properties, and saves the statistics, schema, and other information as part of the Datasource object. Amazon ML then uses the Datasource to train and test an ML model and generate batch predictions
  • ML models – generate predictions using the patterns extracted from the input data with a mathematical model such as binary classification, multiclass classification, or regression
  • Evaluations – as soon as the model is ready, you can measure the quality of the ML model or compare its performance with other models based on other ML algorithms
  • Batch Predictions – asynchronously generate predictions for multiple input data observations
  • Real-time Predictions – synchronously generate predictions for individual data observations
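The last three concepts can be illustrated without any AWS account. The sketch below is plain Python and does not use the Amazon ML API. It uses a hypothetical threshold "model" to show the difference between scoring one observation at a time and scoring a whole batch, and then evaluates the batch with accuracy, precision, and recall, which are common metrics for a binary classification model.

```python
def predict_one(model, observation):
    """Real-time style: score a single observation and return immediately."""
    return 1 if observation >= model["threshold"] else 0


def predict_batch(model, observations):
    """Batch style: score many observations in one pass."""
    return [predict_one(model, o) for o in observations]


def evaluate(predicted, actual):
    """Return accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)   # true positives
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)   # false positives
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)   # false negatives
    correct = sum(1 for p, a in pairs if p == a)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical model and data, for illustration only.
model = {"threshold": 0.5}
scores = [0.1, 0.4, 0.6, 0.9, 0.7, 0.3]
actual = [0, 1, 1, 1, 0, 0]

print(predict_one(model, 0.8))            # one observation, answered right away
predicted = predict_batch(model, scores)  # many observations at once
metrics = evaluate(predicted, actual)
print(metrics)
```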

Summary

Machine learning is a type of data analysis that automates the creation of analytical models. It’s a field of artificial intelligence based on the premise that computers can learn from data, recognize patterns, and make judgments with little or no human input. In this article, we’ve introduced Machine Learning concepts, Python frameworks, and AWS ML services that you can start using to incorporate Machine Learning and Deep learning features into your applications and services.
