Predictive analytics uses historical data to predict future events. Typically, historical data is used to build a mathematical model that captures important trends. That predictive model is then used on current data to predict what will happen next, or to suggest actions to take for optimal outcomes.
Predictive analytics has received a lot of attention in recent years due to advances in supporting technology, particularly in the areas of big data and machine learning.
Predictive analytics is often discussed in the context of big data. Engineering data, for example, comes from sensors, instruments, and connected systems out in the world. Business system data at a company might include transaction data, sales results, customer complaints, and marketing information. Increasingly, businesses make data-driven decisions based on this valuable trove of information.
With increased competition, businesses seek an edge in bringing products and services to crowded markets. Data-driven predictive models can help companies solve long-standing problems in new ways.
Equipment manufacturers, for example, can find it hard to innovate in hardware alone. Product developers can add predictive capabilities to existing solutions to increase value to the customer. Using predictive analytics for equipment maintenance (known as predictive maintenance), companies can anticipate equipment failures, forecast energy needs, and reduce operating costs. For example, sensors that measure vibrations in automotive parts can signal the need for maintenance before the vehicle fails on the road.
Companies also use predictive analytics to create more accurate forecasts, such as forecasting the demand for electricity on the electrical grid. These forecasts enable resource planning (for example, the scheduling of various power plants) to be done more effectively.
To extract value from big data, businesses apply algorithms to large data sets using tools such as Hadoop and Spark. The data sources might consist of transactional databases, equipment log files, images, video, audio, sensor readings, or other types of data. Innovation often comes from combining data from several sources.
With all this data, tools are necessary to extract insights and trends. Machine learning techniques are used to find patterns in data and to build models that predict future outcomes. A variety of machine learning algorithms are available, including linear and nonlinear regression, neural networks, support vector machines, decision trees, and other algorithms.
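As a minimal illustration of the simplest technique on that list, ordinary least-squares linear regression can be fit in a few lines of plain Python. The data points here are invented purely for illustration:

```python
# Fit a simple linear model y ≈ a*x + b by ordinary least squares.
# The data points are invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope and intercept from the closed-form least-squares solution.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    """Use the fitted model to predict y for a new x."""
    return a * x + b
```

In practice a toolbox function handles this (and the nonlinear, neural network, and tree-based techniques mentioned above), but the principle is the same: learn parameters from historical data, then apply the fitted model to new inputs.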
Predictive analytics helps teams in industries as diverse as finance, healthcare, pharmaceuticals, automotive, aerospace, and manufacturing.
Predictive analytics is the process of using data to forecast future events. It applies analysis, statistics, and machine learning techniques to historical data to create a predictive model for those forecasts.
The term “predictive analytics” describes the application of a statistical or machine learning technique to create a quantitative prediction about the future. Frequently, supervised machine learning techniques are used to predict a future value (How long can this machine run before requiring maintenance?) or to estimate a probability (How likely is this customer to default on a loan?).
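The loan-default question above is a probability estimate. As a sketch of what the fitted model's output side looks like, the logistic function maps a weighted combination of applicant features to a probability between 0 and 1. The coefficients and features below are invented; in practice they would be learned from historical loan data:

```python
import math

# Hypothetical fitted logistic model scoring loan-default probability.
# Coefficients and feature values are invented for illustration;
# a real model would learn them from historical loan outcomes.
coeffs = {"intercept": -4.0, "debt_ratio": 6.0, "missed_payments": 0.8}

def default_probability(debt_ratio, missed_payments):
    """Map applicant features to a probability via the logistic function."""
    z = (coeffs["intercept"]
         + coeffs["debt_ratio"] * debt_ratio
         + coeffs["missed_payments"] * missed_payments)
    return 1.0 / (1.0 + math.exp(-z))

p = default_probability(debt_ratio=0.3, missed_payments=2)
```

A business rule can then act on the score, for example flagging any applicant whose estimated probability exceeds a chosen threshold for manual review.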
Predictive analytics starts with a business goal: to use data to reduce waste, save time, or cut costs. The process harnesses heterogeneous, often massive, data sets into models that can generate clear, actionable outcomes to support achieving that goal, such as less material waste, less stocked inventory, and manufactured product that meets specifications.
We are all familiar with predictive models for weather forecasting. A vital industry application of predictive models relates to energy load forecasting to predict energy demand. In this case, energy producers, grid operators, and traders need accurate forecasts of energy load to make decisions for managing loads in the electric grid. Vast amounts of data are available, and using predictive analytics, grid operators can turn this information into actionable insights.
Typically, the workflow for a predictive analytics application follows these basic steps:

1. Collect data from engineering sources (sensors, instruments, connected systems) and business systems.
2. Preprocess the data to deal with missing values, outliers, and other data quality issues.
3. Develop a predictive model using statistics and machine learning techniques, then test and validate it.
4. Deploy the model to a production IT environment or an embedded system.
Your aggregated data tells a complex story. To extract the insights it holds, you need an accurate predictive model.
Predictive modeling uses mathematical and computational methods to predict an event or outcome. These models forecast an outcome at some future state or time based upon changes to the model inputs. Using an iterative process, you develop the model using a training data set and then test and validate it to determine its accuracy for making predictions. You can try out different machine learning approaches to find the most effective model.
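That train-and-validate loop can be sketched in plain Python. The time series and the two toy "models" below are invented; the point is the process of fitting candidate models on training data, scoring each on held-out data, and keeping the most accurate one:

```python
# Sketch of a train/validate loop: score candidate forecasters on
# held-out data and keep the most accurate. Data is invented.
series = [10, 12, 11, 13, 14, 13, 15, 16, 15, 17]
train, test = series[:7], series[7:]

def mean_model(history):
    # Predict the average of everything seen so far.
    return sum(history) / len(history)

def last_value_model(history):
    # Naive forecast: the next value repeats the last one.
    return history[-1]

def validate(model):
    """Walk forward through the test set, scoring one-step predictions."""
    history = list(train)
    errors = []
    for actual in test:
        errors.append(abs(model(history) - actual))
        history.append(actual)
    return sum(errors) / len(errors)  # mean absolute error

scores = {m.__name__: validate(m) for m in (mean_model, last_value_model)}
best = min(scores, key=scores.get)
```

Swapping in other candidate models (a regression, a neural network) changes only the list of functions being compared, not the shape of the loop.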
Organizations that have successfully implemented predictive analytics see prescriptive analytics as the next frontier. Predictive analytics creates an estimate of what will happen next; prescriptive analytics tells you how to react in the best way possible given the prediction.
Prescriptive analytics is a branch of data analytics that uses predictive models to suggest actions to take for optimal outcomes. Prescriptive analytics relies on optimization and rules-based techniques for decision making. Forecasting the load on the electric grid over the next 24 hours is an example of predictive analytics, whereas deciding how to operate power plants based on this forecast represents prescriptive analytics.
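As a toy sketch of that distinction: the predicted load below plays the predictive role, and a simple merit-order dispatch rule (fill the cheapest capacity first) plays the prescriptive role. Plant names, marginal costs, and capacities are invented, and real grid dispatch involves far richer optimization:

```python
# Toy prescriptive step: given a predicted load (MW), decide how much
# each plant should run, filling cheapest capacity first (merit order).
# Plant names, costs, and capacities are invented for illustration.
plants = [
    {"name": "coal",   "cost": 30, "capacity": 400},
    {"name": "gas",    "cost": 50, "capacity": 300},
    {"name": "peaker", "cost": 90, "capacity": 200},
]

def dispatch(predicted_load):
    """Return a plan {plant: MW} meeting the load at minimum cost."""
    plan = {}
    remaining = predicted_load
    for plant in sorted(plants, key=lambda p: p["cost"]):
        output = min(plant["capacity"], remaining)
        plan[plant["name"]] = output
        remaining -= output
    if remaining > 0:
        raise ValueError("forecast load exceeds total capacity")
    return plan

plan = dispatch(550)  # act on a predicted load of 550 MW
```

The forecast (predictive) and the dispatch rule (prescriptive) are separate components: improving either one improves the outcome without changing the other.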
Companies are finding innovative ways to apply predictive analytics using MATLAB® to create new products and services, and to solve long-standing problems in new ways.
These examples illustrate predictive analytics in action:
Baker Hughes trucks are equipped with positive displacement pumps that inject a mixture of water and sand deep into drilled wells. With pumps accounting for about $100,000 of the $1.5 million total cost of the truck, Baker Hughes needed to determine when a pump was about to fail. They processed and analyzed up to a terabyte of data collected at 50,000 samples per second from sensors installed on 10 trucks operating in the field, and trained a neural network to use sensor data to predict pump failures. The software is expected to reduce maintenance costs by 30–40%—or more than $10 million.
Heating, ventilation, and air-conditioning (HVAC) systems in large-scale commercial buildings are often inefficient because they do not take into account changing weather patterns, variable energy costs, or the building’s thermal properties. BuildingIQ’s cloud-based software platform uses advanced algorithms to continuously process gigabytes of information from power meters, thermometers, and HVAC pressure sensors. Machine learning is used to segment data and determine the relative contributions of gas, electric, steam, and solar power to heating and cooling processes. Optimization is used to determine the best schedule for heating and cooling each building throughout the day. The BuildingIQ platform reduces HVAC energy consumption in large-scale commercial buildings by 10–25% during normal operation.
False alarms from electrocardiographs and other patient monitoring devices are a serious problem in intensive care units (ICUs). Noise from false alarms disturbs patients’ sleep, and frequent false alarms desensitize clinical staff to genuine warnings. Competitors in the PhysioNet/Computing in Cardiology Challenge were tasked with developing algorithms that could distinguish between true and false alarms in signals recorded by ICU monitoring devices. Czech Academy of Sciences researchers won first place in the real-time category of the challenge with MATLAB algorithms that can detect QRS complexes, distinguish between normal and ventricular heartbeats, and filter out false QRS complexes caused by cardiac pacemaker stimuli. The algorithms produced a true positive rate (TPR) and true negative rate (TNR) of 92% and 88%, respectively.
To unlock the value of business and engineering data to make informed decisions, teams developing predictive analytics applications increasingly turn to MATLAB.
Using MATLAB tools and functions, you can perform predictive analytics with engineering, scientific, and field data, as well as business and transactional data. With MATLAB, you can deploy predictive applications to large-scale production systems and embedded systems.
In this simplified view, engineering data arrives from sensors, instruments, and connected systems out in the world. The data is collected and stored in a file system either in-house or in the cloud.
“No matter what industry our client is in, and no matter what data they ask us to analyze—text, audio, images, or video—MATLAB code enables us to provide clear results faster.”
— Dr. G. Subrahamanya VRK Rao, Cognizant
This data is combined with data sourced from traditional business systems such as cost data, sales results, customer complaints, and marketing information.
After this, the analytics are developed by an engineer or domain expert using MATLAB. Preprocessing is almost always required to deal with missing data, outliers, or other unforeseen data quality issues. Following that, analytics methods such as statistics and machine learning are used to produce an “analytic”: a predictive model of your system.
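A sketch of that preprocessing step in plain Python: fill gaps in the readings and tame outliers before any model sees the data. The sensor values are invented, with 97.0 standing in for a spurious spike; medians are used because they are robust to the very outlier still present in the raw data:

```python
# Sketch of typical preprocessing: fill missing readings and replace
# outliers before modeling. Values are invented; 97.0 is a fake spike.
readings = [4.1, None, 4.3, 4.0, 97.0, 4.2, None, 3.9]

# 1) Fill missing values with the median of the observed readings
#    (robust to the outlier that is still in the data).
observed = sorted(r for r in readings if r is not None)
n = len(observed)
median = (observed[n // 2] if n % 2 else
          (observed[n // 2 - 1] + observed[n // 2]) / 2)
filled = [r if r is not None else median for r in readings]

# 2) Replace outliers, here defined as anything more than 5x the
#    median absolute deviation (MAD) from the median.
devs = sorted(abs(r - median) for r in observed)
mad = (devs[n // 2] if n % 2 else (devs[n // 2 - 1] + devs[n // 2]) / 2)
cleaned = [r if abs(r - median) <= 5 * mad else median for r in filled]
```

The thresholds and fill strategy are judgment calls that a domain expert tunes to the data; the point is that this cleanup happens before, and separately from, model development.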
To be useful, that predictive model is then deployed, either in a production IT environment, where it feeds a real-time transactional system such as an e-commerce site, or on an embedded device: a sensor, a controller, or a smart system in the real world, such as an autonomous vehicle.
Applying MATLAB and Simulink® as part of this architecture is ideal, because the tools enable easy deployment paths to embedded systems with Model-Based Design, or to IT systems with application deployment products.
“MATLAB has helped accelerate our R&D and deployment with its robust numerical algorithms, extensive visualization and analytics tools, reliable optimization routines, support for object-oriented programming, and ability to run in the cloud with our production Java applications.”
— Borislav Savkovic, lead data scientist, BuildingIQ