# Developing an IoT Analytics System with MATLAB, Machine Learning, and ThingSpeak

By Robert S. Mawrey, MathWorks

The combination of smart connected devices with data analytics and machine learning is enabling a wide range of applications, from home-grown traffic monitors to sophisticated predictive maintenance systems and futuristic consumer products (like the Amazon Echo and the Google Nest).

While the potential of Internet of Things (IoT) is virtually limitless, designing IoT systems can seem daunting, requiring a complex web infrastructure and multidomain expertise.

This article shows how you can prototype and deploy an IoT system with data analytics without developing custom web software or servers. The workflow is based on MATLAB^{®} and ThingSpeak™, an analytic IoT platform that can run MATLAB code on demand in the cloud. To illustrate the workflow we’ll create a tidal forecasting system that boaters can use to predict the effect of wind on water levels.

Water depth varies with the tides, but it is also significantly influenced by the strength, duration, and direction of the wind. Forecasting wind-driven water levels typically requires sophisticated hydrodynamic models as well as detailed knowledge of the shape of the local bay and ocean floor. NOAA and other organizations use these resources to forecast water depth in major harbors, but minor ports and bays cannot justify the expense involved.

The system we’ll be developing is based on neural networks and low-cost hardware devices rather than computationally intensive hydrodynamic models and complex web infrastructure, providing economically feasible tidal forecasting to minor ports and bays.

The code used in this example is available for download.

## Tidal Wind Surge Forecasting Example

This example is based on data captured in a bay in Cape Cod, Massachusetts (Figure 1).

MATLAB code reads wind and tide data from ThingSpeak and other online data sources and runs tidal prediction and neural network algorithms that forecast the tide surge and generate an on-demand tide surge forecast plot (Figure 2).

We’ll complete the workflow for developing the system in these five steps:

- Collect historical data
- Analyze the data
- Develop predictive algorithms
- Deploy the analytics to the cloud
- Perform on-demand data analytics and visualization

## Step 1: Collecting Data

We begin by collecting historical tide-level data using ThingSpeak and a low-cost tide gauge. The tide gauge firmware is written in C code and uses open-source ThingSpeak communication libraries for embedded devices published on GitHub.

## Step 2: Analyzing Historical Data

We download the historical data from ThingSpeak to MATLAB running on the desktop, where we can inspect and clean the data to remove noise, outliers, missing or erroneous values, and other anomalies.

% Download the tide data using a custom function [tideTime,tideRangemm] = readAllMyTideData(); % Convert the range to water depth from the mud disttomud = 3449; %Measured distance from the gauge to the mud. depthFeet = (disttomud - tideRangemm) / 25.4 / 12; [tideTime,duptimeindex] = unique(tideTime); depthFeet = depthFeet(duptimeindex); depthFeetNoisy = depthFeet; % Remove outliers (could also use movmedian) depthFeet = hampel(depthFeet,121,2); depthFeet = hampel(depthFeet,11,1);

We plot the data to verify that the outliers and other anomalies have been removed (Figure 3).

Tide levels are measured relative to the *mean lower low water* (MLLW), or lowest level during the lunar day. We calculate LLW levels using the `findpeaks`

function in Signal Processing Toolbox™.

%Invert the data to use findpeaks to find the valleys negdepthFeet = -depthFeet; dur = duration(15,0,0); % Use findpeaks [lowerLowWater,lowerLowUTC] = findpeaks(negdepthFeet, tideTime,'MinPeakProminence',10/12,'MinPeakDistance',dur); lowerLowWater = -lowerLowWater; MLLWmud = mean(lowerLowWater); MLLW = 0; % Tide level in feet relative to MLLW feetMLLW = (depthFeet-MLLWmud);

We plot the lower low water values to inspect the results of the calculation (Figure 4).

We use the cleaned tidal data to forecast future tide levels with UTide, sophisticated tidal analysis functions available on File Exchange. Using UTide we can calculate the phase and amplitude of the many sinusoidal functions that form the tide at a particular location.

% Predict Tides. Downsample to avoid memory issues. sampledTide = feetMLLW(1:15:end); sampledTime = tideTime(1:15:end); MashpeeLatitude = 41.6483; % Calculate the tidal coefficients using UTide’s ut_solv function tideCoefWithSurge = ut_solv(datenum(sampledTime),sampledTide,[],... MashpeeLatitude,'auto','notrend','DiagnPlots'); % Use ut-reconstr to forecast the astronomical tides at the same times as the % measuredtides forecastedFeetMLLW = ut_reconstr(datenum(tideTime),tidecoefNoSurge); % Calculate the residual error residualFeetMLLW = feetMLLW - forecastedFeetMLLW; residualFeetMLLWhourly = interp1(datenum(tideTime),residualFeetMLLW,datenum(tideTimeHourly)); residualFeetMLLWhourly = fillgaps(residualFeetMLLWhourly);

Comparison of the astronomical forecast and the measured data reveals a residual error that varies with time (Figure 5).

We notice that the residual error is proportional to the strength of the wind: As the wind blows, the water is pushed in or out of the bay. When the wind blows hard from certain directions, it can raise or lower the water level by as much as a foot (Figure 6).

Clearly we need to take wind strength into account. In the next section we’ll describe how to forecast the wind’s effect on the water level.

## Step 3: Developing Predictive Algorithms

After trying both Fitting and NARX neural networks with various sets of input data, we determine that the NARX network works well for near-term predictions and the Fitting network is effective for both near- and long-term predictions.

Data inputs to the neural networks include historical and forecasted wind, measured tide levels, and forecasted astronomical tide levels. We design and test the neural networks using the Neural Net Fitting and Neural Net Time Series apps built into Neural Network Toolbox™.

% Use the readNOAABuoy function to read the data from NOAA buoy 44020 % readNOAABuoy(buoyNumber,years) [buoyTime,buoyWSPD,buoyWDIR,buoyPRES]=readNOAABuoy(44020,1); tideTimeHourly.TimeZone = 'UTC'; windSpeedResampled = interp1(buoyTime,buoyWSPD,tideTimeHourly,'linear','extrap'); windDirResampled = interp1(buoyTime,buoyWDIR,tideTimeHourly,'linear','extrap'); windSpeedSquared = windSpeedResampled.^2; % Prepare the data to create arrays to train a NARX neural network tideForecastHourly = ut_reconstr(datenum(tideTimeHourly),tidecoefNoSurge); stressN = (windSpeedResampled.^2).*cosd(windDirResampled); stressE = (windSpeedResampled.^2).*sind(windDirResampled); neuralInputNarx = [tideForecastHourly,stressN,stressE]; neuralOutputNarx = tideResampled - MLLWmud; neuralInputNarxTrain = neuralInputNarx(300:4950,:); neuralOutputNarxTrain = neuralOutputNarx(300:4950);

We use the neural network app to train the NARX network (Figure 7).

We use the Neural Time Series tool to train the NARX network and generate the following code:

% Solve an Autoregression Problem with External Input with a NARX Neural % Network % Script generated by Neural Time Series app X = tonndata(neuralInputNarxTrain,false,false); T = tonndata(neuralOutputNarxTrain,false,false); % Choose a Training Function trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation. % Create a Nonlinear Autoregressive Network with External Input inputDelays = 1:24; feedbackDelays = 1:24; hiddenLayerSize = 9; net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize,'open',trainFcn); % Choose Input and Feedback Pre/Post-Processing Functions net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'}; net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'}; % Prepare the Data for Training and Simulation [x,xi,ai,t] = preparets(net,X,{},T); % Setup Division of Data for Training, Validation, Testing net.divideFcn = 'dividerand'; % Divide data randomly net.divideMode = 'time'; % Divide up every sample net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; % Choose a Performance Function net.performFcn = 'mse'; % Mean Squared Error % Train the Network [net,tr] = train(net,x,t,xi,ai); netNarx = net;

We use the Neural Fitting app to similarly train a two-layer feed-forward network.

We evaluate the performance of the neural network during wind surge events by comparing the NARX network performance to the Fitting network for varying surge levels (Figure 8).

We see that the NARX multi-step ahead provides lower mean error than the Fitting network for predictions of moderate surge up to 12 hours ahead. For forecasts 24 or more hours ahead, the NARX network underperforms the Fitting network.

To forecast the tide levels with wind for the first 12 hours, we use the NARX network. Between hours 12 and 24 we linearly interpolate the results of the NARX and Fitting network, and after 24 hours we use the Fitting network.

We plot the results and see that the forecasts fit the actual levels well during wind-surge events (Figures 9 and 10).

## Step 4: Deploying Analytics to the Cloud

We write a MATLAB script using the Fitting and NARX neural network and deploy it to ThingSpeak to forecast and display measured and predicted tidal levels (Figure 11).

## Step 5: Perform On-demand Data Analytics and Visualization

MATLAB code generates the on-demand tide surge forecast plot (Figure 12).

We use MATLAB code to read the data from ThingSpeak using the `thingSpeakRead`

function and from the other sources using websave. We combine the data using the timetable function and run the neural networks to generate the forecast.

We have the option to make the plot publicly available or keep it private, and the plot can easily be embedded in a custom website.

## Summary

We have demonstrated prototyping and deploying sophisticated IoT analytics without web development or the deployment of web infrastructure. We found that the ability to use MATLAB functions and tools such as Neural Network Toolbox apps reduces the overall effort and makes it possible for a single engineer to implement a working IoT analytics system without specialized web development skills or highly specialized statistics and machine learning knowledge.

This IoT analytics workflow can easily be applied to other data analytics and machine learning applications such as predictive maintenance or power load forecasting.

Published 2016 - 93072v00