Main Content

Test a Deep Neural Network with Captured Data to Detect WLAN Router Impersonation

This example shows how to train a radio frequency (RF) fingerprinting convolutional neural network (CNN) with captured data. You capture wireless local area network (WLAN) beacon frames from real routers using a software defined radio (SDR). You program a second SDR to transmit unknown beacon frames and capture them. You train the CNN using these captured signals. You then program a software-defined radio (SDR) as a router impersonator that transmits beacon signals with the media access control (MAC) address of one of the known routers and use the CNN to identify it as an impersonator.

For more information on router impersonation and validation of the network design with simulated data, see the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation example.

Train with Captured Data

Collect a dataset of 802.11a/g/n/ac OFDM non-high throughput (non-HT) beacon frames from real WLAN routers. As described in the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation example, only the legacy long training field (L-LTF) field present in preambles are used as training units in order to avoid any data dependency.

In this example, the data was collected using the scenario depicted in the following figure. The observer is a stationary ADALM-PLUTO radio. Known router data was collected as follows:

  1. Set the observer's center frequency based on the WLAN channel used by the routers

  2. Receive a beacon frame

  3. Extract the L-LTF signal

  4. Decode the MAC address to use as the label

  5. Save the L-LTF signal together with its label

  6. Repeat steps 2-5 to collect numFramesPerRouter frames from numKnownRouters routers.

Unknown router beacon frames are simulated using a mobile ADALM-PLUTO radio as a transmitter. This radio repeatedly transmits beacon frames with a random MAC address. Unknown router data was collected as follows:

  1. Generate beacon frames with a random MAC address

  2. Start transmitting the beacon frames repeatedly using the ADALM-PLUTO radio

  3. Collect NUMFRAMES beacon frames

  4. Extract the L-LTF signal

  5. Save the L-LTF frames with label "Unknown"

  6. Move the radio to another location

  7. Repeat steps 3-6 to collect data from NUMLOC locations

This combined dataset of known and unknown routers is used to train the same DL model as in the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation example.

This example downloads training data and trained network from https://www.mathworks.com/supportfiles/spc/RFFingerprinting/RFFingerprintingCapturedData.tar.gz. If you do not have an Internet connection, you can download the file manually on a computer that is connected to the Internet and save to the same directory as the current example files. For privacy reasons, MAC addresses have been anonymized in the downloaded data. To replicate the results of this example, capture your own data as described in Appendix: Known and Unknown Router Data Collection.

rfFingerprintingDownloadData ('captured')
Starting download of data files from:
	https://www.mathworks.com/supportfiles/spc/RFFingerprinting/RFFingerprintingCapturedData.tar.gz
Download and extracting files done

To run this example quickly, use the downloaded pretrained network. To train the network on your computer, choose the "Train network now" option (i.e. set trainNow to true). Training this network takes about 5 minutes with an Nvidia(R) Titan Xp GPU. Training on a CPU may result in a very long training duration.

trainNow = false;  %#ok<*UNRCH> 

This example uses data from four known routers. The dataset contains 3600 frames per router, where 90% is used as training frames and 10% is used as test frames.

numKnownRouters = 4;
numFramesPerRouter = 3600;
numTrainingFramesPerRouter = numFramesPerRouter * 0.9;
numTestFramesPerRouter = numFramesPerRouter * 0.1;
frameLength = 160;

Preprocess Known and Unknown Router Data

Separate collected complex baseband data into its in-phase and quadrature components and reshape it into a 2 x frameLength x 1 x numFramesPerRouter*numKnownRouters matrix. Repeat the same process for the unknown router data. The following code uses previously collected and pre-processed data. To use your own data, first collect data as described in Appendix: Known and Unknown Router Data Collection. Copy the new data files named rfFingerprintingCapturedDataUser.mat and rfFingerprintingCapturedUnknownFramesUser.mat to the same directory as this example. Then update the load commands to load these files.

if trainNow
  % Load known router data
  load('rfFingerprintingCapturedData.mat')
  
  % Create label vectors
  yTrain = repelem(MACAddresses, numTrainingFramesPerRouter);
  yTest = repelem(MACAddresses, numTestFramesPerRouter);
  
  % Separate between I and Q
  numTrainingSamples = numTrainingFramesPerRouter*numKnownRouters*frameLength;
  xTrainingFrames = xTrainingFrames(1:numTrainingSamples,1);
  xTrainingFrames = [real(xTrainingFrames), imag(xTrainingFrames)];
  numTestSamples = numTestFramesPerRouter*numKnownRouters*frameLength;
  xTestFrames = xTestFrames(1:numTestSamples,1);
  xTestFrames = [real(xTestFrames), imag(xTestFrames)];
  
  % Reshape dataset into an 2 x frameLength x 1 x numTrainingFramesPerRouter*numKnownRouters matrix
  xTrainingFrames = permute(...
    reshape(xTrainingFrames,[frameLength,numTrainingFramesPerRouter*numKnownRouters, 2, 1]),...
    [1 3 4 2]);
  
  % Reshape dataset into an 2 x frameLength x 1 x numTestFramesPerRouter*numKnownRouters matrix
  xTestFrames = permute(...
    reshape(xTestFrames,[frameLength,numTestFramesPerRouter*numKnownRouters, 2, 1]),...
    [1 3 4 2]);
  
  % Load unknown router data
  load('rfFingerprintingCapturedUnknownFrames.mat')
  
  % Number of training units
  numUnknownFrames = size(unknownFrames, 4);
  
  % Split data into 90% training and 10% test
  numUnknownTrainingFrames = floor(numUnknownFrames*0.9);
  numUnknownTest = numUnknownFrames - numUnknownTrainingFrames;
  
  % Add ADALM-PLUTO data into training and test datasets
  xTrainingFrames(:,:,:,(1:numUnknownTrainingFrames) + numTrainingFramesPerRouter*numKnownRouters) ...
    = unknownFrames(:,:,:, 1:numUnknownTrainingFrames);
  xTestFrames(:,:,:,(1:numUnknownTest) + numTestFramesPerRouter*numKnownRouters) ...
    = unknownFrames(:,:,:, (1:numUnknownTest) + numUnknownTrainingFrames);
  
  % Shuffle data
  vr = randperm(numKnownRouters*numTrainingFramesPerRouter+numUnknownTrainingFrames);
  xTrainingFrames = xTrainingFrames(:,:,:,vr);
  
  % Add "unknown" label and shuffle
  yTrain = [yTrain, repmat("Unknown", [1, numUnknownTrainingFrames])];
  yTrain = categorical(yTrain(vr));
  
  yTest = [yTest, repmat("Unknown", [1, numUnknownTest])];
  yTest = categorical(yTest);
end

Train the CNN

Use the same NN architecture and training options as in the training with simulated data example.

poolSize = [2 1];
strideSize = [2 1];
% Create network architecture
layers = [
  imageInputLayer([frameLength 2 1], 'Normalization', 'none', 'Name', 'Input Layer')
  
  convolution2dLayer([7 1], 50, 'Padding', [1 0], 'Name', 'CNN1')
  batchNormalizationLayer('Name', 'BN1')
  leakyReluLayer('Name', 'LeakyReLu1')
  maxPooling2dLayer(poolSize, 'Stride', strideSize, 'Name', 'MaxPool1')
  
  convolution2dLayer([7 2], 50, 'Padding', [1 0], 'Name', 'CNN2')
  batchNormalizationLayer('Name', 'BN2')
  leakyReluLayer('Name', 'LeakyReLu2')
  maxPooling2dLayer(poolSize, 'Stride', strideSize, 'Name', 'MaxPool2')
  
  fullyConnectedLayer(256, 'Name', 'FC1')
  leakyReluLayer('Name', 'LeakyReLu3')
  dropoutLayer(0.5, 'Name', 'DropOut1')
  
  fullyConnectedLayer(80, 'Name', 'FC2')
  leakyReluLayer('Name', 'LeakyReLu4')
  dropoutLayer(0.5, 'Name', 'DropOut2')
  
  fullyConnectedLayer(numKnownRouters+1, 'Name', 'FC3')
  softmaxLayer('Name', 'SoftMax')
  classificationLayer('Name', 'Output')
  ];

Configure the training options to use ADAM optimizer with a mini-batch size of 128. Use test frames for validation since optimization of hyperparameters were done in [1].

By default, ExecutionEnvironment is set to 'auto', which uses a GPU for training if one is available. Otherwise, trainNetwork (Deep Learning Toolbox) uses the CPU for training. To explicitly set the execution environment, set ExecutionEnvironment to one of 'cpu', 'gpu', 'multi-gpu', or 'parallel'. Choosing 'cpu' may result in a very long training duration.

if trainNow
  miniBatchSize = 128;
  
  % Training options
  options = trainingOptions('adam', ...
    'MaxEpochs',30, ...
    'ValidationData',{xTestFrames, yTest}, ...
    'ValidationFrequency',floor((numTrainingFramesPerRouter*numKnownRouters + numUnknownTrainingFrames)/miniBatchSize/3), ...
    'Verbose',false, ...
    'L2Regularization', 0.0001, ...
    'InitialLearnRate', 0.0001, ...
    'MiniBatchSize', miniBatchSize, ...
    'ValidationPatience', 5, ...
    'Plots','training-progress', ...
    'Shuffle', 'every-epoch');
  
  % Train the network
  capturedDataNet = trainNetwork(xTrainingFrames, yTrain, layers, options);
else
  load('rfFingerprintingCapturedDataTrainedNN.mat','capturedDataNet','xTestFrames','yTest','MACAddresses')
end

The following plot shows the training progress of the network run on a computer with a single Nvidia Titan Xp GPU, where the network converged in about 10 epochs to almost 100% accuracy. The final accuracy of the network is 100%.

Generate the confusion matrix.

figure
yTestPred = classify(capturedDataNet,xTestFrames);
cm = confusionchart(yTest, yTestPred);
cm.Title = 'Confusion Matrix for Test Data';
cm.RowSummary = 'row-normalized';

Test with SDR

Test the performance of the trained network on the class “Unknown”. Generate beacon frames with MAC addresses of the known routers and one unknown router. Transmit these frames using an ADALM-PLUTO radio and receive using another ADALM-PLUTO radio. Since the channel and RF impairments created between these two radios are different than the ones created between the real routers and the observer, the neural network should classify all of the received signals as "Unknown". If the received MAC address is a known one, then the system declares the source as a router impersonator. If the received MAC address is an unknown one, then the system declares the source as an unknown router. To perform this test, you need two ADALM-PLUTO radios for transmission and reception. Also, you need to install Communication Toolbox Support Package for ADALM-PLUTO Radio.

Waveform Generation

Generate a transmission waveform consisting of beacon frames with different MAC addresses. The transmitter repeatedly transmits these WLAN frames. The receiver captures the WLAN frames and determines if it is a router impersonator using the received MAC address and RF fingerprint detected by the trained NN.

chanBW='CBW20';     % Channel Bandwidth
osf = 2;            % Oversampling Factor
frameLength=160;    % Frame Length in samples
% Create Beacon frame-body configuration object
frameBodyConfig = wlanMACManagementConfig;

% Create Beacon frame configuration object
beaconFrameConfig = wlanMACFrameConfig('FrameType', 'Beacon');
beaconFrameConfig.ManagementConfig = frameBodyConfig;

% Create interpolation and decimation objects
decimator = dsp.FIRDecimator('DecimationFactor',osf);

% Save known MAC addresses
knownMACAddresses = MACAddresses;
MACAddressesToSimulate = [MACAddresses, "ABCDEFABCDEF"];

% Create WLAN waveform with the MAC addresses of known routers and an
% unknown router
txWaveform = zeros(1540,5);
for i = 1:length(MACAddressesToSimulate)
  
  % Set MAC Address
  beaconFrameConfig.Address2 = MACAddressesToSimulate(i);
  
  % Generate Beacon frame bits
  [beacon, mpduLength] = wlanMACFrame(beaconFrameConfig, 'OutputFormat', 'bits');
  
  nonHTcfg = wlanNonHTConfig(...
    'ChannelBandwidth', chanBW,...
    "MCS", 1,...
    "PSDULength", mpduLength);
  txWaveform(:,i) = [wlanWaveformGenerator(beacon, nonHTcfg); zeros(20,1)];
end

txWaveform = txWaveform(:);

% Get center frequency for channel 153 in 5 GHz band
fc = helperWLANChannelFrequency(153, 5);
fs = wlanSampleRate(nonHTcfg);

txSig  = resample(txWaveform,osf,1);

% Samples per frame in Burst Mode
spf = length(txSig)/length(MACAddressesToSimulate);

runSDRSection = false;
if helperIsPlutoSDRInstalled()  
  radios = findPlutoRadio();
  if length(radios) >= 2
    runSDRSection = true;
  else
    disp("Two ADALM-PLUTO radios are needed. Skipping SDR test.")
  end
else
    disp("Communications Toolbox Support Package for Analog Devices ADALM-PLUTO Radio not found.")
    disp("Click Add-Ons in the Home tab of the MATLAB toolstrip to install the support package.")
    disp("Skipping SDR test.")
end
Communications Toolbox Support Package for Analog Devices ADALM-PLUTO Radio not found.
Click Add-Ons in the Home tab of the MATLAB toolstrip to install the support package.
Skipping SDR test.

if runSDRSection
  % Set up PlutoSDR transmitter
  deviceNameSDR = 'Pluto';
  txGain = 0;
  txSDR = sdrtx(deviceNameSDR);
  txSDR.RadioID = 'usb:0';
  txSDR.BasebandSampleRate = fs*osf;
  txSDR.CenterFrequency = fc;
  txSDR.Gain = txGain;
  
  % Set up PlutoSDR Receiver
  rxSDR = sdrrx(deviceNameSDR);
  rxSDR.RadioID = 'usb:1';
  rxSDR.BasebandSampleRate = txSDR.BasebandSampleRate;
  rxSDR.CenterFrequency = txSDR.CenterFrequency;
  rxSDR.GainSource ='Manual';
  rxSDR.Gain = 30;
  rxSDR.OutputDataType = 'double';
  rxSDR.EnableBurstMode=true;
  rxSDR.NumFramesInBurst = 20;
  rxSDR.SamplesPerFrame = osf*spf;
end

L-LTF for Classification

The L-LTF sequence present in each beacon frame preamble is used as input units to the NN. rfFingerprintingNonHTFrontEnd System object is used to detect the WLAN packets, perform synchronization tasks and, extract the L-LTF sequences and data. In addition, the MAC address is also decoded. In addition, the data is pre-processed and classified using the trained network.

if runSDRSection
  numLLTF = 20;       % Number of L-LTF captured for Testing
  
  rxFrontEnd = rfFingerprintingNonHTFrontEnd('ChannelBandwidth', 'CBW20');
  
  disp("The known MAC addresses are:");
  disp(knownMACAddresses)
  
  % Set PlutoSDR to transmit repeatedly
  disp('Starting transmitter')
  transmitRepeat(txSDR, txSig);
  
  % Captured Frames counter
  numCapturedFrames = 0;
  
  disp('Starting receiver')
  % Loop until numLLTF frames are collected
  while numCapturedFrames < numLLTF
    
    % Receive data using PlutoSDR
    rxSig = rxSDR();
    rxSig = decimator(rxSig);
    
    % Perform front-end processing and payload buffering
    [payloadFull, cfgNonHT, rxNonHTData, chanEst, noiseVar, LLTF] = ...
      rxFrontEnd(rxSig);
    
    if payloadFull
      
      % Recover payload bits
      recBits = wlanNonHTDataRecover(rxNonHTData, chanEst, ...
        noiseVar, cfgNonHT, 'EqualizationMethod', 'ZF');
      
      % Decode and evaluate recovered bits
      [mpduCfg, ~, success] = wlanMPDUDecode(recBits, cfgNonHT);
      
      if success == wlanMACDecodeStatus.Success
        % Update counter
        numCapturedFrames = numCapturedFrames+1;
        
        % Create real-valued input
        LLTF = [real(LLTF), imag(LLTF)];
        LLTF = permute(reshape(LLTF,frameLength ,[] , 2, 1), [1 3 4 2]);
        
        ypred = classify(capturedDataNet, LLTF);
        
        if sum(contains(knownMACAddresses, mpduCfg.Address2)) ~= 0
          if categorical(convertCharsToStrings(mpduCfg.Address2))~=ypred
            disp(strcat("MAC Address ", mpduCfg.Address2," is known, fingerprint mismatch, ROUTER IMPERSONATOR DETECTED"))
          else
            disp(strcat("MAC Address ", mpduCfg.Address2," is known, fingerprint match"))
          end
        else
          disp(strcat("MAC Address ", mpduCfg.Address2," is not recognized, unknown device"));
        end
      end
    end
  end
  release(txSDR)
end

Further Exploration

Capture data from your own routers as explained in Appendix: Known and Unknown Router Data Collection, train the neural network with this data, and test the performance of the network.

Appendix: Helper Functions

Appendix: Known and Unknown Router Data Collection

Use rfFingerprintingRouterDataCollection to collect data from known (i.e. trusted) routers. This function extracts L-LTF signals present in 802.11a/g/n/ac OFDM Non-HT beacons frames transmitted from commercial 802.11 hardware. For more information see the IEEE® 802.11™ WLAN - OFDM Beacon Receiver with USRP® Hardware (Communications Toolbox Support Package for USRP Radio) example. L-LTF signals and corresponding router MAC addresses are used to train the RF fingerprinting neural network. This method works best if the routers and their antennas are fixed and hard to move unintentionally. For example, in most office environments, routers are mounted on the ceiling. Follow these steps:

  1. Connect an ADALM-PLUTO radio to your PC to use as the observer radio.

  2. Place the radio in a central location where it can receive signals from as many routers as possible. Fix the radio so that it does not move. If possible, place the observer radio on the ceiling or high on a wall.

  3. Determine the channel number of the routers. You can use a Wi-Fi analyzer app on your phone to find out the channel numbers.

  4. Start data collection by running "rfFingerprintingRouterDataCollection(channel)" where channel is the Wi-Fi channel number

  5. Monitor the "max(abs(LLTF))" value. If it is above 1.2 or smaller than 0.01, adjust the gain of the receiver using the GAIN input of rfFingerprintingRouterDataCollection function.

Use the helper functions rfFingerprintingUnknownClassDataCollectionTx and rfFingerprintingUnknownClassDataCollectionRx to collect data from unknown routers. These functions set two ADALM-PLUTO radios to transmit and receive L-LTF signals. The received signals are combined with the known router signals to train the neural network. You need two ADALM-PLUTO radios, preferably connected to two separate PCs. Follow these steps:

  1. Connect an ADALM-PLUTO radio to a stationary PC to act as the unknown router

  2. Start transmissions by running "rfFingerprintingUnknownClassDataCollectionTx"

  3. Connect another ADALM-PLUTO radio to a mobile PC to act as the observer

  4. Start data collection by running "rfFingerprintingUnknownClassDataCollectionRx". This function by default collects 200 frames per location. Each location represents a different unknown router.

  5. When the function instructs you to move to a new location, move the observer radio to a new location. By default, this function collects data from 10 locations.

  6. If the observer does not receive any beacons or it rarely receives beacons, move the observer closer to the transmitter.

  7. Once the data collection is done, call "release(sdrTransmitter)" in the transmitting radio's MATLAB session.

Selected Bibliography

[1] K. Sankhe, M. Belgiovine, F. Zhou, S. Riyaz, S. Ioannidis and K. Chowdhury, "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks," IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Paris, France, 2019, pp. 370-378.