How to speed up MEX function?

21 visualizzazioni (ultimi 30 giorni)
Yifan Lin
Yifan Lin il 1 Nov 2022
Commentato: James Tursa il 7 Nov 2022
following mex code is running too slow, but I don't know why it is and how to make it faster. Any help is greatly appreciated!
calculate_my_way.cpp
#include "mex.hpp"
#include "mexAdapter.hpp"
#include <cmath>
class MexFunction : public matlab::mex::Function {
public:
void operator()(matlab::mex::ArgumentList outputs, matlab::mex::ArgumentList inputs) {
matlab::data::TypedArray<double> var0 = inputs[0];
matlab::data::TypedArray<double> var1 = inputs[1];
matlab::data::TypedArray<double> var2 = inputs[2];
matlab::data::TypedArray<double> var3 = inputs[3];
auto var0Iter = var0.begin();
auto var1Iter = var1.begin();
auto var2Iter = var2.begin();
auto var3Iter = var3.begin();
const int numOfElements = var0.getNumberOfElements();
double buffer = 0;
for (int x = 0; x<numOfElements; x++)
{
buffer = std::sin(*var0Iter) + std::sin(*var1Iter) + std::sin(*var2Iter) + std::cos(*var3Iter);
*var0Iter = buffer;
buffer = std::sin(*var1Iter + *var2Iter) + std::cos(*var3Iter);
*var1Iter = buffer;
var0Iter++;
var1Iter++;
var2Iter++;
var3Iter++;
}
outputs[0] = std::move(var0);
outputs[1] = std::move(var1);
}
};
It's just simple calculation, but this code runs even slower than native distance function which performs a lot more complicated calculation than just a few sin+cos.
I'm using compiler that came with Visual Studio 2017. below is how I run mex and the compiler setup info.
mex -v calculate_my_way.cpp
...
Compiler location: C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
...
OPTIMFLAGS : /O2 /Oy- /DNDEBUG
and this is how I am seeing performance issues.
clear
size_test = 1e7;
var1 = zeros(size_test, 1);
var2 = zeros(size_test, 1);
var3 = zeros(size_test, 1);
var4 = zeros(size_test, 1);
cant_beat_me = @() distance(var1,var2,var3,var4);
elapsed_time = timeit(cant_beat_me);
mex_slow = @() calculate_my_way(var1,var2,var3,var4);
elapsed_time = timeit(mex_slow);
  15 Commenti
Bruno Luong
Bruno Luong il 3 Nov 2022
By curiosity I code the same calculation in C. Time is 0.24 sec; twice faster than C++ (0.5 sec) but 60% slower than MATLAB (0.147 sec).
/* mex -g -R2018a calculate_C_way.c */
#include "mex.h"
#include <math.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int i, n;
double *var0Iter, *var1Iter, *var2Iter, *var3Iter, *out0Iter, *out1Iter;
n = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
plhs[1] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
var0Iter = mxGetDoubles(prhs[0]);
var1Iter = mxGetDoubles(prhs[1]);
var2Iter = mxGetDoubles(prhs[2]);
var3Iter = mxGetDoubles(prhs[3]);
out0Iter = mxGetDoubles(plhs[0]);
out1Iter = mxGetDoubles(plhs[1]);
for (i = 0; i < n; i++) {
*out0Iter = sin(*var0Iter) + sin(*var1Iter) + sin(*var2Iter) + cos(*var3Iter);
*out1Iter = sin(*var1Iter + *var2Iter) + cos(*var3Iter);
out0Iter++;
out1Iter++;
var0Iter++;
var1Iter++;
var2Iter++;
var3Iter++;
}
}
Yifan Lin
Yifan Lin il 3 Nov 2022
@Bruno Luong, Thanks! I was also curious and wanted to give this a try, but you beat me to it! Yes, apparently C++ API is slower than C API for MATLAB. Ref: this post - Is C++ MEX API significantly slower than the C MEX API? - MATLAB Answers - MATLAB Central (mathworks.com). I've also tried openmp like you suggested, but the problem was, I was using VS2017, so I couldn't do #pragma omp simd. I'll wait for my VS2019 install to finish and try again there with the C API.

Accedi per commentare.

Risposta accettata

Bruno Luong
Bruno Luong il 3 Nov 2022
Modificato: Bruno Luong il 3 Nov 2022
Last experience, Time with C OpenMP, Intel Parallel Studio XE 2022
CIntel_elapsed_time = 0.0574 [sec]
2.5 faster than MATLAB (finally I beat MATLAB).
To have fast mex: Use C-API (not Cpp), Make it multi-thread, Select a decent compiler.
/* Compile with intel compiler
mex -O COMPFLAGS="$COMPFLAGS /MD /Qopenmp" -R2018a calculate_C_way.c */
#include "mex.h"
#include <math.h>
/* Set to 1 to Enable OPENMP
to 0 to disable it */
#define OPENMP_FLAG 1
#if OPENMP_FLAG == 1
#include <omp.h>
#endif
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int i, n;
double *var0Iter, *var1Iter, *var2Iter, *var3Iter, *out0Iter, *out1Iter;
n = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
plhs[1] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
var0Iter = mxGetDoubles(prhs[0]);
var1Iter = mxGetDoubles(prhs[1]);
var2Iter = mxGetDoubles(prhs[2]);
var3Iter = mxGetDoubles(prhs[3]);
out0Iter = mxGetDoubles(plhs[0]);
out1Iter = mxGetDoubles(plhs[1]);
#if OPENMP_FLAG==1
#pragma omp parallel for default(none) private(i) \
schedule(static) \
shared(n, out0Iter, out1Iter, var0Iter, var1Iter, var2Iter, var3Iter)
#endif
for (i = 0; i < n; i++) {
out0Iter[i] = sin(var0Iter[i]) + sin(var1Iter[i]) + sin(var2Iter[i]) + cos(var3Iter[i]);
out1Iter[i] = sin(var1Iter[i] + var2Iter[i]) + cos(var3Iter[i]);
}
}
  2 Commenti
Yifan Lin
Yifan Lin il 3 Nov 2022
@Bruno Luong Thank you very much!!!! This is exactly what I was looking for!
James Tursa
James Tursa il 7 Nov 2022
Typically, instead of this
#define OPENMP_FLAG 1
#if OPENMP_FLAG == 1
#include <omp.h>
#endif
you can use this:
#ifdef _OPENMP
#include <omp.h>
#endif
The _OPENMP macro is defined by the compiling environment when OpenMP is available.

Accedi per commentare.

Più risposte (1)

Bruno Luong
Bruno Luong il 2 Nov 2022
Modificato: Bruno Luong il 2 Nov 2022
I don't know well C++, but I have practiced quite a lot mex C.
It looks like this statement just move a bunch of data
outputs[0] = std::move(var0);
outputs[1] = std::move(var1);
ALso I wonder if your input "0, and 1 would change
*var0Iter = buffer;
...
*var1Iter = buffer;
after calling the mex, which is NOT allowed.
  2 Commenti
Yifan Lin
Yifan Lin il 2 Nov 2022
@Bruno Luong! Another one of your answer here helped me tremendously a few years back! thank you!
I've tested the var0 and var1 value, they did change. And they get moved to the output.
So, [a,b] = calculate_my_way(0,0,0,0); [a,b] will be both 1.
I have a suspicion that this slowness may be either
1. MSVC is not as good as the one Mathworks uses (probably Intel Parallel Studio)
2. the C++ Mex function calling may be problematic with some massive overhead that I don't know.
3. I am just not doing something right in my c++ code?
Bruno Luong
Bruno Luong il 2 Nov 2022
" Another one of your answer here helped me tremendously a few years back! thank you! "
Oh... realy glad to read that...

Accedi per commentare.

Categorie

Scopri di più su MATLAB Compiler in Help Center e File Exchange

Prodotti


Release

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by