TUTORIAL: Why Variables Should Not Be Named Dynamically (eval)

Summary:
Dynamically accessing variable names can negatively impact the readability of your code and can cause it to run slower by preventing MATLAB from optimizing it as well as it could if you used alternate techniques. The most common alternative is to use simple and efficient indexing.
Explanation:
Sometimes beginners (and some self-taught professors) think it would be a good idea to dynamically create or access variable names. The variables are often named something like these:
  • matrix1, matrix2, matrix3, matrix4, ...
  • test_20kmh, test_50kmh, test_80kmh, ...
  • nameA, nameB, nameC, nameD,...
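Good reasons why dynamic variable names should be avoided (each is covered in its own answer below):
  • Slow
  • Security Risk
  • Difficult to Work With
  • Buggy
  • Obfuscated Code Intent
  • Code Helper Tools do not Work
  • Magically Making Variables Appear in a Workspace is Risky
  • Confuses Data with Code
There are much better alternatives to accessing dynamic variable names (also covered in the answers below):
  • Indexing into Cell Array or ND-Array
  • load into a Structure, not into the Workspace
  • Non-Scalar Structure (with Indexing)
  • Use more Efficient Ways to Pass Variables between Workspaces
  • save the Fields of a Scalar Structure
  • Use a table or timetable Array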
Note that avoiding eval (and assignin, etc.) is not some esoteric MATLAB restriction: it also applies to many other programming languages (see the answer "Other Languages: do not use eval!" below).
MATLAB Documentation:
If you are not interested in reading the answers below then at least read MATLAB's own documentation on this topic, Alternatives to the eval Function, which states: "A frequent use of the eval function is to create sets of variables such as A1, A2, ..., An, but this approach does not use the array processing power of MATLAB and is not recommended. The preferred method is to store related data in a single array." Data in a single array can be accessed very efficiently using indexing.
Note that all of these problems and disadvantages also apply to the functions load (without an output variable), assignin, evalin, and evalc, and the MATLAB documentation explicitly recommends that you "Avoid functions such as eval, evalc, evalin, and feval(fname)".
The official MATLAB blogs explain why eval should be avoided, the better alternatives to eval, and clearly recommend against magically creating variables. Using eval comes out at position number one on this list of Top 10 MATLAB Code Practices That Make Me Cry. Experienced MATLAB users recommend avoiding using eval for trivial code, and have written extensively on this topic.
  11 Comments
Sivaprakasam M on 5 Apr 2023
Helped me avoid creating a dynamic variable name and push the data into cell arrays.
Thanks Stephen
Stephen23 on 5 Apr 2023
Edited: Stephen23 on 5 Apr 2023
@Sivaprakasam M: this page already includes explanations of several common approaches to avoiding dynamic variable names, with links to many many examples: search for the text "better alternatives". If reading this page does not give you a solution, then please ask a new question with sufficient explanation that someone can help you.


Accepted Answer

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 11 Dec 2017
  2 Comments
Cris Luengo on 10 Jul 2018
The 2nd link, to the newsreader, is dead. (Long live the newsreader!)
Stephen23 on 15 Jul 2018
Edited: Stephen23 on 16 Oct 2019
@Cris Luengo: you can still access that newsreader thread here:
The relevant text is:
MATLAB is not lying to you.
When you run your function, MATLAB needs to determine what each identifier
you use is as part of the process of parsing the function. At that time,
there's no indication in your code that debug should be a variable; however,
there is a function named debug. Therefore, MATLAB decides that the
instances of debug in the code should be calls to that function. When the
code is actually executed, a variable named debug is created, and WHICH
reflects that fact -- but at that point, it's too late for MATLAB to "change
its mind" and it tries to call the debug function on the last line. DEBUG
is a script file, though, and so you correctly receive an error.
This is why you SHOULD NOT "poof" variables into the workspace at runtime,
whether via EVALIN, ASSIGNIN, EVAL, or LOAD.
--
Steve Lord


More Answers (19)

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 15 Feb 2022
Slow
The MATLAB documentation Alternatives to the eval Function explains that code that uses eval is slower because "MATLAB® compiles code the first time you run it to enhance performance for future runs. However, because code in an eval statement can change at run time, it is not compiled".
MATLAB uses JIT acceleration tools that analyze code as it is being executed and optimize it to run more efficiently. When eval is used the JIT optimizations are not effective, because every string has to be compiled and run again on every single iteration. This makes loops with eval very slow. It is also why both creating and accessing variables with dynamic names are slow.
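A rough way to see this for yourself is to time the same loop written both ways. The sketch below is only an illustration: the variable names v1, v2, ... and the loop count n are made up for the demonstration, and the measured times will vary with MATLAB release and machine.

```matlab
n = 500;                              % hypothetical loop count

% Version 1: dynamic variable names v1, v2, ..., created with eval.
tic
for k = 1:n
    eval(sprintf('v%d = k^2;', k));   % each string is parsed at run time
end
tEval = toc;

% Version 2: one preallocated array and ordinary indexing.
tic
v = zeros(1,n);
for k = 1:n
    v(k) = k^2;
end
tIndex = toc;

fprintf('eval: %.4f s, indexing: %.4f s\n', tEval, tIndex);
```

The indexed version also leaves a single array v in the workspace rather than 500 separate variables.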
Even the eval hidden inside of str2num can slow down code:
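As a small sketch of that point, the code below converts the same text with str2num (which calls eval internally) and with str2double (which does not), and times both with timeit. The input text is a made-up example and no particular timings are claimed here.

```matlab
txt = '3.14159';            % hypothetical input text

a = str2num(txt);           % uses eval internally, so it would also run code in txt
b = str2double(txt);        % parses the text as a number only, no eval

% timeit calls each function several times and returns a typical run time (s)
tNum    = timeit(@() str2num(txt));
tDouble = timeit(@() str2double(txt));
fprintf('str2num: %.2e s, str2double: %.2e s\n', tNum, tDouble);
```

For plain scalar numbers str2double is a drop-in replacement; it returns NaN for text that is not a number instead of evaluating it.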

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 24 Nov 2022
Security Risk
eval will evaluate any string at all, no matter what commands it contains. Does that sound secure to you? This string command might be malicious or simply a mistake, but it can do anything at all to your computer. Would you run code which could do anything at all to your computer, without knowing what it was about to do?
For some users the surprising answer is "yes please!".
For example, try running this (taken from Jos' answer here):
eval(char('fkur*)Ykvj"GXCN"{qw"pgxgt"mpqy"yjcv"jcrrgpu0"Kv"eqwnf"jcxg"hqtocvvgf"{qwt"jctfftkxg"000)+'-2))
Did you really run it on your computer even though you had no idea what it would do? Every time code gets a user input and evaluates it, it gives that user the ability to run anything at all. Does that sound secure to you?
  3 Comments
Adam Danz on 22 Oct 2019
Edited: Adam Danz on 22 Oct 2019
Hint: the char(str-2) is a Caesar cipher that produces a command that is executed by eval().
Steven Lord on 23 Oct 2019
Running the char command is safe, that will just create a char vector you can read. Remove the eval() around the char command and it won't execute the command stored in the char vector.
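Following that suggestion, here is a minimal sketch of inspecting the text without executing it. The names str and cmd are just illustrative; the character vector is the one from the answer above.

```matlab
% Decode the obfuscated text and look at it, but do NOT pass it to eval.
str = 'fkur*)Ykvj"GXCN"{qw"pgxgt"mpqy"yjcv"jcrrgpu0"Kv"eqwnf"jcxg"hqtocvvgf"{qwt"jctfftkxg"000)+';
cmd = char(str - 2);   % shift every character code down by 2
disp(cmd)              % shows the hidden command as plain text
```

Here the hidden command turns out to be a harmless call to disp, but you only know that because you looked before running it.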



Stephen23 on 26 Sep 2016
Edited: Stephen23 on 25 Nov 2022
Difficult to Work With
Many beginners come here with questions that are basically some version of "I have lots of numbered variables but I cannot figure out how to do this simple operation...", or "my code is very slow/complex/buggy... how can I make it better?":
Even advocates of eval get confused by it, fail to make it work properly, and can't even figure out why, as these two examples clearly show:
Why can't they figure out why it does not work?:
  • Totally obfuscated code due to indirect code evaluation.
  • More complex than it needs to be.
  • The code helper tools do not work.
  • Syntax highlighting does not work.
  • Static code checking does not work.
  • No useful error messages, etc. etc.
Writing code is hard. Don't make it even harder by turning off the tools that check and help improve your code.

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 25 Nov 2022
Buggy
Using eval makes it really hard to track down bugs, because it obfuscates the code and disables lots of code helper tools. Why would you even want to use a tool that makes it harder to debug and fix your code?
Here are some examples to illustrate how what should have been simple operations become very difficult to debug because of the choice to use eval:
Code that generates variable names dynamically based on imported data or user inputs is also susceptible to the names reaching the name length limit:
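The length limit can be checked directly. In this sketch the label is an invented stand-in for text that comes from a file or a user; namelengthmax and isvarname are standard MATLAB functions.

```matlab
maxLen = namelengthmax;                    % maximum length of a MATLAB identifier
label  = repmat('a', 1, maxLen + 10);      % hypothetical label that is too long

tooLong = ~isvarname(label);               % true: label cannot be a variable name

% Keeping the label as data avoids the limit entirely.
S.label = label;
S.value = 42;
```

Stored as data, the label can be any length and contain any characters, which is not true of a variable name.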
This quote sums up debugging eval based code: "I've never even attempted to use it myself, but it seems it would create unreadable, undebuggable code. If you can't read it and can't fix it what good is it?" Note that eval's equally evil siblings evalc, evalin and assignin also make code slow and buggy:

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 7 Nov 2021
Obfuscated Code Intent
What does this code do?:
x1 = [119,101,98,40,39,104,116,116,112,58,47,47,119,119,119];
x2 = [46,121,111,117,116,117,98,101,46,99,111,109,47,119,97];
x3 = [116,99,104,63,118,61,100,81,119,52,119,57,87,103,88];
x4 = [99,81,39,44,39,45,98,114,111,119,115,101,114,39,41];
eval(char([x1,x2,x3,x4]))
Unfortunately eval makes it easy to write code which is hard to understand: it is not clear what it does, or why. If you ran that code without knowing what it does, you should know that it could have deleted all of your data, or sent emails to everyone on your contact list, or downloaded anything at all from the internet, or worse...
Because eval easily hides the intent of the code many beginners end up writing code that is very hard to follow and understand. This makes the code buggy, and also harder to debug! See these examples:
Properly written code is clear and understandable. Clear and understandable code is easier to write, to bug-fix, and to maintain. Code is read more times than it is written, so never underestimate the importance of writing code that is clear and understandable: write code comments, write a help section, use consistent formatting and indentation, etc.
  4 Comments
Walter Roberson on 24 Jul 2019
Edited: Walter Roberson on 24 Jul 2019
I just encountered someone using num2str() on a computed variable name, in order to have the effect of an eval() without using eval() directly in the code. This is, needless to say, obscure intent.
Stephen23 on 7 Nov 2021
Edited: Stephen23 on 7 Nov 2021
@Walter Roberson: should that example be STR2NUM? (which contains EVAL inside)



Stephen23 on 26 Sep 2016
Edited: Stephen23 on 25 Nov 2022
Code Helper Tools do not Work
The MATLAB editor contains many tools that advanced users continuously make use of, and beginners should particularly appreciate when learning MATLAB. However none of these tools work with code hidden inside eval:
Note that these do not work when using eval, evalc, etc. to magically create or access variable names. Would you want to disable the tools that help you to write functioning code? Here are examples of how eval hides code errors and makes it hard to debug code:
  1 Comment
Tom Hawkins on 7 Feb 2019
On this topic, it would be great if the Code Analyzer and checkcode would actually flag a warning when eval etc. are used. Perhaps that would cut down the number of questions about them on here?



Stephen23 on 26 Sep 2016
Edited: Stephen23 on 24 Nov 2022
Alternative: Indexing into Cell Array or ND-Array
Oftentimes when a user wants to use eval they are trying to create numbered variables, which are effectively an index joined onto a name. It is usually better to turn that pseudo-index into a real index: MATLAB is fast and efficient when working with indices, and using indices will make code much much simpler than anything involving dynamic variable names:
Using ND-arrays is a particularly efficient way of handling data: many operations can be performed on complete arrays (known as code vectorization), it is easy to get data in and out of ND-arrays, and they reduce the chance of bugs:
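As a minimal sketch, assuming three equally sized matrices that might otherwise have been called matrix1, matrix2 and matrix3 (the values are invented stand-ins for real data):

```matlab
% Store the matrices as pages of one 2x2x3 array.
A = zeros(2,2,3);             % preallocate
for k = 1:3
    A(:,:,k) = k * eye(2);    % stand-in for the k-th data set
end

second   = A(:,:,2);          % what would have been "matrix2"
pageMean = mean(A, 3);        % vectorized: mean over all pages at once
```

The loop index k replaces the number that would have been pasted onto the variable name, and mean(A,3) processes every "variable" in one call.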
Or simply put the data into the cells of a cell array:
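A cell array is the natural choice when the pieces have different sizes or types. The sketch below uses random matrices of different heights as hypothetical data:

```matlab
C = cell(1,4);                        % preallocate one cell per data set
for k = 1:numel(C)
    C{k} = rand(k, 2);                % stand-in data, a different size each time
end

third = C{3};                         % access by index, no eval needed
rows  = cellfun(@(m) size(m,1), C);   % apply an operation to every cell
```

Because the data sit in one container, numel(C) tells you how many data sets there are, which is awkward to find out with numbered variables.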
And some real-world examples of where indexing is much simpler than eval:

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 15 Feb 2022

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 25 Nov 2022
Alternative: load into a Structure, not into the Workspace
The MATLAB documentation explains this in detail:
In almost all cases where data is imported programmatically (i.e. not just playing around in the command window) it is advisable to load data into an output argument (which is a structure if the file is a .mat file):
S = load(...);
The fields of the structure can be accessed directly, e.g:
S.X
S.Y
or by using dynamic fieldnames. Note that this is the inverse of saving the fields of a scalar structure.
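A short sketch of the dynamic fieldname syntax, using a hand-made structure in place of the output of load (the field names X and Y and their values are invented for the example):

```matlab
S = struct('X', 1:3, 'Y', 'hello');   % stands in for S = load(...)

names = fieldnames(S);                % cell array of the field names
for k = 1:numel(names)
    value = S.(names{k});             % dynamic fieldname: the name is held in a variable
    fprintf('%s has class %s\n', names{k}, class(value));
end
```

The name is ordinary data here (a char vector in a cell array), so no code has to be generated or evaluated to reach the value.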
It is important to note that (contrary to what some users seem to think) it is actually easier and much more robust to save and load data within a loop when the variable names in the .mat files do not change, as having to process different variable names in each file actually makes saving/loading the file data more complex, inefficient, and fragile.
Summary: when using a loop, keep all variable names the same!
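The following sketch shows the idea end to end. The file names run1.mat, run2.mat, ... and the variable name data are assumptions made up for the example, and the files are written to a temporary folder so the code can run anywhere.

```matlab
folder = tempname;                     % temporary folder for the demo files
mkdir(folder);

% Save three files that all use the SAME variable name.
for k = 1:3
    data = k * [1,2,3];                % hypothetical measurement
    save(fullfile(folder, sprintf('run%d.mat', k)), 'data');
end

% Load each file into a structure and collect the results by index.
C = cell(1,3);
for k = 1:3
    S = load(fullfile(folder, sprintf('run%d.mat', k)));
    C{k} = S.data;                     % same field name every time
end

rmdir(folder, 's');                    % clean up the demo files
```

The loop body never needs to know which file it is working on: only the file name changes, not the code.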
Here are real-world examples of loading into variables:
And finally Steven Lord's comment on load-ing straight into the workspace:

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 7 Nov 2021
Other Languages: do not use eval!
In case you think that avoiding dynamic variable names is just some "weird MATLAB thing", here is the same discussion for some other programming languages, all advising "DO NOT create dynamic variable names":
Some languages might use, require, or otherwise encourage dynamic variable names: if that is how they work efficiently, then so be it. But what is efficient in one language means nothing about the same approach in other languages... if you wish to use MATLAB efficiently, make your code easier to work with, and write in a way that other MATLAB users will appreciate, then that means learning how to use MATLAB features and tools:

Stephen23 on 26 Sep 2016
Edited: Stephen23 on 7 Nov 2021
Alternative: Non-Scalar Structure (with Indexing)
Using a non-scalar structure is much simpler than trying to access dynamic variable names. Here are some examples:
A very neat example is using the output from DIR to store imported file data:
S = dir(..);
for k = 1:numel(S)
    S(k).data = readmatrix(S(k).name); % store each file's data next to its name
end
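The same pattern works without any files. In this sketch the field names speed and data and all the values are hypothetical, chosen to mirror variables like test_20kmh, test_50kmh and test_80kmh:

```matlab
S = struct('speed', {20, 50, 80});      % 1x3 structure array, one element per test
for k = 1:numel(S)
    S(k).data = S(k).speed * [1,2,3];   % stand-in for imported data
end

allSpeeds = [S.speed];                  % comma-separated list collected into a vector
```

Each element keeps its meta-data (the speed) and its data together, and the whole set can be looped over or indexed like any other array.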

Stephen23 on 19 Jul 2017
Edited: Stephen23 on 25 Nov 2023
Magically Making Variables Appear in a Workspace is Risky
This leads to many subtle bugs that are extremely difficult to track down, if they are even noticed at all!
1) For a start variables of the same name will be overwritten without warning. Even just a spelling mistake or adding extra variables to a MAT file can change the behavior of your code, and because it depends on the data files that you are working with, can be very difficult to track down.
2) Importing multiple files in a loop can ruin your data: consider what will happen if your code processes a sequence of MAT files, which you think all contain the same variables. But one of them contains different variables (yeah, I know, your data files are perfect... sure). Consider what happens in badly-written, fragile code that simply LOADs directly into the workspace: it will happily process the data from the previously loaded file, without giving you any warning or notification that your data are now from the wrong file. Processing continues using the wrong data.
3) There is another serious yet subtle problem, which is caused by the MATLAB parser finding alternative functions/objects/... and calling those instead of using the magically-created variable: basically if the variable does not exist then the parser does its best to find something that matches where the name is called/used later... and it might just find something! The documentation also explains this:
Some example threads discussing this topic:
Or in some cases the parser might not find anything:
The solution is simple: do not magically "poof" variables into existence: Always load into a structure, and never create variable names dynamically.
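As a sketch of why loading into a structure is more robust, the code below writes a file that lacks the expected variable and then checks for it explicitly. The variable names other and data are assumptions for the example.

```matlab
file  = [tempname, '.mat'];
other = 1;
save(file, 'other');               % this file does NOT contain a variable "data"

S = load(file);                    % nothing is added to the workspace
hasData = isfield(S, 'data');      % explicit check for the expected variable
if ~hasData
    fprintf('File has no variable "data": skipping it.\n');
end

delete(file);                      % clean up the demo file
```

With a plain load the missing variable would go unnoticed and any "data" left over from an earlier file would be processed again; with the structure the problem is visible immediately.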
  6 Comments
Guillaume on 21 Aug 2019
assignin and evalc also belong to that list. I can't think of any others right now.
The reparsing that you describe is most likely what was happening in past versions. But at least for load, it's clear that MathWorks are moving away from that design and are not interested in supporting implicitly defined variables. I.e.
function myfunc()
    load data.mat % creates a variable x
    disp(x(:));
end
no longer works (or will no longer work). See R2017a release notes.
I'm fine with that. If the optimiser has to detect whether it can optimise or must leave the code to be reparsed later, that detection will have an impact on performance in both cases.
Walter Roberson on 21 Aug 2019
There is currently some name re-resolution being done whenever the MATLAB path changes, including when you cd() -- which is a reason to avoid cd() in code.
In the past, I put some thought into the kinds of structures you would have to put in place in order to handle that situation efficiently. I did not follow it through, though; just some thought experiments.



Stephen23 on 26 Sep 2016
Edited: Stephen23 on 25 Nov 2022
Confuses Data with Code
The inclusion of data and meta-data within variable names (e.g. naming a variable with the user's input, the name of a test subject, or (very commonly) adding an index onto a variable name) is a subtle (but closely related) problem, and it should definitely be avoided. This quote from Image Analyst explains the problem succinctly: "When you start writing code to generate variable names, you're no longer writing code to process your data, you're writing code to generate the code that will process your data, and the increased complexity of this metaprogramming is always an added risk (of bugs, security issues, etc.)"
Read these discussions for an explanation of why it is a poor practice to put data and meta-data in variable names:
In many cases that meta-data is just a de-facto index, i.e. a value that prescribes the order of the data. But in that case the de-facto index should be turned into a much more efficient real numeric index:
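A minimal sketch of keeping the meta-data as data, assuming test speeds like those in test_20kmh, test_50kmh and test_80kmh (the measurements are random stand-ins):

```matlab
speeds  = [20, 50, 80];                          % km/h: the meta-data, stored as data
results = {rand(1,5), rand(1,5), rand(1,5)};     % hypothetical measurements, same order

idx      = find(speeds == 50);                   % look up by value, not by name
selected = results{idx};                         % the data for the 50 km/h test
```

Because the speeds are numbers they can be sorted, compared and plotted, none of which is possible while they are buried inside variable names.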
  2 Comments
Samuel Gray on 26 Jan 2022
...everything is data, the question is what do you do with it
Walter Roberson on 26 Jan 2022
MATLAB supports targeting systems with Harvard architecture -- systems where the instructions and the data live in different address spaces, so the instructions are not data on such systems.



Stephen23 on 26 Sep 2016
Edited: Stephen23 on 7 Nov 2021
Alternative: Use more Efficient Ways to Pass Variables between Workspaces (applies to evalin, assignin, etc)
Use nested functions, or pass arguments, or use any of the other efficient ways to pass data between workspaces:
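The simplest of these is passing arguments. In the sketch below scaleBy is a hypothetical helper, written as an anonymous function only so that the example runs as a plain script; a regular function file works the same way.

```matlab
% Data goes in as input arguments and comes back as an output argument.
scaleBy = @(values, factor) values .* factor;

raw    = [1, 2, 3];
scaled = scaleBy(raw, 10);     % no assignin or evalin needed
```

The caller decides what the result is called, so the helper cannot overwrite anything in the caller's workspace by accident.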

Stephen23 on 30 Nov 2017
Edited: Stephen23 on 24 Nov 2022
PS: eval is Not Faulty:
Some users apparently think that eval (and friends) must be faulty and should be removed from MATLAB altogether. They ask "if eval is so broken, why has it not been removed?"... but it is important to understand that the problem is caused by magically accessing variable names regardless of what tool or operation is used, and that eval (or assignin, or evalin, or load without an output argument, etc.) is simply being used inappropriately because there are much better methods available (better in the sense of being faster, neater, simpler, more robust, etc.). Read these discussions for good examples of this confusion:
It is important to note that any feature of a language can be used inefficiently or in an inappropriate way, not just eval, and this is not something that can be controlled by the language itself. For example, it is common that someone might solve something with slow loops and without preallocating the output arrays: this does not mean that for loops are "faulty" and need to be removed from MATLAB!
It is up to the programmer to write efficient code.

Stephen23 on 17 Apr 2019
Edited: Stephen23 on 17 Apr 2019
Alternative: save the Fields of a Scalar Structure
The save command has an option for saving the fields of a scalar structure as separate variables in a .mat file. For example, given a scalar structure:
S.A = 1;
S.B = [2,3];
this will save variables A and B in the .mat file:
save('myfile.mat','-struct','S')
This is the inverse function of loading into a structure. Some threads showing how this can be used:
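A quick round trip illustrates the relationship. This sketch writes to a temporary file (the file name is generated by tempname, not taken from the answer) and loads it straight back into a structure:

```matlab
S.A = 1;
S.B = [2,3];

file = [tempname, '.mat'];
save(file, '-struct', 'S');    % writes variables A and B, not S itself
T = load(file);                % reads A and B back as fields of T
delete(file);                  % clean up the demo file

same = isequal(S, T);          % the structure survives the round trip
```

No variable other than S and T is ever created, however many fields the structure has.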

Steven Lord on 30 Apr 2019
Edited: Steven Lord on 30 Apr 2019
Alternative: Use a table or timetable Array
table (introduced in release R2013b) and timetable (introduced in release R2016b) arrays allow you to store data with row and/or column names with which you can access the data. For example, if you create a table with variables named Age, Gender, Height, Weight, and Smoker and rows named with the last names of the patients:
load patients
patients = table(Age,Gender,Height,Weight,Smoker,...
    'RowNames',LastName);
you can ask for all the ages of the first five patients:
patients(1:5, 'Age')
or all the data for the patients with last names Smith or Jones:
patients({'Smith', 'Jones'}, :)
You can also add new variables to the table, either by hard-coding the name of the variable:
% Indicate if patients are greater than five and a half feet tall
patients.veryTall = patients.Height > 66
or using variable names stored in char or string variables. The code sample below creates new variables named over40 and under35 in the patients table using different indexing techniques.
newname1 = 'over40';
patients.(newname1) = patients.Age > 40;
newname2 = 'under35';
patients{:, newname2} = patients.Age < 35;
patients(1:10, :) % Show the first ten rows
The code sample below selects either Height or Weight and shows the selected variable for the fifth through tenth patients using dynamic names.
if rand > 0.5
    selectedVariable = 'Height';
else
    selectedVariable = 'Weight';
end
patients.(selectedVariable)(5:10)
See this documentation page for more information about techniques you can use to access and manipulate data in a table or timetable array. This documentation page contains information about accessing data in a timetable using the time information associated with the rows.
  1 Comment
Stephen23 on 30 Apr 2019
Edited: Stephen23 on 7 Nov 2021
Simpler and more robust way to generate a table from that .mat file:
S = load('patients.mat');
T = struct2table(S,'RowNames',S.LastName);



Econstudent on 17 Jan 2017
You discuss at length why we shouldn't do A, B or C and you also comment on how we could access certain objects.
Now, suppose we need to import a few time series -- but I can only import those series one at a time. The intention behind creating a sequence of variables inside a loop is often to store those time series in a distinct object every time. That is, you want to assign the data to a different object every time and do it considerably more than once...
What other choice do you have besides creating objects within your loop?
  18 Comments
Walter Roberson on 20 Feb 2022
"Do I really need to explain to you what you got wrong, very wrong, in this summary, or just pay you to go back and read it in detail and then comment? I thought that's what happened when I paid for my updated Matlab license, but I see that I was wrong."
You need to explain to me what I "got wrong, very wrong".
But first you need to look at any significant C program intended for multiple operating systems, and look at the sheer amount of #ifdef in the headers.
Question for you: Which vendor(s) have certified a Unix operating system for Intel x86 or x64 architecture within the last 15 years? (Since, after all, if you are working on a system that is not standards compliant, then you need to configure around non-standard behaviour...)
Samuel Gray on 20 Feb 2022
Edited: Samuel Gray on 20 Feb 2022
"But first you need to look at any significant C program intended for multiple operating systems, and look at the sheer amount of #ifdef in the headers. "
I think this is only a problem because we don't know exactly which #defines need to be included in the program, and what values they need to have, and where to get them.
So there's a lack of standardization in C/C++. You could always write programs in a "higher" language like Python, and let the developers of Python deal with those "C-level" problems. I'm sure that MS would help you with them if you'd just adopt VS 2019+ as your IDE, no problem.
I personally agree, C is one of the most frustrating languages to program in for that simple reason. C++ is like trying to program in C after taking acid. Let's make it harder by just throwing the code at a library full of header-files that are platform-specific. But part of that is because of Microsoft being Microsoft and walking into a part of people playing a game and changing the rules and changing all of the pieces and calling it the same game. They are only relevant to OS when they are in control. This is why you have entire segments of the x86 market that refuse to go anywhere near Windows.
But the BSD crowd is just as responsible for this for their simple failure to adhere to their own development guidelines and make makefiles that spell out for even the greenest C noob, exactly how to make and install their programs. And of course even that depends on the exact build environment. The one thing that MS is very good at is glomming onto OS without making OS-style mistakes. They do however often make MS-style mistakes and that is one reason why we still use Matlab. Without Matlab you'd have a bunch of little camps of people programming in different languages with little in common. And the great thing about having Matlab on Linux now is that hopefully you guys will abandon COM. And the Matlab features that rely on COM and likewise on Windows. Much of which could be written in VBA just as easily. COM is just a desperate attempt to duplicate functionality that's been in Unix for decades and now that Canonical has partnered with MS and has Ubuntu bundled with Windows, why would you even need COM in the first place? You can bundle Matlab with Ubuntu and there you go.
You just have to figure-out how to charge for Matlab ;)
(that's what the toolkits are for!)
...ultimately the BSD crowd is elitist and doesn't really care if someone finds it difficult to develop in their personal favorite flavor of Unix. That just means more work and more income for them. You want something easy to develop in, there's always Python, right? Programming in Python is a great idea until you find out that Python also is dependent on library-file versions. And certifying code? LOL



John Dzielski on 19 Feb 2022
I have a question about a specific use case for the eval command. It is one I use frequently, and I would like to understand if and why it is bad. I have data sets from a piece of instrumentation where either the filename or the variable name often includes some identifying string and some sort of numbering. When I write analysis scripts, I will typically assume the data is stored in a variable called something like 'data'. I will use a command like eval(['data=',namedVariable]) to assign the value to 'data' and then run the script. I will often use the reverse of the argument to copy the processed data back to the same variable and save it to a file. These scripts are often LiveScripts and the plot titles often derive from 'namedVariable', so a function call is not a useful solution here. What is wrong with doing this? (If anything).
  14 Comments
John Dzielski on 20 Feb 2022
Except for having tried to implement it. I think I see the alternatives to each of the cases that I was using eval in. Some unfamiliar syntax that I'll need to read the documentation about, but thank you.
Rik on 20 Feb 2022
There isn't any shame in not knowing a specific syntax that avoids the need for eval. Your mistake is in repeating 'I don't think there is a way to avoid eval here'. As we have now demonstrated several times, there usually is.
Please post the next one as a separate question so others can find the solution as well. (and don't assume the only way is eval)
