Bug on polyfit output?

Hi, I am wondering why the results change when I call polyfit with the tilde ('~') to suppress the remaining outputs:
>> p = polyfit([1 2 3 5 10], [5 65 84 2 3],1)
p =
-4.7402 51.7087
BUT
>> [p,~,~] = polyfit([1 2 3 5 10], [5 65 84 2 3],1)
p =
-16.8925 31.8000
I thought that in both cases p should contain the same coefficients. Does anybody know why there is a difference? The ~ approach works fine with other functions, for example size:
>> [p,~] = size([11 11; 11 11;11 11])
p =
3
I use R2016a. Looking forward to your answers. Kind regards!

Accepted Answer

dpb on 8 Jun 2018
Edited: dpb on 8 Jun 2018
"Feature" or "Quality of Implementation" depending on your viewpoint...
help polyfit
...
[p,S,mu] = polyfit(x,y,n) also returns mu, which is a two-element vector with centering
and scaling values.
mu(1) is mean(x), and mu(2) is std(x). Using these values, polyfit centers x at zero
and scales it to have unit standard deviation
...
I'd never tried it before with the tilde as the third output argument so wasn't aware it (the tilde, that is) was being counted as if the argument were there, but clearly it is.
>> x=[1 2 3 5 10];
>> [mean(x) std(x)]
ans =
4.2000 3.5637
>> [p,~,mu] = polyfit([1 2 3 5 10], [5 65 84 2 3],1)
p =
-16.8925 31.8000
mu =
4.2000
3.5637
>>
It comes from the ancient history of how polyfit was initially implemented. Truthfully, having the number of output variables determine whether the independent variable is scaled was a less-than-optimal design and almost certainly wouldn't make the cut under today's ideas of software/interface design. But 30 years or so ago, when it was first implemented, ideas were far different from today's.
ADDENDUM: However, what's the purpose of using the tilde as a placeholder for trailing return values that you don't want, anyway? Any output variables beyond those requested are automagically dropped; the only purpose of the tilde is to skip one (or more) outputs positioned before one that is desired.
Of course, this is a case where, because of the unusual design in which the result depends on the number of outputs, you have to provide the third output argument if you want the scaling.
I've found it somewhat surprising that TMW hasn't introduced a more capable and modern version into the base product rather than restricting it to the toolboxes (which I find somewhat cumbersome, albeit more flexible).
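To make the point concrete, here is a minimal sketch using the data from the question. It assumes only the relation quoted from the help text, xhat = (x - mu(1))/mu(2), and suggests that the two coefficient vectors describe the same straight line; the name pBack is purely illustrative.

```matlab
x = [1 2 3 5 10];
y = [5 65 84 2 3];

p1 = polyfit(x, y, 1);            % fit in the original x
[p2, ~, mu] = polyfit(x, y, 1);   % fit in xhat = (x - mu(1))/mu(2)

% y = p2(1)*xhat + p2(2), expanded back in terms of x:
pBack = [p2(1)/mu(2), p2(2) - p2(1)*mu(1)/mu(2)];

% Difference between the two descriptions; should be at round-off level
disp(max(abs(pBack - p1)))
```

So nothing is lost by either form; the coefficients are simply expressed against a different x axis.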

19 Comments

dpb on 8 Jun 2018
Edited: dpb on 8 Jun 2018
Following up, it appears there's no way for the called function to determine whether an output position is a tilde, so there's really no way to fix the behavior (if it were desired to change it anyway), nor to branch early and skip calculating outputs that are just going to be thrown away. All one gets is the count, and a tilde placeholder counts just the same as a real output variable in that position.
Seems like a worthwhile enhancement request...not for this particular problem, per se, but as a general programming feature.
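A small sketch of what the callee can see, assuming a hypothetical helper saved as countOutputs.m (the name is made up for illustration). Every requested output simply receives the value of nargout:

```matlab
function varargout = countOutputs()
% countOutputs  Hypothetical helper: each requested output receives nargout.
%
%   a = countOutputs()            % a is 1
%   [a, ~] = countOutputs()       % a is 2, the tilde is counted
%   [a, ~, ~] = countOutputs()    % a is 3
    n = nargout;                            % outputs requested by the caller
    varargout = repmat({n}, 1, max(n, 1));  % fill every requested slot with n
end
```

The function has no way to tell the second and third calls apart from calls with real variable names in those positions.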
Jan on 9 Jun 2018
+1. Exactly. This is the documented behavior.
I do not think that a function should be able to determine whether an output in the caller is a tilde. The tilde is meant to be used as a placeholder. The intention is to let the function work as if the output argument existed. Delivering the information that it is only a placeholder would then be somewhat indirect. It is easier to use a further input argument that defines what is wanted as output.
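One way to read this suggestion is a thin wrapper in which an input flag, not the number of outputs, selects the scaled fit. The sketch below is hypothetical: fitPoly and doScale are made-up names, not part of MATLAB.

```matlab
function [p, mu] = fitPoly(x, y, n, doScale)
% fitPoly  Hypothetical wrapper around polyfit: scaling is chosen by an
% input flag (doScale), not by how many outputs the caller requests.
    if nargin < 4
        doScale = false;                 % default: fit in the original x
    end
    if doScale
        [p, ~, mu] = polyfit(x, y, n);   % centered and scaled fit
    else
        p  = polyfit(x, y, n);           % plain fit
        mu = [0; 1];                     % identity transform: xhat equals x
    end
end
```

With this, polyval(p, (x - mu(1))/mu(2)) evaluates the fit in either mode, since mu = [0; 1] leaves x unchanged.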
dpb on 9 Jun 2018
Fail to see any reason why not...if the output is going to be trashed anyway, the function might as well have the opportunity to not compute it and save the effort just as it could if the corresponding output argument weren't requested at all.
John D'Errico on 9 Jun 2018
But IF your intention is to use that form, whether or not you will trash the output, polyfit still needs to work that way. The THREE output version of polyfit needs to be consistent.
There may even be some cases where a user wants the first of those three outputs, from the three argument form.
I would consider this a bug if it did not work exactly as it does.
dpb on 9 Jun 2018
Edited: dpb on 9 Jun 2018
My proposal wouldn't change anything about how it currently works; just give the writer of the function the facility to decide what it (the function) should do in that case.
In no way would I propose breaking compatibility with the existing interface of polyfit in particular (no matter how badly designed it is) nor of any other function; it would only be for new code that chose to make use of the additional information.
John D'Errico on 9 Jun 2018
Edited: John D'Errico on 9 Jun 2018
But I would argue it very much changes the way the code behaves on an existing case. And this is something TMW strives mightily to avoid.
Currently, polyfit has one behavior for the single return argument call, and another for the triple return. If you choose not to use the second and third returned arguments, that is not a problem, just your call, and your lack of need for those other arguments. Maybe you already know those arguments, so have no need to recompute them. And there are surely people out there who use it like that. The existing (and so the expected) behavior is to return the three-arg form whenever three args are returned, and it has done so for multiple years now.
But it seems it would be strongly inconsistent if polyfit decided to use the one argument form if it somehow knew that even though you called it with the three arg form, you were not going to use two of those arguments. To me, this is just pleading for people to send in bug reports. It will be difficult to explain in the documentation why one form is used over another. Highly confusing to users ... "yes, we return one form or another, depending on many variables, along with a call to rand on alternate Tuesdays." Well, yes, that is a bit strong. :)
Remember that, if you change the behavior of supplied code between releases, there will still be many people using an older release. So even if someone is not now using polyfit in any specific way in an older release, there may well be someone who will make that choice in the future.
In the end, you can feel free to want TMW to change polyfit as you wish. But I would predict that will never happen, not without an edict from way above, and a very good rationale for that action.
dpb on 9 Jun 2018
Edited: dpb on 9 Jun 2018
You misunderstand what I'm suggesting, John.
I'm saying the enhancement should be to add a feature that allows the coder to determine if the positional argument is a tilde or not and determine what to do on that basis, NOT to arbitrarily ignore it as if it weren't there.
The implementation (*) would NOT change the behavior of nargout so that it leaves tildes out of its count; it would still behave just as it does.
While I think (and have thought from the first time I discovered it lo! those many years ago) that the interface to polyfit is a very poor choice for how to indicate that the independent variable should be standardized, as noted I'd never suggest breaking it or introducing other compatibility issues.
With the new feature in existence, there would be no change to polyfit at all; it is the aberrant case in which the third positional output argument changes the definition of the first, so nothing can be gained computationally: it must compute the statistics with which to standardize. Other functions that have auxiliary secondary outputs that aren't needed and may be expensive to compute could make use of the new feature if desired, but that would have no bearing on existing code written before the feature was introduced.
I'd love to see TMW introduce a better yet simple tool similar to poly[fit|val] into the base product, but that doesn't seem likely to happen. Of course, this suggestion is unlikely in the extreme, too, probably... :) There are any number of other warts I'd put higher up on the wish list, but this does seem a potentially useful facility.
(*) One possible (not necessarily best and certainly not the only) implementation would be to add a second optional input argument to nargout (the present one, pretty rarely used I would imagine, is a function handle or character vector used to query the number of outputs in a function's definition). The new argument would be a flag requesting a second output: a logical array, by position, indicating whether the corresponding output argument was specified or was a tilde. The first output would still be the count, and the second array would have length equal to the count.
Called with zero arguments, or with one argument that is a function handle or string, the result would be the same as at present; with one argument that is not a function handle or function-name string, or with two arguments, the other information would be returned as well.
Image Analyst on 9 Jun 2018
But if an input that flagged whether some outputs were tilde were allowed, then you'd have to know to send in that input. And if you knew that, then you'd know that you're using tildes for outputs, so why wouldn't you just then get rid of the tildes?
I don't see the use case for using tildes and then telling the function to ignore the tildes that I'm using? Just don't use them and avoid the problem.
It's like putting another mailbox in front of your house and then calling the post office to tell them to ignore the second mailbox. If you don't want the second mailbox, just don't put one out there, and save yourself a call to the post office.
dpb on 9 Jun 2018
@IA -- I don't envision implementation as "another input" but an enhanced nargout or alternate to it that is just another tool in the toolset.
The tilde itself wouldn't be ignored; it may be important in the position count, as it is in polyfit. The use case would be that if the trailing output didn't require the preceding one to be computed, those calculations could be skipped.
Granted, it's probably a small population of cases and I'm not arguing it is "way up there" on the list of needs but could see the occasional use.
The better route is to not design functions that work like polyfit does.
John D'Errico on 9 Jun 2018
Edited: John D'Errico on 9 Jun 2018
I do agree that I've never really liked the idea that polyfit returns a different model, depending on the number of outputs. That seems to confuse people. It is what it is though.
dpb on 9 Jun 2018
" I've never really liked the idea that polyfit returns a different model, depending on the number of outputs."
+235.7 :) But, my distaste aside, compatibility is more important by far, yes. But, yes, also, it is confusing.
Which other commands are sensitive to the number of outputs?
  • size :
x = rand(2,3,4);
s = size(x) % s=[2 3 4]
[s, ~] = size(x) % s=2
[s, t] = size(x) % s=2, t=12
[s, t, ~] = size(x) % s=2, t=3
  • ode45: With 1 output, the struct SOL is returned, not t . In addition it is ignored if tspan is a vector (bug report submitted, problem tested with R2009a and R2016b only).
  • Further commands?
dpb on 9 Jun 2018
I don't have any to add otomh; probably there are some others.
That the list is short is indicative it's not a mainstream coding paradigm (fortunately :) ).
SIZE() of course is at least discernible as to what one gets and why; none of those should be a major surprise.
The ODE solvers share part of the same heritage: their implementation came "way back when", before there was any such idea as OOP implemented, and creating a struct for the deval function to "package" the solution was about the only implementation choice at the time it was introduced. (When, precisely, struct came into being I don't know exactly; somewhere between 1992 (the printed doc doesn't include a release number) and Release 5 (~1999). In R5, ODEmn doesn't yet have the SOL output even though struct does exist, so it's of some vintage but not totally ancient; it was, at the time, the only real option.)
I'd assert that polyfit, which goes even farther back, is just a poor choice of interface design because, as John notes, it's a model difference based on an output variable specification rather than on an input. That is unlike SIZE(), which merely rearranges or suppresses the same data, and unlike ODEmn, where only the presentation of the results differs and which is a fundamentally different animal. And it could have been done differently with existing facilities.
But, again, while it was an unfortunate choice, history is history and it shouldn't be broken (I think it should be deprecated and replaced, myself, but it's again not one of those items that's significant enough to be worth it in the larger scheme of what should be worked on).
$0.02, imo, ymmv, etc., etc., etc., ..., of course :)
Guillaume on 9 Jun 2018
find is another function that changes behaviour depending on the number of outputs requested (linear vs 2d indexing).
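A brief sketch of the find case, using a made-up 2-by-2 matrix. Note that replacing the second output with a tilde still selects the subscript form:

```matlab
A = [0 1; 1 0];

k = find(A);          % one output: linear indices, here [2; 3]
[r, c] = find(A);     % two outputs: row and column subscripts
[r2, ~] = find(A);    % tilde still counts: r2 holds row subscripts

disp(isequal(sub2ind(size(A), r, c), k))   % both forms address the same elements
disp(isequal(r2, r))                       % r2 is not the linear index vector
```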
Walter Roberson on 9 Jun 2018
solve() returns a struct if it has a single output.
There are a number of routines that if not given any output plot something, but if given an output instead compute and return something without plotting. hist() is an example.
Thanks @dpb and the others for your detailed explanations. I was not aware this 'special feature' is a result of MATLAB programming history from a few decades ago.
Just some comments:
- I used the tilde for function calls that have more than one output, just to remember that the function has more outputs than the requested ones. Maybe this is just a (bad) habit. In my opinion this somehow improves the readability of the code. When someone else (usually an occasional, non-expert MATLAB user just like me) works with my code, the tilde shows them that the function offers more than the requested output.
- I was aware that there are functions whose outputs depend on how they are called, like size or the others mentioned in the comments above. However, up to now I had not encountered a function that changes the output values depending on nargout, as polyfit() does. I expected the regression coefficients in p to always be the same. My mistake was to compare p from the plain polyfit call
p1 = polyfit(x,y,1)
with p from the polyfit call with tildes (in this case p refers to the altered x data):
[p2,~,~] = polyfit(x,y,1)
When you call polyval with p2, you have to feed it the mu-scaled x data:
% x, y: just some made-up data
[~,~,mu]=polyfit(x,y,1);
plot(x,y,'ok',x,polyval(p1,x),'-r',x,polyval(p2,(x-mu(1))/mu(2)),'--b')
I hope I summed up correctly and this helps someone else someday. Have a nice day!
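As a small sketch of the summary above, using the data from the question: polyval is also documented to accept mu as a fourth input, which should be equivalent to scaling x by hand. The yFit names are illustrative only.

```matlab
x = [1 2 3 5 10];
y = [5 65 84 2 3];

p1 = polyfit(x, y, 1);             % coefficients for the original x
[p2, ~, mu] = polyfit(x, y, 1);    % coefficients for the centered/scaled x

yFit1 = polyval(p1, x);                     % plain evaluation
yFit2 = polyval(p2, (x - mu(1))/mu(2));     % scaling done by hand
yFit3 = polyval(p2, x, [], mu);             % scaling done inside polyval

% All three should agree to round-off
disp(max(abs(yFit1 - yFit2)))
disp(max(abs(yFit2 - yFit3)))
```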
dpb on 10 Jun 2018
Edited: dpb on 13 Jun 2018
Ah! A reason for using tilde I hadn't thought of...some sense to that altho I suspect you'll grow tired of it with experience. :)
Speaking of which and the previous discussion, anybody recall when ~ was introduced? It's a relatively new feature, although I don't recall just when and was too lazy to go try to look it up. With the newer machine, I no longer have any releases earlier than R2014b installed.
ADDENDUM
The precis for polyfit is accurate; other than the inconsistency in model based on whether the output argument is/isn't given (to normalize should be a user input imo even granted that one somewhere needs the normalizing statistics returned), my complaint extends beyond polyfit to polyval that then requires the user to do the explicit standardization instead of optional input.
As noted, all in all the pair really should be updated/modernized but without all the excess baggage of the objects in the toolboxen so they're still "lean 'n mean" and easy to interface which is all that is wanted/needed more often than not.
Jan on 10 Jun 2018
The ~ was introduced in R2009b.
Guillaume on 10 Jun 2018
"I used the tilde for function calls that have more than one output, just to remember it has more outputs than the requested ones"
For some functions this is not a good idea and will slow down your code. If you don't request the output at all, some functions will not go through the process of calculating the extra outputs. By using ~ you force the function to calculate the output, which you then discard.
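A minimal sketch of how a function might skip such work, assuming a hypothetical function saved as meanAndSpread.m (the name and the choice of std as the "expensive" part are for illustration only):

```matlab
function [m, s] = meanAndSpread(x)
% meanAndSpread  Hypothetical example: compute the second output only on request.
    m = mean(x);
    if nargout > 1
        % Reached for [m, s] = meanAndSpread(x) and also for
        % [m, ~] = meanAndSpread(x), because a tilde still counts
        % towards nargout.
        s = std(x);
    end
end
```

Calling m = meanAndSpread(x) skips the std computation entirely, whereas [m, ~] = meanAndSpread(x) performs it and then discards the result.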

Asked: 8 Jun 2018
Edited: dpb on 13 Jun 2018
