How to effectively use look ahead with regexp?

Question

pietro on 26 Jun 2017

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp

Edited: Stephen23 on 27 Jun 2017

Hi all,

I'm doing some coding with regular expressions, but there are a couple of things I can't understand. Look at the following:

1. searching the letter 'r' followed by a number:

regexp('19f/4r power shift','(?<=\d*) ?r')
ans = 
  6    12
regexp('19f/4r power shift','(?<=\d)\s?r')
ans = 
    6

Why does the '*' change the result so much? The 'r' at the 12th position is not followed by any number.

2. Searching for the word 'Reverser' that is not preceded by the words 'power' or 'powr'.

regexp('power  Reverser','(?<!powe?r) *-? *Reverser','match')
ans = 
    ' Reverser'

Reverser is preceded by the string 'power', so it shouldn't be selected.

Why do these occur?

Thanks

Best regards,

Pietro


Answer 1

Stephen23 on 26 Jun 2017

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp#answer_271972

Edited: Stephen23 on 26 Jun 2017

1. "searching the letter 'r' followed by a number." Actually, you seem to want the letter 'r' preceded by a number, not followed by one. Only the second of your regexps does this. Adding the * to the first regexp makes the digits optional, because the asterisk matches zero or more times. So the second 'r' in that short string matches your first regular expression: it is an 'r' preceded by zero spaces (permitted by the ?) and by zero digits (permitted by the *).

You could use + (match one or more) rather than * (match zero or more):

regexp('19f/4r power shift','(?<=\d+)\s?r')

but this is not really necessary: matching one digit is enough, because any run of several digits also ends in one digit.
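As a quick check of the point above, the three quantifier variants can be run side by side. This is only a minimal sketch: the variable names are illustrative, and the first two results are the ones reported in the question.

```matlab
str = '19f/4r power shift';

% \d* may match zero digits, so the lookbehind succeeds before every 'r'.
idxStar = regexp(str, '(?<=\d*) ?r');    % reported in the question: 6 12

% \d requires one digit immediately before the (optional) whitespace.
idxOne = regexp(str, '(?<=\d)\s?r');     % reported in the question: 6

% \d+ requires at least one digit, so it should select the same positions
% as \d, since any run of digits ends in a single digit.
idxPlus = regexp(str, '(?<=\d+)\s?r');

% Compare the one-digit and one-or-more-digits variants.
disp(isequal(idxOne, idxPlus))
```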

2. This is a much more subtle problem. The basic problem here is the optimism of regular expressions, and that * on the space character. The regular expression engine keeps trying new combinations until it finds some way to match, which clearly differs from how you perceive its operation (you want it to quit as soon as the lookaround fails once).

The lookaround does correctly reject the position directly after 'power', but the engine then notices that you placed an asterisk * on the space, so it is free to start the match one character later. If the optional spaces ' *' match just one space character, what precedes that one space? Another space character! Therefore the lookaround is happy (one space is not equal to 'power'), and the regular expression engine is happy because it has found a way to match. Therefore it picks this option.

Basically, what you seem to want is a pessimistic parser (one that returns no match if any one combination is preceded by 'power', even if others are not), but in reality regexp parsers are optimistic: they return a match if any one combination is a match. They skip over the one case that you are interested in because other cases better fulfill their basic operational principle: find a match, however they can.
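To make this visible, it can help to ask regexp for the start index along with the match. A small sketch using the string and pattern from the question; the variable names are illustrative only.

```matlab
str = 'power  Reverser';             % two spaces, as in the question
pat = '(?<!powe?r) *-? *Reverser';

% Outputs are returned in the order the options are listed.
[m, s] = regexp(str, pat, 'match', 'start');

% The question reports the match ' Reverser' with a single leading space.
% A start index of 7 would mean the match begins at the second space: the
% character before it is the first space, not 'power', so the negative
% lookbehind is satisfied at that position.
fprintf('match "%s" starts at index %d\n', m{1}, s(1));
```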

To see what parts of the strings are matched you should look at using a dynamic regular expression, e.g. adding:

(?@disp($1))

into your regexp and seeing how the string is parsed.
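Note that $1 refers to the first token, so the pattern needs a capturing group before the dynamic expression. The following is only a sketch: wrapping the leading spaces in parentheses, and displaying their count with length rather than the (invisible) spaces themselves, are assumptions made here for illustration and are not part of the original pattern.

```matlab
str = 'power  Reverser';

% ( *) captures the leading spaces as token 1. (?@disp(length($1))) then
% displays how many spaces were captured whenever the engine gets past
% the lookbehind and the group, which may include attempts that it later
% abandons. The command only prints; it does not alter what is matched.
pat = '(?<!powe?r)( *)(?@disp(length($1)))-? *Reverser';

m = regexp(str, pat, 'match');
```

Reading the displayed values alongside the final match gives a rough trace of which space counts the engine tried.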

Do you really need to match an unknown number of space characters?

2 Comments

pietro on 26 Jun 2017

I got it!!! Thanks a lot

Stephen23 on 27 Jun 2017

Edited: Stephen23 on 27 Jun 2017

You could move the space inside the lookaround:

>> regexp('power  Reverser','(?<!powe?r *)Reverser','match')
ans = 
     {}
>> regexp('power X Reverser','(?<!powe?r *)Reverser','match')
ans = 
    'Reverser'
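
The pattern in the question also allowed an optional hyphen between the two words. One possible extension, offered as a sketch rather than as part of the answer above, is to move the hyphen into the lookbehind as well. The last two test strings are made up for illustration.

```matlab
% Hypothetical extension: spaces and an optional hyphen inside the lookbehind.
pat = '(?<!powe?r *-? *)Reverser';

strs = {'power  Reverser', 'power X Reverser', 'powr - Reverser', 'Reverser'};

for k = 1:numel(strs)
    m = regexp(strs{k}, pat, 'match');
    % An empty result means 'Reverser' was rejected by the lookbehind.
    fprintf('%-20s -> %d match(es)\n', strs{k}, numel(m));
end
```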
