Correction of misspelled words in data source

Question

Sandeep Kapour on 14 Apr 2021

https://it.mathworks.com/matlabcentral/answers/802411-correction-of-misspelled-words-in-data-source

Commented: Walter Roberson on 15 Apr 2021

Hello,

I want to extract some data and am using the "extractAfter" function, e.g. extractAfter(data, 'Signal1'), which works very well. However, my data source (measurement data) has some problems, for example:

Signal1: 5, Signal2: 6

Signal1: 6, Signal2: 5

Sinal1: 8, Signal2: 5

Signal1: 10, Sigal2: 3

The problem is that Sinal1 and Sigal2 are not spelled correctly. Is it possible to change Sinal1 to Signal1 and Sigal2 to Signal2 automatically? My data set is very large. I am using MATLAB R2019b.

Answer 1

Cris LaPierre on 14 Apr 2021

https://it.mathworks.com/matlabcentral/answers/802411-correction-of-misspelled-words-in-data-source#answer_675326

If you are using extractAfter, your data must be text. If so, have you tried using the replace function?

data = ["Signal1: 5, Signal2: 6";"Signal1: 6, Signal2: 5";"Sinal1: 8, Signal2: 5";"Signal1: 10, Sigal2: 3"]
data = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Sinal1: 8, Signal2: 5"
    "Signal1: 10, Sigal2: 3"
replace(data,["Sinal1","Sigal2"],["Signal1","Signal2"])
ans = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Signal1: 8, Signal2: 5"
    "Signal1: 10, Signal2: 3"
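
Once the labels are consistent, the numeric values can be pulled out with the same family of text functions the question already uses. The following is only a minimal sketch, assuming every row has exactly the layout shown above ("Signal1: <value>, Signal2: <value>"); the variable names fixed, s1 and s2 are illustrative and not from the original post.

```matlab
% Sample data from the question, including the two misspelled labels.
data = ["Signal1: 5, Signal2: 6"; "Signal1: 6, Signal2: 5"; ...
        "Sinal1: 8, Signal2: 5";  "Signal1: 10, Sigal2: 3"];

% Step 1: repair the known misspellings (same call as above).
fixed = replace(data, ["Sinal1", "Sigal2"], ["Signal1", "Signal2"]);

% Step 2: extract the text between "Signal1: " and the comma,
% and the text after "Signal2: ", then convert to numbers.
% Assumption: each label occurs exactly once per row.
s1 = str2double(extractBetween(fixed, "Signal1: ", ","));
s2 = str2double(extractAfter(fixed, "Signal2: "));
```

This only works when the list of misspellings is known in advance; a spelling that is not in the list would leave that row unrepaired and the extraction for it would not find its label.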

Cris LaPierre on 14 Apr 2021

Edited: Cris LaPierre on 14 Apr 2021

I'd suggest replaceBetween with pattern matching, but unfortunately pattern was introduced in R2020b. Since you are on R2019b, try using regexprep instead.

data = ["Signal1: 5, Signal2: 6";"Signal1: 6, Signal2: 5";"Sinal1: 8, Signal2: 5";"Signal1: 10, Sigal2: 3"];
newStr = regexprep(data,'\<S\w*(?=1:|2:)',"Signal")
newStr = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Signal1: 8, Signal2: 5"
    "Signal1: 10, Signal2: 3"
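
Because this expression rewrites any word that starts with S and ends just before "1:" or "2:", it may be worth checking which rows were actually altered before trusting the result on a large data set. The sketch below does that by comparing input and output; the "Speed1" row is a hypothetical example added here to show an unintended match and is not part of the original data.

```matlab
% Original samples plus one hypothetical row with an unrelated label.
data = ["Signal1: 5, Signal2: 6"; "Sinal1: 8, Signal2: 5"; ...
        "Signal1: 10, Sigal2: 3"; "Speed1: 4, Signal2: 7"];

fixed = regexprep(data, '\<S\w*(?=1:|2:)', "Signal");

% Logical mask of rows whose text changed.
changed = fixed ~= data;

% Side-by-side view: column 1 is the original, column 2 the rewritten row.
report = [data(changed), fixed(changed)]
```

Here the "Speed1" row is rewritten to "Signal1" as well, so the pattern is only safe if no other labels in the data start with S and end in 1 or 2.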

Walter Roberson on 15 Apr 2021

Variants:

data = ["Signal1: 5, Signal2: 6";"Signal1: 6, Signal2: 5";"Sinal1: 8, Signal2: 5";"Signal1: 10, Sigal2: 3"];
newStr = regexprep(data,'\<S\w+(\d+):','Signal$1:')
newStr = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Signal1: 8, Signal2: 5"
    "Signal1: 10, Signal2: 3"
newStr = regexprep(data,'\<S\w+(?=\d+:)','Signal')
newStr = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Signal1: 8, Signal2: 5"
    "Signal1: 10, Signal2: 3"
newStr = regexprep(data, '\<S\D+', 'Signal')
newStr = 4×1 string array
    "Signal1: 5, Signal2: 6"
    "Signal1: 6, Signal2: 5"
    "Signal1: 8, Signal2: 5"
    "Signal1: 10, Signal2: 3"

The first of the variations explicitly captures a sequence of digits and appends it to the end of 'Signal'. The characters up to that point must be "word-building characters", which are the letters, the digits, and underscore. For example, 'S1gn_l2' would match but 'S1gn-l1' would not, because '-' is not "word-building".

The second of the variations stops the search when it finds digits followed by a colon, and replaces up to there. It differs from Cris's suggestion in that it accepts any digit before the colon, not just '1' or '2'. Again, the characters matched must be "word-building". In both of these variations, note that \w also matches digits and + is greedy, so for a multi-digit index such as 'Sinal12:' only the last digit survives (the result is 'Signal2:'); a lazy \w+? avoids that.

The third of the variations matches any run of non-digits after the S, stopping at the first digit. For example, 'S1gn_l2' would not be matched at all, because \D+ needs at least one non-digit right after the S, whereas 'S!gn-l2' would be happily matched. But 'Signal: 5' with the digit missing before the colon would be replaced with 'Signal5', and if the input were a continuous character vector instead of a cell array of character vectors or a string array, then \D+ would be happy to cross line boundaries to find the digits it expected. For example, 'Signal?: Nan\nSignal1: 5' would get replaced by 'Signal1: 5', swallowing the first record, because as far as \D+ is concerned, newline is a valid non-digit character. But as you can see, the code is shorter, and sometimes the variations to be matched are well-defined and you can get away with it.
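
To make the digit handling of the first variation concrete, here is a small sketch comparing the greedy form with a lazy one. The sample rows with indices 12, 10 and 3 are invented for illustration, and the lazy quantifier \w+? is a substitution suggested here, not part of the variants posted above.

```matlab
% Hypothetical rows with multi-digit signal indices (not from the question).
data = ["Sinal12: 8, Signal2: 5"; "Signal10: 1, Sigal3: 3"];

% Greedy \w+ consumes as many word characters as it can, digits included,
% and leaves only the final digit for the (\d+) token.
greedy = regexprep(data, '\<S\w+(\d+):', 'Signal$1:');

% Lazy \w+? consumes as few characters as possible, so (\d+) receives
% the complete run of digits in front of the colon.
lazy = regexprep(data, '\<S\w+?(\d+):', 'Signal$1:');

disp(greedy)   % indices 12 and 10 are truncated to 2 and 0
disp(lazy)     % indices 12 and 10 are preserved
```

For data whose indices never exceed 9, as in the question, both forms give the same result.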

Accedi per commentare.

Answer 2

Walter Roberson on 14 Apr 2021

https://it.mathworks.com/matlabcentral/answers/802411-correction-of-misspelled-words-in-data-source#answer_675331

If you are dealing with a text file, I would suggest rewriting in terms of regexp() with named tokens

S = sprintf('Signal1: 5, Signal2: 6.2\nSignal1: 6e-3, Signal2: 5\nSinal1: 8, Signal2: 5\nSignal1: 10, Sigal2: 3')
S = 
    'Signal1: 5, Signal2: 6.2
     Signal1: 6e-3, Signal2: 5
     Sinal1: 8, Signal2: 5
     Signal1: 10, Sigal2: 3'
parts = regexp(S, 'Sig?n?al1: (?<s1>[\d.eE+-]+), Sig?n?al2: (?<s2>[\d.eE+-]+)', 'names')
parts = 1×4 struct array with fields:
    s1
    s2
s1 = str2double({parts.s1})
s1 = 1×4
    5.0000    0.0060    8.0000   10.0000
s2 = str2double({parts.s2})
s2 = 1×4
    6.2000    5.0000    5.0000    3.0000

Your example only shows integer values. If that is all that is permitted, then change the

[\d.eE+-]+

to

\d+

The version I coded permits positive and negative values and decimals and exponentiation using either 'e' or 'E' ... but does not permit complex numbers.
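
If the misspellings are not limited to a dropped 'g' or 'n', one possible variation is to ignore the label spelling entirely and rely on position. This is only a sketch under a stated assumption: every record has exactly two "label: value" fields, always in the order Signal1 then Signal2. The pattern below is not from the answer above, and it would silently mislabel values if the field order ever changed.

```matlab
% Same sample text as above, with two misspelled labels.
S = sprintf(['Signal1: 5, Signal2: 6.2\nSignal1: 6e-3, Signal2: 5\n' ...
             'Sinal1: 8, Signal2: 5\nSignal1: 10, Sigal2: 3']);

% \w+ accepts any label; the first value goes to s1, the second to s2.
expr = '\w+:\s*(?<s1>[\d.eE+-]+),\s*\w+:\s*(?<s2>[\d.eE+-]+)';
parts = regexp(S, expr, 'names');

% Convert the captured text to numeric row vectors.
s1 = str2double({parts.s1});
s2 = str2double({parts.s2});
```

The trade-off is that the labels are no longer validated at all, so this suits data where the layout is trusted and only the spelling is unreliable.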
