Categorical Data preprocessing for Data mining
Hello friends
I have been working on the Tanzania wells state problem (with Taarifa data obtained from DrivenData) for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. If there is a faster way, that would be very helpful.
Oh, thanks
I am trying to clean out misspellings from the installer and funder columns. For the moment I am using regular expressions, but the data set is large and this is taking a long time.
For instance, to correct the misspelled variants of "world bank" I tried this expression, which is still failing:
pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
Here I was testing the expression in Atom, but it fails to correctly replace the selected words.
However, I am still wondering if there could be another "faster" way of approaching the issue!
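One possible direction, sketched here under assumptions rather than as a tested solution for the real data: normalise case and whitespace, run the regular expression only on the distinct spellings returned by unique, and map the corrected values back to the rows. This is usually cheaper than matching every row when the same spellings repeat many times. The sample values and the anchored pattern below are hypothetical; the actual variants in the installer column would need to be checked. Note also that in pat11 the trailing [^vd] must consume one more character, so the optional group ending in $ can never take part in a match; on 'world bank' only 'world b' is matched and replaced, which leaves 'world bankank'.

```matlab
% Hypothetical sample standing in for newDataClean.installer
installer = ["World Bank"; "wordl bank"; "world bnk"; "DWE"; "World Bank"; "would bank"];

% Normalise case and surrounding whitespace so the pattern can stay simple
cleaned = lower(strtrim(installer));

% Work on the distinct spellings only; idx maps each row to its spelling
[spellings, ~, idx] = unique(cleaned);

% Anchored pattern: an assumption about which variants occur in the data.
% ^ and $ force the whole value to match, so no leftover characters remain.
pat = '^wo(rd|rdl|uld|rld)\s*b\w*$';
spellings = regexprep(spellings, pat, 'world bank');

% Expand the corrected spellings back to one value per row
installer = spellings(idx);
```

If the column is stored as categorical, it would first need converting with string(...) before this sketch applies, and the same steps could be repeated for the funder column with its own patterns.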