Handling Undesirable Characters in Numeric Columns When Reading a CSV File

2 visualizzazioni (ultimi 30 giorni)
Dear all,
I need to read a .csv file which has some numeric columns that contain undesirable characters, so when I import it using readtable, the lines with these characters ( [u.]) are showed as NAN. Is it possible to remove it before? Or maybe read it as char, then remove and convert to double?
Here some lines of the file:
  3 Commenti
Stephen23
Stephen23 il 31 Gen 2024
Modificato: Stephen23 il 31 Gen 2024
Perhaps you could define square brackets as a comment.
In any case, please upload a sample data file by clicking the paperclip button.

Accedi per commentare.

Risposta accettata

Stephen23
Stephen23 il 31 Gen 2024
Modificato: Stephen23 il 31 Gen 2024
You should NOT make changes to the original data file.
fnm = 'p1.csv';
opt = detectImportOptions(fnm, 'FileType','delimitedtext',...
'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve');
opt = setvartype(opt,'double');
opt = setvartype(opt,'MoveStop','datetime');
opt = setvartype(opt,'Operador','string');
idx = strcmp('double',opt.VariableTypes);
opt = setvaropts(opt,idx,'TrimNonNumeric',true);
tbl = readtable(fnm,opt)
tbl = 3389×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ _________ _________ ___________ 01-Oct-2023 05:25:57 328 1 32 32 37 18 19 0 0 73496 3286.9 4866.2 1.6556e+05 5418.2 10321 0 "lcosta" 328 01-Oct-2023 05:30:57 262 1 0 40 3 -2 5 0 0 1.6556e+05 5420.9 94.5 1.5267e+05 10489 8663.6 0 "lcosta" 262 01-Oct-2023 05:37:58 291 1 0 40 2 1 1 0 0 1.5267e+05 10748 93.88 73397 2735.3 6752.4 0 "Jborges" 291 01-Oct-2023 05:43:41 191 1 32 40 45 21 24 0 0 73398 2569.3 6604 1.5291e+05 10936 8771.6 0 "Jborges" 191 01-Oct-2023 05:47:40 206 1 0 40 2 1 2 0 0 1.5291e+05 10936 94.88 73833 11400 6766.8 0 "Jborges" 206 01-Oct-2023 05:52:57 197 1 32 40 45 22 24 0 0 73833 11358 6605.4 1.5276e+05 11357 8774.2 0 "Jborges" 197 01-Oct-2023 06:00:46 440 1 0 32 2 1 2 0 0 1.5276e+05 11356 94.13 32782 3252.8 4939.9 0 "Jborges" 440 01-Oct-2023 06:06:00 302 1 32 32 37 18 19 0 0 32781 3253.7 4749.9 1.6562e+05 7309 10294 0 "Jborges" 302 01-Oct-2023 06:16:12 556 1 0 40 3 -3 6 0 0 1.6562e+05 7305.4 95.5 74087 20369 7048.1 0 "Jborges" 556 01-Oct-2023 06:26:37 590 1 32 40 45 23 22 0 0 74087 20367 6639.9 1.5332e+05 10992 8760.6 0 "Jborges" 590 01-Oct-2023 06:49:39 860 1 0 40 2 1 1 0 0 1.5332e+05 10992 96.13 63814 19000 2607 0 "Giba" 860 01-Oct-2023 06:56:38 396 1 32 40 45 22 24 0 0 63814 19000 2445.6 1.5356e+05 11426 8684.8 0 "Giba" 396 01-Oct-2023 07:00:33 211 1 0 40 2 1 1 0 0 1.5356e+05 11425 94.5 63662 3824.2 4518.4 0 "Giba" 211 01-Oct-2023 07:04:27 212 1 32 40 45 21 24 0 0 63662 3825.1 4356.5 1.5315e+05 11151 8623.2 0 "Giba" 212 01-Oct-2023 07:08:34 212 1 0 40 2 1 1 0 0 1.5315e+05 11150 99.88 63873 19470 4531.9 0 "Giba" 212 01-Oct-2023 07:22:08 693 1 32 40 45 23 23 0 0 63873 19719 4344.2 1.6813e+05 2547.8 10187 0 "Giba" 693
tbl = convertvars(tbl,@isstring,@(s)regexprep(s,'\s+\[.*','')) % optional
tbl = 3389×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ _________ _________ ___________ 01-Oct-2023 05:25:57 328 1 32 32 37 18 19 0 0 73496 3286.9 4866.2 1.6556e+05 5418.2 10321 0 "lcosta" 328 01-Oct-2023 05:30:57 262 1 0 40 3 -2 5 0 0 1.6556e+05 5420.9 94.5 1.5267e+05 10489 8663.6 0 "lcosta" 262 01-Oct-2023 05:37:58 291 1 0 40 2 1 1 0 0 1.5267e+05 10748 93.88 73397 2735.3 6752.4 0 "Jborges" 291 01-Oct-2023 05:43:41 191 1 32 40 45 21 24 0 0 73398 2569.3 6604 1.5291e+05 10936 8771.6 0 "Jborges" 191 01-Oct-2023 05:47:40 206 1 0 40 2 1 2 0 0 1.5291e+05 10936 94.88 73833 11400 6766.8 0 "Jborges" 206 01-Oct-2023 05:52:57 197 1 32 40 45 22 24 0 0 73833 11358 6605.4 1.5276e+05 11357 8774.2 0 "Jborges" 197 01-Oct-2023 06:00:46 440 1 0 32 2 1 2 0 0 1.5276e+05 11356 94.13 32782 3252.8 4939.9 0 "Jborges" 440 01-Oct-2023 06:06:00 302 1 32 32 37 18 19 0 0 32781 3253.7 4749.9 1.6562e+05 7309 10294 0 "Jborges" 302 01-Oct-2023 06:16:12 556 1 0 40 3 -3 6 0 0 1.6562e+05 7305.4 95.5 74087 20369 7048.1 0 "Jborges" 556 01-Oct-2023 06:26:37 590 1 32 40 45 23 22 0 0 74087 20367 6639.9 1.5332e+05 10992 8760.6 0 "Jborges" 590 01-Oct-2023 06:49:39 860 1 0 40 2 1 1 0 0 1.5332e+05 10992 96.13 63814 19000 2607 0 "Giba" 860 01-Oct-2023 06:56:38 396 1 32 40 45 22 24 0 0 63814 19000 2445.6 1.5356e+05 11426 8684.8 0 "Giba" 396 01-Oct-2023 07:00:33 211 1 0 40 2 1 1 0 0 1.5356e+05 11425 94.5 63662 3824.2 4518.4 0 "Giba" 211 01-Oct-2023 07:04:27 212 1 32 40 45 21 24 0 0 63662 3825.1 4356.5 1.5315e+05 11151 8623.2 0 "Giba" 212 01-Oct-2023 07:08:34 212 1 0 40 2 1 1 0 0 1.5315e+05 11150 99.88 63873 19470 4531.9 0 "Giba" 212 01-Oct-2023 07:22:08 693 1 32 40 45 23 23 0 0 63873 19719 4344.2 1.6813e+05 2547.8 10187 0 "Giba" 693
tbl(180:190,:) % checking the rows where [u.] first appears:
ans = 11×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ __________ _________ ___________ 02-Oct-2023 02:04:27 3854 2 0 40 2 1 1 0 0 63764 11557 93.63 24777 11689 8665.8 2.1574e+05 "Jborges" 3854 02-Oct-2023 02:08:16 199 2 16 40 45 23 22 0 0 24777 11694 8377.2 63538 3200.8 10340 2.1574e+05 "Jborges" 199 02-Oct-2023 02:11:45 165 2 0 40 2 1 1 0 0 63537 3203.5 94.38 45901 11403 6518.2 0 "Jborges" 165 02-Oct-2023 02:26:48 0 2 16 40 2 1 1 0 0 1.4162e+05 11402 32.5 24373 2823.2 8547.4 2.1576e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:31:22 204 2 16 40 45 21 25 0 0 24374 2819.6 8440.5 64000 20097 8528.5 2.1576e+05 "Jborges" 204 02-Oct-2023 02:36:50 0 2 0 40 2 1 1 0 0 64000 20095 94.63 25073 20399 10339 2.1577e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:40:31 150 2 16 40 45 23 22 0 0 25073 20398 10172 63747 11602 8764.4 2.1577e+05 "Jborges" 150 02-Oct-2023 02:44:49 0 2 0 40 2 1 1 0 0 63747 11603 37.75 24722 11622 10342 2.1577e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:48:56 164 2 16 40 45 23 22 0 0 24722 11625 10238 63458 3208.9 8733.1 2.1577e+05 "Jborges" 164 02-Oct-2023 02:53:25 0 2 0 40 2 1 1 0 0 63795 3208 27.75 24403 2856.4 10354 2.1578e+05 "Jborges" 2.8325e-317 02-Oct-2023 02:58:08 0 2 16 40 46 22 23 0 0 24403 2856.4 10200 64062 20212 6817.1 2.1578e+05 "Jborges" 2.8323e-317

Più risposte (1)

Austin M. Weber
Austin M. Weber il 31 Gen 2024
Modificato: Austin M. Weber il 31 Gen 2024
If you have Microsoft Excel (or Google Sheets) you can use the Replace Tool to get rid of the [u.] in all of the columns. Simply open your CSV file in Excel, choose the "Replace" tool (I think the shortcut in Windows is Ctrl+H). A dialog box should appear. In the "Find what:" box type [u.] and leave the "Replace with:" box blank. Then click the "Replace All" button. This should get rid of all instances of [u.] in your spreadsheet. Just save the CSV and then you can load it into MATLAB as normal.
Alternatively, you could probably also do this in MATLAB by converting the numeric columns into strings and then using the strrep function:
% Read table data
T = readtable('table_name.csv')
% Extract the numeric columns in the table
numeric_data = T{:,2:10};
% Convert to strings
numeric_strings = string(numeric_data);
% Delete [u.]
numeric_strings = strrep(numeric_strings, '[u.]','');
% Convert back to numeric data
numeric_data = str2num(numeric_strings);
% Insert back into the table
T{:,2:10} = numeric_data;
Try either of these and let me know if they work.
  5 Commenti
Austin M. Weber
Austin M. Weber il 31 Gen 2024
@Geovane Gomes, I did not see that you had shared your CSV file before I submitted my original answer. But I just I did the Excel procedure with the file and it seems to work perfectly. When I finished, I saved the file as a tab-delimited text file rather than a CSV so that the commas in your numbers aren't confused as delimiters. You can load the file in MATLAB by typing:
T = readtable('p1_tab_delimited.txt','Delimiter','tab');
Stephen23
Stephen23 il 31 Gen 2024
"I think I'll have to remove it using another tool before import."
Not required, you can use the import options.

Accedi per commentare.

Tag

Prodotti


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by