MATLAB Answers

Tobias
0

How can I replace a cell array with numeric array for a big dataset?

Asked by Tobias
on 24 Jun 2019 at 14:23
Latest activity Commented on by Tobias
on 24 Jun 2019 at 16:13
Hey all,
I've got to analyse some financial data which I got in a an Excel Spredsheet. To test my code I use a spredsheet with only one dataset which contains 4 columns. The first one has the date (dd-mmm-jjjj), the other three got numeric data from shares (like share price). Later on there will be much more, since I do analyse the data of a whole stock market for a longer time period.
This is how I started and where I got my first problem:
dataFromTable=readtable('Test_Apple.xlsx')
after getting the 2nd and 3rd columns as string arrays I tried this:
dataFromTable=importdata('Test_Apple.xlsx')
dataFromTable=dataFromTable.data
but this gave the numbers in a way out I didn't expected. The matrix did change the numbers somehow from 3,000,000.00 to 0.3*10^7 for smaller numbers it changed to 0.0000 which isn't helpful, therfore I wanted to stay with readtable, since I want to use the date column as well it seems necessary to do so.
The table did show the numbers correct, but the 2nd and 3rd columns where given as cell arrays. Changing those with cell2mat for each column helped to get numeric arrays, but only if I did it for a few rows, otherwise I got the Error "Error using cat Dimensions of arrays being concatenated are not consistent." Is this because of the NaN in the Dataset?
This gave the error:
dataFromTable.APPLE_TURNOVERBYVALUE=cell2mat(dataFromTable.APPLE_TURNOVERBYVALUE)
This worked without an error, but doesn't help since it isn't the whole dataset:
dataFromTable=dataFromTable(1:4,1:end)
dataFromTable.APPLE_TURNOVERBYVALUE=cell2mat(dataFromTable.APPLE_TURNOVERBYVALUE)
dataFromTable.APPLE_TURNOVERBYVOLUME=cell2mat(dataFromTable.APPLE_TURNOVERBYVOLUME)
This helped to get rid of the cell arrays and works for the example, but if there are hundreds of shares I'm gonna analyse it should be easier somehow - but I didn't figure out how yet. Also it did not work for the whole dataset, somehow it should be possible to replace the cell arrays in the table with numeric arrays or get the correct type of arrays right away with readtable.
Any help is much appreciated!

  0 Comments

Sign in to comment.

1 Answer

Answer by Guillaume
on 24 Jun 2019 at 14:37
 Accepted Answer

I'm a bit unclear on why you're trying to use cell2mat. Once you've imported the data into a table then you should be using table tools, not cell array tools.
If it's because readtable made a mistake in the import, then there are plenty of ways to give hints to readtable so it does the job correctly. There's plenty of names values pairs arguments and if all else fail you can use detectImportOptions.
In your case, to avoid readtable importing numbers as text (because the columns contain 'NA'), tell it that 'NA' means invalid:
dataFromTable = readtable('Test_Apple.xlsx', 'TreatAsEmpty', 'NA')
If on R2019a, you may want to use readtimetable instead of readtable. On earlier versions, I would recommend that you convert to a timetable after readtable:
dataFromTable = readtimetable('Test_Apple.xlsx', 'TreatAsEmpty', 'NA') %R2019a and onward
dataFromTable = table2timetable(readtable('Test_Apple.xlsx', 'TreatAsEmpty', 'NA')) %before R2019a

  1 Comment

it worked and I'm gonna check out how to work with timetable, thx! :)
just didn't know better, that's why :(

Sign in to comment.