Working with very big data faster?

Dear Matlab users,
I have to deal with very big data (point clouds, generally with more than 30,000,000 points) in MATLAB. I can read the ASCII data with the "textscan" function. After reading, I need to detect invalid data (points with 0,0,0 coordinates) and then do some mathematical operations on each point or line of the data. My approach is as follows: first I read the data with "textscan" and assign it to a matrix. Then I use for loops to detect the invalid points and to do the mathematical operations on each point or line. A sample of my code is shown below. Is there a way to avoid the for loops, or what is the best way to speed up this computation? I am looking forward to hearing from you.
fileID = fopen('some ascii data with more than 10 000 000 points');
original_data = textscan(fileID,'%f %f %f %f %f %f %f', 'delimiter',' ');
fclose(fileID);
% header values
column = original_data{1}(1);
row = original_data{1}(2);
% 4x4 transformation matrix from elements 7-10 of the first four columns
t_matrix = [original_data{1}(7) original_data{2}(7) original_data{3}(7) original_data{4}(7)
            original_data{1}(8) original_data{2}(8) original_data{3}(8) original_data{4}(8)
            original_data{1}(9) original_data{2}(9) original_data{3}(9) original_data{4}(9)
            original_data{1}(10) original_data{2}(10) original_data{3}(10) original_data{4}(10)];
% points start at element 11: x, y, z, a column of zeros, then the 4th value of each line
coordinate_list(:,1) = original_data{1}(11:length(original_data{1}));
coordinate_list(:,2) = original_data{2}(11:length(original_data{2}));
coordinate_list(:,3) = original_data{3}(11:length(original_data{3}));
coordinate_list(:,4) = 0;
coordinate_list(:,5) = original_data{4}(11:length(original_data{4}));
% detect invalid points and transform each point with t_matrix
for i = 1:length(coordinate_list)
    if coordinate_list(i,1) == 0 && coordinate_list(i,2) == 0 && coordinate_list(i,3) == 0
        transformed_list(i,:) = NaN;
    else
        %transformed_list(i,:) = coordinate_list(i,:)*t_matrix;
        transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
        transformed_list(i,5) = coordinate_list(i,5);
    end
    i   % no semicolon: the loop counter is printed on every iteration
end

6 Comments

KSSV on 26 Sep 2016
You have not initialized (preallocated) transformed_list(); growing it inside the loop makes the code slow. You should consider preallocating it.
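A minimal sketch of preallocation, assuming a small hypothetical coordinate_list built from random numbers and an identity t_matrix in place of the real file contents. transformed_list is created once with NaN(n,5), so invalid rows already hold NaN and the loop only fills the valid rows instead of growing the array on every iteration.

```matlab
% Hypothetical stand-in data: 5 points, columns x, y, z, 0, intensity
coordinate_list = [rand(5,3), zeros(5,1), rand(5,1)];
coordinate_list(2,1:3) = 0;          % make point 2 invalid (0,0,0)
t_matrix = eye(4);                   % placeholder transformation

n = size(coordinate_list, 1);        % number of points (rows)
transformed_list = NaN(n, 5);        % preallocate once; invalid rows stay NaN

for i = 1:n
    if ~all(coordinate_list(i,1:3) == 0)
        transformed_list(i,1:4) = coordinate_list(i,1:4) * t_matrix;
        transformed_list(i,5)   = coordinate_list(i,5);
    end
end
```

size(coordinate_list,1) is used instead of length, because length returns the largest dimension rather than the number of rows.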
Have you run the profiler on your code?
doc profile
You should always do this before making any attempt at speeding up your code; otherwise, how do you know which part is taking the longest time? Assumptions are generally a very bad idea!
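A minimal sketch of that workflow, assuming a small random stand-in workload instead of the real point cloud (A, B, invalid, p and tbl are hypothetical names). profile('info') returns the collected statistics as a struct, so the most expensive entries can also be listed from code; in MATLAB, profile viewer shows the same data as an interactive report.

```matlab
% Profile a small stand-in workload (hypothetical data, not the real file)
profile on
A = [rand(1e5,3), zeros(1e5,1)];
B = A * eye(4);
invalid = all(A(:,1:3) == 0, 2);
profile off

p = profile('info');                 % collected statistics as a struct
tbl = p.FunctionTable;               % one entry per profiled function
[~, order] = sort([tbl.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-30s %8.4f s\n', tbl(k).FunctionName, tbl(k).TotalTime);
end
```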
First of all, thanks for the kind answers.
I initialized "transformed_list()", and this change sped up my code significantly for a point cloud with around 1,000,000 points. According to the profiler, "textscan" takes 37% of the total computation time and the line
transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
takes 35%. I also tried another point cloud (around 5,500,000 points), and the profiler reported similar results.
Are there faster alternatives to these lines?
Thanks in advance
KSSV on 26 Sep 2016
Does your text file contain any text, or only numbers? Can you attach a sample of the file?
I have two types of files: .xyz files and .ptx files. Samples of both are shown below:
Sample for .xyz file:
4826 4487 22.85660000 3.72010000 1.29630000 2253
4826 4488 22.86470000 3.71390000 1.29600000 2410
4826 4489 22.87220000 3.70790000 1.29560000 2373
4826 4490 22.87940000 3.70180000 1.29590000 2420
4826 4491 22.88530000 3.69520000 1.29550000 2465
4826 4492 22.89090000 3.68890000 1.29570000 2440
4826 4493 22.89960000 3.68310000 1.29470000 2459
4826 4494 22.90710000 3.67660000 1.29580000 2477
4826 4495 22.91490000 3.67060000 1.29550000 2396
4826 4496 22.91990000 3.66420000 1.29570000 2452
4826 4497 22.92640000 3.65750000 1.30260000 2455
4826 4498 22.93360000 3.65150000 1.29560000 2473
................
Sample for .ptx file:
4418
1251
0.00000000 0.00000000 0.00000000
1.00000000 0.00000000 0.00000000
0.00000000 1.00000000 0.00000000
0.00000000 0.00000000 1.00000000
1.00000000 0.00000000 0.00000000 0.00000000
0.00000000 1.00000000 0.00000000 0.00000000
0.00000000 0.00000000 1.00000000 0.00000000
0.00000000 0.00000000 0.00000000 1.00000000
4.68590 10.13050 -0.94340 0.78212 40 40 40
4.68590 10.13040 -0.93980 0.72741 37 37 37
4.68900 10.13720 -0.93690 0.63752 33 33 33
4.69910 10.15900 -0.93540 0.57255 29 29 29
4.71100 10.18470 -0.93380 0.53249 27 27 27
...........
per isakson on 26 Sep 2016 (edited 26 Sep 2016)
Use
textscan( ..., 'CollectOutput',true )
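A minimal sketch of what CollectOutput does, assuming a temporary file holding the first three lines of the .xyz sample (fname, C, xyz_data and coords are hypothetical names). Reading columns 3 to 5 as x, y, z is an assumption based on the sample values.

```matlab
% Write three lines copied from the .xyz sample to a temporary file
fname = [tempname, '.xyz'];
fid = fopen(fname, 'w');
fprintf(fid, '4826 4487 22.85660000 3.72010000 1.29630000 2253\n');
fprintf(fid, '4826 4488 22.86470000 3.71390000 1.29600000 2410\n');
fprintf(fid, '4826 4489 22.87220000 3.70790000 1.29560000 2373\n');
fclose(fid);

fid = fopen(fname, 'r');
% Six %f fields match the six columns; CollectOutput merges them into
% one numeric matrix instead of six separate column vectors
C = textscan(fid, '%f %f %f %f %f %f', 'CollectOutput', true);
fclose(fid);
delete(fname);

xyz_data = C{1};                     % N-by-6 double matrix
coords   = xyz_data(:, 3:5);         % assumed x, y, z columns
```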
Neither of your two samples matches
textscan(fileID,'%f %f %f %f %f %f %f', 'delimiter',' ');
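For the .ptx layout, one option is to read the header with fscanf and then let textscan continue from the same file position. This sketch assumes the header layout seen in the sample (two values, four lines of three values, four lines of four values) and builds a tiny hypothetical file; grid_size, scanner, t_matrix and points are illustrative names.

```matlab
% Build a tiny hypothetical .ptx file that follows the sample's layout
fname = [tempname, '.ptx'];
fid = fopen(fname, 'w');
fprintf(fid, '2\n1\n');                                   % two header values
fprintf(fid, '%.8f %.8f %.8f\n', [zeros(1,3); eye(3)]');  % 4 lines of 3 values
fprintf(fid, '%.8f %.8f %.8f %.8f\n', eye(4)');           % 4 lines of 4 values
fprintf(fid, '4.68590 10.13050 -0.94340 0.78212 40 40 40\n');
fprintf(fid, '0 0 0 0.50000 0 0 0\n');
fclose(fid);

fid = fopen(fname, 'r');
grid_size = fscanf(fid, '%f', 2);        % first two header values
scanner   = fscanf(fid, '%f', [3 4])';   % 4-by-3, one file line per row
t_matrix  = fscanf(fid, '%f', [4 4])';   % 4-by-4, one file line per row
% textscan continues from the current position, i.e. the first point line
C = textscan(fid, '%f %f %f %f %f %f %f', 'CollectOutput', true);
fclose(fid);
delete(fname);
points = C{1};                           % N-by-7 matrix of point lines
```

fscanf fills its output column by column, so the transposes put each file line in one row, which matches how t_matrix is assembled in the question.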


Answers (1)

KSSV on 26 Sep 2016 (edited 26 Sep 2016)
To find the points whose (x,y,z) are all zero, you do not need a loop; you can find them all in a single step.
id = all(coordinate_list(:,1:3)==0,2) ; % logical output: true where x, y and z are all zero
idx = find(id) ; % row positions of those points
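Checking sum(coordinate_list,2)==0 is not the same as checking that x, y and z are all zero: it also adds the zero column and the intensity column, and opposite-signed coordinates can cancel. A small sketch on a hypothetical three-point list compares the two tests (by_sum and by_all are illustrative names).

```matlab
% Row 1: invalid point (0,0,0) with a nonzero 5th column
% Row 2: valid point whose five values happen to sum to 0
% Row 3: ordinary valid point
coordinate_list = [0  0 0 0 0.7
                   1 -1 0 0 0
                   2  3 4 0 0.9];

by_sum = sum(coordinate_list, 2) == 0;          % sums all five columns
by_all = all(coordinate_list(:,1:3) == 0, 2);   % x, y and z each equal 0

disp([by_sum by_all])
```

Only by_all flags row 1 and leaves row 2 alone.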
Everything else the loop does can also be done without a for loop.
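A minimal sketch of the whole loop without iteration, assuming a hypothetical three-point coordinate_list (columns x, y, z, 0, intensity) and a simple scaling matrix in place of the real t_matrix. A logical mask selects the valid rows, and a single matrix product transforms all of them at once.

```matlab
% Hypothetical stand-in data: columns x, y, z, 0, intensity
coordinate_list = [1 2 3 0 0.5
                   0 0 0 0 0.7
                   4 5 6 0 0.9];
t_matrix = diag([2 2 2 1]);                       % placeholder: scale x, y, z by 2

valid = ~all(coordinate_list(:,1:3) == 0, 2);     % logical mask, one entry per point

transformed_list = NaN(size(coordinate_list));    % invalid rows stay NaN
transformed_list(valid,1:4) = coordinate_list(valid,1:4) * t_matrix;  % one product for all rows
transformed_list(valid,5)   = coordinate_list(valid,5);              % copy the 5th column
```

The per-row indexing and the printed loop counter disappear, and the multiplication runs as one call to MATLAB's optimized linear-algebra routines.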


Asked: 26 Sep 2016

Edited: 26 Sep 2016
