How can I reformat this input text file into this output file?

Hello guys,
Can anyone help me reformat this input text file into the corresponding output file?
More specifically, the input file has multiple rows and 5 columns. Column 4 contains overlapping samples, and every sample has many genes in column 5 and many values in column 1. I want to create a new file that lists the genes corresponding to every sample in column 4, together with their corresponding values from column 1.
Note: I want to compare only the first 16 digits of every sample with those of the other samples.
The input_file
The output file
The data is too large to share, so I am only posting an example from it.

9 Comments

I would have to think some more before I could suggest how to fully reorganize this, but you could get your lists of unique gene codes and overlapping samples by using unique().
genes = data(unique(data(:,5)),5);
I haven't tested that line, just gives an idea of where to start.
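A note on that line: unique() returns the distinct values themselves rather than row indices, so its output does not need to be fed back into data. The sketch below is only a minimal illustration under stated assumptions: data is taken to be a cell array with values in column 1, sample IDs stored as text in column 4 and gene names in column 5, and "the first 16 digits" is read as the first 16 characters of each ID. The tiny data set and the names shortIDs and sampleIdx are made up for the example.

```matlab
% Hypothetical stand-in for the real file:
% column 1 = value, columns 2-3 = placeholders, column 4 = sample ID, column 5 = gene
data = { 0.5, 'x', 'y', 'SAMPLE0000000001-A', 'GENE1';
         1.2, 'x', 'y', 'SAMPLE0000000001-B', 'GENE2';
         0.7, 'x', 'y', 'SAMPLE0000000002-A', 'GENE1' };

% Distinct gene names, taken directly from column 5
genes = unique(data(:,5));

% Assumption: samples are matched on their first 16 characters only
shortIDs = cellfun(@(s) s(1:min(16, numel(s))), data(:,4), 'UniformOutput', false);

% samples = distinct shortened IDs, sampleIdx = which of them each row belongs to
[samples, ~, sampleIdx] = unique(shortIDs);
```

Here the first two rows share the same shortened ID, so they end up in the same sample group.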
@Bob Nbob, thanks for your suggestion. The genes are repeated, but with different values, as you can see in column 1, and in different samples, as shown in column 4.
I hope someone can help me reorganize this file.
Thanks,
The intention of the two uniques would be to get your column and row headers (i.e. all of the genes and all of the samples). One slow way to set it up after that would be to run a double for loop to fill in values which correspond to the sample and gene of your new table. I'm certain there are better ways of doing this, but here is a sample of what I was thinking.
genes = data(unique(data(:,5)),5);
samples = data(unique(data(:,4)),4);
newdata = cell(size(samples,1)+1,size(genes,1)+1);
newdata(1,2:end) = genes;
newdata(2:end,1) = samples;
for j = 2:size(newdata,1);
    for k = 2:size(newdata,2);
        if exists(data(data(:,4)==newdata(j,1)&data(:,5)==newdata(1,k),1))
            newdata(j,k) = data(data(:,4)==newdata(j,1)&data(:,5)==newdata(1,k),1);
        else
            newdata(j,k) = [];
        end
    end
end
It's going to be really inefficient, and you will probably have to mess around with some cell function stuff, but again, it's a starting point for the idea I had.
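For reference, one way the same idea could be made to run is sketched below. It is not a tested solution for the real file: it assumes data is a cell array (values in column 1, sample IDs as text in column 4, gene names in column 5), that samples are matched on their first 16 characters, and that repeated sample/gene pairs should keep all of their values. The example data and the index names si and gi are hypothetical.

```matlab
% Hypothetical example data (columns 2-3 are placeholders)
data = { 0.5, 'x', 'y', 'SAMPLE0000000001-A', 'GENE1';
         1.2, 'x', 'y', 'SAMPLE0000000001-B', 'GENE2';
         0.7, 'x', 'y', 'SAMPLE0000000002-A', 'GENE1';
         0.9, 'x', 'y', 'SAMPLE0000000001-B', 'GENE1' };

% Assumption: compare only the first 16 characters of each sample ID
shortIDs = cellfun(@(s) s(1:min(16, numel(s))), data(:,4), 'UniformOutput', false);

% Row headers (samples) and column headers (genes), plus a group index per row
[samples, ~, si] = unique(shortIDs);
[genes,   ~, gi] = unique(data(:,5));

% First row holds gene names, first column holds sample IDs
newdata = cell(numel(samples) + 1, numel(genes) + 1);
newdata(1, 2:end) = genes.';
newdata(2:end, 1) = samples;

% One pass over the rows instead of a double loop over the table;
% values are appended so that repeated sample/gene pairs are not overwritten
for r = 1:size(data, 1)
    newdata{si(r) + 1, gi(r) + 1} = [newdata{si(r) + 1, gi(r) + 1}, data{r, 1}];
end
```

Cells with no matching sample/gene pair are simply left empty, which avoids the `newdata(j,k) = []` assignment.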
@Bob Nbob Thanks for being willing to help me. I tried your piece of code, but it doesn't work. Here is the error that is shown:
Error using subsindex
Function 'subsindex' is not defined for values of class 'cell'.
Error in final_regions (line 7)
genes = data(unique(data(:,5)),5);
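This error most likely arises because data is a cell array: unique(data(:,5)) then also returns a cell array, and a cell array cannot be used as a subscript. The sketch below reproduces the failure on made-up data and then shows one possible way to get what the line seems to intend, using the index output of unique. The data and the names failed and rowIdx are hypothetical.

```matlab
% Hypothetical example data (columns 2-3 are placeholders)
data = { 0.5, 'x', 'y', 'S1', 'GENE1';
         1.2, 'x', 'y', 'S1', 'GENE2';
         0.7, 'x', 'y', 'S2', 'GENE1' };

% The original line uses a cell array as a subscript, which is not allowed
try
    genes = data(unique(data(:,5)), 5);
    failed = false;
catch err
    failed = true;
    disp(err.message);
end

% The second output of unique is a numeric index vector, which is a valid subscript
[~, rowIdx] = unique(data(:,5));
genes = data(rowIdx, 5);
```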
Actually, I need to prepare this file for subsequent analysis, but I am stuck now.
I hope someone else can help me, please!
You have already asked this question. The workaround would be the same as in the solutions you were given previously.
@KSSV Approximately, but I think this case needs 3 indexes: an index for rows, an index for columns, and the cell content. That's why I am confused.
If you aren't keeping more than one value inside a cell, then you don't need another index; however, indexing into cell arrays can be a bit tricky. With cell arrays, parentheses, (), select an entire cell, just as they select an element of an ordinary array, whereas curly braces, {}, return the CONTENTS of that cell. Most likely you will need to edit the code I posted to use curly braces wherever the contents of a cell are needed.
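A small sketch of the difference, on made-up data (all names and values here are hypothetical). It also assumes the sample and gene columns hold text, in which case strcmp is a safer way to compare them than ==:

```matlab
c = {'GENE1', 42};

a = c(1);   % parentheses: a 1x1 cell that still wraps 'GENE1'
b = c{1};   % curly braces: the char vector 'GENE1' itself

% Hypothetical example data (columns 2-3 are placeholders)
data = { 0.5, 'x', 'y', 'S1', 'GENE1';
         1.2, 'x', 'y', 'S1', 'GENE2';
         0.7, 'x', 'y', 'S2', 'GENE1' };

% strcmp compares every cell of a column against one string and returns a logical mask
mask = strcmp(data(:,4), 'S1') & strcmp(data(:,5), 'GENE1');

% Curly braces pull the numeric contents out of the matching cells
vals = [data{mask, 1}];
```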
@Bob Nbob I edited your code to use curly braces, but I am still getting an error.
@KSSV Could you please help me again with this?

Answers (0)

This question is closed.

Asked: 7 Nov 2018

Closed: 11 Mar 2023
