Using multiple datasets to fit parameters simultaneously in SimBiology
Mukti
on 14 Mar 2025
Edited: Arthur Goldsipe
on 17 Mar 2025
I want to fit a PK model with multiple datasets; every dataset has concentration time courses for different species in the model - how do I do this? The time points in each dataset are not consistent, if that matters. I'm using MATLAB R2024b and the SimBiology Model Analyzer app.
My model has multiple compartments, and Compartment1 has two species called "RNA" and "PROTEIN".
The datasets look something like this:
Dataset1 which corresponds to RNA values in the plasma:

Dataset2 which corresponds to protein levels in the plasma:

I want to fit the model parameters to both datasets, mapping PLASMA_RNA from Dataset1 to "RNA" and SERUM_PROTEIN from Dataset2 to "PROTEIN".
3 Comments
Arthur Goldsipe
on 15 Mar 2025
Are the model's initial conditions the same for both experiments? In other words, once you fit your model, would you need to do a single simulation or two separate simulations to predict these two concentrations?
Accepted Answer
Arthur Goldsipe
on 15 Mar 2025
Edited: Arthur Goldsipe
on 17 Mar 2025
You first need to decide whether these two concentration profiles should be treated as part of the same experiment/simulation.
If so, then you need to merge them into a single time course, using NaN to indicate missing measurements (presumably the same way you're using . at time 0). If you want to do that programmatically, you can use MATLAB's join operations. Here's what the merged data might look like using the first 4 rows of your datasets:
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"] );
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serum_protein"]);
joinedData = outerjoin(rna,protein,Keys="Time",MergeKeys=true)
If they're different experiments, you just need to stack them and add a grouping variable to indicate which measurement belongs to which experiment. Here's what that would look like using the first 4 rows of your datasets:
rna_id = [table(repmat(1,height(rna), 1), VariableNames="ID"), rna ];
protein_id = [table(repmat(2,height(protein),1), VariableNames="ID"), protein];
stackedData = outerjoin(rna_id,protein_id,Keys=["ID","Time"],MergeKeys=true)
Once you have the data in one of these forms, you can perform the fit in SimBiology using sbiofit or the Model Analyzer app.
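As a rough sketch of the sbiofit route for the merged (single experiment) case: the toy model below is NOT the poster's PK model. Its reactions, the parameter names kdeg and ksyn, the initial amounts, and the starting guesses are all hypothetical placeholders so the example can run on its own. The column names Plasma_RNA and Serum_protein are also assumptions about how the data are labeled. It requires SimBiology.

```matlab
% Merged data: one time course, NaN where a response was not measured.
rna     = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time","Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1],  [nan;10;97.1;90.1],    VariableNames=["Time","Serum_protein"]);
joinedData = outerjoin(rna, protein, Keys="Time", MergeKeys=true);

% Wrap the table as groupedData and name the independent variable.
gData = groupedData(joinedData);
gData.Properties.IndependentVariableName = 'Time';

% Hypothetical toy model, standing in for the real PK model.
model = sbiomodel('toy');
comp  = addcompartment(model, 'Compartment1');
addspecies(comp, 'RNA', 20);       % assumed initial amount
addspecies(comp, 'PROTEIN', 0);    % assumed initial amount
addparameter(model, 'kdeg', 1);    % hypothetical parameter
addparameter(model, 'ksyn', 1);    % hypothetical parameter
r1 = addreaction(model, 'RNA -> null');
r1.ReactionRate = 'kdeg*RNA';
r2 = addreaction(model, 'null -> PROTEIN');
r2.ReactionRate = 'ksyn*RNA';

% Map each model species to its data column, then fit both responses at once.
responseMap = {'Compartment1.RNA = Plasma_RNA', 'Compartment1.PROTEIN = Serum_protein'};
estimated   = estimatedInfo({'log(kdeg)','log(ksyn)'}, 'InitialValue', [1 1]);
fitResults  = sbiofit(model, gData, responseMap, estimated);
disp(fitResults.ParameterEstimates)
```

For the stacked (separate experiments) form, you would also set gData.Properties.GroupVariableName = 'ID', and you would probably want to pass 'Pooled', true to sbiofit so that one set of parameter values is estimated across both groups rather than one set per group.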
0 Comments
More Answers (2)
Arthur Goldsipe
on 14 Mar 2025
SimBiology users typically do this by merging the multiple datasets into a single dataset and then fitting them by constructing an appropriate fit problem.
If you need more guidance on that, take a look at previous similar questions:
- https://www.mathworks.com/matlabcentral/answers/1765640-how-do-i-fit-pk-models-to-multiple-dose-datasets-using-simbiology-specifically-using-the-command-li
- https://www.mathworks.com/matlabcentral/answers/379867-in-simbiology-is-there-any-way-to-fit-multiple-data-sets-to-the-model-with-a-single-set-of-paramete
- https://www.mathworks.com/matlabcentral/answers/1788540-pooled-data-fit-of-multiple-data-sets-from-different-model-parameters
If you still have questions, I suggest you create a new MATLAB Answers question that provides more details. Ideally, share sample code (data and model) that illustrates your situation. Also, please clarify what version of MATLAB you're using and whether you are working in the SimBiology Model Analyzer app or writing your own MATLAB code.
Image Analyst
on 15 Mar 2025
Maybe I'm misunderstanding what you want to do, but why don't you combine both time vectors into a single time vector and use it to interpolate the missing times in each set, using something like interp1? Then you will have values of serum and plasma at the same common time points. Then, if you want to do 'mapping PLASMA_RNA from dataset1 to "RNA" and SERUM_PROTEIN from dataset2 to "PROTEIN"', you can use polyfit or fitnlm or some other fitting algorithm (see the Regression Learner app on the Apps tab of the tool ribbon) to make a transform/model relating serum to plasma.
1 Comment
Arthur Goldsipe
on 15 Mar 2025
SimBiology doesn't require measurements at the same times for all responses/species. You can just put NaN (not-a-number) in any place where you don't have a measurement.
Alternatively, SimBiology allows you to treat them as two separate time courses (requiring two different model simulations, with potentially different initial conditions or dosing). If they are different conditions, the two time courses just need to be "stacked" on top of each other, and another variable needs to be added to the data to indicate each time course. (I'll add a more complete answer for this shortly.)
Moreover, I strongly discourage interpolating values for at least two reasons:
First, interpolating could result in values that are not consistent with the underlying biology. Biological measurements are often quite noisy and highly nonlinear, so standard interpolation techniques are quite risky.
Second, adding interpolated "measurements" can bias the fitting and provide incorrect statistics in the results. For example, many statistical calculations require the degrees of freedom (dfe), which is the number of observations minus the number of estimated parameters. Artificially inflating the number of observations will change the dfe, potentially leading to very different parameter estimates, standard errors, and so forth.
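To make the dfe point concrete, here is a small counting sketch using the measured time points from the first 4 rows of each dataset. The number of estimated parameters (2) is a made-up value for illustration, and the "interpolated" count assumes every response is filled in at every common time point.

```matlab
% Time points that actually have a measurement (the NaN at time 0 is excluded).
tRNA     = [0.08; 0.24; 0.49];
tProtein = [0.24; 1.91; 3.1];
nParams  = 2;                                   % hypothetical number of estimated parameters

% Real observations: count only what was measured.
nObserved   = numel(tRNA) + numel(tProtein);    % 6
dfeObserved = nObserved - nParams;              % 4

% After interpolating both responses onto the union of all time points.
tCommon         = union(tRNA, tProtein);        % 5 unique time points
nInterpolated   = 2 * numel(tCommon);           % 10 "observations"
dfeInterpolated = nInterpolated - nParams;      % 8

fprintf('dfe with real data: %d, dfe after interpolation: %d\n', dfeObserved, dfeInterpolated);
```

In this small case, interpolation doubles the dfe without adding any new information, so any statistic that depends on dfe would be computed as if there were more data than was actually collected.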