Using multiple datasets to fit parameters simultaneously in SimBiology

I want to fit a PK model with multiple datasets; every dataset has concentration time courses for different species in the model - how do I do this? The time points in each dataset are not consistent, if that matters. I'm using MATLAB R2024b and the SimBiology Model Analyzer app.
My model has multiple compartments, and Compartment1 has two species called "RNA" and "PROTEIN".
The datasets look something like this:
Dataset1 which corresponds to RNA values in the plasma:
Dataset2 which corresponds to protein levels in the plasma:
I want to fit the model parameters to both datasets, mapping PLASMA_RNA from Dataset1 to "RNA" and SERUM_PROTEIN from Dataset2 to "PROTEIN".
  3 Comments
Arthur Goldsipe on 15 Mar 2025
Are the model's initial conditions the same for both experiments? In other words, once you fit your model, would you need to do a single simulation or two separate simulations to predict these two concentrations?
Mukti on 17 Mar 2025
The initial conditions are the same for both experiments - I would just do one single simulation to predict these two concentrations.


Accepted Answer

Arthur Goldsipe on 15 Mar 2025
Edited: Arthur Goldsipe on 17 Mar 2025
You first need to decide whether these two concentration profiles should be treated as part of the same experiment/simulation.
If so, then you need to merge them into a single time course, using NaN to indicate missing measurements (presumably the same way you're using . at time 0). If you want to do that programmatically, you can use MATLAB's join operations. Here's what the merged data might look like using the first 4 rows of your datasets:
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"] );
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serium_protein"]);
joinedData = outerjoin(rna,protein,Keys="Time",MergeKeys=true)
joinedData = 6x3 table
    Time    Plasma_RNA    Serium_protein
    ____    __________    ______________
       0       NaN             NaN
    0.08     17.11             NaN
    0.24      8.22              10
    0.49      18.6             NaN
    1.91       NaN            97.1
     3.1       NaN            90.1
If they're different experiments, you just need to stack them and add a grouping variable to indicate which measurement belongs to which experiment. Here's what that would look like using the first 4 rows of your datasets:
rna_id = [table(repmat(1,height(rna), 1), VariableNames="ID"), rna ];
protein_id = [table(repmat(2,height(protein),1), VariableNames="ID"), protein];
stackedData = outerjoin(rna_id,protein_id,Keys=["ID","Time"],MergeKeys=true)
stackedData = 8x4 table
    ID    Time    Plasma_RNA    Serium_protein
    __    ____    __________    ______________
     1       0       NaN             NaN
     1    0.08     17.11             NaN
     1    0.24      8.22             NaN
     1    0.49      18.6             NaN
     2       0       NaN             NaN
     2    0.24       NaN              10
     2    1.91       NaN            97.1
     2     3.1       NaN            90.1
Once you have the data in one of these forms, you can perform the fit in SimBiology using sbiofit or the Model Analyzer app.
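As a rough illustration of the programmatic route, here is a minimal sketch of fitting the merged (single-simulation) table with sbiofit. The model below is entirely hypothetical: the parameters kdeg and ksyn, the two reactions, and the initial amounts are placeholders invented for this example, not the asker's PK model. Only the compartment and species names (Compartment1, RNA, PROTEIN) and the data column names come from the thread.

```matlab
% Merged data: first four rows of each dataset, as above.
rna     = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time","Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1],  [nan;10;97.1;90.1],    VariableNames=["Time","Serium_protein"]);
joinedData = outerjoin(rna, protein, Keys="Time", MergeKeys=true);

% HYPOTHETICAL toy model (assumption): replace with your own SimBiology model.
model = sbiomodel('toyModel');
comp  = addcompartment(model, 'Compartment1');
addspecies(comp, 'RNA', 20);        % placeholder initial amount
addspecies(comp, 'PROTEIN', 0);     % placeholder initial amount
addparameter(model, 'kdeg', 1);     % placeholder parameter
addparameter(model, 'ksyn', 5);     % placeholder parameter
r1 = addreaction(model, 'RNA -> null');
r1.ReactionRate = 'kdeg*RNA';
r2 = addreaction(model, 'null -> PROTEIN');
r2.ReactionRate = 'ksyn*RNA';

% Wrap the table as groupedData and name the time column.
% No grouping variable: everything is one time course / one simulation.
gData = groupedData(joinedData);
gData.Properties.IndependentVariableName = 'Time';

% Map model species to data columns; NaN entries are treated as missing.
responseMap = {'Compartment1.RNA = Plasma_RNA', ...
               'Compartment1.PROTEIN = Serium_protein'};

% Estimate both placeholder parameters on a log scale.
estimated = estimatedInfo({'log(kdeg)','log(ksyn)'}, 'InitialValue', [1 5]);

fitResults = sbiofit(model, gData, responseMap, estimated);
disp(fitResults.ParameterEstimates)
```

For the stacked form, the same calls should apply after wrapping stackedData instead and setting gData.Properties.GroupVariableName = 'ID', so that each ID gets its own simulation.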

More Answers (2)

Arthur Goldsipe on 14 Mar 2025
SimBiology users typically do this by merging the multiple datasets into a single dataset and then constructing an appropriate fit problem.
If you need more guidance on that, take a look at previous similar questions:
If you still have questions, I suggest you create a new MATLAB Answers question that provides more details. Ideally, share sample code (data and model) that illustrates your situation. Also, please clarify what version of MATLAB you're using and whether you are working in the SimBiology Model Analyzer app or writing your own MATLAB code.

Image Analyst on 15 Mar 2025
Maybe I'm misunderstanding what you want to do, but why don't you combine both time vectors into a single time vector and use it to interpolate the missing times in each set, using something like interp1? Then you will have values of serum and plasma at common time points. Then, if you want to do "mapping PLASMA_RNA from dataset1 to 'RNA' and SERUM_PROTEIN from dataset2 to 'PROTEIN'", you can use polyfit, fitnlm, or some other fitting algorithm (see the Regression Learner app on the Apps tab of the tool ribbon) to make a transform/model relating serum to plasma.
  1 Comment
Arthur Goldsipe on 15 Mar 2025
SimBiology doesn't require measurements at the same times for all responses/species. You can just put NaN (not-a-number) in any place where you don't have a measurement.
Alternatively, SimBiology allows you to treat them as two separate time courses (requiring two different model simulations, with potentially different initial conditions or dosing). If they are different conditions, the two time courses just need to be "stacked" on top of each other, and another variable needs to be added to the data to indicate each time course. (I'll add a more complete answer for this shortly.)
Moreover, I strongly discourage interpolating values for at least two reasons:
First, interpolating could result in values that are not consistent with the underlying biology. Biological measurements are often quite noisy and highly nonlinear, so standard interpolation techniques are quite risky.
Second, adding interpolated "measurements" can bias the fitting and provide incorrect statistics in the results. For example, many statistical calculations require the degrees of freedom (dfe), which is the number of observations minus the number of estimated parameters. Artificially inflating the number of observations will change the dfe, potentially leading to very different parameter estimates, standard errors, and so forth.
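To make the dfe point concrete, here is a small sketch using the first four rows of each dataset from the accepted answer. It assumes observations are pooled across both responses and that two parameters are estimated. The parameter count is a made-up number for illustration, and "filled" means the hypothetical worst case in which interpolation supplies a value for every empty slot.

```matlab
% First four rows of each dataset, merged with NaN for missing measurements.
rna     = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time","Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1],  [nan;10;97.1;90.1],    VariableNames=["Time","Serium_protein"]);
joinedData = outerjoin(rna, protein, Keys="Time", MergeKeys=true);

nParams = 2;  % ASSUMPTION: hypothetical number of estimated parameters

% Count only real measurements (non-NaN entries) across both responses.
nObs = nnz(~isnan(joinedData.Plasma_RNA)) + nnz(~isnan(joinedData.Serium_protein));
dfeMeasured = nObs - nParams;

% If every empty slot were filled by interpolation, each of the two
% responses would contribute one "observation" per time point.
nSlots = 2*height(joinedData);
dfeFilled = nSlots - nParams;

fprintf('Measured: %d observations, dfe = %d\n', nObs, dfeMeasured);
fprintf('Filled:   %d observations, dfe = %d\n', nSlots, dfeFilled);
```

With these rows there are 6 real measurements (dfe = 4), whereas a fully filled table would claim 12 observations (dfe = 10) without any new information behind them.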
