join
(Not Recommended) Merge dataset array observations
The dataset data type is not recommended. To work with heterogeneous data,
use the MATLAB®
table data type instead. See MATLAB
table documentation for more information.
Syntax
C = join(A,B)
C = join(A,B,keys)
C = join(A,B,param1,val1,param2,val2,...)
[C,IB] = join(...)
C = join(A,B,'Type',TYPE,...)
C = join(A,B,'Type',TYPE,'MergeKeys',true,...)
[C,IA,IB] = join(A,B,'Type',TYPE,...)
Description
C = join(A,B) creates a dataset array
C by merging observations from the two dataset arrays
A and B. join performs
the merge by first finding key variables, that is, pairs of dataset
variables, one in A and one in B, that share the
same name. Each observation in B must contain a unique combination of
values in the key variables, and must contain all combinations of values that are
present in the keys from A. join then uses these
key variables to define a many-to-one correspondence between observations in
A and those in B. join
uses this correspondence to replicate the observations in B and
combine them with the observations in A to create
C.
C = join(A,B,keys) performs the merge using
the variables specified by keys as the key variables in both
A and B. keys is a positive
integer, a vector of positive integers, a character vector, a string array, a cell array
of character vectors, or a logical vector.
C contains one observation for each observation in
A. Variables in C include all of the variables
from A, as well as one variable corresponding to each variable in
B (except for the keys from B). If
A and B contain variables with identical
names, join adds the suffix '_left' and
'_right' to the corresponding variables in
C.
C = join(A,B,
specifies optional parameter name/value pairs to control how the dataset variables in
param1,val1,param2,val2,...)A and B are used in the merge. Parameters
are:
'Keys'— Specifies the variables to use as keys in bothAandB.'LeftKeys'— Specifies the variables to use as keys inA.'RightKeys'— Specifies the variables to use as keys inB.
You may provide either the 'Keys' parameter, or both the
'LeftKeys' and 'RightKeys' parameters. The
value for these parameters is a positive integer, a vector of positive integers, a
character vector, a string array, a cell array of character vectors, or a logical
vector. 'LeftKeys' or 'RightKeys' must both
specify the same number of key variables, and join pairs the left and
right keys in the order specified.
'LeftVars'— Specifies which variables fromAto include inC. By default,joinincludes all variables fromA.'RightVars'— Specifies which variables fromBto include inC. By default,joinincludes all variables fromBexcept the key variables.
You can use 'LeftVars' or 'RightVars' to include
or exclude key variables as well as data variables. The value for these parameters is a
positive integer, a vector of positive integers, a character vector, a string array, a
cell array of character vectors, or a logical vector.
[C,IB] = join(...) returns an index vector
IB, where join constructs C
by horizontally concatenating A(:,LeftVars) and
B(IB,RightVars). join can also perform more
complicated inner and outer join operations that allow a many-to-many correspondence
between A and B, and allow unmatched observations
in either A or B.
C = join(A,B,'Type',TYPE,...) performs the
join operation specified by TYPE. TYPE is one of
'inner', 'leftouter',
'rightouter', 'fullouter', or
'outer' (which is a synonym for 'fullouter').
For an inner join, C only contains observations corresponding to a
combination of key values that occurred in both A and
B. For a left (or right) outer join, C also
contains observations corresponding to keys in A (or
B) that did not match any in B (or
A). Variables in C taken from
A (or B) contain null values in those
observations. A full outer join is equivalent to a left and right outer join.
C contains variables corresponding to the key variables from both
A and B, and join sorts the
observations in C by the key values.
For inner and outer joins, C contains variables corresponding to
the key variables from both A and B by default, as
well as all the remaining variables. join sorts the observations in
the result C by the key values.
C = join(A,B,'Type',TYPE,'MergeKeys',true,...) includes a single
variable in C for each key variable pair from A
and B, rather than including two separate variables. For outer joins,
join creates the single variable by merging the key values from
A and B, taking values from
A where a corresponding observation exists in
A, and from B otherwise. Setting the
'MergeKeys' parameter to true overrides
inclusion or exclusion of any key variables specified via the
'LeftVars' or 'RightVars' parameter. Setting
the 'MergeKeys' parameter to false is equivalent
to not passing in the 'MergeKeys' parameter.
[C,IA,IB] = join(A,B,'Type',TYPE,...) returns
index vectors IA and IB indicating the
correspondence between observations in C and those in
A and B. For an inner join,
join constructs C by horizontally
concatenating A(IA,LeftVars) and B(IB,RightVars).
For an outer join, IA or IB may also contain
zeros, indicating the observations in C that do not correspond to
observations in A or B, respectively.
Examples
Create a dataset array from Fisher's iris data:
load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d'));
iris = dataset({nominal(species),'species'},...
{meas,'SL','SW','PL','PW'},...
'ObsNames',NameObs);
Create a separate dataset array with the diploid chromosome counts for each species of iris:
snames = nominal({'setosa';'versicolor';'virginica'});
CC = dataset({snames,'species'},{[38;108;70],'cc'})
CC =
species cc
setosa 38
versicolor 108
virginica 70Broadcast the data in CC to the rows of iris
using the key variable species in each dataset:
iris2 = join(iris,CC);
iris2([1 2 51 52 101 102],:)
ans =
species SL SW PL PW cc
Obs1 setosa 5.1 3.5 1.4 0.2 38
Obs2 setosa 4.9 3 1.4 0.2 38
Obs51 versicolor 7 3.2 4.7 1.4 108
Obs52 versicolor 6.4 3.2 4.5 1.5 108
Obs101 virginica 6.3 3.3 6 2.5 70
Obs102 virginica 5.8 2.7 5.1 1.9 70Create two datasets and join them using the 'MergeKeys'
flag:
% Create two data sets that both contain the key variable
% 'Key1'. The two arrays contain observations with common
% values of Key1, but each array also contains observations
% with values of Key1 not present in the other.
a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',...
'VarNames',{'Key1' 'Var1'})
b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',...
'VarNames',{'Key1' 'Var2'})
% Combine a and b with an outer join, which matches up
% observations with common key values, but also retains
% observations whose key values don't have a match.
% Keep the key values as separate variables in the result.
couter = join(a,b,'key','Key1','Type','outer')
% Join a and b, merging the key values as a single variable
% in the result.
coutermerge = join(a,b,'key','Key1','Type','outer',...
'MergeKeys',true)
% Join a and b, retaining only observations whose key
% values match.
cinner = join(a,b,'key','Key1','Type','inner',...
'MergeKeys',true)
a =
Key1 Var1
'a' 1
'b' 2
'c' 3
'e' 11
'h' 17
b =
Key1 Var2
'a' 4
'b' 5
'd' 6
'e' 7
couter =
Key1_left Var1 Key1_right Var2
'a' 1 'a' 4
'b' 2 'b' 5
'c' 3 '' NaN
'' NaN 'd' 6
'e' 11 'e' 7
'h' 17 '' NaN
coutermerge =
Key1 Var1 Var2
'a' 1 4
'b' 2 5
'c' 3 NaN
'd' NaN 6
'e' 11 7
'h' 17 NaN
cinner =
Key1 Var1 Var2
'a' 1 4
'b' 2 5
'e' 11 7