mvksdensity
Kernel smoothing function estimate for multivariate data
Description
f = mvksdensity(x,pts,'Bandwidth',bw) computes a probability density estimate of the sample data in the n-by-d matrix x, evaluated at the points in pts using the required name-value pair argument value bw for the bandwidth value. The estimation is based on a product Gaussian kernel function.
For univariate or bivariate data, use ksdensity instead.
f = mvksdensity(x,pts,'Bandwidth',bw,Name,Value) returns the estimate using additional options specified by one or more Name,Value pair arguments. For example, you can define the function type that mvksdensity evaluates, such as probability density, cumulative probability, or survivor function. You can also assign weights to the input values.
Examples
Load the Hald cement data.
load hald
The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients contains the percent composition for each of four cement ingredients. The response matrix heat contains the heat of hardening (in cal/g) after 180 days.
Estimate the kernel density for the first three observations in ingredients.
xi = ingredients(1:3,:);
f = mvksdensity(ingredients,xi,'Bandwidth',0.8);
Load the Hald cement data.
load hald
The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients contains the percent composition for each of four cement ingredients. The response matrix heat contains the heat of hardening (in cal/g) after 180 days.
Create an array of points at which to estimate the density. First, define the range and spacing for each variable, using a similar number of points in each dimension.
gridx1 = 0:2:22;
gridx2 = 20:5:80;
gridx3 = 0:2:24;
gridx4 = 5:5:65;
Next, use ndgrid to generate a full grid of points using the defined range and spacing. 
[x1,x2,x3,x4] = ndgrid(gridx1,gridx2,gridx3,gridx4);
Finally, transform and concatenate to create an array that contains the points at which to estimate the density. This array has one column for each variable.
x1 = x1(:,:)';
x2 = x2(:,:)';
x3 = x3(:,:)';
x4 = x4(:,:)';
xi = [x1(:) x2(:) x3(:) x4(:)];
Estimate the density.
f = mvksdensity(ingredients,xi,...
    'Bandwidth',[4.0579 10.7345 4.4185 11.5466],...
    'Kernel','normpdf');
View the size of xi and f to confirm that mvksdensity calculates the density at each point in xi.
size_xi = size(xi)
size_xi = 1×2
       26364           4
size_f = size(f)
size_f = 1×2
       26364           1
Input Arguments
Sample data for which mvksdensity returns the probability density estimate, specified as an n-by-d matrix of numeric values. n is the number of data points (rows) in x, and d is the number of dimensions (columns).
Data Types: single | double
Value for the bandwidth of the kernel-smoothing window, specified as a scalar value or d-element vector. d is the number of dimensions (columns) in the sample data x. If bw is a scalar value, it applies to all dimensions.
If you specify 'BoundaryCorrection' as 'log' (default) and 'Support' as either 'positive' or a two-row matrix, mvksdensity converts bounded data to be unbounded by using log transformation. The value of bw is on the scale of the transformed values.
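Silverman's rule of thumb for the bandwidth is
$$b_i = \sigma_i\left\{\frac{4}{(d+2)n}\right\}^{\frac{1}{d+4}},\quad i = 1, 2, \ldots, d,$$
where d is the number of dimensions, n is the number of observations, and $\sigma_i$ is the standard deviation of the ith variate [4].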
Example: 'Bandwidth',0.8
Data Types: single | double
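As a minimal sketch of how a rule-of-thumb bandwidth might be computed, the following applies the commonly cited form of Silverman's rule, sigma_i*(4/((d+2)*n))^(1/(d+4)), to each column of the Hald ingredients matrix. The variable names sigma and bw are illustrative choices, not part of the mvksdensity interface.

```matlab
load hald                                % provides the 13-by-4 matrix ingredients
[n,d] = size(ingredients);               % n observations, d dimensions
sigma = std(ingredients);                % 1-by-d vector of column standard deviations
bw = sigma*(4/((d+2)*n))^(1/(d+4));      % rule-of-thumb bandwidth, one value per dimension

% Use the computed vector as the required 'Bandwidth' value.
pts = ingredients(1:3,:);
f = mvksdensity(ingredients,pts,'Bandwidth',bw);
```

For this data set, bw should come out close to the vector [4.0579 10.7345 4.4185 11.5466] that the grid example passes as 'Bandwidth'.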
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Kernel','triangle','Function','cdf' specifies
that mvksdensity estimates the cdf of the sample
data using the triangle kernel function.
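Boundary correction method, specified as the comma-separated pair consisting of 'BoundaryCorrection' and either 'log' or 'reflection'.
| Value | Description |
|---|---|
| 'log' | Default. mvksdensity converts bounded data to be unbounded by using a log transformation. The value of bw is on the scale of the transformed values. |
| 'reflection' | mvksdensity applies the reflection method, a boundary correction method for data with bounded support. |
mvksdensity applies boundary correction only when you specify 'Support' as a value other than 'unbounded'.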
Example: 'BoundaryCorrection','reflection'
Function to estimate, specified as the comma-separated pair
consisting of 'Function' and one of the following.
| Value | Description | 
|---|---|
| 'pdf' | Probability density function | 
| 'cdf' | Cumulative distribution function | 
| 'survivor' | Survivor function | 
Example: 'Function','cdf'
Type of kernel smoother, specified as the comma-separated pair
consisting of 'Kernel' and one of the following.
| Value | Description | 
|---|---|
| 'normal' | Normal (Gaussian) kernel | 
| 'box' | Box kernel | 
| 'triangle' | Triangular kernel | 
| 'epanechnikov' | Epanechnikov kernel | 
You can also specify a kernel function that is a custom or built-in function. Specify the function as a function handle (for example, @myfunction or @normpdf) or as a character vector or string scalar (for example, 'myfunction' or 'normpdf'). The software calls the specified function with one argument that is an array of distances between data values and locations where the density is evaluated, normalized by the bandwidth in that dimension. The function must return an array of the same size containing the corresponding values of the kernel function.
mvksdensity applies the same kernel to
each dimension.
Example: 'Kernel','box'
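The custom-kernel contract described above (one input array of bandwidth-normalized distances, one output array of the same size) can be sketched with an anonymous function. The handle name gaussKernel and the bandwidth values are hypothetical; the handle reimplements the standard normal density so that its result can be compared with the built-in 'normal' kernel.

```matlab
load hald
pts = ingredients(1:3,:);
bw = [8 20 8 22];                                 % illustrative bandwidths, not tuned

% Hypothetical custom kernel: standard normal pdf, returns an array the same size as u.
gaussKernel = @(u) exp(-0.5*u.^2)/sqrt(2*pi);

fCustom  = mvksdensity(ingredients,pts,'Bandwidth',bw,'Kernel',gaussKernel);
fBuiltin = mvksdensity(ingredients,pts,'Bandwidth',bw,'Kernel','normal');

% Largest relative difference between the two estimates.
relDiff = max(abs(fCustom - fBuiltin)./fBuiltin);
```

Because the handle computes the same function as the built-in kernel, relDiff is expected to be at the level of floating-point round-off.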
Support for the density, specified as the comma-separated pair
consisting of 'Support' and one of the following.
| Value | Description |
|---|---|
| 'unbounded' | Allow the density to extend over the whole real line |
| 'positive' | Restrict the density to positive values |
| 2-by-d matrix | Specify the finite lower and upper bounds for the support of the density. The first row contains the lower limits and the second row contains the upper limits. Each column contains the limits for one dimension of x. |
'Support' can also be a combination of positive, unbounded, and bounded variables specified as [0 -Inf L; Inf Inf U].
Example: 'Support','positive'
Data Types: single | double | char | string
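A short sketch of the two ways of restricting the support, using the Hald ingredients (percentages, so every value lies strictly between 0 and 100). The bounds matrix and the bandwidth values here are illustrative assumptions, not recommended settings.

```matlab
load hald
pts = ingredients(1:3,:);

% Positive support with the default 'log' boundary correction.
% The bandwidth is then interpreted on the log-transformed scale.
fLog = mvksdensity(ingredients,pts,'Bandwidth',0.5,'Support','positive');

% Explicit bounds: first row holds lower limits, second row upper limits,
% one column per dimension of the sample data.
bounds = [zeros(1,4); 100*ones(1,4)];
fRef = mvksdensity(ingredients,pts,'Bandwidth',[4 10 4 11], ...
    'Support',bounds,'BoundaryCorrection','reflection');
```

With 'reflection', the bandwidth stays on the original data scale, which is why the second call uses values comparable to the spread of each column.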
Weights for sample data, specified as the comma-separated pair consisting of 'Weights' and a vector of length size(x,1), where x is the sample data.
Example: 'Weights',xw
Data Types: single | double
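As an illustration of the 'Weights' argument, the sketch below up-weights the first three observations. The weights are normalized to sum to 1 before the call so that the example does not depend on how mvksdensity treats unnormalized weights, which this page does not state.

```matlab
load hald
pts = ingredients(1:3,:);
n = size(ingredients,1);

w = ones(n,1);
w(1:3) = 2;              % hypothetical choice: first three rows count double
w = w/sum(w);            % normalize so the weights sum to 1

fWeighted = mvksdensity(ingredients,pts,'Bandwidth',0.8,'Weights',w);

% Uniform weights should reproduce the unweighted estimate.
fUniform    = mvksdensity(ingredients,pts,'Bandwidth',0.8,'Weights',ones(n,1)/n);
fUnweighted = mvksdensity(ingredients,pts,'Bandwidth',0.8);
```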
Output Arguments
Estimated function values, returned as a vector. f and pts have the same number of rows.
More About
A multivariate kernel distribution is a nonparametric representation of the probability density function (pdf) of a random vector. You can use a kernel distribution when a parametric distribution cannot properly describe the data, or when you want to avoid making assumptions about the distribution of the data. A multivariate kernel distribution is defined by a smoothing function and a bandwidth matrix, which control the smoothness of the resulting density curve.
The multivariate kernel density estimator is the estimated pdf of a random vector. Let x = (x1, x2, …, xd)' be a d-dimensional random vector with a density function f and let yi = (yi1, yi2, …, yid)' be a random sample drawn from f for i = 1, 2, …, n, where n is the number of random samples. For any real vector x, the multivariate kernel density estimator is given by
$$\hat{f}_H(x) = \frac{1}{n}\sum_{i=1}^{n} K_H(x - y_i),$$
where $K_H(x) = |H|^{-1/2}K(H^{-1/2}x)$, $K(\cdot)$ is the kernel smoothing function, and H is the d-by-d bandwidth matrix.
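mvksdensity uses a diagonal bandwidth matrix and a product kernel. That is, $H^{1/2}$ is a square diagonal matrix with the elements of vector $(h_1, h_2, \ldots, h_d)$ on the main diagonal. $K(x)$ takes the product form $K(x) = k(x_1)k(x_2)\cdots k(x_d)$, where $k(\cdot)$ is a one-dimensional kernel smoothing function. Then, the multivariate kernel density estimator becomes
$$\hat{f}_H(x) = \frac{1}{n}\sum_{i=1}^{n} K_H(x - y_i) = \frac{1}{n h_1 h_2 \cdots h_d}\sum_{i=1}^{n}\prod_{j=1}^{d} k\left(\frac{x_j - y_{ij}}{h_j}\right).$$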
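The product-kernel form described above can be checked numerically. The following sketch evaluates the estimator by hand at one point, as the average over observations of the product of one-dimensional Gaussian kernels scaled by the per-dimension bandwidths, and compares it with mvksdensity. The bandwidth vector h is an arbitrary illustrative choice, and the code relies on implicit expansion (R2016b or later).

```matlab
load hald
y = ingredients;                 % n-by-d sample data
[n,d] = size(y);
h = [8 20 8 22];                 % illustrative per-dimension bandwidths
x = y(1,:);                      % evaluation point (1-by-d)

u = (x - y)./h;                  % n-by-d normalized distances (x_j - y_ij)/h_j
k = normpdf(u);                  % one-dimensional Gaussian kernel values
fManual = sum(prod(k,2))/(n*prod(h));   % average of per-observation kernel products

fBuiltin = mvksdensity(y,x,'Bandwidth',h);
relErr = abs(fManual - fBuiltin)/fBuiltin;
```

If the manual formula matches what mvksdensity computes with its default normal kernel, relErr should be close to machine precision.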
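The kernel estimator for the cumulative distribution function (cdf), for any real vector x, is given by
$$\hat{F}_H(x) = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d} G\left(\frac{x_j - y_{ij}}{h_j}\right),$$
where $G(x_j) = \int_{-\infty}^{x_j} k(t_j)\,dt_j$.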
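A similar hand computation can illustrate the cdf estimator: for a Gaussian kernel, the integrated one-dimensional kernel is the standard normal cdf, so the estimate at a point is the average over observations of the product of normcdf values. This is a sketch under that assumption, with an arbitrary illustrative bandwidth vector h.

```matlab
load hald
y = ingredients;                 % n-by-d sample data
n = size(y,1);
h = [8 20 8 22];                 % illustrative per-dimension bandwidths
x = y(1,:);                      % evaluation point (1-by-d)

u = (x - y)./h;                  % normalized distances (implicit expansion, R2016b+)
G = normcdf(u);                  % integrated Gaussian kernel in each dimension
FManual = sum(prod(G,2))/n;      % average of per-observation products

FBuiltin = mvksdensity(y,x,'Bandwidth',h,'Function','cdf');
absErr = abs(FManual - FBuiltin);
```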
The reflection method is a boundary correction method that accurately finds kernel density estimators when a random variable has bounded support. If you specify 'BoundaryCorrection','reflection', mvksdensity uses the reflection method.
If you additionally specify 'Support' as a two-row matrix consisting of the lower and upper limits for each dimension, then mvksdensity finds the kernel estimator as follows.
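- If 'Function' is 'pdf', then the kernel density estimator is
$$\hat{f}_H(x) = \frac{1}{n h_1 h_2 \cdots h_d}\sum_{i=1}^{n}\prod_{j=1}^{d}\left[k\left(\frac{x_j - y_{ij}^{-}}{h_j}\right) + k\left(\frac{x_j - y_{ij}}{h_j}\right) + k\left(\frac{x_j - y_{ij}^{+}}{h_j}\right)\right]$$
for $L_j \le x_j \le U_j$, where $y_{ij}^{-} = 2L_j - y_{ij}$, $y_{ij}^{+} = 2U_j - y_{ij}$, and $y_{ij}$ is the jth element of the ith sample data corresponding to x(i,j) of the input argument x. $L_j$ and $U_j$ are the lower and upper limits of the jth dimension, respectively.
- If 'Function' is 'cdf', then the kernel estimator for cdf is
$$\hat{F}_H(x) = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d}\left[G\left(\frac{x_j - y_{ij}^{-}}{h_j}\right) + G\left(\frac{x_j - y_{ij}}{h_j}\right) + G\left(\frac{x_j - y_{ij}^{+}}{h_j}\right) - G\left(\frac{L_j - y_{ij}^{-}}{h_j}\right) - G\left(\frac{L_j - y_{ij}}{h_j}\right) - G\left(\frac{L_j - y_{ij}^{+}}{h_j}\right)\right]$$
for $L_j \le x_j \le U_j$, where $G$ denotes the integrated one-dimensional kernel.
- To obtain a kernel estimator for a survivor function (when 'Function' is 'survivor'), mvksdensity uses both $\hat{f}_H(x)$ and $\hat{F}_H(x)$.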
If you additionally specify 'Support' as 'positive' or a matrix including [0 inf], then mvksdensity finds the kernel density estimator by replacing [Lj Uj] with [0 inf] in the above equations.
References
[1] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press Inc., 1997.
[2] Hill, P. D. “Kernel estimation of a distribution function.” Communications in Statistics – Theory and Methods. Vol. 14, Issue 3, 1985, pp. 605-620.
[3] Jones, M. C. “Simple boundary correction for kernel density estimation.” Statistics and Computing. Vol. 3, Issue 3, 1993, pp. 135-146.
[4] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
[5] Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.
Extended Capabilities
Usage notes and limitations:
- Names in name-value pair arguments, including 'Bandwidth', must be compile-time constants.
- Values in the following name-value pair arguments must also be compile-time constants: 'BoundaryCorrection', 'Function', and 'Kernel'. For example, to use the 'Function','cdf' name-value pair argument in the generated code, include {coder.Constant('Function'),coder.Constant('cdf')} in the -args value of codegen.
- The value of the 'Kernel' name-value pair argument cannot be a custom function handle. To specify a custom kernel function, use a character vector or string scalar.
- For the value of the 'Support' name-value pair argument, the compile-time data type must match the runtime data type.
For more information on code generation, see Introduction to Code Generation and General Code Generation Workflow.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2016a