HNS Normal Scale Estimate of Smoothing Parameter.

 CALL:  h = hns(data,kernel)

   h      = one-dimensional optimal value for the smoothing parameter,
            given the data and kernel. size 1 x D
   data   = data matrix, size N x D (D = # dimensions)
   kernel = 'epanechnikov' - Epanechnikov kernel.
            'biweight'     - Bi-weight kernel.
            'triweight'    - Tri-weight kernel.
            'triangular'   - Triangular kernel.
            'gaussian'     - Gaussian kernel.
            'rectangular'  - Rectangular kernel.
            'laplace'      - Laplace kernel.
            'logistic'     - Logistic kernel.

 Note that only the first 4 letters of the kernel name are needed.

 HNS is only optimal with respect to the mean integrated squared error
 when the true underlying distribution is Gaussian. This works reasonably
 well if the data resemble a Gaussian distribution. However, if the
 distribution is asymmetric, multimodal or has long tails, HNS may return
 a too large smoothing parameter, i.e., the KDE may be oversmoothed and
 mask important features of the data (=> large bias). One way to remedy
 this is to reduce H by multiplying by a constant factor, e.g., 0.85.
 Another is to try different values of H and check the result visually.

 Example:
   data = wnormrnd(0,1,20,1);
   h = hns(data,'epan');

 See also  hste, hbcv, hboot, hos, hldpi, hlscv, hscv, hstt, kde
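For readers outside MATLAB, the one-dimensional normal-scale rule described above can be sketched in Python. The helper name `hns_gauss` is hypothetical (it is not part of WAFO), and `np.percentile` stands in for WAFO's `qlevels2`. For the Gaussian kernel, R = 1/(2*sqrt(pi)) and mu2 = 1, so the AMISE constant (8*sqrt(pi)*R/(3*mu2^2*n))^(1/5) reduces to (4/(3n))^(1/5), the familiar 1.06*n^(-1/5) of Silverman's rule:

```python
import numpy as np

def hns_gauss(data):
    """Normal-scale bandwidth (Silverman's rule) for a Gaussian kernel.

    Hypothetical Python sketch of the rule implemented by WAFO's hns.
    For the Gaussian kernel R = 1/(2*sqrt(pi)) and mu2 = 1, so the
    AMISE constant (8*sqrt(pi)*R/(3*mu2**2*n))**(1/5) reduces to
    (4/(3*n))**0.2.
    """
    a = np.atleast_2d(np.asarray(data, dtype=float))
    if a.shape[0] == 1:              # treat a row vector as 1-D data
        a = a.T
    n = a.shape[0]
    amise_const = (4.0 / (3.0 * n)) ** 0.2
    q75, q25 = np.percentile(a, [75, 25], axis=0)
    iqr = q75 - q25                  # interquartile range, per column
    sigma = a.std(axis=0, ddof=1)    # sample standard deviation
    # min(std, iqr/1.349) guards against outliers and heavy tails,
    # lessening the chance of oversmoothing.
    scale = np.where(iqr > 0, np.minimum(sigma, iqr / 1.349), sigma)
    return scale * amise_const
```

As the help text suggests, multiplying the returned value by a factor such as 0.85 is a simple guard against oversmoothing when the data are clearly non-Gaussian.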
Functions called by HNS:
  kernelstats - Return 2'nd order moment of kernel pdf
  qlevels2    - Calculates quantile levels which encloses P% of data
  diff        - Difference and approximate derivative
  std         - Standard deviation
Functions that call HNS:
  hmns     - Multivariate Normal Scale Estimate of Smoothing Parameter
  hos      - Oversmoothing Parameter
  hste     - 2-Stage Solve the Equation estimate of smoothing parameter
  hstt     - Scott-Tapia-Thompson estimate of smoothing parameter
  kdedemo1 - Demonstrate the smoothing parameter impact on KDE
function h=hns(A,kernel)
%HNS Normal Scale Estimate of Smoothing Parameter.
%
% CALL:  h = hns(data,kernel)
%
%   h      = one-dimensional optimal value for the smoothing parameter,
%            given the data and kernel. size 1 x D
%   data   = data matrix, size N x D (D = # dimensions)
%   kernel = 'epanechnikov' - Epanechnikov kernel.
%            'biweight'     - Bi-weight kernel.
%            'triweight'    - Tri-weight kernel.
%            'triangular'   - Triangular kernel.
%            'gaussian'     - Gaussian kernel.
%            'rectangular'  - Rectangular kernel.
%            'laplace'      - Laplace kernel.
%            'logistic'     - Logistic kernel.
%
% Note that only the first 4 letters of the kernel name are needed.
%
% HNS is only optimal with respect to the mean integrated squared error
% when the true underlying distribution is Gaussian. This works
% reasonably well if the data resemble a Gaussian distribution.
% However, if the distribution is asymmetric, multimodal or has long
% tails, HNS may return a too large smoothing parameter, i.e., the KDE
% may be oversmoothed and mask important features of the data
% (=> large bias). One way to remedy this is to reduce H by multiplying
% by a constant factor, e.g., 0.85. Another is to try different values
% of H and check the result visually.
%
% Example:
%   data = wnormrnd(0,1,20,1);
%   h = hns(data,'epan');
%
% See also  hste, hbcv, hboot, hos, hldpi, hlscv, hscv, hstt, kde

% Reference:
%  B. W. Silverman (1986)
%  'Density estimation for statistics and data analysis'
%  Chapman and Hall, pp 43-48
%
%  Wand, M.P. and Jones, M.C. (1995)
%  'Kernel smoothing'
%  Chapman and Hall, pp 60-63

% Tested on: matlab 5.3
% History:
% revised pab dec2003
%  fixed a bug when D>1
% revised pab 21.09.99
%  updated string comparisons
% from kdetools

if nargin<2 | isempty(kernel)
  kernel = 'gauss';
end
[n, d] = size(A);
if (n==1) & (d>1)  % treat a row vector as one-dimensional data
  A = A.';
  n = d;
  d = 1;
end

% R   = int(mkernel(x)^2)
% mu2 = int(x^2*mkernel(x))
[mu2,R] = kernelstats(kernel);

AMISEconstant = (8*sqrt(pi)*R/(3*mu2^2*n))^(1/5);

iqr  = abs(diff(qlevels2(A,[75 25]),1,1)); % interquartile range
stdA = std(A);
h    = stdA*AMISEconstant;
k    = find(iqr>0);
if any(k)
  % The interquartile range guards against outliers and is the better
  % scale estimate if the distribution is skewed or has heavy tails.
  % This lessens the chance of oversmoothing.
  h(k) = min(stdA(k),iqr(k)/1.349)*AMISEconstant;
end

return
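The `kernelstats` call above supplies mu2 = int x^2*K(x) dx and R = int K(x)^2 dx for the chosen kernel. As a sanity check (outside WAFO; the function names below are hypothetical), these constants and the AMISE factor used in `hns` can be reproduced numerically in Python for the Epanechnikov kernel K(x) = 0.75*(1 - x^2) on [-1, 1], whose exact values are mu2 = 1/5 and R = 3/5:

```python
import numpy as np

def epan_kernel_stats(npts=20001):
    """Numerically reproduce kernelstats for the Epanechnikov kernel.

    K(x) = 0.75*(1 - x^2) on [-1, 1]; exact values: mu2 = 1/5, R = 3/5.
    """
    x = np.linspace(-1.0, 1.0, npts)
    K = 0.75 * (1.0 - x**2)
    dx = x[1] - x[0]
    R = np.sum(K**2) * dx         # int K(x)^2 dx   -> 3/5
    mu2 = np.sum(x**2 * K) * dx   # int x^2 K(x) dx -> 1/5
    return mu2, R

def amise_const(mu2, R, n):
    # Same expression as in hns.m: (8*sqrt(pi)*R/(3*mu2^2*n))^(1/5)
    return (8.0 * np.sqrt(np.pi) * R / (3.0 * mu2**2 * n)) ** 0.2
```

With the exact constants and n = 20 (as in the help example), the factor equals (2*sqrt(pi))^(1/5), illustrating how the kernel choice enters the bandwidth only through this single multiplicative constant.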