EditMissingValues

Modifies pointers to arrays associated with the method of supporting missing values in a dataset.

Syntax

Fortran:

status = vslssseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)

status = vsldsseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)

C:

status = vslsSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);

status = vsldSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);

Include Files

The Fortran 90 interfaces are specified in the mkl_vsl.f90 include file, and the C interfaces are specified in the mkl_vsl_functions.h include file.

Input Parameters

Name	Type	Description
task	Fortran: TYPE(VSL_SS_TASK) C: VSLSSTaskPtr	Descriptor of the task
nparams	Fortran: INTEGER C: MKL_INT*	Pointer to the number of method parameters
params	Fortran: REAL(KIND=4) DIMENSION() for vslssseditmissingvalues REAL(KIND=8) DIMENSION() for vsldsseditmissingvalues C: float* for vslsSSEditMissingValues double* for vsldSSEditMissingValues	Pointer to the array of method parameters
init_estimates_n	Fortran: INTEGER C: MKL_INT*	Pointer to the number of initial estimates for mean and a variance-covariance matrix
init_estimates	Fortran: REAL(KIND=4) DIMENSION() for vslssseditmissingvalues REAL(KIND=8) DIMENSION() for vsldsseditmissingvalues C: float* for vslsSSEditMissingValues double* for vsldSSEditMissingValues	Pointer to the array that holds initial estimates for mean and a variance-covariance matrix
prior_n	Fortran: INTEGER C: MKL_INT*	Pointer to the number of prior parameters
prior	Fortran: REAL(KIND=4) DIMENSION() for vslssseditmissingvalues REAL(KIND=8) DIMENSION() for vsldsseditmissingvalues C: float* for vslsSSEditMissingValues double* for vsldSSEditMissingValues	Pointer to the array of prior parameters
simul_missing_vals_n	Fortran: INTEGER C: MKL_INT*	Pointer to the size of the array that holds output of the Multiple Imputation method
simul_missing_vals	Fortran: REAL(KIND=4) DIMENSION() for vslssseditmissingvalues REAL(KIND=8) DIMENSION() for vsldsseditmissingvalues C: float* for vslsSSEditMissingValues double* for vsldSSEditMissingValues	Pointer to the array of size `k*m`, where `k` is the total number of missing values and `m` is the number of copies of missing values. The array holds `m` sets of simulated missing values for the matrix of observations.
estimates_n	Fortran: INTEGER C: MKL_INT*	Pointer to the number of estimates to be returned by the routine
estimates	Fortran: REAL(KIND=4) DIMENSION() for vslssseditmissingvalues REAL(KIND=8) DIMENSION() for vsldsseditmissingvalues C: float* for vslsSSEditMissingValues double* for vsldSSEditMissingValues	Pointer to the array that holds estimates of the mean and a variance-covariance matrix.

Output Parameters

Name	Type	Description
status	Fortran: INTEGER C: int	Current status of the task

Description

The EditMissingValues routine replaces the following pointers in the task descriptor with the values passed as its arguments: pointers to the number and the array of method parameters, pointers to the number and the array of initial mean/variance-covariance estimates, pointers to the number and the array of prior parameters, pointers to the number and the array of simulated missing values, and pointers to the number and the array of intermediate mean/variance-covariance estimates. If an input parameter is NULL, the corresponding parameter in the task descriptor remains unchanged.

Before you call the VSL Summary Statistics routines to process missing values, preprocess the dataset and denote missing observations with one of the following predefined constants:

VSL_SS_SNAN, if the dataset is stored in single precision floating-point arithmetic
VSL_SS_DNAN, if the dataset is stored in double precision floating-point arithmetic

Intel MKL provides the VSL_SS_METHOD_MI method to support missing values in the dataset based on the Multiple Imputation (MI) approach described in [Schafer97]. The following components support Multiple Imputation:

Expectation Maximization (EM) algorithm to compute the start point for the Data Augmentation (DA) procedure
DA function

Note

The DA component of the MI procedure is simulation-based and uses the VSL_BRNG_MCG59 basic random number generator with predefined seed = 2⁵⁰ and the Gaussian distribution generator (ICDF method) available in Intel MKL [Gaussian].

Pack the parameters of the MI algorithm into the params array. Table "Structure of the Array of MI Parameters" describes the params structure.

Structure of the Array of MI Parameters
Array Position	Algorithm Parameter	Description
0	em_iter_num	Maximum number of iterations for the EM algorithm. By default, this value is 50.
1	da_iter_num	Maximum number of iterations for the DA algorithm. By default, this value is 30.
2	ε	Stopping criterion for the EM algorithm. The algorithm terminates if the maximum absolute value of the element-wise difference between the previous and current parameter values is less than ε. By default, this value is 0.001.
3	m	Number of sets to impute
4	missing_vals_num	Total number of missing values in the dataset
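
As an illustration of the table above, the following minimal C sketch fills a five-element params array for the double-precision interface. The helper name pack_mi_params and the macro MI_NPARAMS are hypothetical and are not part of Intel MKL; the sketch assumes only the position-to-parameter mapping given in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries in the MI parameter array (positions 0..4 in the table).
   Hypothetical name, not an Intel MKL constant. */
#define MI_NPARAMS 5

/* Fill params[0..4] in the order given by the table.
   Integer-valued settings are stored as doubles because params is a
   floating-point array in the double-precision interface. */
static void pack_mi_params(double params[MI_NPARAMS],
                           int em_iter_num, int da_iter_num, double eps,
                           int m, int missing_vals_num)
{
    assert(em_iter_num > 0 && da_iter_num > 0);
    assert(eps > 0.0 && m > 0 && missing_vals_num >= 0);

    params[0] = (double)em_iter_num;      /* maximum EM iterations     */
    params[1] = (double)da_iter_num;      /* maximum DA iterations     */
    params[2] = eps;                      /* EM stopping criterion     */
    params[3] = (double)m;                /* number of sets to impute  */
    params[4] = (double)missing_vals_num; /* total missing values      */
}
```

For instance, pack_mi_params(params, 50, 30, 0.001, 2, 3) would request the default iteration limits and stopping criterion, two imputed sets, and three missing values. The filled array and its length would then be passed as params and nparams.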

You can also pass initial estimates into the EM algorithm by packing both the vector of means and the variance-covariance matrix into a one-dimensional array init_estimates. The size of the array should be at least p + p(p + 1)/2, where p is the dimension of the task. For i = 0, ..., p-1, init_estimates[i] contains the initial estimate of the mean of the i-th variable. The remaining positions of the array are occupied by the upper triangular part of the variance-covariance matrix.
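
The packing of init_estimates can be sketched in C as follows. The helper names are hypothetical and are not Intel MKL functions. The sketch assumes that the full matrix is available in row-major storage and that the upper triangle is written row by row; the text above does not state the element order within the triangle, so verify it against the Application Notes before relying on this layout.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the packed array: p means plus p(p+1)/2 upper-triangle entries. */
static size_t init_estimates_size(size_t p)
{
    return p + p * (p + 1) / 2;
}

/* Pack mean[0..p-1] followed by the upper triangle of the p-by-p matrix cov
   (full row-major storage) into out.
   ASSUMPTION: the triangle is emitted row by row. */
static void pack_init_estimates(size_t p, const double *mean,
                                const double *cov, double *out)
{
    size_t pos = 0;

    for (size_t i = 0; i < p; ++i)
        out[pos++] = mean[i];

    for (size_t i = 0; i < p; ++i)
        for (size_t j = i; j < p; ++j)
            out[pos++] = cov[i * p + j];

    assert(pos == init_estimates_size(p));
}
```

For p = 4, init_estimates_size returns 14: four means plus ten entries of the upper triangle.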

If you provide no initial estimates for the EM algorithm, the editor uses the default values, that is, the vector of zero means and the identity matrix as the variance-covariance matrix. You can also pass prior parameters for μ and Σ into the library: μ₀, τ, m, and Λ^-1. Pack these parameters as a one-dimensional array prior with a size of at least

(p² + 3p + 4)/2.

The storage format is as follows:

prior[0], ..., prior[p-1] contain the elements of the vector μ₀.
prior[p] contains the parameter τ.
prior[p+1] contains the parameter m.
The remaining positions are occupied by the upper-triangular part of the inverted matrix Λ^-1.
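
The storage format of prior can be sketched in C as shown below. The helper names are hypothetical and are not Intel MKL functions. As with any packed triangle here, the row-by-row order of the upper-triangular part is an assumption, because the format description does not specify it.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of the prior array: (p^2 + 3p + 4)/2,
   which equals p (mu_0) + 2 (tau and m) + p(p+1)/2 (upper triangle). */
static size_t prior_size(size_t p)
{
    return (p * p + 3 * p + 4) / 2;
}

/* Pack mu0[0..p-1], tau, m, and the upper triangle of the p-by-p matrix
   lambda_inv (full row-major storage) into out.
   ASSUMPTION: the triangle is emitted row by row. */
static void pack_prior(size_t p, const double *mu0, double tau, double m,
                       const double *lambda_inv, double *out)
{
    size_t pos = 0;

    for (size_t i = 0; i < p; ++i)
        out[pos++] = mu0[i];          /* prior[0] .. prior[p-1] */

    out[pos++] = tau;                 /* prior[p]   */
    out[pos++] = m;                   /* prior[p+1] */

    for (size_t i = 0; i < p; ++i)
        for (size_t j = i; j < p; ++j)
            out[pos++] = lambda_inv[i * p + j];

    assert(pos == prior_size(p));
}
```

For p = 4, prior_size returns 16, which matches 4 + 2 + 10.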

If you provide no prior parameters, the editor uses their default values:

The array of p zeros is used as μ₀.
τ is set to 0.
m is set to p.
The zero matrix is used as an initial approximation of Λ^-1.

The EditMissingValues editor returns m sets of imputed values and/or a sequence of parameter estimates drawn during the DA procedure.

The editor returns the imputed values as the simul_missing_vals array. The size of the array should be sufficient to hold m sets each of the missing_vals_num size, that is, at least m*missing_vals_num in total. The editor packs the imputed values one by one in the order of their appearance in the matrix of observations.

For example, consider a task of dimension 4. The total number of observations n is 10. The second observation vector is missing variables 1 and 2, and the seventh observation vector is missing variable 1. The number of sets to impute is m=2. Then simul_missing_vals[0] and simul_missing_vals[1] contain the first and the second points for the second observation vector, and simul_missing_vals[2] holds the first point for the seventh observation. Positions 3, 4, and 5 are formed similarly for the second set.
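
The indexing in this example can be sketched in C as follows. Both helpers are hypothetical and are not Intel MKL functions. The sketch assumes that the caller has already built missing_pos, a list of the linear indices of the missing cells in the data array, in the same order of appearance that the editor uses; how those indices map to observations and variables depends on the storage format of the matrix and is not addressed here.

```c
#include <assert.h>
#include <stddef.h>

/* Position in simul_missing_vals of the j-th missing value (0-based, in
   order of appearance) of imputed set s (0-based), with k values per set. */
static size_t imputed_index(size_t s, size_t j, size_t k)
{
    assert(j < k);
    return s * k + j;
}

/* Copy imputed set s into the data array x.
   ASSUMPTION: missing_pos[j] is the linear index in x of the j-th missing
   cell; building this list is left to the caller. */
static void fill_imputed_set(double *x, const size_t *missing_pos, size_t k,
                             const double *simul_missing_vals, size_t s)
{
    for (size_t j = 0; j < k; ++j)
        x[missing_pos[j]] = simul_missing_vals[imputed_index(s, j, k)];
}
```

With k = 3 as in the example, the first set occupies positions 0 to 2 and the second set occupies positions 3 to 5, so imputed_index(1, 0, 3) is 3.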

To estimate convergence of the DA algorithm and choose a proper value of the number of DA iterations, request the sequence of parameter estimates that are produced during the DA procedure. The editor returns the sequence of parameters as a single array. The size of the array is

m*da_iter_num*(p+(p²+p)/2)

where

m is the number of sets of values to impute.
da_iter_num is the number of DA iterations.
The value p+(p²+p)/2 determines the size of the memory to hold one set of the parameter estimates.

In each set of the parameters, the vector of means occupies the first p positions and the remaining (p²+p)/2 positions are intended for the upper triangular part of the variance-covariance matrix.
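
The size formula and the layout of one set of estimates can be sketched in C as follows. The helper names are hypothetical and are not Intel MKL functions. The sketch addresses sets by a plain block index b and deliberately makes no assumption about whether consecutive blocks correspond to consecutive DA iterations or to consecutive imputed sets, because the description above does not specify that order.

```c
#include <assert.h>
#include <stddef.h>

/* Number of values in one set of estimates: p means plus (p^2 + p)/2
   entries of the upper triangle of the variance-covariance matrix. */
static size_t estimates_set_size(size_t p)
{
    return p + (p * p + p) / 2;
}

/* Total length of the estimates array: m * da_iter_num sets. */
static size_t estimates_total_size(size_t m, size_t da_iter_num, size_t p)
{
    assert(m > 0 && da_iter_num > 0 && p > 0);
    return m * da_iter_num * estimates_set_size(p);
}

/* Start of the mean vector in block b (0-based). */
static const double *estimates_mean(const double *estimates, size_t b,
                                    size_t p)
{
    return estimates + b * estimates_set_size(p);
}

/* Start of the packed upper triangle in block b; it follows the p means. */
static const double *estimates_cov(const double *estimates, size_t b,
                                   size_t p)
{
    return estimates_mean(estimates, b, p) + p;
}
```

For p = 4, m = 2, and da_iter_num = 30, one set holds 14 values and the whole array holds 840 values.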

Upon successful generation of m sets of imputed values, you can place them in cells of the data matrix with missing values and use the VSL Summary Statistics routines to analyze and get estimates for each of the m complete datasets.

Note

The Intel MKL implementation of the MI algorithm overwrites cells of the dataset that contain the VSL_SS_SNAN/VSL_SS_DNAN values. If you want to use the VSL Summary Statistics routines to process the data with missing values again, mask the positions of the empty cells.

See additional details of the algorithm usage model in the Intel® MKL Summary Statistics Library Application Notes document on the Intel® MKL web page.