NDK_PCR_ANOVA

C/C++

int __stdcall NDK_PCR_ANOVA(double **X,
                            size_t   nXSize,
                            size_t   nXVars,
                            LPBYTE   mask,
                            size_t   nMaskLen,
                            double  *Y,
                            size_t   nYSize,
                            double   intercept,
                            WORD     nRetType,
                            double  *retVal);

Calculates a selected statistic from the ANOVA table of the principal component regression (PCR).

Returns: status code of the operation

Return values

NDK_SUCCESS	Operation successful
NDK_FAILED	Operation unsuccessful. See Macros for full list.

Parameters

[in]	X	is the independent variables data matrix, such that each column represents one variable
[in]	nXSize	is the number of observations (i.e. rows) in X
[in]	nXVars	is the number of variables (i.e. columns) in X
[in]	mask	is the boolean array to select a subset of the input variables in X. If missing (i.e. NULL), all variables in X are included.
[in]	nMaskLen	is the number of elements in mask
[in]	Y	is the response or the dependent variable data array (one dimensional array)
[in]	nYSize	is the number of elements in Y
[in]	intercept	is the constant or the intercept value to fix (e.g. zero). If missing (NaN), an intercept will not be fixed and is computed normally
[in]	nRetType	is a switch to select the return output:
- SSR (sum of squares of the regression)
- SSE (sum of squares of the residuals)
- SST (sum of squares of the dependent variable)
- MSR (mean squares of the regression)
- MSE (mean squares error or residuals)
- F-stat (test score)
- Significance F (P-value of the test)
[out]	retVal	is the calculated ANOVA statistic selected by nRetType.
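
The mask semantics described above (a NULL mask includes every variable) can be sketched as follows. This is an illustration only, not SDK code: count_selected is a hypothetical helper, unsigned char stands in for the Windows BYTE type, and treating any non-zero byte as "included" is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the SDK): counts how many input
   variables a mask selects. Assumes a non-zero byte means "included". */
size_t count_selected(const unsigned char *mask, size_t nMaskLen, size_t nXVars)
{
    size_t count = 0;
    size_t j;

    /* A missing (NULL) mask means all variables in X are included. */
    if (mask == NULL)
        return nXVars;

    /* Only inspect mask entries that correspond to a column of X. */
    for (j = 0; j < nMaskLen && j < nXVars; ++j) {
        if (mask[j])
            ++count;
    }
    return count;
}
```

For example, a mask of {1, 0, 1} over three variables selects two of them under these assumptions.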

Remarks

The underlying model is:
\[\mathbf{y} = \alpha + \beta_1 \times \mathbf{PC}_1 + \dots + \beta_p \times \mathbf{PC}_p\]
The regression ANOVA table examines the following hypothesis:
\[\mathbf{H}_0: \beta_1 = \beta_2 = \dots = \beta_p = 0 \]
\[\mathbf{H}_1: \exists \beta_i \neq 0, i \in \left[1,p \right ]\]
In other words, the regression ANOVA examines the probability that the regression does NOT explain the variation in \(\mathbf{y}\), i.e. that any fit is due purely to chance.
The NDK_PCR_ANOVA function calculates the different values in the ANOVA table as follows:
\[\mathbf{SST}=\sum_{i=1}^N \left(Y_i - \bar Y \right )^2 \]
\[\mathbf{SSR}=\sum_{i=1}^N \left(\hat Y_i - \bar Y \right )^2 \]
\[\mathbf{SSE}=\sum_{i=1}^N \left(Y_i - \hat Y_i \right )^2 \]
Where:
- \(\mathbf{PC}\) is the principal component.
- \(N\) is the number of non-missing observations in the sample data.
- \(\bar Y\) is the empirical sample average for the dependent variable.
- \(\hat Y_i\) is the regression model estimate value for the i-th observation.
- \(\mathbf{SST}\) is the total sum of squares for the dependent variable.
- \(\mathbf{SSR}\) is the total sum of squares for the regression (i.e. \(\hat y\)) estimate.
- \(\mathbf{SSE}\) is the sum of squared errors (aka residuals \(\epsilon\)) for the regression estimate (i.e. \(\epsilon = y - \hat y\)).
- \(\mathbf{SST} = \mathbf{SSR} + \mathbf{SSE}\)
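
As a rough illustration of the three sums of squares defined above, the sketch below computes them from observed and fitted values. It is not the SDK's implementation: anova_ss and anova_sums are hypothetical names, and the fitted values are assumed to come from a least-squares fit with an intercept, which is the case where SST = SSR + SSE holds.

```c
#include <stddef.h>

/* Hypothetical container for the three sums of squares. */
typedef struct {
    double sst;  /* total sum of squares      */
    double ssr;  /* regression sum of squares */
    double sse;  /* residual sum of squares   */
} anova_ss;

/* Hypothetical helper (not part of the SDK): y holds the observed
   values and yhat the fitted values, both of length n. */
anova_ss anova_sums(const double *y, const double *yhat, size_t n)
{
    anova_ss out = {0.0, 0.0, 0.0};
    double mean = 0.0;
    size_t i;

    /* Sample average of the dependent variable. */
    for (i = 0; i < n; ++i)
        mean += y[i];
    if (n > 0)
        mean /= (double)n;

    for (i = 0; i < n; ++i) {
        double dt = y[i] - mean;      /* deviation from the mean     */
        double dr = yhat[i] - mean;   /* fitted deviation from mean  */
        double de = y[i] - yhat[i];   /* residual                    */
        out.sst += dt * dt;
        out.ssr += dr * dr;
        out.sse += de * de;
    }
    return out;
}
```

With y = {1, 3, 2, 4} and the least-squares fitted values {1.3, 2.1, 2.9, 3.7} (regressing on x = 0, 1, 2, 3), this gives SST = 5, SSR = 3.2 and SSE = 1.8.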
and
\[\mathbf{MSR} = \frac{\mathbf{SSR} }{p} \]
\[\mathbf{MSE} = \frac{ \mathbf{SSE} }{N-p-1}\]
\[\textrm{F-Stat} = \frac{\mathbf{MSR} }{\mathbf{MSE} }\]
Where:
- \(p\) is the number of explanatory (aka predictor) variables in the regression.
- \(\mathbf{MSR}\) is the mean squares of the regression.
- \(\mathbf{MSE}\) is the mean squares of the residuals.
- \(\textrm{F-Stat}\) is the test score of the hypothesis.
- \(\textrm{F-Stat} \sim \mathbf{F}\left(p,N-p-1 \right)\)
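
The mean squares and the F statistic follow directly from the sums of squares and the degrees of freedom. The sketch below shows that arithmetic under stated assumptions: anova_ms and anova_mean_squares are hypothetical names, and the error handling (returning -1 when a degree of freedom is not positive or MSE is zero) is an illustrative choice, not documented SDK behavior. The P-value is omitted because it needs an F-distribution CDF.

```c
#include <stddef.h>

/* Hypothetical container for the mean squares and test score. */
typedef struct {
    double msr;  /* mean squares of the regression */
    double mse;  /* mean squares of the residuals  */
    double f;    /* F statistic, MSR / MSE         */
} anova_ms;

/* Hypothetical helper (not part of the SDK). n is the number of
   non-missing observations, p the number of regressors.
   Returns 0 on success, -1 if the statistics are undefined. */
int anova_mean_squares(double ssr, double sse, size_t n, size_t p,
                       anova_ms *out)
{
    /* Both degrees of freedom, p and n - p - 1, must be positive. */
    if (out == NULL || p == 0 || n <= p + 1)
        return -1;

    out->msr = ssr / (double)p;
    out->mse = sse / (double)(n - p - 1);

    /* A perfect fit leaves the F statistic undefined. */
    if (out->mse == 0.0)
        return -1;

    out->f = out->msr / out->mse;
    return 0;
}
```

For SSR = 3.2, SSE = 1.8, N = 4 and p = 1, this yields MSR = 3.2, MSE = 0.9 and an F statistic of 3.2 / 0.9.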
The sample data may include missing values.
Each column in the input matrix corresponds to a separate variable.
Each row in the input matrix corresponds to an observation.
Observations (i.e. rows) with missing values in X or Y are removed.
The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variables (X).
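
The row-removal rule above amounts to listwise deletion, which might look like the sketch below. Several details are assumptions made for illustration: complete_rows is a hypothetical helper, missing values are assumed to be encoded as NaN, and X is assumed to be an array of row pointers so that X[i][j] is observation i of variable j.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of the SDK): writes the indices of
   rows with no NaN in X or Y into keep[] (capacity n) and returns
   how many rows were kept. */
size_t complete_rows(double **X, const double *Y, size_t n, size_t nvars,
                     size_t *keep)
{
    size_t kept = 0;
    size_t i, j;

    for (i = 0; i < n; ++i) {
        /* Start from the response value, then scan the predictors. */
        int ok = !isnan(Y[i]);
        for (j = 0; ok && j < nvars; ++j) {
            if (isnan(X[i][j]))
                ok = 0;
        }
        if (ok)
            keep[kept++] = i;
    }
    return kept;
}
```

The returned count plays the role of N, the number of non-missing observations, in the formulas above.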
The NDK_PCR_ANOVA function is available starting with version 1.60 APACHE.

Requirements

Header	SFSDK.H
Library	SFSDK.LIB
DLL	SFSDK.DLL

See Also