# Statistical classification

I was ill when my class studied the theory of statistical classification. Now we have lots of SPSS assignments on research methods, but I don’t know where to start. Could you please explain which methods exist and when they are applied?

### 2 comments to Statistical classification

• Afroze

The coefficient of determination, R2, is used in the context of statistical models whose main purpose is to predict future outcomes on the basis of other related information, so it matters a lot to anyone doing research in SPSS. In simple linear regression with an intercept, R2 is the square of the sample correlation coefficient between the outcome and the predictor; in multiple regression it is the square of the correlation between the observed and predicted values. In both cases it lies between 0 and 1. Depending on the computational definition used, R2 can also come out negative. This happens when the predictions being compared with the outcomes were not derived from a model fitted to those data, or when a linear regression is run without an intercept. In short, the coefficient of determination measures the strength of the relationship between the dependent variable and the set of predictors in a regression model.
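To make the two cases above concrete, here is a minimal sketch in plain Python (not SPSS syntax). The data are made up for illustration, and the helper names `r_squared`, `fit_line` and `correlation` are my own, not part of any library. It computes R2 as 1 minus the residual sum of squares over the total sum of squares, one common computational definition.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_squared(y, y_pred):
    """R2 computed as 1 - SS_residual / SS_total."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correlation(x, y):
    """Sample (Pearson) correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Case 1: predictions from a line fitted to these data (with intercept).
slope, intercept = fit_line(x, y)
r2_fit = r_squared(y, [slope * xi + intercept for xi in x])
r_xy = correlation(x, y)  # r2_fit equals r_xy ** 2 up to rounding

# Case 2: predictions that were NOT derived from fitting these data.
r2_bad = r_squared(y, [10.0, 8.0, 6.0, 4.0, 2.0])  # comes out negative
```

In the first case R2 matches the squared correlation and stays between 0 and 1; in the second the predictions are worse than simply using the mean of y, so the same formula gives a negative value.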

Columns in the SPSS regression output:

1. Model – SPSS allows you to specify multiple models in a single regression command. This column tells you the number of the model being reported.

2. Variables Entered – SPSS allows you to enter variables into a regression in blocks, and it allows stepwise regression. Hence, you need to know which variables were entered into the current regression.

If you did not block your independent variables or use stepwise regression, this column should list all of the independent variables that you specified.

3. Method – This column tells you the method that SPSS used to run the regression. “Enter” means that each independent variable was entered in the usual fashion. If you ran a stepwise regression, the entry in this column would say so.

4. R – R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.

5. R-Square – This is the proportion of variance in the dependent variable that can be explained by the independent variables.

6. Adjusted R-Square – This is an adjustment of R-Square that penalizes the addition of extraneous predictors to the model.

Adjusted R-Square is computed using the formula 1 – (1 – Rsq)(N – 1)/(N – k – 1), where N is the number of observations and k is the number of predictors.

7. Std. Error of the Estimate – This is also referred to as the root mean squared error. It is the estimated standard deviation of the error term, and equals the square root of the Mean Square for the Residuals in the ANOVA table.
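The last two quantities in the list can be checked by hand. Below is a small sketch in plain Python (again, not SPSS syntax) under the assumption that the residual degrees of freedom are N – k – 1, i.e. the model includes an intercept. The function names and the numbers are hypothetical and only for illustration.

```python
import math


def adjusted_r_squared(r2, n, k):
    """Adjusted R-Square: 1 - (1 - Rsq)(N - 1)/(N - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def std_error_of_estimate(y, y_pred, k):
    """Square root of the residual mean square, SS_residual / (N - k - 1)."""
    n = len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    return math.sqrt(ss_res / (n - k - 1))


# Hypothetical model summary: R-Square 0.5, 21 cases, 2 predictors.
adj = adjusted_r_squared(0.5, n=21, k=2)

# Hypothetical observed and predicted values from a one-predictor model.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
see = std_error_of_estimate(y_obs, y_hat, k=1)
```

With these inputs the adjusted value is lower than the unadjusted 0.5, which is the penalty for extra predictors described in item 6.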

I hope this clarifies your issues. You can also learn more from the SPSS help and from other SPSS researchers.

• Afroze

A statistical classification is a set of discrete categories that may be assigned to a specific variable in a statistical survey and used in the production and presentation of statistics, so SPSS researchers use such classifications frequently. For instance, the categories ‘male’ and ‘female’ constitute a classification for the variable ‘sex’, which can be observed for humans as well as for many other living organisms.

The structure of a classification can be either hierarchical or flat. Hierarchical classifications range from the broadest level (e.g., division) to the most detailed level (e.g., class). Flat classifications (e.g., the sex classification) are not hierarchical. Statistical classifications are developed or revised on the basis of established practices and principles, listed below:

• the categories are exhaustive and mutually exclusive

• the classification is comparable to other related (national/international) standard classifications

• the categories are stable (i.e., they are not changed too frequently, or without proper review, justification and documentation), which is very important to SPSS researchers

• the categories are well described with a title in a standard format and backed up by explanatory notes, coding indexes, coders and correspondence tables to related classifications (including earlier versions of the same classification)

• the categories are well balanced within the limits set by the principles for the classification (i.e., not too many or too few categories). This is usually established by applying significance criteria (e.g., size limits on variables such as employment, turnover, etc.)

• the categories reflect realities of the domain (e.g., the society or economy) to which they relate (e.g., in an industry classification, the categories should reflect the total picture of industrial activities of the country)

• the classification is backed up by the availability of instructions, manuals, coding indexes, handbooks and training.

Statistical classifications are used for:

a. presenting statistical information, which SPSS researchers do frequently;

b. collecting information and organizing information already collected;

c. aggregating and disaggregating data sets meaningfully for purposes of analysis, including the construction of indexes.
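As a rough illustration of the "exhaustive and mutually exclusive" principle and of use (c), here is a short sketch in plain Python. The two-level industry classification, the counts and the function names are all invented for the example; they do not come from any real standard classification.

```python
from collections import Counter

# Hypothetical two-level classification: detailed class -> broad division.
# Storing it as a dict means each class belongs to exactly one division,
# so the divisions are mutually exclusive by construction.
CLASS_TO_DIVISION = {
    "wheat farming": "agriculture",
    "dairy farming": "agriculture",
    "car manufacturing": "manufacturing",
    "textile manufacturing": "manufacturing",
}


def unclassified(values, classification):
    """Observed values that no category covers (exhaustiveness check)."""
    return sorted(set(values) - set(classification))


def aggregate(counts, classification):
    """Roll detailed-class counts up to the broader division level."""
    totals = Counter()
    for cls, n in counts.items():
        totals[classification[cls]] += n
    return dict(totals)


# Made-up survey counts at the detailed level.
survey_counts = {"wheat farming": 3, "dairy farming": 2, "car manufacturing": 5}
by_division = aggregate(survey_counts, CLASS_TO_DIVISION)

# "fishing" is not covered, so this classification is not exhaustive for it.
missing = unclassified(["wheat farming", "fishing"], CLASS_TO_DIVISION)
```

Disaggregating goes the other way: starting from a division, you look up which detailed classes map to it and report their counts separately.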