Correspondence Analysis
4. The Analysis II

A. Dimensionality of the problem

Let's look at the cloud of points that stands for the rectangular table of figures, whose row and column sums we have transformed to one (that is, into percentages). This cloud is contained in a space of dimension card(I)-1 or card(J)-1, whichever is lower. For the rest of this paper we will assume that card(I) < card(J), so that the problem is of dimension card(I)-1. In our example the table is 8x12 and, once transformed into percentages, is thus contained in a 7-dimensional space.
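The dimensionality claim can be checked numerically. The sketch below is only an illustration: the 8x12 table is random (it is not the table of this paper), and the helper name `cloud_dimension` is our own. It measures the dimension of the smallest flat subspace containing each cloud of profiles.

```python
import numpy as np

# Hypothetical 8x12 table of counts, random and for illustration only.
rng = np.random.default_rng(0)
N = rng.integers(1, 50, size=(8, 12)).astype(float)

# Row profiles: each row divided by its total, so that it sums to one.
row_profiles = N / N.sum(axis=1, keepdims=True)
# Column profiles: each column divided by its total, stored one per line.
col_profiles = (N / N.sum(axis=0, keepdims=True)).T

def cloud_dimension(profiles):
    """Dimension of the smallest flat (affine) subspace containing the cloud."""
    differences = profiles - profiles[0]  # vectors from one point to all others
    return np.linalg.matrix_rank(differences)

dim_rows = cloud_dimension(row_profiles)  # 8 points described by 12 coordinates
dim_cols = cloud_dimension(col_profiles)  # 12 points described by 8 coordinates
print(dim_rows, dim_cols)
```

For a table without special structure, both clouds should come out with dimension 7, that is, card(I)-1 here: the 8 row points cannot span more than 7 dimensions, and the 12 column points have 8 coordinates tied together by the constraint that they sum to one.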

B. Geometric principles

Basically, the insight of any factor analysis, and of correspondence analysis in particular, is that the cloud of points we are trying to describe does not stretch equally in every direction. On the contrary, it has a definite shape, which is not a hyperball, because there is affinity between rows and columns. We are therefore going to define a new, more "economical" system of orthogonal coordinates. More precisely, for a cloud of points N(I) (remember that each point is located in space by its card(J) coordinates on J, that is, its profile on J), we seek the representation that is as good as possible in as small a dimensionality as possible (that is, with the minimum number of axes). If we want a graphical representation on paper, the problem can be formulated as follows: determine the subspace L of dimension 2 that passes through the center of gravity of the cloud (i.e. its mean profile) and that maximizes the inertia of N(I) parallel to L. Software packages do not stop at 2 dimensions, and the standard output gives all card(I)-1 dimensions of the problem. But we have to be more general now.
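To make the search for the plane L concrete, here is a minimal sketch. It assumes the usual conventions of correspondence analysis, which this page does not spell out: each row point is weighted by its mass, and distances are chi-square distances. The table is again random and purely illustrative, and the names (`X`, `projected_inertia`, `best_plane`) are ours. The plane found by a singular value decomposition is compared with 1000 random planes through the same center of gravity.

```python
import numpy as np

rng = np.random.default_rng(1)
N = rng.integers(1, 50, size=(8, 12)).astype(float)  # hypothetical counts

P = N / N.sum()
r = P.sum(axis=1)           # row masses
c = P.sum(axis=0)           # column masses = mean row profile (center of gravity)
profiles = P / r[:, None]   # row profiles, each summing to one

# Rescale so that the chi-square distance becomes an ordinary Euclidean
# distance and each point carries the square root of its mass.
X = np.sqrt(r)[:, None] * (profiles - c) / np.sqrt(c)

def projected_inertia(X, basis):
    """Inertia of the cloud projected on the subspace spanned by `basis`
    (orthonormal columns)."""
    return np.sum((X @ basis) ** 2)

# Candidate for L: the plane spanned by the first two right singular vectors.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
best_plane = Vt[:2].T
best = projected_inertia(X, best_plane)

# Random planes through the same center of gravity, for comparison.
others = []
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.normal(size=(12, 2)))  # random orthonormal pair
    others.append(projected_inertia(X, Q))

total = np.sum(X ** 2)  # total inertia of the cloud
print(best, max(others), total)
```

No random plane should capture more inertia than the plane from the decomposition, and no plane can capture more than the total inertia.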

C. Factorial axes

Consider any line that passes through the center of gravity of the cloud N(I). We can break up the total inertia of the cloud into the sum of the inertia parallel to (projected on) this line and the inertia perpendicular to it. The first factorial axis is the line for which the parallel inertia is maximum. The second factorial axis is, among all lines orthogonal to the first, the line for which the projected dispersion (inertia) of what remains of the cloud, orthogonally complementary to the first axis, is greatest ("what is left of the inertia"). Going on axis after axis, we extract the factorial axes, which make up the new set of orthogonal axes in which the cloud can be totally described. These axes are called the principal axes of inertia.
To each axis is associated an eigenvalue, and the sum of these eigenvalues equals the inertia of the cloud. Each eigenvalue is worth 1 at most. You can see that if N(I) had only one point, there would be no axis, and if it had only 2 points, it would have a single axis; with 3 points we would need at most 2 perpendicular axes, and with n points at most n-1 axes, never more than card(J)-1 in any case.
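The properties of the eigenvalues can be verified on a small example. The sketch below assumes the standard computation of correspondence analysis through a singular value decomposition of the standardized residuals; this is one common route, not necessarily the one used by any particular software. The table is random and hypothetical, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
N = rng.integers(1, 50, size=(8, 12)).astype(float)  # hypothetical counts

n = N.sum()
P = N / n                 # table of relative frequencies
r = P.sum(axis=1)         # row masses
c = P.sum(axis=0)         # column masses

# Standardized residuals from the independence model r_i * c_j.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# The eigenvalues of the factorial axes are the squared singular values of S.
singular_values = np.linalg.svd(S, compute_uv=False)
eigenvalues = singular_values ** 2

# Total inertia computed independently: Pearson chi-square divided by n.
expected = n * np.outer(r, c)
chi2 = np.sum((N - expected) ** 2 / expected)
total_inertia = chi2 / n

# Number of axes carrying inertia (the tolerance only absorbs rounding noise).
n_axes = int(np.sum(eigenvalues > 1e-12))

print(eigenvalues.sum(), total_inertia, eigenvalues.max(), n_axes)
```

The sum of the eigenvalues should match the total inertia, none should exceed 1, and for an 8x12 table without special structure 7 axes carry inertia.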

D. Symmetry of both analyses

We have so far talked only about the analysis of one side of the table, rows or columns. We can either project the row points in the 11-dimensional community space or project the column points in the 7-dimensional schooling level space. We would then obtain 2 representations, one for each cloud. Are these representations different? Not much. The analyses are symmetric in three ways:
Statisticians have decided not to differentiate between the two systems of factorial axes born from the two analyses, and to represent all points on the same graph. The algorithm represents the points (see below how) in the space created by the first k factorial axes. The distance between points will be a distance to the mean profile (or centroid) in the Euclidean space of the sheet of paper. Besides the graphs, the software gives us material to answer some of the scientist's questions, such as: Which part of the total inertia is accounted for by the first k axes? Which part of the variation of a given point is accounted for by this particular graph? What is the contribution of each point to the construction of the axis system?
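The three questions above correspond to quantities that can be computed directly. The sketch below uses the usual textbook definitions (share of inertia, quality of representation, contribution); actual software may label or scale them differently. The table is random and hypothetical, and all names (`F`, `G`, `share`, `quality_rows`, `contrib_rows`, `contrib_cols`) are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
N = rng.integers(1, 50, size=(8, 12)).astype(float)  # hypothetical counts

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, s, Vt = np.linalg.svd(S, full_matrices=False)
keep = s > 1e-10                           # drop the trivial axis with no inertia
U, s, Vt = U[:, keep], s[keep], Vt[keep]
eig = s ** 2                               # one eigenvalue per factorial axis

# Coordinates of rows and columns on the same factorial axes.
F = (U * s) / np.sqrt(r)[:, None]          # row points
G = (Vt.T * s) / np.sqrt(c)[:, None]       # column points

k = 2  # number of axes kept for the graph

# 1. Part of the total inertia accounted for by the first k axes.
share = eig[:k].sum() / eig.sum()

# 2. Part of each row point's variation (squared distance to the centroid)
#    accounted for by the k-dimensional graph.
quality_rows = (F[:, :k] ** 2).sum(axis=1) / (F ** 2).sum(axis=1)

# 3. Contribution of each point to the construction of each axis.
contrib_rows = r[:, None] * F ** 2 / eig
contrib_cols = c[:, None] * G ** 2 / eig

print(round(share, 3))
print(np.round(quality_rows, 3))
print(np.round(contrib_rows[:, :k], 3))
```

Both clouds share the same eigenvalues, which is why a single set of axes can carry the row points and the column points. On each axis the contributions of the rows sum to one, and so do those of the columns.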