Abstract
One of the most important jobs of the applied scientist is to visualize data. Graphs that fellow scientists can readily comprehend often reveal trends more clearly than tables of numbers. Chemometrics has two main roles: a qualitative one that allows the data scientist to view the data, and a quantitative one that involves modeling and prediction. Being able to turn matrices into graphs is therefore an important task. Tabular data are often too complicated to understand readily; for example, a spectroscopist usually visualizes a spectrum rather than examining a table of numbers. As another example, a grayscale picture could be represented by a matrix consisting of a table of pixelated intensities, but that is hard for the human eye to interpret. This visualization can be conceptual, for example, in more than 3 dimensions, where the "picture" is held in the computer, and such simplifications are part of important mathematical approaches for determining the main trends in data. In this article, we will primarily illustrate using matrices of low dimensionality, but the principles can easily be extended to however many dimensions are required, and valuable information can be extracted from virtual multidimensional representations.
An expert in multivariate analysis needs a cool mind to ensure he or she carefully distinguishes related but distinct ideas.
There are two different ways of representing this matrix. In practice we deal with much more complex datasets; for example, a 100 × 20 dataset may be the result of analyzing 100 objects using 20 variables, perhaps from analyzing 100 plant extracts and measuring the intensity of 20 GCMS peaks in each extract. In such circumstances the variable space is an imaginary 20-dimensional space and the object space is 100-dimensional.
In previous articles we have discussed the effect of column centering.1 In our case the matrix is transformed to its column-centered form, and the relationships between the new vectors change.
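
The effect of column centering on the two representations can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical 3 × 2 matrix `X` (rows as objects, columns as variables) that is not taken from this article; the helper `pairwise_distances` is likewise introduced only for illustration. It compares the distances between objects (rows) and between variables (columns) before and after subtracting the column means.

```python
import numpy as np

# Hypothetical 3 x 2 data matrix (illustrative values, not from the article):
# rows are objects, columns are variables.
X = np.array([[1.0, 2.0],
              [3.0, 7.0],
              [5.0, 6.0]])

# Column centering: subtract each column's mean from that column.
Xc = X - X.mean(axis=0)


def pairwise_distances(points):
    """Euclidean distances between all pairs of rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Objects as points in variable space: distances between rows.
obj_before = pairwise_distances(X)
obj_after = pairwise_distances(Xc)

# Variables as points in object space: distances between columns.
var_before = pairwise_distances(X.T)
var_after = pairwise_distances(Xc.T)

print("object distances unchanged:", np.allclose(obj_before, obj_after))
print("variable distance before centering:", var_before[0, 1])
print("variable distance after centering: ", var_after[0, 1])
```

Centering subtracts the same vector of column means from every row, so the differences between rows cancel and the object distances are preserved. Each column, by contrast, is shifted by its own mean, so the distance between two columns generally changes unless their means happen to be equal.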
In fact the distances between the objects in variable space are unchanged, although the objects are now centered around the origin, but the distances between the variables in object space change. Hence, according to whether we center a dataset or not, we may come to different conclusions as to how different variables (for example, marker compounds) relate.
It is important to remember that linear independence has a specific algebraic and geometric definition. For example, the first matrix has rank 2, that is, a column/row space that is 2-dimensional, and so is rank deficient, as its third column equals the first plus twice the second. However, the second matrix is not rank deficient, even though its third column is obtained simply by subtracting 7 from the third column of the first matrix; it therefore requires 3 dimensions to represent it in either column or row space. It is easy to show, using standard approaches in Excel or MATLAB or any other favorite language or environment, that the first matrix has zero determinant and is singular, whereas the second has a determinant of −42 and is invertible.
In the last few articles we have discussed a variety of related concepts. These include linear independence, rank, column and row space, singular matrices, correlation, and orthogonality. Many of these concepts are viewed as similar to each other, but there are differences, often skated over in chemometrics and indeed in many statistical textbooks and papers. Centering also influences the relationships between the columns of a matrix and so their overall properties. Most courses and general texts are not aimed at chemometrics experts; the majority are aimed at physicists and engineers, who have quite different needs. Even the definition of a vector in chemometrics differs from that in physics. But an expert in multivariate analysis, which is an important basis of chemometrics, needs a cool mind to ensure he or she carefully distinguishes these related but distinct ideas.
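
The rank and determinant argument can be reproduced in a few lines. The sketch below assumes NumPy and uses a hypothetical 3 × 3 matrix `A` whose third column equals the first plus twice the second; its entries are chosen purely for illustration and are not the values used in this article, so the determinant of the shifted matrix `B` differs from the −42 quoted in the text.

```python
import numpy as np

# Hypothetical matrix (illustrative values, not the article's matrix):
# column 3 = column 1 + 2 * column 2, so the columns are linearly dependent.
A = np.array([[1.0, 2.0, 5.0],
              [3.0, 1.0, 5.0],
              [2.0, 5.0, 12.0]])

# Second matrix: subtract 7 from every element of the third column only.
B = A.copy()
B[:, 2] -= 7.0

rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
det_A = np.linalg.det(A)
det_B = np.linalg.det(B)

# A is rank deficient and singular (determinant zero up to rounding error);
# B has full rank and a nonzero determinant, so it is invertible.
print("rank(A) =", rank_A, " det(A) =", det_A)
print("rank(B) =", rank_B, " det(B) =", det_B)
```

Subtracting a constant from a column is not a linear combination of the other two columns: it adds a multiple of a vector of ones, which in this example lies outside the plane spanned by the first two columns. That is why the third column of `B` is fully determined by the other two and yet linearly independent of them.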
There are many documents available that discuss these concepts in substantially more detail, a few of which are referenced.3-6 Chemometrics experts should be aware that the ideas in this article are part of quite sophisticated treatments of matrix algebra and some may wish to delve further.