Determination of a novel size proxy in comparative morphometrics

HOW TO CITE: Gallagher A. Determination of a novel size proxy in comparative morphometrics. S Afr J Sci. 2015;111(9/10), Art. 2014-0221, 10 pages. http://dx.doi.org/10.17159/sajs.2015/20140221

Absolute size is a critical determinant of organismal biology, yet there exists no real consensus as to what particular metric of ‘size’ is empirically valid in assessments of extinct mammalian taxa. The methodological approach of JE Mosimann has found extensive favour in ‘size correction’ in comparative morphometrics, but not in ‘size prediction’ in palaeontology and palaeobiology. Analyses of five distinct mammalian data sets confirm that a novel size variate (GMSize) derived from k=8 dimensions of the postcranial skeleton effectively satisfies all expectations of the Jolicoeur–Mosimann theorem of univariate and multivariate size. On the basis of strong parametric correlations between the k=8 variates, and between scores derived from the first principal component and geometric mean size (GMSize) in all series, this novel size variable has considerable utility in comparative vertebrate morphometrics and palaeobiology as an appropriate descriptor of individual size in extant and extinct taxa.


Introduction
4][5][6] Cope's 'rule' of phyletic size increase is a pervasive phenomenon in the vertebrate fossil record, [7][8][9][10][11][12] and remains a valid prospectus irrespective of any determinant probability governing directional size increase from a lineal founder of comparatively diminutive size relative to its terminal members. 7,13[16][17][18] In cases where the absolute length or mass of an individual organism cannot be reliably determined, as is generally the default in comparative morphometric analyses of specific skeletal elements and in vertebrate palaeontology, a justified linear proxy for body size is required. 3][24][25][26][27] As a basic objective, we desire a reliable size proxy for either a single case (an individual fossil) or a series of individuals sampling an unknown or indeterminate underlying size distribution. 5][36][37] While these methods offer considerable improvement over traditional bivariate Model I and Model II regression techniques, their utility depends upon access to reasonably complete and associated comparative series in museum repositories and, in the case of interspecific models, effective taxon-specific samples may be little improved over traditional bivariate approaches. There is general acceptance of the size-adjustment approach advocated by JE Mosimann in comparative morphometrics, 38,39 yet there has hitherto been little recognition of the potential primacy of the favoured size variate, the geometric mean (GM), in the estimation of 'size'. Following the work of PF Jolicoeur, 40,41 any preferred construct of individual size from a suite of k linear correlates is testable by decomposing their variance-covariance matrix (VCV) through principal components analysis. Following Jolicoeur's rationale, 38,40,41 if the first principal component (PC) of a VCV matrix of k log-transformed variates accounts for a large majority of the total variance (>75%), and all k variate loadings on this vector are approximately equal, then PC1 represents a generalised multivariate size vector and individual variates may be expressed as simple functions of geometric similarity. From this, our k linear variates are simply re-scaled as components of isometry, with values of β<1, β=1 and β>1 indicating negative allometry, isometry and positive allometry, respectively. 38,41 3][44] However, one critical problem with this size metric is that it is entirely dependent upon the number, p, of landmark points registered on a specimen (or series of specimens) and can differ radically with any given random selection of a restricted set of landmark points. 3][44] Nevertheless, as a comparative size variate, any derivation from the k x p landmark distance space is an intrinsic function of the specific skeletal element under consideration, and cannot be reified as a faithful proxy of size in broader comparative appraisals. From a theoretical perspective, the only available test for allometry in geometric morphometric applications is a simple test of correlation between centroid size and the first PC of the VCV of the tangent space coordinates, [42][43][44] a direct assessment of the correlation of size and shape. Assessments of size correspondences across even anatomically proximate structures using centroid size are simply not possible.
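For reference, the standard formulation of Jolicoeur's multivariate allometry coefficients is given below as background (the notation is ours): a_{1i} is the element for variate i of the unit-length first eigenvector of the VCV of the k log-transformed variates. Isometry corresponds to all k elements being equal to 1/√k, and each coefficient rescales a loading by that isometric value.

```latex
\mathbf{a}_{\mathrm{iso}} = \left(\tfrac{1}{\sqrt{k}}, \tfrac{1}{\sqrt{k}}, \ldots, \tfrac{1}{\sqrt{k}}\right),
\qquad
\beta_i = \frac{a_{1i}}{1/\sqrt{k}} = \sqrt{k}\,a_{1i}, \quad i = 1, \ldots, k
```

With k=8, an isometric loading is 1/√8 ≈ 0.354, so a hypothetical variate loading 0.40 on PC1 would have β ≈ 0.40 × 2.828 ≈ 1.13, indicating positive allometry.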
In contrast, derivation of the geometric mean of a series of k linear dimensions taken on a single element, or across multiple associated elements of the same specimen, offers significant promise as a generalised comparative size variate in normal metric scales of the SI (μm, mm). The geometric mean of k variates is simply the kth root of their product, 45 and the distribution of this size metric in a population of individuals has been demonstrated to conform to expectations of the univariate log-normal and gamma distributions. More critically, the geometric mean of a series of k variates is strongly and positively correlated with the PC1 scores derived from a principal components analysis of the VCV of this series. 38,41 A table of parametric correlation coefficients (Pearson's r) is an effective assessment of covariance in a series of k variates prior to calculation of the GM.
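A minimal sketch of the two calculations just described, written in Python with NumPy (our choice; the study itself used PAST): the GM as the kth root of the product of k positive measurements, and a Pearson correlation table across the k variates. The function names and the measurement values are hypothetical.

```python
import numpy as np

def geometric_mean(values):
    """kth root of the product of k strictly positive measurements.

    Computed as exp(mean(ln x)), which is algebraically identical to
    (x1 * x2 * ... * xk) ** (1/k) but avoids overflow for long series.
    """
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("geometric mean requires strictly positive values")
    return float(np.exp(np.mean(np.log(x))))

def pearson_table(data):
    """Pearson's r between every pair of columns (variates) in an n x k array."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)

# Hypothetical measurements (mm): three specimens (rows) by k = 3 variates (columns).
sample = np.array([
    [40.0, 30.0, 20.0],
    [45.0, 33.0, 23.0],
    [52.0, 39.0, 26.0],
])
gm_per_specimen = [geometric_mean(row) for row in sample]
r_table = pearson_table(np.log(sample))   # correlations of the ln-transformed variates
```

Working in logarithms means the ln GM is simply the arithmetic mean of the ln variates, which is also what links the GM to PCA of the log VCV.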
In the event that body length and body mass are unknown, in an individual or a series, an alternative 'proxy' should fulfil the basic prospectus of correspondence with intrinsic organismal size. 2][33][34] Nevertheless, such analyses ignore discrete allometric trajectories observed within families, and even between closely related species. The approach favoured here is a global skeletal perspective (Figure 1; Supplementary table 1 online), and follows the size proxy outlined by Reno and colleagues. 46 A series of eight distinct linear dimensions was derived from the proximal and distal epiphyses of the four major long bones in associated individual skeletons. Given that all major weight-bearing epiphyses are sampled, it follows logically that the cumulative proxy of this series, the geometric mean (GMSize), is both intrinsic to an individual and a faithful approximation of its locus within any hypothetical Gaussian normal distribution, 38,40,41,45,46 intraspecifically and at the familial and higher orders of the Linnaean hierarchy. Given a general acceptance of the primacy of postcranial linear variates in the estimation of body mass in extinct mammalian taxa, particularly dimensions of the epiphyses, the GM of k=8 linear dimensions of the postcranial epiphyses in associated individual skeletons offers a prospectus for exposition of a generalised size variate in vertebrate morphometrics. 45,46
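A sketch of how GMSize might be assembled for a single associated skeleton. The abbreviations follow the table legends (PHAB, DHAB, RHD, DRB, FHD, FBB, PTAB, DTP), with DTP formed as the square root of the product of the distal tibial mediolateral and anteroposterior diameters; the function names, dictionary layout and measurement values are hypothetical.

```python
import math

# Seven directly measured variates; the eighth (DTP) is derived from two tibial diameters.
VARIATES = ("PHAB", "DHAB", "RHD", "DRB", "FHD", "FBB", "PTAB")

def distal_tibia_proxy(dt_ml, dt_ap):
    """DTP: square root of the product of the distal tibial ML and AP diameters."""
    return math.sqrt(dt_ml * dt_ap)

def gm_size(measurements, dt_ml, dt_ap):
    """Geometric mean of the k = 8 epiphyseal dimensions of one skeleton."""
    values = [measurements[name] for name in VARIATES]
    values.append(distal_tibia_proxy(dt_ml, dt_ap))
    if any(v <= 0 for v in values):
        raise ValueError("all dimensions must be positive")
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean)

# Hypothetical values (mm) for one associated skeleton.
specimen = {"PHAB": 42.1, "DHAB": 48.3, "RHD": 21.7, "DRB": 33.0,
            "FHD": 38.9, "FBB": 64.2, "PTAB": 66.5}
size = gm_size(specimen, dt_ml=31.4, dt_ap=27.8)
```

Because every dimension enters with equal weight in log space, a missing or damaged epiphysis changes the variate itself, which is one reason to retain the complete k=8 series where possible.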

Materials and methods
The preferred k=8 linear variates of the associated fore- and hindlimb skeletons were taken on a comparative series of extant mammals sampling 247 African hominids (Gorilla and Pan), 149 Old World monkeys (Colobus, Cercopithecus and Papio) and 62 large-bodied felids (Panthera and Acinonyx) housed in collections in Africa, Europe and the USA (Supplementary table 2). All data, including the GM of the raw series, were transformed to natural logarithms (ln), and parametric correlation matrices (Pearson's r) were calculated for these discrete interspecific series. The covariance matrices (VCV) for each of these series were subjected to a principal components analysis, and the eigenvectors, component loadings and PC scores were calculated using PAST version 3.1. 47 In order to assess the efficacy of the proposed size variate at the intraspecific level, pooled-sex series sampling Pan t. troglodytes (n=91) and Gorilla g. gorilla (n=102) were assessed. While closely related, these taxa evidence considerable differences in sexual size dimorphism, and the samples are sufficiently large to warrant consideration as viable statistical populations.
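The study ran these steps in PAST. The sketch below is an independent NumPy re-implementation under our own assumptions (PCA on the covariance rather than the correlation matrix, PC1 sign fixed so that loadings sum to a positive value), applied to synthetic data rather than the museum series. The function name `pca_size_summary` is hypothetical.

```python
import numpy as np

def pca_size_summary(raw):
    """PCA of the VCV matrix of ln-transformed data (n specimens x k variates).

    Returns the share of variance on PC1, the PC1 loadings (unit eigenvector),
    Jolicoeur coefficients sqrt(k) * a_i, PC1 scores, and Pearson's r between
    the PC1 scores and ln GMSize.
    """
    logs = np.log(np.asarray(raw, dtype=float))
    k = logs.shape[1]
    centred = logs - logs.mean(axis=0)
    vcv = np.cov(centred, rowvar=False)          # k x k variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(vcv)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pc1 = eigvecs[:, 0]
    if pc1.sum() < 0:                            # eigenvector sign is arbitrary
        pc1 = -pc1
    scores = centred @ pc1
    ln_gm = logs.mean(axis=1)                    # ln GMSize = mean of the ln variates
    return {
        "pc1_share": float(eigvals[0] / eigvals.sum()),
        "loadings": pc1,
        "jolicoeur": np.sqrt(k) * pc1,
        "scores": scores,
        "r_scores_gm": float(np.corrcoef(scores, ln_gm)[0, 1]),
    }

# Synthetic series: one latent size factor, mild allometry, small measurement noise.
rng = np.random.default_rng(1)
latent = rng.lognormal(mean=3.5, sigma=0.3, size=60)
exponents = np.array([1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.0])
noise = rng.lognormal(mean=0.0, sigma=0.03, size=(60, 8))
synthetic = latent[:, None] ** exponents * noise
summary = pca_size_summary(synthetic)
```

On data dominated by a single size factor, PC1 carries nearly all of the variance and its scores track ln GMSize almost perfectly, which is the pattern the Results report for the empirical series.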

Results
Correlation coefficients for the k=8 fore- and hindlimb dimensions are highly significant across all five data sets and exceed r=0.92 in all cases, with the notable exception of Pn. t. troglodytes (Supplementary tables 3-7). The poorer correlation coefficients between the linear variate series in common chimpanzees reflect the well-known phenomenon that a bivariate distribution (x,y) yields a higher correlation coefficient when the effective size ranges of x and y are proportionally large, as in interspecific 'mouse-to-elephant' analyses. 2-4. In all five series, there exists a perfect correspondence between the PC1 scores and GMSize (r=1.00) for the k=8 linear variates of the fore- and hindlimb epiphyses (Figures 2-4). Individual variate loadings on the first PC across the data sets reveal a satisfying consistency within each (Tables 1-6), yet their multivariate isometry coefficients are sufficiently distinct to support family-level and even species-specific allometric scaling trajectories of fore- and hindlimb epiphyseal joints, as revealed in the positive and negative scores of the various samples on PC2 (Figure 2b). These observed distinctions further caution against universally applying any 'scaling criterion', derived from interspecific allometric scaling solutions, to a single specimen or a series of specimens. Any universal assumption concerning scaling of the proximal femoral articulation in Pn. t. troglodytes and G. g. gorilla based upon theoretical derivations from pooled-sample analyses of African hominids or Old World monkeys is not supported by the observation that the proximal femoral articulation scales with negative allometry in these species, as indicated by their pooled-sample multivariate distribution. The proximal femur is actually proportionally smaller in Gorilla than in Pan. While direct correspondences between the multivariate isometry coefficients of the log-transformed and raw linear data series are not possible, it is worth noting that the femoral head loads negatively on the second PC axis of both the log-transformed and raw data series in the pooled African hominid sample, but not in the corresponding tables of the species-specific analyses (Table 2; Supplementary tables 8-12).
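The range effect invoked for the chimpanzee series can be illustrated with synthetic data (not the study's measurements): identical scatter about the same line gives a much lower Pearson's r once the range of x is restricted. For a uniform x on [0, 10] with unit noise, theory gives r ≈ 0.95 for the full range and r ≈ 0.5 for the band 4 < x < 6; the specific ranges and noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=2000)        # wide size range ('mouse-to-elephant')
y = x + rng.normal(0.0, 1.0, size=x.size)    # same scatter about the line everywhere

r_full = np.corrcoef(x, y)[0, 1]
narrow = (x > 4.0) & (x < 6.0)               # restricted range, e.g. one monomorphic species
r_narrow = np.corrcoef(x[narrow], y[narrow])[0, 1]
```

The noise is the same in both cases; only the spread of x relative to that noise differs.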
Calculation of the Jolicoeur multivariate allometry coefficients in Pn. t. troglodytes and G. g. gorilla underscores the necessity of sampling all k=8 linear variates in the derivation of the preferred size metric, as these taxa also differ in the multivariate scaling of the osseous components of their elbow and knee joints and are not allometrically equivalent (Tables 5 and 6). Observed species-specific or genus-specific allometric scaling constants for any of the k=8 variates can be tested simply using conventional post-hoc tests for slopes, y-intercepts and elevations in the bivariate case, yet the observed scalar distinctions in these analyses do not compromise the preferred variate (GMSize) as a valid descriptor of size in comparative contexts. By retaining all k=8 linear variates in the analysis, a comparative size proxy is generated that is sufficiently powerful to test hypotheses of allometric equivalence in the postcranial epiphyses of living and extinct taxa (Figure 2b). On the basis of these data, chimpanzees and gorillas are not allometrically equivalent animals in terms of their relative fore- and hindlimb epiphyseal joint profiles. [50] Darroch and Mosimann 51 have extended the foundations of Jolicoeur's multivariate allometry to canonical component space, subsuming the k-group method of canonical variates analysis. 52,53 Canonical variates analysis is a k-group extension of Fisher's linear discriminant analysis for k=2 groups, 52,53 and this extension has both practical and theoretical significance in biological anthropology. Conventional application of a two-sample discriminant function analysis (DFA) in forensic assessment of sex or ancestry [54][55][56][57][58] proceeds from a series of multidimensional (k≥3) variates under the expectation that the predefined 'sets' sample discrete multivariate universes. 52,53
Nevertheless, substantial overlap exists in the observed univariate and multivariate distributions of female and male individuals in all but the most dimorphic mammalian taxa. 5][56][57][58] Following the extension outlined in Darroch and Mosimann, 51 if the GM of any suite of k variates is an appropriate descriptor of size, then an equally satisfying correspondence should exist among the total variance explained by PC1, the classification statistics derived using a DFA, and the underlying pooled-sample distributions of the two GM sets.
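For readers unfamiliar with the two-group case, here is a minimal NumPy sketch of Fisher's linear discriminant with resubstitution (apparent) classification accuracy. It is not PAST's implementation. Equal priors, the midpoint cut-off, the helper names and the synthetic 'male'/'female' ln data are all assumptions.

```python
import numpy as np

def fisher_lda(group_a, group_b):
    """Two-group Fisher linear discriminant.

    Returns weights w = Sw^-1 (mean_a - mean_b) and a cut-off at the midpoint
    of the projected group means (equivalent to assuming equal priors).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-group covariance matrix.
    sw = ((len(a) - 1) * np.cov(a, rowvar=False) +
          (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    w = np.linalg.solve(sw, mean_a - mean_b)
    cut = 0.5 * (mean_a @ w + mean_b @ w)
    return w, cut

def resubstitution_accuracy(group_a, group_b, w, cut):
    """Share of specimens assigned to their own group (group a if score > cut)."""
    hits_a = np.sum(np.asarray(group_a) @ w > cut)
    hits_b = np.sum(np.asarray(group_b) @ w <= cut)
    return (hits_a + hits_b) / (len(group_a) + len(group_b))

rng = np.random.default_rng(2)
cov = 0.01 * (0.2 * np.eye(3) + 0.8)      # ln variates with sd 0.1 and r = 0.8
males = rng.multivariate_normal([4.0, 3.8, 3.5], cov, size=50)
females = rng.multivariate_normal([3.6, 3.4, 3.1], cov, size=50)
w, cut = fisher_lda(males, females)
accuracy = resubstitution_accuracy(males, females, w, cut)
```

Resubstitution accuracy is optimistic because the same specimens define and test the function. Reducing the mean difference between the synthetic groups mimics a less dimorphic taxon and lowers the accuracy accordingly.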
The Pn. t. troglodytes (M=39/F=52) and G. g. gorilla (M=56/F=48) series were subjected to DFA based on known sex using an earlier version of PAST (v.2.3). 47 DFA equations are given in Supplementary table 13, and the correct percentage classifications for Pan and Gorilla were about 86% and 99%, respectively. The exemplary classification of Gorilla is a clear function of the discrete (bimodal) nature of its intraspecific size distribution and is consistent with extreme sexual size dimorphism. Only a single male specimen was incorrectly classified as female. In contrast, the comparatively monomorphic Pan yields a percentage classification that approximates the upper range of a typical DFA classification in recent humans, with 12 specimens assigned to the wrong sex. Classification statistics for both raw and log data were equivalent across both samples (Supplementary table 13). Both data sets effectively satisfy expectations based upon canonical components of size and shape. 51 correspondence in Gorilla is perfect (Supplementary tables 14-17). As in conventional DFA of sex assessment in humans, substantially more female specimens (n=8) than male specimens (n=4) were misclassified in Pn. t. troglodytes. The question logically arises as to whether this phenomenon is typical of all monomorphic mammalian taxa; it is certainly worthy of further comparative exploration.
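The reported percentages can be checked directly from the male and female counts given in this paragraph. This is simple arithmetic on the stated numbers, and the helper name is ours.

```python
def percent_correct(n_total, n_wrong):
    """Percentage of specimens assigned to their own sex."""
    return 100.0 * (n_total - n_wrong) / n_total

# Pan: M=39/F=52, 12 errors (8 female, 4 male); Gorilla: M=56/F=48, 1 misclassified male.
pan = percent_correct(39 + 52, 12)        # about 86.8
gorilla = percent_correct(56 + 48, 1)     # about 99.0
pan_female_error = 100.0 * 8 / 52         # about 15.4% of females misclassified
pan_male_error = 100.0 * 4 / 39           # about 10.3% of males misclassified
```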
Given that the preferred size variate in this analysis is simply the geometric mean (GMSize) of k=8 linear dimensions of the fore- and hindlimb epiphyses, this variate can be reliably constructed from any combination of the available variates (i.e. k≤8). An obvious candidate for redundancy is one of the osseous components of the knee joint (FBB, PTB) (Figure 1; Supplementary table 1), as is one of the elbow joint components (DHAB, RHD), yielding a geometric size variable derived from k=6 linear dimensions. As data in Tables 7 and 8 attest, two permutations of GMSize that reduce the variate series yield little real improvement to the model (or, alternatively, reduce its efficacy) in terms of the variance explained by the first PC, yet there are subtle distinctions in the loadings of the individual variates on the first and subsequent PCs (Tables 7 and 8). Multivariate allometry coefficients also change subtly, underscoring the observations in Figure 2 and in previous analyses that Pan and Gorilla are not allometrically equivalent animals. The potential loss of information is graver in more distinct mammalian taxa, as no assumptions of allometric equivalence are made in the entire k=8 linear series. Stated simply, the geometric mean of the entire k=8 linear dimensions of the fore- and hindlimb epiphyses of the postcranial skeleton retains relevant information pertaining to absolute individual size and equally relevant information about relative joint size, which clearly differs in Pan and Gorilla and within the large-bodied felids (Figure 2b). On the strength of the correlation coefficients, it is clear that any single variate (such as the proximal femoral articulation) can be used to estimate the preferred size proxy in comparative size appraisals of living and fossil taxa via simple bivariate regression of x on y.
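The k=6 permutation can be sketched as follows, dropping one knee variate and one elbow variate. The particular choice of PTAB and RHD, the helper `ln_gm`, and the isometric toy data are assumptions for illustration. On data that are isometric by construction, the two size variates agree closely. Real taxa need not be isometric, which is where the reduced series can lose information about relative joint size.

```python
import numpy as np

COLUMNS = ["PHAB", "DHAB", "RHD", "DRB", "FHD", "FBB", "PTAB", "DTP"]

def ln_gm(data, keep):
    """ln geometric mean across the named columns of an n x 8 array."""
    idx = [COLUMNS.index(name) for name in keep]
    return np.log(np.asarray(data, dtype=float)[:, idx]).mean(axis=1)

rng = np.random.default_rng(3)
latent = rng.lognormal(3.5, 0.25, size=80)
data = latent[:, None] * rng.lognormal(0.0, 0.04, size=(80, 8))  # isometric toy data

full = ln_gm(data, COLUMNS)                                               # k = 8
reduced = ln_gm(data, [c for c in COLUMNS if c not in ("PTAB", "RHD")])   # k = 6
r = np.corrcoef(full, reduced)[0, 1]
```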

Discussion and conclusions
The geometric mean of any series of variables is a cumulative dimension inherently dependent on the series of k variates employed in its derivation. 9][40][41] As Jolicoeur and Mosimann have demonstrated, [38][39][40][41]51 both principal and canonical components can be derived and assessed for any generalised multivariate size distribution (conforming to the log-normal and gamma distributions), and these effectively approximate the geometric mean. Nevertheless, there has been some recent criticism of the utility of the geometric mean. 59,60 As Auerbach and Sylvester 60 have demonstrated, the Model I (least squares) slopes of any series of k variates regressed upon their geometric mean yield a mean slope of β=1.00, irrespective of positive or negative allometry of the individual k variates. While this is important, it merely reinforces the theoretical and computational rationale that bivariate linear regression of any dependent variate upon a geometric mean of which it is a constituent is inappropriate. 38,45,51 Irrespective of scalar constraints (i.e. differential size of the k dependents), any series of k variates is presumed to be highly correlated with its geometric mean and, given the computational mechanics of deriving the least squares regression slope, the assumption of independence of x and y is effectively violated. Stated simply, we cannot presume that x and y are independent nor, for that matter, that error in y is independent of error in x, when the latter is effectively a cumulative function of unobserved error in the series y1, y2, y3, …, yk. 38,45
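Auerbach and Sylvester's observation follows from the definition. With g = ln GM, the mean of the k ln variates y_i, the least-squares slopes are b_i = cov(y_i, g)/var(g), so their sum is cov(Σ y_i, g)/var(g) = cov(k·g, g)/var(g) = k, and the mean slope is exactly 1. A short synthetic demonstration follows; the allometric exponents are arbitrary assumptions chosen to include both negative and positive allometry.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 5
latent = rng.normal(0.0, 0.3, size=n)
exponents = np.array([0.6, 0.8, 1.0, 1.3, 1.5])        # negative to positive allometry
logs = latent[:, None] * exponents + rng.normal(0.0, 0.05, size=(n, k))

ln_gm = logs.mean(axis=1)                              # ln of the geometric mean
slopes = np.polyfit(ln_gm, logs, deg=1)[0]             # OLS slope of each variate on ln GM
mean_slope = slopes.mean()                             # equals 1 up to rounding error
```

The individual slopes still spread well below and above 1. Only their average is forced to unity, which is why the result says more about regressing a constituent on its own composite than about the GM as a size variate.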

Figure 2 :
Figure 2: Bivariate scatter plots of (a) PC1 scores (y-axis) against GMSize and (b) PC2 against PC1 scores of the k=8 postcranial variates in the entire comparative series (Gorilla excluded for visual scale).

Table 1 :
Summary statistics for the principal components (PC) analysis
DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Figure 1: Linear variates taken on associated fore- and hindlimb epiphyses used in the derivation of the size metric (GMSize). Labelled variates: HPAW, FHD AP, FBB, PTB, DTP = √(DT ML × DT AP), HDAW, RHD, DRB.

http://www.sajs.co.za Volume 111 | Number 9/10 September/October 2015

In contrast with Pn. t. troglodytes, the considerable linear size range across the variate series observed in G. g. gorilla yields coefficients only marginally lower than those in the familial Old World monkey and felid data sets (Supplementary tables 3-7). The general consistency in size correspondences across the k=8 fore- and hindlimb joint dimensions in the five data sets is equally supported by the actual proportion of the total variance explained by the first PC across the series (Table 1). Analyses of the pooled-sample African hominids, Old World monkeys, large-bodied felids and G. g. gorilla yield a first PC accounting for a staggering 96-98% of the total variance, which clearly confirms the dominance of linear size on these axes for these respective series. In contrast, the first PC of the Pn. t. troglodytes data set accounts for a considerably lower percentage of the total variance (76.5%), particularly striking in comparison with G. g. gorilla, and is consistent with a scalar decrease in the absolute ranges of the k=8 variate distributions in this comparatively monomorphic taxon. This observed pattern is robust irrespective of whether raw linear data are used in lieu of the log-transformed data (Table 1; Supplementary tables 3-7). That both Jolicoeur's and Mosimann's conditions are met by the size variate preferred here (GMSize) is equally confirmed by data in Figures

Table 2 :
Component loadings for principal component (PC) analysis axes: African hominids
JIC, Jolicoeur multivariate allometry coefficients; PHAB, mediolateral diameter of the articular surface of the humeral head; DHAB, mediolateral diameter of the anterior surface of the distal humeral articular surface (trochlea + capitulum); RHD, maximum diameter of the radial head; DRB, maximum mediolateral diameter of the distal radial articulation; FHD, femoral head diameter (superoinferior or anteroposterior); FBB, maximum mediolateral diameter of the distal femur; PTAB, maximum mediolateral diameter of the tibial articular plateau; DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Table 3 :
Component loadings for principal components (PC) analysis axes: Old World monkeys
JIC, Jolicoeur multivariate allometry coefficients; PHAB, mediolateral diameter of the articular surface of the humeral head; DHAB, mediolateral diameter of the anterior surface of the distal humeral articular surface (trochlea + capitulum); RHD, maximum diameter of the radial head; DRB, maximum mediolateral diameter of the distal radial articulation; FHD, femoral head diameter (superoinferior or anteroposterior); FBB, maximum mediolateral diameter of the distal femur; PTAB, maximum mediolateral diameter of the tibial articular plateau; DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Table 4 :
Component loadings for principal components (PC) analysis axes: large-bodied felids
DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Table 5 :
Component loadings for principal components (PC) analysis axes: Pan t. troglodytes
JIC, Jolicoeur multivariate allometry coefficients; PHAB, mediolateral diameter of the articular surface of the humeral head; DHAB, mediolateral diameter of the anterior surface of the distal humeral articular surface (trochlea + capitulum); RHD, maximum diameter of the radial head; DRB, maximum mediolateral diameter of the distal radial articulation; FHD, femoral head diameter (superoinferior or anteroposterior); FBB, maximum mediolateral diameter of the distal femur; PTAB, maximum mediolateral diameter of the tibial articular plateau; DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Table 6 :
Component loadings for principal components (PC) analysis axes: Gorilla g. gorilla
JIC, Jolicoeur multivariate allometry coefficients; PHAB, mediolateral diameter of the articular surface of the humeral head; DHAB, mediolateral diameter of the anterior surface of the distal humeral articular surface (trochlea + capitulum); RHD, maximum diameter of the radial head; DRB, maximum mediolateral diameter of the distal radial articulation; FHD, femoral head diameter (superoinferior or anteroposterior); FBB, maximum mediolateral diameter of the distal femur; PTAB, maximum mediolateral diameter of the tibial articular plateau; DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.

Table 7 :
Summary statistics for the principal components (PC) analysis (trial redundancies)

Table 8 :
Summary statistics for the principal components (PC) analysis
maximum mediolateral diameter of the distal femur; PTAB, maximum mediolateral diameter of the tibial articular plateau; DTP, the square root of the product of the maximal mediolateral diameter (including the medial malleolus) and the maximum anteroposterior diameter of the distal tibia.
38 An appropriate solution to this problem is Model II regression. 38 Analysis of the five comparative series included in this study, encompassing the lowest Linnaean operational taxonomic unit (i.e. a species) in two cases and successively higher taxonomic artifices in the others, confirms that the GM of a suite of k=8 linear dimensions of the fore- and hindlimb epiphyses of the mammalian postcranial skeleton (GMSize) is both an appropriate and a faithful approximation of 'size' in an individual. More crucially, this preferred size variable conforms to all logical expectations of the Jolicoeur-Mosimann categorisation of individual organismal size, in both univariate and multivariate space.
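Reduced major axis (RMA) regression is one commonly used Model II method. The sketch below relates a single ln-transformed variate to ln GMSize; the choice of RMA over other Model II variants, the helper name `rma_fit`, and all values are assumptions for illustration. Unlike least squares, RMA treats x and y symmetrically rather than assuming x is measured without error.

```python
import numpy as np

def rma_fit(x, y):
    """Reduced major axis line: slope = sign(r) * sd(y) / sd(x), through the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical ln values: one variate (e.g. FHD, mm) and ln GMSize for five specimens.
ln_fhd = np.log([30.0, 33.0, 36.0, 40.0, 45.0])
ln_gmsize = np.log([28.0, 30.5, 33.5, 36.0, 41.0])
slope, intercept = rma_fit(ln_fhd, ln_gmsize)
predicted_gmsize = np.exp(intercept + slope * np.log(38.0))   # a hypothetical fossil FHD
```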