Research Article 
Corresponding author: Enrique García-Barros (garcia.barros@uam.es)
Academic editor: Jadranka Rota
© 2015 Enrique García-Barros.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
García-Barros E (2015) Multivariate indices as estimates of dry body weight for comparative study of body size in Lepidoptera. Nota Lepidopterologica 38(1): 59–74. https://doi.org/10.3897/nl.38.8957

Comparative studies on the size of adult Lepidoptera (moths and butterflies) frequently rely on single linear estimates of body size, namely of forewing length or wingspan. As the shape of the wings of these insects – in fact, of all body parts – differs from one taxon to another, such estimates of body mass may not be adequate for comparisons across a wide taxonomic range. Using the length and width of the forewing, thorax and abdomen, as well as the wing area of 375 species and their correlations with dry body weight, several composite indices were determined that might be used in different circumstances. As the coefficients of determination from the multivariate regression models were rather high (R^{2}>0.96), the results are believed to be reliable. A critical reevaluation of the results indicates that important variations in the regression slopes described here would be expected, if at all, only from species with unusual body shapes. Incidentally, the bivariate relationships are in agreement with former comparative work on Lepidoptera and other terrestrial insects in that the relationship between body weight and single linear measurements follows a slightly negatively allometric trend, implying comparatively lighter bodies at the largest body sizes and relatively heavier ones at the shortest body sizes.
As one of the hyperdiverse insect taxa, the order Lepidoptera is well suited for comparative work on subjects of broad biological relevance such as the evolution of body size and its correlation with other traits (e.g.,
Although body mass, or weight, is generally accepted as an accurate measure of size for Lepidoptera (e.g.,
Wings are the most relevant structure of these insects to the human eye, and there are good reasons for wing size to be correlated with body mass for functional reasons, as Lepidoptera are flying insects. However, some degree of structural variation affecting the relationship between wing size and body weight has been documented at several taxonomic levels including the intraspecific one (
The main objective of this study was to determine a composite index, based on several linear estimates, that could accurately predict the dry body weight of set specimens (e.g., from museum collections or even scale illustrations) irrespective of the species' phylogenetic position. Dry body mass was selected instead of fresh body weight for a practical reason: because these insects are usually preserved as dried samples in scientific collections, testing and reworking any results is far more feasible than obtaining reliable fresh (live) weights from the same set of species. The second objective was to determine the sensitivity of such an index to sample size (the number of species), taxonomic diversity and morphological heterogeneity, as a measure of its robustness when applied to species other than those used to fit it.
To avoid heterogeneity caused by the patterns of sexual dimorphism in adult size, the comparison was restricted to adult males from any available source, totaling 665 individuals from 375 species distributed among 61 families. The selection emphasized the diversity of size within and across families and included samples from any region in the world that could be processed.
The measurements were performed on dry set (pinned or spread), complete male specimens. When fresh adults were available, these were first dried in the position traditionally used for these insects in entomological collections. The measures described below were taken in one of four ways: (a) under a stereomicroscope with an ocular micrometer, (b) on a digitized scale drawing made with an optical camera lucida adapted to a stereomicroscope (× 10 to × 40), (c) on a digital photograph of the specimen taken together with a standard scale bar, taken either with a macro lens (up to 1:1) or on a photo microscope at low magnification, or (d) with a Vernier caliper (exceptionally in the case of some of the largest moths). The program ImageJ (
Six linear measurements (in mm) were taken (Figure
Slightly idealized representations of three typical adult Lepidoptera (left to right: Lasiocampidae, Hepialidae, Gelechiidae) to illustrate the variables measured. The right side of each thorax is drawn without the scale cover to make the limits of this tagma more evident. The three drawings are scaled to the same forewing length. Linear measurements are indicated by bars and areas by a striped pattern. FWL = forewing length, FWW = forewing width, FWA = forewing area, HWA = hindwing area, TL = thorax length, TW = thorax width, AL = abdomen length, AW = abdomen width.
To estimate the magnitude of measurement error, the mean within-individual and mean within-species coefficients of variation were calculated from replicated measurements taken on each individual and from different individuals of the same species.
Every measurement was taken twice for each specimen using two different methods among those detailed above (most frequently a, b and c), on two different dates.
Whenever possible, two male specimens of approximately the same size (judged from wingspan by the naked eye) were processed for each species. However, replication was not always possible, as data from single representatives of a number of species were included when this increased the taxonomic or geographic coverage of the species selection.
The insects were dried to a constant weight at 60 °C for 48 hours (72 h for the largest specimens). The pins, if present, were removed carefully (but see below). The weight of the whole specimen was determined to the nearest 0.01 mg on a Mettler AT261 balance (species with a wingspan of ca. 15 mm or above) or on a Mettler Toledo XP6 microbalance with a precision of 0.001 mg (individuals smaller than that size).
Although medium or larger sized collection specimens can generally be depinned and remounted without much difficulty, there is always some risk of damage. For a small number of loaned specimens (ca. 20 individuals) the weight of the pins was estimated, then subtracted from that of the dry mounted specimen. Samples of 10 individual pins from four different brands and numbers (gauges): 000, 00, 0 and 1 to 6 (all with nylon heads and 37 mm long) were measured and weighed. The weights were taken to the nearest 0.01 mg, and the widths measured with a precision of 0.0179 mm under a binocular microscope with an ocular scale line. The relationship between the log-transformed weights and widths was highly consistent: log_{10}(pin weight in mg) = 2.339 + 1.908 log_{10}(pin diameter in mm), R = 0.997, P < 0.0001, n = 350.
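The pin regression can be applied directly to estimate the weight to subtract from a pinned specimen. The Python sketch below is only an illustration: the function names are hypothetical, it assumes that the reported coefficients hold for pins of the type measured (37 mm long, nylon heads), and the specimen weight in the example is invented. The same relationship can be read as a power law, pin weight = 10^{2.339} × diameter^{1.908}.

```python
import math

# Coefficients of the pin regression reported in the text:
# log10(pin weight, mg) = 2.339 + 1.908 * log10(pin diameter, mm)
PIN_INTERCEPT = 2.339
PIN_SLOPE = 1.908


def pin_weight_mg(diameter_mm: float) -> float:
    """Estimated weight (mg) of a 37 mm pin from its diameter (mm)."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    return 10 ** (PIN_INTERCEPT + PIN_SLOPE * math.log10(diameter_mm))


def specimen_dry_weight_mg(pinned_weight_mg: float, diameter_mm: float) -> float:
    """Dry weight of the insect after subtracting the estimated pin weight."""
    return pinned_weight_mg - pin_weight_mg(diameter_mm)


if __name__ == "__main__":
    # Hypothetical example: a pinned specimen weighing 95.0 mg on a 0.40 mm pin.
    print(round(pin_weight_mg(0.40), 2))
    print(round(specimen_dry_weight_mg(95.0, 0.40), 2))
```

Because the estimate carries the error of the regression, it is best suited to specimens that are much heavier than their pins.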
The smallest moths (broadly corresponding to the heterogeneous assemblage of the “microlepidoptera”) posed some special difficulties, which limited the use of reference collections as sources of size data. These moths are fragile and very likely to be damaged if treated in the way described above. Even though they are frequently mounted on smaller pins (‘minutiae’, weighing 0.69–3.15 mg for widths of 0.10 and 0.20 mm respectively), the small variation in the length of these tiny metal pieces represents an excessive error relative to the specimen dry weight. Moreover, as the genitalia are of interest for identification, collection specimens frequently lack the abdomen, or a large part of it, because it was removed for that purpose. Finally, most of them cannot be easily identified to species level without expertise. For these reasons the data from several families in this category were obtained from a small reference collection at the author’s department. This hosts expert-identified specimens collected two decades ago at a single site, so new samples were taken at the same location during 2011–2012 to reasonably cover the lower part of the size range, although at the cost of low geographic variation.
All the variables were transformed to their decimal logarithms. This facilitated comparisons with results from earlier research (as most size-weight relations have been modelled using the equation weight = a × size^{b}:
The multivariate models were fitted using the General Regression Models module of Statistica (
The method of phylogenetically independent contrasts (
The working hypothesis on phylogenetic relationships was built according to the classification proposed by
In the absence of any other references, the formal classifications of Fauna Europaea (
Regressions were done through the origin to estimate the correlations and slopes. After a multivariate regression model was obtained, Least Squares Regression was used to estimate the intercept for the working data set keeping the evolutionary slopes already obtained.
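The two steps just described, a regression forced through the origin followed by an intercept fitted with the slopes held fixed, can be sketched as follows. This is a minimal illustration in Python, not the Statistica procedure used in the study: the function names and the data are hypothetical, and the through-origin fit is shown for a single predictor for brevity.

```python
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x with the intercept fixed at zero."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def intercept_for_fixed_slopes(
    rows: Sequence[Sequence[float]],
    y: Sequence[float],
    slopes: Sequence[float],
) -> float:
    """Least-squares intercept when the slopes are held fixed.

    With fixed slopes, the residual sum of squares is minimized by the mean
    of y minus the slope-weighted sum of the predictors.
    """
    partial = [
        yi - sum(b * v for b, v in zip(slopes, row))
        for row, yi in zip(rows, y)
    ]
    return sum(partial) / len(partial)


if __name__ == "__main__":
    # Hypothetical contrasts (one predictor) and species values.
    slope = slope_through_origin([1.0, 2.0, -1.0], [2.0, 4.0, -2.0])
    species_x = [[1.0], [2.0], [3.0]]
    species_y = [2.5, 4.5, 6.5]
    print(slope, intercept_for_fixed_slopes(species_x, species_y, [slope]))
```

Keeping the slopes fixed means that the intercept only shifts the predictions to the scale of the species data, without altering the evolutionary slopes.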
The number of species and of supraspecific taxa available for this study was obviously small if compared to the estimated number of existing species in the order Lepidoptera (more than 150,000 species:
The error in the predicted dry body weight (DBW) values was measured as the mean of the absolute values of the residuals from the two best fit models (described below), calculated for randomly selected subsets of n species, where n = 5, 10, 25, 50, 100, 150, 200, 250, 300 and 350. Forty replicates were taken at each n, plus one more sample consisting of the whole data set. The taxonomic and structural diversities of each of these 401 species samples were estimated using the following attributes:
Species diversity: the number of species in each sample.
Variation in dry body weight: the standard deviation of the log-transformed dry body weights.
Structural variation. This variable was intended to account for structural/anatomical variation as reflected by the measurements taken, irrespective of body weight. To do this, each of the eight variables was regressed on body weight, one at a time. The residuals of these bivariate regressions were used as the new variables, now linearly independent of body weight. Applying Principal Component Analysis to this set of residuals (Bartlett’s sphericity test χ^{2} = 344.24, P < 0.001; KMO index = 0.72) resulted in three components accounting for 66.96% of the variance (41.51%, 14.59% and 10.86%, respectively). The standard deviation in these three components (weighted by the respective contribution of each component) was used as an index of structural (body shape) diversity, linearly independent of dry weight.
Taxonomic/phylogenetic diversity. This was tentatively estimated in four alternative ways: (1) Number of clades (absolute number of supraspecific nodes). (2) Phylogenetic diversity (PH): the number of clades or nodes represented in the sample minus one, plus the number of species as defined by
As the relationships between the mean residuals and these variables tended to be asymptotic rather than linear, the bivariate and multivariate regressions were performed using Generalized Regression Models and the logarithmic link function.
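The resampling scheme described above (40 random subsets at each of ten sample sizes, plus the full data set) can be sketched as follows. This Python outline rests on an assumption that the text does not settle: the residuals are taken from predictions already computed with the fitted models, and the models are not refitted on each subset. The function names are hypothetical, and only two of the sample attributes (number of species, standard deviation of log dry weight) are computed.

```python
import random
import statistics
from typing import Dict, List, Sequence

SAMPLE_SIZES = (5, 10, 25, 50, 100, 150, 200, 250, 300, 350)
REPLICATES = 40


def mean_abs_residual(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the absolute residuals (observed minus predicted)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def describe_sample(
    log_dbw: Sequence[float], predicted: Sequence[float], chosen: Sequence[int]
) -> Dict[str, float]:
    """Error and simple diversity descriptors for one subset of species."""
    obs = [log_dbw[i] for i in chosen]
    pred = [predicted[i] for i in chosen]
    return {
        "n_species": len(chosen),
        "sd_log_dbw": statistics.stdev(obs),
        "mean_abs_residual": mean_abs_residual(obs, pred),
    }


def subsample_errors(
    log_dbw: Sequence[float], predicted: Sequence[float], seed: int = 0
) -> List[Dict[str, float]]:
    """Describe 40 random subsets per sample size, then the whole data set.

    Requires at least 350 species (the largest subset size).
    """
    rng = random.Random(seed)
    indices = list(range(len(log_dbw)))
    results = []
    for n in SAMPLE_SIZES:
        for _ in range(REPLICATES):
            results.append(describe_sample(log_dbw, predicted, rng.sample(indices, n)))
    results.append(describe_sample(log_dbw, predicted, indices))
    return results
```

With ten sample sizes and 40 replicates each, the function returns 401 records, matching the number of species samples stated in the text.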
The dry body mass of the selected species covered a range of variation of nearly five orders of magnitude, from 0.03 mg to more than 2 g, corresponding to forewing lengths of between 1.8 mm and 110 mm (see Suppl. material
The replicated measurements (Table
Estimate of measurement error for dry body weight and six linear measurements, measured as a percentage of the mean. The values given are the mean coefficients of variation (100∙CV) (± 1 SD) averaged across individuals (from duplicated measurements on each specimen, n = 662) and from different replicates of the same species (within species, n = 328).
Variable               Within individuals  Within species
Dry weight (DBW)       –                   13.334 ± 9.905
Forewing length (FWL)  2.317 ± 2.477       5.706 ± 4.138
Forewing width (FWW)   3.177 ± 3.843       6.174 ± 6.826
Thorax length (TL)     3.760 ± 3.915       5.611 ± 4.748
Thorax width (TW)      3.032 ± 3.345       5.424 ± 4.901
Abdomen length (AL)    4.450 ± 4.499       8.631 ± 6.769
Abdomen width (AW)     5.982 ± 6.473       9.541 ± 6.678
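The error estimates in the table above are coefficients of variation expressed as a percentage of the mean and averaged across groups of replicates. A minimal Python sketch of that calculation follows. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the function names and data are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation as a percentage: 100 * SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    # statistics.stdev is the sample SD (n - 1 denominator): an assumption here.
    return 100 * statistics.stdev(values) / mean


def mean_cv(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and SD of the CVs across groups with at least two replicates."""
    cvs = [cv_percent(g) for g in groups if len(g) >= 2]
    return statistics.mean(cvs), statistics.stdev(cvs)


if __name__ == "__main__":
    # Hypothetical duplicated forewing lengths (mm) for three specimens.
    duplicates = [[20.1, 20.5], [14.8, 15.0], [33.0, 32.1]]
    print(mean_cv(duplicates))
```

Each group is either the duplicated measurements of one specimen (within individuals) or the values of different specimens of one species (within species).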
The results from bivariate regressions of DBW on the other variables as well as the full multivariate results (with all the variables in the model) are presented in Table
Relationships between dry body weight and the test variables based on the species mean values, estimated both by bivariate regression (left four columns) and in a multivariate regression model (right three columns; intercept = 0.489, multiple R = 0.983, adjusted R^{2} = 0.965). The β values represent the relative contribution of each variable in the multivariate model.
          Bivariate regression             Multivariate regression
Variable  R      Slope  P       Intercept  β      Slope  P
FWL       0.939  2.772  <0.001  2.137      0.060  0.178  0.359
FWW       0.920  1.989  <0.001  0.320      0.044  0.095  0.390
TL        0.975  2.718  <0.001  0.445      0.407  1.135  <0.001
TW        0.957  2.902  <0.001  0.173      0.189  0.572  <0.001
AL        0.948  2.790  <0.001  1.173      0.082  0.241  0.029
AW        0.936  2.529  <0.001  0.553      0.150  0.404  <0.001
FWA       0.941  1.266  <0.001  1.174      0.274  0.368  0.008
HWA       0.926  1.279  <0.001  1.136      0.011  0.015  0.862
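The bivariate slopes in the table above can be compared with the value expected under isometry. If weight scaled with the cube of a linear dimension, the log-log slope would be 3. A slope below 3 indicates negative allometry, that is, weight increasing less than proportionally with the cube of length. The short Python sketch below works through this arithmetic for the six linear measurements. The names are hypothetical and only the slopes printed in the table are used.

```python
ISOMETRIC_SLOPE = 3.0  # weight proportional to length cubed

# Bivariate slopes of log10(DBW) on each log10 linear measurement (species means).
BIVARIATE_SLOPES = {
    "FWL": 2.772,
    "FWW": 1.989,
    "TL": 2.718,
    "TW": 2.902,
    "AL": 2.790,
    "AW": 2.529,
}


def weight_ratio(length_ratio: float, slope: float) -> float:
    """Factor by which predicted weight changes when length changes by length_ratio."""
    return length_ratio ** slope


if __name__ == "__main__":
    for name, slope in BIVARIATE_SLOPES.items():
        doubled = weight_ratio(2.0, slope)
        isometric = weight_ratio(2.0, ISOMETRIC_SLOPE)
        # Doubling the measurement multiplies predicted weight by less than 8.
        print(f"{name}: x{doubled:.2f} (isometry: x{isometric:.0f})")
```

The intercepts are not needed for this comparison, since a ratio of predicted weights depends on the slope alone.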
Relationships between dry body weight and the test variables based on the independent contrasts, estimated by bivariate regression (left three columns) and by multivariate regression (right three columns; multiple R = 0.914, adjusted multiple R^{2} = 0.833). All regressions were forced through the origin (no intercept). The β values represent the relative contribution of each variable in the multivariate model.
          Bivariate regression    Multivariate regression
Variable  R      Slope  P         β      Slope  P
FWL       0.835  2.489  <0.001    0.146  0.434  0.091
FWW       0.813  2.132  <0.001    0.040  0.104  0.547
TL        0.891  2.663  <0.001    0.376  1.122  <0.001
TW        0.859  2.632  <0.001    0.185  0.568  0.001
AL        0.817  2.353  <0.001    0.055  0.159  0.257
AW        0.817  2.185  <0.001    0.149  0.398  0.003
FWA       0.840  1.153  <0.001    0.301  0.448  0.003
HWA       0.821  1.210  <0.001    0.015  0.022  0.843
Several alternative models fit by stepwise regression were calculated with multiple R values above 0.979 in all instances. Models 1 and 2 (Table
The two multivariate models with highest R scores among those fitted using the species mean values (1) and the phylogenetically independent contrasts (2). The statistics given are the coefficients of the intercepts and slopes (Coeff.), β values (relative contribution of each variable after standardization) and P (significance). The multivariate statistics are represented at the base of the table. The regression based on the independent contrasts was done through the origin (without intercept, statistics in the two bottom rows); the intercept given (0.553) was fitted a posteriori for the species values in the data set using the slopes (coefficients) stated.
                 (1) Species means                   (2) Independent Contrasts
                 Coeff.  β      P                    Coeff.  β      P
Intercept        0.180   –      0.207                0.553   –      <0.001
FWL              0.745   0.252  0.015                –       –      –
FWL^{2}          0.183   0.148  0.013                –       –      –
FWA              0.346   0.257  <0.001               –       –      –
TL               1.149   0.412  <0.001               1.087   0.395  <0.001
TW               0.622   0.205  <0.001               0.616   0.167  <0.001
AL               0.312   0.106  0.005                –       –      –
AW               0.368   0.136  <0.001               0.408   0.109  <0.001
FWA              –       –      –                    0.378   0.294  <0.001
Model statistics
R                0.9828                              0.981
F (P)            F_{7, 367} = 1489.83 (P < 0.0001)   F_{4, 371} = 1409.32 (P < 0.0001)
R [origin]       –                                   0.9140
F (P) [origin]   –                                   F_{3, 287} = 351.54 (P < 0.0001)
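As an illustration of how a model of this kind would be used, the Python sketch below evaluates a log-linear predictor with the four slopes listed for model 2 (TL, TW, AW, FWA). Several points are assumptions and should be checked against the published table before any real use: the intercept is passed as an argument and is taken as negative in the example because its sign cannot be verified from the table as printed here, linear measurements are assumed to be in mm and the forewing area in mm², and the specimen measurements are invented. The function names are hypothetical.

```python
import math
from typing import Dict, Mapping

# Slopes listed for model 2 (independent contrasts) in the table above.
MODEL2_SLOPES: Dict[str, float] = {"TL": 1.087, "TW": 0.616, "AW": 0.408, "FWA": 0.378}

# ASSUMPTION: magnitude from the table, sign taken as negative for illustration.
ASSUMED_INTERCEPT = -0.553


def predict_log_dbw(
    measurements: Mapping[str, float],
    intercept: float,
    slopes: Mapping[str, float] = MODEL2_SLOPES,
) -> float:
    """Predicted log10 dry body weight from raw (untransformed) measurements."""
    return intercept + sum(
        b * math.log10(measurements[name]) for name, b in slopes.items()
    )


def predict_dbw_mg(
    measurements: Mapping[str, float],
    intercept: float,
    slopes: Mapping[str, float] = MODEL2_SLOPES,
) -> float:
    """Predicted dry body weight (mg), back-transformed from log10."""
    return 10 ** predict_log_dbw(measurements, intercept, slopes)


if __name__ == "__main__":
    # Invented specimen: lengths and widths in mm, forewing area in mm^2.
    specimen = {"TL": 5.0, "TW": 3.0, "AW": 2.5, "FWA": 200.0}
    print(round(predict_dbw_mg(specimen, ASSUMED_INTERCEPT), 1))
```

Passing the intercept and slopes explicitly makes it straightforward to substitute the coefficients of model 1 or of any refitted model.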
The regressions of the estimated error of the predictions (measured as the mean of the absolute value of the residuals) on the indicators of taxonomic, size and structural diversity led to the same results in the bivariate and multiple tests, irrespective of the data analyzed (species values or independent contrasts); thus, for simplicity, only the multivariate results are presented in Table
Sensitivity of the best models to several sources of diversity in the species selected. Relationships between the deviations of the predicted data (mean absolute residuals from 401 subsets of 5–375 species) based on the multivariate models 1 and 2 (from Table
                         Model 1                     Model 2
Variable                 Coeff.   Wald    P          Coeff.   Wald    P
Number of species        0.0003   1.837   0.175      0.0003   1.166   0.280
Body weight diversity    0.0125   1.752   0.186      0.0053   0.268   0.604
Morphological diversity  0.0965   40.349  <0.0001    0.0867   27.582  <0.0001
Taxonomic distinctness   0.0032   0.718   0.396      0.0018   0.195   0.659
Number of clades         0.0003   1.917   0.166      0.0002   1.191   0.275
PH                       0.00002  0.014   0.906      0.00003  0.027   0.870
RPD                      0.0143   16.371  <0.0001    0.0161   17.527  <0.0001
Model statistics
Deviance/DF              0.0022                      0.0033
Log-likelihood           470.817                     445.012
OLS R^{2} (P)            0.168 (P < 0.0001)          0.163 (P < 0.0001)
The results generally show high correlations between total dry body weight and all linear dimensions of the lepidopteran body, as well as the wing areas. This is not surprising given the wide range of sizes covered and, especially, because a functional link between the variables measured and total body size should exist in insects that must be able to fly effectively, such as the males of the moth and butterfly species studied.
The results are consistent with the fact that the wings of Lepidoptera are thin structures (thus relatively light even if comparatively broad and evident) while the largest proportion of the body weight is determined by the weight of the main thoracic and abdominal structures. Forewing length is a popular estimate of body size in butterflies and moths as it is easier to measure than other body dimensions. However, this measure has by itself a lower predictive power of dry body weight than the thoracic dimensions (length and width) or, depending on the method used, abdomen length. Thus, wingspan, taken as the distance from the midpoint of the thorax to the tip of the forewing, would in theory be more accurate than the length of the wing alone as it would partly account for thorax width. However, as stated by
For the linear measurements that are more directly related to body length, such as the thoracic and abdominal lengths, the slopes determined across the species means (2.7–2.8, see Table
Among the several drawbacks of the present results is the fact that intraspecific variation has not been controlled for and cannot be distinguished from other sources of error. This may be acceptable under the assumption that intraspecific variation in body weight is generally lower than interspecific variation for the same trait. Given this and the widespread phenomenon that intraspecific allometric trends follow different (generally less steep) slopes than the interspecific trends in animal taxa (e.g.
Of course, it is likely that the predictive accuracy of the regression models selected can be improved by spreading the selection of species. The results in Table
Although the comparative method of independent contrasts is statistically robust in the absence of accurate estimates of branch lengths, the contrasts are calculated by dividing the differences between each pair of values at a node by the estimated evolutionary distances (derived directly from the branch lengths;
The fact that the multivariate approaches presented here showed high R^{2} scores (> 0.94) for a much wider range of size, morphology and taxonomic variety than that in any former comparable study on Lepidoptera suggests that, although liable to be refined, they may represent a useful tool for comparative work when a wide taxonomic scope is necessary.
I wish to thank Pascual Torres (SIDI, Universidad Autónoma de Madrid) for weighing most of the smallest specimens and Mercedes París (Museo Nacional de Ciencias Naturales, Madrid) for the loan of selected specimens. Juan Pablo Berrocal assisted during the initial stages of the study. Most problems related to the identification of the samples would not have been resolved without the help of several colleagues, namely Antonio Vives Moreno, Gareth E. King, Joaquín Baixeras, José Luis Yela and Elisenda Olivella. Thanks are also due to D. Molina for his samples of Lepidoptera from Peru and Ecuador.
Nexus format text.
Data type: Adobe PDF file
Explanation note: Tree topology for the phylogenetic hypothesis adopted, to be used as input in applications that read the Nexus format (requires some minor prior editing).
Frequency distribution graph.
Data type: Adobe TIF file
Explanation note: Frequency distribution of the dry body weight data (mg) across the species studied.
Documentation on phylogeny.
Data type: Adobe PDF file
Explanation note: This is a list of references including the most relevant sources of information used to build the hypothesis on phylogenetic relationships which were not quoted in the main text.
Tree topology.
Data type: Adobe PDF file
Explanation note: Graphic display (dendrogram) showing the hypothesis on phylogenetic relationships adopted in this work, following the sources quoted in the main text and in the file Supplementary material 3.
Mean by superfamily.
Data type: Adobe PDF file
Explanation note: Mean dry body weight and wing length by superfamily, and sample sizes.
Alternative models.
Data type: Adobe PDF file
Explanation note: Alternative or suboptimal regression models derived from the species means or from the independent contrasts.