Research Article
Corresponding author: Enrique García-Barros ( garcia.barros@uam.es ) Academic editor: Zdenek Fric
© 2025 Enrique García-Barros.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
García-Barros E (2025) Refining estimates of dry body weight from linear measurements in adult moths and butterflies (Lepidoptera). Nota Lepidopterologica 48: 165-184. https://doi.org/10.3897/nl.48.144747
The dry body weight of adult male Lepidoptera was estimated from thirteen linear measurements of the wings and body using multivariate regression techniques. A dataset comprising information from 2,645 species was used, substantially increasing the sample size with respect to a similar earlier approach. Based on the logarithmically transformed values of dry body weight and the linear measurements, the best single predictors of body weight are body length, thorax length and head width, none of which is among the most popular descriptors of size in this insect order (namely, forewing length and wingspan). The results show that combinations of several linear measurements lead to the most precise estimates of dry body weight. Simpler models, e.g. those based on wing length or wingspan and body length, may provide reasonable but suboptimal approximations. Variance partitioning of the regression residuals indicated that most of the unexplained variance is attributable to morphology rather than to phylogeny, so overall the results suggest that the best models may be stable and reliable for prediction except for unusual morphologies. Alternative approaches such as taxon-by-taxon regressions or ANCOVA-based methods were tested, and the results, together with the problems involved, are discussed. The potential relevance of co-linearity is also addressed. Based on a limited number of species, the author attempted to estimate the female to male weight relation (which happened to be nearly isometric), as well as the percent water content (38% overall).
The dataset is made available so it can be accessed for any related research on this or related subjects.
Body size (or body size at reproduction) is among the key life history traits because of its many -often intricate- interrelations with other life history traits and with the environment (e.g.
The body dimensions (such as body length or width) are seldom reported for adult Lepidoptera in monographs and field guides, where the length of the forewing or the distance between the apices of the two forewings in extended position (wingspan) is most often presented (e.g.
Some years ago, the author attempted a multivariate approach to the prediction of male dry body weight in the Order Lepidoptera (García-Barros, 2015). Although based on a relatively small number of species, the results suggested high reliability overall, but probably also some sensitivity in cases where morphological diversity within clades was underestimated. This could only be solved by increasing morphological diversity within the taxa already present in the database. Therefore, the present study represents an extension of that former work after substantially increasing taxonomic sampling and, within reasonable limits, the intra-taxon variation in morphology. The new dataset should provide evidence to (1) validate the former results, (2) propose one or several alternative indexes to estimate dry body weight using only linear measurements and (3) estimate the relative robustness of the results in terms of the body proportions and phylogenetic position, as this might indicate ways to improve the results further. (4) For practical reasons, research on lepidopteran allometry and related issues often concentrates on one taxon (typically, a family). For this reason, I intend to provide, and discuss the merit of, dry body size estimates based on pre-selected subtaxa (family, superfamily or informal groupings such as ‘micromoths’ or ‘macromoths’), either on a taxon-by-taxon basis or from multivariate regression approaches such as those implemented by other authors (e.g.,
The methods were identical to those described in
Identification to species level was not always possible, so in most instances reliable genus-level identification was judged sufficient for the present approach. Even when a plausible species could be attributed to such samples, these have been kept as ‘sp.’ unless confirmation by an expert or by reference to the specialized literature was possible.
A small sample of female individuals (361 species, 487 individuals) from species already represented by males (where at least the dry body weight and the forewing length were measured) was used to explore the general pattern of sexual size dimorphism of dry body weight. Another subset of field materials collected in Spanish locations (430 male individuals from 237 species) was used to explore the relationship between fresh and dry body weight. These individuals were frozen (-20 °C) shortly after capture, weighed in the laboratory within the next 24 hours and then processed as described above.
In addition to the Dry Body weight (DBw), thirteen linear measurements (in mm) were taken from each spread specimen. These are abbreviated using a combination of 1–2 capital letters to denote the body part measured (H, T, A, FW, HW, W, B respectively for Head, Thorax, Abdomen, Forewing, Hindwing, Wing and Body) followed by lowercase characters to indicate the attribute measured (l = length, w = width and sp = span, with the latter followed by 1 or 2 to indicate two alternative ways to estimate the wingspan). Since the statistical analyses were done on the Log10-transformed values, the abbreviations below (used in the text, tables and figures) refer to the log-transformed arithmetic means of such values for each species:
Hl Log10(head length): the length of the head in dorsal view. Frequently, this does not stand for the real length of the head, which can only be measured laterally. In extreme cases (e.g., the genus Hyblaea Fabricius) where the head is dorsally occluded by the thorax an arbitrary measure (0.01 mm) was recorded.
Hw Log10(Head width): where the head width is the distance between the outermost point of the two eyes.
Tl Log10(Thorax length): with the thorax measured dorsally; this estimate is somewhat arbitrary as in some taxa the thorax extends backwards behind the abdomen to some extent.
Tw Log10(Thorax width): the distance between the points of insertion of the forewings (not between the midpoints of the tegulae since these are movable structures covering the wing bases, e.g.,
Al Log10(Abdomen length).
Aw Log10(Abdomen width): where the width was measured at the midpoint of the abdomen length; again, this variable may not equally represent the robustness of the abdomen in all Lepidoptera, because the abdomen may be sub-cylindrical, dorso-ventrally flattened, or laterally depressed (e.g., in most Pieridae).
FWl Log10(Forewing length): with the wing measured from its base (see above) to the apex.
FWw Log10(Forewing width): the width of the wing taken at the midpoint of the FWl axis and perpendicular to it.
HWl Log10(Hindwing length), the hindwing measured from its base to the most distant part of the wing edge, whatever its position.
HWw Log10(Hindwing width): like FWw described above.
Wsp1 (Wingspan 1, Log10 transformed): where wingspan is the distance between the two forewing apices in the spread insect; or, when the forewing maximum length occurs posteriorly to the wing apex, between the outermost points of the two forewings while these stand in the anterior half of the wing margin.
Wsp2 Log10(Wingspan 2): where wingspan 2 (one half of the flying insect’s maximum wingspan along a wingstroke) is calculated as: wingspan 2 = forewing length + (0.5·thorax width).
Bl Log10(body length): where the body length is the sum of the head, thorax and abdomen lengths.
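To illustrate how the composite measures relate to the raw ones, the following minimal Python sketch derives Bl and Wsp2 from species-mean measurements and then applies the Log10 transformation. The dictionary keys and the function name are hypothetical. Computing the composites from the species means (rather than per individual) is an assumption; for complete records both orders give the same result because the composites are linear combinations.

```python
import math
from statistics import mean

# Hypothetical key names for the raw measurements (all in mm).
RAW_KEYS = ["head_l", "thorax_l", "thorax_w", "abdomen_l", "forewing_l"]

def species_log_measures(individuals):
    """Return Log10 species means for some of the variables defined above.

    `individuals` is a list of dicts, one per specimen of the same species.
    """
    # Step 1: arithmetic mean of each raw measurement across specimens.
    m = {k: mean(ind[k] for ind in individuals) for k in RAW_KEYS}

    # Step 2: composite measures (assumed here to be built from the means).
    body_length = m["head_l"] + m["thorax_l"] + m["abdomen_l"]
    wingspan_2 = m["forewing_l"] + 0.5 * m["thorax_w"]

    # Step 3: Log10 transformation of the species-level values.
    return {
        "Tl": math.log10(m["thorax_l"]),
        "Tw": math.log10(m["thorax_w"]),
        "FWl": math.log10(m["forewing_l"]),
        "Bl": math.log10(body_length),
        "Wsp2": math.log10(wingspan_2),
    }

# Two made-up specimens of one species.
example = [
    {"head_l": 1.0, "thorax_l": 4.0, "thorax_w": 3.0, "abdomen_l": 9.0, "forewing_l": 20.0},
    {"head_l": 1.0, "thorax_l": 6.0, "thorax_w": 5.0, "abdomen_l": 11.0, "forewing_l": 24.0},
]
logs = species_log_measures(example)
```

The returned values are on the same scale as the abbreviations used in the text, so they could be fed directly into a regression equation expressed in Log10 units.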
The species means for each variable were calculated as the first step, then Log10-transformed and processed through linear regression. This implies that the traditional allometric (power) equation Y = a·X^b, or Log(Y) = Log(a) + b·Log(X) (e.g.
The predictive accuracy of the regression models estimated by cross-validation was given preference to evaluate the models. For that purpose, a subset of 534 species represented by at least three individuals was chosen as the validation set. Cross-validation was performed using the leave-one-out procedure (package “boot” in R:
Comparative methods where phylogenetic relations are accounted for are the standard whenever evolutionary relationships are the aim (multiple examples during the last three decades after e.g.,
However, phylogenetic information is relevant to this study as a means for estimating the phylogenetic signal in the data and, more relevantly, to check the potential causes of error in the models fitted to determine whether the lack of fit of the results should be attributed to phylogeny or to morphology (by variance partitioning as detailed below). The balance between these two potential sources of bias should help to improve the dataset for further work on the subject. This required an operative tree topology and an estimate of evolutionary distances.
The tree topology was built after the available information on lepidopteran taxonomy, updated to December 2022. Using the scheme used in
Branch lengths (not the tree topology) were derived from the DNA barcode sequences from BOLD (Barcode of Life Data System: www.boldsystems.org). For 1,430 species with sequences available, one of the most complete ones was randomly chosen; for 730 species identified only to genus level, the sequence from a ‘replacement species’ (i.e., a species in the same genus) was adopted if available. In this way, one nucleotide sequence was associated with 2,160 of the terminal taxa (species) (Suppl. material
The tests incorporating phylogenetic information were restricted to the subset of species for which branch length estimates were available. For variance partitions (see below) the residuals from that subset of species were extracted from the models fitted to the whole data set. The phylogenetic signal in each of the variables (dry body weight and the linear measurements) was estimated using Pagel’s lambda (
Variance partitioning was used to estimate the fraction of the variation not accounted for by the models which is attributable to either morphology or to phylogeny (following the ideas by
Two types of taxon-by-taxon prediction of DBw were attempted using the family level as a starting point (the members of families represented by fewer than two species were clustered into the most closely related taxon of the same level). First, individual regressions for each family were calculated (with FWl, Wsp1 and Wsp2 as predictors). The differences between the family slopes were tested by means of ANCOVA, and a t-test was used to compare the intercepts. For the main superfamilies, the phylogenetically closest families were combined when their slopes and intercepts were not significantly different. This procedure was repeated until superfamily groups of at least 40 species were obtained. In addition, several arbitrary assemblages of potentially practical use (e.g., macromoths or micromoths) were evaluated. Second, an ANCOVA approach (such as that used by
Finally, a regression model was fitted to the factors extracted from a PCA of the morphological variables together with the phylogenetic information. Ten PCA factors were extracted from the complete set of variables (99.9% quality of representation) and combined with the 2161 PCoA factors outlining the phylogenetic relations (described above for the variance partitioning protocol). The non-significant factors were excluded from the model through a manually driven backward selection process until a final solution was selected. This method has the advantage that the factors are linearly independent from each other, so co-linearity is avoided, though it has practical drawbacks (addressed in the Discussion).
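The reason why regressing on PCA factors avoids co-linearity can be seen in a small numerical sketch. The Python code below is not the procedure used in the study: it uses synthetic data, omits the phylogenetic PCoA factors and the backward selection, and all function and variable names are hypothetical. It only shows that scores obtained from a PCA of strongly correlated predictors are mutually uncorrelated before they enter an ordinary least squares fit.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project column-centred X onto its first principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fit_ols(F, y):
    """Ordinary least squares with an intercept; intercept is returned first."""
    A = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic example: five predictors that all track one latent 'size' axis,
# mimicking highly correlated log-transformed linear measurements.
rng = np.random.default_rng(0)
size = rng.normal(size=200)
X = np.column_stack([size + 0.1 * rng.normal(size=200) for _ in range(5)])
y = 3.0 * size + 0.1 * rng.normal(size=200)

scores = pca_scores(X, n_components=3)
coef = fit_ols(scores, y)

# Correlation matrix of the PCA scores: off-diagonal terms are numerically zero.
corr = np.corrcoef(scores, rowvar=False)
```

The trade-off is that each factor mixes all the original measurements, so a fitted model cannot be applied to a new species without first projecting its measurements onto the same axes.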
The data underpinning the analysis reported in this paper are deposited in the Dryad Data Repository at https://doi.org/10.5061/dryad.bk3j9kdnw.
The lowest range of body weights did not differ from that presented previously (
On the logarithmically transformed values all the linear measurements were highly correlated with dry body weight (R > 0.97) and showed a clear phylogenetic structure, with lambda values close to 1.0 (Table
Phylogenetic signal for each of the variables in the study (Lambda), and summary results of the bivariate regression models of Log10(Dry body weight) (DBw) on each of the predictor variables, all values Log10 transformed. Intercept, coefficient and adjusted R2 of each model, together with the cross-validation error (CVE) and the ‘pure’ and shared explanation of the residuals attributable to morphology (Res.: morph.), phylogeny (Res.: phylo.) and shared by morphology and phylogeny (Res.: shared) (R2 values). For the first four columns, all but one of the values (intercept for the Tw regression, marked ns) were significant at the level P < 0.001.
| Variable | Lambda | Intercept | Coefficient | R2 adj. | CVE | Res.: morph. | Res.: phylo. | Res.: shared |
|---|---|---|---|---|---|---|---|---|
| DBw | 0.998 | – | – | – | – | – | – | – |
| Hl | 0.965 | 1.418 | 2.311 | 0.599 | 0.259 | 0.031 | 0.015 | 0.008 |
| Hw | 0.997 | 0.250 | 2.928 | 0.916 | 0.053 | 0.329 | 0.017 | 0.055 |
| Tl | 0.999 | -0.434 | 2.708 | 0.948 | 0.033 | 0.042 | 0.001 | 0.008 |
| Tw | 0.984 | -0.016ns | 2.864 | 0.910 | 0.058 | 0.077 | 0.004 | 0.009 |
| Al | 0.986 | -1.055 | 2.695 | 0.881 | 0.077 | 0.097 | 0.007 | 0.015 |
| Aw | 0.985 | 0.657 | 2.381 | 0.867 | 0.086 | 0.107 | 0.009 | 0.017 |
| FWl | 0.999 | -2.053 | 2.727 | 0.862 | 0.089 | 0.106 | 0.009 | 0.023 |
| FWw | 0.998 | -0.340 | 1.953 | 0.780 | 0.142 | 0.172 | 0.024 | 0.025 |
| HWl | 0.985 | -1.728 | 2.714 | 0.782 | 0.140 | 0.166 | 0.022 | 0.031 |
| HWw | 0.984 | -0.732 | 2.238 | 0.763 | 0.153 | 0.183 | 0.026 | 0.028 |
| Bl | 0.998 | -1.832 | 2.823 | 0.929 | 0.046 | 0.059 | 0.002 | 0.010 |
| Wsp1 | 0.985 | -3.211 | 2.962 | 0.889 | 0.072 | 0.087 | 0.007 | 0.017 |
| Wsp2 | 0.987 | -3.078 | 2.729 | 0.885 | 0.074 | 0.089 | 0.006 | 0.019 |
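As a worked illustration of how a row of the table above can be used, the following Python sketch turns a raw measurement (in mm) into a predicted dry body weight. The intercepts and coefficients are copied from the table; the function names are hypothetical, and the weight unit is that of the original dataset, which is not restated in this section.

```python
import math

# (intercept, coefficient) pairs copied from the bivariate table above.
BIVARIATE = {
    "Tl": (-0.434, 2.708),   # thorax length
    "Bl": (-1.832, 2.823),   # body length
    "FWl": (-2.053, 2.727),  # forewing length
}

def predict_log_dbw(variable, measurement_mm):
    """Predicted Log10(dry body weight) from one raw linear measurement."""
    intercept, coefficient = BIVARIATE[variable]
    return intercept + coefficient * math.log10(measurement_mm)

def predict_dbw(variable, measurement_mm):
    """Back-transform the prediction to the original weight scale."""
    return 10 ** predict_log_dbw(variable, measurement_mm)

# Example: a forewing length of 10 mm gives Log10(10) = 1 as the predictor.
log_weight = predict_log_dbw("FWl", 10.0)
```

Back-transforming a prediction made on the logarithmic scale yields a geometric rather than arithmetic expectation, so the result is best read as a typical value for species of that size.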
Among the formerly published models, the closest one to those presented in Table
Summary of the results of the multivariate linear regression models of Log10(dry body weight), DBW, on different subsets of the potential explanatory variables, based on the Log10-transformed values (all the values reported for the intercepts, coefficients, model R2, F tests and AIC scores were significant, P < 0.001). The four lower rows contain the cross-validation errors and the explanation of the model residuals attributable to morphology alone, phylogeny alone, and to the shared effects of morphology and phylogeny (Resids., respectively: morph., phylo., shared).
| Parameters | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| Intercept | -0.5206 | -0.4474 | -0.6883 | -2.2383 | -2.1903 | -1.0617 | -0.7148 | -0.5329 |
| Hl | -0.1107 | – | – | – | – | – | – | – |
| Hw | 0.1994 | 0.0847 | 0.4516 | – | – | – | – | – |
| Tl | 1.1379 | 1.1281 | – | – | – | – | 1.1757 | 1.2221 |
| Tw | 0.4584 | 0.5020 | 0.7435 | – | – | 1.7776 | 1.1910 | 0.5139 |
| Al | 0.2802 | 0.2738 | – | – | – | – | – | 0.2450 |
| Aw | 0.5111 | 0.5275 | 0.5067 | – | – | – | – | 0.5200 |
| FWl | 0.2859 | 0.2454 | 0.3828 | – | – | 1.2586 | 0.5829 | 0.3760 |
| FWw | 0.0858 | 0.0992 | – | – | – | – | – | – |
| Bl | – | – | 0.6431 | 2.1421 | 2.1751 | – | – | – |
| Wsp1 | – | – | – | 0.7630 | – | – | – | – |
| Wsp2 | – | – | – | – | 0.6873 | – | – | – |
| Model statistics | | | | | | | | |
| R²adj | 0.980 | 0.980 | 0.974 | 0.935 | 0.933 | 0.963 | 0.974 | 0.979 |
| F | 16242.6 | 18250.5 | 16499.9 | 18794.3 | 18716.6 | 34146.0 | 33640.9 | 25246.2 |
| AIC | -4002.8 | -3959.7 | -3300.8 | -849.94 | -839.68 | -2352.1 | -3353.6 | -3847.6 |
| CVE | 0.0129 | 0.0131 | 0.0168 | 0.0425 | 0.0427 | 0.0241 | 0.0165 | 0.0137 |
| Minimum VIF | 4.04 | 11.83 | 10.50 | 11.86 | 11.36 | 3.49 | 6.73 | 1.00 |
| Min. VIF variable | Hl | Al | Aw | Wsp1 | Wsp2 | FWl | Tw | FWl |
| Resids., R2: morph. | 0.017 | 0.017 | 0.022 | 0.052 | 0.054 | 0.033 | 0.023 | 0.020 |
| Resids., R2: phylo. | <0.001 | <0.001 | <0.001 | 0.002 | 0.002 | 0.001 | <0.001 | <0.001 |
| Resids., R2: shared | 0.002 | 0.003 | 0.004 | 0.011 | 0.011 | 0.004 | 0.003 | 0.001 |
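The multivariate models are applied in the same way as the bivariate ones, with one term per measurement. The Python sketch below evaluates Model 8 using the coefficients from the table above; the function name and the input format (a dict of raw measurements in mm, keyed by the variable abbreviations) are assumptions made for illustration.

```python
import math

# Model 8 from the table above: intercept plus one coefficient per variable.
MODEL_8_INTERCEPT = -0.5329
MODEL_8_COEFFICIENTS = {
    "Tl": 1.2221,   # thorax length
    "Tw": 0.5139,   # thorax width
    "Al": 0.2450,   # abdomen length
    "Aw": 0.5200,   # abdomen width
    "FWl": 0.3760,  # forewing length
}

def predict_log_dbw_model8(raw_mm):
    """Predicted Log10(dry body weight) from raw measurements given in mm."""
    total = MODEL_8_INTERCEPT
    for name, coefficient in MODEL_8_COEFFICIENTS.items():
        total += coefficient * math.log10(raw_mm[name])
    return total

# Sanity check: if every measurement equals 10 mm, each Log10 value is 1,
# so the prediction is the intercept plus the sum of the coefficients.
all_ten = {name: 10.0 for name in MODEL_8_COEFFICIENTS}
prediction = predict_log_dbw_model8(all_ten)
```

A missing measurement raises a `KeyError` here rather than being silently skipped, because dropping a term would amount to using a different, unfitted model.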
Eight alternative multivariate models were fitted, reflecting choices that would be expected to be preferred most often depending e.g., on the type of information available to predict DBw. Redundant combinations of composed measures with their components were avoided (e.g., Bl and Hl, Tl and Al or Wingspan with FWl and Tw). The statistics and coefficients are summarized in Table
Relationship between the Log10-transformed species mean dry body weights (DBW) and one of the most widely used descriptors of body size in the Lepidoptera, forewing length (FWl), and with the predicted values from multivariate model 2 (details in Tables
All the alternatives proved high fit (adjusted R2 above 0.93) with models 1, 2 and 8 rendering the best predictive performance (lowest cross validation errors).
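For readers unfamiliar with the cross-validation error, the following Python sketch shows the general idea of leave-one-out cross-validation for a linear regression. It is not the implementation used in the study (which relied on the R package boot): the function name is hypothetical, the error is assumed to be the mean squared prediction error, and no validation subset is singled out.

```python
import numpy as np

def loocv_mse(X, y):
    """Leave-one-out mean squared prediction error of an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without observation i, then predict the held-out value.
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[i] = y[i] - A[i] @ coef
    return float(np.mean(errors ** 2))

# Synthetic data standing in for log-transformed predictors and weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=30)
cve = loocv_mse(X, y)
```

Because each observation is predicted by a model that never saw it, this error penalizes overfitted models more than the adjusted R2 does, which is why it is a reasonable criterion for ranking alternative predictor sets.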
Regarding the ‘taxon-by-taxon approach’, the explanation of DBw in terms of FWl, Wsp1 and Wsp2 was acceptable (R2 > 0.73, P < 0.001), on average respectively 0.83, 0.84 and 0.85 (Suppl. material
The butterflies (Papilionoidea) as an example of the potential problems involved in pre-selected taxon-specific approaches to the prediction of body weight (Log10-transformed, DBw) from forewing length (Log10-transformed, FWl). Although three families (Lycaenidae, Riodinidae and Nymphalidae) were found to be homogeneous for the slope and intercept of the linear relationship, these and the three remaining families differed significantly from each other in the intercept, the slope, or both. Notice the logarithmic scale in both axes.
As shown in Table
The models based on the linearly independent factors representing morphology and phylogeny were slightly superior to any of the alternatives in Table
Regarding fresh (live) body weight, on average and on the non-logarithmically transformed values the water content was slightly below 40% (38.64%, s.d.= 7.11, Fig.
Fresh body weight as predicted by the Log10-transformed dry body weight values (DBW) of males and females from 237 species. The overall relationship was significant (adjusted R2 = 0.985, p < 0.001) with a slope close to isometry: Log10(Fresh body weight) = 0.428 + 0.992(Log10DBW). The difference between the slope of the two sexes was small, but significant (details in the text). Notice the logarithmic scale in both axes.
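The regression in the caption above can be rewritten on the original (non-logarithmic) scale, which makes the near-isometry explicit. In the sketch below, W_f and W_d are notation introduced here for fresh and dry body weight, and the rounded prefactor follows from 10^0.428.

```latex
\log_{10} W_f = 0.428 + 0.992\,\log_{10} W_d
\quad\Longrightarrow\quad
W_f = 10^{0.428}\, W_d^{\,0.992} \approx 2.68\, W_d^{\,0.992}
```

Since the exponent is very close to 1, the predicted fresh weight is close to a constant multiple of the dry weight over the range of sizes sampled.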
For the 362 species with weight data from the two sexes (Fig.
Male dry body weight as a predictor of the female dry body weight, from a sample of 362 species. Overall, on the log-transformed values the female dry body weight can be approximated as female DBW = 0.1574 + 0.9770(male DBW) (adjusted R2 = 0.953, P < 0.0001). Notice the logarithmic scale in both axes.
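A rough estimate of the female weight, or of a sex-averaged weight, can be obtained from a male value with the relation in the caption above. The Python sketch below assumes that the log transformation is Log10, as elsewhere in the paper, and that the sex-averaged weight is the arithmetic mean of the male value and the predicted female value; both function names are hypothetical.

```python
import math

def female_dbw_from_male(male_dbw):
    """Back-transformed female dry body weight predicted from the male value."""
    log_female = 0.1574 + 0.9770 * math.log10(male_dbw)
    return 10 ** log_female

def sex_averaged_dbw(male_dbw):
    """Mean of the male weight and the predicted female weight (assumed definition)."""
    return 0.5 * (male_dbw + female_dbw_from_male(male_dbw))

# Example with a male dry weight of 10 units (unit as in the original data).
female_estimate = female_dbw_from_male(10.0)
average_estimate = sex_averaged_dbw(10.0)
```

Such estimates inherit the scatter of the male-female relation in addition to the error of the male prediction itself, so they are less accurate than the male estimates they are derived from.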
The results can be interpreted from two opposed but equally realistic perspectives: a pessimistic one and an optimistic one.
As far as body weight prediction is concerned, the pessimistic point of view derives from the putatively low taxonomic and geographic representativity of the species used (hardly more than 2% of the described species of Lepidoptera:
Missing information from one part of the species is a recurrent problem in interspecific comparative studies (
However, an optimistic interpretation is straightforward from the results. First, a seven-fold increase of sample size in comparison to
A taxon-specific approach (e.g., predictions for one single family or superfamily using only FWl or Wingspan) may occasionally be justified for practical reasons despite the generally suboptimal results. Such an approach implicitly assumes a high degree of phylogenetic conservatism within the taxon studied, which may or may not be the case. As found in the Papilionoidea, there is a risk that the weights of some of the species are systematically overestimated or underestimated. The low weight of phylogenetic structure in the regression residuals suggests that clustering species on morphological rather than taxonomic grounds might be far preferable. This applies equally to the ANCOVA-like approaches, where the initial results are of high fit overall but include many non-significant terms, which ultimately leads to comparatively poor results.
In summary, the most complex regression models (based on five to seven body measurements) predicted dry body weight most accurately. The author would suggest that models 2 and 8 (in Table
It is not the author’s intention to negate the utility of traditional size standards used for the Lepidoptera such as wingspan and forewing length. As shown by the results these measures retain a good correlation with body weight and may represent the only metric available for most species, besides their interest as descriptive standards or as the basis for the study of e.g. intra-specific variation of moth and butterfly body size.
The fact that only male insects were analysed imposes severe limitations for implementation of the present results in field studies. In addition to the differences between live insects and preserved ones due to tissue contraction after desiccation (
Incidentally, significant between-taxa differences (such as those within the Papilionoidea already mentioned, and probably other groups) may reflect differences in the broad body plans with ecological, life-historical or biomechanical implications which deserve further attention. On similar grounds, the bivariate regressions (of dry body weight on the individual variables) revealed allometric slopes within the range 2.0–3.0, where 3.0 represents isometry. Interestingly, the steepest slopes (2.8–3.0) are among those involving body measurements explicitly (thorax width, total body length) or implicitly (such as wingspan 1) together with head width (as already documented by
To conclude, the dry body weight of male Lepidoptera can be predicted with reasonable reliability from a combination of linear measures that are not difficult to obtain. Rough estimates of the live weight or of the average male-female dry weight may be possible at the cost of accuracy. For decades, work devoted to the estimation of body weight in insects has relied on the objective of finding reference equations whose results could be used for extrapolation to other species (to quote some examples:
Over the last few years, the author received help from several colleagues who either provided specimens, facilitated access to them or to documentation, assisted with identification or solved methodological issues. While absolving them of responsibility for any inaccuracies in the results, I am indebted to (in alphabetical order) N. Agustí, J. Baixeras, U. Benardo, J.W. Brown, J.P. Cancela, M. Corley, M. Costa, E. Drouet, J. de Freina, O. Karsholt, M. Laguerre, W. Mey, M.L. Munguira, J.E. Murria Beltrán, A. Orellana, L. Przybylowicz, F. Pühringer, J. Razowski, L. Ronkay, H. Romo, P. Sihvonen, H. Staude, J. Sumpich, G. Tarmann, E. Toro Delgado, A. Vives Moreno, T.J. Witt (†) and J.L. Yela.
Documentation on phylogeny
Data type: pdf
Explanatory note: Documentation on phylogeny. Documentation used to assemble the branching pattern of the cladogram for the species included in the study.
Tree topology
Data type: nwk
Explanatory note: Tree describing the topology of the phylogenetic hypothesis used in this study.
List of species and available BOLD sequences
Data type: pdf
Explanatory note: List of the species included in the study, and the related COI sequences adopted to estimate between-species distances. The CODE is an arbitrary one (A001 to ZL013) used to denote each operative species-level taxon in this study. When no sequences associated to a species were found, that available from a closely related species (e.g., same genus) was adopted (listed in the column headed “Replace. / altern. SP”) to get approximated between-species distances at different taxonomic levels. The information was retrieved from the BOLD data portal (https://v3.boldsystems.org/) during 2023.
Additional statistics
Data type: pdf
Explanatory note: Supplementary tables describing the bivariate correlations (table S1), taxon-by-taxon regression results (table S2) and ANCOVA approaches (table S3).