D. G. Allen, G. Pearse, J. K. Haseman, and R. R. Maronpot

The National Toxicology Program (NTP) developed the chronic 2-year bioassay as a mechanism for predicting the carcinogenic potential of chemicals in humans. The cost and duration of these studies have limited their use to small numbers of selected chemicals. Many short-term methods have been developed in an attempt to correlate their results with evidence of carcinogenicity (or lack of carcinogenicity), with the aim of increasing both predictive accuracy and the number of chemicals evaluated. Using NTP studies in mice (83 compounds) and rats (87 compounds), we assessed how well prechronic liver lesions correlate with liver cancer. These lesions include hepatocellular necrosis, hepatocellular hypertrophy, hepatocellular cytomegaly, bile duct hyperplasia, and hepatocellular degeneration, along with increased liver weight. Our results indicate that pooling 3 of these prechronic data points (hepatocellular necrosis, hepatocellular hypertrophy, and hepatocellular cytomegaly) can be very predictive of carcinogenicity in the 2-year study (p < 0.05). The inclusion of increased liver weight as an endpoint in the pool of data points increases the number of rodent liver carcinogens that are successfully predicted (p < 0.05), but also results in the prediction of increased numbers of noncarcinogenic chemicals as carcinogens. The use of multiple prechronic study endpoints provides supplementary information that improves the identification of chemicals with carcinogenic potential.


Cancer persists as a worldwide disease and remains among the most common causes of death in the human population. Validated, dependable techniques that predict carcinogenic chemicals and provide information on human risk also remain elusive. A wide range of research methods has been explored, each aimed at predicting the carcinogenic potential of an untested chemical. For example, automated computer systems predict carcinogenicity based on mathematical models, created by statistical analysis, that correlate chemical structure similarities with a defined biological activity (Cunningham et al., 1998; Richard, 1998). In vitro methods such as the Salmonella mutagenicity assay and the micronucleus test are used to evaluate the ability of a compound to interact with DNA. Other mammalian cell-based assays have been implemented for nongenotoxic carcinogenicity predictions (Aardema et al., 1996; Foster, 1997). In vivo assays that reduce the term of the traditional 2-year bioassay, so-called “medium-term” bioassays, have been researched (Ward and Ito, 1988). Transgenic mouse and rat models have been employed with the hope of enhancing the sensitivity of the 2-year bioassay (Tennant et al., 1995; Spalding et al., 2000). Many of these methods have proven beneficial in the study of carcinogenic potential, but none has been sufficiently validated to supplant the chronic 2-year bioassay as the benchmark carcinogenicity assay. Each assay has recognized pitfalls that limit its role in carcinogenic predictions. Mathematical models and structure-activity relationships rely on generalizations of biological activity based on chemical structure. Mutagenicity assays such as the Salmonella and micronucleus assays are only applicable to genotoxic carcinogens. Relevant transgenic models may be costly and difficult to obtain.

In contrast, for compounds that are known human carcinogens, there is excellent concordance with the evidence of carcinogenicity in chronic bioassay studies (Huff, 1999). In addition, direct comparisons among chemicals with diverse structural and/or biological properties can be drawn, since the chronic bioassay is conducted by a defined protocol and on the same rodent strains. However, certain limitations keep many chemicals from ever reaching the stage of the chronic bioassay. These include the high cost of a full 2-year bioassay in both sexes of rats and mice, and the time interval between a chemical’s nomination for study and the submission of a final report on findings. These limitations are not only financially burdensome, but also reduce the throughput of candidate chemicals because the studies are so laborious. In addition, animal welfare proponents cite the need to minimize the number of animals subjected to study.

The dose and route of administration for the 2-year chronic bioassay are based largely on data obtained from prechronic toxicity studies using the same compound. Prechronic testing traditionally encompasses both 14-day and 90-day exposures at a wide range of doses, with histopathological endpoints similar to those in the chronic study. The results of these studies are used to determine the appropriate dose levels for the chronic study. The prechronic bioassay not only provides important information for the design of the chronic study, but may also provide insight into its expected results, including evidence of potential carcinogenicity (Ashby and Tennant, 1994).

This review of NTP studies was conducted as an attempt to identify specific pathological endpoints in short-term toxicity that can be used individually and/or in concert with other findings to predict carcinogenicity in chronic studies. The focus of this review was to identify specific histopathological lesions that would provide investigators with criteria to strengthen the justification for recommending a chemical for chronic study, or alternatively, to remove it from further consideration. This would serve to reduce the number of chemicals for chronic study, which in turn would reduce costs, and reduce the number of animals used. We have attempted to correlate specific hepatocellular pathology in prechronic studies with carcinogenic endpoints in the chronic 2-year study. We present data from chemicals tested by the U.S. National Toxicology Program (NTP), encompassing both genotoxic and nongenotoxic compounds, as well as noncarcinogens. We focused specifically on liver carcinogenicity, as it represents the most frequently diagnosed cancer in the reports that we reviewed, and thus maximizes our population size.







Data Set

Rodent carcinogenicity bioassay data were obtained for 111 technical reports (TR) produced by the NTP over a 10-year period, from March 1991 (TR-388) to July 2001 (TR-499). These represented the most recent final technical reports available when this evaluation was conducted. Only those reports from studies using B6C3F1 mice (Table 1) and/or Fischer 344 rats (F344/N) (Table 2) were used for the analysis. In addition, only those studies using both males and females from an individual species were evaluated (i.e., eliminating studies on compounds delivered intravaginally, or those specifically targeting single sex organs). Finally, only those studies that contained prechronic evaluations (studies ≤12 months duration) were considered. These criteria reduced the number of compounds reviewed to 87 for rats and 83 for mice.

The NTP denotes evidence of carcinogenic activity using 5 different categories. Two categories denote positive results: Clear Evidence (CE) and Some Evidence (SE). Studies that are designated CE show a dose-dependent increase in neoplasms, which may be malignant and/or benign. However, if only benign neoplasms are reported, sufficient evidence must exist that these neoplasms progress to malignancy. Studies that are designated SE show a statistically significant, dose-dependent increase in neoplasms, albeit at a lower level of significance than that seen in CE studies. Another 2 categories delineate negative results: Equivocal Evidence (EE) and No Evidence (NE). Studies that are designated EE show only a marginal increase in neoplasms that may or may not be chemically related. Studies assigned NE show no chemically related increases in neoplasms. A fifth designation, Inadequate Study (IS), is assigned to those studies that are flawed and/or limited in their capacity to define neoplastic changes and thus are not interpretable with respect to carcinogenicity. For the purposes of our evaluation, only those studies assigned CE or SE were considered carcinogenic, while studies assigned EE or NE were considered noncarcinogenic. Studies designated IS were treated as untested in the species in question.
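The binary call described above can be written down as a small lookup. The following Python sketch is only an illustration and is not part of the original analysis; the function name `is_carcinogenic` and the use of `None` for untested studies are assumptions made for the example.

```python
from typing import Optional

# Category codes come from the text; the helper itself is hypothetical.
CARCINOGENIC = {"CE", "SE"}      # Clear Evidence, Some Evidence
NONCARCINOGENIC = {"EE", "NE"}   # Equivocal Evidence, No Evidence


def is_carcinogenic(category: str) -> Optional[bool]:
    """Return True/False for an interpretable study, or None for Inadequate Study."""
    code = category.strip().upper()
    if code in CARCINOGENIC:
        return True
    if code in NONCARCINOGENIC:
        return False
    if code == "IS":
        return None  # treated as untested in the species in question
    raise ValueError(f"unknown NTP category: {category!r}")
```

Returning `None` for IS keeps those studies out of both the carcinogen and noncarcinogen counts, which mirrors treating them as untested.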

Specific hepatic histopathological lesions were evaluated for their correlation to evidence of carcinogenicity. These lesions were selected as those that appeared most frequently among the reports evaluated and were as follows: hepatocellular cytomegaly, hepatocellular necrosis, bile duct hyperplasia, hepatocellular hypertrophy, and hepatocellular degeneration (rats only). Increased liver weight (relative and/or absolute) was also used as an endpoint for analysis because of its prevalence among the technical reports we screened, as well as the positive correlation between liver weight and mouse liver cancer previously reported by Elcombe et al. (2002). It should be noted that all lesions reported in this study were as they appeared in the NTP technical report for the respective compounds. No attempt was made to compare or contrast the nature and degree of similarly recorded changes. Furthermore, this review did not attempt to link the dose that induced prechronic findings with the dose that induced tumors.

Genotoxicity data were also evaluated using results from the Salmonella test and the Micronucleus assay, as these were the most common genotoxicity assays employed among the technical reports evaluated. Compounds were scored as either positive or negative in each assay. Although the Salmonella test is often performed on multiple strains, a positive result recorded for any strain was interpreted as a positive result for the assay.
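The any-strain rule for the Salmonella test can be sketched as follows. This is an illustrative Python fragment, not the authors' code; the function name `salmonella_call` and the strain labels in the usage note are assumptions chosen for the example.

```python
from typing import Mapping


def salmonella_call(strain_results: Mapping[str, bool]) -> bool:
    """Score the assay positive if any tested strain gave a positive result."""
    # Hypothetical helper: keys are strain labels, values are per-strain outcomes.
    return any(strain_results.values())
```

For example, `salmonella_call({"TA98": False, "TA100": True})` evaluates to `True`, because a single positive strain is enough to score the assay positive under this rule.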

Statistical Analysis

Comparison tables (2 × 2) were constructed for each of the parameters investigated (specified liver toxicity, genotoxicity, carcinogenicity) and a Fisher’s Exact Test was performed to determine level of significance (defined as p < 0.05).
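As an illustration of the test applied to each 2 × 2 table, the sketch below computes a two-sided Fisher's exact p-value using only the Python standard library. It is a minimal sketch, not the software used in the original analysis. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that the paper does not specify.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Mouse 3-lesion group as reported in the results: 17 carcinogens detected,
# 10 missed, 2 false positives. The 54 true negatives are an assumption:
# 83 compounds - 27 carcinogens - 2 false positives, with no IS studies excluded.
p_mouse = fisher_exact_two_sided(17, 10, 2, 54)
print(f"p = {p_mouse:.2e}")
```

Under these assumed counts the p-value falls far below the 0.05 threshold, in line with the significance reported for the grouped mouse lesions.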



The best single predictor of liver cancer in mice was hepatocellular hypertrophy (Table 3). Hepatocellular cytomegaly and hepatocellular necrosis also contributed, although each yielded fewer positive findings than hypertrophy. Bile duct hyperplasia failed to identify any liver carcinogens that were not already identified by the other 3 nonneoplastic liver lesions. As a group, hypertrophy, cytomegaly, and necrosis successfully predicted 17 of the 27 liver carcinogens (10 false negatives), with only 2 false positives (p < 0.0001). Adding increased liver weight as a predictor successfully identified 8 of the 10 false negatives as liver carcinogens. However, including increased liver weight resulted in the identification of 16 additional false positives, for a total of 18 false positives. As a single predictor, liver weight successfully identified 18 of the 27 liver carcinogens, but also identified 17 false positives.

With regard to the genotoxicity studies, there was no evidence of a correlation between mouse liver tumor outcome and Salmonella or micronucleus assay outcome. For example, 27% (6/22) of the Salmonella-positive chemicals produced mouse liver tumors, compared with 33% (22/66) of the Salmonella-negative chemicals, a difference that was not statistically significant. A similar result was observed for the micronucleus assay, although the sample sizes were much smaller because the assay was not performed in many of the studies. In addition, none of the prechronic liver lesions was correlated with either Salmonella or micronucleus results.

Therefore, our analysis indicated that a chemical showing a positive response in the 3 nonneoplastic liver lesions detailed previously in a prechronic study had very high likelihood (17/19 or 89.5%) of being a liver carcinogen in a chronic study. However, these lesions did not identify 10/27 or 37.0% of the liver carcinogens. If increased liver weight was also grouped with the liver lesions, a majority of the liver carcinogens were identified (25/27, or 92.6%), but only 25/43, or 58.1%, of the positive prechronic findings correlated with liver cancer in the chronic studies (i.e., a large number of false positives would be introduced).
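The percentages in the preceding paragraph follow directly from the mouse counts and can be reproduced with a few lines of Python. This is a worked arithmetic sketch only; the function name `screen_metrics` and the terms sensitivity and positive predictive value are labels introduced here, not terminology used in the original analysis.

```python
from typing import Tuple


def screen_metrics(true_pos: int, false_neg: int, false_pos: int) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for a prechronic screen."""
    sensitivity = true_pos / (true_pos + false_neg)  # carcinogens detected
    ppv = true_pos / (true_pos + false_pos)          # positive calls that were correct
    return sensitivity, ppv


# Mouse counts taken from the text above.
three_lesions = screen_metrics(true_pos=17, false_neg=10, false_pos=2)
with_liver_weight = screen_metrics(true_pos=25, false_neg=2, false_pos=18)

for label, (sens, ppv) in [
    ("3 lesions", three_lesions),
    ("3 lesions + liver weight", with_liver_weight),
]:
    print(f"{label}: detected {sens:.1%} of carcinogens, {ppv:.1%} of positives correct")
```

The first row gives 17/27 detected and 17/19 (89.5%) correct positives; the second gives 25/27 (92.6%) detected and 25/43 (58.1%) correct positives, which makes the trade-off between missed carcinogens and false positives explicit.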



No single prechronic liver lesion (when considered individually) was a strong predictor of liver cancer in rats. The most predictive lesion was hepatocellular hypertrophy (Table 4). As was seen in the mice, bile duct hyperplasia failed to contribute to predictions, and identified no carcinogens not detected by other lesions. Hepatocellular degeneration was also a poor predictor of liver cancer. Grouping hepatocellular hypertrophy, hepatocellular necrosis, and hepatocellular cytomegaly (as in mice) resulted in 7 of the 11 (64%) rat liver carcinogens being correctly predicted (p < 0.01). However, this strategy also produced 16 false positives. Increased liver weight correctly predicted 8 of the 11 (73%) liver carcinogens (p < 0.05), but also produced even more false positives, 26, as well as producing 4 false negatives.

Therefore, as in mice, increased liver weight (when evaluated alone) was not as successful a predictive strategy as the grouped strategy detailed before. Including increased liver weight in the grouped strategy corrected 3 of the 4 false negatives produced by the 3-lesion group, but also added false positives to bring the total to 32. Despite the high number of false positives, the correlation between the grouped strategy (including increased liver weight) and rat liver cancer is highly significant (p < 0.001).

Analysis of the rat data gave the same result as in mice with respect to genotoxicity: no significant correlation was found between liver tumors or liver toxicity and the 2 mutagenicity measures. The only suggestion of a correlation was between liver tumors and Salmonella results, and it did not reach significance (p > 0.15). Of the 24 Salmonella-positive chemicals, 21% (5/24) produced liver tumors in rats, compared with only 10% (7/71) of the Salmonella-negative chemicals.


These results provide evidence that prechronic liver lesions may be used as a component in the search for predictors of liver carcinogenicity in the chronic 2-year bioassay. In mice, a chemical showing a positive response in the 3 liver lesions (hepatocytomegaly, hepatocellular hypertrophy, and hepatocellular necrosis) has a very high likelihood of being a carcinogen in the chronic study. However, more than one-third of carcinogens would not be identified due to a propensity for false negatives using these criteria. If increased liver weight is also included as a lesion in the screening criteria, a majority of these false negatives would be eliminated. This enhanced sensitivity would come at a cost, because the number of false positives would also increase (Figure 1). In rats, the same 3 liver lesions would again be very effective at predicting liver carcinogenesis while also producing fewer false negatives compared to mice. However, this apparent improvement in accuracy is offset by an increased occurrence of false positives. Inclusion of increased liver weight allows for all of the rat liver carcinogens evaluated in this study (11) to be successfully identified. As in mice, however, an improvement in the number of carcinogens identified is accompanied by an increase in the number of false positives (Figure 2). It should be noted, however, that the predictivity seen in the rat data might be slightly artifactual due to the low number of liver carcinogens (11) relative to mice (27).


It should be noted that clinical chemistry endpoints were also explored as potential predictors of liver carcinogenicity. It is conceivable that significant changes in certain liver enzymes (e.g., alkaline phosphatase, lactate dehydrogenase, alanine aminotransferase, sorbitol dehydrogenase) could correspond to toxicity that may in turn correlate with specific liver lesions, and ultimately with liver cancer. However, upon examination of all of the technical reports involved in this study, it became apparent that several inconsistencies precluded any feasible inclusion of these endpoints in the final analysis. For example, many of these studies have no prechronic liver chemistry data (samples were collected over 12 months after initiation) or have no liver chemistry data at all. In the studies that do contain prechronic liver chemistry data, there are a variety of endpoints that were collected, and the same endpoints were not always collected. A complete set of consistently generated data might have strengthened the predictivity of the morphological parameters.

An intriguing finding is the fact that genotoxicity is not correlated with liver carcinogenesis in either rodent species. This conclusion implies that the liver carcinogens evaluated are predominantly nonmutagenic in their mechanism of induction. However, the lack of micronucleus assay data for several chemicals mandates its exclusion from screening. In addition, there have been previous reports of compounds that are genotoxic, based on positive Ames assay results, that were not found to be rodent liver carcinogens. Therefore, the validity of this conclusion could be questioned because it is solely dependent on Salmonella mutagenicity. Additional genotoxic endpoints could conceivably shift the association between liver cancer and genotoxicity towards a more positive correlation (Ashby, 1996).

As presented previously, 2 types of errors are inherently associated with this type of evaluation. An error of false positivity demonstrates an overprediction of cancer, while an error of false negativity corresponds to an underprediction. Clearly, using this data analysis for liver cancer prediction must be accompanied by careful scrutiny. In both rats and mice, inclusion of increased liver weight markedly enhances the probability of identifying carcinogens using this method. However, should this inclusion of liver weights be made if it results in an overprediction of cancer? The primary ramification of this type of error would be the premature removal of a chemical from consideration when it actually does not cause cancer. In contrast, while exclusion of increased liver weight reduces the number of false positives, an increase in false negatives results. The impact of such an error could be the premature acceptance of a compound as a noncarcinogen when it is actually a carcinogen. A more realistic scenario, however, would be the inclusion of the compound in the 2-year chronic bioassay with the expectation of noncarcinogenicity, only to learn that it is indeed a carcinogen.

With this information in hand, one must next determine the best strategy with which to use this approach as a prognostic tool for carcinogenicity. The data suggest that these prechronic liver lesions may provide another supplementary information source for use in concert with other short-term assays (e.g., in vitro assays) to create a pool of data. Compiling data from multiple assays in this manner could conceivably reduce the collective impact of the errors intrinsic to each individual assay. Perhaps an even more appropriate use of this data could be as an asset to study design of the chronic bioassay. Armed with the knowledge that a compound has a significant chance of being a liver carcinogen, investigators could design experiments with more mechanistic approaches (in conjunction with the traditional protocol) that might shed some light onto the actual mode of the cancer-causing agent. Finally, with the ever-mounting increases in budgetary constraints, the reality exists that compounds under evaluation will need to be prioritized. This method could potentially provide investigators with evidence in support of or against continued study by identifying the best candidates among a group of compounds. Once again however, a decision would have to be reached regarding inclusion of increased liver weights as a component in the screening and which type of error would be most acceptable under these circumstances.


Aardema, M. J., Isfort, R. J., Thompson, E. D., and LeBoeuf, R. A. (1996). The low pH Syrian hamster embryo (SHE) cell transformation assay: a revitalized role in carcinogen prediction. Mutat Res 356, 5–9.

Ashby, J. (1996). Alternatives to the 2-species bioassay for the identification of potential human carcinogens. Hum Exp Toxicol 15, 183–202.

Ashby, J., and Tennant, R. W. (1994). Prediction of rodent carcinogenicity for 44 chemicals: results. Mutagenesis 9, 7–15.

Cunningham, A. R., Klopman, G., and Rosenkranz, H. S. (1998). Identification of structural features and associated mechanisms of action for carcinogens in rats. Mutat Res 405, 9–27.

Elcombe, C. R., Odum, J., Foster, J. R., Stone, S., Hasmall, S., Soames, A. R., Kimber, I., and Ashby, J. (2002). Prediction of rodent nongenotoxic carcinogenesis: evaluation of biochemical and tissue changes in rodents following exposure to nine nongenotoxic NTP carcinogens. Environ Health Perspect 110, 363–75.

Foster, J. R. (1997). The role of cell proliferation in chemically induced carcinogenesis. J Comp Pathol 116, 113–44.

Huff, J. (1999). Long-term chemical carcinogenesis bioassays predict human cancer hazards. Issues, controversies, and uncertainties. Ann NY Acad Sci 895, 56–79.

Richard, A. M. (1998). Structure-based methods for predicting mutagenicity and carcinogenicity: are we there yet? Mutat Res 400, 493–507.

Spalding, J. W., French, J. E., Stasiewicz, S., Furedi-Machacek, M., Conner, F., Tice, R. R., and Tennant, R. W. (2000). Responses of transgenic mouse lines p53(+/−) and Tg.AC to agents tested in conventional carcinogenicity bioassays. Toxicol Sci 53, 213–23.

Tennant, R. W., French, J. E., and Spalding, J. W. (1995). Identifying chemical carcinogens and assessing potential risk in short-term bioassays using transgenic mouse models. Environ Health Perspect 103, 942–50.

Ward, J. M., and Ito, N. (1988). Development of new medium-term bioassays for carcinogens. Cancer Res 48, 5051–4.