SUMMARY OF THE REMEDIATION TECHNOLOGIES DEVELOPMENT FORUM
PHYTOREMEDIATION ACTION TEAM
TOTAL PETROLEUM HYDROCARBON IN SOIL SUBGROUP
CONFERENCE CALL

August 5, 2002
12:30 p.m.-2:30 p.m.

On August 5, 2002, the following members of the Remediation Technologies Development Forum's (RTDF's) Phytoremediation Action Team, Total Petroleum Hydrocarbon (TPH) in Soil Subgroup, met in a conference call:

Steve Geiger, ThermoRetec, Inc.
Peter Kulakow, Kansas State University (KSU)
Kirk O'Riley, Chevron Corporation
Steve Rock, U.S. Environmental Protection Agency (EPA)
David Tsao, BP America
Duane Wolf, University of Arkansas

Arati Kolhatkar of BP America, Vicki Lancaster of Neptune Environmental, Jessica Patino of the State University of New York, Ann Vega of EPA, Wendy Feng of KSU, and Christine Hartnett of Eastern Research Group, Inc. were also present.


BACKGROUND INFORMATION

The TPH in Soil Subgroup has created a field study program to evaluate how effectively plants degrade petroleum hydrocarbons. As part of this effort, phytoremediation demonstration projects have been established at 13 sites across the country. For the most part, these sites are following the TPH in Soil Subgroup's protocol, which recommends: (1) testing at least three treatments (i.e., two vegetated treatments and one nonvegetated control treatment), (2) establishing at least four replicates per treatment, and (3) conducting field studies over at least three years. The protocol also recommends collecting soil samples before treatment plots are established (Ti), after seed bed preparation but before planting (T0), and after each growing season (T1, T2, and T3). Soil samples are collected from more than one depth.

Data collection efforts are underway. Samples are sent to laboratories for analysis and the results are forwarded to KSU's Peter Kulakow, who is compiling and analyzing data from most of the Subgroup sites. In addition, these data are forwarded to individual site managers and to statisticians who have been enlisted to help with the data analysis. This conference call was held to allow those who are analyzing the data to share ideas and discuss analytical methodologies. The themes discussed are summarized below.


SUBSTITUTIONS FOR NONDETECT VALUES

At some of the Subgroup sites, Kulakow said, contaminants have been reported as nondetect values. He asked call participants to indicate how they address nondetect values. He noted that some analysts substitute nondetect values with zeros, thus making the assumption that a contaminant is absent if it is recorded as a nondetect value. This can be an erroneous assumption, however, because nondetect values are also recorded in instances where a contaminant is present but at concentrations that are below a laboratory's detection limit. In an attempt to be more conservative, Kulakow said, some analysts substitute nondetect values with half the detection limit rather than zero. In fact, this is essentially the approach KSU is using with the data from the Subgroup sites. (Much of the data have been generated by ICF Consulting [formerly Arthur D. Little Laboratory], a laboratory that lists minimum reporting limits [MRLs] in its data reports. Detection limits are approximately one-tenth of an MRL. Thus, KSU substitutes nondetects with one-twentieth of an MRL--that is, one-half of one-tenth of an MRL.)
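
The MRL-based substitution can be sketched as follows; the TPH values and MRL below are hypothetical stand-ins, not Subgroup data:

```python
# Sketch of the nondetect substitution described above: a nondetect is
# replaced with half the detection limit, and the detection limit is
# taken as one-tenth of the lab's minimum reporting limit (MRL), so the
# substitute works out to MRL/20. Values here are hypothetical.

def substitute_nondetect(value, mrl):
    """Return the reported value, or MRL/20 if it was a nondetect (None)."""
    if value is None:                  # None marks a nondetect in this sketch
        detection_limit = mrl / 10.0   # detection limit ~ one-tenth of MRL
        return detection_limit / 2.0   # half the detection limit = MRL/20
    return value

tph_results = [1200.0, None, 850.0, None]   # hypothetical TPH results (mg/kg)
mrl = 100.0                                 # hypothetical MRL (mg/kg)
cleaned = [substitute_nondetect(v, mrl) for v in tph_results]
# each nondetect becomes 100.0/20 = 5.0 mg/kg
```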

Vicki Lancaster acknowledged that substitution with half the detection limit is an approach commonly used to address nondetects. She advised, however, using the "minimum-maximum" approach before resorting to the "half detection limit" approach. She said that the minimum-maximum approach involves performing analyses twice. The first time through, analysts substitute zeros for nondetects. The second time through, analysts substitute nondetects with the highest reasonable value (e.g., the detection limit rather than half the detection limit). If the same conclusions are gleaned using the two different methods, Lancaster said, analysts do not need to worry about the nondetect issue. If different substitution values yield different inferences, however, analysts will then be required to think carefully about which substitution value to choose and to offer justification for the method chosen.
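
The minimum-maximum check amounts to running the same analysis under the two extreme substitutions and comparing results; a minimal sketch, with hypothetical sample values and detection limit:

```python
# Sketch of the "minimum-maximum" sensitivity check described above:
# compute the same summary statistic twice, once with nondetects set to
# zero (minimum case) and once with nondetects set to the detection
# limit (maximum case). Values are hypothetical.

def mean_with_substitution(values, substitute):
    """Mean after replacing each nondetect (None) with `substitute`."""
    filled = [substitute if v is None else v for v in values]
    return sum(filled) / len(filled)

samples = [420.0, None, 310.0, None, 550.0]  # None = nondetect (mg/kg)
detection_limit = 10.0

low = mean_with_substitution(samples, 0.0)               # minimum case
high = mean_with_substitution(samples, detection_limit)  # maximum case

# If inferences drawn from `low` and `high` agree, the nondetect
# treatment does not affect conclusions; if they diverge, the choice of
# substitution value must be justified.
```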


NORMALIZING THE DATA

Kulakow initiated discussions on two topics that relate to data normalization:


DATA DISTRIBUTION ISSUES/DATA TRANSFORMATIONS

Kulakow described the approach KSU is using to analyze the Subgroup's data set. He started by saying that a number of assumptions are made when performing analysis of variance. For example, statisticians assume that errors (or residuals) are homogeneous. According to some statisticians, Kulakow said, this particular assumption is one of the most crucial; if violated, the results produced by the analysis of variance are suspect. Thus, Kulakow said, KSU makes a point of testing the homogeneity of error variance, using the following approach. First, the original data are analyzed and the residuals are tested for homogeneity of error variance. If the residuals are not homogeneous, a square-root transformation is performed, the data are re-analyzed, and the residuals are retested for homogeneity of error variance. If the residuals fail the test again, a logarithmic transformation is performed, the data are analyzed again, and the residuals are retested. If the residuals still lack homogeneity, then an unequal F test is used as the method of analysis. Data with negative values, such as percentage change, cannot be transformed with the square-root or logarithmic transformation. For these data, an unequal F test is used. Kulakow acknowledged that this analytical approach is intensive but said that KSU's Wendy Feng has automated the process. He encouraged Subgroup members to contact him if they wanted information about how to automate analytical processes.
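
The cascade (original data, then square root, then logarithm, then fall back to an unequal F test) can be sketched as below. This is a simplified stand-in for KSU's actual procedure: it uses a crude Hartley-style ratio of largest to smallest group variance in place of a formal homogeneity test, and the groups and threshold are hypothetical:

```python
# Sketch of the transformation cascade described above. A group is a set
# of replicate measurements for one treatment; homogeneity is judged by
# a rough variance-ratio rule rather than a formal test.
import math
import statistics

def variance_ratio(groups):
    """Ratio of largest to smallest group variance (Hartley-style F-max)."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

def choose_transformation(groups, threshold=4.0):
    """Return the first transformation under which variances look homogeneous."""
    candidates = [
        ("none", lambda x: x),
        ("sqrt", math.sqrt),   # requires nonnegative data
        ("log", math.log),     # requires strictly positive data
    ]
    for name, transform in candidates:
        transformed = [[transform(v) for v in g] for g in groups]
        if variance_ratio(transformed) <= threshold:
            return name
    return "unequal F test"    # fall back when no transformation helps

# hypothetical replicate TPH values (mg/kg) for two treatments
selected = choose_transformation([[1200.0, 980.0, 1100.0],
                                  [400.0, 90.0, 30.0]])
```

Negative values (e.g., percentage change) would fail the square-root and logarithmic steps, which is why the text routes such data straight to the unequal F test.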

Kulakow said that the process that he described is performed for each analytical parameter. He has performed the analyses for Site G and will soon do so for most of the Subgroup sites. In most cases, analyses performed with Site G's original data passed the homogeneity of error variance test, and therefore, did not require transformation. For the most part, he said, in cases where the test failed with the original data, neither of the transformations (i.e., square-root or logarithmic) helped improve the situation, and analysts resorted to the unequal F test as the method of analysis. Lancaster, who is working on Site B's data, expressed surprise at Kulakow's finding; she said that all of the data she has worked on required a logarithmic transformation.

Lancaster suggested performing some simple plotting analyses (e.g., box plots and histograms) as a first step in the analytical process. She said that it is important to visualize the data before moving into more complicated analyses. Doing so allows analysts to identify outliers, determine whether the distribution is normal, and decide whether the data require transformation. She said that she has written some functions that allow her to plot data and generate descriptive statistics. She agreed to send Kulakow some information on her methods and to forward materials from a workshop on exploratory data analysis.
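
A first-pass exploratory look in this spirit can be sketched with descriptive statistics and a crude text histogram; the functions and TPH values below are hypothetical illustrations, not Lancaster's actual plotting routines:

```python
# Sketch of exploratory data analysis as a first step: summary
# statistics plus a rough text histogram to spot outliers and skew
# before formal analysis. Data values are hypothetical (mg/kg).
import statistics

def describe(values):
    """Basic descriptive statistics for one set of measurements."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def text_histogram(values, bins=5):
    """Return one line per bin: lower bin edge and a bar of '#' marks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return ["{:8.1f} | {}".format(lo + i * width, "#" * c)
            for i, c in enumerate(counts)]

tph = [980.0, 1200.0, 1150.0, 4300.0, 1020.0, 990.0]  # 4300 is an obvious outlier
stats = describe(tph)
for line in text_histogram(tph):
    print(line)
```

A large gap between mean and median, or a lone bar far from the rest, is the kind of signal that would prompt an outlier check or a transformation before analysis of variance.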


ANALYSIS OF VARIANCE ISSUES

Kulakow said that KSU is using a "repeated measures design" to analyze the Subgroup data. He said that the Subgroup's demonstration projects have two levels of repetition in measurements: depth and time. Kulakow said that the data collected across different depths are being analyzed separately. The only reason to run analyses on data from multiple depths, he said, would be to answer the following question: do results differ by depth? At most sites, however, the shallow and deep soils are so different from each other that the question becomes moot. KSU is interested, however, in using the "repeated measures design" to evaluate time. By doing so, analysts will be able to determine whether responses differ depending on time. He said that KSU is using the "derived variable" approach to analyze repeated measures: new variables are calculated that represent the difference, or percentage difference, in response between two sampling events. For example, the difference between T1 and T2 is a variable and the difference between T2 and T3 is another variable. Kulakow said that the response is being measured using two different approaches: (1) the absolute difference in contaminant concentrations between the two time periods, and (2) the percentage difference. Lancaster expressed concern that there are some questions that might not be answered using the "repeated measures design" approach.
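
The derived-variable calculation can be sketched as follows, with hypothetical concentrations for a single plot:

```python
# Sketch of the "derived variable" approach described above: for one
# plot, compute the absolute and percentage change in contaminant
# concentration between consecutive sampling events. Values are
# hypothetical TPH concentrations (mg/kg).

def derived_variables(series):
    """For concentrations at successive sampling events, return
    (absolute difference, percentage difference) for each interval."""
    out = []
    for earlier, later in zip(series, series[1:]):
        absolute = later - earlier
        percent = 100.0 * absolute / earlier
        out.append((absolute, percent))
    return out

plot_tph = [2000.0, 1500.0, 1200.0]   # T1, T2, T3 for one plot
changes = derived_variables(plot_tph)
# changes[0] is the T1-to-T2 interval, changes[1] the T2-to-T3 interval;
# each derived variable would then be compared across treatments
```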


VARIABILITY

Lancaster noted that there is a great deal of variability within treatment plots at Site B. This is more pronounced in the deeper soil layers than in the surface layers. She said that the variability is so significant that it makes it difficult to glean meaning from the data and generate conclusions about the impact, if any, plants are having on contaminant degradation rates. This raises the following questions: Is there a way to obtain a more representative soil sample? Was the correct plot size chosen?


CHROMATOGRAMS

For Site A, Kulakow and Kirk O'Riley noted, chromatograms have been generated and fingerprinting analysis has been performed. Kulakow said that he eventually plans to generate this type of data for most of the Subgroup's 13 sites. O'Riley said that the chromatograms are a visual tool that allows analysts to determine how the components of petroleum wastes are changing over time. For example, by analyzing chromatogram patterns, it is possible to determine which hydrocarbon compounds are diminishing. While such findings offer important qualitative information, Kulakow said, he is not sure how to deal with these findings from a statistical standpoint and asked for suggestions. His request prompted brief discussions on principal component analysis (PCA) and fatty acid methyl ester (FAME) analysis.
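
One simple way to quantify a chromatogram comparison is to track the fractional change in peak areas between sampling events; a minimal sketch, with hypothetical compound names and peak areas (not Site A data):

```python
# Sketch of one quantitative use of chromatogram data, in the spirit of
# the fingerprinting described above: compare peak areas between two
# sampling events to see which hydrocarbon fractions are diminishing.
# Compound names and peak areas are hypothetical.

def fractional_change(t0_areas, t1_areas):
    """Map each compound to its fractional change in peak area."""
    return {c: (t1_areas[c] - t0_areas[c]) / t0_areas[c]
            for c in t0_areas}

t0 = {"n-C17": 100.0, "pristane": 40.0, "n-C18": 90.0}
t1 = {"n-C17": 60.0, "pristane": 38.0, "n-C18": 55.0}
changes = fractional_change(t0, t1)
# the readily degraded n-alkanes drop sharply relative to the more
# recalcitrant pristane marker -- the kind of pattern fingerprinting
# analysis looks for visually
```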


QUALITY ASSURANCE/QUALITY CONTROL (QA/QC)

Call participants discussed several data quality issues.

Kulakow said that poor surrogate recovery and variable results in the standard reference sample could help explain some of the variability that has been reported in the Subgroup's data. Vega said that all EPA reports are required to include a discussion on data quality. She agreed to send Kulakow an example.


CONSISTENCY IN DATA ANALYSIS APPROACHES

As noted previously, Kulakow plans to perform analyses on most of the data collected from the RTDF sites. In addition, at particular sites, individual site managers and/or statisticians are conducting data analysis independently. Kulakow said that he welcomes this duplication of effort for two reasons. First, it will allow for cross-comparison of conclusions. Second, it will allow specific sites to explore a wider variety of questions than are currently being evaluated under the RTDF Subgroup field study program. Kulakow said that this call was held to get involved parties talking and to help facilitate consistency in statistical analysis approaches. He said that it was acceptable, however, for different methods to be used at different sites. Call participants expressed strong interest in keeping the dialogue alive. They agreed that this could be accomplished by doing the following:


ACTION ITEMS