# Consider these points when conducting on-farm research

*Jan 6, 2016*

Over the past several decades, I have seen many growers perform research to compare rootstocks, fumigation treatments, fertilizers, growth regulators and other types of practices. Often the experiments are not well designed and probably have led to erroneous conclusions. The purpose of this article is to point out some of the things that should be considered while designing an experiment to increase the likelihood that observed differences are in fact due to the treatments being compared.

Experiments are performed to obtain data that will provide information about one or more treatments or variables. The term “variable” refers to a characteristic that can be measured, such as tree height, trunk size, yield, fruit flesh firmness, etc. Treatments can be applied to sections of an orchard, individual trees or even individual fruit on a tree selected at random. When performing an experiment, it is important to hold constant any variables that can influence the results other than the treatments being studied. While designing experiments, it is critical to replicate and randomize treatments so the researcher can partition the total variation into two components: variation due to treatment and unexplained variation. Replication is needed to estimate variation, and randomization is needed to avoid biased estimates of variation. Most researchers take at least one 3-credit course in experimental design because designing good experiments is complicated and many factors must be considered. In this article I will concentrate on the three aspects of experimental design that I feel are most important: randomization, replication and sample size.

**Think about randomization**

There is usually variation within an orchard due to differences in soil fertility, soil moisture, pest pressure, shading, dust from nearby dirt roads or drying conditions. The purpose of an experiment is to determine if treatments differ, but we want to be sure that the observed differences are due to the treatments rather than some other factors that we did not control.

Randomization is complicated in commercial orchards, but lack of randomization will usually lead to erroneous results. It is easy to spray a foliar fertilizer on an entire row rather than spraying individual trees, but there may be something different about that row compared to the other rows receiving different treatments. If that is the case, then the observed differences may be due to factors other than treatments, resulting in misleading results. A single replication with many trees is not a substitute for true replication.

**Completely Randomized Design**

The simplest and most commonly used experimental design is the Completely Randomized Design (CRD), where all experimental units have an equal chance of being assigned to every treatment. An experimental unit is the smallest unit receiving a single treatment. An experimental unit may be a single apple on a tree, a branch on a tree, a whole tree, a plot of 5 consecutive trees, an entire row of trees or even a 10-acre block of trees.
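As an illustration, a CRD assignment can be sketched in a few lines of Python. The tree names and treatment labels here are hypothetical; the only requirement is that each treatment ends up on the same number of randomly chosen units:

```python
import random

def assign_crd(units, treatments, seed=None):
    """Randomly assign each experimental unit to a treatment, with
    equal numbers of units per treatment (units must divide evenly)."""
    rng = random.Random(seed)
    n_reps = len(units) // len(treatments)
    # Build a list with each treatment repeated n_reps times, then shuffle
    # so every unit has an equal chance of receiving every treatment.
    labels = [t for t in treatments for _ in range(n_reps)]
    rng.shuffle(labels)
    return dict(zip(units, labels))

# Example: 21 trees and 3 hypothetical foliar-fertilizer treatments.
trees = [f"tree_{i}" for i in range(1, 22)]
plan = assign_crd(trees, ["A", "B", "C"], seed=42)
```

Each run with a different seed produces a different random layout, which is the point: no row, edge or soil gradient is systematically tied to one treatment.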

**Randomized Complete Block Design**

Sometimes researchers can identify sources of variation that may affect the variable that is being measured. For example, large trees typically have higher yield than small trees; trees with heavy crops tend to have smaller fruit than trees with light crops; and trees at the bottom of a hill tend to be more vigorous than trees higher on a hill due to differences in soil fertility. In such cases, the experimental units can be assigned to different blocks and every treatment is assigned to each block. Such a design is called the Randomized Complete Block Design (RCBD).

For RCB designs to be effective, there should be less variation within blocks than between blocks, so blocks should be kept as small as possible. Blocks can be any shape or size, and a block does not need to be confined to an area within an orchard. For example, for thinning experiments, I often blocked on bloom density because thinning treatments usually remove more fruit from trees with heavy bloom than from trees with light bloom. So to determine the true effect of the thinning treatment, I must account for the differences in initial bloom.

**Assigning treatments**

For example, to compare 3 thinning treatments with 21 trees to work with, I rated bloom on each tree on a scale of 1 to 7, where 1 = few blossoms and 7 = snowball bloom. Then I ranked the 21 trees in ascending order of bloom rating. The 3 trees with the lowest ratings were assigned to block 1, the next 3 trees to block 2, and so on. Within each block, the 3 treatments were randomly assigned to the trees, so every treatment was applied to a tree with each bloom density rating. In this way, I was able to account for the effect of bloom density on final fruit set and fruit size at harvest.
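The blocking procedure above can be sketched in Python. The bloom ratings and treatment names below are hypothetical, chosen so each rating of 1 through 7 appears on three trees:

```python
import random

def assign_rcbd(ratings, treatments, seed=None):
    """Block trees by bloom rating: sort trees by rating, group them
    into blocks the size of the treatment list, then randomize the
    treatments within each block."""
    rng = random.Random(seed)
    k = len(treatments)
    ranked = sorted(ratings, key=ratings.get)   # ascending bloom rating
    plan = {}
    for b in range(0, len(ranked), k):
        block = ranked[b:b + k]
        labels = treatments[:]
        rng.shuffle(labels)                     # randomize within block
        for tree, trt in zip(block, labels):
            plan[tree] = (b // k + 1, trt)      # (block number, treatment)
    return plan

# Hypothetical bloom ratings (1 = few blossoms, 7 = snowball bloom).
ratings = {f"tree_{i}": 1 + (i - 1) % 7 for i in range(1, 22)}
plan = assign_rcbd(ratings, ["thin_A", "thin_B", "thin_C"], seed=1)
```

With 21 trees and 3 treatments this gives 7 blocks, and every treatment appears exactly once in each block, which is what makes the block complete.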

A better way to assign treatments, if there are enough trees to work with, is to select 21 trees with uniform bloom, so initial bloom density is not even an issue. However, since experimental trees are expensive to maintain, we usually have limited numbers of trees to work with in our research orchards, so we have to design experiments that control for experimental error. Since RCB designs require more work and are more complicated than CR designs, I prefer to use the simpler design unless there is a good reason to block.

**Think about replication**

Since no two experimental units are exactly the same and they will not respond exactly the same to a given treatment, there will be variability in our data. To estimate that variability, we need to apply each treatment to more than one experimental unit. In an experiment, replication refers to the practice of assigning each treatment to more than one experimental unit, and in general, the more replications we have, the better. When performing a statistical analysis of the data, we are actually estimating the proportion of the total variation (variation due to treatments plus variation not due to treatments) that can be attributed to the treatments. We usually test the hypothesis that the difference between two or more treatment means is zero. An appropriate statistical test will allow us to calculate the probability of obtaining a difference as large as the one we observed when the treatments actually have no effect.
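The partitioning of total variation into a treatment component and an unexplained component can be made concrete with a short calculation. The yield numbers below are hypothetical:

```python
from statistics import mean

def partition_ss(data):
    """Partition the total sum of squares into treatment and error
    components. data maps treatment name -> list of observations."""
    all_obs = [x for obs in data.values() for x in obs]
    grand = mean(all_obs)
    # Total variation: every observation around the grand mean.
    ss_total = sum((x - grand) ** 2 for x in all_obs)
    # Treatment variation: treatment means around the grand mean.
    ss_trt = sum(len(obs) * (mean(obs) - grand) ** 2
                 for obs in data.values())
    # Unexplained variation: observations around their treatment mean.
    ss_err = sum((x - mean(obs)) ** 2
                 for obs in data.values() for x in obs)
    return ss_total, ss_trt, ss_err

# Hypothetical yields (kg/tree), two thinning treatments, 4 replicates each.
yields = {"A": [52.0, 48.0, 55.0, 50.0], "B": [60.0, 58.0, 63.0, 59.0]}
ss_total, ss_trt, ss_err = partition_ss(yields)
```

The two components always add up to the total, and a statistical test essentially asks whether the treatment component is large relative to the unexplained component. Without replication there is no within-treatment data, so the error component cannot be estimated at all.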

**Think about sample size**

If there are too few replications, then we may fail to detect real differences between treatments. If there are too many replications, then we are wasting resources. So we want to strike a balance, where we have enough replications to detect important differences without wasting resources. In other words, if we want to detect a 10% difference in yield, then we don’t need enough replications to detect a 3% yield increase. Preliminary data are critical to accurately estimate how many replications are needed. In the absence of preliminary data, we can consider how many replications are typically used for different types of experiments and the magnitude of the treatment differences that were determined to be significant. For experiments involving whole apple trees, about 8 to 10 single-tree replications are usually adequate to find meaningful treatment differences. Peach trees tend to be more variable than apple trees, and I usually try to use 15 to 20 peach trees per treatment. If the experimental unit is a 3- or 5-tree plot, then replication can usually be reduced to 4 to 6 per treatment.

I have spent much of my career trying to determine appropriate sample sizes for various types of experiments.

- To detect a 10% difference in yield of apple trees, we usually need at least 8 trees per treatment.
- To detect a 10% difference in peach tree yield, we need 20 trees per treatment.
- To detect 10% differences in strawberry yields, we need 8 plants per treatment.
- To detect a 10% difference in peach fruit size, we need to weigh 16 fruit from each of 4 trees harvested on 3 dates (most of the variation is fruit-to-fruit within a tree, so we need many fruit per tree but few trees per treatment).
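The trade-off between replication and detectable difference can be explored with a small simulation. This sketch is not based on the author's data: it assumes a hypothetical 15% tree-to-tree coefficient of variation and uses a simple z-test as a rough approximation (a proper power analysis would use the t distribution):

```python
import random
from statistics import mean, stdev

def simulated_power(n, pct_diff=0.10, cv=0.15, sims=400, seed=7):
    """Estimate power: the fraction of simulated experiments in which
    a two-sample z-test (|z| > 1.96) detects a pct_diff yield
    difference with n replicate trees per treatment. cv is the
    assumed tree-to-tree coefficient of variation."""
    rng = random.Random(seed)
    base, sd = 100.0, 100.0 * cv
    detected = 0
    for _ in range(sims):
        a = [rng.gauss(base, sd) for _ in range(n)]
        b = [rng.gauss(base * (1 + pct_diff), sd) for _ in range(n)]
        se = (stdev(a) ** 2 / n + stdev(b) ** 2 / n) ** 0.5
        if abs(mean(b) - mean(a)) / se > 1.96:
            detected += 1
    return detected / sims

p_small = simulated_power(n=4)    # few replicates: low power
p_large = simulated_power(n=16)   # more replicates: higher power
```

Running this shows power climbing as replication increases, which is why the rough rules of thumb above differ by crop: the more variable the experimental units (peaches vs. apples, single trees vs. multi-tree plots), the more replicates are needed to detect the same percentage difference.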

**Data interpretation**

After collecting data from each experimental unit, we need to summarize the data so we can compare treatments. We usually calculate the mean or average for each treatment. Means are calculated by summing the data for a given treatment and dividing the sum by the number of experimental units. However, we need to look at the numbers that go into those means. For example, one very large or very small number for a treatment can have a large influence on the mean. Such values are sometimes called outliers, so we need to make sure that those values are reliable. Sometimes the number was recorded incorrectly; other times there is an explanation for the unusual value.

When I see an unusual value, I go out to the orchard and look at the tree to see if I can explain it. For example, in one experiment peach fruit size for one tree was quite a bit smaller than for the other trees. When I looked at the tree in early September, the foliage was going off-color early, and upon inspecting the trunk at the soil line I found that the trunk had been girdled. So I eliminated that tree from the data set. If no reasonable explanation can be found for unusual values, I prefer to retain those values in the data set.
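A simple way to find values worth a trip to the orchard is to flag any observation far from its treatment mean. This sketch, with hypothetical fruit weights, flags anything more than 2 standard deviations away; note that it only flags values for inspection, it does not delete them:

```python
from statistics import mean, stdev

def flag_outliers(obs, k=2.0):
    """Return observations more than k standard deviations from the
    treatment mean, for manual checking (not automatic removal)."""
    m, s = mean(obs), stdev(obs)
    return [x for x in obs if abs(x - m) > k * s]

# Hypothetical mean fruit weights (g), one tree per value; one tree
# (95 g) stands out, like the girdled tree described above.
weights = [152, 148, 155, 150, 149, 95, 153, 151]
suspect = flag_outliers(weights)   # -> [95]
```

Whether a flagged value is then removed should depend on what is found in the field, not on the number alone.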

Researchers use various statistical techniques to determine if treatment means are significantly different. Significance actually refers to the probability of declaring that two or more treatments are different when the observed differences are not due to the treatments. The 5% probability level is most commonly used. This means that there is a 5% chance of declaring that treatments differ when in fact they do not. This is a fairly conservative test. I think most growers would be willing to accept a 15 or 20% risk of mistakenly declaring that two treatments are different.
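One simple, assumption-light way to obtain such a probability is a permutation test: repeatedly relabel the observations at random and see how often chance alone produces a difference as large as the one observed. The yield numbers here are hypothetical:

```python
import random
from statistics import mean

def perm_test(a, b, n_perm=2000, seed=3):
    """Two-sample permutation test: the p-value is the fraction of
    random relabelings whose absolute mean difference is at least as
    large as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical yields (kg/tree) for treated vs. untreated trees.
treated = [62, 58, 65, 60, 59, 63, 61, 57]
control = [55, 52, 58, 54, 50, 56, 53, 51]
p = perm_test(treated, control)
```

The resulting p-value can then be compared to whatever risk level is acceptable: 0.05 for the conventional conservative test, or 0.15 to 0.20 if, as suggested above, a grower is willing to accept more risk of a false positive in exchange for catching more real differences.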

**Final thoughts concerning significance**

Performing statistical tests is beyond the scope of this article, but I like to consider 3 types of significance. I have already discussed statistical significance, which tells us how sure we are that treatments are actually different.

- If we use very large numbers of replications, we will determine that very small differences are statistically significant. But this tells us nothing about how important the difference is.
- A second type of significance is “biological significance”. In other words, is the difference between two treatments biologically important? Apple trees on M.9 EMLA rootstocks may be 10 ft tall and trees on M.9 Pajam 1 may be 11 ft tall. This 1 ft difference may be statistically significant, but is it biologically important?
- The third type of significance is “economic significance.” This is probably the most important type of significance for fruit growers to consider and I will provide two examples.

- When I was in grad school I was trying to identify the critical freezing temperature that would decrease strawberry yields. I froze plants at different temperatures, grew them out and recorded yield. Plants that were exposed to minus 8°F were highly variable: some plants had a full crop and some had no fruit at all. There was so much variation that a 50% yield difference was not significant at the 5% level. However, a 50% yield reduction would have a large effect on profitability.
- I once saw a presentation on increasing the typiness of Delicious apple fruit by applying a PGR product. The researcher reported that the average length:diameter ratio was 0.93 for the treated fruit and only 0.91 for the non-treated fruit, and the difference was significant. When I asked him if he could distinguish between boxes of treated and non-treated fruit, he said he could not tell which fruit were treated. When I asked him if a buyer would be willing to pay more for the treated fruit, his reply was “probably not.” This is an example of a statistical difference that is not associated with an economic difference.

So when evaluating research results we have to look beyond the statistical significance. We have to consider how the experiment was performed. Were treatments adequately replicated and randomized to give us confidence in the data? And are the treatment differences of economic importance?

– *Richard Marini, Penn State University*