Part II
Selecting Samples
 

 For the averages you calculate from your samples to be as reliable as you need, certain procedures must be followed in selecting those samples. This step comes after you complete your study design (the preceding chapter) but before you start sorting through the samples (the next chapter).
 

 

This chapter on selecting samples shows you how to [A] determine the number of samples needed, [B] make sure each is of sufficient weight, and [C] pick those samples randomly.
 

[A] How many total samples are needed?
 

One of the major hurdles in sampling an area's waste stream is the cost of sorting. It helps to know approximately how many samples to sort before you get too far along, so that you don't wind up sorting more samples than you need.
 

How many samples you will need to sort to get an estimate that can be relied upon to the degree desired depends on how confident you want to be. Or, in statistical terms: what confidence interval and confidence level would you like to achieve? The interval describes how wide a band of uncertainty surrounds your sampling estimate (e.g. plus or minus 10%); the level, how probable it is that the true value lies within that band (e.g. 90% probability). (For a detailed explanation of the two terms, see the Stat / facts Primer, at p. 89.)
 

Essentially, the answer will depend upon how much the proportion of each material varies from sample to sample, with greater variation requiring more samples. The answer also depends on how small a fraction of the samples a material makes up: of two materials whose proportions vary by similar amounts, one that averages 2% will tend to require more samples than one that averages 20%. Thus, for the desired reliability, the answer will vary from one material to the next: aluminum might require 45 samples, while food waste requires 15.
 

This chapter offers two ways to estimate the number of samples needed: [1] generic estimates from standard tables or [2] the formula and techniques for doing your own calculations.
 

[1] Standard Tables To Estimate Sample Size
 

For a quick estimate, you can use Table 4, which provides a general guide to the number of samples needed at the typical 90% confidence level. The table covers the different confidence interval widths you might want: ±5%, ±10%, ±20% and ±30%. The number of samples is expressed as a range to reflect the fact that different waste streams exhibit varying characteristics. Keep in mind that these figures come from standard industry sources based on very old data and, we believe, may overstate the number of samples needed in many cases.

Table 4
Estimated Number of Samples to Achieve Different Confidence Intervals
at 90% Confidence Level [11]

To Achieve a ±5% Confidence Interval
              Residential   Commercial   Consolidated
Newsprint     224-2397      698-3563     512-663
Cardboard     899-1955      533-997      1060-2573
Aluminum      275-1437      764-4399     430-704
Ferrous       194-554       552-3411     275-1331
Glass         146-619       596-2002     262-937
Plastic       261-1100      422-783      200-954
Organics      12-47         26-92        19-65

To Achieve a ±10% Confidence Interval
              Residential   Commercial   Consolidated
Newsprint     58-600        175-891      128-166
Cardboard     225-489       134-250      265-644
Aluminum      70-360        191-1100     110-176
Ferrous       50-139        138-853      70-333
Glass         39-155        149-501      67-235
Plastic       67-275        107-196      52-239
Organics      5-14          8-25         7-18

To Achieve a ±20% Confidence Interval
              Residential   Commercial   Consolidated
Newsprint     16-150        48-223       34-43
Cardboard     58-123        35-64        68-161
Aluminum      19-92         50-275       29-46
Ferrous       14-37         36-214       19-85
Glass         19-61         39-126       19-61
Plastic       18-70         28-51        15-61
Organics      3-5           4-8          4-6

To Achieve a ±30% Confidence Interval
              Residential   Commercial   Consolidated
Newsprint     9-68          21-101       16-21
Cardboard     27-56         17-30        31-73
Aluminum      10-42         23-123       14-22
Ferrous       8-18          17-97        10-39
Glass         6-19          19-58        10-28
Plastic       10-32         14-24        8-28
Organics      3-4           3-5          3-4
 
 
 

[2] Formula for Calculating Sample Size
 

For more realistic numbers for your situation, ones that may not be nearly as large, you can do your own calculations. To do that, you'll need a preliminary idea of how much variation the materials of major interest exhibit in the particular kind of waste stream you're sampling. Variability here means how much the individual samples vary around the average of those samples. This preliminary estimate of the variability of materials in your area may be based on:
 

An earlier sample in that area; or
 

A sampling conducted in a similar community; or

The first half dozen to a dozen samples of your own sort.
 

Be aware, though, that the number crunching involved is not for the "statistically-disadvantaged," and those disinclined to try may want to skip the rest of this section.
 

[note: If you don't want to handle the statistics but do want to enjoy the fruits that statistics can produce, you may want to consider purchasing the WasteSort software package, which handles all of those calculations for you automatically. An order form is contained on the last page of this GUIDE.]
 

The formula for sample size relies on two quantities that are used repeatedly in this and later calculations: [a] the standard deviation and its close relative, the coefficient of variation, and [b] t-values. These are then incorporated into [c] the sample size formula. The mechanics of using these terms are explained very simply below. (An explanation of what they mean is found in the Stat / facts Primer, at p. 85.)
 

[a] Standard Deviations
 

As with much of the rest of the statistics used in waste sorts, the calculations are typically a function of the amount of variability in your samples. The measuring stick for that variability in waste sort equations is the standard deviation, which simply quantifies how much the individual samples vary from the average of all the samples. Equation 1 is the particular form of the formula used to calculate the standard deviation for sampling, the so-called unbiased form.
 

As discussed in the next section (see p. 36), it will be handy to convert this standard deviation into something called the coefficient of variation when it is later used to compute sample size. See Equation 2 for the formula for that coefficient.
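For reference, the standard textbook forms of the two formulas just described are written out below. Treat them as a reconstruction consistent with the worked example that follows rather than a reproduction of Equations 1 and 2; the symbols are ours: $x_i$ is a material's share of sample $i$, $\bar{x}$ the average over the $n$ samples, $s$ the unbiased standard deviation, and $CV$ the coefficient of variation.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}}{n - 1}}
\qquad\qquad
CV = \frac{s}{\bar{x}}
```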

[example: ASSUMPTIONS. The cardboard in 5 samples is:
 
Sample No.    1    2    3    4    5
Cardboard     4%   3%   5%   6%   4%
CALCULATIONS. [1] Average. The sum of these 5 percentage values is 22%. To determine the average, divide that sum by the number of samples, 5, giving 4.4%, or 0.044. [2] Squared Deviations. To determine the squared deviations in the numerator of Equation 1, subtract the 0.044 average from the first sample's 4% (0.04), giving -0.004, and then square that difference, giving 0.000016. Repeat this subtract-and-square procedure for the other 4 samples, which should give 0.000196, 0.000036, 0.000256 and 0.000016. The sum of these squared deviations is 0.00052. [3] Divide by Samples. Divide that sum of squares by the number of samples minus one (the denominator of Equation 1), or 4, giving 0.00013. The square root of 0.00013, or 0.0114, is the standard deviation. Divide 0.0114 by the 0.044 average, giving 25.91%, and that is the coefficient of variation.]
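The same arithmetic can be checked with Python's standard library. This is a sketch, not part of the original procedure; it relies on `statistics.stdev`, which uses the unbiased (n - 1) form the example calls for.

```python
import statistics

# Cardboard share of each of the five samples, as fractions
cardboard = [0.04, 0.03, 0.05, 0.06, 0.04]

mean = statistics.mean(cardboard)   # average share
sd = statistics.stdev(cardboard)    # unbiased standard deviation (divides by n - 1)
cv = sd / mean                      # coefficient of variation

print(f"average {mean:.3f}, standard deviation {sd:.4f}, CV {cv:.2%}")
# average 0.044, standard deviation 0.0114, CV 25.91%
```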
 

[caution: If you're using a spreadsheet to compute standard deviations, be aware of a possible glitch. Many spreadsheets default to a different version of this parameter, referred to as either the "biased" or the "population" standard deviation; check the program's manual to be sure. If it is the biased form, some programs have an option for the unbiased form, which is what is required here. If yours does not, you can jury-rig a correction by multiplying the biased standard deviation by the square root of n/(n - 1), where n is the number of samples: $s_{\text{unbiased}} = s_{\text{biased}} \times \sqrt{n/(n-1)}$.]
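Python's `statistics` module happens to offer both versions, `pstdev` (the biased, population form) and `stdev` (the unbiased, sample form), so it is easy to confirm that multiplying the biased value by √(n/(n - 1)) recovers the unbiased one. The figures are the cardboard shares from the example above; this is a sketch of the correction, not a feature of any particular spreadsheet.

```python
import math
import statistics

cardboard = [0.04, 0.03, 0.05, 0.06, 0.04]
n = len(cardboard)

biased = statistics.pstdev(cardboard)        # "population" form: divides by n
corrected = biased * math.sqrt(n / (n - 1))  # convert to the unbiased form
unbiased = statistics.stdev(cardboard)       # "sample" form: divides by n - 1

print(round(biased, 4), round(corrected, 4), round(unbiased, 4))
# 0.0102 0.0114 0.0114
```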


 
 

[b] t-Value
 

The t-value is simply a multiplier used in many statistical formulas when the number of samples involved is small. It relates the amount of variability among the averages for a material from a number of different samplings to the probability that the estimate is correct. This is what is being done when either the number of samples needed or the confidence bands are calculated. You can find the appropriate t-value in a table called Student's t-table, which is reproduced in simplified form in the appendix, at p. 101. To learn how to use t-tables, see the sidebar on the page following the next.

[c] Sample Size
 

(i) When Not Consolidating Strata
 

Equation 3 is used to calculate the number of samples needed to achieve the desired level of certainty around the estimated average. It applies either to an unstratified sample or to an individual stratum in a stratified sample.
 

Note that the number of samples needed will differ from material to material, because each material varies by a different amount. In practice, the calculation is often done just for the material of greatest interest. Alternatively, it may be done for each of several materials of equal importance, and the average of the resulting numbers of samples is then used.

[note: An example follows later on p. 39.]

You define the desired certainty in two places in Equation 3: first, as the confidence interval in the denominator; and second, as the confidence level used in determining the t-value in that equation.
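Putting those two places together, Equation 3 can be written as below. This is a reconstruction from the surrounding description (the t-value times the coefficient of variation in the numerator, the confidence interval in the denominator, and the ratio squared, as the worked example's remark about squaring implies); the symbols are ours: $t$ is the t-value, $CV$ the coefficient of variation, and $CI$ the desired confidence interval as a fraction of the average. With $t = 1.645$, $CV = 0.2591$ and $CI = 0.10$, it gives the 18.17 of the first iteration in the worked example.

```latex
n = \left( \frac{t \cdot CV}{CI} \right)^{2}
```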
 

There are two wrinkles when using this formula. One is the particular units in which you define the desired confidence interval.
 

The way many people use the term "confidence interval" is as a percent of the average, e.g. ±10% of the average for paper. Used in this way, if the average for a material were 5.0%, the true value would be expected to lie between 4.5 and 5.5 percent: 10% of 5.0% is 0.5%, which is added to and subtracted from the 5.0% average.

However, were you to use the traditional form of this formula for sample size, the numerator would multiply the t-value by the standard deviation, not by its close relative, the coefficient of variation, shown in Equation 3. And with the standard deviation in the numerator, the confidence interval in the denominator would have to be expressed as the actual width of the uncertainty band around the mean, in the same units as the average, instead of as a percent of the average.
 

To illustrate, return to the material that averages 5% with a ±10% confidence interval. If the standard deviation were used in Equation 3, the confidence interval would be expressed as the actual half-width of the band, 0.005 (i.e. 0.05 x 0.10 = 0.005). If the coefficient of variation were used, the confidence interval would be 10% of the average, or 0.10 (i.e. 0.005/0.05).
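The two ways of stating the interval give the same number of samples, provided each is paired with the matching measure of variation. The sketch below checks this numerically for the 5% ±10% material, assuming Equation 3 has the form n = (t × CV / CI)². The 1.3% standard deviation and the 90% t-value of 1.645 are illustrative assumptions, and the function names are hypothetical.

```python
import math


def samples_from_cv(t, cv, relative_ci):
    """Sample size with the interval stated as a fraction of the average."""
    return (t * cv / relative_ci) ** 2


def samples_from_sd(t, sd, absolute_ci):
    """Traditional form: standard deviation over the band's actual half-width."""
    return (t * sd / absolute_ci) ** 2


mean = 0.05                       # material averages 5% of each sample
sd = 0.013                        # hypothetical standard deviation, for illustration
relative_ci = 0.10                # +/-10% of the average
absolute_ci = mean * relative_ci  # 0.005, the actual half-width of the band

n_relative = samples_from_cv(1.645, sd / mean, relative_ci)
n_absolute = samples_from_sd(1.645, sd, absolute_ci)
print(round(n_relative, 2), round(n_absolute, 2))
```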
 

That is why Equation 3 uses the coefficient of variation instead of the more commonly stated standard deviation: to produce an answer in the units most people understand, a percent of the mean. To recap, the standard deviation is converted to the coefficient of variation by dividing the standard deviation by its average (see Equation 3 on p. 35).
 

A second complication is that the t-value on the right side of the equation is determined, in part, by the number of samples on the left side of the equal sign. (Remember that the t-value is a function of the confidence level and of the number of samples being considered or, more precisely, the degrees of freedom.)
 

That is to say, the number of samples on the left of the equation and the t-value on the right each depend on the other. To resolve this kind of problem, it is necessary to use a process called iteration. If you don't, the formula will usually underestimate the number of samples actually required to achieve the desired reliability.
 

Here, iteration involves successive calculations of Equation 3 in which the coefficient of variation and the desired confidence interval remain constant, while the t-value changes with each computation. The degrees of freedom used in determining the t-value will depend on the number of samples determined from the prior calculation.
 

The first time you calculate Equation 3, use as the t-value the number at the intersection of the column for the desired confidence level and the bottom row for the degrees of freedom (see the Using t-tables sidebar on the preceding page for an explanation). That bottom row is for any large sample, represented by the "∞" symbol, and is used initially because it gives the smallest possible number of samples, a lower bound. Then solve for the number of samples.
 

The second time you calculate Equation 3, substitute for the t-value the number at the intersection of the same confidence-level column and, unlike before, the row for the degrees of freedom: the number of samples computed in the prior iteration (rounded up), minus 1. Continue this process until the calculated number of samples from successive iterations converges.
 

[example: ASSUMPTIONS. Let's use the same assumptions as in the example for the standard deviation (see p. 33). The standard deviation was 1.14% and the average was 4.4%. Further assume that the desired confidence interval is ±10% of the average and the desired confidence level is 90% (although, later, you will make your own determination of the desired certainty). CALCULATIONS. [a] Convert to Coefficient of Variation. To convert that actual value of the variation into a percent of its average, divide the standard deviation by the average, 4.4%. The standard deviation as a percent of its mean becomes 25.91%. [b] Confidence Interval. Even though the ±10% confidence interval is plus and minus the average, only the positive percent is needed in the equation, because it will be squared, and a negative number becomes positive when it is squared. [c] Iterate. The initial iteration uses a t-value of 1.645 (from the intersection of the 90% confidence level column and the ∞ row). Plugging these numbers into Equation 3 produces an answer of 18.17 for the first iteration.

To ensure the desired reliability is achieved, round UP, here to 19. The second iteration uses the t-value for the previous calculation's 19 samples minus 1, or 18 degrees of freedom (1.734). That produces a sample size of 20.19. Round that up to 21, subtract 1, and find the t-value for 20 degrees of freedom, which is 1.725. Running Equation 3 again, the sample size for the third iteration becomes 19.98. On the fourth iteration (20 samples, so 19 degrees of freedom and a t-value of 1.729), Equation 3 produces 20.07. The fifth iteration, in turn, is 19.98, and the sixth, 20.07. Thus, the process settles on about 20, alternating between 19.98 and 20.07. Because 20 samples call for 20.07, slightly more than 20, rounding UP to 21 is the choice that ensures the desired reliability is achieved.]
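Here is a minimal sketch of that iteration in Python. It assumes Equation 3 has the form n = (t × CV / CI)², with CI the confidence interval as a fraction of the average, and it copies the 90% column of a standard Student's t-table (check it against the appendix table). The names `t_value` and `required_samples` are hypothetical. When successive rounds alternate between two values, as the 20 and 21 of the example do, the sketch returns the larger one, a conservative reading of the round-UP rule.

```python
import math

# 90% confidence level (two-sided; one-tailed 0.05) column of a standard
# Student's t-table. Check these against the appendix table before relying on them.
T90 = {
    1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943, 7: 1.895,
    8: 1.860, 9: 1.833, 10: 1.812, 11: 1.796, 12: 1.782, 13: 1.771,
    14: 1.761, 15: 1.753, 16: 1.746, 17: 1.740, 18: 1.734, 19: 1.729,
    20: 1.725, 21: 1.721, 22: 1.717, 23: 1.714, 24: 1.711, 25: 1.708,
    26: 1.706, 27: 1.703, 28: 1.701, 29: 1.699, 30: 1.697, 40: 1.684,
    60: 1.671, 120: 1.658, math.inf: 1.645,
}


def t_value(df):
    """Return t for df degrees of freedom. Between tabulated rows, use the row
    with fewer degrees of freedom: its larger t-value is the conservative one."""
    if df < 1:
        raise ValueError("at least 2 samples (1 degree of freedom) are needed")
    return T90[max(row for row in T90 if row <= df)]


def required_samples(cv, relative_ci, max_rounds=100):
    """Iterate n = (t * cv / relative_ci) ** 2, rounding UP each round.
    cv and relative_ci are fractions (0.2591 for 25.91%, 0.10 for +/-10%)."""
    # First round: the infinity row gives the lower bound.
    n = max(2, math.ceil((t_value(math.inf) * cv / relative_ci) ** 2))
    history = [n]
    for _ in range(max_rounds):
        # Later rounds: degrees of freedom = previous sample count minus 1.
        nxt = max(2, math.ceil((t_value(n - 1) * cv / relative_ci) ** 2))
        if nxt == n:
            return n  # converged on a single value
        if nxt in history:
            # Rounds alternate between values: keep the largest in the cycle.
            return max(history[history.index(nxt):])
        history.append(nxt)
        n = nxt
    return n


print(required_samples(0.2591, 0.10))  # 21 (rounds go 19, 21, 20, 21, ...)
```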
 
 
 
 

(ii) When Consolidating Strata/Seasons
 

If your study design uses different strata, it is possible to compute the required sample size for the strata consolidated together as a whole. This procedure can often result in the need for fewer samples for the whole than would be computed if it were not stratified. The formula for sample size from stratified samples uses advanced techniques called pooling. See Equation 4 which uses the example of a waste stream stratified into residential and commercial strata.

[note: This formula only applies if you allocate the number of samples needed between strata optimally, as opposed to proportionally or otherwise (as described in the following section with reference to Equation 5; see p. 43).]
 
 

[example: ASSUMPTIONS. You want to know the number of samples needed to achieve a ±20% confidence interval (as a percent of the mean) and confidence level of 90% for corrugated cardboard in the consolidated residential and commercial strata of a waste stream. Their characteristics are:
 
              Average   Std. Dev.   Proportion
Residential   1.5%      1.0%        40%
Commercial    5.0%      6.0%        60%
 

CALCULATIONS. [a] Weighted Coefficient of Variation. The numerator of the bracketed part of the equation, then, would be 0.987.


 

[b] Confidence Interval. The denominator inside the brackets would be 20% (the desired uncertainty band in this example). [c] Iterate. As before, the t-value for the initial iteration would be 1.645, so the first calculated sample size is 65.90. For the second iteration, the degrees of freedom derived from the prior iteration's number of samples would be 65 (i.e. 66 - 1) if the number is rounded UP for conservatism. A degrees of freedom of 65 falls between the 60 and 120 rows shown on the table; here the t-value is taken from the 120 row, or 1.658. (Taking it from the 60 row, 1.671, would be the more conservative choice, because t-values grow as the degrees of freedom shrink; it would raise the result to about 68.) Solving the second iteration gives 66.9, or, rounded up, 67. The third iteration also gives 66.9, because it repeats the same t-value. Thus, the number of samples needed is 67.]
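A short sketch of this calculation follows. The form of Equation 4 isn't shown here, so the sketch assumes its bracketed numerator is each stratum's coefficient of variation weighted by that stratum's share of the waste stream; that assumption reproduces the 0.987 above (0.40 × 1.0/1.5 + 0.60 × 6.0/5.0). Because it carries the unrounded 0.98667, its first iteration gives 65.86 rather than 65.90, and because it takes the more conservative 60-row t-value for 65 degrees of freedom, it ends at 68 rather than 67. The function name is hypothetical.

```python
import math


def weighted_cv(strata):
    """ASSUMED form of Equation 4's numerator: each stratum's coefficient of
    variation (std. dev. / average), weighted by its share of the waste stream."""
    return sum(share * sd / avg for avg, sd, share in strata)


# (average, standard deviation, share of waste stream) for cardboard
strata = [
    (0.015, 0.010, 0.40),  # residential
    (0.050, 0.060, 0.60),  # commercial
]
relative_ci = 0.20          # +/-20% of the mean
T_INF, T_60 = 1.645, 1.671  # 90% t-values: infinity row and 60 row

cv = weighted_cv(strata)
n0 = (T_INF * cv / relative_ci) ** 2  # first iteration
df = math.ceil(n0) - 1                # 65, between the 60 and 120 rows
assert 60 <= df < 120                 # so the conservative 60-row t applies
n = math.ceil((T_60 * cv / relative_ci) ** 2)
# The next round (n - 1 = 67 degrees of freedom) uses the same row, so n is final.
print(round(cv, 3), round(n0, 2), n)  # 0.987 65.86 68
```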
 
 

[3] How many samples in each stratum?
 

If you have stratified the waste stream, such as between residential and commercial waste, you will need to allocate the total number of samples required among those groupings. For example, of 50 total samples, 30 might come from the commercial stratum and 20 from the residential stratum. There are two standard procedures for allocating samples to strata: [1] optimal allocation, which maximizes reliability, and [2] proportional allocation, which eases computations.
 

[1] Allocating samples to strata optimally
 

The optimal way to allocate a specified number of samples to strata is to do so in a way that produces the narrowest uncertainty band around the averages you estimate from the consolidated waste stream. This is also the way to produce the desired confidence with the fewest samples.
 

Optimization is different from having the proportion of samples in each stratum mirror your estimate of that stratum's proportion of the population. Usually, it is better to put more samples in the strata with the greatest variability. If both strata exhibit similar variation, however, the extra calculation will not be necessary, because the optimal and proportional allocations will then be nearly the same.
 

Equation 5 shows how optimization can be done when you are deciding how many of the total samples needed should come, for example, from the residential stratum relative to the commercial stratum. The allocation will differ for each material, so it is simplest to use this procedure for just the key material.
 

[example: ASSUMPTIONS. The residential waste stream is 40% of the entire waste stream, and commercial, 60%. Further assume that the standard deviation from Equation 1 for one of the primary materials is 1.2% in the residential stratum and 1.8% in the commercial stratum. Also, the total number of samples being allocated to strata is assumed to be 50. CALCULATION. The portion of those 50 samples allocated optimally to the residential stratum would be 15 (rounded from 15.4), leaving 35 for the commercial stratum.]


 
 
 
 

There is an important lesson here that runs counter to common practice. The optimization principle means that it will usually be more efficient to take more samples from the commercial sector than from the residential sector, rather than the other way around. Provided the commercial stratum has greater variability than the residential, weighting the samples toward the commercial stratum will usually achieve the same reliability for the combined strata with fewer total samples.
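The form of Equation 5 isn't shown here. The sketch below assumes it is the standard optimum (Neyman) allocation, in which each stratum's share of the samples is proportional to its share of the waste stream times its standard deviation; that assumption reproduces the 15.4 residential samples in the example above. The function name is hypothetical.

```python
def allocate_optimally(total_samples, strata):
    """Split samples in proportion to (share of waste stream x std. dev.)."""
    products = {name: share * sd for name, (share, sd) in strata.items()}
    scale = sum(products.values())
    return {name: total_samples * p / scale for name, p in products.items()}


allocation = allocate_optimally(
    50, {"residential": (0.40, 0.012), "commercial": (0.60, 0.018)}
)
for name, n in allocation.items():
    print(f"{name}: {n:.1f}")  # residential: 15.4, commercial: 34.6
```

A proportional split of the same 50 samples would give 20 and 30; the optimum shifts samples toward the more variable commercial stratum.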
 

[2] Allocating samples to strata proportionately

Another way to allocate the total samples among strata is to do so in the same proportion that each stratum bears to the total waste stream, per Equation 6. Here again, the formula is illustrated for the residential stratum.
 

[example: ASSUMPTIONS. In the last year, the residential sector is estimated to have discarded 333,000 tons, and the commercial, 667,000 tons, at the landfill where the sort is planned. You're planning to take 50 samples based upon the sample size formula or other considerations. CALCULATIONS. Proportional allocation gives the residential stratum 50 x 333,000 / 1,000,000 = 16.65 samples and the commercial stratum 50 x 667,000 / 1,000,000 = 33.35 samples. As shown, the calculated number for each stratum includes a fraction.


 
 
 

To ensure attaining the desired reliability, you will need to round UP and take 17 residential and 34 commercial samples. If retaining the original number of samples is more important, round each to the closer number (17 residential and 33 commercial).]
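A minimal sketch of the proportional split, assuming Equation 6 simply multiplies the total number of samples by each stratum's share of the tonnage (which reproduces the fractional results in the example). The function name is hypothetical.

```python
import math


def allocate_proportionally(total_samples, tonnage):
    """Split samples in proportion to each stratum's share of total tonnage."""
    whole = sum(tonnage.values())
    return {name: total_samples * tons / whole for name, tons in tonnage.items()}


shares = allocate_proportionally(50, {"residential": 333_000, "commercial": 667_000})
conservative = {name: math.ceil(n) for name, n in shares.items()}  # round UP: 17 and 34
keep_total = {name: round(n) for name, n in shares.items()}        # nearest: 17 and 33
print(conservative, keep_total)
```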

This procedure will make it unnecessary to use weights later when calculating the statistics, because the weights will be implicit in the number of samples taken. However, unless the strata have similar variability, the optimization approach will require less sampling to be done.
 

[B] How many pounds should each sample be?

Once the number of samples needed has been determined, you'll need to know how many pounds each sample should be. The general rule is that samples should weigh between 200 and 300 pounds. Below that size, the variability between samples increases to the point where you'll need more samples to maintain the same level of reliability. Above 200-300 pounds, the gains in reliability fall off rapidly and are not worth the additional cost of sorting through the larger quantity of trash.
 

[C] How do I get samples that are random?

After determining how many samples are needed, the next job is to insure that the samples you pick are randomly selected. Randomization of sample selection, remember, is essential to make the maze of statistical calculations work.
 

Waste sorts are typically done either [1] where the material is disposed of at the landfill (or incinerator), a waste destination study, or [2] where it is set out by the household at the curb for collection, a waste source study.
 

[1] Waste destination studies
 

Most waste sorts are done at the disposal site or transfer station, the so-called waste destination study. Here, random sampling becomes a two-stage operation: [a] choosing the trucks from which to take samples and [b] choosing the samples from within those trucks.
 

[a] Trucks
 

The first task is to randomly select the trucks, from which, in the next step, you'll pull your actual samples. Your study design should have previously established whether the trucks are to be picked from strata (see p. 12) and from which hours or days they should be selected (see p. 19).
 

To do all this, follow these three steps:
 

Total Number of Trucks. Estimate the total number of trucks in each stratum (e.g. residential and commercial loads) for each day on which sorting is scheduled (e.g. Monday, Tuesday...Friday). This can be done by interviewing the haulers in your area, if they are known, by reviewing customer records at the disposal facility, or by observing the names stenciled on incoming trucks at the disposal facility's weigh station. UNDERestimate the number of trucks you expect, to guard against the possibility of too few trucks showing up on the day you sort.
 

Needed Number of Trucks. Divide the total number of samples for each stratum, as well as for each day's sorting, by the maximum number of samples that will be pulled from each truck (see p. 27 for the procedure to determine total sample size and p. 42 for the procedure to allocate the total number of samples by strata). Typically 2-4 samples are pulled from each truck. As a rule of thumb for choosing within that range: if the composition of loads varies greatly between trucks, tend toward the lower number per truck; if sorting cost is a paramount factor, tend toward the higher number. If the resulting number of trucks needed includes a fraction, round UP to ensure that you have enough trucks; in that case, fewer than the maximum number of samples will be pulled from one of the trucks.
 

Generate Random Numbers. Use the procedure for random number selection described in the sidebar on page 52 to generate a list of random numbers for each stratum and sorting day. The random numbers can then be used to count off trucks as they arrive at the disposal site, selecting those whose order of arrival matches a random number. Alternatively, the random numbers can be used to pre-select from the list of trucks scheduled to unload on each sorting day. But to guard against last-minute changes, you must check on the day of the sort to make sure no changes have been made in the haulers' schedules.
 

[note: Tear-out worksheets to use in the field to randomly select trucks are included in the appendix. See p. 121.]
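The arithmetic of the second step can be sketched as follows, using the figures from the example below (20 residential and 30 commercial samples, at most 2 samples per truck, three sort days). The function names are hypothetical, and sending any leftover trucks to the last days is one option, an assumption rather than a rule.

```python
import math


def trucks_needed(samples, max_per_truck):
    """Round UP so there are always enough trucks."""
    return math.ceil(samples / max_per_truck)


def trucks_per_day(trucks, days):
    """Spread trucks evenly over the sort days. ASSUMPTION: any leftover
    trucks go to the last days, when sorters have gained experience."""
    base, extra = divmod(trucks, days)
    return [base + (1 if day >= days - extra else 0) for day in range(days)]


residential = trucks_needed(20, 2)  # 10 trucks
commercial = trucks_needed(30, 2)   # 15 trucks
print(trucks_per_day(residential, 3), trucks_per_day(commercial, 3))
# [3, 3, 4] [5, 5, 5]
```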
 
 

[example: [1] Total number of trucks. ASSUMPTION. If you will be sorting loads on Monday, Tuesday and Wednesday from the residential and commercial sectors, your review of landfill records might show something like the following estimates of the minimum number of trucks expected each day:
 
              Total Trucks
              Mon.   Tues.   Wed.
Residential   43     31      39
Commercial    58     43      45
 

[2] Number of trucks needed. ASSUMPTION. The total number of residential samples needed for this season's sort is 20 and, commercial, 30, and the maximum number of samples per truck is 2. CALCULATION. The 20 residential samples will be pulled two from each truck, or from 10 trucks total. Dividing the 10 needed residential trucks (and the 15 commercial trucks) among the three days of the sort would look like this:
 
              Number of Trucks Needed
              Mon.   Tues.   Wed.
Residential   3      3       4*
Commercial    5      5       5
 

* When the number of trucks does not divide evenly among the days, allocate the extra trucks to meet logistical needs. Here the extra truck went to the last day, when your sorters will have become more experienced and can sort faster.
 

[3] Random number generation. Here is an abbreviated section from the full random number table in the appendix (see p. 99) to use to randomly select, as an example, the 3 trucks needed from the residential stratum on Monday:
 
83483460  87865509  86011066  71703342  69967095 
35432442  83188631  51383215  62917750  46335727 
29055222  98578894  22901878  64718692  91965927 
 

There are two common procedures for random selection: random number generation and the "Nth" number method. Both begin with the random selection of the first number from the table by closing your eyes and jabbing at the chart.

Random selection of first number. Assume you randomly picked the sixth number, "98578894" (counting down each column in turn). Use its last two digits, because 43 trucks is a two-digit number: "94". Because this is above 43, reject it and go on. Reading down, move on to the next number, "86011066", whose last two digits, "66", also do not qualify because they are too large. Finally, the last two digits of the next number, "51383215", or "15", qualify.

Random number generation. If you are using the "random number generator" approach, record #15 as the first truck to be selected, and proceed down. The next number, "22901878", gives "78", which is too large, but the one after it, "71703342", gives "42", which fits. The following three do not. The two after that qualify, but they are both the same number, "27"; knock out the duplicate. That provides the three randomly selected numbers needed from the residential stratum on Monday. Ordered sequentially, they become 15, 27, 42. On Monday, you pull over the 15th, 27th and 42nd residential trucks that arrive at the landfill.

Nth method. Simply divide the 43 trucks expected to arrive on Monday by the 3 trucks needed, which is 14.3, or, rounded down, 14. Thus, pick that same 15th truck to begin numbering, add 14 to get the second, or 29, add 14 again to get the third, 43, at which point you would have the three needed.]
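Both procedures can be sketched in Python. The hypothetical `pick_from_table` mechanizes the table-reading steps described above (reading down each column in turn, using as many trailing digits as the truck count has, and skipping numbers that are too large, zero, or repeated); run on the excerpt above starting at the sixth number, it selects trucks 15, 27 and 42. The hypothetical `nth_method` wraps around past the last truck if the count runs over, which is our assumption rather than part of the text. In practice, a pseudo-random generator such as Python's `random.sample(range(1, 44), 3)` could stand in for the printed table.

```python
# Excerpt from the random number table shown above
EXCERPT = [
    "83483460 87865509 86011066 71703342 69967095",
    "35432442 83188631 51383215 62917750 46335727",
    "29055222 98578894 22901878 64718692 91965927",
]


def read_down_columns(rows):
    """List the table's numbers column by column, top to bottom."""
    grid = [row.split() for row in rows]
    return [grid[r][c] for c in range(len(grid[0])) for r in range(len(grid))]


def pick_from_table(rows, start_index, total_trucks, trucks_needed):
    """Random-number-table method: keep numbers 1..total_trucks, no repeats.
    May return fewer than trucks_needed if the excerpt runs out."""
    digits = len(str(total_trucks))
    chosen = []
    for entry in read_down_columns(rows)[start_index:]:
        value = int(entry[-digits:])
        if 1 <= value <= total_trucks and value not in chosen:
            chosen.append(value)
            if len(chosen) == trucks_needed:
                break
    return sorted(chosen)


def nth_method(start, total_trucks, trucks_needed):
    """Every Nth truck from a random start, N rounded down.
    ASSUMPTION: counting wraps around past the last truck."""
    step = total_trucks // trucks_needed
    return [(start - 1 + k * step) % total_trucks + 1 for k in range(trucks_needed)]


print(pick_from_table(EXCERPT, 5, 43, 3))  # [15, 27, 42]
print(nth_method(15, 43, 3))               # [15, 29, 43]
```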
 

[b] Samples from within trucks
 

Only 200-1,000 out of upwards of 20,000 pounds on a packer truck will be needed for sampling. It is extremely important to ensure that each part of the load on the truck has an equal chance of being selected as a sample. For one thing, loads inside a truck stratify: heavier materials and fines fall to the bottom and sometimes to the back. For another, most people would subconsciously be less eager to pull samples from parts of the load that are foul smelling.
 

Two approaches are typically used to prevent such biases: (i) coning and quartering and (ii) grid and pull. The grid approach uses strict random procedures and, for that reason, is usually considered superior to the cone and quarter approach, which does not.
 

This chapter will describe how to designate the parts of the waste stream to pull and sort for samples: the next chapter describes how to go about physically pulling them out (see p. 56).
 

(i) Coning and Quartering
 

The truck load of waste is tipped onto the floor and shoved into a pile by an end loader. A quarter of the pile is removed like a slice out of a pie and, in turn, shoved into a pile and quartered again...until the remaining pile is the size desired for a sample, usually 200-300 pounds (see p. 45). "Eyeballing" is usually used to select quarters instead of strict randomization. Another variant of this approach is to then thoroughly shake and mix the 200-300 pound sample in a tarp that is pinned corner to corner. On the assumption that the mixing has eliminated the variation among different parts of that sample, any smaller 50-60 pound subset of the original 200-300 pound sample is sometimes used.
 

(ii) Grid and Pull

A truck driver begins to dump, and then drives the vehicle forward in lurching movements as the load is pushed onto the tipping floor by the truck's push-out plate. This spreads the load as nearly as possible into the shape of a "bread loaf" about one yard high, 10 feet wide and 40 feet long. An end loader may be necessary to ensure that the height is leveled at 36"-40". Then, using stakes and twine or by eye, divide the bread loaf into approximately 25-50 "cells", each 3 feet by 3 feet; with a 3-foot depth, each cell is approximately one cubic yard and should weigh the desired 200-300 pounds (see p. 45). Next, sequentially number each cell's location on a sheet of paper on which you map the loaf. Because the load stratifies by weight, cells at the edge can have different characteristics than those at the center. To ensure that cells in the center have the same chance of selection as cells on the edge, use the random selection procedure in the sidebar on the next page.
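As an illustration only, the cell numbering and random pick can be sketched with Python's `random` module standing in for the random-number table of the sidebar. The loaf dimensions are the ones given above; ignoring the leftover strip narrower than a full cell is an assumption, and the function names are hypothetical. Because `random.sample` draws without replacement, no cell is chosen twice, and edge and center cells are equally likely.

```python
import random


def count_cells(width_ft=10, length_ft=40, cell_ft=3):
    """Number of whole cells that fit on the loaf (3 across x 13 along = 39).
    ASSUMPTION: any strip narrower than a full cell is ignored."""
    return (width_ft // cell_ft) * (length_ft // cell_ft)


def pick_cells(total_cells, samples, seed=None):
    """Choose distinct cell numbers 1..total_cells, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_cells + 1), samples))


cells = count_cells()
print(cells, pick_cells(cells, 3))  # e.g. 39 and three distinct cell numbers
```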
 

When you attempt to implement random selection in the real world on the tipping floor, often some item of waste, such as a 2" x 4" piece of lumber, may be partially in and partially out of the targeted cell or of the quarter of the cone. Does the sorter scoop it in or throw it out of the sample?
 

The best sampling technique that has been developed to cope with this, and still keep the samples random so that statistics will work, is to have a systematic and unbiased response.
 

For coning and quartering, decide by random choice at the outset, such as by the flip of a coin, that overhanging items on the right (or the left) side will be "in" and the other side "out" of the sample. For grids, designate two of the four faces of selected cells, such as north and east, as the "in" sides and the others will be the "outs".
 

Yet another real world complication is fondly referred to as the "dead horse" problem, namely an item that is wildly larger than any of the other items. In waste sorts, it might be a large sleeper sofa. Leave in such a large, rarely occurring item, and it will warp your averages, so something needs to be done. There are punctilious statisticians who violently disagree, but the simplest way to handle this problem is to arbitrarily kick the dead horse out of your sample. Otherwise, your sample results will probably be distorted and require all kinds of convoluted massaging for "outliers" to make any sense.
 

[note: Tear-out worksheets to use in the field to randomly select samples are included in the appendix. See p. 121.]
 

[2] Waste source studies
 

In waste source studies, samples are selected and collected directly at the source, the household or business. Ensuring that samples are taken from all types of waste generators in proportion to their share of the whole requires fancy stratification and, once selection has been determined, substantial expense to collect. For this reason, waste source studies are rarely done. However, they are often the only good way to obtain waste composition data specific to different socio-economic groups or types of generators, and to reach the materials in the stream before they have been crushed in the packer truck.