For the averages you calculate from your samples to be as reliable
as you need, certain procedures must first be followed in selecting your
samples. This step comes after you complete your study design (the preceding
chapter) but before you start sorting through those samples (the next chapter).
This chapter on selecting samples shows you how to [A] determine
the requisite number of samples needed, [B] make sure each is of
sufficient weight, and [C] pick those samples randomly.
One of the major hurdles in sampling an area's waste stream is the expense
involved in sorting. It would be nice to know approximately how many samples
to sort before you get too far along so you don't wind up sorting more
samples than needed.
How many samples will need to be sorted in order to get an estimate
that can be relied upon to the degree desired depends upon how confident
you want to be. Or, to use statistical terms: what is the confidence
interval and the confidence level you would like to achieve?
The interval describes how wide a band of uncertainty surrounds
your sampling estimate (e.g. plus or minus 10%); the level, how
probable it is that the answer is within that band (e.g. 90% probability).
(For a detailed explanation of the two terms, see the Stat/facts Primer,
at p. 89.)
Essentially, the answer depends on how much the proportion of
each material varies from sample to sample: the greater the variation, the more
samples are needed. The answer also depends on how small a fraction of the
samples a material makes up: materials with similar variability that average 2% will
tend to require more samples than those that average 20%. Thus, for the
desired reliability, the answer will vary from one material to the next:
aluminum might require 45 samples, while food waste might require only 15.
This chapter offers two ways to estimate the number of samples needed:
[1] generic estimates from standard tables or [2] the formula
and techniques for doing your own calculations.
[1] Standard Tables To Estimate Sample Size
For a quick estimate, you can use Table 4, which provides a general guide to the number of samples needed at the typical 90% confidence level. The table covers the different confidence interval widths you might desire: ±5%, ±10%, ±20% and ±30%. The number of samples is expressed as a range to reflect the fact that different waste streams exhibit varying characteristics. Keep in mind that these figures are from standard industry sources based upon very old data and, we believe, they may overstate the samples needed in many cases.


Table 4. Number of samples needed at the 90% confidence level
To Achieve a ±5% Confidence Interval
Material   Residential  Commercial  Consolidated
Newsprint  224-2397     698-3563    512-663
Cardboard  899-1955     533-997     1060-2573
Aluminum   275-1437     764-4399    430-704
Ferrous    194-554      552-3411    275-1331
Glass      146-619      596-2002    262-937
Plastic    261-1100     422-783     200-954
Organics   12-47        26-92       19-65
To Achieve a ±10% Confidence Interval
Material   Residential  Commercial  Consolidated
Newsprint  58-600       175-891     128-166
Cardboard  225-489      134-250     265-644
Aluminum   70-360       191-1100    110-176
Ferrous    50-139       138-853     70-333
Glass      39-155       149-501     67-235
Plastic    67-275       107-196     52-239
Organics   5-14         8-25        7-18
To Achieve a ±20% Confidence Interval
Material   Residential  Commercial  Consolidated
Newsprint  16-150       48-223      34-43
Cardboard  58-123       35-64       68-161
Aluminum   19-92        50-275      29-46
Ferrous    14-37        36-214      19-85
Glass      19-61        39-126      19-61
Plastic    18-70        28-51       15-61
Organics   3-5          4-8         4-6
To Achieve a ±30% Confidence Interval
Material   Residential  Commercial  Consolidated
Newsprint  9-68         21-101      16-21
Cardboard  27-56        17-30       31-73
Aluminum   10-42        23-123      14-22
Ferrous    8-18         17-97       10-39
Glass      6-19         19-58       10-28
Plastic    10-32        14-24       8-28
Organics   3-4          3-5         3-4
[2] Formula for Calculating Sample Size
For more realistic numbers in your situation (ones that may not be nearly
as large), you can do your own calculations. To do that, you'll need to have
a preliminary idea of how much variation the materials of major interest
exhibit in the particular kind of waste stream you're sampling. Variability
here means how much the individual samples vary around
the average of those samples. This preliminary estimate of the variability
of materials in your area may be based on:
An earlier sample in that area; or
A sampling conducted in a similar community; or
The first half dozen to a dozen samples of your own sort.
Be aware, though, that the number crunching involved is not for the
"statistically disadvantaged," and those disinclined to try may want to
skip the rest of this section.
[note: If you don't want to handle the statistics,
but do want to enjoy the fruits that statistics can produce, you may want
to consider purchasing the WasteSort™
software package to automatically handle all of those calculations for
you. An order form is contained on the last page of this GUIDE.]
The formula for sample size relies on two factors that recur
in this and later calculations: [a] standard deviations
(and their close relative, the coefficient of variation) and [b]
t-values. These are then incorporated into [c] the sample
size formula. The mechanics of using these terms are explained very simply
below. (An explanation of what they mean is found in the Stat/facts Primer,
at p. 85.)
[a] Standard Deviations
As with much of the rest of the statistics used in waste sorts, the
calculations typically are a function of the amount of variability in your
samples. The measuring stick for that variability in waste sort equations
is the standard deviation. Again, this is simply a way to quantify
how much each of those samples varies from the average of all the samples.
Equation 1 is the particular form of the formula used to calculate the
standard deviation for sampling, the so-called unbiased form.
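[example: ASSUMPTIONS. The cardboard in 5 samples
is:
Sample No.  1  2  3  4  5
Cardboard  4%  3%  5%  6%  4%
CALCULATIONS. The average of the five samples is 4.4% and the unbiased standard deviation is 1.14%.]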
[caution: If you're using a spreadsheet to compute standard deviations, be aware that there is a possible glitch. Many spreadsheets default to a different version of this statistic that is referred to as either the "biased" or "population" standard deviation. You'll have to check the program's manual to be sure. If it is the biased form, some programs will have an option for the unbiased form, which is what is required here. If yours does not, you can jury-rig a correction by multiplying the biased standard deviation by the square root of n / (n - 1), where n is the number of samples.]
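The calculation can be sketched in a few lines of Python, using the five cardboard values from the example. This is only an illustration: the variable names are invented here, and the correction factor sqrt(n / (n - 1)) is the standard conversion between the biased and unbiased forms, assumed to be what the caution's correction does.

```python
import math
import statistics

# Cardboard percentages from the five samples in the example.
cardboard = [4.0, 3.0, 5.0, 6.0, 4.0]

mean = statistics.mean(cardboard)          # average of the samples
s_unbiased = statistics.stdev(cardboard)   # divides by n - 1 (the form needed here)
s_biased = statistics.pstdev(cardboard)    # divides by n ("population" form)

# Correction for a tool that only offers the biased form (assumed standard
# conversion): multiply by sqrt(n / (n - 1)).
n = len(cardboard)
s_corrected = s_biased * math.sqrt(n / (n - 1))

print(f"average = {mean:.2f}%")
print(f"unbiased standard deviation = {s_unbiased:.2f}%")
print(f"biased standard deviation = {s_biased:.2f}%")
print(f"corrected biased value = {s_corrected:.2f}%")
```

The unbiased value and the corrected biased value should agree, which is a quick way to check which form a spreadsheet is giving you.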
[b] t-Value
The t-value is simply a multiplier used in many statistical formulas when the number of samples involved is small. It relates the amount of variability among the averages for a material from a number of different samplings to the probability that the estimate is correct. This is what is being done when either the number of samples needed or the confidence bands are calculated. You can find the appropriate t-value to use from a table called the Student's t-table, which is reproduced in a simplified form in the appendix, at p. 101. To learn how to use t-tables, see the sidebar on the page following the next.
[c] Sample Size
(i) When Not Consolidating Strata
The formula in Equation 3 is used in order to calculate the number of
samples needed to achieve the desired level of certainty around the estimated
average. It is for either an unstratified sample or for an individual stratum
in a sample which has been stratified.
Note that the number of samples needed will differ from one material to the next, because each material has its own variation. In practice, the calculation is often done just for the particular material of greatest interest. Alternatively, it may be done for each of several materials of equal importance, and then the average of the resulting numbers of samples is used.
You define the desired certainty in two places in Equation 3: first, as
a confidence interval in the denominator; and second, as the confidence
level that is used in determining the t-value in that equation.
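Equation 3 itself is not reproduced in this text. The form below is a reconstruction that fits the description in this section (the t-value times the coefficient of variation in the numerator, the confidence interval in the denominator, the whole squared) and the worked example later in the section. The symbols n, t, CV, e, s and x-bar are labels chosen here, not necessarily those of the original equation.

```latex
n = \left( \frac{t \times CV}{e} \right)^{2},
\qquad CV = \frac{s}{\bar{x}}
```

Here n is the number of samples, t the t-value for the chosen confidence level, CV the coefficient of variation (standard deviation s divided by the average x-bar), and e the confidence interval as a fraction of the average. As a check, t = 1.645, CV = 0.2591 and e = 0.10 give about 18.17, the first-pass figure in the worked example.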
There are two wrinkles when using this formula. One is the particular
units in which you define the desired confidence interval.
Many people use the term "confidence interval" to mean a percent of the average, e.g. ±10% of the average for paper. Used in this way, if the average for a material were 5.0%, the correct answer would be within a range of 4.5 to 5.5 percent, i.e. 10% of 5.0% is 0.5%, which is added to and subtracted from the 5.0% average.
However, were you to use the traditional form of this formula for sample
size, the numerator would show multiplication of the t-value by
the standard deviation, not by its close relative, the coefficient of variation,
that is shown in Equation 3. And, were the calculation performed with the
more traditional standard deviation, the confidence interval in the
denominator would have to be expressed as the actual amount of
uncertainty around the mean instead of as a percent of the average.
To illustrate, look again at the preceding page's example. If the standard
deviation were used in Equation 3, the confidence interval would be expressed
as the actual value of the band, 0.005, in the assumed case of a material
with an average of 5% ±10% (i.e. 0.05 x 0.10 = 0.005). If the coefficient
of variation were used, the confidence interval would be 10% of the average
(i.e. 0.005 / 0.05 = 0.10).
That is why Equation 3 uses the coefficient of variation instead of
the more commonly stated standard deviation: to produce an answer
in the units that most people understand, namely as a percent of the mean.
To recap, the standard deviation is converted to
the coefficient of variation by dividing the standard deviation by the
average (see Equation 3 on p. 35).
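The two forms give the same number of samples, which a short derivation makes visible. This is a sketch under the assumption that the traditional form is the t-value times the standard deviation over the absolute band, squared; E (the absolute band) and e (the band as a fraction of the average) are labels introduced here.

```latex
n = \left( \frac{t \, s}{E} \right)^{2}
  = \left( \frac{t \, s}{e \, \bar{x}} \right)^{2}
  = \left( \frac{t \, (s / \bar{x})}{e} \right)^{2}
  = \left( \frac{t \times CV}{e} \right)^{2},
\qquad E = e \, \bar{x}
```

With an average of 0.05 and e = 0.10, the absolute band is E = 0.005, so dividing both the standard deviation and the band by the average changes the units but not the answer.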
A second complication is that the t-value on the right side of
the equation is determined, in part, by the number of samples on
the left side of the equal sign. (Remember that the t-value is a
function of the confidence level and the number of samples being
considered or, more precisely, the degrees of freedom.)
That is to say, the number of samples on the left of the equation and
the t-value on the right each change in relation
to the other. To resolve this kind of problem, it is necessary to use a
process called iteration. If you don't, the formula will usually
underestimate the number of samples
actually required to achieve the desired reliability.
Here, iteration involves successive calculations of Equation 3 in which
the coefficient of variation and the desired confidence interval remain
constant, while the t-value changes with each computation. The degrees
of freedom used in determining the t-value will depend on the number
of samples determined from the prior calculation.
The first time you calculate Equation 3, use as the t-value the
number at the intersection of the column for the desired confidence level
and the bottom row for the degrees of freedom (see the Using
t-tables sidebar on the preceding page for an explanation).
That bottom row is for any large sample, represented by the "∞" symbol,
and is used initially because it gives the lowest possible number of samples
needed, which serves as a lower bound. Then solve for the number of samples.
The second time you calculate Equation 3, substitute for the t-value
the number at the intersection of the same confidence level column and,
unlike before, the degrees of freedom for the number of samples computed
from the prior iteration, minus 1. Continue this process until the calculated
number of samples from successive iterations converges.
[example: ASSUMPTIONS. Let's use the same assumptions from the example for the standard deviation (see p. 33). The standard deviation was 1.14% and the average was 4.4%. Further assume that the desired confidence interval is ±10% of the average and the desired confidence level is 90% (although, later, you will make your own determination of desired certainty). CALCULATIONS. [a] Convert to Coefficient of Variation. To convert the actual value of the variation into a percent of its average, divide the standard deviation, 1.14%, by the average, 4.4%. The standard deviation as a percent of its mean becomes 25.91%. [b] Confidence Interval. Even though the ±10% confidence interval is plus and minus the average, it is only necessary to use the positive percent in the equation because it will be squared, and a negative number becomes positive when it is squared. [c] Iterate. The initial iteration would use a t-value of 1.645 (from the intersection of the 90% confidence level column and the ∞ row). Plugging these numbers into Equation 3 produces an answer of 18.17 for the first iteration.
To insure the desired reliability is achieved, round UP to 19. The second iteration
would use a t-value for the degrees of freedom from the previous calculation's
19 minus 1, or 18. That produces a sample size of 20.19. Round that up
to 21 and subtract 1, and find the t-value for 20 degrees of freedom, which
is 1.725. Running Equation 3 again, the sample size for the third iteration
becomes 19.98. On the fourth iteration, Equation 3 produces 20.07. The
fifth iteration, in turn, is 19.98, and the sixth, 20.07. Thus, the process
converges on 20. Had the converged number included a fraction, you would
have rounded UP to insure that the desired reliability is achieved.]
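The iteration in this example can be sketched in Python. Several things here are assumptions rather than part of the GUIDE: the formula is the (t x CV / e) squared form inferred from the example's figures, the t-values are standard two-sided 90% table values typed in by hand, the function names are invented, and degrees of freedom that fall between table rows use the nearest smaller row.

```python
import math

# Two-sided 90% Student's t-values by degrees of freedom (standard table values).
T90 = {
    1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
    6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812,
    11: 1.796, 12: 1.782, 13: 1.771, 14: 1.761, 15: 1.753,
    16: 1.746, 17: 1.740, 18: 1.734, 19: 1.729, 20: 1.725,
    21: 1.721, 22: 1.717, 23: 1.714, 24: 1.711, 25: 1.708,
    26: 1.706, 27: 1.703, 28: 1.701, 29: 1.699, 30: 1.697,
    40: 1.684, 60: 1.671, 120: 1.658,
}
T90_INFINITY = 1.645  # bottom ("infinity") row, used for the first pass


def t_value(df):
    """Return t for df, falling back to the nearest smaller tabulated row.

    A smaller df gives a slightly larger t, so gaps in the table err toward
    more samples (an assumption made for this sketch).
    """
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    return T90[max(row for row in T90 if row <= df)]


def sample_size(t, cv, interval):
    """Sample size as inferred from the worked example: (t * CV / e) squared."""
    return (t * cv / interval) ** 2


def iterate_sample_size(cv, interval, max_passes=20):
    """Repeat the calculation, refreshing t from the previous pass's sample count."""
    results = [sample_size(T90_INFINITY, cv, interval)]
    seen_df = set()
    for _ in range(max_passes):
        df = math.ceil(results[-1]) - 1   # round UP, then subtract 1
        if df in seen_df:                 # the t-values have started repeating
            break
        seen_df.add(df)
        results.append(sample_size(t_value(df), cv, interval))
    return results


passes = iterate_sample_size(cv=0.2591, interval=0.10)
print([round(n, 2) for n in passes])
```

For these inputs the successive passes come out close to 18.17, 20.19, 19.98 and 20.07, after which the same two t-values alternate. The sketch stops there and leaves the final rounding decision to you.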
(ii) When Consolidating Strata/Seasons
If your study design uses different strata, it is possible to compute the required sample size for the strata consolidated together as a whole. This procedure can often result in the need for fewer samples for the whole than would be computed if it were not stratified. The formula for sample size from stratified samples uses advanced techniques called pooling. See Equation 4 which uses the example of a waste stream stratified into residential and commercial strata.
[note: This formula applies only if you allocate the number of samples needed
between strata optimally, as opposed to proportionally or otherwise (as
is described in the following section with reference to Equation 5, see
p. 43).]
[example: ASSUMPTIONS. You want to know the
number of samples needed to achieve a ±20% confidence interval (as
a percent of the mean) and confidence level of 90% for corrugated cardboard
in the consolidated residential and commercial strata of a waste stream.
Their characteristics are:
Stratum  Average  Std. Dev.  Proportion
Residential  1.5%  1.0%  40%
Commercial  5.0%  6.0%  60%
CALCULATIONS. [a] Weighted Coefficient of Variation. The numerator of the bracketed part of the equation, then, would be 0.987.
[b] Confidence Interval. The denominator inside the brackets
would be 20% (representing the desired uncertainty band being used in this
example). [c] Iterate. As before, the t-value for the initial iteration
would be 1.645. Thus, the first calculated sample size is 65.90. For the
second iteration, the degrees of freedom derived from the number of samples
in the prior iteration would be 65 (i.e. 66 - 1), with the number rounded
UP for conservatism. The t-value for 65 degrees of freedom, which
is between the 60 and 120 rows shown on the table, is taken here
from the row for 120, or 1.658 (the row for 60 would give a slightly
larger, more conservative t-value). Solving for the second
iteration gives 66.9, or, if rounded up, 67. The third iteration
also gives 66.9 because it repeats the same t-value. Thus, the number of samples
needed is 67.]
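The figures in this example can be reproduced with a short Python sketch. Equation 4 is not reproduced in this text, so the weighting used here is an inference: summing each stratum's proportion times its own coefficient of variation gives the 0.987 quoted above. The function name and data layout are invented for illustration.

```python
import math


def weighted_cv(strata):
    """Sum of each stratum's proportion times its coefficient of variation.

    This weighting is inferred from the example's 0.987 figure; it is not
    taken from the original equation.
    """
    return sum(share * (std_dev / average) for average, std_dev, share in strata)


# (average %, standard deviation %, proportion of the waste stream)
strata = [
    (1.5, 1.0, 0.40),  # residential
    (5.0, 6.0, 0.60),  # commercial
]

cv = round(weighted_cv(strata), 3)   # rounded to three decimals, as in the example
interval = 0.20                      # ±20% of the mean

first_pass = (1.645 * cv / interval) ** 2    # t-value from the bottom row
second_pass = (1.658 * cv / interval) ** 2   # t-value from the 120 row, as in the example

print(cv, round(first_pass, 2), round(second_pass, 1), math.ceil(second_pass))
```

Under these assumptions the passes come out at roughly 65.9 and 66.9, which rounds up to the 67 samples in the example.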
If you have stratified the waste stream, such as between residential
and commercial waste, you will need to allocate the total number of samples
required among those groupings. For example, out of 50 total samples, 30
samples will be from the commercial and 20, the residential, strata. There
are two standard procedures to allocate samples to strata: [1] one
that optimizes reliability and [2] the other, proportionality, that
eases computations.
[1] Allocating samples to strata optimally
The optimal way to allocate a specified number of samples to strata
is to do so in a way that produces the narrowest uncertainty band around
the averages you estimate from the consolidated waste stream. This is also
the way to produce the desired confidence with the fewest samples.
Optimization is different from having the proportion of samples in each
stratum mirror your estimates of its proportion in the population. Usually,
it will be more efficient to take more samples from the strata with the
greatest variability. If both strata exhibit similar variation, however,
the extra effort for the calculation will not be necessary.
Equation 5 shows how optimization can be done when you are deciding
how many of the total samples needed should come, for example, from the
residential relative to the commercial strata. The allocation will be different
for each material, so it will be simplest to use this procedure for just
the key material.
There is an important lesson from this that runs counter to common practice.
The optimization principle means that it will usually be more efficient
to take a greater number of samples from the commercial than from the residential
sector instead of the other way around. Assuming that the commercial stratum
has greater variability than the residential, taking more commercial samples will
usually provide equivalent reliability for the two strata combined
with fewer total samples.
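Equation 5 is not reproduced in this text. As a stand-in, the sketch below uses the textbook "Neyman" allocation, which assigns samples in proportion to each stratum's share of the waste stream times its standard deviation. It may differ in detail from Equation 5, and the function name and inputs are chosen here for illustration (the strata figures are borrowed from the cardboard example).

```python
import math


def neyman_allocation(total_samples, strata):
    """Split total_samples in proportion to share * standard deviation.

    This is the standard textbook rule, used here as an assumed stand-in
    for the GUIDE's own optimization equation.
    """
    weights = {name: share * std_dev for name, (share, std_dev) in strata.items()}
    total_weight = sum(weights.values())
    return {name: total_samples * w / total_weight for name, w in weights.items()}


# (proportion of the waste stream, standard deviation %) for cardboard
strata = {
    "residential": (0.40, 1.0),
    "commercial": (0.60, 6.0),
}

exact = neyman_allocation(67, strata)
rounded_up = {name: math.ceil(n) for name, n in exact.items()}
print(exact)
print(rounded_up)
```

With these inputs almost all of the samples go to the commercial stratum, because it is both the larger share and far more variable, which illustrates the point made above.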
[2] Allocating samples to strata proportionately
Another way to allocate the total samples among strata is to do so
in the same proportion that each stratum bears to the total waste stream,
per Equation 6. In this example, it is again illustrated for the residential
stratum.
To insure attaining the desired reliability, you will need to round UP and take 17 residential and 34 commercial samples. If retaining the original number of samples is more important, round to the closer number.]
This procedure will make it unnecessary to use weights later when calculating
the statistics, because the weights will be implicit in the number of samples
taken. However, unless the strata have similar variability, the optimization
approach will require less sampling to be done.
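Proportional allocation is a one-line calculation, sketched here in Python. Equation 6 is not reproduced in this text, so the form used (total samples times each stratum's share, rounded UP) is an assumption, as are the function name and the inputs: 50 total samples with shares of one third and two thirds were picked because they give the 17 and 34 mentioned above.

```python
import math
from fractions import Fraction


def proportional_allocation(total_samples, shares):
    """Give each stratum total_samples * share, rounded UP to a whole sample."""
    return {name: math.ceil(total_samples * share) for name, share in shares.items()}


# Assumed shares of the total waste stream (exact fractions avoid rounding noise).
shares = {"residential": Fraction(1, 3), "commercial": Fraction(2, 3)}

allocation = proportional_allocation(50, shares)
print(allocation)
```

Because each stratum is rounded UP separately, the allocated samples can add up to slightly more than the original total (51 here rather than 50).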
Once the number of samples needed has been determined, you'll need to know how
many pounds each sample should be. The general rule is that samples should
be between 200 and 300 pounds. Below that size, the variability between
samples increases to the point where you'll need to increase the number
of samples to maintain the same level of reliability. Above 200-300
pounds, the gains in improved reliability fall off rapidly and are not
worth the additional cost of sorting through the larger quantity of trash.
After determining how many samples are needed, the next job is to insure
that the samples you pick are randomly selected. Randomization of sample
selection, remember, is essential to make the maze of statistical calculations
work.
Waste sorts are typically done either [1] where the material is disposed
of, at the landfill (or incinerator), called a waste destination study, or
[2] where it is set out by the household at the curb for collection, called a waste
source study.
[1] Waste destination studies
Waste sorts are typically done at the disposal site or transfer station,
the so-called waste destination study, and, here, random sampling
becomes a two-stage operation, namely: [a] choosing the trucks from
which to select samples and [b] choosing the samples picked from
within the trucks.
[a] Trucks
The first task is to randomly select the trucks, from which, in the
next step, you'll pull your actual samples. Your study design should have
previously established whether the trucks are to be picked from strata
(see p. 12) and from which hours or days they should be selected
(see p. 19).
To do all this, follow these three steps:
Total Number of Trucks. Estimate the total number of trucks in each stratum (e.g.
residential and commercial loads) and on each day on which sorting is scheduled
(e.g. Monday, Tuesday...Friday). This can be done by interviewing the haulers
in your area, if they are known, by reviewing customer records at the
disposal facility, or by observing the names stenciled on incoming trucks at
the disposal facility's weigh station. UNDERestimate the number of trucks
you expect, to guard against the possibility
of having too few trucks show up on the day you sort.
Needed Number of Trucks. Divide the total number of samples for each stratum,
as well as for each day's sorting, by the maximum number of samples that
will be pulled from each truck (see p. 27 for the procedure to determine
total sample size and p. 42 for the procedure to allocate the total number
of samples by strata). Typically 2-4 samples are pulled from each truck.
As a rule of thumb for choosing between 2 and 4 samples per truck: if the
composition of loads varies greatly between trucks, tend to the lower number
per truck. If sorting cost is a paramount factor, then tend to the higher
number. Round UP if the resulting number of trucks needed includes a fraction
to insure that you have enough trucks. In that case, fewer than the maximum
number of samples will be pulled from one of the trucks.
Generate Random Numbers. Use the procedure for random number selection described
in the sidebar on page 52 to generate a list of random numbers for each
stratum and sorting day. The random numbers can then be used to count off
trucks as they arrive at the disposal site, with those matching the random
numbers being selected. Alternatively, the random numbers can be used to
preselect from the list of trucks scheduled to unload on each sorting
day. But to guard against last minute changes, you must check on the day
of the sort to make sure no changes have been made in the haulers' schedule.
[note: Tear-out worksheets to use in the field
to randomly select trucks are included in the appendix. See p. 121.]
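The arithmetic of the second step can be sketched in Python, using the figures from the worked example in this section (20 residential and 30 commercial samples, at most 2 samples per truck, three sorting days). The function names are invented, and putting any leftover trucks on the last days is only one possible logistical choice.

```python
import math


def trucks_needed(samples, max_samples_per_truck):
    """Divide samples by the per-truck maximum and round UP."""
    return math.ceil(samples / max_samples_per_truck)


def spread_over_days(trucks, days):
    """Split trucks evenly across days, giving any extras to the last days."""
    base, extra = divmod(trucks, len(days))
    return {
        day: base + (1 if i >= len(days) - extra else 0)
        for i, day in enumerate(days)
    }


days = ["Mon.", "Tues.", "Wed."]
residential = spread_over_days(trucks_needed(20, 2), days)
commercial = spread_over_days(trucks_needed(30, 2), days)
print(residential)
print(commercial)
```

If the division leaves a fraction, as with 21 samples at up to 4 per truck, rounding UP gives 6 trucks, and one of them supplies fewer than the maximum number of samples.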
[example: [1] Total number of trucks.
ASSUMPTION. If you will be sorting from loads on Monday, Tuesday and Wednesday
from the residential and commercial sector, your review of landfill records
might show something like the following estimates of the minimum number
of trucks expected each day:
Total Trucks  
Mon.  Tues.  Wed.  
Residential  43  31  39 
Commercial  58  43  45 
[2] Number of trucks needed. ASSUMPTION. The
total number of residential samples needed for this season's sort is 20
and, commercial, 30, and the maximum number of samples per truck is 2.
CALCULATION. The 20 residential samples will be pulled two from each truck,
or from 10 trucks total. Dividing the 10 needed residential trucks (and
15 commercial) among the three days of the sort would look like
this:
Number of Trucks Needed  
Mon.  Tues.  Wed.  
Residential  3  3  4* 
Commercial  5  5  5 
* When the number of trucks does not divide equally
into the number of days, allocate the extra trucks to meet logistical
needs. Here it was done on the last day, when your sorters will have
become more experienced and can sort faster.
[3] Random number generation. Here is an abbreviated
section from the full random number table in the appendix (see p. 99) to
use to randomly select, as an example, the 3 trucks needed from the residential
stratum on Monday:
83483460  87865509  86011066  71703342  69967095 
35432442  83188631  51383215  62917750  46335727 
29055222  98578894  22901878  64718692  91965927 
There are two common procedures for random selection: random number generation and the "Nth" number method. Both begin with the random selection of the first number from the table by closing your eyes and jabbing at the chart.
Random selection of first number. Assume you randomly picked the sixth number, "98578894". The last two digits (because there are two digits in the number 43 trucks) are "94". Because this is above 43, reject it and go on. Reading down, move on to the next number, "86011066", whose last two digits, "66", also do not qualify because they are too large. Finally, the last two digits of the next number, "51383215", or "15", qualify.
Random number generation. If you are using the "random number generator" approach, record #15 as the first truck to be selected, and proceed down. The next number ends in "78" and is rejected; the one after ends in "42", which fits, but the following three do not. The two after that qualify, but they are both the same number, "27". Knock out the duplicate. That provides you with the three randomly selected numbers needed from the residential stratum on Monday. Ordered sequentially, they become 15, 27, 42. On Monday, you pull over the 15th, 27th and 42nd residential trucks that arrive at the landfill.
Nth method. Simply
divide the 43 trucks expected to arrive on Monday by the 3 trucks needed,
which is 14.3, or, rounded down, 14. Thus, pick that same 15th
truck to begin numbering, add 14 to get the second, or 29, and add 14 again
to get the third, 43, at which point you would have the three needed.]
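Both selection procedures can be sketched in Python. Note the substitution: Python's built-in random number generator stands in for the printed random number table, so the trucks picked will differ from the hand-worked example. The function names and the optional seed parameter are invented for illustration.

```python
import random


def pick_random_trucks(expected_trucks, needed, seed=None):
    """Draw `needed` distinct arrival positions from 1..expected_trucks."""
    rng = random.Random(seed)
    # sample() never repeats a value, so no duplicates need knocking out.
    return sorted(rng.sample(range(1, expected_trucks + 1), needed))


def pick_every_nth(expected_trucks, needed, first):
    """Nth method: start at a randomly chosen truck and step by a fixed interval."""
    step = expected_trucks // needed   # e.g. 43 // 3 = 14, rounded down
    picks = [first + i * step for i in range(needed)]
    if picks[-1] > expected_trucks:
        raise ValueError("starting truck is too late to fit all the picks")
    return picks


print(pick_random_trucks(43, 3))
print(pick_every_nth(43, 3, first=15))
```

With 43 trucks, 3 needed and a starting truck of 15, the Nth method gives 15, 29 and 43, as in the example. A later starting truck would run past the 43rd arrival, which the sketch reports as an error rather than silently returning too few trucks.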
[b] Samples from within trucks
Only 200-1,000 out of upwards of 20,000 pounds on a packer truck will
be needed for sampling. It is extremely important to insure that each part
of the load on the truck has an equal chance of being selected as a sample.
For one thing, loads inside a truck stratify. Heavier materials and fines
fall to the bottom and sometimes to the back. For another, most people
would subconsciously be less eager to pull samples from parts of the load that
are foul smelling.
Two approaches are typically used to prevent such biases: (i)
the coning and quartering and (ii) the grid and
pull approaches. The grid approach uses strict random procedures and,
for that reason, is usually considered superior to the cone and quarter
approach which does not.
This chapter will describe how to designate the parts of the waste stream
to pull and sort for samples; the next chapter describes how to go about
physically pulling them out (see p. 56).
(i) Coning and Quartering
The truck load of waste is tipped onto the floor and shoved into a pile
by an end loader. A quarter of the pile is removed like a slice out of
a pie and, in turn, shoved into a pile and quartered again, until the
remaining pile is the size desired for a sample, usually 200-300 pounds
(see p. 45). "Eyeballing" is usually used to select quarters instead
of strict randomization. A variant of this approach is to then thoroughly
shake and mix the 200-300 pound sample in a tarp that is pinned corner
to corner. Under the assumption that the variation among different parts
of that sample has been eliminated by the mixing, a smaller 50-60 pound
subset of the original 200-300 pound sample is sometimes used.
(ii) Grid and Pull
A truck driver begins to dump, and then drives his vehicle forward in
lurching movements as the load is pushed onto the tipping floor with the
truck's push-out plate. This spreads the load as nearly as possible into
the shape of a "bread loaf" about one yard in height and about 10 feet
wide by 40 feet long. An end loader may be necessary to insure that the
height is leveled at 36"-40". Then, using either stakes and twine or visual
estimation, divide the bread loaf into approximately 25-50 "cells", 3 feet by 3 feet
or, with a 3 foot depth, approximately one cubic yard in size, which should
weigh 200-300 pounds, the desired weight (see p. 45). Next,
sequentially number each cell's location on a sheet of paper on which you
map the loaf. Stratification of the load by weight can cause the cells
at the edge to have different characteristics than those at the center. In order
to ensure that cells in the center have an equal chance of selection as
cells on the edge, use the random selection procedure in the sidebar on
the next page.
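Numbering the cells and drawing from them at random can be sketched in Python. The layout is an assumption: a loaf of about 10 by 40 feet cut into 3-foot cells gives roughly 3 rows by 13 columns, or 39 cells. Python's random number generator stands in for the sidebar's procedure, and the function names are invented.

```python
import random


def number_cells(rows, cols):
    """Map sequential cell numbers (1, 2, ...) to (row, column) positions."""
    return {r * cols + c + 1: (r + 1, c + 1) for r in range(rows) for c in range(cols)}


def pick_cells(rows, cols, samples, seed=None):
    """Choose `samples` distinct cells so every cell has the same chance."""
    cells = number_cells(rows, cols)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(cells), samples))
    return [(number, cells[number]) for number in chosen]


# Assumed layout: 3 rows by 13 columns, pulling 3 samples from the truck.
for number, (row, col) in pick_cells(3, 13, 3):
    print(f"cell {number}: row {row}, column {col}")
```

Because the draw is over cell numbers alone, edge and center cells are treated identically, which is the point of the grid approach.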
When you attempt to implement random selection in the real world on the tipping
floor, often some item of waste, such as a 2" x 4" piece of lumber, may
be partially in and partially out of the targeted cell or of the quarter
of the cone. Does the sorter scoop it in or throw it out of the sample?
The best sampling technique that has been developed to cope with this,
and still keep the samples random so that statistics will work, is to have
a systematic and unbiased response.
For coning and quartering, decide by random choice at the outset, such
as by the flip of a coin, that overhanging items on the right (or the left)
side will be "in" and the other side "out" of the sample. For grids, designate
two of the four faces of selected cells, such as north and east, as the
"in" sides and the others will be the "outs".
Yet another real world complication is fondly referred to as the "dead
horse" problem, namely an item that is wildly larger than any of the other
items. In waste sorts, it might be a large sleeper sofa. Leave in such
a large, rarely occurring item, and it will warp your averages, so something
needs to be done. There are punctilious statisticians who violently disagree,
but the simplest way to handle this problem is to arbitrarily kick the
dead horse out of your sample. Otherwise, your sample results will probably
be distorted and require all kinds of convoluted massaging for "outliers"
to make any sense.
[note: Tear-out worksheets to use in the field
to randomly select samples are included in the appendix. See p. 121.]
[2] Waste source studies
In waste source studies, samples are selected and collected directly at the source, in the household or business. Ensuring that samples are taken from all types of waste generators in proportion to their share of the whole requires elaborate stratification and, after selection has been determined, substantial expense to collect. For this reason, waste source studies are rarely done. However, they are often the only good way to get waste composition analysis specific to different socioeconomic groupings or types of generators, and to access the materials in the stream before they have been crushed in the packer truck.