Precision is not accuracy.
We often use the words precision and accuracy interchangeably in everyday conversation, but in statistics they mean different things. To understand the difference and why it matters, imagine that a researcher’s sampling method is her archery equipment and the phenomenon to be studied is the target. The researcher steps up to the firing line, pulls back the bow string, and fires her arrows. After she runs out of arrows, she walks down to the target to see how well she’s done (i.e., to collect her results). There she finds that all of her arrows have stuck in a tightly packed clump on the target’s surface but outside the rings of the bull’s eye. Her shooting is very precise (all the arrows have landed very close to one another) but also inaccurate (none of them has hit within the rings of the target).
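The archery picture can be put into numbers with a minimal sketch. The arrow coordinates below are invented purely for illustration, and the helper names (`centroid`, `spread`, `offset`) are my own. Precision is measured as how tightly the arrows cluster around their own centre, accuracy as how far that centre sits from the bull’s eye.

```python
import math
import statistics

# Hypothetical arrow positions (x, y) in centimetres, bull's eye at (0, 0).
# These values are made up for illustration; they are not data from any report.
arrows = [(30.2, 40.1), (29.8, 39.7), (30.5, 40.3), (29.6, 40.0), (30.0, 39.9)]


def centroid(points):
    """Mean position of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return statistics.fmean(xs), statistics.fmean(ys)


def spread(points):
    """Root-mean-square distance of the points from their own centroid.

    A small value means high precision (a tight clump).
    """
    cx, cy = centroid(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(statistics.fmean(squared))


def offset(points, target=(0.0, 0.0)):
    """Distance from the centroid of the points to the target.

    A small value means high accuracy (the clump is centred on the target).
    """
    cx, cy = centroid(points)
    return math.hypot(cx - target[0], cy - target[1])


print(f"spread (precision): {spread(arrows):.2f} cm")
print(f"offset (accuracy):  {offset(arrows):.2f} cm")
```

With these made-up numbers the spread is well under a centimetre while the offset is about half a metre: a very precise and very inaccurate archer.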
On September 15, 2016, the Basel Action Network (BAN) released the second of its public reports from its e-Trash Transparency project. The project uses Global Positioning System (GPS) trackers hidden in discarded printers and monitors to follow their movement to end points. In the most recent report, 69 of 205 trackers (almost 34%) were found to have been exported from the US to various points outside it, mostly in Hong Kong (see BAN’s report “Scam Recycling” Table 1, p. 18).
The methods deployed in the latest BAN reports are analogous to the above description of the archer. Without methods that lead to both precision and accuracy, results that do not reflect general patterns in the underlying phenomenon being studied can easily be obtained. The issue I am drawing attention to here is not a dispute about what individual GPS tracks divulged in the BAN report or its companion website show us. Those tracks with coordinate data associated with them provide us with very precise point data (at sub-meter precision). But, statistically speaking, precision is not accuracy.
Media reportage on BAN’s GPS work has, for the most part, taken the results at face value as being both precise and accurate depictions of international flows of e-waste. Some critics have pointed out that 205 trackers is too small a sample from which to draw general conclusions (indeed, BAN’s own reports make similar caveats about the sample size). Sample size is one reason to think carefully about patterns apparent in the data, but far more important is the absence of a sufficient sampling framework that would account for a host of variables influencing the results. In the absence of a sufficient sampling design we cannot know if the results obtained are both precise and an accurate depiction of general patterns. The net result is fundamental indeterminacy about what the findings obtained from the GPS data mean. BAN used to claim a 50-80% export rate (see BAN’s Exporting Harm report of 2002); does the new figure of 34% mean the original 50-80% figure was wrong? Or has the US recycling industry’s export rate substantially declined over time? Or does something else altogether explain the new export rate found with GPS trackers? What is the relative importance of exports arriving overseas from the US versus domestic generation of discarded electronics in the countries where the trackers wound up? In the absence of an appropriate sampling framework, it’s impossible to answer any of these questions using the GPS approach.
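To see what the sample-size caveat amounts to on its own, here is a rough sketch of a 95% Wilson score interval for 69 exports out of 205 trackers. The function name `wilson_interval` is mine, and the calculation rests on an assumption the tracker study does not meet: that the 205 deployments were a simple random sample of all discarded devices. The interval therefore describes sampling noise only and says nothing about bias from how facilities were chosen.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence. Assumes a simple random
    sample, which is an idealisation here, not a property of the BAN study.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(69, 205)
print(f"point estimate: {69 / 205:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

Even under that generous assumption the interval runs from roughly 28% to 40%. The larger problem is that no interval of this kind can correct for a sample that was never drawn systematically in the first place.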
The Scam Recycling report states that delivery locations for the tracking devices “were selected from state e-waste program listings […] and from Google search results from the phrase ‘computer recycling [city of deployment]'” (Scam Recycling, p. 92). However, no information is provided to indicate whether some systematic form of sampling (e.g., random, stratified, or other) was then used to select facilities for deployment of trackers. Given the characteristics of the underlying phenomenon, it would have been crucial to account for a wide range of variables that potentially affect results (e.g., geographic variation in drop-off locales in jurisdictions with/without mandated take-back legislation; variety and proportion of device types entering recycling infrastructure; differing characteristics of the certified/non-certified recycling infrastructure).
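For readers unfamiliar with the term, stratified random sampling can be sketched in a few lines. Everything below is hypothetical: the sampling frame is synthetic, the two stratifying variables (certification and take-back legislation) are just two of the variables mentioned above, and `stratified_sample` is an illustrative helper, not a description of anything BAN did or should have done in detail.

```python
import random
from collections import defaultdict


def stratified_sample(frame, strata_key, per_stratum, seed=0):
    """Draw up to `per_stratum` units at random from every stratum.

    frame       -- list of candidate units (the sampling frame)
    strata_key  -- function mapping a unit to its stratum label
    per_stratum -- number of units to draw from each stratum
    seed        -- fixed seed so the draw can be reproduced and audited
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for unit in frame:
        groups[strata_key(unit)].append(unit)

    sample = []
    for key in sorted(groups):
        members = groups[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic frame: 10 hypothetical facilities in each of 6 strata.
frame = []
for cert in ("R2", "e-Stewards", "none"):
    for takeback_law in (False, True):
        for _ in range(10):
            frame.append(
                {"id": len(frame), "certification": cert, "takeback_law": takeback_law}
            )

chosen = stratified_sample(
    frame,
    strata_key=lambda f: (f["certification"], f["takeback_law"]),
    per_stratum=3,
)
print(len(chosen), "facilities selected")
```

The point of the design is that every stratum is represented by a known number of randomly chosen units, so differences between strata can later be weighed against each other rather than left to whatever a web search happened to return.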
BAN’s report repeatedly emphasizes that a certification system called R2 that competes with BAN’s own e-Stewards system was found to have a higher percentage of facilities exporting tracked equipment. Yet, the report talks about these higher percentages as if the two underlying certification systems share such close similarities that they can be treated as identical from a sampling point of view. For a variety of reasons, this is not a valid assumption.
First, R2 and e-Stewards have irreconcilable approaches to repair. E-Stewards demands full functionality before any export occurs. The R2 certification has a provision called “R2 Ready for Repair” which permits export without full functionality. This provision of R2 comes with important conditions, including documenting the legality of the export (see “Reusable Equipment and Components” in the R2 Standard and R2 Guidance). For BAN this is a fundamental design flaw built into the R2 certification (see BAN’s “R-2 Non Compliance By Design”).
Second, the R2 system has a much larger number of certified facilities outside of the US than does the e-Stewards program. According to R2’s “Find a Recycler” webpage, the program has 18 certified facilities in Hong Kong alone, 13 in Mexico, 9 in Japan, 4 in India, 3 in Singapore, and 1 each in Taiwan, Thailand, Vietnam, Malaysia, and Dubai. E-Stewards has far fewer certified facilities outside the US and none in any of the aforementioned places except for 4 in Mexico and 1 in Singapore (according to its “Find a Recycler” webpage).
R2 permits export for repair without full functionality and has many more facilities outside the US than does the e-Stewards program. Together, these factors mean the probability of export is unequal between the two systems. In statistical terms, the two systems do not share equal probabilities of being included in the results. To be able to draw general conclusions, those differences in the underlying probabilities have to be accounted for, but BAN’s deployment method does not do so. In the absence of an appropriate sampling framework, the differing characteristics of the two certification systems make it no surprise that BAN’s tracker exercise found a higher rate of export for R2 infrastructure than for BAN’s competing e-Stewards program.
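A toy calculation shows why unequal baseline probabilities matter. The rates below are invented for illustration and the two "systems" are deliberately unnamed; the only assumption carried over from the argument above is that a tracker records that a device left the country, not whether the export was permitted under the relevant standard.

```python
def observed_export_rate(permitted_rate, improper_rate):
    """Share of tracked devices seen leaving the country.

    A tracker cannot tell a permitted export from an improper one, so the
    observed rate is simply the sum of the two. The two categories are
    assumed to be mutually exclusive.
    """
    return permitted_rate + improper_rate


# Hypothetical rates, invented purely for illustration.
# System A allows some exports for repair; System B allows none.
# Both are given the SAME rate of improper exports.
system_a = observed_export_rate(permitted_rate=0.20, improper_rate=0.10)
system_b = observed_export_rate(permitted_rate=0.00, improper_rate=0.10)

print(f"System A observed export rate: {system_a:.0%}")
print(f"System B observed export rate: {system_b:.0%}")
```

In this toy setup the two systems behave identically with respect to improper exports, yet a raw comparison of tracker results would make one look three times worse than the other. A comparison that is meant to be fair has to separate the two components before drawing conclusions.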
One may respond to the above points by arguing that many of the devices tracked through R2 infrastructure wound up at sites that are not certified recycling facilities. This is a difficult point to refute because neither the BAN reports nor the accompanying tracking website fully discloses the tracking data used to support that argument. Pages 99-115 of the Scam Recycling report provide only partially complete coordinate data for the exported trackers and rely on textual descriptions of tracks to fill in the gaps. Meanwhile, the companion website generated by MIT’s Senseable City Lab does not fully disclose coordinate data for all tracks displayed on the site. A user of that map can zoom in on individual destination points, but not all of those points include tracker ID numbers. As a consequence, it is impossible to systematically associate the data disclosed in the BAN reports with the data disclosed on the Senseable City map. An easy fix to this situation would be to make all the GPS data and tracker ID numbers fully public (I’ve collected and mapped what coordinate data are available, see map below and here for full details including access to the data).
Big problems are ostensibly more important to solve than small ones. Unsurprisingly, BAN’s reports emphasize the massiveness of exports that it deems problematic. But even if we accept the upper range findings of the tracker study (40 percent of trackers were exported), how big is this “potentially massive” problem (“Scam Recycling” p. 93)? The upper range estimates suggest as much as 489,840 US tons of e-waste being exported from the US (see “Disconnect” pp. 9-10). That figure might sound like a lot when it is taken on its own. But consider this comparison: a single Mexico-based copper smelter annually produced between 792,000 and over 970,000 US tons of sulfuric acid as a waste byproduct of its smelting process (for source of figures click on Mexico and look for “Read More” under the “La Caridad Processing Facilities”). At the upper end, that’s almost double the weight of e-waste the e-Trash Transparency project claims is heading offshore from the entire US annually. Copper, of course, is a key input to the electronics industry.
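The comparison is simple arithmetic on the figures quoted above, sketched here so the ratio can be checked. The variable names are mine; the tonnages are the ones cited in the text.

```python
# Figures quoted in the text, all in US tons per year.
ewaste_export_upper = 489_840   # upper-range e-waste export estimate
acid_low = 792_000              # lower figure for the smelter's sulfuric acid
acid_high = 970_000             # higher figure (the text says "over" this)

ratio_low = acid_low / ewaste_export_upper
ratio_high = acid_high / ewaste_export_upper

print(f"smelter byproduct vs. exported e-waste: {ratio_low:.2f}x to {ratio_high:.2f}x")
```

On these numbers the smelter's byproduct is somewhere between about 1.6 and just under 2 times the upper-range export estimate.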
Waste externalities generated upstream in raw material extraction and manufacturing vastly exceed what arises as post-consumer electronic discards destined for recycling. Does that mean we should be unconcerned about the fates of discarded consumer electronics? Of course not. It just means that if size matters, then we need to be paying much more attention to what happens before electronics are even made.
The state of understanding generated by the e-Trash Transparency reports is deeply unfortunate. By opting for a high-precision technique in the absence of a sampling protocol that would ensure accuracy, the new BAN reports raise fundamental questions about what we think we might know about global flows of discarded electronics. Yet the situation could have been avoided altogether.
Useful results could have been obtained had the researchers used a proper sampling design (i.e., one designed to account for certified and non-certified facilities), tagged devices, and sent them through the systems. Once data were obtained they could have been reported in aggregate with anonymized results. Researchers could have gone further and alerted each individual certification body that a systematic sample of each system found actual and/or potential violations. The researchers could then have sat down with the certification bodies and said ‘we have data that show material going from location X to location Y’. The certification bodies for e-Stewards and R2 could each have been given their own separate data sets, thereby enabling those organizations to go off and investigate their own processors at their discretion. The proverbial ‘win-win’ could have been obtained: facilities in violation would get a chance to clean up their act or be de-certified (R2 by R2, e-Stewards by e-Stewards; publicly or not); and a useful data set from which aggregated/anonymized results could be published would also have been obtained. But this is not the situation that is emerging from the e-Trash Transparency project. Instead, the reports provide a commercially partisan push for one certification system, e-Stewards (which provides a direct financial benefit for “Domestic/International advocacy by BAN [to increase] demand for e-Stewards”), against that system’s main competitor, R2.
For further reading on the politics of precision versus accuracy, see: