The Illinois Department of Transportation (IDOT) purchased access to precise geolocation data about over 40% of the state’s population from Safegraph, the controversial data broker recently banned from Google’s app store. The details of this transaction are described in publicly available documents obtained by EFF.
In an agreement signed in January 2019, IDOT paid $49,500 for access to two years’ worth of raw location data. The dataset consisted of over 50 million “pings” per day from over 5 million monthly active users. Each data point contained precise latitude and longitude, a timestamp, a device type, and a so-called “anonymized” device identifier.
Taken together, these data points can easily be used to trace the precise movements of millions of identifiable people. Although Safegraph claimed its device identifiers were “anonymized,” in practice, location data traces are trivially easy to link to real-world identities.
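To see why a pseudonymous identifier offers little protection, consider one commonly described approach: a device's most frequent overnight location is usually a home. The sketch below is a minimal illustration on entirely made-up pings. The tuple layout, the device ID, the coordinates, the 10 p.m. to 6 a.m. window, and the `likely_home` helper are all assumptions for illustration, not Safegraph's actual schema or anyone's documented method.

```python
from collections import Counter
from datetime import datetime

# Entirely synthetic pings: (device_id, ISO timestamp, latitude, longitude).
PINGS = [
    ("device-a1", "2019-01-07T02:14:00", 39.78172, -89.65011),
    ("device-a1", "2019-01-07T09:30:00", 39.79853, -89.64402),
    ("device-a1", "2019-01-07T23:41:00", 39.78169, -89.65014),
    ("device-a1", "2019-01-08T03:05:00", 39.78175, -89.65008),
    ("device-a1", "2019-01-08T13:10:00", 39.79850, -89.64399),
]


def likely_home(pings, device_id, night_start=22, night_end=6, precision=3):
    """Return the most common overnight grid cell for one device, or None."""
    cells = Counter()
    for dev, ts, lat, lon in pings:
        if dev != device_id:
            continue
        hour = datetime.fromisoformat(ts).hour
        # Keep only pings in the assumed overnight window.
        if hour >= night_start or hour < night_end:
            # Rounding to 3 decimals groups pings into cells roughly 100 m across.
            cells[(round(lat, precision), round(lon, precision))] += 1
    return cells.most_common(1)[0][0] if cells else None


home = likely_home(PINGS, "device-a1")
print(home)
```

An overnight grid cell is not a name by itself, but paired with an address lookup it typically narrows to a household, and the same device ID then ties every other ping in the dataset to that household.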
In a response to a public records request, IDOT said that it did not store or process the data directly; instead, it hired contracting firm Resource Systems Group, Inc (RSG) to analyze the data on its behalf. The contracts with RSG and Safegraph are part of a larger effort by IDOT to create a “statewide travel demand model.” IDOT intends to use this model to analyze trends in travel across the state and project future growth.
As smartphones have proliferated, governments around the country increasingly rely on granular location data derived from mobile apps. Federal law enforcement, military, and immigration agencies have garnered headlines for purchasing bulk phone app location data from companies like X-Mode and Venntel. But many other kinds of government agencies also patronize location data brokers, including the CDC, the Federal Highway Administration, and dozens of state and local transportation authorities.
Safegraph discloses that it acquires location data from smartphone apps, other data brokers, and government agencies, but not which ones. Since it’s extremely difficult to determine which mobile applications transmit data to particular data brokers (and often impossible to know which data brokers sell data to each other), it is highly likely that the vast majority of users whom Safegraph tracks are unaware of their inclusion in its dataset.
“It is a lot of data”
IDOT filed an initial Procurement Justification seeking raw location data from smartphone apps in 2018. In the request, IDOT laid out the characteristics of the dataset it intended to buy. The agency specifically requested “disaggregate[d] (device-specific)” data from within Illinois and a “50 mile buffer of the state.” It wanted more than 1.3 million monthly active users, or at least 10% of the state’s population, with an average of 125 location pings per day from each user. IDOT also requested that the GPS pings be accurate to within 10 meters on average.
Safegraph’s dataset generally exceeded IDOT’s requirements. IDOT wanted to monitor at least 10% of the state’s population, and Safegraph offered 42%. Also, while IDOT only requested one month’s worth of data for $50,000, Safegraph offered two years of data for the same price: one year of historical data, plus one year of new data “updated at a regular cadence.” As a result, IDOT received precise location traces for more than 5 million people, for two years, for less than a penny per person. On the other hand, Safegraph was only able to provide an average of 56 pings per day, fewer than the requested 125. But as the company assured the agency, that still represented over 50 million data points per day. To quote the agreement, “It is a lot of data.”
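The per-person figure follows from simple division. Here is a back-of-the-envelope check using only numbers quoted in this article: the $49,500 contract price, a floor of 5 million people, and a two-year term. The variable names are illustrative, and because the true count is "more than 5 million," the real per-person cost is somewhat lower than what this computes.

```python
# Figures quoted in the article; names are illustrative.
PRICE_USD = 49_500      # amount IDOT paid under the January 2019 agreement
PEOPLE = 5_000_000      # "more than 5 million", so this is a lower bound
YEARS = 2               # one year historical plus one year of new data

cost_per_person = PRICE_USD / PEOPLE
cost_per_person_year = cost_per_person / YEARS

print(f"${cost_per_person:.4f} per person")
print(f"${cost_per_person_year:.5f} per person per year")
```

Under these assumptions the cost comes out just under one cent per person for the full two years, or about half a cent per person per year.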
Who is Safegraph?
Safegraph is led by Auren Hoffman, a veteran of the data broker industry. In 2006, he founded Rapleaf, a controversial company that aimed to quantify the reputation of users on platforms like eBay by linking their online and offline activity into a single profile. Over time, Rapleaf evolved into a more traditional data broker. It was later acquired by TowerData, a company that sold behavioral and demographic data tied to email addresses. In 2012, Hoffman left to run Rapleaf spinoff LiveRamp, an “identity resolution” and marketing data company that was bought by data broker titan Acxiom in 2014. In 2016, Hoffman departed Acxiom to found Safegraph.
Early on, Safegraph sold bulk access to raw geolocation data through its “Movement Panel” product. It collected data via third-party code embedded directly in apps, as well as from the “bidstream.” Gathering bidstream data is a controversial practice that involves harvesting personal information from billions of “bid requests” broadcast by ad networks during real-time bidding.
In 2019, Safegraph spun off a sister brand, Veraset. Since then, Safegraph has tried to present a marginally more privacy-conscious image on its own website: the company’s “products” page mainly lists services that aggregate data about places, not individual devices. Safegraph says it acquires much of its location data from Veraset, thus delegating the distasteful task of actually collecting the data to its smaller sibling. (The exact nature of the relationship between Safegraph and Veraset is unclear.)
Meanwhile, Veraset appears to have inherited the main portion of Safegraph’s raw data-selling business, including the “Movement Data” product that IDOT purchased. Veraset sells bulk, precise location data about individual devices to governments, hedge funds, real-estate investors, advertisers, other data brokers, and more. On the data broker clearinghouse Datarade, Veraset boasts that it has “the largest, deepest, and most broadly available movement dataset” for the United States. It also offers samples of precise GPS traces tied to advertising IDs. Neither Safegraph nor Veraset discloses the sources of its data beyond vague categories like “mobile applications” and “data compilers.”
One of many IDOT data relationships
IDOT’s purchase from Safegraph was part of a larger project by the agency to model individuals’ transportation patterns. IDOT also worked with HERE Data LLC, another location data broker, and Replica, the company spun off from Google’s Sidewalk Labs. According to IDOT, HERE acquires location data primarily from vehicle navigation services. HERE is owned by a consortium of automakers including BMW, Volkswagen, and Mercedes, and gathers data from connected vehicles under those brands. Replica has been cagey about its data sources, but reports using “mobile location data” as well as “private” sources for real estate and credit transactions.
As noted above, IDOT did not process the data directly. Instead, it shared the raw data with RSG, which was tasked with deriving useful insights for the transportation agency. A memo from RSG to IDOT, dated June 19, 2018, specifically requested that IDOT purchase bulk location data gathered from smartphone apps for RSG to analyze. RSG is a prolific consultant in transportation planning. Its website claims it has worked with “most” major transportation agencies in the U.S. and lists the Federal Highway Administration, the U.S. Department of Transportation, the NY Metropolitan Transportation Authority, the Florida Department of Transportation, and many others as clients.
A Toxic Pipeline
It is no comfort that IDOT did not acquire or process the raw data itself. Its payment to Safegraph normalizes and props up the dangerous market for phone app location data—harvested from millions of Illinois residents who never seriously considered that this sensitive data about them was being collected, aggregated, and shared.
This particular brand of data-sharing is a growing trend around the country. Data brokers vacuum up granular location data from users’ phones with no accountability, and state and local governments help them monetize it. In some cases, agencies mandate that tech companies share traffic data, as in the case of ride-sharing. Over the last decade, this toxic pipeline has aligned government interests with those of data brokers, making it less likely that those same governments will pass laws that crack down on the corporate exploitation of personal data.
Federal laws (like the Fourth Amendment) and state laws (like California’s Electronic Communications Privacy Act) prevent governments from seizing sensitive personal information from personal devices or companies without a warrant. But many government agencies claim that no laws restrict them from purchasing that same data on the open market. We disagree: laws that protect our data privacy from government surveillance have no such “bill me later” exception from the warrant requirement. We expect courts will reject this governmental overreach (unless police evade judicial review by means of evidence laundering). In the meantime, we support legislation to ban such purchases, including the Fourth Amendment Is Not For Sale Act. We also urge app stores to kick out apps that harvest users’ location data—just as Google kicked out Safegraph.
When data flows from a broker to a government transportation agency, this greatly increases the likelihood of further data flow to law enforcement or immigration agencies. This sort of precise, identifiable location data needs far stronger protections at every level—whether in the hands of governments or private entities. But at the moment, third-party aggregators can and do sell their data to government agencies with near-zero accountability.
IDOT and Safegraph might argue that the agency is just obtaining traffic patterns. But the data used for these traffic patterns sheds light on all sorts of private activity, from attendance at a protest and trips to hospitals or churches to where you eat lunch and with whom. Even if it’s done for supposedly innocuous ends, the acquisition of large quantities of granular location data about people is too dangerous.
Agencies tempted to use big data about real people should acquire the minimum information necessary to accomplish their goals. Governments must demand detailed information on the provenance of any personal data that they handle, and refuse to do business with companies like Safegraph that buy, sell, or aggregate sensitive phone app location data from users who have not provided real consent to its collection. The interlocking industries of ad tech and data brokers are responsible for rampant privacy harms, and civic governments must not “green wash” these harms in the name of energy efficiency or transportation planning. As a society, we need safeguards in place to ensure that partnerships between tech and government do not cost us more than we gain.