October 31, 2012

Coalition offers DATA suggestions, prepares for lame-duck push

Our Coalition cheered last month when Sens. Mark Warner (D-VA) and Rob Portman (R-OH) re-introduced the Digital Accountability and Transparency Act (DATA Act). Nonprofit transparency advocates and the Professional Services Council, which represents federal contractors, also lauded the new bill. Though less ambitious than the version passed unanimously by the House of Representatives last spring, the DATA Act, if it becomes law, should push the executive branch toward publishing all spending data online and standardizing it so that it becomes electronically searchable.

Warner and Portman's new version, S. 3600, could be considered by the Senate Homeland Security and Governmental Affairs Committee (HSGAC) next month in the "lame duck" legislative session that follows this Tuesday's elections. If that happens, we'll be ready to show our support.

In the meantime, the Coalition is recommending a few key changes to improve and clarify the bill. You'll find the full set of recommendations embedded below.

The document is a double redline. The underlying text shows existing law. The first set of edits (red) shows how the current Warner/Portman bill would change existing law. The second set (blue) adds the Coalition's recommendations.

Naturally, our recommendations are in legalese. Here's our best shot at a summary in English.
  • Clarify the data standards requirements - and make sure that the standards are fully imposed on USASpending.gov's data.  The central idea of the DATA Act is that all federal spending information should be published and standardized. Until now, it has been nobody's job to establish consistent data standards for the diverse reports, databases, and websites that contain spending details. The Warner/Portman bill adds new paragraph 2(e)(2) to the Federal Funding Accountability and Transparency Act of 2006 (FFATA), which assigns the crucial standardizing task to the Department of the Treasury. We've recommended two changes to make sure the Warner/Portman bill succeeds. First, we've recommended a revision of that new paragraph. The revision clarifies what these standards will look like: common identifiers, such as the standard award ID that the White House has already told Congress it's working on; and common data interchange formats like XML and XBRL. These standards, as much as possible, should be nonproprietary - the government should not be paying anyone for their use. Second, we've recommended language - paragraph 2(b)(5) and clause 2(e)(2)(E)(ii) (can you believe that reference?) - that will make sure that the standards are fully imposed on the information being published on USASpending.gov, the federal spending transparency website that already exists (and was first mandated six years ago by FFATA). The lack of standardization is one reason why USASpending.gov's accuracy has been abysmal. Once standard identifiers and interchange formats are applied to USASpending.gov's data, it will be easier to check that data against other compilations to ensure accuracy.
  • Treasury and OMB should get public input as they implement this law. We've recommended a requirement for Treasury to seek input from standards experts and from the organizations that want to scrutinize federal spending data before it sets the standards (new subparagraph 2(e)(2)(D)). And we've recommended a requirement for the White House to set up an advisory committee, subject to the Federal Advisory Committee Act, to provide public advice on whether the goals of the DATA Act are being achieved (new subsection 2(i)).
  • The Recovery Board has already built the infrastructure. Let's use it!  The Recovery Accountability and Transparency Board, which Congress put in charge of overseeing and tracking stimulus spending, proved that publishing and standardizing the government's spending information can stop waste and fraud. The Recovery Board is winding down: under the stimulus law, it will expire on September 30, 2013. It has built the best accountability infrastructure in the government. For instance, according to the Recovery Board's most recent monthly status report, its Recovery Operations Center has saved taxpayers $20 million in funds recovered from questionable recipients and $30 million by preventing grants and contracts from being paid in the first place. The government shouldn't let these systems - which are already up and running! - and this expertise go to waste. So we've recommended provisions that will transfer the Recovery Board's assets, contracts, key staff, and any remaining funding to the Treasury Department (last two pages).
  • Clarify existing subaward reporting requirements. FFATA already requires grantees and contractors that award subgrants and subcontracts to report on those subawards. We've recommended additional language for paragraph 2(d)(2) of FFATA to ensure that (a) Treasury's new data standards will be applied to these reporting requirements and (b) as much as possible, subaward reporting systems will prepopulate data to reduce the reporting burden.
  • Geospatial analysis, bulk download capability, and more. Throughout the redline below you'll find tweaks designed to ensure that the spending data being published on USASpending.gov is as broadly useful and accessible as possible.
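The first recommendation's core idea - a common award identifier expressed in a nonproprietary interchange format such as XML - can be sketched in miniature. Everything in this sketch is hypothetical: the field names, the "USA-2012-000123" identifier scheme, and the XML layout are our illustrations, not anything drawn from the bill, the White House's planned award ID, or USASpending.gov itself.

```python
# Illustrative only: a made-up award record serialized to a simple,
# nonproprietary XML format keyed by a hypothetical standard award ID.
import xml.etree.ElementTree as ET

def award_to_xml(award: dict) -> str:
    """Serialize an award record to XML, with the standard ID as an attribute."""
    root = ET.Element("award", id=award["award_id"])
    for field in ("recipient", "agency", "amount"):
        child = ET.SubElement(root, field)
        child.text = str(award[field])
    return ET.tostring(root, encoding="unicode")

# Two different publishers reporting the same (invented) award can be
# matched mechanically because they share the "USA-2012-000123" identifier.
grant = {"award_id": "USA-2012-000123", "recipient": "Acme Research",
         "agency": "DOE", "amount": 250000}
print(award_to_xml(grant))
```

Once every database and report carries the same identifier in a documented, royalty-free format, checking USASpending.gov's figures against agency financial systems becomes a matter of joining on that ID rather than guessing which records describe the same award.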
Thanks to the support of our members and vocal allies such as the American Institute of CPAs, the Sunlight Foundation, OMB Watch, and the Project on Government Oversight, there's a real possibility that the DATA Act might pass this year. The U.S. government is closer than it has ever been to publishing comprehensive and standardized spending data.

October 10, 2012

TechAmerica's Big Data Commission: where's the leadership?

Last week the TechAmerica Foundation, the 501(c)(3) arm of TechAmerica, Washington's largest tech industry trade association, released a report entitled Demystifying Big Data: A Practical Guide to Transforming the Business of Government. TechAmerica is billing the report as "a comprehensive roadmap to using Big Data to better serve the American people."

The report defines the Big Data phenomenon. It describes some of the technologies that can be used to derive insights from high-volume, high-velocity, and high-variety government data assets. But it doesn't provide leadership where leadership is needed.

Crucially, Demystifying Big Data ignores the pressing need for the U.S. government to adopt standardized models, interchange formats, and identifiers throughout its data portfolio. In fact, it contains no discussion of data architecture at all. Instead, it advises federal agencies to "start with a specific and narrowly defined business or mission requirement, versus a plan to deploy a new and universal technical platform to support perceived future requirements" (p. 7). It also recommends a "notional flow" for data - "Understand, Cleanse, Transform and Exploit" - that presumes a future without standardized reporting (p. 26).

Moreover, Demystifying Big Data recommends a five-step project process - "Define, Assess, Plan, Execute, Review" (p. 29) - that keeps data's uses tightly bound to original designs. In reality, the promise of Big Data lies in the fact that vast, fast, varied data assets are becoming accessible for analyses for which they weren't originally designed.

To be sure, neither the Data Transparency Coalition nor anyone else recommends a universal, utopian architecture for government data. But all data analysis relies on structure - whether imposed ad hoc at the time of the analysis or pre-existing. And government operations, in the U.S. as elsewhere, are rife with concepts, forms, and relationships that could be structured and should be standardized but aren't. Treasury and OMB use incompatible means of identifying federal agencies. Dozens of regulators use separate, non-interoperable codes for regulated entities, contracts, people, locations, and events. Of the SEC's six hundred reporting forms, two have been partially converted into XBRL and three into XML; the rest are untagged text. The government's failure to adopt consistent vocabularies for regulatory text results in needless ambiguity. The list goes on. Even where standard identifiers and formats have been imposed, they frequently are not built on any underlying data model - which means future changes will be unnecessarily traumatic.

The government's adoption of standardized identifiers and formats, supported by common data models, would allow many Big Data projects to skip right over TechAmerica's "Understand" and "Cleanse" steps, or, at any rate, dramatically reduce the time and money those steps require. Standardization - not universal, but incremental! - would vastly improve the U.S. government's Big Data capabilities. And it's already happening: Treasury's Office of Financial Research is working to implement one standard identifier for regulated entities, the White House has promised to pursue another for federal awards, and Congress is poised to require a data architecture for federal spending that could incorporate both. Our Coalition supports data standardization, in federal spending and in other areas, in part because it'll unleash the power of Big Data.
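To make the "skip the Cleanse step" point concrete, here is a toy sketch of our own (the "LEI-"-style keys, record fields, and values are all invented, and this is not drawn from TechAmerica's report or any real agency dataset): when two datasets share a standard entity identifier, linking them is a mechanical key lookup, even though the free-text name strings disagree and would otherwise need cleansing.

```python
# Invented example: two agencies describe the same entity with different
# name strings, but both key their records by a shared standard identifier.
treasury_records = {
    "LEI-0001": {"name": "Acme Corp.", "contracts": 4},
    "LEI-0002": {"name": "Beta LLC", "contracts": 1},
}
regulator_records = {
    "LEI-0001": {"name": "ACME CORPORATION", "violations": 2},
}

def join_on_identifier(a: dict, b: dict) -> dict:
    """Merge records present in both datasets, keyed by the shared
    identifier. On conflicting fields (like 'name'), b's value wins."""
    return {k: {**a[k], **b[k]} for k in a.keys() & b.keys()}

merged = join_on_identifier(treasury_records, regulator_records)
# "LEI-0001" links the two records despite the differing name strings;
# no fuzzy name matching or cleansing pass is needed.
print(merged)
```

Without the shared key, an analyst would have to reconcile "Acme Corp." with "ACME CORPORATION" by hand or by fuzzy matching - exactly the "Understand" and "Cleanse" work that standardization makes unnecessary.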

The high-tech industry must take the lead in advocating government data standardization, both because it has a civic responsibility and because it has a business opportunity. Unfortunately, Demystifying Big Data doesn't provide that leadership.