Tuesday, April 30, 2013

Big Apple’s Big Data Beats Washington’s

NYC Opens Government Spending Checkbook to Improve Lives of New Yorkers; USASpending.gov Doesn’t Measure Up

As Washington works to expose the Big Data of Big Government with the reintroduction of the DATA Act, New York City has already solved the Big Data problem associated with its own spending.

Checkbook NYC is the cutting edge of government spending transparency, providing access to disbursement-level details, including the date and recipient of each transaction, in real time. In contrast, disbursement-level transparency is absent from the federal government’s spending transparency website, USASpending.gov, which summarizes each grant and contract but provides no details on transactions.

Updated daily, Checkbook NYC accounts for every dollar the city spends, from contract awards to vendor payments to payroll. USASpending.gov, meanwhile, includes grants and contracts but ignores all internal expenditures. A U.S. PIRG report analyzing city transparency websites found that it was easy to verify the accuracy and completeness of New York’s data by aggregating and comparing across the database (p34). The Sunlight Foundation has found that similar verifications cannot be performed on USASpending.gov, since it provides only a limited subset of federal spending data.
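
The kind of verification described above, aggregating individual disbursements and comparing them against summary figures, can be sketched in a few lines. This is only an illustrative sketch: the record layout, the contract IDs, and the `find_mismatches` helper are all hypothetical and are not taken from Checkbook NYC or USASpending.gov.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical disbursement records: (contract_id, payment_date, amount).
disbursements = [
    ("C-001", "2013-01-15", Decimal("1200.00")),
    ("C-001", "2013-02-15", Decimal("800.00")),
    ("C-002", "2013-01-20", Decimal("500.00")),
]

# Hypothetical summary totals published separately for each contract.
reported_totals = {"C-001": Decimal("2000.00"), "C-002": Decimal("650.00")}


def find_mismatches(disbursements, reported_totals):
    """Return {contract_id: (summed, reported)} where the two figures differ."""
    summed = defaultdict(Decimal)  # Decimal() starts each total at 0
    for contract_id, _date, amount in disbursements:
        summed[contract_id] += amount

    mismatches = {}
    # Check contracts that appear in either source, not just in both.
    for contract_id in set(summed) | set(reported_totals):
        total = summed.get(contract_id, Decimal("0"))
        reported = reported_totals.get(contract_id, Decimal("0"))
        if total != reported:
            mismatches[contract_id] = (total, reported)
    return mismatches


mismatches = find_mismatches(disbursements, reported_totals)
```

A check like this only works when the payments themselves are published. With summary-level data alone there is nothing to add up and compare.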

Checkbook NYC is useful for average citizens and sophisticated coders alike. The data is accessible and intuitively searchable for casual users via the website, which includes plenty of graphs and charts for quick visualizations. Every dataset is also easily downloadable in machine-readable formats so that more sophisticated tech start-ups, watchdog groups, and coders can build their own analysis tools. USASpending.gov does permit bulk downloads, but there are legal limits on how the data can be reused because a private company, D&B, owns the proprietary system of codes that are used to identify grantees and contractors.

New York launched Checkbook NYC after learning – the hard way – how opaque government spending is a breeding ground for waste, fraud, and abuse. CityTime, a massive contracting debacle, served as motivation for opening contract spending data to public oversight. In 1998, the city contracted to improve the payroll system at an expected cost of $63 million over a term of 5 years. Fraud permeated the contract for years as costs ballooned without investigation. More than ten years later and well over $600 million beyond budget, CityTime was still incomplete when the contract was finally terminated. On the federal level, executive-branch overspending motivated the House of Representatives to pass the DATA Act in 2012, but the White House opposed the proposal and it died in the Senate.

Checkbook NYC can't get the wasted CityTime money back, but it can protect New Yorkers from similar corruption in the future. Not only is each and every payment on each contract publicly available, but Checkbook NYC also keeps a running list of master agreement and contract modifications. This means New Yorkers can pinpoint exactly when and by how much a contract goes over budget. As Checkbook NYC continues to develop, subcontractor data will be incorporated into the database and the text of each contract is expected to be digitally published as well.
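
To illustrate how payment-level data plus a list of contract modifications lets anyone pinpoint an overrun, here is a minimal sketch. The function name `first_overrun`, the whole-dollar amounts, and the rule that a modification dated the same day as a payment applies first are assumptions made for the example, not features of Checkbook NYC.

```python
from datetime import date


def first_overrun(original_ceiling, modifications, payments):
    """Find the first payment that pushes cumulative spending above the
    contract ceiling in effect on that date.

    modifications and payments are lists of (date, amount) pairs.
    Returns (date, amount_over_ceiling), or None if there is no overrun.
    """
    # Tag each event so that, on the same date, modifications (0) sort
    # ahead of payments (1).
    events = [(d, 0, amt) for d, amt in modifications]
    events += [(d, 1, amt) for d, amt in payments]

    ceiling = original_ceiling
    paid = 0
    for event_date, kind, amount in sorted(events):
        if kind == 0:
            ceiling += amount  # a modification raises (or lowers) the ceiling
        else:
            paid += amount
            if paid > ceiling:
                return event_date, paid - ceiling
    return None


# Hypothetical contract: $100 ceiling, raised by $50 on March 1.
result = first_overrun(
    100,
    modifications=[(date(2013, 3, 1), 50)],
    payments=[
        (date(2013, 1, 10), 60),
        (date(2013, 2, 10), 30),
        (date(2013, 4, 10), 40),
        (date(2013, 5, 10), 40),
    ],
)
```

In this made-up example the overrun is first visible on the May payment, which is the "when and by how much" question the paragraph above describes.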

Once fully operational, Checkbook NYC will place its background code in the public domain to encourage other state and municipal governments to use and build upon the platform. As NYC Comptroller John Liu put it, "We do not view sharing software as a selfless act; on the contrary, distributing code as 'open source' is a cost-effective way to identify and fix bugs and to leverage new features added by other developers" (p35). 

New York City's efforts to publish spending data provide both a roadmap and a yardstick for federal lawmakers. Full transparency for a government’s spending requires disbursement-level disclosure, inclusion of internal expenditures, and unfettered reusability. Checkbook NYC delivers all three. USASpending.gov delivers none.

We hope the DATA Act, which awaits reintroduction in the 113th Congress, will help Washington measure up to New York.

Tuesday, April 16, 2013

Duplication in Government Programs: Hidden in Opaque Federal Spending Data


On Tuesday, April 9th, the Government Accountability Office (GAO) released its third annual report on duplicative government spending, Actions Needed to Reduce Fragmentation, Overlap, and Duplication and Achieve Other Financial Benefits. The GAO is required by law to conduct routine investigations of duplicative federal activity and report annually to Congress with recommendations for consolidation or elimination. The mandate was inserted by Senator Coburn as an amendment to an increase in the debt limit in 2010.

To comply, the GAO conducted "a systematic and practical examination across the federal government to provide reasonable coverage for areas of potential fragmentation, overlap and duplication government-wide" over three years (p233-234). The 2013 report identified 31 new areas of duplication, fragmentation, overlap, or potential financial benefits, with 81 suggested actions to address the issues identified. In total, the GAO has identified over 160 areas with more than 300 specific actions where the government can increase efficiency or effectiveness. If fully implemented, the GAO estimates savings of tens of billions of dollars annually.

The amount of estimated savings sounds impressive, but the GAO’s method of identifying the savings certainly wasn't.

The GAO started by identifying which agencies obligated more than $10 million for budget functions and sub-functions in FY2010, using the Office of Management and Budget's (OMB) MAX information system. The MAX information system is a nonpublic budget wiki for federal employees. Watchdog groups and citizens do not have access to the same budget information and cannot conduct similar investigations.

Next, the GAO analyzed documents – strategic plans, performance and accountability reports, budget justifications, and independent reports from authorities like the CBO, inspectors general, and the CRS – to find and isolate areas of possible fragmentation, overlap, or duplication.  In other words, the GAO didn’t look at program-level spending itself – just documents summarizing the programs and the spending.

By its own explanation, the GAO takes this approach to the examination because "it is not practical to examine every instance of potential duplication or opportunities for cost savings across the federal government" (p234). The GAO didn’t look at a list of programs with amounts spent – because none exists! The federal government has no master list of programs – a 2010 law requires OMB to create one, and the first list is still in production (p21-p23)  – and even if it did, there is no way to connect such a list with the amount of money being spent on each program. So, foraging through the sea of government documents is the only method of review available to the GAO.

It took the GAO three years to identify the federal catfish inspection programs in triplicate, inefficiency of fragmented military uniform procurement, and overlap in almost 80% of drug treatment programs. The public couldn’t have done this analysis at all, because the public doesn’t have access to data on agencies’ budget actions. Unless the government publishes all its spending data, and electronically matches programs to spending, we’ll never get a handle on government waste. Fortunately, that’s what data transparency is all about – publish everything, and use consistent data identifiers and formats to connect related information.

The bipartisan DATA Act, which passed the House, but not the Senate, last year, would require exactly these actions for federal spending. Under the DATA Act, the federal government would publish all executive branch spending data, including the budget actions that are currently hidden from the public, and develop the data standards necessary to match programs to dollars. During a House Committee on Oversight and Government Reform hearing on the duplicative spending report, Comptroller General Gene Dodaro, who heads the GAO, expressed as much support as an unbiased auditor can. Dodaro said, "I think that transparency is needed. I think there needs to be a statutory underpinning so it's enduring over time and there is consistency. Data standards need to be put in place. I'm very supportive of the need to have that type of legislation that would require that level of transparency and from that transparency can lead to better questions, can lead to better oversight, and hopefully better results."



Under the DATA Act, a complete review of government expenditures could be conducted in real time. Citizens could access the information and conduct independent investigations. The GAO could electronically identify government fragmentation, overlap, and duplication – using the source data, not the summaries-of-summaries it had to use this year. Identifying waste is the first step to trimming a bloated federal budget. The DATA Act provides the transparency the GAO and citizens need to do that.

Monday, April 8, 2013

States Find Savings in Spending Transparency; Federal Government Lags Behind


According to Following the Money 2013, a report by U.S. PIRG, the Federation of State Public Interest Research Groups, state governments are publishing “checkbook-level” spending data on transparency websites and realizing significant cost savings from making this detailed financial information available. When governments publish their spending information at the checkbook level, "users can view the payments made to individual companies and details about the goods or services purchased."


PIRG’s report points out that these actions are bringing benefits beyond transparency itself. States are also realizing significant financial savings by publishing their transactions at the checkbook level. Following the Money provides an excellent breakdown of some of the savings states have achieved after the launch of transparency websites. Texas renegotiated contracts with prison food vendors and copy machine suppliers for tens of millions of dollars of savings. Texas also estimates an additional $4.8 million in savings by identifying areas for greater administrative efficiency. After Utah's website revealed nearly $300,000 spent on bottled water, the state reduced water bottle expenditures by more than 70%. A reporter in South Dakota used that state’s transparency website to launch an investigation into subsidies that led to annual savings of about $19 million through eliminating redundancies.

The most obvious savings came from reductions in costly information requests. Massachusetts saved $3 million in paper, printing, and postage related to information requests and vendor paperwork. Mississippi estimates it saves $750 in staff time for every information request satisfied by the transparency website rather than by a state employee. After South Carolina launched its website, information requests dropped by about two-thirds.

The U.S. government does not make its spending data available at the checkbook level. The primary federal spending transparency website, USASpending.gov, publishes summary details on federal grants and contracts, but it doesn’t publish the payments that are made pursuant to each award.  In addition, federal agencies are not required to publicly report internal spending. As a result, there is no way for taxpayers to see a detailed or complete picture of federal spending.

In fiscal 2010 (the last year for which complete state figures are available), federal expenditures exceeded $3 trillion, more than 150% of the total expenditures of all states combined. How many millions of taxpayer dollars could be saved by eliminating redundant federal subsidies or programs illuminated by checkbook-level spending data? What are the costs of federal Freedom of Information Act requests that would have been unnecessary if the information could be found online? What else might taxpayers have found if they could analyze federal expenditures as they can analyze state expenditures?

States have already begun to realize hundreds of millions of dollars in savings by opening up their checkbooks to citizens, but the federal checkbook remains hidden from the taxpayers who pay the bills. The bipartisan DATA Act, which the House passed unanimously last year, would require the federal government to finally start publishing all executive branch spending – both external grants and contracts and internal expenditures – at the checkbook level.

Monday, April 1, 2013

The DATA Act: Answering Big Government's Big Data Challenge


In the tech industry, “Big Data” is the buzzword of the day. It actually means pretty much what it sounds like – a whole lot of data. TechAmerica's definition gives us some more clarity: Big Data is "large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information." Big Data refers to situations where we have so much data, and so varied, accumulating on a continuous basis, that we have trouble analyzing and using the information with traditional techniques. Big Data is a challenge for modern organizations. Skyrocketing volume, variety, and velocity are making it harder to extract meaning.

The challenge of Big Data is exemplified throughout the U.S. federal government, the world’s largest and most complicated organization. Consider the daunting complexity of federal spending. In FY 2011, the federal government spent over $3 trillion in the form of contracts, grants, loans, direct payments, and other expenditures.

Attached to all that spending is data. Why did Congress appropriate taxpayers’ money the way it did? How did the dozens of federal agencies obligate each appropriation amongst the thousands of federal programs? How did each program divide its funding between internal expenditures (salaries and such) and external awards (contracts, grants, loans, etc.)? Who received each award? How much was received, and when did the Treasury issue each payment? How much has each recipient received, in sum, from multiple federal awards? How have the awards been spent? Where were they spent? The government has the right data – somewhere – to answer each of these questions. Yet with countless distinct agencies or offices using thousands of separate IT systems, it’s essentially impossible to assemble government-wide answers. The needed data resides in too many different places, organized in too many different ways.  

The Big Data definition is not complete without the understanding of the big opportunity it brings. When advanced software tools are used to analyze Big Data, meaning can be extracted – insights that are more accurate, more detailed, and more useful. A report by the McKinsey Global Institute shows that the analysis of Big Data creates value by enabling experimentation, exposing vulnerability, creating transparency, permitting customization, and automating risk assessment. The tech industry is developing the needed tools. Companies are using them to catch fraud in financial markets, prevent drug abuse, find waste in the electric industry, and much more.

Big Data offers the same opportunity for big government. If the same techniques and technologies that are transforming the private sector could be unleashed on federal spending data, we would have government-wide answers to the questions above. Knowing these answers would help the government catch fraudulent contractors and grantees, prioritize resources, and identify and cut waste.

To derive these answers from its Big Data, the U.S. government needs to make two changes. First, it must publish more spending data in one place so that the tools can get at it. Second, it must apply consistent standards to the spending data wherever possible. With common identifiers and a common markup language for federal spending data, Big Data tools could make connections and compare large datasets easily.
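
A small sketch shows why common identifiers matter. Once every award carries the same program and recipient identifiers, government-wide questions reduce to simple lookups and sums. The identifiers, field names, and helper functions below are hypothetical and do not reflect any actual federal data standard.

```python
from collections import defaultdict

# Hypothetical program list, keyed by a shared program identifier.
programs = {
    "P-10": {"agency": "Agency A", "name": "Example Program One"},
    "P-20": {"agency": "Agency B", "name": "Example Program Two"},
}

# Hypothetical awards reported by different agency systems, all using the
# same program and recipient identifiers.
awards = [
    {"program_id": "P-10", "recipient_id": "R-1", "amount": 500},
    {"program_id": "P-20", "recipient_id": "R-1", "amount": 300},
    {"program_id": "P-20", "recipient_id": "R-2", "amount": 200},
    {"program_id": "P-99", "recipient_id": "R-3", "amount": 50},
]


def spending_by_program(programs, awards):
    """Sum award amounts per known program; collect awards that match none."""
    totals = {program_id: 0 for program_id in programs}
    unmatched = []
    for award in awards:
        if award["program_id"] in totals:
            totals[award["program_id"]] += award["amount"]
        else:
            unmatched.append(award)
    return totals, unmatched


def totals_by_recipient(awards):
    """Sum what each recipient received across all programs."""
    totals = defaultdict(int)
    for award in awards:
        totals[award["recipient_id"]] += award["amount"]
    return dict(totals)


program_totals, unmatched = spending_by_program(programs, awards)
recipient_totals = totals_by_recipient(awards)
```

Without shared identifiers, the same join requires matching names and descriptions by hand across systems. That is the situation the two changes above are meant to end.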

The bipartisan Digital Accountability and Transparency Act (DATA Act), which passed the House of Representatives last year on a unanimous voice vote before dying in the Senate, is designed to make these two changes: publish and standardize federal spending data. The DATA Act would have required all the major categories of federal spending data to be published together and reported using common identifiers and in a standard markup language.

The DATA Act’s death was not permanent. It is now awaiting reintroduction in the new 113th Congress. This should excite the tech industry, government transparency advocates, and anyone who cares about the effective use of taxpayers’ money. As the bill's author, Representative Darrell Issa (CA-49), wrote in a recent op-ed, "journalists, academics, and citizen watchdogs will be able to build tools that ferret out fraudulent and wasteful spending and analyze the value taxpayers get for their dollar… the same [B]ig [D]ata analytic techniques that improve performance and save money on behalf of the shareholder today can be used on behalf of the taxpayer tomorrow."

Stay tuned …

Friday, January 25, 2013

Connecting big business (regulation) to Big Data: Columbia report shows the need for action

Last Tuesday Columbia Business School's Center for Excellence in Accounting and Security Analysis released a definitive report evaluating the implementation of a structured data format for the financial statements that public companies file with the U.S. Securities and Exchange Commission. Over a year in the making and based on extensive discussions and surveys with corporate filers, investors, data and filing vendors, regulators, and others, the report illuminates the promise of structured data to better serve investors, improve the enforcement of securities laws, and make the U.S. capital market more efficient. It also reveals serious flaws in the SEC's approach thus far - flaws which have prevented the promise from being realized.

The Columbia report is a call to action by both the SEC and Congress. The Data Transparency Coalition is going to pursue that action in 2013.

In 2009, the SEC adopted a requirement for public companies to file each financial statement in the eXtensible Business Reporting Language (XBRL) alongside the regular plain-text version. The requirement was slowly phased in over four years, starting with the largest companies and eventually covering all public companies. The XBRL format imposes a data structure on the financial statements and their notes and footnotes by assigning electronic tags to each item and defining how the items relate to one another.
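
To give a rough sense of what tagging each item makes possible, the sketch below reads a heavily simplified, XBRL-style fragment and checks one relationship between items. The namespace URI, the element names, and the `read_facts` helper are invented for illustration. Real XBRL filings use official taxonomies and considerably more structure than this.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical fragment: each figure carries a tag naming the
# item, plus attributes for the reporting period and the unit.
SAMPLE = """<report xmlns:ex="http://example.com/hypothetical-taxonomy">
  <ex:Revenues contextRef="FY2012" unitRef="USD">1000</ex:Revenues>
  <ex:CostOfRevenue contextRef="FY2012" unitRef="USD">600</ex:CostOfRevenue>
  <ex:GrossProfit contextRef="FY2012" unitRef="USD">400</ex:GrossProfit>
</report>"""


def read_facts(xml_text):
    """Return {(item_name, period): value} for every tagged figure."""
    facts = {}
    for element in ET.fromstring(xml_text):
        # ElementTree expands prefixes to "{namespace}Name"; keep the name.
        name = element.tag.split("}", 1)[-1]
        facts[(name, element.get("contextRef"))] = int(element.text)
    return facts


facts = read_facts(SAMPLE)

# Because the items are tagged, a relationship between them can be checked
# by software instead of by a person reading the statement.
gross_profit_consistent = (
    facts[("Revenues", "FY2012")] - facts[("CostOfRevenue", "FY2012")]
    == facts[("GrossProfit", "FY2012")]
)
```

The same figures in a plain-text filing would have to be located and transcribed by hand, or scraped, before any such check could run.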

Judging by potential impact, this is the most ambitious data transparency program ever undertaken by the U.S. government. The XBRL reporting requirement transformed all of the public financial statements in the world's largest capital market from cumbersome text, which must be manually transcribed to allow quantitative analysis by investors and regulators, into an open, standardized, machine-readable format.

In theory, replacing unstructured text with structured data should, by now, have triggered revolutions and disruptions all over the financial industry. The SEC's XBRL reporting requirement should, by now, have opened up corporate financial statements in the United States to Big Data platforms and applications.
  • Investors and analysts serving them should, by now, have started using powerful new software tools to compare and analyze the newly-structured financial statements - and to mash financial figures together with other data sources. They should be making better decisions, evaluating a broader universe of companies, and democratizing the financial industry.
  • Aggregators like Bloomberg and Google Finance should, by now, have started saving money and improving accuracy by ingesting corporate financial data directly from the SEC's structured XBRL feed instead of manually entering the numbers into their own systems (or paying someone else to do that).
  • The SEC should, by now, have incorporated structured corporate financial data into its own review processes, instead of relying on manual reviews of the financial statements in Forms 10-K and 10-Q.
  • Other federal agencies should, by now, have started automatically checking the financial performance of companies as reported to the SEC before bestowing contracts or loan guarantees (among many other possible uses).
None of these things is happening on a large scale - yet. The Columbia report explains why. The Columbia report also hints at what the SEC and Congress can, and should, do about it.

What does the Columbia report tell us?
  • Investors are demanding structured data - not unstructured text - to track companies' financial performance. The Columbia authors "have no doubt that [investors'] analysis of companies will continue to be based off increasing amounts of data that are structured and delivered to users in an interactive [structured] format" (p. i). "[T]here is clear demand for timely, structured, machine-readable data including information in financial reports, and ... this need can be met via XBRL as long as the XBRL-tagged data can reduce the total processing costs of acquiring and proofing the data, and that the data are easily integrated (mapped) into current processes" (p. 20).
  • Nonetheless, most investors are not making any use of the structured-data financial statements that public companies are now submitting to the SEC. Fewer than ten percent of the Columbia study's non-scientific sample of investors said they were using XBRL data downloaded directly from the SEC or from XBRL US (p. 61). Instead, most investors were getting their corporate financial information from aggregators like Bloomberg and Google Finance - some free, some not. Moreover, aggregators told Columbia that they were not using XBRL data either. Aggregators were mostly still electronically scraping the old-fashioned plain-text financial statements (which are still being filed alongside the new structured-data financial statements) and manually verifying the numbers - or paying others to do that "labor-intensive" work for them (pp. 26-27).
  • Two problems explain why most investors have not begun to use structured-data financial statements. First, they don't yet trust the data. "XBRL-tagged SEC data are generally perceived by investors as unreliable," say the Columbia authors, both because of errors in numbers and categorization and because of companies' use of unnecessary extensions, hindering comparability (p. 28). Columbia's review of the quality of structured-data financial statements filed with the SEC (conducted two years ago) revealed that fully 73% of filings had data quality errors (p. 32). Moreover, investors reported "a large number of seemingly unnecessary company-specific tags" (p. 21). Investors surveyed by Columbia were "especially hesitant about using the data until they are comfortable that the XBRL data matches the [plain-text] data in SEC filings" (p. 21). Aggregators, too, were holding off until accuracy and comparability improved.
  • Second, investors don't yet have a wide range of software tools to compare and analyze structured-data financial statements. "End users are also looking for easy-to-use XBRL consumption and analysis tools that do not require programming or query language knowledge. In general, these users are not willing or able to incur the significant disruption to their workflow that they perceived would be required to incorporate XBRL data without state-of-the-art consumption and analytics tools." (p. 24)
  • If these two problems were fixed, investors could make enthusiastic and productive use of structured-data financial statements. "[T]he potential for interactive data to democratize financial information and transform transparency remains stronger than ever, and many participants, including most investors and analysts, wish that the data were useful today," say the Columbia authors (p. 4). For instance, "virtually all investors" frequently use information that is available only in the footnotes of corporate financial statements to make their decisions - information that is now submitted and published in XBRL as part of companies' structured-data filings (p. 48). "With respect to the detailed-tagged footnote data, in particular, several investors and analysts have communicated to us that they view XBRL data as potentially an excellent solution to manually collecting the data they need" (p. 31).
  • Even if most investors aren't directly using structured-data financial statements, there will be indirect benefits to investors and the markets if the SEC starts using such data for its own reviews. The study reported that "the SEC has begun to review the data to identify filer-wide, as well as individual company filing and financial reporting issues. XBRL data could significantly enhance the efficiency of the Division of Corporate Finance’s review of filings and facilitate a “red-flag” ex-ante approach to regulatory oversight" (p. 25). "Representatives from the FASB and the SEC have both stated on the record that, in their opinions, the amount of time that it takes them to conduct their respective analyses has been reduced significantly by their use of the XBRL-tagged data" (p. 26). Even imperfectly implemented, the XBRL mandate could indirectly benefit investors and the markets by improving the SEC's review and enforcement processes.
The SEC's XBRL reporting requirement could deliver transformative data transparency. But it has not. So far its impact has been incremental, not transformative.

To be sure, the problems identified by the Columbia study are problems of execution, not shortcomings of XBRL itself or of the concept of structured data. Investors and the analysts serving them "would like to have the U.S. regulatory filings tagged in a structured (e.g., XBRL) format that would meet their information requirements" (p. 5). For the SEC to eliminate the XBRL reporting requirement entirely - as some filers seem to hope that it will - would be a backward move and a tragic mistake.

Nevertheless, structured data for financial statements is, without doubt, "at a critical stage in its development. Without a serious reconsideration of the technology, coupled with a focus on facile usability of the data, and value-add consumption tools, it will at best remain of marginal benefit to the target audience of both its early proponents and the SEC’s mandate—investors and analysts" (p. ii). 

How can these problems be fixed?

How can the SEC fix these problems of reliability and analysis and deliver transformative transparency? The Columbia report suggests four answers:
  • First, insist on accuracy and quality! The SEC does not require companies to amend their filings to correct tagging errors and unnecessary extensions. The Columbia report suggests strongly that it should. The Columbia authors fault "the reticence (or inability) of regulators and filers to ensure that the interactive filings data are accurate and correctly-tagged from day one of their release to the public and forward (or, to communicate to the market for this information that they were not insisting on this and why)" (p. 37). It is "critical" to reduce errors and extensions, either through "greater regulatory oversight and potentially requiring the audit of this data" or through third-party quality checks (pp. 42-43). The SEC's own interests should motivate it to insist on accuracy once it becomes "serious about using the data in its Corporate Finance function and even for enforcement, as it should" (p. 43) (emphasis added). The need to improve quality might require the SEC and the Financial Accounting Standards Board to consider simplifying the underlying XBRL taxonomy (pp. i, 14, 43).
  • Second, communicate that structured data is not a supplemental feature of a regulatory filing. Rather, it is the filing! The Columbia authors explain that "the reliability of the data has been compromised by the way filers have approached their XBRL filings ... [perceiving] XBRL-tagging [as] an additional task in the financial reporting documentation process rather than as a part of the internal data systems" (p. 29). The SEC framed its XBRL reporting rule as a requirement to "create an XBRL-tagged reproduction of the paper or HTML presentations of their filings" (p. 37), rather than "making individual data points available for the end user to utilize or present as they required" (p. 39). Since filers think structured-data financial statements are "incremental to their existing [plain-text] filings, they do not perceive any user need" (p. 35) - and take few pains to ensure that investors using their structured data filings get an accurate picture of their finances. "We believe this presentation-centric step hindered or diverted what should have been an important evolution from a paper presentation-centric view of financial reporting information to a far more transparent and effective data-centric one" (p. 37). One way to correct this situation would be to move to a data format that is both human-readable and machine-readable, combining the plain text and structured-data tags in a single filing. Inline XBRL would do exactly that, and in fact the SEC is considering adopting this format (n. 48).
  • Third, encourage the development of software tools that make structured-data financial statements come alive! This is something of a chicken-and-egg problem. More software tools will be created as investors demand them. But effective, lightweight, cheap XBRL analysis tools are already on offer - notably Calcbench.
  • Fourth, expand the mandate! The Columbia report is clear that investors want more regulatory information tagged and structured, not less (p. 28):
i. The data that are required by the SEC to be XBRL-tagged are all relevant in varying degrees to some subset of the investor/analyst population, but more data are required than currently mandated—e.g., earnings release, MD&A, etc.  
ii. If anything, users require more, not less, types of machine-readable data to be made available, because a significant amount of information they require are not from SEC filings or financial statements. 
iii. The primary focus on data in the SEC filings of annual and quarterly financial statements seriously limits the perceived ongoing usefulness and relevance of the data.
Over and over, the report points out that the SEC's current mandate for structured data is limited to the financial statements and accompanying notes (pp. 14, 18, 21, 24, 34-35, 42). Everything else that companies must file with the SEC under the U.S. securities laws is still submitted only in plain text. These other materials - earnings releases, corporate actions, executive compensation disclosures, proxy statements, officer and director lists, management discussions - could be valuable if tagged. But they are not. Investors "view access to the full array of footnote, management discussion and analysis (MD&A), and earnings release numerical data as the main reason to consider adapting their workflow to incorporate XBRL-tagged filings" (p. 21). But this demand is "pent-up" because such items are not - yet - included in the SEC's mandate (p. 24).

What lies ahead?

The path forward for the SEC is clear. First, the agency must take the basic steps that are necessary to improve the quality of structured-data financial statements. Second, to tap the full potential of structured data, the agency must stop requiring the simultaneous submission of plain-text and structured-data versions of financial statements. It should instead collect a single structured-data version. That would encourage companies, analysts, and the SEC's own staff to focus on data, not on documents. Data transparency requires full standardization as well as publication. Third, the agency must expand its structured-data mandate by phasing in more disclosures: earnings releases, management's discussion and analysis, executive compensation, proxy disclosures, ownership structure, board and officer lists, insider trading reports - and, eventually, everything.

If the SEC is unwilling to act, Congress could insist. Our Coalition will call for the reintroduction, this year, of the Financial Industry Transparency Act. That bipartisan proposal, first introduced in 2010 by Reps. Darrell Issa (R-CA), Edolphus Towns (D-NY), and Spencer Bachus (R-AL), would require these steps as a matter of law.

Friday, January 11, 2013

Coalition joins ITIF's Data Innovation Day - and answers five questions

The Coalition is a partner in the Information Technology & Innovation Foundation's Data Innovation Day, scheduled to take place on January 24th, 2013. ITIF Senior Analyst Daniel Castro interviewed Coalition Executive Director Hudson Hollister for a policy activist's perspective on government data. This interview is also published on the Data Innovation Day site.


5 Q’s on Data Innovation with Hudson Hollister
Hudson Hollister is founder and executive director of the Data Transparency Coalition, a trade association that is advocating for policies that will require federal agencies to publish their data online using standardized, machine-readable, non-proprietary identifiers and markup languages. I asked Hudson to give me his take on how data transparency is unfolding in the federal government.
 
Castro: You’ve been leading the charge in the call for more open data in government. How does data transparency improve government?
  
Hollister: For government, data transparency means that public information is both published online and also electronically standardized in a way that makes it searchable and useful. Data transparency allows citizens to track what their government is doing. Data transparency also allows a government to better manage itself. Since there are so many separate silos within any government, the best way to make sure that public information is available to all managers and staff who need it is simply to publish it.
 
Data transparency isn’t merely good for government. In a democracy, data transparency is an obligation. Public information should be recognized as a public resource. The taxpayers who paid for its creation and collection should have full access to it.
 
And data transparency is good for the tech industry. The members of the Data Transparency Coalition, led by Teradata Corporation, understand that when more government data is published and standardized, they’ll be able to use it for all sorts of new business opportunities.
 
We need data transparency for just about every type of information that any government generates (or requires someone else to report): spending data, management and performance reports, regulatory rules and filings, legislative actions, and judicial documents. And unfortunately the U.S. government does not deliver data transparency in any of those five overlapping areas.
 
Castro: Can you give some examples, either in the United States or in other countries, of where the government has used data successfully? 
 
Hollister: We’re nowhere near true data transparency in the United States, but we do have some examples from innovative agencies that give us a hint of what’s possible.
 
In the spending area, we got our first taste of data transparency from the Recovery Accountability and Transparency Board, the temporary agency that was created to oversee the U.S. federal stimulus spending starting in 2009. The Recovery Board decided to take all the reports that grantees and contractors who received stimulus money were submitting to 28 separate federal agencies, put them in a standardized XML format, and publish them online for everyone to see on Recovery.gov. This allowed both the public and the government a more complete, accurate, and searchable view of spending – albeit only stimulus spending, not all spending – than ever before. And both the public and the government made good use of it. Activists used Recovery.gov to find local examples of both successful stimulus projects and wasteful ones and used them to call for change. Inspectors general at all the agencies used Recovery.gov to deploy sophisticated data analysis tools to find fraud. They recovered $40 million from questionable grantees and contractors and they prevented an additional $30 million from being paid out in the first place.
 
In the regulatory area, one good example of data transparency comes from the Securities and Exchange Commission. In 2009, the SEC started requiring public companies to submit their financial statements in the XBRL format as well as in plain text. This means that every number has an individual electronic tag, making it possible for investors to track companies’ performance across time and against competitors’ without having to enter the data into their own systems or spreadsheets (or pay someone else to do that). The SEC’s system isn’t anywhere near complete. It only applies to financial statements and doesn’t cover the other information that companies submit. But it’s a great start, and tech start-ups are inventing software that uses this data to make financial analysis faster, better, and cheaper.
 
Castro: Both the House and the Senate have introduced versions of the DATA Act. What would this legislation do?
 
Hollister: The Digital Accountability and Transparency Act, or DATA Act, would essentially expand the Recovery Board’s approach to all U.S. federal spending. This proposal would require the executive branch to publish its budget actions, grants and contracts, and disbursements on one website. It would also require standardized identifiers and markup languages to make this information searchable and machine-readable.
 
All this information is already being reported and collected. But it’s managed by four different agencies; some of the systems are public and others are not; and nobody has even tried to come up with common data identifiers or formats.
 
The House of Representatives passed the DATA Act – unanimously! – last April. Then, last September, it was introduced in the Senate by a Democrat and a Republican. But there wasn’t time for it to go through committee in the Senate, so the bill died when the Congressional session ended. The Data Transparency Coalition is campaigning for the re-introduction of the DATA Act in the new 113th Congress.
 
The DATA Act would help the U.S. government move toward data transparency for spending. We are hoping to pursue similar proposals in the other four areas as well.
 
Castro: It seems like some steps are already being taken to create a more open government. Why is federal data transparency legislation necessary? 
 
Hollister: Open government in the United States got a lot of attention when the Obama Administration announced in 2009 that agencies would be required to publish “high-value data sets” in machine-readable formats. The administration has indeed put a good deal of effort into building the electronic infrastructure that will be needed to publish standardized government data.
 
But despite all that attention, the most important data sets in all five of the areas I mentioned – spending, management/performance, regulation, legislation, judicial – are no more transparent than they were before.
 
Let’s take a look at spending data, for example. The flagship U.S. government spending website, USASpending.gov, provides nothing like real data transparency. First, USASpending.gov is incomplete – it only shows grants and contracts while ignoring internal expenditures. Plus it doesn’t show individual payments, just total amounts. Second, USASpending.gov isn’t fully searchable. There’s no way to view all the contracts that a particular company received, because without reliable identifiers the same company might have several different listings in the system. Third, its data is inaccurate because, without standardization, there’s no way to check the data against other systems for quality. The DATA Act would transform USASpending.gov into a complete, fully searchable, and reliable portal by applying the Recovery Board’s approach: publish everything, not just summaries or selections; and standardize it.
 
In the other four areas, the situation is the same or worse. Why is this? Because the most important data sets – the ones that show what the government is doing and what regulated entities are doing – are usually managed by more than one agency, or by more than one office within an agency.
 
There is only one way to get multiple agencies and offices to move toward data transparency, and that’s a legislative mandate.
 
As everyone from Washington Post columnist Dana Milbank to the Government Accountability Office to the American Institute of CPAs to the former chairman of the Recovery Board has said – we need legislation to achieve transparency in federal spending. We need to pass the DATA Act.
 
And eventually, we’ll probably need similar mandates for other types of federal data.
 
Castro: State and local governments also produce a lot of data. What advice would you offer state and local government leaders who are thinking about data transparency? 
 
Hollister: Eventually, I hope our Coalition will have the resources to work with state legislatures and agencies the way we’re already engaging Congress and executive branch leaders.
 
But many state and local governments are already making great strides in data transparency. I’d just encourage them to pursue both principles – publish everything, and also standardize it – and to recognize that to achieve cross-agency and cross-office cooperation sometimes legal mandates are necessary.

Wednesday, January 2, 2013

New year, new Congress: what's next for DATA?

Tomorrow a new U.S. Congress – the 113th, elected last November – will be sworn in. The 112th Congress finished its run with a divisive confrontation on fiscal priorities.

Advocates of opening up government data are rightfully disappointed at the lack of Congressional action on the Digital Accountability and Transparency Act, or DATA Act. The DATA Act, championed by Reps. Darrell Issa (R-CA) and Elijah Cummings (D-MD) in the House of Representatives and by Sens. Mark Warner (D-VA) and Rob Portman (R-OH) in the Senate, would have required the U.S. government to publish all executive-branch spending data on a single Internet portal. Spending data is currently reported to at least seven separate compilations maintained by different agencies, some publicly accessible and others not.

The DATA Act also would have set up consistent data identifiers and markup languages for all spending data, which would have allowed the separate compilations to be searched together, dramatically enhancing government accountability.
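To make the point about identifiers concrete, here is a minimal Python sketch of why a shared identifier lets separate compilations be searched together. Everything in it is hypothetical: the two record lists, the field names, and the `recipient_id` values are invented for illustration and do not reflect any actual federal system or the identifiers the DATA Act would have specified.

```python
from collections import defaultdict

# Hypothetical records from two separate spending compilations.
# Field names and the "recipient_id" identifier are illustrative assumptions.
contract_awards = [
    {"recipient_id": "R-001", "recipient_name": "Acme Corp.", "amount": 120_000},
    {"recipient_id": "R-002", "recipient_name": "Beta LLC", "amount": 45_000},
]
grant_awards = [
    # Same recipient as the first contract, but spelled differently.
    {"recipient_id": "R-001", "recipient_name": "ACME Corporation", "amount": 30_000},
]


def total_by_key(key, *compilations):
    """Sum award amounts across compilations, grouped by the given field."""
    totals = defaultdict(int)
    for compilation in compilations:
        for record in compilation:
            totals[record[key]] += record["amount"]
    return dict(totals)


# Grouping by free-text name splits one recipient into two listings.
by_name = total_by_key("recipient_name", contract_awards, grant_awards)

# Grouping by a consistent identifier combines them into one total.
by_id = total_by_key("recipient_id", contract_awards, grant_awards)

print(by_name)
print(by_id)
```

Grouped by name, the same recipient appears under two spellings and its totals stay split. Grouped by the shared identifier, the contract and the grant roll up into a single figure.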

The DATA Act would generate new opportunities for the tech industry, as our Coalition demonstrated at its DATA Demo Day last summer. If U.S. federal spending data were consistently published and standardized, Big Data analytics companies could develop new ways to find waste and fraud; financial management vendors could offer new services to their existing federal clients; and infomediaries like Google could republish and repackage the data for the public and profit from advertising revenues.

What happened?

The DATA Act was first introduced in both chambers in June 2011, and shortly thereafter was approved by the House Committee on Oversight and Government Reform and sent to the full House of Representatives. After further revisions the bill passed the full House in April 2012. In September 2012, Sens. Warner and Portman rewrote and reintroduced the bill in the Senate to address objections from grantees and contractors. But the next step – consideration by the Senate Homeland Security and Governmental Affairs Committee (HSGAC) and the committee’s recommendation to the full Senate – never occurred, because the committee did not hold a business meeting between the DATA Act’s reintroduction and the end of the Congress. The HSGAC’s leaders – chairman Joseph Lieberman (I-CT) and ranking member Susan Collins (R-ME) – didn’t find the DATA Act important enough to merit the time and effort of a committee meeting, deliberation, and a vote.

With the end of the 112th Congress and the start of the 113th, all pending legislative proposals are wiped out and must be re-introduced.

Is the DATA Act dead? Absolutely not! Chances are good that the bill will be introduced again and passed – by both chambers, this time! – in 2013.

Here’s why. 

The DATA Act’s champions aren't going away. Rep. Issa, the DATA Act's original author, and Sen. Warner, its first Senate sponsor, remain well-placed to champion a new version of the bill, with Issa keeping his position as chairman of the powerful House Committee on Oversight and Government Reform. Meanwhile, term limits and retirement will bring new leadership to the Senate HSGAC: chairman Tom Carper (D-DE) and ranking member Tom Coburn (R-OK). Carper and Coburn may well prove more enthusiastic about pursuing data transparency in federal spending than Lieberman and Collins were. 

The Recovery Board's spending accountability portal IS going away. The original DATA Act was based on lessons learned from the work of the Recovery Accountability and Transparency Board, the temporary federal agency established by Congress in 2009 to oversee the spending mandated by the economic stimulus law. The Recovery Board used standardized electronic identifiers and markup languages to make its online accountability portal fully accurate, complete, and searchable. According to the Recovery Board's most recent report, inspectors general using the portal recovered $40 million in stimulus funds from questionable contractors and grantees and prevented $30 million from being paid out in the first place. The Recovery Board's portal could easily be expanded to cover all spending rather than just stimulus spending. (No reliable government-wide spending accountability portal exists.) But the Recovery Board's legal mandate only covers overseeing the stimulus. The temporary agency will be eliminated when the stimulus law expires on September 30, 2013. Unless Congress acts, the Recovery Board's portal will be decommissioned on that date - and replaced with nothing. The Recovery Board's imminent demise should put pressure on Congress to pass legislation to make sure that the time, money, and effort spent to create the stimulus accountability portal are not wasted. The April 2012 House version of the DATA Act - and amendments our Coalition suggested for the September 2012 Senate version - would do exactly that.

The press is taking notice. Last month, Washington Post columnist Dana Milbank asked why the DATA Act - "so uncontroversial that it passed the House on a voice vote" - has yet to achieve full passage. Milbank's reaction is that of any smart layperson: the need to publish all federal spending data online, and make it fully searchable through electronic standardization, sounds like something the executive branch should be doing already - and if it isn't, Congress should tell it to. 

The nonprofit sector supports the DATA Act. The DATA Act has earned support from a raft of public-interest and political nonprofits, including the American Institute of CPAs, OMB Watch, Americans for Tax Reform, the Sunlight Foundation, and many other organizations and activists on the left and right.

The for-profit sector supports the DATA Act, too. Our Coalition was organized in April 2012 as the only trade association of technology companies calling for federal data reform. We now have eighteen (and growing) members, including companies like Teradata Corporation, RR Donnelley, and WebFilings; nonprofit organizations interested in data standards; and individuals who support our campaign. Our members realize that the DATA Act (and similar policies in other parts of the government's data portfolio, especially financial regulatory reporting) will be good for democracy and bring new opportunities to the technology industry. The Coalition will help keep the spotlight on the DATA Act in the 113th Congress.

Inspectors general and the Government Accountability Office have called for legislation. In testimony before the House of Representatives, the chair of the Council of Inspectors General for Integrity and Efficiency called on Congress to pass the DATA Act. The Comptroller General of the United States, who heads the Government Accountability Office, sent a similar message to the Senate (video - 56:00).
 
There's no other way. Connecting Big Government to Big Data will help citizens and watchdogs confront waste and fraud, while also bolstering our fastest-growing industry. No other proposal promises to do that.
