Letters from America

Wednesday, June 23, 2010

Toward collaborative information sharing

I gave testimony today in New York on the proposed legislation on open government and open data. This whole world of Web 3.0 collaborative information and Gov 2.0 solutions is supposed to be self-describing, following the W3C and its RDF-based approach to “open data”.

However, it is far from clear how this utopia is to be attained, and at what cost.

The city, of course, is looking at its costs of acquiring data and then supporting it with archives, updates and publication feeds. A harmonized approach could significantly reduce deployment and sustainment costs, and could also open the door to collaborative software development and cost savings for the cities themselves. A common view also helps solution providers market to cities nationally, not just locally.

Perhaps the biggest challenge is the unspoken one: complexity. The further one steps into data sharing, the more one sees opportunities for people to interpose complexity. Keeping things simple, yet consistent and transparent, requires constant vigilance and oversight to ensure that solution providers are not injecting their own self-serving complexity. After all, complexity costs money to build and support, and it is a barrier to competitors, so vendors are naturally drawn to inject it.

This could be the opportunity for standards-based development of a “CityHallXML” that provides the most common information components for financial, infrastructure and performance data, along with census and demographic data.

Also today I published a paper on creating dictionaries of canonical XML information components, aligned with the NIEM.gov approach and the UN/CEFACT Core Components model.

http://www.oasis-open.org/committees/document.php?document_id=38385

This contrasts with the W3C world view of self-describing data instances and RDF. You either have the embedded RDF semantics approach, with all its overhead on each and every data item (aka “Open Data”), or you have this OASIS-based approach, in which semantics are referenced from domain dictionary components and information structure templates. The latter allows comparatively small, concise data instances, where the XML tags provide the cross-reference between the content and the semantics about that content.
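The contrast can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the component names, their definitions and the example.org IRIs are all hypothetical, and N-Triples (one RDF serialization) stands in for the embedded-semantics style, where every value carries full IRIs. The concise form carries only tag names that are keys into a dictionary kept elsewhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical canonical dictionary: tag name -> semantic definition.
# In the dictionary-referenced approach the semantics live here, once,
# rather than being repeated in every data instance.
DICTIONARY = {
    "DepartmentName": "Name of the city department responsible for the budget line.",
    "FiscalYear": "Four-digit fiscal year to which the amount applies.",
    "BudgetAmount": "Approved budget amount in US dollars.",
}

# Hypothetical vocabulary namespace used for the RDF-style encoding.
VOCAB = "http://example.org/citydict#"

record = {"DepartmentName": "Parks", "FiscalYear": "2010", "BudgetAmount": "1250000"}


def as_ntriples(subject_iri: str, rec: dict) -> str:
    """Encode each value as one N-Triples statement with full IRIs.

    Simplification: literals are assumed to need no escaping.
    """
    lines = [f'<{subject_iri}> <{VOCAB}{key}> "{value}" .' for key, value in rec.items()]
    return "\n".join(lines)


def as_concise_xml(rec: dict) -> str:
    """Encode the record as plain XML whose tags are keys into DICTIONARY."""
    root = ET.Element("BudgetLine")
    for key, value in rec.items():
        if key not in DICTIONARY:
            raise KeyError(f"{key} is not a dictionary component")
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


triples = as_ntriples("http://example.org/city/budget/parks-2010", record)
concise = as_concise_xml(record)
print(len(triples), len(concise))
print(concise)
```

The exact size gap depends heavily on the serialization chosen; Turtle with prefix declarations, for instance, shrinks the per-item overhead considerably, so treat this as an illustration of where the semantics live rather than as a measurement.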

Equally important is that the canonical components are built using naming and design rules (NDR) that drive consistency of approach and convergence on terms and meaning.
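To make the idea of NDR concrete, here is a minimal Python sketch of an automated naming check. The rules are hypothetical and simplified: component names must be UpperCamelCase and end in a representation term, a convention loosely modeled on the core components approach. The term list below is an assumption for illustration, not a published NDR.

```python
import re

# Hypothetical, simplified naming and design rules (NDR) for illustration.
# The list of representation terms is an assumption, not a published rule set.
REPRESENTATION_TERMS = (
    "Amount", "Code", "Date", "Identifier", "Indicator",
    "Measure", "Name", "Quantity", "Text",
)

# One or more capitalized words made of letters/digits, e.g. "ContractAwardDate".
UPPER_CAMEL = re.compile(r"^(?:[A-Z][a-z0-9]+)+$")


def ndr_violations(name: str) -> list[str]:
    """Return the list of hypothetical NDR rules that a component name breaks."""
    problems = []
    if not UPPER_CAMEL.match(name):
        problems.append("not UpperCamelCase")
    if not name.endswith(REPRESENTATION_TERMS):
        problems.append("missing representation term suffix")
    return problems


for candidate in ("ContractAwardDate", "ItemDescription", "budget_amt"):
    print(candidate, ndr_violations(candidate))
```

Under these assumed rules, `ContractAwardDate` passes, `ItemDescription` is flagged for lacking a representation term (`ItemDescriptionText` would pass), and `budget_amt` breaks both rules. Checks like this are what push independently built dictionaries toward converging on the same terms.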

All this contrasts with today’s approach of publishing a mega-structure as a schema that contains every possible exchange component for every facet of a business process. That forces developers to unravel the puzzle of what each part of the business process needs from the mega-structure, and they often end up sending redundant or empty data elements. The alternative is dynamic content assembly templates that use only selected parts from a dictionary of canonical components.
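A minimal Python sketch of dynamic content assembly, under stated assumptions: the component names, their definitions and the `PriceQuote` template are hypothetical. The template selects only the components one exchange needs from the dictionary, so backend fields outside the template are never sent, and a required component with no value is reported rather than emitted as an empty element.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary of canonical components (name -> definition).
COMPONENTS = {
    "VendorName": "Legal name of the responding vendor.",
    "ItemIdentifier": "City-assigned identifier of the RFP line item.",
    "UnitPriceAmount": "Quoted price per unit, in US dollars.",
    "DeliveryDate": "Date by which the vendor commits to deliver.",
}

# Hypothetical assembly template: just the dictionary components this exchange uses.
PRICE_QUOTE_TEMPLATE = ["VendorName", "ItemIdentifier", "UnitPriceAmount"]


def assemble(root_name: str, template: list[str], row: dict) -> ET.Element:
    """Build an XML instance containing only the components the template selects."""
    unknown = [name for name in template if name not in COMPONENTS]
    if unknown:
        raise ValueError(f"not in dictionary: {unknown}")
    root = ET.Element(root_name)
    for name in template:
        value = row.get(name)
        if value is None or value == "":
            # Refuse to emit an empty element for a component the template requires.
            raise ValueError(f"template requires {name} but no value was supplied")
        ET.SubElement(root, name).text = str(value)
    return root


# A hypothetical backend row: it holds more fields than this exchange needs.
row = {
    "VendorName": "Acme Supplies",
    "ItemIdentifier": "RFP-17-003",
    "UnitPriceAmount": 42.5,
    "DeliveryDate": "",
    "InternalCostCenter": "CC-9",
}
quote = assemble("PriceQuote", PRICE_QUOTE_TEMPLATE, row)
print(ET.tostring(quote, encoding="unicode"))
```

Because the template is just a list of dictionary names, a new exchange can be defined without publishing a new schema. A production system would still validate the values themselves, which this sketch does not attempt.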

Now, let’s assume everyone drinks this OASIS “Kool-Aid”: they create domain dictionaries of canonical components, and then use shared open source tooling to create their information structures dynamically. The tooling takes care of all the plumbing: templates, extraction and creation of XML instances from backend data stores, and submission to online XML repositories for archiving and exposure through search and retrieval services.

Vendors and government collaborate to develop and deploy open-source-based portals that allow further sharing of, and open access to, data. Additional niche services built with collaborative social platform tools integrate into these portals and deliver a wealth of community-facing solutions to citizens.

Life is good.

This means that, technically, no one strictly needs to publish formal exchange structure schemas any more: exchanges are built dynamically, fit for purpose, by the communities themselves. We saw exactly this need arise recently during the Haiti relief effort, when the OASIS EDXL (Emergency Data Exchange Language) standards had to be extended on the fly to support on-the-ground situations involving hospitals and the services they could provide.

So what is left to achieve in this über Web 3.0 world of data sharing, dominated by XML-based services built on today’s technology underpinnings of SOAP, REST, RSS and HTTP, and on IETF and W3C speak with RDF?

We could envision the need for a triumvirate to manage and steward the way forward, since federal, state and local government stakeholders need independent oversight and technology guidance. This is similar to what NIEM.gov is currently doing federally, and perhaps to what New York and other states are seeking to do today.

Of course, many vendors are out there pitching their wares and setting up stalls, figuring that if they can own a state’s data, they essentially have a license to print money, either by charging those who need access to the data or by pushing targeted advertising at them along with the data they seek. In New York, testimony was heard that “It’s only a small monthly fee or one-time subscription for a week’s access to what you need, and we have analysts to help you”. Notice also that Microsoft has created OData to publish Atom (RSS-style) feeds that also link into SharePoint, while Google has its own open data APIs and associated search tools.

So for the triumvirate, this could mean framing long-term objectives around keeping data sharing truly open, without particular solution providers dominating at the expense of smaller community-based services, or even of the community itself. Information empowers democracy, but it can also be used to track and restrict the freedoms of those who seek truth and equality. Asking suspicious questions can incur penalties, or allow law enforcement to track potential suspects.

Even in the traditional areas of formal, legislated transactional information exchanges for secure B2B, the line will continue to blur as open data use cases encroach on transactional data, and as rising network speeds erode the rationale for what has been thought of as optimized, high-volume exchanges of small transactions.

The blurring is accelerated by building contextual, business-process-driven data exchanges from components drawn from canonical dictionary collections, with embedded links to open data sources. For example, I send the city a price quote for items and embed reference links to my public company profile, to my digital certificate public key registered with the city, and to the item descriptions the city published for the RFP. On contract award, the city can then simply publish the same information that was submitted as the bid.
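The price quote example can be sketched in Python. This is a hypothetical illustration only: the element names and URLs are invented, and W3C XLink’s `xlink:href` is used here as one plausible way to express the embedded reference links, which the text itself does not prescribe. The quote points at data published elsewhere instead of copying it, and the award notice reuses the submitted bid unchanged.

```python
import xml.etree.ElementTree as ET

# XLink is a W3C standard for links in XML; using xlink:href here is an
# illustrative choice, not something the original text prescribes.
XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"
ET.register_namespace("xlink", XLINK_NS)


def build_quote(profile_url: str, certificate_url: str,
                lines: list[tuple[str, str]]) -> ET.Element:
    """Build a price quote that references open data instead of copying it.

    All element names and URLs are hypothetical.
    """
    quote = ET.Element("PriceQuote")
    # Links to data published elsewhere: the company profile and the public
    # key certificate already registered with the city.
    ET.SubElement(quote, "VendorProfile", {XLINK_HREF: profile_url})
    ET.SubElement(quote, "SignerCertificate", {XLINK_HREF: certificate_url})
    for item_url, unit_price in lines:
        line = ET.SubElement(quote, "QuoteLine")
        # Point at the city's own published item description for the RFP.
        ET.SubElement(line, "ItemDescription", {XLINK_HREF: item_url})
        ET.SubElement(line, "UnitPriceAmount").text = unit_price
    return quote


def publish_award(bid: ET.Element) -> ET.Element:
    """Wrap the submitted bid, unchanged, in an award notice for publication."""
    award = ET.Element("ContractAward")
    award.append(bid)
    return award


bid = build_quote(
    "https://example.com/acme/profile",
    "https://example.org/city/certs/acme",
    [("https://example.org/city/rfp/17/items/3", "42.50")],
)
award = publish_award(bid)
print(ET.tostring(award, encoding="unicode"))
```

The design choice worth noting is that the city does no re-keying: the award wraps the very element the vendor submitted.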

This ushers in a very collaborative new world.

A further need, then, is for Web 3.0-enabled portals and services that can publish canonical dictionaries of component definitions, to help drive standardization out in the domain user communities. These then provide authoritative sources of good, high-quality components for building collaborative spaces and information exchanges.

Then, just maybe, the challenge lies beyond data, in rule sharing and rule systems?

If we have solved information sharing, is the next piece of the puzzle the open sharing of the underlying rules, and of the trap doors that can snag the unwary? Clearly, rule-sharing systems are the next step up from data sharing alone, because they have to be built on top of consistent information representations.

Back in 1998, when we started the XML/edi work, we talked about “The Fusion of Five”: XML, EDI, repositories, templates and agents.

Each of these represents:

XML - web foundation
EDI - business methods
Repositories - reference component dictionaries
Templates - process logic for exchanges
Agents - implementation control and intelligent automation tools

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.3149&rep=rep1&type=pdf

If we check off the first four as closed chapters in the brave new Web 3.0 world, is the agent piece the next great frontier? We are already seeing related work, such as the OASIS SET TC, which is providing a framework for automating information mapping.
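Mapping between vocabularies is one place where such automation pays off. The Python sketch below is not the SET TC’s method; it is a generic, hypothetical illustration of pivoting through canonical components, with made-up local field names for two cities.

```python
# Hypothetical mappings from two local vocabularies to shared canonical
# component names; neither vocabulary is taken from a real standard.
CITY_A_TO_CANONICAL = {"dept": "DepartmentName", "amt": "BudgetAmount"}
CITY_B_TO_CANONICAL = {"Agency": "DepartmentName", "Dollars": "BudgetAmount"}


def translate(record: dict, source_map: dict, target_map: dict) -> dict:
    """Map a record between vocabularies by pivoting through canonical names."""
    # Invert the target mapping: canonical name -> target local name.
    canonical_to_target = {canon: local for local, canon in target_map.items()}
    out = {}
    for local_name, value in record.items():
        canonical = source_map.get(local_name)
        if canonical is None or canonical not in canonical_to_target:
            raise KeyError(f"cannot map {local_name!r} to the target vocabulary")
        out[canonical_to_target[canonical]] = value
    return out


print(translate({"dept": "Parks", "amt": "1250000"},
                CITY_A_TO_CANONICAL, CITY_B_TO_CANONICAL))
```

The design point is that each vocabulary is mapped once, to the canonical dictionary, rather than once for every other vocabulary it must exchange data with.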

Clearly the world is redefining what is perceived as possible and what requires better solutions and standard representations.

// posted by DRRW @ 10:31 AM