Metadata standards – Friend or Foe?

October 2001

"Friend or foe?" is the age-old challenge issued in battle when a stranger approaches. In today’s businesses and organizations, where the imagery is so often of war on the competition, it does not help those buying services or products to find a battle royal being waged on their doorstep between rival suppliers. It can cost you dearly in waste and inefficiency if your suppliers refuse to acknowledge that some of their traditional foes may be among your army of helpers, and take an obstructive rather than a constructive approach to working together.

Much of the waste and inefficiency in the survey process has been caused by the difficulty of bringing together data from different sources, especially when different software packages are used to collect or process it. Each package tends to have its own way of handling both the data and its descriptions or definitions, the part we call the metadata: the data about the data.

While some of this lack of co-operation was undoubtedly Machiavellian in origin, with some suppliers in the past believing the best way to keep customers loyal to their software and services was to make it as hard as possible to transfer projects from their system into anyone else’s, the commonest cause was simply muddle, due to the lack of a common strategy.

Software suppliers started to build specific import and export routines for one another’s formats, but as more packages and data formats arrived every year, the problem got worse, not better.
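A rough count shows why, under the simplifying assumption that every package needs its own one-way converter for each other package's format, whereas with a shared interchange format each package needs just one export and one import:

```latex
C_{\text{pairwise}}(n) = n(n-1), \qquad C_{\text{standard}}(n) = 2n
% Example with n = 30 packages (chosen only for illustration):
% C_pairwise(30) = 30 \times 29 = 870, \quad C_standard(30) = 2 \times 30 = 60
```

The pairwise count grows with the square of the number of packages, while the shared-standard count grows only linearly, which is why each new package made the problem worse.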

Snap Surveys takes a leadership role in enabling co-operation between suppliers and software providers

Faced with the prospect that they would be putting more effort into creating imports and exports than into enhancing their products, three competing survey software producers got together in the early 1990s to thrash out what was to become the world’s first metadata standard for survey data.

That was when Snap Surveys, Pulse Train and Merlinco formed the triple-s group. Today, with standards on everyone’s agenda, Snap Surveys continues to take a leading role in developing them. This not only ensures that Snap Surveys customers benefit from seamless integration between their Snap Survey Software and other systems, but also ensures that the voice of the market and social researcher is heard, and that the unique needs of survey data are acknowledged in the models being developed.

Here, we look at some of the latest developments where Snap Surveys either has a seat at the table or is engaged in the debate, and examine some of the benefits these initiatives could bring.


Triples survey interchange standard

Not only is triples the first computer standard in the survey research world, it is also the most successful to date. Snap is one of over 30 software packages that now support triples. Much of its success is down to its simplicity, which makes it easy for software engineers to incorporate into their products and easy for customers to use.

Triples specifies a way of describing survey data, and the textual descriptions associated with it, in a system-independent, machine-readable format. For a package to be triples compliant, it does not have to store its own data in triples format, but it must be able to create triples data and metadata on demand (the ‘export’ part), and it must be able to create a survey by reading in the triples metadata and then populating it with the data from the triples data file (the ‘import’ part).
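To make the import/export idea concrete, here is a minimal sketch of a package reading a metadata file plus a separate data file, and writing the data back out. It assumes a fixed-width text data file whose column positions are given in the metadata. The element and attribute names (`survey`, `variable`, `value`, `name`, `label`, `start`, `finish`, `code`) and the helper functions are hypothetical simplifications for illustration; the real triples specification defines its own syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata, loosely modelled on the triples idea (NOT the real
# triples syntax): each variable gives its columns in the data file and codes.
METADATA = """
<survey>
  <variable name="Q1" label="Gender" start="1" finish="1">
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
  <variable name="Q2" label="Age" start="2" finish="3"/>
</survey>
"""

# Fixed-width data file: one respondent per line.
DATA = "132\n245\n"


def read_metadata(xml_text):
    """Import step 1: build the survey definition from the metadata."""
    root = ET.fromstring(xml_text.strip())
    variables = []
    for var in root.findall("variable"):
        variables.append({
            "name": var.get("name"),
            "label": var.get("label"),
            "start": int(var.get("start")),    # 1-based, inclusive
            "finish": int(var.get("finish")),  # 1-based, inclusive
            "codes": {v.get("code"): v.text for v in var.findall("value")},
        })
    return variables


def read_data(data_text, variables):
    """Import step 2: populate the survey from the fixed-width data file."""
    records = []
    for line in data_text.splitlines():
        if not line.strip():
            continue
        record = {}
        for var in variables:
            raw = line[var["start"] - 1:var["finish"]].strip()
            # Replace a code with its label where the variable has a code list.
            record[var["name"]] = var["codes"].get(raw, raw)
        records.append(record)
    return records


def write_data(records, variables):
    """Export: turn records back into fixed-width lines."""
    width = max(var["finish"] for var in variables)
    lines = []
    for record in records:
        chars = [" "] * width
        for var in variables:
            value = str(record[var["name"]])
            # Map a label such as "Male" back to its code, if there is one.
            reverse = {label: code for code, label in var["codes"].items()}
            text = reverse.get(value, value)
            size = var["finish"] - var["start"] + 1
            chars[var["start"] - 1:var["finish"]] = text.rjust(size)[:size]
        lines.append("".join(chars))
    return "\n".join(lines) + "\n"
```

Because the package only has to translate to and from this neutral description, it can keep its own internal storage format, which is exactly the compliance rule described above.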

The files created by a triples export are easily transferred over a network, attached to an e-mail or copied onto a diskette or CD-ROM. Though they are designed to be machine readable, they can also be read and understood by people.

Triples continues to evolve. An XML version of the standard was introduced for the new Millennium, and many packages now support it. Very shortly, triples will be upgraded again to support more advanced features, including filter conditions on questions arising from routing logic, and multiple language translations. The standard has been commended by the Object Management Group, a powerful endorsement, and it continues to receive plaudits from other standards proposers because it has been so widely adopted and is in everyday use across the world.


OMG: promoters of the Common Warehouse Metamodel

OMG is an independent body of voluntary members from the commercial software world. It exists to promote and endorse open standards in many different industry-specific applications, and is only interested in initiatives that have a commercially realizable application. For example, it expects any proposal for a new standard to be adopted and implemented by software producers within a year.

Of particular relevance to market and social researchers is OMG’s stewardship of the highly respected Common Warehouse Metamodel or CWM. This major pan-industry initiative is starting to provide an agreed framework for defining the storage and handling of the massive amounts of data in data warehouses.

Steve Jenkins, Snap’s Managing Director, represents the triples group as an active member of one OMG working party, the Analytic Data Management special interest group, alongside representatives of OpenSurvey, another important open standards body. This involvement should ensure that CWM, which looks increasingly likely to be adopted as a standard across the IT industry, will be able to accommodate questionnaire data, unlike most existing data warehouse models.


Metanet: promoting pan-European standards for statistical data

Metanet is a European Union-funded body working to harmonize various metadata initiatives throughout the EU countries. It works closely with Eurostat, the pan-European statistical office, and an important part of Metanet’s remit is to seek ways to co-ordinate a bewildering array of metadata models and initiatives coming from each member state’s national statistical office. Metanet aims to produce a reference book for metadata standards, methodology guides and training manuals.

The issues Metanet is researching are surprisingly relevant to the kinds of survey data Snap users collect. Though the majority of members of Metanet’s various working groups come from the world of government and national statistics, Snap Surveys is involved with the groups looking at methodology and standard descriptions of terms, to ensure that the models proposed will be consistent with the needs of social and commercial researchers too.


DDI: making it easier to go back to survey data from yesteryear

Statistical and social research data is also the concern of an American project that is generating a lot of international interest: the University of Michigan’s Data Documentation Initiative.

DDI has developed an XML-based specification, built on open standards, for recording extensive machine-readable textual descriptions of past surveys so that they are more readily available for re-analysis and interpretation in the future. As analytical tools continue to develop, or as historical data gains new relevance, going back to data collected in formats used by long-gone packages or methods will no longer be the trouble it is today. Instead, structured layers of XML will organize the information and present it to the researcher in an easily accessible way.
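As a minimal sketch of what those structured layers make possible, the code below reads a tiny codebook and recovers the study title, the question wording and the value labels for a variable. The element names (`codeBook`, `stdyDscr`, `dataDscr`, `var`, `labl`, `qstnLit`, `catgry`, `catValu`) are modelled on DDI Codebook element names, but the sample content is invented, and namespaces and many required elements are omitted; treat it as an illustration under those assumptions rather than a conforming DDI document.

```python
import xml.etree.ElementTree as ET

# Invented example codebook using DDI-style element names (simplified).
CODEBOOK = """<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example household survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="Q1">
      <labl>Tenure</labl>
      <qstn><qstnLit>Do you own or rent your home?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Own</labl></catgry>
      <catgry><catValu>2</catValu><labl>Rent</labl></catgry>
    </var>
  </dataDscr>
</codeBook>"""


def describe(codebook_xml):
    """Return the study title and, per variable, its label, question and categories."""
    root = ET.fromstring(codebook_xml)
    # Study-level layer: the citation holds the title.
    title = root.findtext("stdyDscr/citation/titlStmt/titl")
    variables = {}
    # Variable-level layer: one <var> element per survey variable.
    for var in root.iter("var"):
        variables[var.get("name")] = {
            "label": var.findtext("labl"),  # direct child only, not category labels
            "question": var.findtext("qstn/qstnLit"),
            "categories": {
                c.findtext("catValu"): c.findtext("labl") for c in var.findall("catgry")
            },
        }
    return title, variables
```

Because the question text and value labels travel with the data description, a future researcher does not need the original software to interpret the codes.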


OpenSurvey

OpenSurvey is a not-for-profit organization that exists to promote open standards for survey software and encourage the development of open source software. It was founded last year and has wasted no time in proposing two important new standards for surveys: AskML, which will be an XML-based metadata standard for the survey instrument or questionnaire, and TabsML, a similar standard for cross-tab reports. While AskML is still in development, TabsML has been implemented and adopted by three software suppliers, making it easy to produce tables in one package and use another to distribute them or publish them on a web portal.

OpenSurvey has strong links with triples through Keith Hughes, a founder member of the triples group and a member of OpenSurvey, so there is real hope for convergence rather than the proliferation of competing standards seen in other sectors.

While the triples import already lets you receive data from other departments or organizations using different software and load it into Snap, the standard is concerned with describing data that already exists, not with mapping out how an interview works. Adding that would overload triples and lose its valuable simplicity, which is why the triples group is endorsing OpenSurvey’s initiatives.

The Holy Grail for OpenSurvey, and for many survey software users, is that AskML will make survey instruments completely system independent. You would be able, for example, to design a survey in your favourite authoring tool, such as Snap, e-mail it to a data collection specialist on the other side of the world (or in the next street), and be certain they could load it onto their system and use it without any intervention, whatever software they use. All the questions, logic, routing, display options and so on would be faithfully reproduced using the equivalent features of their package.
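The idea of an instrument that any package can run can be sketched as a questionnaire described purely as data, plus a small interpreter that follows its routing. This is a hypothetical structure, not AskML syntax: the question IDs, the `route` table, the `"END"` marker and the `run_interview` function are invented for illustration, assuming routing depends only on the answer to the current question.

```python
# Hypothetical, package-neutral questionnaire: NOT AskML syntax.
QUESTIONNAIRE = {
    "start": "Q1",
    "questions": {
        "Q1": {
            "text": "Do you own a car?",
            "codes": {"1": "Yes", "2": "No"},
            "route": {"1": "Q2", "2": "END"},  # answer code -> next question
        },
        "Q2": {
            "text": "How many cars does your household own?",
            "codes": None,          # open numeric answer, no code list
            "route": {"*": "END"},  # "*" means any answer
        },
    },
}


def run_interview(questionnaire, answers):
    """Follow the routing using pre-supplied answers; return the (question, answer) pairs asked."""
    asked = []
    current = questionnaire["start"]
    while current != "END":
        question = questionnaire["questions"][current]
        answer = answers[current]
        codes = question["codes"]
        if codes is not None and answer not in codes:
            raise ValueError(f"{answer!r} is not a valid code for {current}")
        asked.append((current, answer))
        route = question["route"]
        # Use the route for this specific answer, else the catch-all, else stop.
        current = route.get(answer, route.get("*", "END"))
    return asked
```

A respondent answering "2" to Q1 is routed straight to the end, while "1" leads on to Q2. A receiving package would map each question and route onto its own equivalent features rather than running this interpreter directly.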

United against the common foes of waste and inefficiency

These are among many initiatives that are bringing together organizations and individuals that, in the past, you would not have expected to be talking together and sharing ideas.

Keith Hughes is a founder member of triples and an active member of OpenSurvey, and also finds time to develop his own Merlinco software package. Asked why he feels it is worth the effort to engage with open standards, he comments: "There are two reasons – one is just the ethical belief I have in open standards and open software. I feel my value is in interpreting what technology can do for the researcher, so there is always an edgy feeling in selling software and claiming ownership. The second reason is pragmatic. I was fed up with writing imports to my competitors’ software when they could change it at a moment’s notice without having to tell anyone. At least if we have one common standard to adhere to, there is a lot less pain all round."

Keith believes triples has also allowed developers to specialize, so they no longer feel they must provide the total solution. So long as another package supports triples, customers are not forced to abandon the package they are familiar with, but can use both in tandem.

It is a sentiment echoed by Laurance Gerrard, technical director of Maritz UK, one of the world’s major research agencies. Laurance, a firm advocate of triples, comments: "Without standards like triples, we limit the number of packages we can use and we waste too much time shifting data between the packages we do want to use. Time equals money equals profits. As an industry, we need to make ourselves more efficient, and this means looking to use your time more productively instead of chasing your tail with these mechanical processes.

"I do not believe the market research industry has paid enough attention to standards, or put enough pressure on suppliers. It is down to everyone to start asking their suppliers ‘When are you going to start adhering to standards?’"
