Snap Surveys

Analysing and Joining data from external sources

September 2003

Introduction

Many Snap users use the software in the conventional way to devise questionnaires, collect and record data and continue on to analysis. For many, the ease with which that whole process is conducted is what attracted them to the software in the first place. Increasingly though, the ability to analyse data from individual surveys is not enough. For some, the data required for analysis may already be available in a database. What is required is the ability to import that data into a form that Snap can manage. Taking that one step further, some have a requirement to join data imported from a database onto data collected from a subsequent survey. For example, where a panel of respondents has been composed, and their details managed in an external database system, it may be required to attach those details (typically unchanging demographic profiles of panel members) onto matching records representing each individual's views on a particular topic of interest. Several new features were recently added to Snap 7, simplifying many of these operations. It is the use and application of these that we consider in this month's Focus On article.

Importing data from external sources

It could be that there is already a store of suitable data available for analysis to resolve particular issues. Clubs and societies may have databases of members; charities and non-profit organizations may have databases of givers; and commercial companies may have details of customers and clients held in a database. In such circumstances it would be wasteful to set up a survey specifically to interview people whose details are already available – albeit in a different form. It is the preparation of such data for use in Snap that we look at first. The process begins by exporting from the database a Comma-Separated Values (CSV) format file containing the data, in which each line holds the record for one respondent and each field represents a response to a particular question for that respondent.

Depending on the facilities available in the exporting program, it may be necessary to edit the content of the file before importing it into Snap. In particular, you would typically want to ensure that the first line contains a list of field names, each of which represents the data in the corresponding field underneath. If the exporting program did not provide this line, a simple way of adding it is to open the file in spreadsheet software such as Microsoft Excel and add the line by hand. The next step is to import the file into Snap. That can be done in one of two ways:

  • Import the data into an empty survey – Snap will create literal variables for each of the incoming fields. It will use the names assigned in the first row when present and also set the labels to the text in the second row.
  • Import into an existing survey. This may be one that has been set up expressly for the purpose. This method has the advantage that dates, quantities and single-response questions can be distinguished from literals, so that as soon as the data is imported it is ready for categorical analysis. The ability to import single-response data immediately is especially useful. If the 'code labels' box is checked then, for fields representing single-response variables, Snap will match up the text found with the code labels. Thus a field recording the respondent's gender as 'Male', 'Female', 'Female', …, 'Male' will be correctly ascribed to the required category.
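The two preparation ideas above, adding a header line of field names and matching text values to code labels, can be illustrated outside Snap with a short Python sketch. This is only an illustration of the idea, not Snap's own implementation: the function names `add_header` and `match_code_labels`, the sample field names and the case-insensitive comparison are all assumptions made for the example.

```python
import csv
import io


def add_header(csv_text, field_names):
    """Return csv_text with a header line of field names prepended."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(field_names)
    return out.getvalue() + csv_text


def match_code_labels(values, code_labels):
    """Map each text value to the 1-based position of the matching code label.

    Assumption: matching ignores case and surrounding spaces. Values with
    no matching label are returned as None (no reply).
    """
    lookup = {label.strip().lower(): code
              for code, label in enumerate(code_labels, start=1)}
    return [lookup.get(value.strip().lower()) for value in values]


# A headerless export, as a database might produce it (hypothetical data).
raw = "1001,Male,34\n1002,Female,29\n1003,Female,41\n"

with_header = add_header(raw, ["ID", "Gender", "Age"])
rows = list(csv.DictReader(io.StringIO(with_header)))

# The text in the Gender field is matched against the code labels.
gender_codes = match_code_labels([row["Gender"] for row in rows],
                                 ["Male", "Female"])
print(gender_codes)  # [1, 2, 2]
```

Here 'Male' is ascribed to code 1 and 'Female' to code 2 because of the order in which the code labels are listed.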

Connecting data from two sources

There are many circumstances where response data collected and recorded for one survey is required to be analysed against or alongside data from another. Here are just three examples:

  • Panel studies. In such studies, an initial recruitment questionnaire is used to collect data such as personal demographics (age, gender, social profile) of respondents. Some or all of the panel members are subsequently set questionnaires on specific topics of interest. By connecting the appropriate panel-member demographics to their responses from one of the topic surveys it is possible to cross-analyse responses by the demographic profile of the respondent. In surveys of this type, each panel member would have no more than one corresponding case in the topic survey (some may have no corresponding topic case because they either weren't invited or didn't respond to the topic questionnaire).
  • Agent studies. Similar to the above in principle, details of agents are recorded through an agent recruitment survey. The agents are subsequently responsible for collating further data on the topic of interest. For example, the agents may be lawyers compiling details of house sales they have acted for. Each of the returned topic questionnaires would thus record characteristics of the purchasers and details of the house type, location and price. By joining cases from the agent survey to corresponding cases from the topic survey it becomes possible to cross-analyse purchase details by agent characteristics – or, if required, by individual agent. In such a survey, each agent would typically be associated with many cases in the topic survey.
  • Serial studies. These are surveys where a succession of questionnaires is posed to individual respondents. For example, medical studies which aim to monitor the effectiveness of courses of treatment, or which aim to categorize expected progress following an operation, may involve selected patients completing a number of questionnaires. Each patient would be given one questionnaire at the outset of treatment, then follow-up questionnaires at predefined intervals. Patients would typically start their treatment at different times; therefore, at some stage during the study, some patients will have completed all of the questionnaires, some just one and the others somewhere in between. Thus, although there is no formal panel of respondents, the effect is to have a rolling program of panel recruitment. By joining corresponding cases from successive surveys, it becomes possible to evaluate and analyse changes between the stages and thus quantify the effectiveness of the treatment or the degeneration of a condition.

In each of these situations it is necessary to give each case in the panel or agent survey a unique key value. For the serial survey, each individual involved would similarly be assigned a unique key value. The key value is then recorded as part of the topic survey data. During the join process, cases where the key values of the two surveys match are effectively cemented together to form a longer record combining selected elements from each of the two contributing surveys.
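The matching of cases on a key value can be sketched in a few lines of Python. This is a minimal illustration of the principle, not a description of how Snap performs the join: the function `join_on_key`, the field names and the sample data are hypothetical, and the choice to keep unmatched topic cases unchanged is an assumption made for the example.

```python
def join_on_key(topic_cases, lookup_cases, key):
    """Attach the fields of the matching lookup case to each topic case.

    Each lookup case (panel member or agent) must have a unique key value.
    Assumption: a topic case with no matching key is kept unchanged.
    """
    lookup = {case[key]: case for case in lookup_cases}
    joined = []
    for case in topic_cases:
        match = lookup.get(case[key], {})
        # Bring across every field except the key, which is already present.
        extra = {name: value for name, value in match.items() if name != key}
        joined.append({**case, **extra})
    return joined


# Agent study: one agent may correspond to many topic cases.
agents = [
    {"AgentID": "A1", "Region": "North"},
    {"AgentID": "A2", "Region": "South"},
]
sales = [
    {"AgentID": "A1", "Price": 120000},
    {"AgentID": "A1", "Price": 95000},
    {"AgentID": "A2", "Price": 150000},
    {"AgentID": "A9", "Price": 80000},  # no such agent on record
]

joined = join_on_key(sales, agents, "AgentID")
for record in joined:
    print(record)
```

Both sales handled by agent A1 pick up that agent's Region, which is the one-to-many situation described for agent studies. A panel study is the same operation with at most one topic case per key.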

Using the Join Import function

Snap's in-built 'join import' function requires that the two sets of data to be joined exist in the form of Snap surveys. If one is initially held in a database or spreadsheet then the import technique discussed earlier can be used to create a Snap survey from which to proceed.

The process runs as follows.

1. Identify one of the surveys and select the File | Join import option. The survey selected would typically be the topic survey in the case of panel or agent surveys. For serial studies it would be appropriate to select the earlier of the two studies to be joined. The wizard includes an automated 'clone' facility which ensures that the result of the join import is a new survey. By choosing to clone the selected survey it becomes possible to repeat the process with different settings if required.

2. The next step is to identify the survey containing cases to be joined. For panel or agent studies this would be the survey of panel members or agent characteristics. For serial studies it would be the survey representing a following stage (usually the next stage) to that represented by the survey chosen in step 1.

3. Following that, the key variables are identified for each of the contributing surveys. The values these variables have for each case will be used to match cases from one survey to corresponding cases in the other.

4. The next step is to select which of the variables from the second survey are to be brought across. These will be the ones available for analysis in the combined survey. Typically you would either select all the variables or select all those that are not Notes.

5. You also need to specify the prefix which will be added to the incoming variables to distinguish them from the variables already in the survey being joined to. The default prefix is "R."; thus a variable Q1 in the incoming survey will be called R.Q1 in the final result.

6. The final step is to verify that the names resulting from applying the above prefix are valid Snap names. Assuming that all is OK, click Finish and the join will be performed.
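The effect of steps 4 to 6 on variable names can be pictured with a small Python sketch. It is illustrative only: the function `prefix_incoming` is hypothetical, and the check shown looks only for clashes with names already in the survey. Snap's own rules for what makes a valid name are not modelled here.

```python
def prefix_incoming(incoming_names, existing_names, prefix="R.", exclude=()):
    """Return the renamed incoming variables and any clashing names.

    incoming_names: variable names in the survey being brought across
    existing_names: variable names already in the survey being joined to
    exclude:        incoming variables that are not to be brought across
    """
    renamed = {name: prefix + name
               for name in incoming_names if name not in exclude}
    # Assumption: the only check made is for duplicates of existing names.
    clashes = sorted(set(renamed.values()) & set(existing_names))
    return renamed, clashes


renamed, clashes = prefix_incoming(
    incoming_names=["Q1", "Q2", "Notes"],
    existing_names=["ID", "Q1", "Q2"],
    exclude=("Notes",),
)
print(renamed)  # {'Q1': 'R.Q1', 'Q2': 'R.Q2'}
print(clashes)  # []
```

Without the prefix, the incoming Q1 and Q2 would collide with the variables of the same name already in the survey, which is why a distinguishing prefix is applied.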

Analysing the joined data is then simply a matter of specifying one or more of the imported variables in the required analysis tables and charts.

Conclusions

New data import and export functions in Snap 7 simplify the exchange of data between Snap and external databases and spreadsheet systems. The inclusion of the label matching function especially enables categorical data from external sources to be imported and immediately available for analysis with minimum set-up. The join-import function provides a powerful mechanism for cross-analysing data from two or more separate, but related, studies.