
http://www.edcenter.sdsu.edu/presentations/sdscinternposter2001/swb.html

Extending Data Compatibility for
the Sociology Workbench With
XML Parsers, Wrappers, and Stylesheets

Lindsay Stocks, Kirsten Barber, Kris Stewart
NPACI Education Center on Computational Science and Engineering
San Diego State University
Chaitan Baru, Ilya Zaslavsky
Data-Intensive Computing Environments Group
San Diego Supercomputer Center


Abstract

The Sociology WorkBench (SWB) is an interactive social science research and education portal consisting of a simple, intuitive interface to a collection of on-line analysis tools for processing large datasets of survey data. One of the primary goals of version 2.0 of the SWB is to become a key node within the Digital Government Information Integration Testbed and, more generally, within the National Partnership for Advanced Computational Infrastructure (NPACI). An important milestone towards this goal is the development of XML parsing tools and wrappers and their incorporation into the SWB interface.

The primary focus of this REU Project was to achieve this milestone. The first step was to identify an appropriate development tool. Because metadata technologies are still new and evolving rapidly, considerable time was spent exploring and evaluating these tools before a choice was made. With the choice made, many hours of work went into implementing 1) XML data parsers for the SWB Dataset Upload feature and 2) XML wrappers for SWB Frequency Tables, Cross-Tabulation, and Rules Analysis in the SWB interface. In the latter case, the REU Student performed nearly all of the evaluation and development, and is now the 'resident expert' in the field. For the first time, SWB users can upload their XML-formatted survey data into the SWB and receive results from an SWB function in XML format.


Introduction

The Sociology WorkBench (SWB) is an interactive social science portal and a collection of on-line analysis tools for processing large datasets of survey data. The SWB was founded on a collaboration between the Data-Intensive Computing Environments group (DICE) at the San Diego Supercomputer Center (SDSC), the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, and EOT-PACI. For both versions 1.0 and 2.0 of the SWB, the developers have been primarily undergraduate student programmers at the Education Center on Computational Science and Engineering (the 'EdCenter'). In return, the SWB has served as an undergraduate research laboratory where students experiment, explore, and address challenges such as efficient client-server architecture, open-source application servers, database connectivity, security, and user interface design.

This REU Project focuses on integration of the SWB within the framework of the Mediation of Information Using XML (MIX). Following the MIX wrapper-mediator architecture, the resulting SWB wrapper accepts query fragments from mediator middleware, relays them to an Oracle database in PL/SQL or Java, converts the query results into XML, and returns them to the mediator. The DDI (Data Documentation Initiative) format for describing survey data, being developed at ICPSR, is now the SWB's primary input format. Parsing large and complex DDI-compliant XML files and integrating them with data from various legacy systems represents a novel and challenging task.



Project Goals

Main Goal of the SWB ver2.0: One of the primary goals of SWB v2.0 is to become a key node within the Digital Government Information Integration Testbed (DGIIT, see Figure 1) and, more generally, within the National Partnership for Advanced Computational Infrastructure (NPACI). Implementing DDI parsing tools and SWB Oracle wrappers in the SWB interface, the primary focus of this REU Project, would be a significant step towards that goal.

Specific Goals for this REU Project:
1) Become familiar with the XML/DDI format by assisting with design and development of a dataset upload feature, wherein users' data files in XML/DDI format may be uploaded through a web browser into an Oracle 8i database and processed using a servlet-based SWB wrapper,

2) Research and evaluate technologies which may then be used to design and develop a new SWB interface feature in which the user may issue queries against the SWB wrapper and eventually against a statistical mediator interacting with the SWB and other wrappers from an SWB client application,

3) Design, develop, and implement a method to store the data definition information as a query-able XML document, and the survey responses as a relational table accessible via PL/SQL or Java, in the process of ingesting DDI-compliant data.
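As a rough illustration of what a "query-able" data definition document in goal 3 might look like in practice, the sketch below runs XPath queries against a tiny DDI-style fragment using only the standard Java XML APIs. The class name DdiVariableSketch, the sample document, and the variable names AGE and SEX are hypothetical, and the fragment assumes a namespace-free codeBook/dataDscr/var layout with labl children. It is not the SWB's actual ingestion code.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: query variable definitions in a DDI-style document. */
class DdiVariableSketch {

    // Assumed, heavily simplified DDI-style fragment (not a complete codebook).
    static final String SAMPLE =
          "<codeBook><dataDscr>"
        + "<var name=\"AGE\"><labl>Age of respondent</labl></var>"
        + "<var name=\"SEX\"><labl>Sex of respondent</labl></var>"
        + "</dataDscr></codeBook>";

    /** Returns variable name -> label, in document order. */
    static Map<String, String> variableLabels(String ddiXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ddiXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList vars = (NodeList) xpath.evaluate(
                    "/codeBook/dataDscr/var", doc, XPathConstants.NODESET);
            Map<String, String> labels = new LinkedHashMap<>();
            for (int i = 0; i < vars.getLength(); i++) {
                Element var = (Element) vars.item(i);
                // A relative XPath picks up the label text of this variable.
                String label = xpath.evaluate("labl", var).trim();
                labels.put(var.getAttribute("name"), label);
            }
            return labels;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read DDI-style XML", e);
        }
    }

    public static void main(String[] args) {
        variableLabels(SAMPLE).forEach(
                (name, label) -> System.out.println(name + ": " + label));
    }
}
```

In a design like this, the variable names recovered from the definition document could double as column names for the relational table that holds the survey responses, though that mapping is an assumption rather than something the project description specifies.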

Methods

Initial experience with Oracle 8i XML (Extensible Markup Language) was gained by assisting with the development of the SWB's XML-formatted Dataset Upload feature. Once an adequate knowledge of XML was acquired, we went on to explore and evaluate various software tools for XML wrappers and middleware to process Oracle output. Numerous options were encountered including programs such as XSQL Servlet, XDK, and XALAN.

Working mostly independently, I decided on a 'hybrid' approach built on the oracle.xml.sql.query.OracleXMLQuery API

(http://technet.oracle.com/docs/tech/xml/oracle_xsu/doc_library/oracle/xml/sql/query/OracleXMLQuery.html)

using the jar files from the XSQL Servlet along with Java and JDBC to query Oracle tables and produce XML. This was the simplest way to achieve the goal of XML output from Oracle. The alternatives were less convenient: using the XSQL Servlet itself required a lot of Oracle configuration and setup, and XDK requires additions to Oracle. XALAN is another jar file that could have been used, but I chose the jar files I had already downloaded.
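To give a feel for the kind of output this approach produces, the following sketch builds a ROWSET/ROW document from in-memory rows using only the standard Java XML APIs. It is a stand-in, not the Oracle API itself: the class RowsetXmlSketch and the column names REGION and FREQ are hypothetical, and the rows are supplied as maps rather than fetched over JDBC. The ROWSET/ROW element layout with a num attribute mirrors the default shape of Oracle's XML SQL Utility output.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in: serializes tabular rows as ROWSET/ROW XML. */
class RowsetXmlSketch {

    /** Each map is one row: column name -> value, in column order. */
    static String toRowsetXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element rowset = doc.createElement("ROWSET");
            doc.appendChild(rowset);
            int num = 1;
            for (Map<String, String> row : rows) {
                Element rowEl = doc.createElement("ROW");
                rowEl.setAttribute("num", Integer.toString(num++));
                for (Map.Entry<String, String> col : row.entrySet()) {
                    // Tag names come from the column names, so they vary per query.
                    Element colEl = doc.createElement(col.getKey());
                    colEl.setTextContent(col.getValue());
                    rowEl.appendChild(colEl);
                }
                rowset.appendChild(rowEl);
            }
            // An identity transform serializes the DOM tree to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("REGION", "West");
        row.put("FREQ", "42");
        System.out.println(toRowsetXml(Arrays.asList(row)));
    }
}
```

With the real API, the query and the JDBC connection take the place of the in-memory list, and the library generates the element names from the selected columns.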

Once a choice of XML tools was made, I proceeded with the development of XML data parsers and wrappers for the SWB. I chose a gradual implementation strategy, running simple tests, increasing the complexity of the test documents, with the ultimate goal of incorporating XML into the output of the SWB.


Results

As of this presentation, XML data parsers have been successfully integrated into two important parts of the SWB interface. The first is the Dataset Upload feature, wherein users may format their survey data within the XML DDI framework and upload the dataset for immediate analysis with the SWB (see Figures 2 & 3). The second is the return of analytical results from the SWB in XML format. Multiple formats for reporting results are provided, depending on browser support of XML. (See Figure 1)


Figure 3: Sample SWB Output of a DDI-compliant XML document

To produce XML output from Oracle, it was necessary for the SWB team to create SQL functions organized so that Oracle tables can easily be converted into an XML document. This XML-formatted document may then either be displayed on screen or processed using an appropriate stylesheet to create a more usable table. Output tags are determined by the SQL query, and vary with each type of table.

Figure 4: This series of commands underlies the display of tables in the SWB by returning standard XML output but associating it with an XSLT stylesheet using the OracleXMLQuery API.
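The stylesheet step can be sketched with the standard javax.xml.transform API. Note the substitution: the sketch applies the stylesheet on the server side, whereas the commands in Figure 4 associate a stylesheet with the XML output through the OracleXMLQuery API. The class RowsetStylesheetSketch, the embedded stylesheet, and the sample column names are all hypothetical. The input is assumed to follow the ROWSET/ROW layout.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: render ROWSET/ROW XML as an HTML table via XSLT. */
class RowsetStylesheetSketch {

    // Assumed stylesheet: one table row per ROW, one cell per child element.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/ROWSET\">"
        + "<table><xsl:for-each select=\"ROW\">"
        + "<tr><xsl:for-each select=\"*\">"
        + "<td><xsl:value-of select=\".\"/></td>"
        + "</xsl:for-each></tr>"
        + "</xsl:for-each></table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtmlTable(String rowsetXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(rowsetXml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("stylesheet could not be applied", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROWSET><ROW num=\"1\">"
                   + "<REGION>West</REGION><FREQ>42</FREQ>"
                   + "</ROW></ROWSET>";
        System.out.println(toHtmlTable(xml));
    }
}
```

Because the stylesheet selects every child element of each ROW, the same transform works for any table type, even though the output tags differ from query to query.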

SWB users such as regional governments, civil service offices, and demographic survey analysts may now consider using the metadata support offered by the SWB to incorporate their survey analyses into the larger picture of a unified digital government. Readers may explore these new features by navigating with their browser to the SWB ver2.0 home page at:

http://www.edcenter.sdsu.edu/swb/swb2/



Discussion

The goals of this REU project were part of a larger research and education project focused on furthering the metadata compatibility of the SWB within the mediated framework of XML, and these goals have been attained at the time of this presentation.

The technology underlying and supporting XML is still in its infancy, which makes choosing an appropriate tool for the needs of this project a considerable challenge for anyone, particularly an REU student. Some work on our primary goals still remains, but by and large success was achieved against these odds, and the SWB ver2.0 is now a significant step closer to incorporation as a key node in the DGIIT.


Acknowledgements

Special thanks go to Dr. Mikhail Burstein, the EdCenter's former Computer Resource Specialist (CRS), and Jeff Sale, EdCenter Staff Scientist, who helped with mentoring and supervision of my efforts.

 
