Objects for all

The code I’ve written for generating Data Cube RDF is mostly in the form of methods stored in modules, which need to be included in a class before they are called. Although these will work just fine for converting your data, those who are not already familiar with the model may wish for a more “friendly” interface using familiar ruby objects and idioms. Though you can access your datasets with a native query language which gives you far more flexibility than a simple Ruby interface ever could, you may not want to bother with the upfront cost of learning a whole new language (even one as lovely as SPARQL) just to get started using the tool.

This is not just a problem for me, this summer; it occurs quite often when object oriented languages are used to interact with some sort of database. Naturally, a lot of work both theoretical and practical has been done to make the connection between the language and the database as seamless as possible. In Ruby, and especially Rails, a technique called Object Relational Mapping is frequently used to create interfaces that allow Ruby objects to stand in for database concepts such as tables and columns.

The ORM translates between program objects and the database (source)

ORM and design patterns derived from it are common features of any web developer’s life today. Active Record, a core Rails gem implementing the design pattern of the same name, has been an important (if somewhat maligned) part of the frameworks success and appeal to new developers. Using Active Record and other similar libraries you can leverage the power of SQL and dedicated database software to it without needing to learn a suite of new language languages.

ORM for the Semantic Web?

There has been work done applying this pattern to the Semantic Web in the form of the ActiveRDF gem, although it hasn’t been updated in quite some time. Maybe it will be picked up again one day, but the reality is that impedance mismatch, where the fundamental differences between object oriented and relational database concepts creates ambiguity and information loss, poses a serious problem for any attempt to build this such a tool. Still, one of the root causes of this problem, that the schema of relational databases are far more constrained than those of OO objects, is somewhat mitigated for the Semantic Web, since RDF representations can often be less constraining than typical OO structures. So there is hope that a useful general mapping tool will emerge in time, but it’s going to take some doing.

Hierarchies and abstraction are a problem for relational DBs, but they’re core concepts in the OWL Semantic Web language (source)

Despite the challenges, having an object oriented interface to your database structures makes them a lot more accessible to other programmers, and helps reduces the upfront cost of picking up a new format and new software, so it has always been a part of the plan to implement such an interface for my project this summer. Fortunately the Data Cube vocabulary offers a much more constrained environment to work in than RDF in general, so creating an object interface is actually quite feasible.

Data Mapping the Data Cube

To begin with, I created a simple wrapper class for the generator module, using instance variables for various elements of the vocabulary, and a second class to represent individual observations.

Observations are added as simple hashes, as shown in this cucumber feature:

With the basic structure completed, I hooked up the generator module, which can be accessed by calling the “to_n3” command on your Data Cube object.

Having an object based interface also makes running validations at different points easier as well. Currently you can specify the <#@ validate_each? @#> option when you create your DataCube object, which if set to true will ensure that the fields in your observations match up with the measures and dimensions you’ve defined. If you don’t set this option, your data will be validated when you call the to_n3 method.

You can also include metadata about your dataset such as author, subject, and publisher, which are now added during the to_n3 method.

The other side of an interface such as this is being able to construct an object from an RDF graph. Although my work in this area hasn’t progressed as far, it is possible to automatically create an instance of the object using the ORM::Datacube.load method, either on a turtle file or an endpoint URL:

All of this is supported by a number of new modules that assist with querying, parsing, and analyzing datasets, and were designed to be useful with or without the object model. There’s still a lot to be done on this part of the project; you should be able to delete dimensions as well as add them, and more validations and inference methods need to be implemented, but most of all there needs to be a more direct mapping of methods and objects to SPARQL queries. Not only does would this conform better to the idea of an ORM pattern, but it will also allow large datasets to be handled much more easily, as loading every observation at once can take a long time and may run up against memory limits for some datasets.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s