Friday, 7 January 2011

Semantic Web - Story so far

As the name of this blog suggests, I am interested in seeing the Semantic Web vision realized, so that it becomes a mainstream technology that any web developer can use in everyday development tasks and make sense of.

Any technology takes off when there are good tools and standards around it, and when big players realize its potential and start using it or pushing for it. Clearly, when I started doing research in Semantic Web technologies (around 2004), that was not the case.

I still remember getting confused and being too "theoretical" about "Description Logic (DL)" and the "Full" and "Lite" versions of OWL, and about which one was the best. To add to the confusion, SWRL (Semantic Web Rule Language) was announced, stating that OWL did not have enough expressivity (!).

Also, when it came to tools, the Jena and Pellet reasoners were two of the main options, and one had to rely on them to do wonderful (semantic) things at the schema level. But when it came to doing the same on a serious scale (on data/instances), they all struggled big time. We looked at using MySQL as a backend with these schema reasoners doing the heavy lifting on the reasoning front, and so on. In terms of query languages there were none, and we did all the application logic using custom APIs. And yes, there was no sight of the Linked Data Cloud...

Roll on the year 2010, and... things have changed... big time.

We have query languages, tools, reasoners, triple stores and, above all, the Linked Data Cloud. I have been around the Semantic Web community long enough to identify the key players (companies/people) that have made this difference, and in this post I would like to acknowledge them.


1. The original Semantic Web team of Tim Berners-Lee and Jim Hendler.

Tim Berners-Lee, for believing when we all thought the Semantic Web might not work and would be another AI failure. And, of course, for his work at the W3C.

James Hendler - in addition to his continued work on the Semantic Web, for coming up with gems such as the most effective definition of semantics/Linked Data I have come across. In a conversation he described the necessity of semantics as: "My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that."


2. I am surprised that people in the community do not flag this up, but we were not able to sell the "Semantic Web" mainly because the performance of semantic tools was so poor. There are players in the market that have addressed that, and we have a great deal to thank them for. From my own experience and market observation, I consider a few of them (in no particular order of preference) the main players in this.


3. DBPedia & Linked Data Cloud

The world was waiting for large-scale semantic datasets, and they arrived in the form of the "Linked Data Cloud" and "DBPedia", which showed for the first time that millions of RDF triples backed by OWL/SKOS ontologies could be produced and managed. This not only demonstrated the practical working side of the Semantic Web but also introduced a new architecture for data sharing/integration. The architecture was practical, clearly defined and easy to understand - something one could sarcastically say was not associated with "Semantic Web technologies/specs" before.

For this, the whole Linked Data community and the DBPedia team deserve kudos.


4. I believe that any organisation that wants to take advantage of semantic technologies has to look at text mining at some point too, because superimposing a new semantic structure manually is going to be cost-prohibitive, and text mining can be the solution in many cases. In this regard, toolkits such as GATE and UIMA, and services such as OpenCalais, that have incorporated support for URIs and semantics deserve special mention.


5. As mentioned earlier, coming up with an SQL-like query language was a smart move to put RDF technologies into the hands of seasoned web developers. It all made sense for them once SPARQL was out there.
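A small illustration of that SQL-like feel, using nothing more than the standard FOAF vocabulary:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# Select people's names and homepages - reads almost like SQL
SELECT ?name ?homepage
WHERE
{
  ?person foaf:name ?name .
  ?person foaf:homepage ?homepage .
}

The SELECT ... WHERE shape is immediately familiar to anyone who has written SQL; only the triple-pattern body is new.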


6. OWL/RDF/SKOS

I know OWL has its ills and people have their reservations about it. I believe that is more to do with its presentation and with efforts to portray it (at least that is how it might appear to a novice of the technology) as a specialized field. I believe anyone who has done object-oriented programming can easily make sense of OWL: classes, subclasses, properties and instances all have rough OOP counterparts.

RDF is already here and is going to stay.

I believe SKOS has great potential in many e-commerce taxonomy and classification applications.


7. Google Refine and similar efforts.

We still have lots of data in CSV and XLS files (see data.gov.uk), and as we cannot expect everybody to publish Linked Data, we need to deal with the fact that data in such textual formats has to be converted into RDF (or other serializations), and Google has provided a solid tool for this. XLWrap is another tool in the same genre.


8. BBC & other case studies

The BBC team, for coming up with the Football World Cup website and talking about the technologies and processes in the open, which is definitely a big booster for the Semantic Web as a product.

The Best Buy case study and Google's acquisition of Freebase are also significant events for Semantic Web uptake.


More to come, as I would like to make this a list of 10 - perhaps your inputs can complete the list, so comment away...

Friday, 10 December 2010

Text Mining and Linked Data Cloud

Text mining technologies and tools have been around for the past decade. The way most text mining engines work, they require a good set of bootstrapping entities in order to perform well (w.r.t. precision and recall). These bootstrapping entities are called gazetteers, authority files, lists, etc. in different tools.

With the emergence of the Linked Data Cloud and its open datasets, there is a great opportunity for text mining to achieve even better results, as entities in these datasets can be used for the bootstrapping; the sketch below shows the idea.
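As a minimal sketch (the Band class is just an example, and any SPARQL endpoint serving DBPedia data would do), a query like this pulls entity URIs and labels that can be loaded straight into a gazetteer:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Pull URIs and English labels of bands to seed a "band names" gazetteer
SELECT DISTINCT ?band ?label
WHERE
{
  ?band rdf:type <http://dbpedia.org/ontology/Band> .
  ?band rdfs:label ?label .
  FILTER (lang(?label) = "en")
}

Each ?label becomes a gazetteer entry, and keeping ?band alongside it means every text match can be annotated with a Linked Data URI rather than a bare string.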

Here is my take on utilizing the Linked Data Cloud with the information extraction system GATE:
Presentation at GATE course in May, 2010.

Monday, 8 March 2010

Benchmarking

This post relates to the files mentioned in our paper:

A Pragmatic Approach to Semantic Repositories Benchmarking
Dhavalkumar Thakker, Taha Osman, Shakti Gohil, Phil Lakin

Please find the PA Dataset queries as a PDF file here.
Please find the UOBM Dataset query results here.


Please note that UOBM is the work of these authors.

Update: The query execution timing results mentioned in the paper need to be revised with respect to new findings about how the BigOWLIM and Sesame query execution mechanisms work. We will publish the updated results and findings soon.

Wednesday, 2 December 2009

SPARQL - extension

I know that there is a working group looking at extending the current SPARQL specification/protocol. There are also some discussions outside the working group that outline requirements for such an extension.

As a contribution to these efforts, in this post I want to start outlining things (queries) that are not possible to write using SPARQL in its current state. The requirement for such query syntax and support stems from the demands of practical Semantic Web development work. Here are some examples. I will appreciate your comments if I am incorrect in assuming that these sorts of queries are not possible to write.

1. A query to verify the cardinality of property values.
For example, in DBPedia, a Band (<http://dbpedia.org/ontology/Band>) has the property <http://dbpedia.org/ontology/genre> with no cardinality restrictions (i.e. a Band can have any number of genres).

Using SPARQL, one cannot extract the Bands which have more than x genres.

The following SPARQL query is not possible to write and execute:

SELECT ?band
WHERE
{
  ?band rdf:type <http://dbpedia.org/ontology/Band> .
  ?band <http://dbpedia.org/ontology/genre> ?genre .
  FILTER (COUNT(?genre) > 3) .
}
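For comparison, an aggregate extension of the sort one would want might look like this (hypothetical syntax - the GROUP BY/HAVING idiom is borrowed from SQL and is not part of current SPARQL):

SELECT ?band (COUNT(?genre) AS ?genreCount)
WHERE
{
  ?band rdf:type <http://dbpedia.org/ontology/Band> .
  ?band <http://dbpedia.org/ontology/genre> ?genre .
}
GROUP BY ?band
HAVING (COUNT(?genre) > 3)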

2. Support for negation. It is well known that SPARQL does not support negation. Negation is obviously an important feature to have; I will demonstrate its usefulness using one scenario.

In the DBPedia ontology, the classification hierarchy for "Organisation" includes "SportsTeam" as a subclass.
If you are mapping your ontology to DBPedia's, and you have a slightly different hierarchy where "SportsTeam" is not a subclass of "Organisation", then in order to retrieve all the instances of the Organisation class that are not a "SportsTeam" you will need negation (ALL(Organisation) - ALL(SportsTeam)).

The alternative is a cumbersome solution where the query involves retrieving the instances of the "Organisation" subclasses except SportsTeam and then merging them; for example:
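A sketch of that cumbersome alternative, UNION-ing the subclasses by hand (only a few of DBPedia's Organisation subclasses are shown; the real query needs every one of them except SportsTeam):

SELECT ?org
WHERE
{
  { ?org rdf:type <http://dbpedia.org/ontology/Company> . }
  UNION
  { ?org rdf:type <http://dbpedia.org/ontology/EducationalInstitution> . }
  UNION
  { ?org rdf:type <http://dbpedia.org/ontology/GovernmentAgency> . }
  # ...and so on for each remaining subclass
}

The other option in current SPARQL is the well-known OPTIONAL/!bound idiom, which works but is hardly more elegant:

SELECT ?org
WHERE
{
  ?org rdf:type <http://dbpedia.org/ontology/Organisation> .
  OPTIONAL
  {
    # ?t binds only when the organisation is also typed as a SportsTeam
    ?org rdf:type ?t .
    FILTER (?t = <http://dbpedia.org/ontology/SportsTeam>)
  }
  # keep only the solutions where no SportsTeam type was found
  FILTER (!bound(?t))
}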

3. Regex support in CONSTRUCT queries:

One of the purposes of CONSTRUCT queries is to map ontologies and datasets. It would be quite useful to be able to specify regex-based URI patterns as part of CONSTRUCT queries, particularly where one wants to reuse the unique & human-readable identifiers from DBPedia. For example, it would be useful to have something like:

CONSTRUCT
{
  <http://myurischeme/resource/"unique id from DBPedia URI"> rdf:type myOntology:myType .
}
WHERE
{
  <http://dbpedia.org/resource/ABC> rdf:type <http://dbpedia.org/ontology/Person> .
}
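As a purely hypothetical sketch of the kind of syntax that could do the job - an IRI constructor fed by string functions, where every function name here is a wish rather than an implemented spec:

CONSTRUCT
{
  ?newUri rdf:type myOntology:myType .
}
WHERE
{
  ?person rdf:type <http://dbpedia.org/ontology/Person> .
  # mint a URI in my scheme from the human-readable tail of the DBPedia URI
  BIND (IRI(CONCAT("http://myurischeme/resource/",
        STRAFTER(STR(?person), "http://dbpedia.org/resource/"))) AS ?newUri)
}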

Friday, 17 July 2009

JAPE grammar tutorial

We are pleased to share the JAPE grammar tutorial we have designed here at Press Association Images, UK. The tutorial can be accessed by clicking here. It is a Zip file including the tutorial in PDF format and a few support files.

Please send your comments through this blog...

Tuesday, 7 July 2009

Converting semantic data from one format to another (N3, N-Triples, RDF/XML, OWL)

I have found a very useful little tool that allows converting data (schemas/instances) to and from N3, N-Triples, RDF/XML and OWL. This utility is particularly handy when your RDF store accepts schemas in one particular format only (in the majority of cases N3/N-Triples). Here are the links to the tool and the syntax which allows you to do the conversion (it took me some time to find the exact syntax, so I hope this saves you some of yours).

http://www.w3.org/2000/10/swap/doc/cwm.html
&
http://infomesh.net/2001/cwm/

Syntax to convert from RDF/XML (OWL) to N3:

python cwm.py --rdf file-to-be-converted.owl --n3 > file-as-output.n3


Or even better...
you can use Protege :)
Open the .owl file in Protege, go to File -> Save As, and save it as .N3 or Turtle. So easy, isn't it?

Thursday, 18 June 2009

Ontology mapping using SPARQL (Triples extraction from DBPedia)

I have been using SPARQL CONSTRUCT for ontology mapping. One of the problems I have encountered during this work is how to get around BNodes in the output graph, as BNodes are ugly, not shareable (if that's the right word!) and, you could say, not at all useful here. What you want in the target graph is a clear URI, as you might want to share this new URI with other graphs you extract from different source ontologies.

I stumbled upon this post, which reaches a similar conclusion. Somewhere in the post the author mentions that

"it appears to be impossible to create new URIs for the resources in the target ontology - only bnodes can be created on the fly."

and that they had to use a post-processing script to manage the bnodes. I am still working on finding a workaround; if I cannot find one, I will have to conclude the same.
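To make the problem concrete, here is a minimal sketch (myOntology is a placeholder): the only node-minting device a CONSTRUCT template offers is a blank node, so every match produces a fresh bnode where a stable URI was wanted:

CONSTRUCT
{
  # _:newNode is instantiated as a brand new blank node for every match -
  # there is no way to mint a real URI here
  _:newNode rdf:type myOntology:Person .
  _:newNode myOntology:extractedFrom ?dbpediaUri .
}
WHERE
{
  ?dbpediaUri rdf:type <http://dbpedia.org/ontology/Person> .
}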

Based on my findings, I will update this post soon....