Friday, 7 January 2011

Semantic Web - Story so far

As the name of this blog suggests, I am interested in realizing the Semantic Web vision so that it becomes a mainstream technology that any web developer can make sense of and use in everyday development tasks.

Any technology takes off when there are good tools and standards around it, and when big players realize the technology's potential and start using it or pushing for it. Clearly, when I started doing research in Semantic Web technologies (around 2004), that was not the case.

I still remember getting confused and being too "theoretical" about Description Logic (DL), the "Full" and "Lite" versions of OWL, and which one was best. To add to the confusion, SWRL (Semantic Web Rule Language) was announced on the grounds that OWL did not have enough expressivity (!).

When it came to tools, the Jena and Pellet reasoners were two of the main options, and one had to rely on them to do wonderful (semantic) things at the schema level. But when it came to doing the same at serious scale (on data/instances), they all struggled big time. We looked at using MySQL as a backend, with these schema reasoners doing the heavy lifting on the reasoning front, and so on. In terms of a query language, there was none, and we implemented all the application logic using custom APIs. And yes, there was no sign of the Linked Data Cloud...

Roll on to 2010 and... things have changed, big time.

We have query languages, tools, reasoners, triple stores and, above all, the Linked Data Cloud. I have been around the Semantic Web community long enough to identify the key players (companies/people) who have made this difference, and in this post I would like to acknowledge them.


1. The original Semantic Web team of Tim Berners-Lee and Jim Hendler.

Tim Berners-Lee, for believing in it when many of us thought the Semantic Web might not work and would become another AI failure. And, of course, for his work at the W3C.

James Hendler, in addition to his continued work on the Semantic Web, for coming up with gems such as the most effective explanation I know of why semantics and the Linked Data Cloud are needed. In a conversation, he described the necessity of semantics as: "My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that."
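Hendler's point can be illustrated with a small sketch in Python (standard library only). It models two independently published datasets as sets of (subject, predicate, object) triples; all the example.org and example.com URIs are made up for illustration, and the FOAF name property is just a familiar vocabulary term. Because an order in "my" dataset names its customer by a URI minted in "your" dataset, combining the two needs no special-purpose glue code: the merge is a plain set union (this ignores blank nodes, which a real RDF merge would have to rename apart).

```python
# Two independently published "databases", each a set of (subject, predicate, object)
# triples. The example.org / example.com URIs are hypothetical.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
CUSTOMER = "http://example.org/mydb/vocab/customer"
ORDER = "http://example.org/mydb/order/42"
ALICE = "http://example.com/yourdb/person/alice"

MY_DB = {
    (ORDER, CUSTOMER, ALICE),  # my data points straight at a resource in yours
    (ORDER, "http://example.org/mydb/vocab/total", "99.50"),
}
YOUR_DB = {
    (ALICE, FOAF_NAME, "Alice"),
}


def merge(*graphs):
    """Merge triple sets by plain set union; shared URIs are what join the data."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


graph = merge(MY_DB, YOUR_DB)
# Follow the link from my order to your customer record, with no custom glue code.
for customer in objects(graph, ORDER, CUSTOMER):
    print(customer, "->", objects(graph, customer, FOAF_NAME))
```

The only shared "schema" here is the URI itself, which is exactly the pointer a relational database cannot make across organisational boundaries.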


2. I am surprised that people in the community do not flag this up, but we were not able to sell the "Semantic Web" mainly because the performance of semantic tools was so "poor". There are players in the market that have addressed that, and we have a great deal to thank them for. From my own experience and market observation, I consider (in no particular order of preference)


the main players in this.


3. DBpedia & Linked Data Cloud

The world was waiting for large-scale semantic datasets, and they arrived in the form of the Linked Data Cloud and DBpedia, which showed for the first time that millions of RDF triples using OWL/SKOS ontologies can be produced and managed. This not only demonstrated the practical, working side of the Semantic Web but also introduced a new architecture for data sharing and integration. The architecture was practical, clearly defined and easy to understand: something one could sarcastically say had not been associated with "Semantic Web technologies/specs" before.

For this, the whole Linked Data community and the DBpedia team deserve kudos.


4. I believe that any organisation that wants to take advantage of semantic technologies will have to look at text mining at some point too, because superimposing a new semantic structure on existing content by hand is going to be cost-prohibitive, and text mining can be the solution in many cases. In this regard, toolkits such as GATE and UIMA, and services such as OpenCalais, which have incorporated support for URIs and semantics, deserve special mention.
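To give a feel for what "support for URIs" means in text mining, here is a deliberately naive Python sketch: a dictionary (gazetteer) lookup that tags known names in free text with URIs. This is a swapped-in toy technique, not how GATE, UIMA or OpenCalais work internally (they use much richer linguistic processing), and the gazetteer entries and DBpedia-style URIs below are illustrative assumptions.

```python
import re

# Hypothetical gazetteer: surface form -> URI. The DBpedia-style URIs are
# illustrative; a real system would derive such a list from a knowledge base.
GAZETTEER = {
    "Tim Berners-Lee": "http://dbpedia.org/resource/Tim_Berners-Lee",
    "W3C": "http://dbpedia.org/resource/World_Wide_Web_Consortium",
    "BBC": "http://dbpedia.org/resource/BBC",
}


def annotate(text, gazetteer):
    """Return (start, end, surface, uri) for each gazetteer name found in text.

    Names are tried longest first, so a longer name wins over a shorter one
    that starts at the same position.
    """
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
            for m in pattern.finditer(text)]


text = "Tim Berners-Lee founded the W3C; the BBC uses its standards."
for start, end, surface, uri in annotate(text, GAZETTEER):
    print(f"{surface!r} [{start}:{end}] -> {uri}")
```

Once mentions carry URIs rather than bare strings, the extracted facts can be merged with Linked Data such as DBpedia without any extra mapping step.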


5. As mentioned earlier, coming up with a SQL-like query language was a smart move to put RDF technologies into the hands of seasoned Web developers. It all made sense to them once SPARQL was out there.
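Part of SPARQL's appeal to SQL developers is that a query is a list of triple patterns whose shared variables behave like join columns. The Python sketch below is a toy evaluator for such a list (a "basic graph pattern"), written only to show the idea; it is not a SPARQL engine, and the `ex:` names are hypothetical. The leading comment shows the SPARQL query it mirrors.

```python
# Mirrors, roughly:
#   PREFIX ex: <http://example.org/>
#   SELECT ?person ?org WHERE {
#     ?person a ex:Person .
#     ?person ex:worksFor ?o .
#     ?o ex:name ?org .
#   }
GRAPH = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:name", "ACME Ltd"),
}


def match(pattern, triple, binding):
    """Try to match one triple pattern; return the extended binding or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # a variable
            if result.get(term, value) != value:
                return None                      # already bound to something else
            result[term] = value
        elif term != value:                      # a constant that does not match
            return None
    return result


def select(graph, patterns, variables):
    """Evaluate the patterns as a chain of joins, then project the variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [{v: b[v] for v in variables} for b in bindings]


query = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:worksFor", "?o"),
    ("?o", "ex:name", "?org"),
]
print(select(GRAPH, query, ["?person", "?org"]))
# [{'?person': 'ex:alice', '?org': 'ACME Ltd'}]
```

Bob drops out because he has no `ex:worksFor` triple, just as an inner join in SQL would drop a row with no match.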


6. OWL/RDF/SKOS

I know OWL has its ills and people have their reservations about it. I believe that has more to do with the way it is presented, and with efforts to portray it as a specialized field (at least, that is how it might appear to a newcomer to the technology). I believe anyone who has done object-oriented programming and comes from the OOP world can easily make sense of OWL.
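To make the OOP analogy concrete, here is a small Python sketch: it describes an ordinary class hierarchy as `owl:Class` and `rdfs:subClassOf` triples, then follows `rdfs:subClassOf` transitively, which is the kind of inference a reasoner draws. The `EX` namespace and class names are hypothetical. The analogy is loose (OWL classes are sets of individuals and OWL reasoning is open-world), but the class hierarchy part carries over almost directly.

```python
class Product: pass
class Book(Product): pass
class EBook(Book): pass

EX = "http://example.org/shop#"  # hypothetical namespace


def owl_class_triples(cls):
    """Describe a Python class as an owl:Class plus rdfs:subClassOf its bases."""
    uri = EX + cls.__name__
    triples = [(uri, "rdf:type", "owl:Class")]
    for base in cls.__bases__:
        if base is not object:
            triples.append((uri, "rdfs:subClassOf", EX + base.__name__))
    return triples


def inferred_types(cls_uri, triples):
    """Follow rdfs:subClassOf transitively to collect every class cls_uri falls under."""
    types, todo = {cls_uri}, [cls_uri]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in types:
                types.add(o)
                todo.append(o)
    return types


ontology = [t for c in (Product, Book, EBook) for t in owl_class_triples(c)]
print(sorted(inferred_types(EX + "EBook", ontology)))
print(isinstance(EBook(), Product))  # True: the same conclusion on the OOP side
```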

RDF is already here and is going to stay.

I believe SKOS has great potential in many e-commerce taxonomy and classification applications.
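As one example of SKOS in e-commerce, a product category tree can be published as SKOS concepts linked by `skos:broader`, and a site can derive breadcrumbs from it. The Python sketch below assumes a hypothetical `cat:` category scheme and, for simplicity, a single broader concept per category (SKOS itself allows several); the SKOS namespace URI is the real one.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical product-category concept scheme.
TAXONOMY = [
    ("cat:electronics", SKOS + "prefLabel", "Electronics"),
    ("cat:cameras", SKOS + "prefLabel", "Cameras"),
    ("cat:cameras", SKOS + "broader", "cat:electronics"),
    ("cat:dslr", SKOS + "prefLabel", "Digital SLR Cameras"),
    ("cat:dslr", SKOS + "broader", "cat:cameras"),
]


def value(triples, subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def breadcrumb(triples, concept):
    """Walk skos:broader links up to a top concept, joining the prefLabels."""
    trail, seen = [], set()
    while concept is not None and concept not in seen:  # guard against cycles
        seen.add(concept)
        trail.append(value(triples, concept, SKOS + "prefLabel"))
        concept = value(triples, concept, SKOS + "broader")
    return " > ".join(reversed(trail))


print(breadcrumb(TAXONOMY, "cat:dslr"))  # Electronics > Cameras > Digital SLR Cameras
```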


7. Google Refine and similar efforts.

We still have lots of data in CSV and XLS files (see data.gov.uk), and as we cannot expect everybody to publish Linked Data, we need to deal with the fact that data in these tabular formats has to be converted into RDF (or other variations). Google has provided a solid tool for this, and XLWrap is in the same genre.
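The core of such a conversion is simple to sketch. The Python example below (standard library only) mints one subject URI per row from a key column and one predicate per column header, and writes plain-literal N-Triples. The base URI, the `vocab#` predicate scheme and the sample rows are made-up assumptions; tools like Google Refine or XLWrap let you map columns onto proper vocabularies and datatypes instead.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/data/"  # hypothetical base URI for minted resources


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, key_column):
    """One subject URI per row, one predicate per column, one literal per cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue  # the key becomes the subject; empty cells are skipped
            predicate = f"<{BASE}vocab#{quote(column)}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


SAMPLE = """id,name,region
school1,Hillside Primary,London
school2,Riverside Academy,Leeds
"""
print(csv_to_ntriples(SAMPLE, "id"))
```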


8. BBC & other case studies

Kudos to the BBC team for building the Football World Cup website and talking openly about the technologies and processes behind it. This is definitely a big boost for the Semantic Web as a product.

The Best Buy case study and Google's acquisition of Freebase are also significant events for Semantic Web uptake.


More to come, as I would like to make this a list of 10. Perhaps your input can complete the list, so comment on!

3 comments:

  1. I'm David Price of TopQuadrant Limited, a relatively new UK subsidiary of TopQuadrant Inc. Just wanted to let you know we're here in the UK now. You can download a one month evaluation of TopBraid Composer, our development environment, and Enterprise Vocabulary Net, our SKOS solution at www.topquadrant.com and contact me dprice at the email address you'd expect.

  2. Thanks David. I know of the product offerings from TopQuadrant. However, to fit with this post, where do you think the work you are doing at TopQuadrant fits into the discussion above? I probably know the answer, but it will be beneficial for readers of this blog to hear it from you.

    Thanks
    Dhaval

  3. The work at TopQuadrant fits into the discussion in three areas: 1) providing a Web application development tool that is agnostic with respect to backend triple stores, with spreadsheet, D2RQ and other importers; 2) developing SPARQL Rules, in part to address reasoner performance issues and to enable transforms of the imported spreadsheets, etc. into proper ontologies/data; and 3) delivering a commercial product supporting SKOS (Enterprise Vocabulary Net).
