Tuesday 9 August 2011

Lesson Learned: Always use "_" for Ontology Concepts

While defining ontological concepts for an app I am building, I am finding that it is always better to use the "_" pattern - for example, "Anatomical_Structure" instead of the often-used "AnatomicalStructure". It is so much more convenient for text mining systems and for building interfaces.
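As a quick illustration of why the underscore form is easier to handle downstream (a minimal Python sketch; the concept names are the ones from the example above):

```python
import re

def words_from_concept(name):
    """Split an ontology concept name into plain words for text mining."""
    # Underscore style: a trivial split
    if "_" in name:
        return name.split("_")
    # CamelCase style: needs a regex, and breaks on acronyms like "RNASequence"
    return re.findall(r"[A-Z][a-z]*|[a-z]+", name)

print(words_from_concept("Anatomical_Structure"))  # ['Anatomical', 'Structure']
print(words_from_concept("AnatomicalStructure"))   # ['Anatomical', 'Structure']
```

The underscore branch needs no guessing at word boundaries, which is exactly what a tokenizer or a label-matching gazetteer wants.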


More later...

Wednesday 13 April 2011

Sentiment Analysis on Social Media

This is an interesting article on the topic. Here is one of the quotes, which I think is blunt but spot on:

"Assuming that you manage to get cleaner data from Twitter... then you have to figure out what people actually are saying, in whatever language it is they're writing in. We've put years and years of work into understanding grammar, capitalization, and crap like that, and there ain't none."

Monday 14 February 2011

Ontology 101 using Protege

As part of my work, I have created a basic tutorial on using Protege to create an ontology, which I am happy to share here.

Thursday 10 February 2011

Linked Data Cloud for Local Governments

I was listening to the latest episode in the row between my home council (http://www.nottinghamcity.gov.uk/index.aspx?articleid=1) and the Communities Secretary Eric Pickles (http://en.wikipedia.org/wiki/Eric_Pickles), in which Mr Pickles called the council's deputy, Graham Chapman (remember Monty Python?), a "very naughty boy". The reason for such comments is that the Communities Secretary has urged all councils to publish their expenditure details as part of a "government revolution in transparency", although they will not be forced to do so by law. Nottingham City Council is, I think, one of the very few that has not obliged. Here is what Mr Chapman had to say: "It costs virtually nothing - a couple of thousand pounds - to put it online. It costs another couple of thousand pounds to keep it going online. It's likely to cost about £100,000 to service it."


The jokes and politics (or politics as joke!!) aside, here is a great chance for local governments to take part in the Open Government Data movement and, specifically, to expose the data the Linked Data Cloud way. It might have similar costs to put it online, but the servicing costs can be significantly reduced, and it can allow developers in the wild to generate interesting statistics and applications from it.



Thursday 3 February 2011

JAPE grammar - performance bottleneck

The ({Token})* pattern in a JAPE grammar can severely hit performance, especially when performing text mining on large pieces of text. As advised in the GATE user guide, if you can predict that you won't need to recognise a string of Tokens longer than some x, then it is possible to use

({Token})[0,x]


However, when it is not possible to predict x, is there any workaround? I am trying to figure one out. If anyone else can suggest one, please do.
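For context, here is a sketch of how the bounded form might sit inside a rule (the rule name, the Lookup constraint and the output annotation are all invented for illustration, not taken from the GATE distribution):

```
Rule: BoundedContext
(
  {Lookup.majorType == "disease"}
  ({Token})[0,5]   // at most 5 Tokens - bounded, unlike ({Token})*
):match
-->
:match.DiseaseContext = {rule = "BoundedContext"}
```

The bounded quantifier keeps the matcher from greedily consuming arbitrarily long Token runs across the whole document.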

Friday 7 January 2011

Semantic Web - Story so far

As the name of this blog suggests, I am interested in realizing the Semantic Web vision, so that it becomes a mainstream technology and any web developer can utilize it in development tasks and make sense of it.

Any technology takes off when there are good tools and standards around it and when big players realize its potential and start using it or pushing for it. Clearly, when I started doing research in Semantic Web technologies (around 2004), that was not the case.

I still remember getting confused and being too "theoretical" about Description Logic (DL) and the "Full" and "Lite" versions of OWL, and about which one was best. To add to the confusion, SWRL (Semantic Web Rule Language) was announced, stating that OWL did not have enough expressivity(!).

Also, when it came to tools, Jena and Pellet were two of the main options, and one had to rely on them to do wonderful (semantic) things at the schema level. But when it came to doing the same at a serious scale (on data/instances), they all struggled big time. We looked at using MySQL as a backend with these schema reasoners doing the heavy lifting on the reasoning front, and so on. In terms of a query language there was none, and we did all the application logic using custom APIs. And yes, there was no sight of the Linked Data Cloud...

Roll on the year 2010 and... things have changed... big time.

We have query languages, tools, reasoners, triple stores and, above all, the Linked Data Cloud. I have been around the Semantic Web community long enough to identify the key players (companies/people) that have made this difference, and in this blog I would like to acknowledge them.


1. The original Semantic Web team of Tim Berners-Lee and Jim Hendler.

Tim Berners-Lee, for believing when the rest of us thought the Semantic Web might not work and would be another AI failure. And, of course, for his work at the W3C.

James Hendler - in addition to his continued work on the Semantic Web, for coming up with gems such as the most effective definition of semantics/Linked Data I have seen. In a conversation he explained the necessity of semantics as: "My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that."
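Hendler's point can be made concrete with a single RDF triple: my data can use a URI minted in someone else's dataset as the object, with no special-purpose code. A minimal Python sketch (the example.org URIs are invented; the DBPedia URI is just to show the idea of linking out):

```python
# Each triple is (subject, predicate, object); the object below is a URI
# in a *different* dataset (DBPedia), which is what makes the data "linked".
triple = (
    "http://example.org/mydb/city/nottingham",  # my record (illustrative URI)
    "http://www.w3.org/2002/07/owl#sameAs",     # standard OWL linking predicate
    "http://dbpedia.org/resource/Nottingham",   # a resource in someone else's dataset
)

def to_ntriples(s, p, o):
    """Serialize one triple in N-Triples syntax."""
    return f"<{s}> <{p}> <{o}> ."

line = to_ntriples(*triple)
print(line)
```

No special-purpose code is needed on either side: the link is just data, and any consumer that understands RDF can follow it.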


2. I am surprised that people in the community do not flag this up, but we were not able to sell the "Semantic Web" mainly because the performance of semantic tools was so "poor". There are players in the market that have addressed that, and we have a great deal to thank them for. From my own experience and market observation, I consider (in no particular order of preference)

the main players in this.


3. DBPedia & Linked Data Cloud

The world was waiting for a large-scale semantic dataset, and it arrived in the form of the "Linked Data Cloud" and "DBPedia", which showed for the first time that millions of RDF triples running under OWL/SKOS ontologies can be produced and managed. It not only demonstrated the practical working side of the Semantic Web but also introduced a new architecture for data sharing/integration. The architecture was practical, clearly defined and easy to understand - something one could sarcastically say was not associated with "Semantic Web technologies/specs" before.

For this, the whole of the Linked Data community and the DBPedia team deserve kudos.


4. I believe that any organisation that wants to take advantage of semantic technologies has to look at text mining at some point too, because superimposing a new semantic structure is going to be cost-prohibitive, and text mining can be the solution in many cases. In this regard, toolkits such as GATE and UIMA, or services such as OpenCalais, that have incorporated support for URIs and semantics deserve special mention.


5. As mentioned earlier, coming up with an SQL-like query language was a smart move to put RDF technologies into the hands of seasoned web developers. It all made sense for them once SPARQL was out there.
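A sketch of that SQL-like feel (a hypothetical query against DBPedia-style data; I am assuming the dbo:City class and dbo:populationTotal property here for illustration):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# Cities and their populations - reads much like a SQL SELECT
SELECT ?city ?population
WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
```

SELECT, WHERE, ORDER BY, LIMIT: a web developer who knows SQL can read this on first sight, which is exactly why SPARQL helped adoption.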


6. OWL/RDF/SKOS

I know OWL has its ills and people have their reservations about it. I believe that is more to do with its presentation and with efforts to portray it (at least, that is how it might appear to a novice) as a specialized field. I believe anyone who has done object-oriented programming and comes from the OOP world can easily make sense of OWL.
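To illustrate the OOP analogy, here is a minimal OWL sketch in Turtle (the class names and the example.org namespace are made up):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/anatomy#> .

# Roughly "class Heart extends Anatomical_Structure" in OOP terms
ex:Anatomical_Structure a owl:Class .
ex:Heart a owl:Class ;
    rdfs:subClassOf ex:Anatomical_Structure .
```

Classes, subclassing, properties with domains and ranges: the core vocabulary maps quite naturally onto concepts an OOP developer already has.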

RDF is already here and is going to stay.

I believe SKOS has great potential in many e-commerce, taxonomy and classification applications.
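For instance, a product taxonomy sketched in SKOS (the concept labels and namespace are invented for illustration):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shop#> .

ex:Electronics a skos:Concept ;
    skos:prefLabel "Electronics"@en .
ex:Mobile_Phones a skos:Concept ;
    skos:prefLabel "Mobile Phones"@en ;
    skos:broader ex:Electronics .   # the narrower concept points up the taxonomy
```

skos:broader/skos:narrower give you the whole hierarchy without committing to the heavier formal semantics of OWL.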


7. Google Refine and similar efforts.

We still have lots of data in CSV and XLS files (see data.gov.uk), and as we do not expect everybody to use Linked Data, we need to deal with the fact that data in textual formats has to be converted into RDF (or other variations). Google has provided a solid tool for this; XLWrap is also in the same genre.
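The conversion step itself can be sketched in a few lines of Python (the CSV columns and the output vocabulary are my own invented example, not what Google Refine actually emits):

```python
import csv
import io

# A toy CSV like those on data.gov.uk (invented columns)
csv_text = "council,spend_gbp\nNottingham,100000\nLeeds,85000\n"

BASE = "http://example.org/council/"          # illustrative namespace
PRED = "http://example.org/vocab#spendGBP"    # illustrative predicate

triples = []
for row in csv.DictReader(io.StringIO(csv_text)):
    subject = BASE + row["council"]
    # Literal object with an XSD datatype, in N-Triples syntax
    obj = f'"{row["spend_gbp"]}"^^<http://www.w3.org/2001/XMLSchema#integer>'
    triples.append(f"<{subject}> <{PRED}> {obj} .")

print("\n".join(triples))
```

The real tools add cleaning, reconciliation against existing URIs and template-driven export, but the heart of the job is this row-to-triples mapping.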


8. BBC & other case studies

The BBC team, for coming up with the football World Cup website and talking about the technologies and processes in the open, which is definitely a big booster for the Semantic Web as a product.

The Best Buy case study and Google's acquisition of Freebase are also significant events for Semantic Web uptake.


More to come, as I would like to make this a list of 10 - perhaps your input can complete the list, so comment on...