Friday 10 December 2010

Text Mining and Linked Data Cloud

Text mining technologies and tools have been around for the past decade. Most text mining engines need a good set of bootstrapping entities to perform well (in terms of precision and recall). Different tools call these bootstrapping entities gazetteers, authority files, lists, etc.
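To show what a gazetteer contributes, here is a minimal sketch of gazetteer lookup in Python. It is not how GATE's own gazetteer is implemented. It assumes whitespace tokenization and a greedy longest-match strategy, and the names `build_gazetteer` and `annotate` and the sample entries are hypothetical.

```python
from typing import Dict, List, Tuple

PUNCT = ".,;:!?"


def build_gazetteer(entries: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Map each lower-cased token sequence of an entry to its entity type."""
    return {tuple(name.lower().split()): etype for name, etype in entries.items()}


def annotate(text: str, gazetteer: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of gazetteer entries over whitespace tokens."""
    tokens = text.split()
    max_len = max((len(key) for key in gazetteer), default=0)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower().strip(PUNCT) for t in tokens[i:i + n])
            if key in gazetteer:
                surface = " ".join(tokens[i:i + n]).strip(PUNCT)
                found.append((surface, gazetteer[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return found


gaz = build_gazetteer({
    "Sheffield": "Location",
    "University of Sheffield": "Organization",
    "GATE": "Software",
})
print(annotate("GATE was developed at the University of Sheffield.", gaz))
```

Longest match is why "University of Sheffield" is tagged as an organization here instead of leaving "Sheffield" as a location. The quality of the output depends entirely on how complete the entry list is, which is the bootstrapping problem described above.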

With the emergence of the Linked Data cloud and its open datasets, there is a great opportunity to get even better text mining results by using the entities in these datasets for bootstrapping.
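As a rough illustration of that bootstrapping step, the sketch below turns `rdfs:label` triples from an N-Triples dump into a deduplicated list of entry names. The sample DBpedia-style lines and the function name `labels_to_gazetteer` are assumptions for illustration. The regex handles only simple triples and does not unescape N-Triples string escapes. In practice the labels might instead come from a SPARQL endpoint.

```python
import re
from typing import Iterable, List

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Matches: <subject> <rdfs:label> "literal"@lang .  (language tag optional)
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<' + re.escape(RDFS_LABEL)
    + r'>\s+"((?:[^"\\]|\\.)*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?\s*\.\s*$'
)


def labels_to_gazetteer(lines: Iterable[str], lang: str = "en") -> List[str]:
    """Collect unique rdfs:label literals in the given language (or untagged)."""
    seen = set()
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # not an rdfs:label triple with a literal object
        _subject, label, tag = m.groups()
        if tag is not None and tag.lower() != lang.lower():
            continue  # label in another language
        seen.add(label)
    return sorted(seen)


dump = [
    '<http://dbpedia.org/resource/London> <http://www.w3.org/2000/01/rdf-schema#label> "London"@en .',
    '<http://dbpedia.org/resource/London> <http://www.w3.org/2000/01/rdf-schema#label> "Londres"@fr .',
    '<http://dbpedia.org/resource/Sheffield> <http://www.w3.org/2000/01/rdf-schema#label> "Sheffield"@en .',
    '<http://dbpedia.org/resource/Sheffield> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .',
]
print(labels_to_gazetteer(dump))
```

The resulting names could be written out one per line, which is the plain-text layout of the `.lst` list files used by GATE's ANNIE gazetteer. The mapping of each list to an annotation type would still need to be declared separately.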

Here is my take on using the Linked Data cloud with the information extraction system GATE.
Presentation at the GATE course, May 2010.

Monday 8 March 2010

Benchmarking

This post relates to the files mentioned in our paper:

A Pragmatic Approach to Semantic Repositories Benchmarking
Dhavalkumar Thakker, Taha Osman, Shakti Gohil, Phil Lakin

Please find the PA Dataset queries as a PDF file here.
Please find the UOBM Dataset query results here.


Please note that UOBM is the work of these authors.

Update: The query execution timings reported in the paper need to be revised in light of new findings about how the BigOWLIM and Sesame query execution mechanisms work. We will publish the updated results and findings soon.