Text mining technologies and tools have been around for the past decade. Most text mining engines require a good set of bootstrapping entities in order to perform well (with respect to precision and recall). Different tools call these bootstrapping entities gazetteers, authority files, or lists.
With the emergence of the Linked Data cloud and its open datasets, there is a great opportunity to achieve even better text mining results by using the entities in these datasets for bootstrapping.
Here is my take on utilizing the Linked Data cloud with the information extraction system GATE.
Presentation at the GATE course in May 2010.
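To make the idea of bootstrapping from a gazetteer a little more concrete, here is a minimal sketch in Python. It is not GATE's implementation or API; it only illustrates the general lookup technique. The entity list, the type names, and the functions `build_gazetteer` and `annotate` are all hypothetical, and the hard-coded entries stand in for labels that would in practice be exported from a Linked Data dataset.

```python
import re

# Hypothetical entries standing in for (label, type) pairs
# exported from a Linked Data dataset.
LINKED_DATA_ENTITIES = [
    ("Nottingham", "Location"),
    ("New York", "Location"),
    ("Tim Berners-Lee", "Person"),
]


def build_gazetteer(entities):
    """Map each lower-cased label to its entity type."""
    return {label.lower(): entity_type for label, entity_type in entities}


def annotate(text, gazetteer):
    """Return (start, end, matched_text, type) for each gazetteer label found in text."""
    if not gazetteer:
        return []
    # Try longer labels first so multi-word labels win over shorter ones.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    gazetteer = build_gazetteer(LINKED_DATA_ENTITIES)
    for annotation in annotate("Tim Berners-Lee visited Nottingham.", gazetteer):
        print(annotation)
```

The larger and cleaner the entity list, the more mentions a lookup like this can find, which is why open Linked Data datasets are attractive as a source of bootstrapping entities.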
Friday, 10 December 2010
Monday, 8 March 2010
Benchmarking
This post relates to the files mentioned in our paper:
A Pragmatic Approach to Semantic Repositories Benchmarking
Dhavalkumar Thakker, Taha Osman, Shakti Gohil, Phil Lakin
Please find the PA Dataset queries as a PDF file here.
Please find the UOBM Dataset query results here.
Please note that UOBM is the work done by these authors.
Update: The query execution timing results mentioned in the paper need to be revised in light of new findings about how the BigOWLIM and Sesame query execution mechanisms work. We will publish the updated results and findings soon.