Improving multi-term topics focused crawling by introducing term Frequency-Information Content (TF-IC) measure

Citation

Pesaranghader, Ahmad and Pesaranghader, Ali and Mustapha, Norwati and Mohd Sharef, Nurfadhlina (2013) Improving multi-term topics focused crawling by introducing term Frequency-Information Content (TF-IC) measure. In: 3rd International Conference on Research and Innovation in Information Systems – 2013 (ICRIIS’13). IEEE Xplore, pp. 102-106. ISBN 978-1-4799-2486-8

Abstract

With the rapid growth of the Internet, finding desired information has become a challenging and time-consuming task. Focused crawlers address this problem by mining the Web for pages closely relevant to the desired information. A variety of methods have been devised and implemented for this purpose. Nonetheless, the majority of these methods do not favor the more informative terms in a given multi-term topic. In this paper, we propose a new measure called Term Frequency-Information Content (TF-IC) to prioritize the terms in a multi-term topic according to their informativeness. In our experiments, we compare our measure against both the Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI) measures as applied in focused crawlers. The experimental results indicate that our measure outperforms TF-IDF and LSI at collecting relevant web pages for both general and specialized multi-term topics. © 2013 IEEE.

Item Type: Book Section
Subjects: Q Science > Q Science (General)
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 07 May 2014 05:26
Last Modified: 07 May 2014 05:26
URI: http://shdl.mmu.edu.my/id/eprint/5481
