Enhancing URL Normalization Using Metadata of Web Pages

Citation

Soon, Lay-Ki and Lee, Sang Ho (2008) Enhancing URL Normalization Using Metadata of Web Pages. In: International Conference on Computer and Electrical Engineering, 2008. ICCEE 2008. IEEE, pp. 331-335. ISBN 978-0-7695-3504-3

Full text: 04741001.pdf (Published Version, 235kB), restricted to repository staff only

Abstract

In this paper, we present our proposed method of incorporating metadata of Web pages to identify equivalent URLs in addition to the standard URL normalization methodology. The metadata considered are the page size and the body text of Web pages. These metadata can be obtained during HTML parsing in the process of crawling without incurring unnecessary cost. Our experiment shows an accuracy of up to 95.38% in identifying equivalent URLs by using the body text of Web pages.
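The abstract describes two stages: standard URL normalization, then a metadata check (page size or body text) for URLs that still differ after normalization. The sketch below shows one way such a pipeline might look. It is not the authors' implementation. Every name in it (`normalize_url`, `body_signature`, `page_size`, `likely_equivalent`) is hypothetical. The normalization steps follow the syntax-based rules of RFC 3986 (lowercasing scheme and host, dropping default ports, normalizing percent-encoding, removing dot segments). The fragment is also dropped, a common crawler choice. Two further assumptions are made because the abstract does not specify them. Page size is taken to be the UTF-8 byte length of the HTML. Body text is compared by hashing its whitespace-collapsed visible text, excluding `head`, `script` and `style`.

```python
import hashlib
import re
import string
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalize_percent(s):
    """Decode escaped unreserved characters; uppercase every other escape."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return PCT_ESCAPE.sub(repl, s)


def _remove_dot_segments(path):
    """Resolve '.' and '..' segments in an absolute path."""
    out = []
    for seg in path.split("/"):
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment
                out.pop()
        elif seg != ".":
            out.append(seg)
    result = "/".join(out)
    if path.endswith(("/.", "/..")):
        result += "/"
    return result


def normalize_url(url):
    """Hypothetical 'standard' normalization step (RFC 3986-style)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # hostname is lowercased; userinfo is dropped
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    path = _remove_dot_segments(_normalize_percent(parts.path)) or "/"
    query = _normalize_percent(parts.query)
    return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped


class _BodyTextParser(HTMLParser):
    """Collects text that lies outside <head>, <script> and <style>."""
    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def body_signature(html):
    """Hash of the whitespace-collapsed visible text (assumed metric)."""
    parser = _BodyTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def page_size(html):
    """Page size, assumed here to be the UTF-8 byte length of the HTML."""
    return len(html.encode("utf-8"))


def likely_equivalent(url_a, html_a, url_b, html_b, metadata="body"):
    """Normalize first; fall back to a metadata comparison if URLs differ."""
    if normalize_url(url_a) == normalize_url(url_b):
        return True
    if metadata == "size":
        return page_size(html_a) == page_size(html_b)
    return body_signature(html_a) == body_signature(html_b)


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/a/./b/../c/%7euser?q=1#frag"))
    # http://example.com/a/c/~user?q=1
    a = "<html><head><title>A</title></head><body><p>Hello   world</p></body></html>"
    b = "<html><head><title>B</title></head><body>\n<p>Hello world</p>\n</body></html>"
    print(likely_equivalent("http://example.com/index.html", a, "http://www.example.com/", b))
    # True
```

Exact hashing is only a stand-in. The abstract does not say how body texts are compared, and a real system might use a similarity threshold instead. Equal page size is a weak signal on its own, since unrelated pages can share a byte length. The 95.38% accuracy reported in the abstract refers to the authors' method, not to this sketch.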

Item Type: Book Section
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 14 Nov 2013 07:01
Last Modified: 14 Nov 2013 07:01
URI: http://shdl.mmu.edu.my/id/eprint/4409
