Use this identifier to cite or link to this record: https://hdl.handle.net/1822/6321

Title: Geographical partition for distributed web crawling
Author(s): Exposto, José; Macedo, Joaquim; Pina, António Manuel Silva; Alves, Albano Agostinho Gomes; Amaro, José Carlos Rufino
Keywords: Web Mining; Parallel Crawling; Web Partitioning
Date: 2005
Publisher: Association for Computing Machinery
Citation: Herzog, Otthein [et al.], eds. – "Proceedings of the 2005 ACM CIKM : International Conference on Information and Knowledge Management, Bremen, Germany, 2005." New York : ACM Press, 2005. ISBN 1-59593-140-6.
Abstract: This paper evaluates scalable distributed crawling by means of a geographical partition of the Web. The approach relies on multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. The work considers a distributed crawler in which pages are assigned to crawlers according to the geographical scope of their content. For the initial assignment of a page to a partition, we use a simple heuristic that places the page in the same scope as the geographical location of its hosting web server. During download, if the analysis of a page's contents recommends a different geographical scope, the page is forwarded to the appropriately located web server. A sample of Portuguese Web pages, extracted during 2005, was used to evaluate: a) page download communication times and b) the overhead of page exchange among servers. The evaluation results allow our approach to be compared with conventional hash partitioning strategies.
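The assignment scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the host-to-zone table, and the keyword-counting scope analysis are all hypothetical stand-ins chosen for illustration, and the hash partitioner is only one common way to realise the "conventional hash partitioning" baseline the abstract mentions.

```python
import hashlib
from urllib.parse import urlsplit


def host_of(url):
    """Return the host name of a URL (empty string if it has none)."""
    return urlsplit(url).hostname or ""


def hash_partition(url, n_crawlers):
    """Baseline: assign a URL to a crawler by hashing its host name."""
    digest = hashlib.md5(host_of(url).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_crawlers


def initial_zone(url, host_zones, default="unknown"):
    """Initial heuristic: a page starts in the zone of its hosting server.

    host_zones is a hypothetical lookup table (host name -> zone).
    """
    return host_zones.get(host_of(url), default)


def content_zone(text, zone_keywords):
    """Toy scope analysis (an assumption): pick the zone whose keywords
    occur most often in the page text, or None if no keyword occurs."""
    lowered = text.lower()
    scores = {
        zone: sum(lowered.count(keyword) for keyword in keywords)
        for zone, keywords in zone_keywords.items()
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def route(url, text, host_zones, zone_keywords):
    """Return (initial zone, final zone, forwarded?) for a downloaded page."""
    start = initial_zone(url, host_zones)
    suggested = content_zone(text, zone_keywords)
    final = suggested if suggested is not None else start
    return start, final, final != start
```

With a hypothetical table such as `{"www.example.pt": "porto"}` and keywords for `porto` and `lisboa`, a page hosted in the `porto` zone whose text mentions Lisboa would be reported as forwarded to `lisboa`, while a page with no geographical keywords would stay where the hosting heuristic placed it. Counting how often the third element of the result is true gives a rough picture of the page-exchange overhead the abstract refers to.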
Type: Conference paper
URI: https://hdl.handle.net/1822/6321
ISBN: 1-59593-140-6
DOI: 10.1145/1096985.1096999
Peer reviewed: yes
Access: Open access
Appears in collections: DI/CCTC - Artigos (papers)

Files in this record:
File                   Description     Size        Format
GIR2005-exp_pina.pdf   Main document   180.21 kB   Adobe PDF
