Use this identifier to cite this record: https://hdl.handle.net/1822/19420

Title: Efficient partitioning strategies for distributed web crawling
Author(s): Exposto, José; Macedo, Joaquim; Pina, António Manuel Silva; Alves, Albano Agostinho Gomes; Rufino, José
Keywords: Information retrieval; Multi-objective; Crawling; Web space
Date: 2008
Publisher: Springer
Abstract: This paper presents a multi-objective approach to Web space partitioning, aimed at improving distributed crawling efficiency. The investigation is supported by the construction of two different weighted graphs. The first models the topological communication infrastructure between crawlers and Web servers; the second represents the number of links between servers' pages. The edge weights of the two graphs are, respectively, computed round-trip times (RTTs) and page link counts between nodes. The two graphs are then combined, using a multi-objective partitioning algorithm, to support Web space partitioning and load allocation for an adaptable number of geographically distributed crawlers. Partitioning strategies were evaluated by varying the number of partitions (crawlers) to obtain figures of merit for: i) download time, ii) exchange time and iii) relocation time. The evaluation showed that our partitioning schemes outperform traditional hostname-hash-based counterparts on all evaluated metrics, achieving on average an 18% reduction in download time, a 78% reduction in exchange time and a 46% reduction in relocation time.
Type: Conference paper
URI: https://hdl.handle.net/1822/19420
ISBN: 978-3-540-89523-7
Publisher version: http://www.springerlink.com/
Peer reviewed: yes
Access: Restricted access (UMinho)
Appears in collections: DI/CCTC - Artigos (papers)
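
The abstract compares partitioning schemes against a hostname-hash baseline and scores them on download and exchange time. The following is a minimal sketch of that comparison on toy data. It does not reproduce the paper's multi-objective partitioning algorithm: `nearest_crawler_partition` is a simple single-objective greedy heuristic substituted for illustration, and all function names, hostnames, RTT values and link counts are hypothetical.

```python
import hashlib


def hash_partition(hosts, k):
    """Baseline: assign each host to one of k crawlers by hashing its hostname."""
    return {
        h: int(hashlib.md5(h.encode("utf-8")).hexdigest(), 16) % k
        for h in hosts
    }


def nearest_crawler_partition(hosts, rtt, k):
    """Illustrative heuristic (not the paper's algorithm): pick the lowest-RTT crawler."""
    return {h: min(range(k), key=lambda c: rtt[(c, h)]) for h in hosts}


def download_cost(rtt, assignment):
    """Proxy for download time: sum of RTTs between each host and its crawler."""
    return sum(rtt[(crawler, host)] for host, crawler in assignment.items())


def exchange_cost(links, assignment):
    """Proxy for exchange time: link count on edges that cross crawler boundaries."""
    return sum(
        w for (a, b), w in links.items() if assignment[a] != assignment[b]
    )


# Toy data (assumed values): two crawlers, four Web servers.
K = 2
HOSTS = ["a.example", "b.example", "c.example", "d.example"]

# First graph: RTT in milliseconds between crawler c and host h, keyed (c, h).
RTT = {
    (0, "a.example"): 10, (1, "a.example"): 80,
    (0, "b.example"): 12, (1, "b.example"): 90,
    (0, "c.example"): 70, (1, "c.example"): 15,
    (0, "d.example"): 85, (1, "d.example"): 20,
}

# Second graph: number of page links between pairs of hosts.
LINKS = {
    ("a.example", "b.example"): 40,
    ("c.example", "d.example"): 30,
    ("b.example", "c.example"): 5,
}

if __name__ == "__main__":
    for name, part in [
        ("hostname hash", hash_partition(HOSTS, K)),
        ("nearest crawler", nearest_crawler_partition(HOSTS, RTT, K)),
    ]:
        print(name, part, download_cost(RTT, part), exchange_cost(LINKS, part))
```

On this toy data the nearest-crawler assignment places a.example and b.example on crawler 0 and the other two hosts on crawler 1, giving a download cost of 57 and an exchange cost of 5. The hash baseline ignores both graphs, so its costs depend only on how the hostnames happen to hash.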

Files in this record:
File: lncs-exposto-2008.pdf (restricted access)
Description: full text
Size: 297.08 kB
Format: Adobe PDF
