ViTOR: Learning to Rank Webpages Based on Visual Features

The visual appearance of a webpage carries valuable information about the page’s quality and can be used to improve the performance of learning to rank (LTR). We introduce the Visual learning TO Rank (ViTOR) model, which integrates two state-of-the-art visual feature extraction methods: (i) transfer learning from a pre-trained image classification model, and (ii) synthetic saliency heatmaps generated from webpage snapshots. Since no public dataset is currently available for the task of LTR with visual features, we also introduce and release the ViTOR dataset, which contains visually rich and diverse webpages. The ViTOR dataset consists of visual snapshots, non-visual features, and relevance judgments for ClueWeb12 webpages and TREC Web Track queries. We experiment with the proposed ViTOR model on the newly introduced ViTOR dataset and show that our model significantly improves the performance of LTR with visual features.
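
As a rough illustration of the general idea described above, the following sketch concatenates a visual feature vector with non-visual LTR features and ranks documents by a linear score. It is not the ViTOR architecture: the function names, the toy feature values, and the linear scorer are all assumptions made for illustration, and the visual vectors simply stand in for whatever a pre-trained image model or a saliency heatmap would produce for a webpage snapshot.

```python
from typing import List, Sequence, Tuple

# Hypothetical document record: (doc_id, visual_features, non_visual_features).
Doc = Tuple[str, Sequence[float], Sequence[float]]


def combine_features(visual: Sequence[float], non_visual: Sequence[float]) -> List[float]:
    """Concatenate visual and non-visual features into a single vector."""
    return list(visual) + list(non_visual)


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scoring function (an assumption; a learned model would go here)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def rank(docs: Sequence[Doc], weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Score every document for one query and sort by descending score."""
    scored = [
        (doc_id, score(combine_features(visual, non_visual), weights))
        for doc_id, visual, non_visual in docs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: two visual features and one non-visual feature per document.
example_docs: List[Doc] = [
    ("d1", [0.2, 0.1], [1.0]),
    ("d2", [0.9, 0.4], [0.5]),
    ("d3", [0.1, 0.0], [0.2]),
]
example_weights = [1.0, 1.0, 1.0]  # hypothetical weights, not learned
ranking = rank(example_docs, example_weights)
```

In practice the weights would be learned from the relevance judgments rather than fixed by hand; the sketch only shows where visual features enter the ranking pipeline.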

Identifier
DOI https://doi.org/10.17026/dans-xah-fkcq
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-ai-5j8y
Related Identifier http://www2019.thewebconf.org
Related Identifier https://lemurproject.org/clueweb12/
Related Identifier https://trec.nist.gov/data/web2013.html
Related Identifier https://trec.nist.gov/data/web2014.html
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:115980
Provenance
Creator Akker, B van den; Markov, I; Rijke, M de
Publisher Data Archiving and Networked Services (DANS)
Contributor University of Amsterdam
Publication Year 2023
Rights info:eu-repo/semantics/openAccess; License: https://creativecommons.org/licenses/by-nd/4.0/
OpenAccess true
Representation
Resource Type Dataset
Discipline Other
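
The Metadata Access URL listed in this record has the shape of an OAI-PMH `GetRecord` request. As a minimal sketch, the snippet below splits that URL into its base endpoint and request parameters using only the Python standard library. It performs no network access, and the helper name `oai_request_parts` is hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the "Metadata Access" field of this record.
METADATA_URL = (
    "https://easy.dans.knaw.nl/oai?verb=GetRecord"
    "&metadataPrefix=oai_datacite"
    "&identifier=oai:easy.dans.knaw.nl:easy-dataset:115980"
)


def oai_request_parts(url: str) -> Tuple[str, Dict[str, str]]:
    """Return the base endpoint and the query parameters of an OAI-PMH URL."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each parameter appears once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, params


base_url, request_params = oai_request_parts(METADATA_URL)
```

Here `request_params` holds the `verb`, `metadataPrefix`, and `identifier` arguments, which could be passed to any HTTP client to retrieve the DataCite-formatted record.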