Combining text and vision in compound semantics: Towards a cognitively plausible multimodal model

In the current state-of-the-art distributional semantics model of the meaning of noun-noun compounds (such as chainsaw, butterfly, home phone), CAOSS (Marelli et al., 2017), the semantic vectors of the individual constituents are combined and enriched by position-specific information for each constituent in its role as either modifier or head. Most recently there have been attempts to include vision-based embeddings in these models (Günther et al., 2020b), using the linear architecture implemented in the CAOSS model. In the present paper, we extend this line of research and demonstrate that moving to non-linear models improves the results for vision, while linear models are a good choice for text. Simply concatenating text and vision vectors does not (yet) improve the prediction of human behavioral data over models using text- and vision-based measures separately.
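The linear composition and the concatenation baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensionality, the random matrices M and H (stand-ins for CAOSS's trained position-specific matrices), and the vision vector are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300  # embedding dimensionality (illustrative)

# Hypothetical position-specific matrices: M applies to the constituent
# in the modifier slot, H to the constituent in the head slot. In CAOSS
# these are learned from corpus compounds; here they are random stand-ins.
M = rng.standard_normal((d, d)) / np.sqrt(d)
H = rng.standard_normal((d, d)) / np.sqrt(d)

def caoss_compose(u, v):
    """Linear CAOSS-style composition: c = M @ u + H @ v,
    where u is the modifier vector and v is the head vector."""
    return M @ u + H @ v

# Toy constituent vectors, e.g. "home" (modifier) and "phone" (head).
u = rng.standard_normal(d)
v = rng.standard_normal(d)

c_text = caoss_compose(u, v)

# Concatenating a text-based and a vision-based compound vector, as in
# the multimodal condition mentioned above (vision vector hypothetical).
c_vision = rng.standard_normal(d)
c_multimodal = np.concatenate([c_text, c_vision])
```

The concatenated vector simply doubles the dimensionality; as the abstract notes, this straightforward fusion does not yet outperform the unimodal measures on behavioral data.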

Identifier
DOI https://doi.org/10.17026/dans-xdp-3qhj
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-gb-vwz6
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:219229
Provenance
Creator Gupta, Abhijeet
Publisher Data Archiving and Networked Services (DANS)
Contributor Gupta, Abhijeet; Dr Abhijeet Gupta (Heinrich-Heine-Universität Düsseldorf)
Publication Year 2021
Rights info:eu-repo/semantics/openAccess; License: http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Representation
Language English
Resource Type Dataset
Format text/plain
Discipline Other