Dense Unsupervised Learning for Video Segmentation

We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation enables learning dense feature representations directly in a fully convolutional regime. We rely on uniform grid sampling to extract a set of anchors and train our model to disambiguate between them on both inter- and intra-video levels. However, a naive scheme to train such a model results in a degenerate solution. We propose to prevent this with a simple regularisation scheme that accommodates the equivariance of the segmentation task to similarity transformations. Our training objective admits an efficient implementation and exhibits fast training convergence. On established VOS benchmarks, our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute.
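For illustration only, the following is a minimal PyTorch sketch of the kind of objective the abstract describes: anchors are sampled on a uniform grid from a dense feature map, and every pixel feature is classified against the anchors of all videos in the batch (inter-video), with its target being the most similar anchor from its own video (intra-video). All function and parameter names here (sample_grid_anchors, anchor_loss, grid size, temperature) are hypothetical, and the equivariance regularisation that prevents the degenerate solution is omitted; this is a sketch under stated assumptions, not the released implementation.

    # Hypothetical sketch of a grid-anchor objective; not the authors' code.
    import torch
    import torch.nn.functional as F

    def sample_grid_anchors(feats, grid=8):
        """Sample anchor vectors from a dense feature map on a uniform grid.

        feats: (B, C, H, W) features from a fully convolutional encoder.
        Returns: (B, grid*grid, C) anchor vectors.
        """
        B, C, H, W = feats.shape
        ys = torch.linspace(0, H - 1, grid, device=feats.device).long()
        xs = torch.linspace(0, W - 1, grid, device=feats.device).long()
        anchors = feats[:, :, ys][:, :, :, xs]        # (B, C, grid, grid)
        return anchors.flatten(2).transpose(1, 2)     # (B, grid*grid, C)

    def anchor_loss(feats, temperature=0.1):
        """Cross-entropy disambiguating anchors on intra- and inter-video
        levels: each pixel scores against the anchors of every video in
        the batch; its target is the nearest anchor of its own video."""
        B, C, H, W = feats.shape
        feats = F.normalize(feats, dim=1)             # unit-length features
        anchors = sample_grid_anchors(feats)          # (B, K, C)
        K = anchors.shape[1]

        pix = feats.flatten(2).transpose(1, 2)        # (B, H*W, C)
        # Inter-video: logits of every pixel against all B*K batch anchors.
        all_anchors = anchors.reshape(B * K, C)
        logits = pix.reshape(-1, C) @ all_anchors.t() / temperature

        # Intra-video: target is the most similar anchor of the same video.
        own = torch.einsum('bnc,bkc->bnk', pix, anchors)   # (B, H*W, K)
        tgt_local = own.argmax(dim=2)                      # (B, H*W)
        offset = torch.arange(B, device=feats.device).unsqueeze(1) * K
        target = (tgt_local + offset).reshape(-1)          # index into B*K

        return F.cross_entropy(logits, target)

Trained naively, such an objective can collapse; the paper's regularisation, which enforces equivariance to similarity transformations of the input, is the stated remedy and is not shown in this sketch.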

Identifier
Source https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3365.2
Metadata Access https://tudatalib.ulb.tu-darmstadt.de/oai/openairedata?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:tudatalib.ulb.tu-darmstadt.de:tudatalib/3365.2
Provenance
Creator Araslanov, Nikita; Schaub-Meyer, Simone; Roth, Stefan
Publisher TU Darmstadt
Contributor European Commission; TU Darmstadt
Publication Year 2021
Funding Reference European Commission info:eu-repo/grantAgreement/EC/H2020/866008
Rights Apache License 2.0; info:eu-repo/semantics/openAccess
OpenAccess true
Contact https://tudatalib.ulb.tu-darmstadt.de/page/contact
Representation
Language English
Resource Type Software
Format application/zip; application/gzip
Discipline Other