One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations
Oda, Chuang, Shirai et al.
Sentence embedding methods have made remarkable progress, yet they still struggle to capture the implicit semantics within sentences. This can be attributed to the inherent limitations of conventional sentence embedding methods that assign only a single vector per sentence. To overcome this limitation, we propose DualCSE, a sentence embedding method that assigns two embeddings to each sentence: one representing the explicit semantics and the other representing the implicit semantics. These embeddings coexist in the shared space, enabling the selection of the desired semantics for specific purposes such as information retrieval and text classification. Experimental results demonstrate that DualCSE can effectively encode both explicit and implicit meanings and improve the performance of the downstream task.
academic
One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations
Sentence embedding methods have achieved significant progress, yet still face difficulties in capturing implicit semantics within sentences. This can be attributed to the inherent limitation of traditional sentence embedding methods, which assign only a single vector to each sentence. To overcome this limitation, this paper proposes DualCSE, a method that assigns two embeddings to each sentence: one representing explicit semantics and another representing implicit semantics. These embeddings coexist in a shared space, enabling the selection of desired semantics for specific purposes such as information retrieval and text classification. Experimental results demonstrate that DualCSE effectively encodes both explicit and implicit meanings, improving performance on downstream tasks.
Existing sentence embedding methods exhibit significant deficiencies in handling implicit semantics. Sun et al. (2025) point out that even state-of-the-art sentence embedding methods show approximately a 20% performance gap between explicit and implicit semantics on the MTEB classification benchmark.
Completeness of Semantic Understanding: Natural language contains both literal meanings (explicit semantics) and figurative or pragmatic meanings (implicit semantics)
Practical Application Requirements: Tasks such as information retrieval and text classification require understanding different levels of semantics
Model Limitations: Traditional methods represent sentences with only a single vector, overlooking the existence of multiple interpretations
This paper cites important works from multiple domains including sentence embeddings, natural language inference, and contrastive learning, including:
Gao et al. (2021): SimCSE method
Havaldar et al. (2025): INLI dataset
Wang et al. (2025): Implicitness scoring method
Reimers and Gurevych (2019): Sentence-BERT
Overall Assessment: This is a paper with strong technical innovation, proposing an interesting and practical dual semantic representation method. While there is room for improvement in theoretical depth and evaluation breadth, it opens new directions for sentence embedding research and possesses certain academic value and application potential.