Adhiraj Ghosh

Seeking: Research internships and PhD positions!

I am a second-year MSc student in Machine Learning at the University of Tübingen. My research focuses on multimodal deep learning, especially Vision-Language Representation Learning. Currently, I am at the Bethge Lab, working on the holistic understanding of Vision-Language models. Previously, I worked on visualising figurative speech at the Computer Graphics Group, which led to an Outstanding Paper Award at EMNLP 2023.

Before starting my master's, I was a Computer Vision Researcher at the Centre for Artificial Intelligence, ZHAW, working on domain adaptation in Optical Music Recognition. I have also worked with Dr. Daniel Lin Wen-Yan at SMU on feature-correspondence-based object tracking. I completed my BSc in Electrical and Electronics Engineering in Manipal/Singapore.

I am very eager to collaborate on relevant projects, so please reach out if you are interested!

Email  /  CV  /  Google Scholar  /  GitHub  /  LinkedIn  /  Twitter  /  YouTube

profile photo

Recent News

Apr 2024 : New paper on arXiv. Check out coverage by Computerphile and AI 'N Stuff!
Dec 2023 : ViPE awarded outstanding paper at EMNLP!
Oct 2023 : One paper accepted at EMNLP 2023 (main conference).
Sep 2023 : Work on Real World Music Object Recognition published in TISMIR.
Mar 2023 : Started working at the Tübingen AI Centre in Dr. Hendrik Lensch's group.
Oct 2022 : Moved to Germany! Started my MSc at the University of Tübingen.
Aug 2022 : RPTM accepted for oral presentation at WACV 2023. Check out the paper and SOTA comparisons!

Work Experience

Mar 2023 - Sep 2023: Research Assistant at the Computer Graphics group, Tübingen AI Centre.
May 2021 - Aug 2022: Computer Vision Researcher, Zürich University of Applied Sciences.
Jan 2020 - Dec 2020: Visiting Researcher, Singapore Management University.
Jun 2018 - Aug 2019: Undergraduate Research Intern, Jadavpur University.

Publications
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao*, Ameya Prabhu*, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge.
arXiv:2404.04125, 2024
Paper / Code / Let It Wag! Benchmark

The impressive empirical performance of VLMs can be attributed to the presence of test concepts within their pretraining datasets, and thus does not demonstrate genuine "zero-shot" generalization. Instead, models need exponentially more pretraining data on a concept to improve performance on it linearly.
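Stated as a rough formula (my paraphrase of the trend above, not notation from the paper), zero-shot performance on a concept $c$ scales roughly with the logarithm of its pretraining frequency $f(c)$:

$$\mathrm{performance}(c) \;\approx\; \alpha + \beta \,\log f(c)$$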

ViPE: Visualise Pretty-much Everything
Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik Lensch.
EMNLP 2023 (Outstanding Paper Award)
Paper / Code / Dataset / HuggingFace / Music Videos

ViPE is the first automated model for translating any arbitrary piece of text into a visualisable prompt, enabling any text-to-image model to visualise figurative or non-lexical language.

Real World Music Object Recognition
Adhiraj Ghosh*, Lukas Tuggener*, Raphael Emberger*, Pascal Sager*, et al.
TISMIR 2023
Paper / Code

We present solutions for improving recognition accuracy in Music Object Recognition on low-quality, real-world sheet music data, and provide confidence-rated model outputs to enable efficient human post-processing.

Relation Preserving Triplet Mining for Stabilising the Triplet Loss in Re-identification Systems
Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin
WACV 2023
Paper / Code / Video / Poster

We propose a new, feature-guided triplet mining scheme that captures intrinsic pose in order to address the intra-class variance problem in re-identification datasets.
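For context, this is the standard triplet loss that the mining scheme stabilises (a textbook formulation, not the paper's notation), where $a$, $p$, $n$ are the anchor, positive and negative embeddings, $d$ is a distance function, and $m$ is the margin:

$$\mathcal{L}_{\mathrm{triplet}} = \max\bigl(0,\; d(a, p) - d(a, n) + m\bigr)$$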

Irony Detection in Bengali Tweets: A New Dataset, Experimentation and Results
Adhiraj Ghosh, Kamal Sarkar
ICCIDS 2020
Paper / Dataset

This paper describes the Bengali irony detection dataset we developed and reports results obtained on it using state-of-the-art machine learning methodologies.