Job Description
Research Fellow / Engineer (Vision-Language Models) - WS1
Posting Start Date:  05/05/2026
Schemes of Service:  Research
Division:  Engineering
Employment Type:  Fixed Term

As a University of Applied Learning, the Singapore Institute of Technology (SIT) works closely with industry in its research pursuits. This position is situated within the SIT x NVIDIA AI Centre (SNAIC).

This role is part of an industry innovation project with a large consumer goods company, where you will develop an evaluation framework for vision-language models (VLMs) with applications in the personal care sector. The research focuses on fine-grained VLM capabilities such as spatial reasoning, temporal grounding, event tracking, and domain knowledge, evaluated using a curated multimodal dataset.

 

Key Responsibilities

  • Manage the research project together with the Principal Investigator (PI) and industry partner to ensure all project deliverables are met
  • Design and implement evaluation frameworks and metrics for vision-language models
  • Develop annotated video datasets and capability-tagged evaluation tasks
  • Build end-to-end evaluation pipelines and failure-mode analysis tools to assess VLM performance across reasoning dimensions
  • Prepare technical reports, publications, and industry-facing deliverables
  • Mentor student assistants
  • Communicate with internal and external parties to ensure project deliverables are met
  • Perform any other ad hoc duties assigned by the supervisor

 

Requirements

  • PhD in Computer Science or a related field
  • Expertise in computer vision and vision-language models
  • Experience with ML evaluation metrics and benchmarking
  • Proficiency in Python and deep learning frameworks (e.g., PyTorch)
  • Interest in applied, industry-collaborative research