Learning why we do what we do - Understanding human actions using Neurosymbolic AI

This research is supported by the National Research Foundation Fellowship. Duration: April 2022 to March 2027.

Recognizing what we do, or what we will do next, has been well investigated in the action recognition and anticipation literature of the Computer Vision (CV) and Machine Learning (ML) research communities. However, computational learning of why we do what we do remains underexplored. The objective of this project is to develop Artificial Intelligence (AI) models that process videos and learn why humans do what they do, by reducing the gap between neural and symbolic representations through novel neurosymbolic AI. These neurosymbolic AI models can see what we do and then reason about our behavior to interpret, justify, explain, and understand our actions.

Code is also available here!

Publications

Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma, Debaditya Roy, Basura Fernando
International Journal of Computer Vision - IJCV 2025 (Accepted)
PDF Code
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen, Hong Yang, Tzeh Yuan Neoh, Hao Zhang, Ee Yeo Keat, Basura Fernando
Preprint
PDF
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou, Lingling Zhang, Jun Liu
Preprint
PDF
Learning to Visually Connect Actions and their Effects
Paritosh Parmar, Eric Peh, Basura Fernando
WACV 2025
PDF Code
Inferring Past Human Actions in Homes with Abductive Reasoning
Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando
WACV 2025
PDF Code
Effective Scene Graph Generation by Statistical Relation Distillation
Thanh-Son Nguyen, Hong Yang, Basura Fernando
WACV 2025
PDF Code
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika, Chen Li, Deepu Rajan, Basura Fernando
WACV 2025
PDF Code
Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference
Ee Yeo Keat, Hao Zhang, Alexander Matyasko, Basura Fernando
WACV 2025
PDF Code
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar, Eric Peh, Ruirui Chen, Ting En Lam, Yuhan Chen, Elston Tan, Basura Fernando
NeurIPS 2024 (Accepted)
PDF Code
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal, Debaditya Roy, Basura Fernando, Cheston Tan
NeurIPS 2024 (Accepted)
PDF Code (soon)
RCA: Region Conditioned Adaptation for Visual Abductive Reasoning
Hao Zhang, Ee Yeo Keat, Basura Fernando
ACM MM 2024 (Accepted)
PDF Code
Predicting the Next Action by Modeling the Abstract Goal
Debaditya Roy, Basura Fernando
ICPR 2024 (Accepted)
PDF Code
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Ishaan Singh Rawal, Alexander Matyasko, Shantanu Jaiswal, Basura Fernando, Cheston Tan
ICML 2024 (Accepted)
PDF Code
Who are you referring to? Coreference resolution in image narrations
Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
International Conference on Computer Vision - ICCV 2023
PDF CIN Dataset Code Bibtex
Semi-supervised multimodal coreference resolution in image narrations
Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Empirical Methods in Natural Language Processing - EMNLP 2023
PDF Bibtex
Energy-based Self-Training and Normalization for Unsupervised Domain Adaptation
Samitha Herath, Basura Fernando, Ehsan Abbasnejad, Munawar Hayat, Shahram Khadivi, Mehrtash Harandi, Hamid Rezatofighi, Reza Haffari
International Conference on Computer Vision - ICCV 2023
PDF Bibtex
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Debaditya Roy, Dhruv Verma, Basura Fernando
IEEE/CVF Winter Conference on Applications of Computer Vision - WACV 2024
Best results in SWiG - 2024
Best results in imSitu - 2024
PDF Code Bibtex
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang, Ee Yeo Keat, Basura Fernando
Preprint
PDF
Abductive Action Inference
Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando
Preprint
PDF

Past and Current Team Members

Dr. Chen Li (Research Scientist)

Dr. Paritosh Parmar (Research Scientist)

Dr. Hao Zhang (Research Scientist)

Dr. Thanh-Son Nguyen (Research Scientist)

Dr. Alexander Matyasko (Research Scientist)

Dr. Debaditya Roy (Research Scientist)

Eric Peh (Research Engineer)

Dhruv Verma (Research Engineer)

Ee Yeo Keat (Research Engineer)

Hong Yang (Research Engineer)

Shantanu Jaiswal (Research Engineer)

PhD student - Clement Tan (NTU)

PhD student - Jonathan Weston Burton-Barr (NTU)

PhD student - Chinthani Sugandhika (NTU)