Learning why we do what we do - Understanding human actions using Neurosymbolic AI

This research is supported by the National Research Foundation Fellowship. Duration: April 2022 to March 2027.

Recognizing what we do or what we will do has been well investigated in the action recognition and anticipation literature of the Computer Vision (CV) and Machine Learning (ML) research communities. However, computational learning of why we do what we do has received far less attention. The objective of this project is to develop Artificial Intelligence (AI) models that process videos and learn why humans do what they do, by reducing the gap between neural and symbolic representations through novel neurosymbolic AI. These neurosymbolic AI models can see what we do and then reason about our behavior to interpret, justify, explain, and understand our actions.

Publications

Who are you referring to? Coreference resolution in image narrations
Arushi Goel, Basura Fernando, Frank Keller, and Hakan Bilen
International Conference on Computer Vision - ICCV (2023)
PDF CIN Dataset Code Bibtex
Semi-supervised multimodal coreference resolution in image narrations
Arushi Goel, Basura Fernando, Frank Keller, and Hakan Bilen
Empirical Methods in Natural Language Processing - EMNLP (2023)
PDF Bibtex
Energy-based Self-Training and Normalization for Unsupervised Domain Adaptation
Samitha Herath, Basura Fernando, Ehsan Abbasnejad, Munawar Hayat, Shahram Khadivi, Mehrtash Harandi, Hamid Rezatofighi, and Reza Haffari
International Conference on Computer Vision - ICCV (2023)
PDF Bibtex
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Debaditya Roy, Dhruv Verma, and Basura Fernando
IEEE/CVF Winter Conference on Applications of Computer Vision - WACV (2024)
Best results in SWiG - 2024
Best results in imSitu - 2024
Code PDF Bibtex
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang, Yeo Keat Ee, and Basura Fernando
Preprint
PDF
Abductive Action Inference
Clement Tan, Chai Kiat Yeo, Cheston Tan, and Basura Fernando
Preprint
PDF
Learning to Visually Connect Actions and their Effects
Eric Peh, Paritosh Parmar, and Basura Fernando
Preprint
PDF

Team Members

Dr. Paritosh Parmar (Research Scientist)

Dr. Hao Zhang (Research Scientist)

Dr. Thanh-Son Nguyen (Research Scientist)

Dr. Alexander Matyasko (Research Scientist)

Dr. Debaditya Roy (Research Scientist)

Eric Peh (Research Engineer)

Dhruv Verma (Research Engineer)

Ee Yeo Keat (Research Engineer)

Yang Hong (Research Engineer)

Shantanu Jaiswal (Research Engineer)

PhD student - Clement Tan (NTU)

PhD student - Jonathan Weston Burton-Barr (NTU)

PhD student - Chinthani Sugandhika (NTU)