1 code implementation • 24 Feb 2024 • Wentao Mo, Yang Liu
In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and limited visual content diversity hampers the generalization to novel scenes and 3D concepts (e. g., only around 800 scenes are utilized in ScanQA and SQA dataset).