When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning

16 Dec 2023  ·  Qihang Ai, Jianwu Zhou, Haiyun Jiang, Lemao Liu, Shuming Shi ·

Graph data is ubiquitous in the physical world, and it has always been a challenge to efficiently model graph structures using a unified paradigm for the understanding and reasoning on various graphs. Moreover, in the era of large language models, integrating complex graph information into text sequences has become exceptionally difficult, which hinders the ability to interact with graph data through natural language instructions.The paper presents a new paradigm for understanding and reasoning about graph data by integrating image encoding and multimodal technologies. This approach enables the comprehension of graph data through an instruction-response format, utilizing GPT-4V's advanced capabilities. The study evaluates this paradigm on various graph types, highlighting the model's strengths and weaknesses, particularly in Chinese OCR performance and complex reasoning tasks. The findings suggest new direction for enhancing graph data processing and natural language interaction.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here