no code implementations • 19 Oct 2023 • Kaya Stechly, Matthew Marquez, Subbarao Kambhampati
The study seems to indicate that (i) LLMs are poor at solving graph coloring instances; (ii) they are no better at verifying a solution, and are thus ineffective in iterative modes where LLMs critique LLM-generated solutions; and (iii) the correctness and content of the criticisms, whether from LLMs or external solvers, seem largely irrelevant to the performance of iterative prompting.
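To make "verifying a solution" concrete, here is a minimal sketch of a sound external graph-coloring verifier, the kind of check the study contrasts with LLM self-critique. The graph and coloring encodings are illustrative assumptions, not the paper's exact benchmark format.

```python
def verify_coloring(edges, coloring, num_colors):
    """Return (ok, reasons): ok is True iff `coloring` is a proper
    coloring of the graph given by `edges` using colors in [0, num_colors)."""
    reasons = []
    # Every assigned color must be within the allowed range.
    for v, c in coloring.items():
        if not 0 <= c < num_colors:
            reasons.append(f"vertex {v} uses out-of-range color {c}")
    # No edge may connect two vertices of the same color.
    for u, v in edges:
        if u not in coloring or v not in coloring:
            reasons.append(f"edge ({u}, {v}) has an uncolored endpoint")
        elif coloring[u] == coloring[v]:
            reasons.append(f"edge ({u}, {v}) endpoints share color {coloring[u]}")
    return (not reasons, reasons)

# A triangle is 3-colorable but not 2-colorable.
triangle = [(0, 1), (1, 2), (0, 2)]
ok, _ = verify_coloring(triangle, {0: 0, 1: 1, 2: 2}, 3)   # ok -> True
bad, why = verify_coloring(triangle, {0: 0, 1: 1, 2: 0}, 3)  # bad -> False
```

Unlike an LLM critic, a checker like this is trivially sound and complete for the verification task, which is why the content of its criticisms can be trusted even when iterative prompting around it still fails.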
no code implementations • 12 Oct 2023 • Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati
We evaluate a planning system that employs LLMs for both plan generation and verification.
2 code implementations • 25 May 2023 • Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati
We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers.
no code implementations • 13 Feb 2023 • Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, Subbarao Kambhampati
On this benchmark, we evaluate LLMs in three modes: autonomous, heuristic, and human-in-the-loop.
no code implementations • 27 Oct 2022 • Utkarsh Soni, Nupur Thakur, Sarath Sreedharan, Lin Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati
If the relevant concept is not in the shared vocabulary, then it is learned.
2 code implementations • NeurIPS 2023 • Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati
PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities being tested.