We are happy to release MMBench-GUI, a hierarchical, multi-platform benchmark framework and toolbox for evaluating GUI agents. MMBench-GUI comprises four evaluation levels: GUI Content Understanding ...
This repository contains Python scripts to set up a workflow for the three cases in the SPE11 project and to reproduce the submitted results from the OPM team published in the SPE11 benchmark ...