The GrowBag keyword graphs summarize 10134 occurrences of 4020 keywords.

Results
Found 17418 publication records. Showing 17418 according to the selection in the facets
Hits | Authors | Title | Venue | Year | Link | Author keywords
8 | Pouria Alikhanifard, Nikolaos Tsantalis |
A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Huiming Sun, Jiacheng Guo, Zibo Meng, Tianyun Zhang, Jianwu Fang, Yuewei Lin, Hongkai Yu |
EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Anna Varbella, Kenza Amara, Blazhe Gjorgiev, Giovanni Sansavini |
PowerGraph: A power grid benchmark dataset for graph neural networks. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yifan He, Claus Aranha |
Evolving Benchmark Functions to Compare Evolutionary Algorithms via Genetic Programming. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Haotian Xia, Zhengbang Yang, Yuqing Wang, Rhys Tracy, Yun Zhao, Dongdong Huang, Zezhi Chen, Yan Zhu, Yuan-Fang Wang, Weining Shen |
SportQA: A Benchmark for Sports Understanding in Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiaobo Guo, Soroush Vosoughi |
Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Luca Salvatore Lorello, Marco Lippi 0001, Stefano Melacci |
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi |
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xin Zhao 0012, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li |
BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Haoran Yin, Diederick Vermetten, Furong Ye, Thomas H. W. Bäck, Anna V. Kononova |
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi |
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zheqi He, Xinya Wu, Pengfei Zhou, Richeng Xuan, Guang Liu, Xi Yang, Qiannan Zhu, Hua Huang |
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Nantheera Anantrasirichai, Ruirui Lin, Alexandra Malyugina, David R. Bull |
BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Abdelrahman Younes, Tamim Asfour |
KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sadman Sadeed Omee, Nihang Fu, Rongzhi Dong, Ming Hu, Jianjun Hu |
Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zheng Li, Xiang Chen, Xiaojun Wan 0001 |
WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann |
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, Yongfeng Zhang |
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zexuan Qiu, Jingjing Li, Shijue Huang, Wanjun Zhong, Irwin King |
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau |
Improvements & Evaluations on the MLCommons CloudMask Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | João Vitorino, Miguel Silva, Eva Maia, Isabel Praça |
An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sayantan Adak, Daivik Agrawal, Animesh Mukherjee 0001, Somak Aditya |
GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiaowei Qian, Zhimeng Guo, Jialiang Li 0003, Haitao Mao, Bingheng Li, Suhang Wang, Yao Ma 0001 |
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | David Owen |
How predictable is language model benchmark performance? |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao |
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Mingzhe Du, Anh Tuan Luu, Bin Ji, See-Kiong Ng |
Mercury: An Efficiency Benchmark for LLM Code Synthesis. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Chris Xiaoxuan Lu |
ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Nikola Bugarin, Jovana Bugaric, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto |
Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yupeng Li, Haorui He, Jin Bai, Dacheng Wen |
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jia Li 0011, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin |
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Liang Xu, Hang Xue, Lei Zhu, Kangkang Zhao |
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen |
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yusen Zhang |
A General Benchmark Framework is Dynamic Graph Neural Network Need. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Qusai Abu Obaidah, Muhy Eddin Za'ter, Adnan Jaljuli, Ali Mahboub, Asma Hakouz, Bashar Alfrou, Yazan Estaitia |
A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao 0001 |
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang |
OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiting Zhao, Sören Schwertfeger |
3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Lisa Mais, Peter Hirsch 0001, Claire Managan, Ramya Kandarpa, Josef Lorenz Rumberger, Annika Reinke, Lena Maier-Hein, Gudrun Ihrke, Dagmar Kainmueller |
FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yihua Zhang, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Xiaoming Liu, Sijia Liu 0001 |
UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Phong Nguyen-Thuan Do, Son Quoc Tran, Phu Gia Hoang, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen |
VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi |
Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Vitaliy Pozdnyakov, Aleksandr Kovalenko, Ilya Makarov, Mikhail Drobyshevskiy, Kirill Lukyanov |
Adversarial Attacks and Defenses in Automated Control Systems: A Comprehensive Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo |
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jianlin Chen |
LMStyle Benchmark: Evaluating Text Style Transfer for Chatbots. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang |
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Chuang Liu, Renren Jin, Yuqi Ren, Deyi Xiong |
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jie Tian, Lingxiao Yang, Ran Ji, Yuexin Ma, Lan Xu, Jingyi Yu, Ye Shi 0001, Jingya Wang |
Gaze-guided Hand-Object Interaction Synthesis: Benchmark and Method. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yueyao Wang, Samuel Furman, Nicolás Hardy, Margaret Ellis 0001, Godmar Back, Yili Hong 0001, Kirk W. Cameron |
A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark Performance. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yuepei Li, Kang Zhou, Qiao Qiao, Qing Wang, Qi Li |
Re-Examine Distantly Supervised NER: A New Benchmark and a Simple Approach. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Tao Chen 0008, Siqi Zuo, Cheng Li 0012, Mingyang Zhang 0001, Qiaozhu Mei, Michael Bendersky |
Unlocking the 'Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen |
DevBench: A Comprehensive Benchmark for Software Development. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sheetal Harris, Jinshuo Liu, Hassan Jalil Hadi, Yue Cao 0002 |
Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | |
A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Namyong Park, Ryan A. Rossi, Xing Wang, Antoine Simoulin, Nesreen K. Ahmed, Christos Faloutsos |
GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Paul Daoudi, Bojan Mavkov, Bogdan Robu, Christophe Prieur 0001, Emmanuel Witrant, Merwan Barlier, Ludovic Dos Santos |
Improving a Proportional Integral Controller with Reinforcement Learning on a Throttle Valve Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fangjun Li, David C. Hogg, Anthony G. Cohn 0001 |
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei, Xuanjing Huang 0001 |
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xilai Li, Wuyang Liu, Xiaosong Li, Haishu Tan |
Physical Perception Network and an All-weather Multi-modality Benchmark for Adverse Weather Image Fusion. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shuhao Li, Yue Cui, Jingyi Xu, Libin Li, Lingkai Meng, Weidong Yang, Fan Zhang, Xiaofang Zhou |
Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Gyubok Lee, Woosog Chay, Seonhee Cho, Edward Choi |
TrustSQL: A Reliability Benchmark for Text-to-SQL Models with Diverse Unanswerable Questions. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Prottay Kumar Adhikary, Aseem Srivastava, Shivani Kumar, Salam Michael Singh, Puneet Manuja, Jini K. Gopinath, Vijay Krishnan, Swati Kedia, Koushik Sinha Deb, Tanmoy Chakraborty 0002 |
Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zeyu Xi, Ge Shi 0002, Lifang Wu, Xuefen Li, Junchi Yan, Liang Wang, Zilin Liu |
Knowledge Graph Supported Benchmark and Video Captioning for Basketball. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Alex Golts, Vadim Ratner, Yoel Shoshan, Moshe Raboh, Sagi Polaczek, Michal Ozery-Flato, Daniel Shats, Liam Hazan, Sivan Ravid, Efrat Hexter |
A large dataset curation and benchmark for drug target interaction. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Samuel Mallick, Azita Dabiri, Bart De Schutter |
A Comparison Benchmark for Distributed Hybrid MPC Control Methods: Distributed Vehicle Platooning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Ankit Yadav, Mayank Singh 0001 |
Boldly Going Where No Benchmark Has Gone Before: Exposing Bias and Shortcomings in Code Generation Evaluation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yuxuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang |
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Thibaut Thonet, Jos Rozen, Laurent Besacier |
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan 0001, Lichao Sun 0001 |
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | |
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz 0001 |
BirdSet: A Multi-Task Benchmark for Classification in Avian Bioacoustics. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Cong Ma, Lei Qiao, Chengkai Zhu, Kai Liu, Zelong Kong, Qing Li, Xueqi Zhou, Yuheng Kan, Wei Wu 0021 |
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu, Laiba Mehnaz, Juri Papay, Samuel Jackson, Jeyan Thiyagalingam, Sergey V. Samsonau, Geoffrey C. Fox |
MLCommons Cloud Masking Benchmark with Early Stopping. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shruti Singh, Shoaib Alam, Mayank Singh 0001 |
LEGOBench: Leaderboard Generation Benchmark for Scientific Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yang Liu, Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Maosong Sun 0001, Erhong Yang |
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata |
PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng 0001, Peng Hu 0002 |
PointCloud-Text Matching: Benchmark Datasets and a Baseline. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Till Beemelmanns, Quan Zhang, Lutz Eckstein |
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Venelin Kovatchev, Matthew Lease |
Benchmark Transparency: Measuring the Impact of Data on Evaluation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye 0004, Shikun Zhang |
CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su 0001 |
TravelPlanner: A Benchmark for Real-World Planning with Language Agents. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song |
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng 0013, Yuncheng Hua, Lizhen Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani-Azad, Ingrid Zukerman, Gholamreza Haffari |
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen 0001 |
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo |
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Hainiu Xu, Runcong Zhao, Lixing Zhu, Jinhua Du, Yulan He 0001 |
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng 0007 |
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Quan Tu, Shilong Fan, Zihang Tian, Rui Yan 0001 |
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Nadège Alavoine, Gaëlle Laperrière, Christophe Servan, Sahar Ghannay, Sophie Rosset |
New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu |
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Noah D. Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R. Durran, Peter Harrington, Michael S. Pritchard |
A Practical Probabilistic Benchmark for AI Weather Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sunjun Kweon, Byungjin Choi, Minkyu Kim, Rae Woong Park, Edward Choi |
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Wenjian Luo, Peilan Xu, Shengxiang Yang, Yuhui Shi |
Benchmark for CEC 2024 Competition on Multiparty Multiobjective Optimization. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Han Huang, Haitian Zhong, Qiang Liu 0006, Shu Wu, Liang Wang, Tieniu Tan |
KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung |
REBUS: A Robust Evaluation Benchmark of Understanding Symbols. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jinge Wu, Yunsoo Kim, Honghan Wu |
Hallucination Benchmark in Medical Visual Question Answering. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Iain J. Cruickshank, Lynnette Hui Xian Ng |
DIVERSE: Deciphering Internet Views on the U.S. Military Through Video Comment Stance Analysis, A Novel Benchmark Dataset for Stance Classification. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yongtao Wu, Fanghui Liu 0001, Carl-Johann Simon-Gabriel, Grigorios G. Chrysos, Volkan Cevher |
Robust NAS under adversarial training: benchmark, theory, and beyond. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jinchang Hou, Chang Ao, Haihong Wu, Xiangtao Kong, Zhigang Zheng, Daijia Tang, Chengming Li, Xiping Hu 0001, Ruifeng Xu, Shiwen Ni, Min Yang 0007 |
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, Chang Ye, Zichen Liu, Lucas N. Alegre, Alexander Nikulin, Xiao Hu, Tianlin Liu, Jongwook Choi, Brent Yi |
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fan Zhang, Shuyi Mao, Qing Li, Xiaojiang Peng |
3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
|
|