FacetedDBLP

Searching for benchmark with no syntactic query expansion in all metadata.

Publication years (Num. hits)
1970-1978 (15), 1979-1985 (20), 1986-1987 (25), 1988 (36), 1989 (42), 1990 (55), 1991 (59), 1992 (62), 1993 (88), 1994 (105), 1995 (154), 1996 (165), 1997 (190), 1998 (202), 1999 (268), 2000 (338), 2001 (321), 2002 (507), 2003 (571), 2004 (754), 2005 (1010), 2006 (1133), 2007 (1207), 2008 (1277), 2009 (889), 2010 (284), 2011 (153), 2012 (168), 2013 (305), 2014 (218), 2015 (246), 2016 (285), 2017 (314), 2018 (437), 2019 (522), 2020 (720), 2021 (974), 2022 (1252), 2023 (1619), 2024 (428)
Publication types (Num. hits)
article (6248), book (4), data (48), incollection (120), inproceedings (10975), phdthesis (20), proceedings (3)
Venues (Conferences, Journals, ...)
GrowBag graphs for keyword (Num. hits/coverage)
The graphs summarize 10134 occurrences of 4020 keywords.

Results
Found 17418 publication records. Showing all 17418 records matching the facet selection.
Hits | Authors | Title | Venue | Year
8 | Pouria Alikhanifard, Nikolaos Tsantalis | A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools. | CoRR | 2024
8 | Huiming Sun, Jiacheng Guo, Zibo Meng, Tianyun Zhang, Jianwu Fang, Yuewei Lin, Hongkai Yu | EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV. | CoRR | 2024
8 | Anna Varbella, Kenza Amara, Blazhe Gjorgiev, Giovanni Sansavini | PowerGraph: A power grid benchmark dataset for graph neural networks. | CoRR | 2024
8 | Yifan He, Claus Aranha | Evolving Benchmark Functions to Compare Evolutionary Algorithms via Genetic Programming. | CoRR | 2024
8 | Haotian Xia, Zhengbang Yang, Yuqing Wang, Rhys Tracy, Yun Zhao, Dongdong Huang, Zezhi Chen, Yan Zhu, Yuan-Fang Wang, Weining Shen | SportQA: A Benchmark for Sports Understanding in Large Language Models. | CoRR | 2024
8 | Xiaobo Guo, Soroush Vosoughi | Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts. | CoRR | 2024
8 | Luca Salvatore Lorello, Marco Lippi 0001, Stefano Melacci | The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns. | CoRR | 2024
8 | Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi | KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge. | CoRR | 2024
8 | Xin Zhao 0012, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li | BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision. | CoRR | 2024
8 | Haoran Yin, Diederick Vermetten, Furong Ye, Thomas H. W. Bäck, Anna V. Kononova | Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems. | CoRR | 2024
8 | Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi | GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers. | CoRR | 2024
8 | Zheqi He, Xinya Wu, Pengfei Zhou, Richeng Xuan, Guang Liu, Xi Yang, Qiannan Zhu, Hua Huang | CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning. | CoRR | 2024
8 | Nantheera Anantrasirichai, Ruirui Lin, Alexandra Malyugina, David R. Bull | BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement. | CoRR | 2024
8 | Abdelrahman Younes, Tamim Asfour | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments. | CoRR | 2024
8 | Sadman Sadeed Omee, Nihang Fu, Rongzhi Dong, Ming Hu, Jianjun Hu | Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study. | CoRR | 2024
8 | Zheng Li, Xiang Chen, Xiaojun Wan 0001 | WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction. | CoRR | 2024
8 | Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann | WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing. | CoRR | 2024
8 | Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, Yongfeng Zhang | NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models. | CoRR | 2024
8 | Zexuan Qiu, Jingjing Li, Shijue Huang, Wanjun Zhong, Irwin King | CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models. | CoRR | 2024
8 | Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau | Improvements & Evaluations on the MLCommons CloudMask Benchmark. | CoRR | 2024
8 | João Vitorino, Miguel Silva, Eva Maia, Isabel Praça | An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection. | CoRR | 2024
8 | Sayantan Adak, Daivik Agrawal, Animesh Mukherjee 0001, Somak Aditya | GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models. | CoRR | 2024
8 | Xiaowei Qian, Zhimeng Guo, Jialiang Li 0003, Haitao Mao, Bingheng Li, Suhang Wang, Yao Ma 0001 | Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark. | CoRR | 2024
8 | David Owen | How predictable is language model benchmark performance? | CoRR | 2024
8 | Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao | SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. | CoRR | 2024
8 | Mingzhe Du, Anh Tuan Luu, Bin Ji, See-Kiong Ng | Mercury: An Efficiency Benchmark for LLM Code Synthesis. | CoRR | 2024
8 | Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Chris Xiaoxuan Lu | ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image. | CoRR | 2024
8 | Nikola Bugarin, Jovana Bugaric, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto | Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual Learning. | CoRR | 2024
8 | Yupeng Li, Haorui He, Jin Bai, Dacheng Wen | MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection. | CoRR | 2024
8 | Jia Li 0011, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin | EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. | CoRR | 2024
8 | Liang Xu, Hang Xue, Lei Zhu, Kangkang Zhao | SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese. | CoRR | 2024
8 | Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen | MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations. | CoRR | 2024
8 | Yusen Zhang | A General Benchmark Framework is Dynamic Graph Neural Network Need. | CoRR | 2024
8 | Qusai Abu Obaidah, Muhy Eddin Za'ter, Adnan Jaljuli, Ali Mahboub, Asma Hakouz, Bashar Alfrou, Yazan Estaitia | A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain. | CoRR | 2024
8 | Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao 0001 | Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning. | CoRR | 2024
8 | Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang | OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining. | CoRR | 2024
8 | Xiting Zhao, Sören Schwertfeger | 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data. | CoRR | 2024
8 | Lisa Mais, Peter Hirsch 0001, Claire Managan, Ramya Kandarpa, Josef Lorenz Rumberger, Annika Reinke, Lena Maier-Hein, Gudrun Ihrke, Dagmar Kainmueller | FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures. | CoRR | 2024
8 | Yihua Zhang, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Xiaoming Liu, Sijia Liu 0001 | UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models. | CoRR | 2024
8 | Phong Nguyen-Thuan Do, Son Quoc Tran, Phu Gia Hoang, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen | VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding. | CoRR | 2024
8 | Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi | Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese. | CoRR | 2024
8 | Vitaliy Pozdnyakov, Aleksandr Kovalenko, Ilya Makarov, Mikhail Drobyshevskiy, Kirill Lukyanov | Adversarial Attacks and Defenses in Automated Control Systems: A Comprehensive Benchmark. | CoRR | 2024
8 | Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo | INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. | CoRR | 2024
8 | Jianlin Chen | LMStyle Benchmark: Evaluating Text Style Transfer for Chatbots. | CoRR | 2024
8 | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang | MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception. | CoRR | 2024
8 | Chuang Liu, Renren Jin, Yuqi Ren, Deyi Xiong | LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models. | CoRR | 2024
8 | Jie Tian, Lingxiao Yang, Ran Ji, Yuexin Ma, Lan Xu, Jingyi Yu, Ye Shi 0001, Jingya Wang | Gaze-guided Hand-Object Interaction Synthesis: Benchmark and Method. | CoRR | 2024
8 | Yueyao Wang, Samuel Furman, Nicolás Hardy, Margaret Ellis 0001, Godmar Back, Yili Hong 0001, Kirk W. Cameron | A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark Performance. | CoRR | 2024
8 | Yuepei Li, Kang Zhou, Qiao Qiao, Qing Wang, Qi Li | Re-Examine Distantly Supervised NER: A New Benchmark and a Simple Approach. | CoRR | 2024
8 | Tao Chen 0008, Siqi Zuo, Cheng Li 0012, Mingyang Zhang 0001, Qiaozhu Mei, Michael Bendersky | Unlocking the 'Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience. | CoRR | 2024
8 | Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen | DevBench: A Comprehensive Benchmark for Software Development. | CoRR | 2024
8 | Sheetal Harris, Jinshuo Liu, Hassan Jalil Hadi, Yue Cao 0002 | Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection. | CoRR | 2024
8 | | A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models. | CoRR | 2024
8 | Namyong Park, Ryan A. Rossi, Xing Wang, Antoine Simoulin, Nesreen K. Ahmed, Christos Faloutsos | GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection. | CoRR | 2024
8 | Paul Daoudi, Bojan Mavkov, Bogdan Robu, Christophe Prieur 0001, Emmanuel Witrant, Merwan Barlier, Ludovic Dos Santos | Improving a Proportional Integral Controller with Reinforcement Learning on a Throttle Valve Benchmark. | CoRR | 2024
8 | Fangjun Li, David C. Hogg, Anthony G. Cohn 0001 | Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark. | CoRR | 2024
8 | Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei, Xuanjing Huang 0001 | Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation. | CoRR | 2024
8 | Xilai Li, Wuyang Liu, Xiaosong Li, Haishu Tan | Physical Perception Network and an All-weather Multi-modality Benchmark for Adverse Weather Image Fusion. | CoRR | 2024
8 | Shuhao Li, Yue Cui, Jingyi Xu, Libin Li, Lingkai Meng, Weidong Yang, Fan Zhang, Xiaofang Zhou | Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline. | CoRR | 2024
8 | Gyubok Lee, Woosog Chay, Seonhee Cho, Edward Choi | TrustSQL: A Reliability Benchmark for Text-to-SQL Models with Diverse Unanswerable Questions. | CoRR | 2024
8 | Prottay Kumar Adhikary, Aseem Srivastava, Shivani Kumar, Salam Michael Singh, Puneet Manuja, Jini K. Gopinath, Vijay Krishnan, Swati Kedia, Koushik Sinha Deb, Tanmoy Chakraborty 0002 | Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study. | CoRR | 2024
8 | Zeyu Xi, Ge Shi 0002, Lifang Wu, Xuefen Li, Junchi Yan, Liang Wang, Zilin Liu | Knowledge Graph Supported Benchmark and Video Captioning for Basketball. | CoRR | 2024
8 | Alex Golts, Vadim Ratner, Yoel Shoshan, Moshe Raboh, Sagi Polaczek, Michal Ozery-Flato, Daniel Shats, Liam Hazan, Sivan Ravid, Efrat Hexter | A large dataset curation and benchmark for drug target interaction. | CoRR | 2024
8 | Samuel Mallick, Azita Dabiri, Bart De Schutter | A Comparison Benchmark for Distributed Hybrid MPC Control Methods: Distributed Vehicle Platooning. | CoRR | 2024
8 | Ankit Yadav, Mayank Singh 0001 | Boldly Going Where No Benchmark Has Gone Before: Exposing Bias and Shortcomings in Code Generation Evaluation. | CoRR | 2024
8 | Yuxuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang | SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection. | CoRR | 2024
8 | Thibaut Thonet, Jos Rozen, Laurent Besacier | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models. | CoRR | 2024
8 | Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan 0001, Lichao Sun 0001 | MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark. | CoRR | 2024
8 | | PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology. | CoRR | 2024
8 | Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz 0001 | BirdSet: A Multi-Task Benchmark for Classification in Avian Bioacoustics. | CoRR | 2024
8 | Cong Ma, Lei Qiao, Chengkai Zhu, Kai Liu, Zelong Kong, Qing Li, Xueqi Zhou, Yuheng Kan, Wei Wu 0021 | HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative. | CoRR | 2024
8 | Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu, Laiba Mehnaz, Juri Papay, Samuel Jackson, Jeyan Thiyagalingam, Sergey V. Samsonau, Geoffrey C. Fox | MLCommons Cloud Masking Benchmark with Early Stopping. | CoRR | 2024
8 | Shruti Singh, Shoaib Alam, Mayank Singh 0001 | LEGOBench: Leaderboard Generation Benchmark for Scientific Models. | CoRR | 2024
8 | Yang Liu, Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Maosong Sun 0001, Erhong Yang | OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models. | CoRR | 2024
8 | Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata | PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods. | CoRR | 2024
8 | Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng 0001, Peng Hu 0002 | PointCloud-Text Matching: Benchmark Datasets and a Baseline. | CoRR | 2024
8 | Till Beemelmanns, Quan Zhang, Lutz Eckstein | MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection. | CoRR | 2024
8 | Venelin Kovatchev, Matthew Lease | Benchmark Transparency: Measuring the Impact of Data on Evaluation. | CoRR | 2024
8 | Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye 0004, Shikun Zhang | CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios. | CoRR | 2024
8 | Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su 0001 | TravelPlanner: A Benchmark for Real-World Planning with Language Agents. | CoRR | 2024
8 | Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song | CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model. | CoRR | 2024
8 | Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng 0013, Yuncheng Hua, Lizhen Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani-Azad, Ingrid Zukerman, Gholamreza Haffari | RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations. | CoRR | 2024
8 | Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen 0001 | Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports. | CoRR | 2024
8 | Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo | OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. | CoRR | 2024
8 | Hainiu Xu, Runcong Zhao, Lixing Zhu, Jinhua Du, Yulan He 0001 | OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models. | CoRR | 2024
8 | Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng 0007 | ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models. | CoRR | 2024
8 | Quan Tu, Shilong Fan, Zihang Tian, Rui Yan 0001 | CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation. | CoRR | 2024
8 | Nadège Alavoine, Gaëlle Laperrière, Christophe Servan, Sahar Ghannay, Sophie Rosset | New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark. | CoRR | 2024
8 | Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu | CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark. | CoRR | 2024
8 | Noah D. Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R. Durran, Peter Harrington, Michael S. Pritchard | A Practical Probabilistic Benchmark for AI Weather Models. | CoRR | 2024
8 | Sunjun Kweon, Byungjin Choi, Minkyu Kim, Rae Woong Park, Edward Choi | KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations. | CoRR | 2024
8 | Wenjian Luo, Peilan Xu, Shengxiang Yang, Yuhui Shi | Benchmark for CEC 2024 Competition on Multiparty Multiobjective Optimization. | CoRR | 2024
8 | Han Huang, Haitian Zhong, Qiang Liu 0006, Shu Wu, Liang Wang, Tieniu Tan | KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models. | CoRR | 2024
8 | Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung | REBUS: A Robust Evaluation Benchmark of Understanding Symbols. | CoRR | 2024
8 | Jinge Wu, Yunsoo Kim, Honghan Wu | Hallucination Benchmark in Medical Visual Question Answering. | CoRR | 2024
8 | Iain J. Cruickshank, Lynnette Hui Xian Ng | DIVERSE: Deciphering Internet Views on the U.S. Military Through Video Comment Stance Analysis, A Novel Benchmark Dataset for Stance Classification. | CoRR | 2024
8 | Yongtao Wu, Fanghui Liu 0001, Carl-Johann Simon-Gabriel, Grigorios G. Chrysos, Volkan Cevher | Robust NAS under adversarial training: benchmark, theory, and beyond. | CoRR | 2024
8 | Jinchang Hou, Chang Ao, Haihong Wu, Xiangtao Kong, Zhigang Zheng, Daijia Tang, Chengming Li, Xiping Hu 0001, Ruifeng Xu, Shiwen Ni, Min Yang 0007 | E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models. | CoRR | 2024
8 | Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, Chang Ye, Zichen Liu, Lucas N. Alegre, Alexander Nikulin, Xiao Hu, Tianlin Liu, Jongwook Choi, Brent Yi | Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning. | CoRR | 2024
8 | Fan Zhang, Shuyi Mao, Qing Li, Xiaojiang Peng | 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework. | CoRR | 2024
Displaying results #801 to #900 of 17418 (100 per page).
Maintained by L3S.
Previously maintained by Jörg Diederich.
Based upon DBLP by Michael Ley.
Data released under the ODC-BY 1.0 license.