FacetedDBLP

Searching for "benchmark" with no syntactic query expansion in all metadata.

Publication years (Num. hits)
1970-1978 (15) 1979-1985 (20) 1986-1987 (25) 1988 (36) 1989 (42) 1990 (55) 1991 (59) 1992 (62) 1993 (88) 1994 (105) 1995 (154) 1996 (165) 1997 (190) 1998 (202) 1999 (268) 2000 (338) 2001 (321) 2002 (507) 2003 (571) 2004 (754) 2005 (1010) 2006 (1133) 2007 (1207) 2008 (1277) 2009 (889) 2010 (284) 2011 (153) 2012 (168) 2013 (305) 2014 (218) 2015 (246) 2016 (285) 2017 (314) 2018 (437) 2019 (522) 2020 (720) 2021 (974) 2022 (1252) 2023 (1619) 2024 (428)
Publication types (Num. hits)
article(6248) book(4) data(48) incollection(120) inproceedings(10975) phdthesis(20) proceedings(3)
GrowBag graphs for keyword (Num. hits/coverage)
The graphs summarize 10134 occurrences of 4020 keywords

Results
Found 17418 publication records. Showing 17418 according to the selection in the facets.
Hits | Authors | Title | Venue | Year
8 | Rodrigo Laigner, Zhexiang Zhang, Yijian Liu, Leonardo Freitas Gomes, Yongluan Zhou | A Benchmark for Data Management Challenges in Microservices. | CoRR | 2024
8 | Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng | Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark. | CoRR | 2024
8 | Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, Holger Caesar | Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving. | CoRR | 2024
8 | Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Béatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour | DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain. | CoRR | 2024
8 | Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna | m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks. | CoRR | 2024
8 | Advaith Venkatramanan Sethuraman, Anja Sheppard, Onur Bagoren, Christopher Pinnow, Jamey Anderson, Timothy C. Havens, Katherine A. Skinner | Machine Learning for Shipwreck Segmentation from Side Scan Sonar Imagery: Dataset and Benchmark. | CoRR | 2024
8 | Mahathir Mohammad Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir | COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions. | CoRR | 2024
8 | Yilan Dong, Chunlin Yu, Ruiyang Ha, Ye Shi 0001, Yuexin Ma, Lan Xu, Yanwei Fu, Jingya Wang | HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations. | CoRR | 2024
8 | Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li 0002, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov | EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models. | CoRR | 2024
8 | Dongjun Jang, Jean Seo, Sungjoo Byun, Taekyoung Kim, Minseok Kim, Hyopil Shin | CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean. | CoRR | 2024
8 | Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong, Vagelis Hristidis | PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering. | CoRR | 2024
8 | Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel | DyPyBench: A Benchmark of Executable Python Software. | CoRR | 2024
8 | Shivanand Venkanna Sheshappanavar, Tejas Anvekar, Shivanand Kundargi, Yufan Wang, Chandra Kambhamettu | A Benchmark Grocery Dataset of Realworld Point Clouds From Single View. | CoRR | 2024
8 | Jiaxin Zhang 0024, Zhongzhi Li, Mingliang Zhang, Fei Yin, Chenglin Liu 0001, Yashar Moshfeghi | GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving. | CoRR | 2024
8 | Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel | HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation. | CoRR | 2024
8 | Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong 0001 | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. | CoRR | 2024
8 | Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux | A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models. | CoRR | 2024
8 | Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay | RouterBench: A Benchmark for Multi-LLM Routing System. | CoRR | 2024
8 | Jiaqing Zhang, Jie Lei 0001, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li | Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image. | CoRR | 2024
8 | Masanori Hirano | Construction of a Japanese Financial Benchmark for Large Language Models. | CoRR | 2024
8 | Haotian Si, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, Dan Pei, Jianhui Li, Gaogang Xie | TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models. | CoRR | 2024
8 | Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline. | CoRR | 2024
8 | Yao Wan 0001, Yang He, Zhangqian Bi, Jianguo Zhang 0005, Hongyu Zhang 0002, Yulei Sui, Guandong Xu 0001, Hai Jin 0001, Philip S. Yu | Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit. | CoRR | 2024
8 | Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang | MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music. | CoRR | 2024
8 | Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan | SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images. | CoRR | 2024
8 | Corentin Royer, Bjoern H. Menze, Anjany Sekuboyina | MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models. | CoRR | 2024
8 | Junhao Zheng, Shengjie Qiu, Qianli Ma 0001 | Concept-1K: A Novel Benchmark for Instance Incremental Learning. | CoRR | 2024
8 | Avimita Chatterjee, Swaroop Ghosh | Magic Mirror on the Wall, How to Benchmark Quantum Error Correction Codes, Overall ? | CoRR | 2024
8 | Junyu Gao 0001, Liangliang Zhao, Xuelong Li 0001 | NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images. | CoRR | 2024
8 | Avi Rudich, Isaac Rudich, Rachel Rue | Simple Stochastic Stopping Games: A Generator and Benchmark Library. | CoRR | 2024
8 | Andrea Esuli, Giovanni Puccetti 0002 | The Invalsi Benchmark: measuring Language Models Mathematical and Language understanding in Italian. | CoRR | 2024
8 | Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics. | CoRR | 2024
8 | Xiaoyue Wang, Jianyou Wang, Weili Cao, Kaicheng Wang, Ramamohan Paturi, Leon Bergen | BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives. | CoRR | 2024
8 | Asad Aali, Dave Van Veen, Yamin Ishraq Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash S. Tehrani, Jangwon Kim, Akshay S. Chaudhari | A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries. | CoRR | 2024
8 | Nihar Ranjan Sahoo, Pranamya Prashant Kulkarni, Narjis Asad, Arif Ahmad, Tanu Goyal, Aparna Garimella, Pushpak Bhattacharyya | IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context. | CoRR | 2024
8 | Maxence Lamarque, Luke Bhan, Yuanyuan Shi, Miroslav Krstic | Adaptive Neural-Operator Backstepping Control of a Benchmark Hyperbolic PDE. | CoRR | 2024
8 | Itay Manes, Naama Ronn, David Cohen, Ran Ilan Ber, Zehavi Horowitz-Kugler, Gabriel Stanovsky | K-QA: A Real-World Medical Q&A Benchmark. | CoRR | 2024
8 | Spencer Carmichael, Austin Buchan, Mani Ramanagopal, Radhika Ravi, Ram Vasudevan, Katherine A. Skinner | Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception. | CoRR | 2024
8 | Zubair Qazi, William Shiao, Evangelos E. Papalexakis | GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method. | CoRR | 2024
8 | Xiaoyun Zheng, Liwei Liao, Xufeng Li, Jianbo Jiao, Rongjie Wang, Feng Gao, Shiqi Wang 0001, Ronggang Wang | PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling. | CoRR | 2024
8 | Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng 0002, Xuming Hu, Philip S. Yu | When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models. | CoRR | 2024
8 | Yifan Wu, Jiawei Du, Ping Liu, Yuewei Lin, Wenqing Cheng, Wei Xu | DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation. | CoRR | 2024
8 | Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier | A Benchmark Evaluation of Clinical Named Entity Recognition in French. | CoRR | 2024
8 | Xuanming Zhang, Zixun Chen, Zhou Yu | ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution. | CoRR | 2024
8 | Avinash Kumar Chaurasia, Matin Fallahi, Thorsten Strufe, Philipp Terhörst, Patricia Arias Cabarcos | NeuroBench: An Open-Source Benchmark Framework for the Standardization of Methodology in Brainwave-based Authentication Research. | CoRR | 2024
8 | Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim | Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline. | CoRR | 2024
8 | Martin Spitznagel, Janis Keuper | Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems. | CoRR | 2024
8 | Niklas Wretblad, Fredrik Gordh Riseby, Rahul Biswas, Amin Ahmadi, Oskar Holmström | Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark. | CoRR | 2024
8 | Caleb Robinson, Isaac Corley, Anthony Ortiz, Rahul Dodhia, Juan M. Lavista Ferres, Peyman Najafirad | Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery. | CoRR | 2024
8 | Xin Tong, Bo Jin, Zhi Lin, Binjun Wang, Ting Yu, Qiang Cheng | CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain. | CoRR | 2024
8 | Fabian Retkowski, Alexander Waibel | From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions. | CoRR | 2024
8 | Daniel Batrakhanov, Tuomas Eerola, Kaisa Kraft, Lumi Haraguchi, Lasse Lensu, Sanna Suikkanen, María Teresa Camarena-Gómez, Jukka Seppälä, Heikki Kälviäinen | DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation. | CoRR | 2024
8 | Yunhao Zhang, Xiaohan Zhang, Chong Li, Shaonan Wang, Chengqing Zong | MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models. | CoRR | 2024
8 | Xiangshuo Qiao, Xianxin Li, Xiaozhe Qu, Jie Zhang, Yang Liu, Yu Luo, Cihang Jin, Jin Ma 0003 | CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios. | CoRR | 2024
8 | Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar 0003, William Laney, Andrew Owens, Alexander Richard | Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark. | CoRR | 2024
8 | Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, Dacheng Tao | OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models. | CoRR | 2024
8 | Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti 0004, Thomas Arnold 0002, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov | M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection. | CoRR | 2024
8 | Hamed Zibaei, Mohammad Saadi Mesgari | Improved discrete particle swarm optimization using Bee Algorithm and multi-parent crossover method (Case study: Allocation problem and benchmark functions). | CoRR | 2024
8 | Xiuying Chen, Tairan Wang, Qingqing Zhu, Taicheng Guo, Shen Gao, Zhiyong Lu, Xin Gao 0001, Xiangliang Zhang 0001 | Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark. | CoRR | 2024
8 | André Silva, Nuno Saavedra, Martin Monperrus | GitBug-Java: A Reproducible Benchmark of Recent Java Bugs. | CoRR | 2024
8 | José L. Risco-Martín, Saurabh Mittal, Juan Carlos Fabero Jiménez, Marina Zapater, Román Hermida | Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark. | CoRR | 2024
8 | Bingchao Wang | ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain. | CoRR | 2024
8 | Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwanghyun Kim, Seunghyun Won, Edward Choi | EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings. | CoRR | 2024
8 | Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins 0001, Roee Aharoni, Mor Geva | A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains. | CoRR | 2024
8 | Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang 0001, Yulia Tsvetkov, Antonios Anastasopoulos | DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages. | CoRR | 2024
8 | Loddo Fabio, Dario Piga, Umberto Michelucci, El Ghazouali Safouane | BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery. | CoRR | 2024
8 | Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin 0001, Caiming Xiong | FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability. | CoRR | 2024
8 | Zeyu Wang 0007, Haoran Xiong, Zhenying He, Peng Wang 0027, Wei Wang 0009 | Distance Comparison Operators for Approximate Nearest Neighbor Search: Exploration and Benchmark. | CoRR | 2024
8 | Junjie Ye, Yilong Wu, Songyang Gao, Caishuang Huang, Sixian Li, Guanyu Li, Xiaoran Fan, Qi Zhang 0001, Tao Gui, Xuanjing Huang 0001 | RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning. | CoRR | 2024
8 | Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang 0008, Li Liu, Chao Shen | BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning. | CoRR | 2024
8 | Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan 0001 | Data-Effective Learning: A Comprehensive Medical Benchmark. | CoRR | 2024
8 | Butian Xiong, Zhuo Li, Zhen Li | GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting. | CoRR | 2024
8 | Xiaogang Jia, Denis Blessing, Xinkai Jiang, Moritz Reuss, Atalay Donat, Rudolf Lioutikov, Gerhard Neumann | Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations. | CoRR | 2024
8 | Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew T. Jackson, Samuel Coward, Jakob N. Foerster | Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning. | CoRR | 2024
8 | Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao 0001 | An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification. | CoRR | 2024
8 | Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkilä, Nicolas Kourtellis | Speech Robust Bench: A Robustness Benchmark For Speech Recognition. | CoRR | 2024
8 | Congyun Jin, Ming Zhang, Xiaowei Ma, Yujiao Li, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang | RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning. | CoRR | 2024
8 | Qiwei Peng 0002, Yekun Chai, Xuhong Li 0002 | HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization. | CoRR | 2024
8 | Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao | ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning. | CoRR | 2024
8 | Frank Reyes, Yogya Gamage, Gabriel Skoglund, Benoit Baudry, Martin Monperrus | BUMP: A Benchmark of Reproducible Breaking Dependency Updates. | CoRR | 2024
8 | Venktesh V, Abhijit Anand, Avishek Anand, Vinay Setty | NUMTEMP: A real-world benchmark to verify claims with statistical and temporal expressions. | CoRR | 2024
8 | Jonathan Viquerat, Philippe Meliga, Pablo Jeken, Elie Hachem | Beacon, a lightweight deep reinforcement learning benchmark library for flow control. | CoRR | 2024
8 | Azmine Toushik Wasi, Md Shafikul Islam, Adipto Raihan Akib | SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. | CoRR | 2024
8 | Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang | CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution. | CoRR | 2024
8 | Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu | Anchor function: a type of benchmark functions for studying language models. | CoRR | 2024
8 | Jianlan Luo, Charles Xu 0003, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine | FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning. | CoRR | 2024
8 | Danial Yazdani, Jürgen Branke, Mohammad Sadegh Khorshidi, Mohammad Nabi Omidvar, Xiaodong Li 0001, Amir H. Gandomi, Xin Yao 0001 | Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes. | CoRR | 2024
8 | Yi Zong, Xipeng Qiu | GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation. | CoRR | 2024
8 | Yu Xiong, Zhipeng Hu, Ye Huang, Runze Wu, Kai Guan, Xingchen Fang, Ji Jiang, Tianze Zhou, Yujing Hu, Haoyu Liu, Tangjie Lyu, Changjie Fan | XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques. | CoRR | 2024
8 | Yufei Li, Simin Chen, Yanghong Guo, Wei Yang 0013, Yue Dong, Cong Liu 0005 | Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study. | CoRR | 2024
8 | Dominik Schlechtweg, Shafqat Mumtaz Virk, Nikolay Arefyev | The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks. | CoRR | 2024
8 | Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li | Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions. | CoRR | 2024
8 | Yan Liu, Renren Jin, Lin Shi, Zheng Yao, Deyi Xiong | FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models. | CoRR | 2024
8 | Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu 0002, Zhixin Feng, Kai Zhao, Yan Zheng | Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback. | CoRR | 2024
8 | Raghav Kapoor, Yash Parag Butala, Melisa Russak, Jing Yu Koh, Kiran Kamble, Waseem AlShikh, Ruslan Salakhutdinov | OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web. | CoRR | 2024
8 | Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser | ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks. | CoRR | 2024
8 | Yin Zhang, Jinhong Deng, Peidong Liu, Wen Li, Shiyu Zhao | Domain Adaptive Detection of MAVs: A Benchmark and Noise Suppression Network. | CoRR | 2024
8 | Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang 0007, Bowen Zhang, Ruifeng Xu, Xiaojun Chen 0006, Chengming Li, Xiping Hu 0001, Ye Li 0002, Jianping Fan 0002 | MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property. | CoRR | 2024
8 | Zili Liu, Hao Chen, Lei Bai 0001, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi | Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method. | CoRR | 2024
8 | Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, Yue Zhang | NovelQA: A Benchmark for Long-Range Novel Question Answering. | CoRR | 2024
Displaying results #901 - #1000 of 17418 (100 per page)
Maintained by L3S.
Previously maintained by Jörg Diederich.
Based upon DBLP by Michael Ley.
Data released under the ODC-BY 1.0 license.