Results
Found 17418 publication records. Showing 17418 according to the selection in the facets.
Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
8 | Rodrigo Laigner, Zhexiang Zhang, Yijian Liu, Leonardo Freitas Gomes, Yongluan Zhou |
A Benchmark for Data Management Challenges in Microservices. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng |
Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, Holger Caesar |
Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Béatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour |
DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna |
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Advaith Venkatramanan Sethuraman, Anja Sheppard, Onur Bagoren, Christopher Pinnow, Jamey Anderson, Timothy C. Havens, Katherine A. Skinner |
Machine Learning for Shipwreck Segmentation from Side Scan Sonar Imagery: Dataset and Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Mahathir Mohammad Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir |
COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yilan Dong, Chunlin Yu, Ruiyang Ha, Ye Shi 0001, Yuexin Ma, Lan Xu, Yanwei Fu, Jingya Wang |
HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li 0002, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov |
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Dongjun Jang, Jean Seo, Sungjoo Byun, Taekyoung Kim, Minseok Kim, Hyopil Shin |
CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong, Vagelis Hristidis |
PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel |
DyPyBench: A Benchmark of Executable Python Software. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shivanand Venkanna Sheshappanavar, Tejas Anvekar, Shivanand Kundargi, Yufan Wang, Chandra Kambhamettu |
A Benchmark Grocery Dataset of Realworld Point Clouds From Single View. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jiaxin Zhang 0024, Zhongzhi Li, Mingliang Zhang, Fei Yin, Chenglin Liu 0001, Yashar Moshfeghi |
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel |
HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong 0001 |
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux |
A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay |
RouterBench: A Benchmark for Multi-LLM Routing System. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jiaqing Zhang, Jie Lei 0001, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li |
Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Masanori Hirano |
Construction of a Japanese Financial Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Haotian Si, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, Dan Pei, Jianhui Li, Gaogang Xie |
TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo |
Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yao Wan 0001, Yang He, Zhangqian Bi, Jianguo Zhang 0005, Hongyu Zhang 0002, Yulei Sui, Guandong Xu 0001, Hai Jin 0001, Philip S. Yu |
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang |
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan |
SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Corentin Royer, Bjoern H. Menze, Anjany Sekuboyina |
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Junhao Zheng, Shengjie Qiu, Qianli Ma 0001 |
Concept-1K: A Novel Benchmark for Instance Incremental Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Avimita Chatterjee, Swaroop Ghosh |
Magic Mirror on the Wall, How to Benchmark Quantum Error Correction Codes, Overall ? |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Junyu Gao 0001, Liangliang Zhao, Xuelong Li 0001 |
NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Avi Rudich, Isaac Rudich, Rachel Rue |
Simple Stochastic Stopping Games: A Generator and Benchmark Library. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Andrea Esuli, Giovanni Puccetti 0002 |
The Invalsi Benchmark: measuring Language Models Mathematical and Language understanding in Italian. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu |
ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiaoyue Wang, Jianyou Wang, Weili Cao, Kaicheng Wang, Ramamohan Paturi, Leon Bergen |
BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Asad Aali, Dave Van Veen, Yamin Ishraq Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash S. Tehrani, Jangwon Kim, Akshay S. Chaudhari |
A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Nihar Ranjan Sahoo, Pranamya Prashant Kulkarni, Narjis Asad, Arif Ahmad, Tanu Goyal, Aparna Garimella, Pushpak Bhattacharyya |
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Maxence Lamarque, Luke Bhan, Yuanyuan Shi, Miroslav Krstic |
Adaptive Neural-Operator Backstepping Control of a Benchmark Hyperbolic PDE. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Itay Manes, Naama Ronn, David Cohen, Ran Ilan Ber, Zehavi Horowitz-Kugler, Gabriel Stanovsky |
K-QA: A Real-World Medical Q&A Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Spencer Carmichael, Austin Buchan, Mani Ramanagopal, Radhika Ravi, Ram Vasudevan, Katherine A. Skinner |
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zubair Qazi, William Shiao, Evangelos E. Papalexakis |
GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiaoyun Zheng, Liwei Liao, Xufeng Li, Jianbo Jiao, Rongjie Wang, Feng Gao, Shiqi Wang 0001, Ronggang Wang |
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng 0002, Xuming Hu, Philip S. Yu |
When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yifan Wu, Jiawei Du, Ping Liu, Yuewei Lin, Wenqing Cheng, Wei Xu |
DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier |
A Benchmark Evaluation of Clinical Named Entity Recognition in French. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xuanming Zhang, Zixun Chen, Zhou Yu |
ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Avinash Kumar Chaurasia, Matin Fallahi, Thorsten Strufe, Philipp Terhörst, Patricia Arias Cabarcos |
NeuroBench: An Open-Source Benchmark Framework for the Standardization of Methodology in Brainwave-based Authentication Research. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim |
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Martin Spitznagel, Janis Keuper |
Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Niklas Wretblad, Fredrik Gordh Riseby, Rahul Biswas, Amin Ahmadi, Oskar Holmström |
Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Caleb Robinson, Isaac Corley, Anthony Ortiz, Rahul Dodhia, Juan M. Lavista Ferres, Peyman Najafirad |
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xin Tong, Bo Jin, Zhi Lin, Binjun Wang, Ting Yu, Qiang Cheng |
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fabian Retkowski, Alexander Waibel |
From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Daniel Batrakhanov, Tuomas Eerola, Kaisa Kraft, Lumi Haraguchi, Lasse Lensu, Sanna Suikkanen, María Teresa Camarena-Gómez, Jukka Seppälä, Heikki Kälviäinen |
DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yunhao Zhang, Xiaohan Zhang, Chong Li, Shaonan Wang, Chengqing Zong |
MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiangshuo Qiao, Xianxin Li, Xiaozhe Qu, Jie Zhang, Yang Liu, Yu Luo, Cihang Jin, Jin Ma 0003 |
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar 0003, William Laney, Andrew Owens, Alexander Richard |
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, Dacheng Tao |
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti 0004, Thomas Arnold 0002, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov |
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Hamed Zibaei, Mohammad Saadi Mesgari |
Improved discrete particle swarm optimization using Bee Algorithm and multi-parent crossover method (Case study: Allocation problem and benchmark functions). |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiuying Chen, Tairan Wang, Qingqing Zhu, Taicheng Guo, Shen Gao, Zhiyong Lu, Xin Gao 0001, Xiangliang Zhang 0001 |
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | André Silva, Nuno Saavedra, Martin Monperrus |
GitBug-Java: A Reproducible Benchmark of Recent Java Bugs. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | José L. Risco-Martín, Saurabh Mittal, Juan Carlos Fabero Jiménez, Marina Zapater, Román Hermida |
Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Bingchao Wang |
ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwanghyun Kim, Seunghyun Won, Edward Choi |
EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins 0001, Roee Aharoni, Mor Geva |
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang 0001, Yulia Tsvetkov, Antonios Anastasopoulos |
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Loddo Fabio, Dario Piga, Umberto Michelucci, El Ghazouali Safouane |
BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin 0001, Caiming Xiong |
FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zeyu Wang 0007, Haoran Xiong, Zhenying He, Peng Wang 0027, Wei Wang 0009 |
Distance Comparison Operators for Approximate Nearest Neighbor Search: Exploration and Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Junjie Ye, Yilong Wu, Songyang Gao, Caishuang Huang, Sixian Li, Guanyu Li, Xiaoran Fan, Qi Zhang 0001, Tao Gui, Xuanjing Huang 0001 |
RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang 0008, Li Liu, Chao Shen |
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan 0001 |
Data-Effective Learning: A Comprehensive Medical Benchmark. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Butian Xiong, Zhuo Li, Zhen Li |
GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Xiaogang Jia, Denis Blessing, Xinkai Jiang, Moritz Reuss, Atalay Donat, Rudolf Lioutikov, Gerhard Neumann |
Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew T. Jackson, Samuel Coward, Jakob N. Foerster |
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao 0001 |
An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkilä, Nicolas Kourtellis |
Speech Robust Bench: A Robustness Benchmark For Speech Recognition. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Congyun Jin, Ming Zhang, Xiaowei Ma, Yujiao Li, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang |
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Qiwei Peng 0002, Yekun Chai, Xuhong Li 0002 |
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao |
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Frank Reyes, Yogya Gamage, Gabriel Skoglund, Benoit Baudry, Martin Monperrus |
BUMP: A Benchmark of Reproducible Breaking Dependency Updates. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Venktesh V, Abhijit Anand, Avishek Anand, Vinay Setty |
NUMTEMP: A real-world benchmark to verify claims with statistical and temporal expressions. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jonathan Viquerat, Philippe Meliga, Pablo Jeken, Elie Hachem |
Beacon, a lightweight deep reinforcement learning benchmark library for flow control. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Azmine Toushik Wasi, Md Shafikul Islam, Adipto Raihan Akib |
SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang |
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu |
Anchor function: a type of benchmark functions for studying language models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Jianlan Luo, Charles Xu 0003, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine |
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Danial Yazdani, Jürgen Branke, Mohammad Sadegh Khorshidi, Mohammad Nabi Omidvar, Xiaodong Li 0001, Amir H. Gandomi, Xin Yao 0001 |
Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yi Zong, Xipeng Qiu |
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yu Xiong, Zhipeng Hu, Ye Huang, Runze Wu, Kai Guan, Xingchen Fang, Ji Jiang, Tianze Zhou, Yujing Hu, Haoyu Liu, Tangjie Lyu, Changjie Fan |
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yufei Li, Simin Chen, Yanghong Guo, Wei Yang 0013, Yue Dong, Cong Liu 0005 |
Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Dominik Schlechtweg, Shafqat Mumtaz Virk, Nikolay Arefyev |
The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li |
Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yan Liu, Renren Jin, Lin Shi, Zheng Yao, Deyi Xiong |
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu 0002, Zhixin Feng, Kai Zhao, Yan Zheng |
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Raghav Kapoor, Yash Parag Butala, Melisa Russak, Jing Yu Koh, Kiran Kamble, Waseem AlShikh, Ruslan Salakhutdinov |
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser |
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Yin Zhang, Jinhong Deng, Peidong Liu, Wen Li, Shiyu Zhao |
Domain Adaptive Detection of MAVs: A Benchmark and Noise Suppression Network. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang 0007, Bowen Zhang, Ruifeng Xu, Xiaojun Chen 0006, Chengming Li, Xiping Hu 0001, Ye Li 0002, Jianping Fan 0002 |
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Zili Liu, Hao Chen, Lei Bai 0001, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi |
Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
8 | Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, Yue Zhang |
NovelQA: A Benchmark for Long-Range Novel Question Answering. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|