Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
1 | Yuan Lu, Yu-Ting Lin |
Characterised LLMs Affect its Evaluation of Summary and Translation. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Ghazaleh Mahmoudi |
Exploring Prompting Large Language Models as Explainable Metrics. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes |
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger |
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Abhishek Pradhan, Ketan Kumar Todi |
Understanding Large Language Model Based Metrics for Text Summarization. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Pavan Baswani, Ananya Mukherjee, Manish Shrivastava 0001 |
LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Jeremy Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr |
Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Savita Bhat, Vasudeva Varma |
Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann 0005 |
WRF: Weighted Rouge-F1 Metric for Entity Recognition. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao 0021, Rotem Dror, Steffen Eger |
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Joonghoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang 0001 |
Which is better? Exploring Prompting Strategy For LLM-based Metrics. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao 0021, Christoph Leiter, Juri Opitz, Andreas Rücklé (eds.) |
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2023, Bali, Indonesia, November 1, 2023 |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao |
HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Abbas Akkasi, Kathleen C. Fraser, Majid Komeili |
Reference-Free Summarization Evaluation with Large Language Models. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Yanran Chen, Steffen Eger |
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray |
Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar |
Zero-shot Probing of Pretrained Language Models for Geography Knowledge. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Yixuan Wang, Qingyan Chen, Duygu Ataman |
Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone |
EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Vatsal Raina, Adian Liusie, Mark J. F. Gales |
Assessing Distractors in Multiple-Choice Tests. |
Eval4NLP |
2023 |
DBLP BibTeX RDF |
|
1 | Shohei Higashiyama, Masao Ideuchi, Masao Utiyama, Yoshiaki Oida, Eiichiro Sumita |
A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui |
Chat Translation Error Detection for Assisting Cross-lingual Communications. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Kaori Abe, Sho Yokoi, Tomoyuki Kajiwara, Kentaro Inui |
Why is sentence similarity benchmark not predictive of application-oriented task performance? |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Guanyi Chen, Fahime Same, Kees van Deemter |
Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Ryan Chi, Nathan Kim, Patrick Liu, Zander Lack, Ethan A. Chi |
GLARE: Generative Left-to-right AdversaRial Examples. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Mateusz Krubiński, Pavel Pecina |
From COMET to COMES - Can Summary Evaluation Benefit from Translation Evaluation? |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Parush Gera, Tempestt J. Neal |
A Comparative Analysis of Stance Detection Approaches and Datasets. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Daniel Deutsch, Can Udomcharoenchaikit, Juri Opitz, Yang Gao 0021, Marina Fomicheva, Steffen Eger (eds.) |
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2022, Online, November 20, 2022 |
Eval4NLP |
2022 |
DBLP BibTeX RDF |
|
1 | Zachary Zhou, Alisha Zachariah, Devin Conathan, Jeffery Kline |
Assessing Resource-Performance Trade-off of Natural Language Models using Data Envelopment Analysis. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Zhengxiang Wang |
Random Text Perturbations Work, but not Always. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Juri Opitz, Anette Frank |
Better Smatch = Better Parser? AMR evaluation is not so simple anymore. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Roberta Rocca, Alejandro de la Vega |
Evaluating the role of non-lexical markers in GPT-2's language modeling behavior. |
Eval4NLP |
2022 |
DBLP DOI BibTeX RDF |
|
1 | Qingkai Zeng 0001, Mengxia Yu, Wenhao Yu 0002, Tianwen Jiang, Meng Jiang 0001 |
Validating Label Consistency in NER Data Annotation. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Nicolas Garneau, Luc Lamontagne |
Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin |
Testing Cross-Database Semantic Parsers With Canonical Utterances. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Melda Eksi, Erik Gelbing, Jonathan Stieber, Chi Viet Vu |
Explaining Errors in Machine Translation with Absolute Gradient Ensembles. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Emma Manning, Nathan Schneider 0001 |
Referenceless Parsing-Based Evaluation of AMR-to-English Generation. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Benjamin Murauer, Günther Specht |
Developing a Benchmark for Reducing Data Bias in Authorship Attribution. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Chester Palen-Michel, Nolan Holley, Constantine Lignos |
SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, Pierre Zweigenbaum |
Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Oleg V. Vasilyev 0001, John Bohannon |
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov |
StoryDB: Broad Multi-language Narrative Dataset. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Ayush Garg 0001, Sammed S. Kagi, Vivek Srivastava, Mayank Singh 0001 |
MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Christoph Wolfgang Leiter |
Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Vivek Srivastava, Mayank Singh 0001 |
HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Yang Liu 0254, Alan Medlar, Dorota Glowacka |
Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Urja Khurana, Eric T. Nalisnick, Antske Fokkens |
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Peter Polák, Muskaan Singh, Ondrej Bojar |
Explainable Quality Estimation: CUNI Eval4NLP Submission. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao 0033, Steffen Eger, Yang Gao 0021 |
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Enzo Terreau, Antoine Gourru, Julien Velcin |
Writing Style Author Embedding Evaluation. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Yang Gao 0021, Steffen Eger, Wei Zhao 0033, Piyawat Lertvittayakumjorn, Marina Fomicheva (eds.) |
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2021, Punta Cana, Dominican Republic, November 10, 2021 |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Marcos V. Treviso, Nuno Miguel Guerreiro, Ricardo Rei, André F. T. Martins |
IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Oskar Wysocki, Malina Florea, Dónal Landers, André Freitas |
What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Yo Ehara |
Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Tasnim Kabir, Marine Carpuat |
The UMD Submission to the Explainable MT Quality Estimation Shared Task: Combining Explanation Models with Sequence Labeling. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Raphael Rubino, Atsushi Fujita, Benjamin Marie |
Error Identification for Machine Translation with Metric Embedding and Attention. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | David Chen, Maury Courtland, Adam Faulkner, Aysu Ezen-Can |
Error-Sensitive Evaluation for Ordinal Target Variables. |
Eval4NLP |
2021 |
DBLP BibTeX RDF |
|
1 | Neslihan Iskender, Tim Polzehl, Sebastian Möller 0001 |
Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Kawin Ethayarajh, Dorsa Sadigh |
BLEU Neighbors: A Reference-less Approach to Automatic Evaluation. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Hwanhee Lee, Seunghyun Yoon 0002, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung |
ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Jingcheng Niu, Gerald Penn |
Grammaticality and Language Modelling. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Hanna Wecker, Annemarie Friedrich, Heike Adel |
ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald |
Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Shiran Dudy, Steven Bedrick |
Are Some Words Worth More than Others? |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Adam Poliak |
A survey on Recognizing Textual Entailment as an NLP Evaluation. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke |
On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Reda Yacouby, Dustin Axman |
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Steffen Eger, Yang Gao 0021, Maxime Peyrard, Wei Zhao 0033, Eduard H. Hovy (eds.) |
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2020, Online, November 20, 2020 |
Eval4NLP |
2020 |
DBLP BibTeX RDF |
|
1 | João Sedoc, Lyle H. Ungar |
Item Response Theory for Efficient Human Evaluation of Chatbots. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Nathan Stringham, Mike Izbicki |
Evaluating Word Embeddings on Low-Resource Languages. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent |
One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Jacob Bremerman, Huda Khayrallah, Douglas W. Oard, Matt Post |
On the Evaluation of Machine Translation n-best Lists. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Oleg V. Vasilyev 0001, Vedant Dharnidharka, John Bohannon |
Fill in the BLANC: Human-free quality estimation of document summaries. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Xi Chen 0071, Nan Ding 0002, Tomer Levinboim, Radu Soricut |
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|
1 | Klaus-Michael Lux, Maya Sappelli, Martha A. Larson |
Truth or Error? Towards systematic analysis of factual errors in abstractive summaries. |
Eval4NLP |
2020 |
DBLP DOI BibTeX RDF |
|