FacetedDBLP

Searching for Tokenizer with no syntactic query expansion in all metadata.

Publication years (number of hits)
2002-2009 (15), 2010-2019 (15), 2020-2022 (15), 2023 (24), 2024 (6)
Publication types (number of hits)
article (36), inproceedings (39)
GrowBag keyword graphs (not reproduced here): the graphs summarize 4 occurrences of 4 keywords.

Results
Found 75 publication records. All 75 are shown, according to the current facet selection.
Hits | Authors | Title | Venue | Year | Author keywords
113 | Stefan Klatt, Bernd Bohnet | You Don't Have to Think Twice if You Carefully Tokenize. | IJCNLP | 2004
93 | Robert Bernecky | An SPMD/SIMD parallel tokenizer for APL. | APL | 2003
55 | Bin Ma 0001, Haizhou Li 0001 | A phonotactic-semantic paradigm for automatic spoken document classification. | SIGIR | 2005 | acoustic words, phonotactic-semantic, semantic domain, spoken document classification, voice tokenizer, n-gram
50 | Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li 0007 | Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding. | CoRR | 2024
47 | Cody Boisclair | Developing a tokenizer and morphological parser for English text in C#. | ACM Southeast Regional Conference | 2008
33 | Amir Shahab Shahabi, Mohammad Reza Kangavari | A Fuzzy Approach for Persian Text Segmentation Based on Semantic Similarity of Sentences. | Intelligent Information Processing | 2006 | Fuzzy Similarity Relation, Fuzzy Proximity Relation, Lemma, Fuzzy Relations Composition, Anti-Redundancy, Syntax Parser, Meta Variable, Meta Rule, Paradigmatic, Tokenizer, Multi-Document Summarizer, Lemmatizer
25 | Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo | Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs. | CoRR | 2024
25 | Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter | Greed is All You Need: An Evaluation of Tokenizer Inference Methods. | CoRR | 2024
25 | Gautier Dagan, Gabriel Synnaeve, Baptiste Rozière | Getting the most out of your tokenizer for pre-training and domain adaptation. | CoRR | 2024
25 | Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu | ε-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer. | WACV (Workshops) | 2024
25 | Goodwill Erasmo Ndomba, Young-Seob Jeong | Effects of Swahili Monolingual Tokenizer on Downstream Tasks. | BigComp | 2024
25 | Sanghyun Choo, Wonjoon Kim | A study on the evaluation of tokenizer performance in natural language processing. | Appl. Artif. Intell. | 2023
25 | Jungeun Kim, Ha Young Kim | CSLT-AK: Convolutional-embedded transformer with an action tokenizer and keypoint emphasizer for sign language translation. | Pattern Recognit. Lett. | 2023
25 | Zhiwei Deng, Ting Chen, Yang Li | Perceptual Group Tokenizer: Building Perception with Iterative Grouping. | CoRR | 2023
25 | Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, Cornelia Caragea | Semantic Tokenizer for Enhanced Natural Language Processing. | CoRR | 2023
25 | Zipeng Xu, Enver Sangineto, Nicu Sebe | StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model. | CoRR | 2023
25 | Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang 0001, Irfan Essa, David A. Ross, Lu Jiang 0004 | Language Model Beats Diffusion - Tokenizer is Key to Visual Generation. | CoRR | 2023
25 | Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu | E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer. | CoRR | 2023
25 | Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan | Making LLaMA SEE and Draw with SEED Tokenizer. | CoRR | 2023
25 | Zhiyuan Liu, Yaorui Shi, An Zhang 0003, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang 0010, Tat-Seng Chua | Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules. | CoRR | 2023
25 | Felix Stollenwerk | Training and Evaluation of a Multilingual Tokenizer for GPT-SW3. | CoRR | 2023
25 | Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu | SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models. | CoRR | 2023
25 | Miao Fan, Chen Hu, Shuchang Zhou 0001 | Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length. | CoRR | 2023
25 | Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim | Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding. | CoRR | 2023
25 | Christopher Meaney, Therese A. Stukel, Peter C. Austin, Michael D. Escobar | Comparing Variation in Tokenizer Outputs Using a Series of Problematic and Challenging Biomedical Sentences. | CoRR | 2023
25 | Mehdi Ali, Michael Fromm 0001, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr | Tokenizer Choice For LLM Training: Negligible or Crucial? | CoRR | 2023
25 | Tatsuya Hiraoka, Tomoya Iwakura | Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing. | CoRR | 2023
25 | Wenhao Li, Mengyuan Liu, Hong Liu 0009, Pichao Wang, Jialun Cai, Nicu Sebe | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation. | CoRR | 2023
25 | Zipeng Xu, Enver Sangineto, Nicu Sebe | StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model. | ICCV | 2023
25 | Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig | A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models. | EACL (Findings) | 2023
25 | Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim | Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding. | ICCV (Workshops) | 2023
25 | Tuan Aqeel Bohoran, Polydoros N. Kampaktsis, Laura McLaughlin, Jay Leb, Serafeim P. Moustakidis, Gerry P. McCann, Archontis Giannakidis | Right Ventricular Volume Prediction by Feature Tokenizer Transformer-Based Regression of 2D Echocardiography Small-Scale Tabular Data. | FIMH | 2023
25 | Zhiyuan Liu, Yaorui Shi, An Zhang 0003, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua | Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules. | NeurIPS | 2023
25 | Adhiraj Banerjee, Vipul Arora 0001 | wav2tok: Deep Sequence Tokenizer for Audio Retrieval. | ICLR | 2023
25 | Rinka Kiriyama, Akio Sashima, Ikuko Shimizu | Robust Tokenizer for Vision Transformer. | GCCE | 2023
25 | Eugene Bagdasaryan, Congzheng Song, Rogier C. van Dalen, Matt Seigel, Áine Cahill | Training a Tokenizer for Free with Private Federated Learning. | CoRR | 2022
25 | Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo | A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning. | CoRR | 2022
25 | Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal 0002 | TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer. | CoRR | 2022
25 | Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig | A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models. | CoRR | 2022
25 | Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán | How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? | CoRR | 2022
25 | Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal 0002 | TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer. | EMNLP (Findings) | 2022
25 | Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo | A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning. | RepL4NLP@ACL | 2022
25 | Jinghao Zhou, Chen Wei 0005, Huiyu Wang, Wei Shen 0002, Cihang Xie, Alan L. Yuille, Tao Kong | Image BERT Pre-training with Online Tokenizer. | ICLR | 2022
25 | Pavel Rychlý, Samuel Spalek | Utok: The Fast Rule-based Tokenizer. | RASLAN | 2022
25 | Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán | How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? | AMTA | 2022
25 | Jinghao Zhou, Chen Wei 0005, Huiyu Wang, Wei Shen 0002, Cihang Xie, Alan L. Yuille, Tao Kong | iBOT: Image BERT Pre-Training with Online Tokenizer. | CoRR | 2021
25 | Sangah Lee, Hyopil Shin | The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts. | W-NUT | 2021
25 | Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych | How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. | ACL/IJCNLP (1) | 2021
25 | Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych | How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. | CoRR | 2020
25 | Daniele Mazzei, Giacomo Baldi, Gualtiero Fantoni, Gabriele Montelisciani, Antonio Pitasi, Laura Ricci, Lorenzo Rizzello | A Blockchain Tokenizer for Industrial IOT trustless applications. | Future Gener. Comput. Syst. | 2020
25 | Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant | Bridging the Gap for Tokenizer-Free Language Models. | CoRR | 2019
25 | Kazuhisa Nakasho | Development of a Flexible Mizar Tokenizer and Parser for Information Retrieval System. | FedCSIS | 2019
25 | Taku Kudo, John Richardson | SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. | CoRR | 2018
25 | Taku Kudo, John Richardson | SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. | EMNLP (Demonstration) | 2018
25 | Johannes Graën, Mara Bertamini, Martin Volk 0001 | Cutter - a Universal Multilingual Tokenizer. | SwissText | 2018
25 | Matthieu Jimenez, Maxime Cordy, Yves Le Traon, Mike Papadakis | On the Impact of Tokenizer and Parameters on N-Gram Based Code Analysis. | ICSME | 2018
25 | Kazuma Takaoka, Sorami Hisamoto, Noriko Kawahara, Miho Sakamoto, Yoshitaka Uchida, Yuji Matsumoto 0001 | Sudachi: a Japanese Tokenizer for Business. | LREC | 2018
25 | Luz Marina Sierra, Carlos Alberto Cobos Lozada, Juan Carlos Corrales | Tokenizer Adapted for Nasa Yuwe Language. | Computación y Sistemas | 2016
25 | K. Divyavarma, M. Remya, G. Deepa | An Enhanced Bug Mining for Identifying Frequent Bug Pattern Using Word Tokenizer and FP-Growth. | FICTA (1) | 2016
25 | György Szaszák, Máté Ákos Tündik, András Beke | Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer. | KDIR | 2016
25 | Juhaida Abu Bakar, Khairuddin Omar, Mohammad Faidzul Nasrudin, Mohd Zamri Murah | Tokenizer for the Malay language using pattern matching. | ISDA | 2014
25 | Arianna Pipitone, Maria Carmela Campisi, Roberto Pirrone | An A* Based Semantic Tokenizer for Increasing the Performance of Semantic Applications. | ICSC | 2013
25 | Jirí Marsík, Ondrej Bojar | TrTok: A Fast and Trainable Tokenizer for Natural Languages. | Prague Bull. Math. Linguistics | 2012
25 | Neil Barrett, Jens H. Weber-Jahnke | Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm. | BMC Bioinform. | 2011
25 | Neil Barrett, Jens H. Weber-Jahnke | Building a Biomedical Tokenizer Using the Token Lattice Design Pattern and the Adapted Viterbi Algorithm. | ICMLA | 2010
25 | Aasish Pappu, Ratna Sanyal | Vaakkriti: Sanskrit Tokenizer. | IJCNLP | 2008
25 | Chengguo Jin, Seung-Hoon Na, Dong-Il Kim, Jong-Hyeok Lee | Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer. | IJCNLP | 2008
25 | Oana Frunza | A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization. | LREC | 2008
25 | Zhi-Jie Chang, Hsiao-Chuan Wang | Language Identification based on Gaussian Mixture Model Tokenizer and Language Model [In Chinese]. | ROCLING | 2005
23 | Rong Tong, Bin Ma 0001, Haizhou Li 0001, Chng Eng Siong | A Target-Oriented Phonotactic Front-End for Spoken Language Recognition. | IEEE Trans. Speech Audio Process. | 2009
23 | Yu-Chieh Wu, Jie-Chi Yang | A Robust Passage Retrieval Algorithm for Video Question Answering. | IEEE Trans. Circuits Syst. Video Technol. | 2008
23 | Rong Tong, Bin Ma 0001, Haizhou Li 0001, Engsiong Chng | Target-oriented phone tokenizers for spoken language recognition. | ICASSP | 2008
23 | Hong Phuong Le, Nguyên Thi Minh Huyên, Azim Roussanaly, Hô Tuòng Vinh | A Hybrid Approach to Word Segmentation of Vietnamese Texts. | LATA | 2008
23 | Francisco-Mario Barcala, Jesús Vilares Ferro, Miguel A. Alonso 0001, Jorge Graña Gil, Manuel Vilares Ferro | Tokenization and Proper Noun Recognition for Information Retrieval. | DEXA Workshops | 2002
23 | Jesús Vilares Ferro, Francisco-Mario Barcala, Miguel A. Alonso 0001, Jorge Graña Gil, Manuel Vilares Ferro | Practical NLP-Based Text Indexing. | IBERAMIA | 2002
Displaying results 1 to 75 of 75 (100 per page).
Maintained by L3S.
Previously maintained by Jörg Diederich.
Based upon DBLP by Michael Ley.
Data released under the ODC-BY 1.0 license.