FacetedDBLP

Searching for "reward" with no syntactic query expansion in all metadata.

Publication years (Num. hits)
1967-1985 (17) 1986-1990 (17) 1991-1993 (24) 1994-1995 (22) 1996 (16) 1997-1998 (31) 1999 (27) 2000 (31) 2001 (49) 2002 (63) 2003 (54) 2004 (91) 2005 (98) 2006 (149) 2007 (170) 2008 (164) 2009 (123) 2010 (93) 2011 (81) 2012 (89) 2013 (107) 2014 (86) 2015 (105) 2016 (125) 2017 (159) 2018 (195) 2019 (268) 2020 (321) 2021 (403) 2022 (471) 2023 (585) 2024 (185)
Publication types (Num. hits)
article(2178) incollection(26) inproceedings(2189) mastersthesis(1) phdthesis(25)
Venues (Conferences, Journals, ...)
CoRR(908) NeuroImage(175) AAMAS(88) ICML(74) NeurIPS(73) AAAI(71) J. Cogn. Neurosci.(52) IJCNN(46) IJCAI(40) IEEE Access(38) ICRA(36) ICLR(34) HICSS(29) IROS(28) CogSci(24) AISTATS(22) More (+10 of total 1273)
GrowBag graphs for keyword (Num. hits/coverage)

The graphs summarize 862 occurrences of 567 keywords

Results
Found 4420 publication records. Showing 4419 according to the selection in the facets.
Hits | Authors | Title | Venue | Year | Author keywords
16 | Robin Jaulmes, Joelle Pineau, Doina Precup | Active Learning in Partially Observable Markov Decision Processes. | ECML | 2005 |
16 | Gautam A. Gupta, Stavros Toumpis, Jossy Sayir, Ralf R. Müller | On the Transport Capacity of Gaussian Multiple Access and Broadcast Channels. | WiOpt | 2005 |
16 | Hyeong Soo Chang, Robert Givan, Edwin K. P. Chong | Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes. | Discret. Event Dyn. Syst. | 2004 | rollout, multiclass scheduling, simulation, buffer management, partially observable Markov decision process
16 | Hong-Ren Chen, Yeh-Hao Chin | Scheduling Value-Based Nested Transactions in Distributed Real-Time Database Systems. | Real Time Syst. | 2004 | distributed real-time scheduling, distributed real-time database, two-phase locking mechanism, communication delay, nested transaction
16 | Dezhen Song, A. Frank van der Stappen, Kenneth Y. Goldberg | An Exact Algorithm Optimizing Coverage-resolution for Automated Satellite Frame Selection. | ICRA | 2004 |
16 | Vinh Vi Lam, Peter Buchholz 0001, William H. Sanders | A Structured Path-Based Approach for Computing Transient Rewards of Large CTMCs. | QEST | 2004 |
16 | Alberto Reyes, Pablo H. Ibargüengoytia, Luis Enrique Sucar | Power Plant Operator Assistant: An Industrial Application of Factored MDPs. | MICAI | 2004 |
16 | Norihisa Sato, Masaharu Adachi, Makoto Kotani | Control of Associative Chaotic Neural Networks Using a Reinforcement Learning. | ISNN (1) | 2004 |
16 | Min-Xiou Chen, Ben-Jye Chang, Ren-Hung Hwang, Jun-Fan Juang | MDP-based OVSF code assignment scheme and call admission control for wideband-CDMA communications. | ISCC | 2004 |
16 | Yuichi Kobayashi, Shigeyuki Hosoe | Motion planning with multiple resolutions: integration of evaluation space. | SMC (1) | 2004 |
16 | Jesús Herrera, Anselmo Peñas, Felisa Verdejo | Question Answering Pilot Task at CLEF 2004. | CLEF | 2004 |
16 | Emilia I. Barakova | Emergent behaviours based on episodic encoding and familiarity driven retrieval. | AIMSA | 2004 |
16 | Sridharan Devarajan, P. S. Prashanth, V. S. Chakravarthy | The Role of the Basal Ganglia in Exploratory Behavior in a Model Based on Reinforcement Learning. | ICONIP | 2004 |
16 | Mark Lanus, Liang Yin, Kishor S. Trivedi | Hierarchical composition and aggregation of state-based availability and performability models. | IEEE Trans. Reliab. | 2003 |
16 | Jean-Michel Fourneau, Mathieu Le Coz, Nihal Pekergin, Franck Quessette | An open tool to compute stochastic bounds on steady-state distributions and rewards. | MASCOTS | 2003 |
16 | Suzana Andova, Holger Hermanns, Joost-Pieter Katoen | Discrete-Time Rewards Model-Checked. | FORMATS | 2003 |
16 | M. Benmammoun, Jean-Michel Fourneau, Nihal Pekergin, Alexis Troubnikoff | An Algorithmic and Numerical Approach to Bound the Performance of High Speed Networks. | MASCOTS | 2002 |
16 | Iadine Chades, Bruno Scherrer, François Charpillet | A heuristic approach for solving decentralized-POMDP: assessment on the pursuit problem. | SAC | 2002 | decision theoretic agents, multiagent systems
16 | S. Swaminathan, G. Manimaran | A Reliability-Aware Value-Based Scheduler for Dynamic Multiprocessor Real-Time Systems. | IPDPS | 2002 |
16 | Cosmin Rusu, Rami G. Melhem, Daniel Mossé | Maximizing the System Value while Satisfying Time and Energy Constraints. | RTSS | 2002 |
16 | Tim Kovacs | XCS's Strength-Based Twin: Part I. | IWLCS | 2002 |
16 | Jean-Michel Fourneau, Nihal Pekergin | An Algorithmic Approach to Stochastic Bounds. | Performance | 2002 |
16 | Kagan Tumer, Adrian K. Agogino, David H. Wolpert | Learning sequences of actions in collectives of autonomous agents. | AAMAS | 2002 | MAS, reinforcement learning, Q-learning
16 | Amy Csizmar Dalal, Scott Jordan | An optimal service ordering for a world wide web server. | SIGMETRICS Perform. Evaluation Rev. | 2001 |
16 | Scott Lenser, James Bruce, Manuela M. Veloso | A Modular Hierarchical Behavior-Based Architecture. | RoboCup | 2001 |
16 | Shie Mannor, Nahum Shimkin | Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments. | COLT/EuroCOLT | 2001 |
16 | Ricardo Vilalta, Mark Brodie, Daniel Oblinger, Irina Rish | A Unified Framework for Evaluation Metrics in Classification Using Decision Trees. | ECML | 2001 |
16 | Jeffrey O. Pfaffmann, Klaus-Peter Zauner | Scouting Context-Sensitive Components. | Evolvable Hardware | 2001 |
16 | Constantinos Maglaras | Dynamic scheduling in multiclass queueing networks: Stability under discrete-review policies. | Queueing Syst. Theory Appl. | 1999 | open multiclass queueing networks, discrete-review policies, scheduling, stability, fluid models
11 | Federico Cornalba, Constantin Disselkamp, Davide Scassola, Christopher Helf | Multi-objective reward generalization: improving performance of Deep Reinforcement Learning for applications in single-asset trading. | Neural Comput. Appl. | 2024 |
11 | Lucas de Azevedo Takara, André Alves Portela Santos, Viviana Cocco Mariani, Leandro dos Santos Coelho | Deep reinforcement learning applied to a sparse-reward trading environment with intraday data. | Expert Syst. Appl. | 2024 |
11 | Victor R. F. Miranda, Armando Alves Neto, Gustavo Medeiros Freitas, Leonardo A. Mozelli | Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping. | IEEE Trans. Ind. Electron. | 2024 |
11 | Jing Zhang, Dan Guo, Xun Yang, Peipei Song, Meng Wang 0001 | Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning. | ACM Trans. Multim. Comput. Commun. Appl. | 2024 |
11 | Meng Xu 0009, Xinhong Chen 0003, Yechao She, Yang Jin, Jianping Wang 0001 | Time-Varying Weights in Multi-Reward Architecture for Deep Reinforcement Learning. | IEEE Trans. Emerg. Top. Comput. Intell. | 2024 |
11 | Qian Zhou, Guimeng Zhang | Event camera object recognition using spatiotemporal event time surface and reward-modulated spike-timing-dependent plasticity learning rule. | J. Electronic Imaging | 2024 |
11 | Xuchuang Wang, Hong Xie 0004, John C. S. Lui | Analyzing Queueing Problems via Bandits With Linear Reward & Nonlinear Workload Fairness. | IEEE Trans. Mob. Comput. | 2024 |
11 | Rakesh Kumar, Amrita Chaturvedi | Software Bug Prediction Using Reward-Based Weighted Majority Voting Ensemble Technique. | IEEE Trans. Reliab. | 2024 |
11 | Sherif B. Azmy, Nizar Zorba, Hossam S. Hassanein | Incentive-Vacation Queueing in Extreme Edge Computing: An Analytical Reward-Based Framework. | IEEE Open J. Commun. Soc. | 2024 |
11 | Tim Rüterbories, Axel Mecklinger, Kathrin C. J. Eschmann, Jordan Crivelli-Decker, Charan Ranganath, Matthias J. Gruber | Curiosity Satisfaction Increases Event-related Potentials Sensitive to Reward. | J. Cogn. Neurosci. | 2024 |
11 | Varun Devakonda, Zexi Zhou, Beiming Yang, Yang Qu | Neural Reward Anticipation Moderates Longitudinal Relation between Parents' Familism Values and Latinx American Youth's School Disengagement. | J. Cogn. Neurosci. | 2024 |
11 | Zhaoxiang Zang, Zhao Li, Zhiping Dan, Junying Wang | Improving selection strategies in zeroth-level classifier systems based on average reward reinforcement learning. | J. Ambient Intell. Humaniz. Comput. | 2024 |
11 | Erdi Sayar, Giovanni Iacca, Alois Knoll | Curriculum Learning for Robot Manipulation Tasks With Sparse Reward Through Environment Shifts. | IEEE Access | 2024 |
11 | Qian Zhao, Jinhui Han, Mao Xu | Boosting Policy Learning in Reinforcement Learning via Adaptive Intrinsic Reward Regulation. | IEEE Access | 2024 |
11 | Amjad Ali 0004, Shah Zeb, Madallah Alruwaili, Asad Masood Khattak, Bashir Hayat, Ki-Il Kim | Mixed Criticality Reward-Based Systems Using Resource Reservation. | IEEE Access | 2024 |
11 | Jens Gudmundsson, Jens Leth Hougaard | Blockchain-based Decentralized Reward Sharing: The Case of Mining Pools. | ACM Trans. Economics and Comput. | 2024 |
11 | Jiasen Li, Huiping Yao | Aberrant Reward Anticipating and Processing in Abstinent Heroin Addicts. | IEEE Trans. Comput. Soc. Syst. | 2024 |
11 | Keita Terashima, Koichi Kobayashi, Yuh Yamashita | On reward distribution in reinforcement learning of multi-agent surveillance systems with temporal logic specifications. | Adv. Robotics | 2024 |
11 | Naman Saxena, Sandeep Gorantla, Pushpak Jagtap | Funnel-Based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning. | IEEE Robotics Autom. Lett. | 2024 |
11 | Jaehwi Jang, Minjae Song, Daehyung Park | Inverse Constraint Learning and Generalization by Transferable Reward Decomposition. | IEEE Robotics Autom. Lett. | 2024 |
11 | Bei Chen, Fazhan Liu, Herbert Ho-Ching Iu, Han Bao 0001, Quan Xu 0001 | Memristive Neural Network Circuit of Operant Conditioning With Reward Delay and Variable Punishment Intensity. | IEEE Trans. Circuits Syst. II Express Briefs | 2024 |
11 | Wenzheng Xu, Chengxi Wang, Hongbin Xie, Weifa Liang, Haipeng Dai, Zichuan Xu, Ziming Wang, Bing Guo, Sajal K. Das 0001 | Reward Maximization for Disaster Zone Monitoring With Heterogeneous UAVs. | IEEE/ACM Trans. Netw. | 2024 |
11 | Anayo K. Akametalu, Shromona Ghosh, Jaime F. Fisac, Vicenc Rubies-Royo, Claire J. Tomlin | A Minimum Discounted Reward Hamilton-Jacobi Formulation for Computing Reachable Sets. | IEEE Trans. Autom. Control. | 2024 |
11 | Xiangxi Meng, Luyi Bai, Jiahui Hu, Lin Zhu 0014 | Multi-hop path reasoning over sparse temporal knowledge graphs based on path completion and reward shaping. | Inf. Process. Manag. | 2024 |
11 | Francesco Betti Sorbelli, Alfredo Navarra, Lorenzo Palazzetti, Cristina M. Pinotti, Giuseppe Prencipe | Wireless IoT sensors data collection reward maximization by leveraging multiple energy- and storage-constrained UAVs. | J. Comput. Syst. Sci. | 2024 |
11 | Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji Song | Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning. | IEEE Trans. Syst. Man Cybern. Syst. | 2024 |
11 | Lisheng Wu, Ke Chen 0001 | Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning. | Mach. Learn. | 2024 |
11 | Jiajia Cao, Na Chen | The Influence of Robots' Fairness on Humans' Reward-Punishment Behaviors and Trust in Human-Robot Cooperative Teams. | Hum. Factors | 2024 |
11 | Jueming Hu, Zhe Xu 0005, Weichang Wang, Guannan Qu, Yutian Pang, Yongming Liu | Decentralized graph-based multi-agent reinforcement learning using reward machines. | Neurocomputing | 2024 |
11 | Na Chen, Jiajia Cao, Xueyan Hu | The Effects of Robot Managers' Reward-Punishment Behaviours on Human-Robot Trust and Job Performance. | Int. J. Soc. Robotics | 2024 |
11 | Xuejing Zheng, Chao Yu 0004 | Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines. | CoRR | 2024 |
11 | Deepthi Pathare, Leo Laine, Morteza Haghir Chehreghani | Tactical Decision Making for Autonomous Trucks by Deep Reinforcement Learning with Total Cost of Operation Based Reward. | CoRR | 2024 |
11 | David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand | Code as Reward: Empowering Reinforcement Learning with VLMs. | CoRR | 2024 |
11 | Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha | Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic. | CoRR | 2024 |
11 | Xin Mao, Feng-Lin Li, Huimin Xu, Wei Zhang, Anh Tuan Luu | Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration. | CoRR | 2024 |
11 | Rati Devidze, Parameswaran Kamalaruban, Adish Singla | Informativeness of Reward Functions in Reinforcement Learning. | CoRR | 2024 |
11 | Shentao Yang, Tianqi Chen, Mingyuan Zhou | A Dense Reward View on Aligning Text-to-Image Diffusion with Preference. | CoRR | 2024 |
11 | Jumman Hossain, Abu Zaher Md Faridee, Nirmalya Roy, Jade Freeman, Timothy Gregory, Theron T. Trout | TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments. | CoRR | 2024 |
11 | Zhaoyue Wang | Towards Socially and Morally Aware RL agent: Reward Design With LLM. | CoRR | 2024 |
11 | Yinghui Li, Jinze Wu, Xin Liu, Weizhong Guo, Yufei Xue | Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits. | CoRR | 2024 |
11 | Karin de Langis, Ryan Koo, Dongyeop Kang | Reinforcement Learning with Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation. | CoRR | 2024 |
11 | Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca D. Dragan, Erdem Biyik | A Generalized Acquisition Function for Preference-based Reward Learning. | CoRR | 2024 |
11 | Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu | Prior Constraints-based Reward Model Training for Aligning Large Language Models. | CoRR | 2024 |
11 | Sungdong Kim, Minjoon Seo | Preference-free Alignment Learning with Regularized Relevance Reward. | CoRR | 2024 |
11 | Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt | To the Max: Reinventing Reward in Reinforcement Learning. | CoRR | 2024 |
11 | Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou 0001, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro | ODIN: Disentangled Reward Mitigates Hacking in RLHF. | CoRR | 2024 |
11 | Shayan Meshkat Alsadat, Jean-Raphaël Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu 0005 | Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine. | CoRR | 2024 |
11 | Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta | VLRM: Vision-Language Models act as Reward Models for Image Captioning. | CoRR | 2024 |
11 | Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu | Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation. | CoRR | 2024 |
11 | Banghua Zhu, Michael I. Jordan, Jiantao Jiao | Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF. | CoRR | 2024 |
11 | Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Harshad Khadilkar, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera | Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization. | CoRR | 2024 |
11 | Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant 0001, Shie Mannor | On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes. | CoRR | 2024 |
11 | Bohao Qu, Xiaofeng Cao 0002, Qing Guo 0005, Yi Chang, Ivor W. Tsang, Chengqi Zhang | Transductive Reward Inference on Graph. | CoRR | 2024 |
11 | Chen Jia | Generalizing Reward Modeling for Out-of-Distribution Preference Learning. | CoRR | 2024 |
11 | Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen | Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations. | CoRR | 2024 |
11 | Varul Srivastava, Sujit Gujar | DECENT-BRM: Decentralization through Block Reward Mechanisms. | CoRR | 2024 |
11 | Yangchun Zhang, Yirui Zhou | Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery. | CoRR | 2024 |
11 | Jan Wehner, Frans A. Oliehoek, Luciano Cavalcante Siebert | Explaining Learned Reward Functions with Counterfactual Trajectories. | CoRR | 2024 |
11 | Jinyeob Kim, Sumin Kang, Sungwoo Yang, Beomjoon Kim, Jargalbaatar Yura, Donghan Kim 0001 | Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement Learning. | CoRR | 2024 |
11 | Zhiyu An, Xianzhong Ding, Wan Du | Reward Bound for Behavioral Guarantee of Model-based Planning Agents. | CoRR | 2024 |
11 | Ashish Rana, Michael Oesterle, Jannik Brinkmann | GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems. | CoRR | 2024 |
11 | Kenneth Li 0002, Samy Jelassi, Hugh Zhang, Sham M. Kakade, Martin Wattenberg, David Brandfonbrener | Q-Probe: A Lightweight Approach to Reward Maximization for Language Models. | CoRR | 2024 |
11 | Ling Liang, Haizhao Yang | On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization. | CoRR | 2024 |
11 | Nafis Tanveer Islam, Joseph Khoury, Andrew Seong, Gonzalo De La Torre Parra, Elias Bou-Harb, Peyman Najafirad | LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward. | CoRR | 2024 |
11 | Angela Zhou | Reward-Relevance-Filtered Linear Offline Reinforcement Learning. | CoRR | 2024 |
11 | Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah | Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning. | CoRR | 2024 |
11 | Rahul N. R, Vaibhav Katewa | Transfer in Sequential Multi-armed Bandits via Reward Samples. | CoRR | 2024 |
11 | Gregory Hyde, Eugene Santos Jr. | Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov. | CoRR | 2024 |
11 | Yige Hong, Qiaomin Xie, Yudong Chen 0001, Weina Wang 0001 | Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits. | CoRR | 2024 |
11 | Gaurav Pandey 0001, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernandez Astudillo | BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback. | CoRR | 2024 |
11 | Junseok Park, Yoonsung Kim, Hee bin Yoo, Min Whoo Lee, Kibeom Kim, Won-Seok Choi 0006, Minsu Lee, Byoung-Tak Zhang | Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning. | CoRR | 2024 |
Displaying results #401 - #500 of 4419 (100 per page)
Maintained by L3S.
Previously maintained by Jörg Diederich.
Based upon DBLP by Michael Ley.
Open data: released under the ODC-BY 1.0 license.