| Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
| 3 | Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya |
A New Natural Policy Gradient by Stationary Distribution Metric.  |
ECML/PKDD  |
2008 |
DBLP DOI BibTeX RDF |
policy gradient reinforcement learning, Riemannian metric matrix, Markov decision process, natural gradient |
| 2 | Emmanuel Daucé |
A Model of Neuronal Specialization Using Hebbian Policy-Gradient with "Slow" Noise.  |
ICANN  |
2009 |
DBLP DOI BibTeX RDF |
|
| 2 | Abdeslam Boularias, Brahim Chaib-draa |
Predictive representations for policy gradient in POMDPs.  |
ICML  |
2009 |
DBLP DOI BibTeX RDF |
|
| 2 | Thomas Rückstieß, Martin Felder, Jürgen Schmidhuber |
State-Dependent Exploration for Policy Gradient Methods.  |
ECML/PKDD  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Maarten Peeters, Ville Könönen, Katja Verbeeck, Ann Nowé |
A Learning Automata Approach to Multi-agent Policy Gradient Learning.  |
KES  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Yu Hiei, Takeshi Mori, Shin Ishii |
Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments.  |
ICANN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Tomoya Tamei, Tomohiro Shibata |
Policy Gradient Learning of Cooperative Interaction with a Robot Using User's Biological Signals.  |
ICONIP  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Nguyen Hoang Viet, Ngo Anh Vien, TaeChoong Chung |
Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks.  |
ICNSC  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Seiji Ishihara, Harukazu Igarashi |
Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies.  |
PRICAI  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Harukazu Igarashi, K. Nakamura, Seiji Ishihara |
Learning of soccer player agents using a policy gradient method: Coordination between kicker and receiver during free kicks.  |
IJCNN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 2 | Andrea Cherubini, Francesca Giannone, Luca Iocchi, Pier Francesco Palamara |
An extended policy gradient algorithm for robot task learning.  |
IROS  |
2007 |
DBLP DOI BibTeX RDF |
|
| 2 | Daan Wierstra, Jürgen Schmidhuber |
Policy Gradient Critics.  |
ECML  |
2007 |
DBLP DOI BibTeX RDF |
|
| 2 | Dongbing Gu, Erfu Yang |
Fuzzy Policy Reinforcement Learning in Cooperative Multi-robot Systems.  |
Journal of Intelligent and Robotic Systems  |
2007 |
DBLP DOI BibTeX RDF |
flocking behavior, policy gradient reinforcement learning, cooperative control, multi-agent reinforcement learning |
| 2 | Jan Peters, Stefan Schaal |
Policy Gradient Methods for Robotics.  |
IROS  |
2006 |
DBLP DOI BibTeX RDF |
|
| 2 | Yutaka Nakamura, Takeshi Mori, Shin Ishii |
An Off-Policy Natural Policy Gradient Method for a Partial Observable Markov Decision Process.  |
ICANN  |
2005 |
DBLP DOI BibTeX RDF |
|
| 2 | Yutaka Nakamura, Takeshi Mori, Shin Ishii |
Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot.  |
PPSN  |
2004 |
DBLP DOI BibTeX RDF |
|
| 2 | Nate Kohl, Peter Stone |
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.  |
ICRA  |
2004 |
DBLP DOI BibTeX RDF |
|
| 2 | Ville Könönen |
Policy Gradient Method for Team Markov Games.  |
IDEAL  |
2004 |
DBLP DOI BibTeX RDF |
|
| 2 | Bikramjit Banerjee, Jing Peng |
Adaptive policy gradient in multiagent learning.  |
AAMAS  |
2003 |
DBLP DOI BibTeX RDF |
gradient ascent learning, game theory, nash equilibria |
| 1 | Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama |
Analysis and improvement of policy gradient estimation.  |
Neural Networks  |
2012 |
DBLP DOI BibTeX RDF |
|
| 1 | Ngo Anh Vien, Hwanjo Yu, TaeChoong Chung |
Hessian matrix distribution for Bayesian policy gradient reinforcement learning.  |
Inf. Sci.  |
2011 |
DBLP DOI BibTeX RDF |
|
| 1 | Jervis Pinto, Alan Fern, Tim Bauer, Martin Erwig |
Improving Policy Gradient Estimates with Influence Information.  |
Journal of Machine Learning Research - Proceedings Track  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Peter L. Bartlett, Jonathan Baxter |
Infinite-Horizon Policy-Gradient Estimation  |
CoRR  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Michael Fairbank, Eduardo Alonso |
The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning  |
CoRR  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Peter L. Bartlett, Jonathan Baxter, Lex Weaver |
Experiments with Infinite-Horizon, Policy-Gradient Estimation  |
CoRR  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Kfir Y. Levy, Nahum Shimkin |
Unified Inter and Intra Options Learning Using Policy Gradient Methods.  |
EWRL  |
2011 |
DBLP DOI BibTeX RDF |
|
| 1 | Seiji Ishihara, Harukazu Igarashi |
Policy Gradient Reinforcement Learning with Environmental Dynamics and Action-Values in Policies.  |
KES  |
2011 |
DBLP DOI BibTeX RDF |
|
| 1 | Hunor Jakab, Lehel Csató |
Improving Gaussian Process Value Function Approximation in Policy Gradient Algorithms.  |
ICANN  |
2011 |
DBLP DOI BibTeX RDF |
|
| 1 | Mark Crowley, David Poole |
Policy Gradient Planning for Environmental Decision Making with Existing Simulators.  |
AAAI  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Philip S. Thomas |
Policy Gradient Coagent Networks.  |
NIPS  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama |
Analysis and Improvement of Policy Gradient Estimation.  |
NIPS  |
2011 |
DBLP BibTeX RDF |
|
| 1 | Andrea Cherubini, Francesca Giannone, Luca Iocchi, Daniele Nardi, Pier Francesco Palamara |
Policy gradient learning for quadruped soccer robots.  |
Robotics and Autonomous Systems  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | Ngo Anh Vien, SeungGwan Lee, TaeChoong Chung |
Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors.  |
IEICE Transactions  |
2010 |
DBLP BibTeX RDF |
|
| 1 | Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Jan Peters, Kenji Doya |
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning.  |
Neural Computation  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | Jan Peters |
Policy gradient methods.  |
Scholarpedia  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | Yan-Jie Li, Fang Cao 0003, Xi-Ren Cao |
On-Line Policy Gradient Estimation with Multi-Step Sampling.  |
Discrete Event Dynamic Systems  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | Jan Peters, J. Andrew Bagnell |
Policy Gradient Methods.  |
Encyclopedia of Machine Learning  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | John W. Roberts, Lionel Moret, Jun Zhang, Russ Tedrake |
Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing.  |
From Motor Learning to Interaction Learning in Robots  |
2010 |
DBLP DOI BibTeX RDF |
|
| 1 | Atsushi Miyamae, Yuichi Nagata, Isao Ono, Shigenobu Kobayashi |
Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks.  |
NIPS  |
2010 |
DBLP BibTeX RDF |
|
| 1 | Jie Tang, Pieter Abbeel |
On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient.  |
NIPS  |
2010 |
DBLP BibTeX RDF |
|
| 1 | Andrea Cherubini, Francesca Giannone, Luca Iocchi, M. Lombardo, Giuseppe Oriolo |
Policy gradient learning for a humanoid soccer robot.  |
Robotics and Autonomous Systems  |
2009 |
DBLP DOI BibTeX RDF |
|
| 1 | Ngo Anh Vien, Nguyen Hoang Viet, SeungGwan Lee, TaeChoong Chung |
Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks.  |
IEICE Transactions  |
2009 |
DBLP BibTeX RDF |
|
| 1 | Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, Walter Senn, Wulfram Gerstner |
Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail.  |
PLoS Computational Biology  |
2009 |
DBLP DOI BibTeX RDF |
|
| 1 | Olivier Buffet, Douglas Aberdeen |
The factored policy-gradient planner.  |
Artif. Intell.  |
2009 |
DBLP DOI BibTeX RDF |
|
| 1 | Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner |
Code-specific policy gradient rules for spiking neurons.  |
NIPS  |
2009 |
DBLP BibTeX RDF |
|
| 1 | Verena Heidrich-Meisner, Christian Igel |
Uncertainty handling CMA-ES for reinforcement learning.  |
GECCO  |
2009 |
DBLP DOI BibTeX RDF |
covariance matrix adaptation evolution strategy, direct policy search, reinforcement learning, uncertainty handling |
| 1 | David Silver, Gerald Tesauro |
Monte-Carlo simulation balancing.  |
ICML  |
2009 |
DBLP DOI BibTeX RDF |
|
| 1 | Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, Gordon Cheng |
Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot.  |
I. J. Robotic Res.  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Huaxiang Zhang, Ying Fan |
An adaptive policy gradient in learning Nash equilibria.  |
Neurocomputing  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Francisco S. Melo |
Exploiting locality of interactions using a policy-gradient approach in multiagent learning.  |
ECAI  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Andres El-Fakdi, Marc Carreras |
Policy gradient based Reinforcement Learning for real autonomous underwater cable tracking.  |
IROS  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Verena Heidrich-Meisner, Christian Igel |
Similarities and differences between policy gradient methods and evolution strategies.  |
ESANN  |
2008 |
DBLP BibTeX RDF |
|
| 1 | Ngo Anh Vien, TaeChoong Chung |
Policy Gradient Semi-markov Decision Process.  |
ICTAI  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos |
Particle Filter-based Policy Gradient in POMDPs.  |
NIPS  |
2008 |
DBLP BibTeX RDF |
|
| 1 | John W. Roberts, Russ Tedrake |
Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms.  |
NIPS  |
2008 |
DBLP BibTeX RDF |
|
| 1 | Kristian Kersting, Kurt Driessens |
Non-parametric policy gradients: a unified treatment of propositional and relational domains.  |
ICML  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Matthew Zucker, James Kuffner, James A. Bagnell |
Adaptive workspace biasing for sampling-based planners.  |
ICRA  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Chenfeng Xu, Jian Yang, Hongsheng Xi, Qi Jiang, Baoqun Yin |
Event-related optimization for a class of resource location with admission control.  |
IJCNN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Matthias Rungger, Hao Ding, Olaf Stursberg |
Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning.  |
ABiALS (Anticipatory Behavior in Adaptive Learning Systems, From Psychological Theories to Artificial Cognitive Systems: 4th Workshop, ABiALS 2008, Munich, Germany, June 26-27, 2008; Springer, pp. 301-320, ISBN 978-3-642-02564-8)  |
2008 |
DBLP DOI BibTeX RDF |
hybrid automaton, behavioral programming, artificial intelligence, Reinforcement learning, planning, hierarchical model |
| 1 | Sertan Girgin, Philippe Preux |
Basis Expansion in Natural Actor Critic Methods.  |
EWRL  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Jan Peters, Jens Kober, Duy Nguyen-Tuong |
Policy Learning - A Unified Perspective with Applications in Robotics.  |
EWRL  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, Jürgen Schmidhuber |
Policy Gradients with Parameter-Based Exploration for Control.  |
ICANN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Yuki Taniguchi, Takeshi Mori, Shin Ishii |
A Continuous Internal-State Controller for Partially Observable Markov Decision Processes.  |
ICANN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Verena Heidrich-Meisner, Christian Igel |
Evolution Strategies for Direct Policy Search.  |
PPSN  |
2008 |
DBLP DOI BibTeX RDF |
|
| 1 | Sumeetpal S. Singh, Vladislav B. Tadic, Arnaud Doucet |
A policy gradient method for semi-Markov decision processes with application to call admission control.  |
European Journal of Operational Research  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi, Masa-aki Sato, Kenji Doya |
Learning a dynamic policy by using policy gradient: application to biped walking.  |
Systems and Computers in Japan  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Olivier Buffet, Douglas Aberdeen |
FF + FPG: Guiding a Policy-Gradient Planner.  |
ICAPS  |
2007 |
DBLP BibTeX RDF |
|
| 1 | Olivier Buffet, Alain Dutech, François Charpillet |
Shaping multi-agent systems with gradient reinforcement learning.  |
Autonomous Agents and Multi-Agent Systems  |
2007 |
DBLP DOI BibTeX RDF |
Policy-gradient, Multi-agent systems, Reinforcement learning, Shaping, Partially observable Markov decision processes |
| 1 | Mohammad Ghavamzadeh, Yaakov Engel |
Bayesian actor-critic algorithms.  |
ICML  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanathan |
Conditional random fields for multi-agent reinforcement learning.  |
ICML  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Diego E. Pardo Ayala, Cecilio Angulo Bahón |
Understanding Sensori-motor Coordination during a Humanoid Robot Dynamic Task.  |
FUZZ-IEEE  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi, Sang-Ho Hyon, Joshua G. Hale, Gordon Cheng |
Learning to acquire whole-body humanoid CoM movements to achieve dynamic tasks.  |
ICRA  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Pawel Wawrzynski |
Reinforcement Learning in Fine Time Discretization.  |
ICANNGA  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Daniel Schneegaß, Steffen Udluft, Thomas Martinetz |
Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification.  |
ICANN  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Yuki Taniguchi, Takeshi Mori, Shin Ishii |
Reinforcement Learning for Cooperative Actions in a Partially Observable Multi-agent System.  |
ICANN  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Daan Wierstra, Alexander Förster, Jan Peters, Jürgen Schmidhuber |
Solving Deep Memory POMDPs with Recurrent Policy Gradients.  |
ICANN  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Eiji Uchibe, Kenji Doya |
Finding Exploratory Rewards by Embodied Evolution and Constrained Reinforcement Learning in the Cyber Rodents.  |
ICONIP  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Diego E. Pardo, Cecilio Angulo |
Emerging Behaviors by Learning Joint Coordination in Articulated Mobile Robots.  |
IWANN  |
2007 |
DBLP DOI BibTeX RDF |
Sensor-Motor control, Coordination, Reinforcement Learning, Cognitive Robotics |
| 1 | Andrea Cherubini, Francesca Giannone, Luca Iocchi |
Layered Learning for a Soccer Legged Robot Helped with a 3D Simulator.  |
RoboCup  |
2007 |
DBLP DOI BibTeX RDF |
|
| 1 | Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi, Masa-aki Sato, Kenji Doya |
Learning CPG-based biped locomotion with a policy gradient method.  |
Robotics and Autonomous Systems  |
2006 |
DBLP DOI BibTeX RDF |
|
| 1 | Rémi Munos |
Policy Gradient in Continuous Time.  |
Journal of Machine Learning Research  |
2006 |
DBLP BibTeX RDF |
|
| 1 | Seiji Ishihara, Harukazu Igarashi |
Applying the policy gradient method to behavior learning in multiagent systems: The pursuit problem.  |
Systems and Computers in Japan  |
2006 |
DBLP DOI BibTeX RDF |
|
| 1 | Mohammad Ghavamzadeh, Yaakov Engel |
Bayesian Policy Gradient Algorithms.  |
NIPS  |
2006 |
DBLP BibTeX RDF |
|
| 1 | Xuening Wang, Wei Chen 0009, Daxue Liu, Tao Wu, Hangen He |
The Optimality Analysis of Hybrid Reinforcement Learning Combined with SVMs.  |
ISDA  |
2006 |
DBLP DOI BibTeX RDF |
|
| 1 | Manish Saggar, Thomas D'Silva, Nate Kohl, Peter Stone |
Autonomous Learning of Stable Quadruped Locomotion.  |
RoboCup  |
2006 |
DBLP DOI BibTeX RDF |
|
| 1 | Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi, Masa-aki Sato, Kenji Doya |
Learning CPG-based biped locomotion with a policy gradient method.  |
Humanoids  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Rémi Munos |
Policy gradient in continuous time.  |
CAP  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Kentarou Hitomi, Tomohiro Shibata, Yutaka Nakamura, Shin Ishii |
On-line learning of a feedback controller for quasi-passive-dynamic walking by a stochastic policy gradient method.  |
IROS  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Noriaki Mitsunaga, Christian Smith, Takayuki Kanda, Hiroshi Ishiguro, Norihiro Hagita |
Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning.  |
IROS  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi, Masa-aki Sato, Kenji Doya |
Learning Sensory Feedback to CPG with Policy Gradient for Biped Locomotion.  |
ICRA  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Huizhen Yu |
A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies.  |
UAI  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, Gordon Cheng |
Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid.  |
AAAI  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Nicol N. Schraudolph, Douglas Aberdeen, Jin Yu |
Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation.  |
NIPS (Advances in Neural Information Processing Systems 18, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada)  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Douglas Aberdeen |
Policy-Gradient Methods for Planning.  |
NIPS (Advances in Neural Information Processing Systems 18, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada)  |
2005 |
DBLP BibTeX RDF |
|
| 1 | Zonghua Zhang, Hong Shen |
Constructing Multi-Layered Boundary to Defend Against Intrusive Anomalies: An Autonomic Detection Coordinator.  |
DSN  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Jooyoung Park, Jongho Kim, Daesung Kang |
An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm.  |
CIS  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Jan Peters, Sethu Vijayakumar, Stefan Schaal |
Natural Actor-Critic.  |
ECML  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Zonghua Zhang, Hong Shen |
Dynamic Combination of Multiple Host-Based Anomaly Detectors with Broader Detection Coverage and Fewer False Alerts.  |
ICN  |
2005 |
DBLP DOI BibTeX RDF |
|
| 1 | Xi-Ren Cao |
Basic Ideas for Event-Based Optimization of Markov Systems.  |
Discrete Event Dynamic Systems  |
2005 |
DBLP DOI BibTeX RDF |
Markov decision processes (MDPs), performance potentials, policy gradients, aggregation, perturbation analysis, POMDPs, policy iteration |
| 1 | Douglas Aberdeen |
Filtered Reinforcement Learning.  |
ECML  |
2004 |
DBLP DOI BibTeX RDF |
|