learning. However, in the half cheetah environment, the online learning approach
to training policies made a significant difference, improving performance for both
the base case strategy and the best strategy, DC-ED-DC. Online learning and the use
of a single environment also make the system easier to apply to real-world robotic
applications.
In future work, we would like to test the approach in other environments. A
limitation of this work is that it uses only the policies' actions. It would be
interesting to also consider other information, such as the policies' value
functions, either by ensembling them or by incorporating them into the framework.
