aGrUM 0.16.0
<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...
#include <adaptiveRMaxPlaner.h>
Public Member Functions | |
Planning Methods | |
void | initialize (const FMDP< double > *fmdp) |
Initializes data structure needed for making the planning. More... | |
void | makePlanning (Idx nbStep=1000000) |
Performs a value iteration. More... | |
Datastructure access methods | |
INLINE const FMDP< double > * | fmdp () |
Returns a const ptr on the Factored Markov Decision Process on which we're planning. More... | |
INLINE const MultiDimFunctionGraph< double > * | vFunction () |
Returns a const ptr on the value function computed so far. More... | |
virtual Size | vFunctionSize () |
Returns the current size of the value function computed so far. More... | |
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy () |
Returns the best policy obtained so far. More... | |
virtual Size | optimalPolicySize () |
Returns the current size of the optimal policy computed so far. More... | |
std::string | optimalPolicy2String () |
Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id. More... | |
Static Public Member Functions | |
static AdaptiveRMaxPlaner * | ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
static AdaptiveRMaxPlaner * | TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
static StructuredPlaner< double > * | spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
static StructuredPlaner< double > * | sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
Protected Attributes | |
const FMDP< double > * | _fmdp |
The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs). More... | |
MultiDimFunctionGraph< double > * | _vFunction |
The Value Function computed iteratively. More... | |
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | _optimalPolicy |
The associated optimal policy. More... | |
Set< const DiscreteVariable * > | _elVarSeq |
A Set to eliminate primed variables. More... | |
double | _discountFactor |
Discount Factor used for infinite horizon planning. More... | |
IOperatorStrategy< double > * | _operator |
bool | _verbose |
Boolean indicating whether iteration information should be displayed on the terminal. More... | |
Protected Member Functions | |
Value Iteration Methods | |
virtual void | _initVFunction () |
Initializes the value function before value iteration starts. More... | |
virtual MultiDimFunctionGraph< double > * | _valueIteration () |
Performs a single step of value iteration. More... | |
Optimal policy extraction methods | |
virtual void | _evalPolicy () |
Perform the required tasks to extract an optimal policy. More... | |
Value Iteration Methods | |
virtual MultiDimFunctionGraph< double > * | _evalQaction (const MultiDimFunctionGraph< double > *, Idx) |
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More... | |
virtual MultiDimFunctionGraph< double > * | _maximiseQactions (std::vector< MultiDimFunctionGraph< double > *> &) |
Performs max_a Q(s,a) More... | |
virtual MultiDimFunctionGraph< double > * | _minimiseFunctions (std::vector< MultiDimFunctionGraph< double > *> &) |
Performs min_i F_i. More... | |
virtual MultiDimFunctionGraph< double > * | _addReward (MultiDimFunctionGraph< double > *function, Idx actionId=0) |
Computes R(s) + gamma * function. More... | |
Optimal policy extraction methods | |
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | _makeArgMax (const MultiDimFunctionGraph< double > *Qaction, Idx actionId) |
Creates a copy of the given Qaction that can be exploited by an argmax. More... | |
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *> &) |
Performs argmax_a Q(s,a) More... | |
void | _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction) |
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists of extracting the associated ActionSet from each ArgMaxSet present at the leaves. More... | |
Constructor & destructor. | |
AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose) | |
Default constructor. More... | |
~AdaptiveRMaxPlaner () | |
Default destructor. More... | |
Incremental methods | |
HashTable< Idx, StatesCounter *> | __counterTable |
HashTable< Idx, bool > | __initializedTable |
bool | __initialized |
void | checkState (const Instantiation &newState, Idx actionId) |
Incremental methods | |
void | setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol) |
virtual ActionSet | stateOptimalPolicy (const Instantiation &curState) |
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | _optPol |
ActionSet | _allActions |
<agrum/FMDP/planning/adaptiveRMaxPlaner.h>
A class to find an optimal policy for a given FMDP.
Performs RMax planning on the factored Markov decision process given as a parameter.
Definition at line 53 of file adaptiveRMaxPlaner.h.
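As a rough illustration of the general R-max idea, the sketch below treats a state-action pair as unknown until it has been observed a threshold number of times, and gives unknown pairs an optimistic value. The function name, its parameters, and the rMax / (1 - gamma) bound are illustrative assumptions, not aGrUM API; this page does not show how AdaptiveRMaxPlaner combines its private members internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of aGrUM.
// Returns an optimistic value while a state-action pair is still "unknown"
// (observed fewer than `threshold` times), and the learned value otherwise.
double optimisticValueSketch(std::size_t nbObservations,
                             std::size_t threshold,
                             double      learnedValue,
                             double      rMax,
                             double      gamma) {
  assert(gamma >= 0.0 && gamma < 1.0);
  if (nbObservations < threshold) {
    // Upper bound on the discounted return when every step pays rMax.
    return rMax / (1.0 - gamma);
  }
  return learnedValue;
}
```

Under this sketch, rarely visited pairs look attractive, which pushes the agent to explore them until they cross the threshold.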
|
private |
Default constructor.
Definition at line 63 of file adaptiveRMaxPlaner.cpp.
Referenced by ReducedAndOrderedInstance(), and TreeInstance().
gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner | ( | ) |
Default destructor.
Definition at line 76 of file adaptiveRMaxPlaner.cpp.
References __counterTable.
Referenced by TreeInstance().
|
private |
Definition at line 345 of file adaptiveRMaxPlaner.cpp.
References __actionsBoolTable, __actionsRMaxTable, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().
Referenced by makePlanning(), and TreeInstance().
|
private |
Definition at line 238 of file adaptiveRMaxPlaner.cpp.
References __actionsBoolTable, __actionsRMaxTable, __counterTable, __fmdpLearner, __rmax, __rThreshold, __visitLearner(), gum::StructuredPlaner< double >::_discountFactor, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_minimiseFunctions(), gum::StructuredPlaner< double >::_operator, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::beginVariables(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::ILearningStrategy::modaMax(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::ILearningStrategy::rMax(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().
Referenced by makePlanning(), and TreeInstance().
|
private |
Definition at line 309 of file adaptiveRMaxPlaner.cpp.
References __rmax, __rThreshold, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.
Referenced by __makeRMaxFunctionGraphs(), and TreeInstance().
|
protectedvirtualinherited |
Computes R(s) + gamma * function.
Definition at line 408 of file structuredPlaner_tpl.h.
References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), and RECAST.
Referenced by _evalPolicy(), and _valueIteration().
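The reward step can be pictured on a flat table, one value per state, instead of a function graph. The sketch below is only an illustration under that assumption; the Table alias and the addRewardSketch name are hypothetical and do not exist in aGrUM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat stand-in for a function graph: one value per state.
using Table = std::vector<double>;

// Returns R(s) + gamma * f(s) for every state s.
Table addRewardSketch(const Table& reward, const Table& f, double gamma) {
  assert(reward.size() == f.size());
  Table out(reward.size());
  for (std::size_t s = 0; s < reward.size(); ++s) {
    out[s] = reward[s] + gamma * f[s];
  }
  return out;
}
```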
|
protectedvirtualinherited |
Performs argmax_a Q(s,a)
Definition at line 540 of file structuredPlaner_tpl.h.
Referenced by _evalPolicy().
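A minimal sketch of argmax_a Q(s,a) on flat tables, assuming qValues[a][s] holds Q(s,a). Ties are kept, so each state maps to a set of action ids, loosely mirroring the ArgMaxSet and ActionSet types named on this page. The function name and the table layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical layout: qValues[a][s] = Q(s, a).
// Returns, for each state, the ids of all actions reaching the maximum.
std::vector<std::set<std::size_t>>
argmaxActionsSketch(const std::vector<std::vector<double>>& qValues) {
  assert(!qValues.empty());
  const std::size_t nbStates = qValues.front().size();
  std::vector<std::set<std::size_t>> best(nbStates);
  for (std::size_t s = 0; s < nbStates; ++s) {
    double bestValue = qValues[0][s];
    best[s].insert(0);
    for (std::size_t a = 1; a < qValues.size(); ++a) {
      assert(qValues[a].size() == nbStates);
      if (qValues[a][s] > bestValue) {
        // Strictly better action: restart the set.
        bestValue = qValues[a][s];
        best[s].clear();
        best[s].insert(a);
      } else if (qValues[a][s] == bestValue) {
        // Tie: keep both actions.
        best[s].insert(a);
      }
    }
  }
  return best;
}
```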
|
protectedvirtual |
Perform the required tasks to extract an optimal policy.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 194 of file adaptiveRMaxPlaner.cpp.
References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_argmaximiseQactions(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_extractOptimalPolicy(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_makeArgMax(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().
Referenced by TreeInstance().
|
protectedvirtualinherited |
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
Definition at line 353 of file structuredPlaner_tpl.h.
Referenced by _evalPolicy(), and _valueIteration().
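On a flat representation, this part of the backup is the expected value of V^{t-1} over next states for one fixed action. The sketch below assumes a dense matrix with transition[s][t] = P(t | s, a); the Matrix alias and the function name are hypothetical, and the real method works on function graphs instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense layout: transition[s][t] = P(t | s, a) for one action a.
using Matrix = std::vector<std::vector<double>>;

// Returns sum over t of P(t | s, a) * vPrev[t], for every state s.
std::vector<double> expectedNextValueSketch(const Matrix&              transition,
                                            const std::vector<double>& vPrev) {
  std::vector<double> out(transition.size(), 0.0);
  for (std::size_t s = 0; s < transition.size(); ++s) {
    assert(transition[s].size() == vPrev.size());
    for (std::size_t t = 0; t < vPrev.size(); ++t) {
      out[s] += transition[s][t] * vPrev[t];
    }
  }
  return out;
}
```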
|
protectedinherited |
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists of extracting the associated ActionSet from each ArgMaxSet present at the leaves.
Definition at line 564 of file structuredPlaner_tpl.h.
Referenced by _evalPolicy().
|
protectedvirtual |
Initializes the value function before value iteration starts.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 133 of file adaptiveRMaxPlaner.cpp.
References gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::IOperatorStrategy< GUM_SCALAR >::add(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), RECASTED, gum::FMDP< GUM_SCALAR >::reward(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().
Referenced by TreeInstance().
|
protectedinherited |
Creates a copy of the given Qaction that can be exploited by an argmax.
Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.
Qaction | : the function graph we want to transform |
actionId | : the action Id associated to that graph |
Definition at line 482 of file structuredPlaner_tpl.h.
References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().
Referenced by _evalPolicy().
|
protectedvirtualinherited |
Performs max_a Q(s,a)
Definition at line 369 of file structuredPlaner_tpl.h.
Referenced by __makeRMaxFunctionGraphs(), and _valueIteration().
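For intuition, max_a Q(s,a) on flat tables is a pointwise maximum across the per-action tables. The sketch assumes qValues[a][s] = Q(s,a); the function name and layout are illustrative assumptions, not the function-graph implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: qValues[a][s] = Q(s, a).
// Returns V(s) = max over a of Q(s, a), for every state s.
std::vector<double>
maximiseOverActionsSketch(const std::vector<std::vector<double>>& qValues) {
  assert(!qValues.empty());
  std::vector<double> v = qValues.front();
  for (std::size_t a = 1; a < qValues.size(); ++a) {
    assert(qValues[a].size() == v.size());
    for (std::size_t s = 0; s < v.size(); ++s) {
      v[s] = std::max(v[s], qValues[a][s]);
    }
  }
  return v;
}
```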
|
protectedvirtualinherited |
Performs min_i F_i.
Definition at line 389 of file structuredPlaner_tpl.h.
Referenced by __makeRMaxFunctionGraphs().
|
protectedvirtual |
Performs a single step of value iteration.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 146 of file adaptiveRMaxPlaner.cpp.
References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().
Referenced by TreeInstance().
|
inlinevirtual |
Implements gum::IDecisionStrategy.
Definition at line 201 of file adaptiveRMaxPlaner.h.
References __counterTable, and __initializedTable.
|
inlineinherited |
Returns a const ptr on the Factored Markov Decision Process on which we're planning.
Definition at line 137 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::_fmdp.
Referenced by __clearTables(), __makeRMaxFunctionGraphs(), and TreeInstance().
Initializes data structure needed for making the planning.
Reimplemented from gum::IDecisionStrategy.
Definition at line 97 of file adaptiveRMaxPlaner.cpp.
References __counterTable, __initialized, __initializedTable, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IDecisionStrategy::initialize(), gum::StructuredPlaner< GUM_SCALAR >::initialize(), and gum::HashTable< Key, Val, Alloc >::insert().
Referenced by TreeInstance().
|
virtual |
Performs a value iteration.
nbStep | : the maximum number of value iterations to perform. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed |
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 114 of file adaptiveRMaxPlaner.cpp.
References __clearTables(), __makeRMaxFunctionGraphs(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().
Referenced by TreeInstance().
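The two stopping conditions can be sketched on a small tabular MDP. Everything below is an illustrative assumption: the Transitions layout, the PlanningResult struct, the planSketch name, the zero initial value function, and the plain "largest change <= epsilon" convergence test (the exact criterion used by the library is not stated on this page).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical flat layout: transitions[a][s][t] = P(t | s, a).
using Transitions = std::vector<std::vector<std::vector<double>>>;

struct PlanningResult {
  std::vector<double> values;  // value function after the last step
  std::size_t         steps;   // number of iterations actually performed
};

// Tabular value iteration that stops on convergence or after nbStep steps.
PlanningResult planSketch(const Transitions&         transitions,
                          const std::vector<double>& reward,
                          double                     gamma,
                          double                     epsilon,
                          std::size_t                nbStep) {
  assert(!transitions.empty());
  const std::size_t   nbStates = reward.size();
  std::vector<double> v(nbStates, 0.0);
  std::size_t         step = 0;
  bool                converged = false;
  while (step < nbStep && !converged) {
    std::vector<double> next(nbStates);
    double              gap = 0.0;
    for (std::size_t s = 0; s < nbStates; ++s) {
      double best = -std::numeric_limits<double>::infinity();
      for (const auto& pa : transitions) {
        assert(pa.size() == nbStates && pa[s].size() == nbStates);
        double expected = 0.0;
        for (std::size_t t = 0; t < nbStates; ++t) expected += pa[s][t] * v[t];
        best = std::max(best, expected);
      }
      next[s] = reward[s] + gamma * best;
      gap = std::max(gap, std::fabs(next[s] - v[s]));
    }
    v = next;
    ++step;
    converged = gap <= epsilon;  // assumed convergence test
  }
  return {v, step};
}
```

With a single self-looping state, reward 1 and gamma 0.5, the values go 1, 1.5, 1.75 and approach 2, so a small nbStep cuts the run short while a loose epsilon ends it early.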
|
inlinevirtualinherited |
Returns the best policy obtained so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 157 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy.
|
virtualinherited |
Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id.
Implements gum::IPlanningStrategy< double >.
Definition at line 105 of file structuredPlaner_tpl.h.
References gum::ActionSet::beginSafe(), gum::HashTable< Key, Val, Alloc >::beginSafe(), gum::Link< T >::element(), gum::ActionSet::endSafe(), gum::HashTable< Key, Val, Alloc >::endSafe(), gum::Set< Key, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::insert(), gum::HashTable< Key, Val, Alloc >::key(), gum::DiscreteVariable::label(), gum::Variable::name(), gum::InternalNode::nbSons(), gum::Link< T >::nextLink(), gum::InternalNode::nodeVar(), and gum::InternalNode::son().
|
inlinevirtualinherited |
Returns the current size of the optimal policy computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 164 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::__recurArgMaxCopy(), gum::StructuredPlaner< GUM_SCALAR >::__recurExtractOptPol(), gum::StructuredPlaner< GUM_SCALAR >::__transferActionIds(), gum::StructuredPlaner< GUM_SCALAR >::_addReward(), gum::StructuredPlaner< GUM_SCALAR >::_argmaximiseQactions(), gum::StructuredPlaner< GUM_SCALAR >::_evalPolicy(), gum::StructuredPlaner< GUM_SCALAR >::_evalQaction(), gum::StructuredPlaner< GUM_SCALAR >::_extractOptimalPolicy(), gum::StructuredPlaner< GUM_SCALAR >::_initVFunction(), gum::StructuredPlaner< GUM_SCALAR >::_makeArgMax(), gum::StructuredPlaner< GUM_SCALAR >::_maximiseQactions(), gum::StructuredPlaner< GUM_SCALAR >::_minimiseFunctions(), gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy, gum::StructuredPlaner< GUM_SCALAR >::_valueIteration(), gum::StructuredPlaner< GUM_SCALAR >::fmdp(), gum::StructuredPlaner< GUM_SCALAR >::initialize(), gum::StructuredPlaner< GUM_SCALAR >::makePlanning(), and gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String().
|
inlinestatic |
Definition at line 65 of file adaptiveRMaxPlaner.h.
References AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxMDDInstance().
|
inlineinherited |
Definition at line 90 of file IDecisionStrategy.h.
References gum::IDecisionStrategy::_optPol.
Referenced by gum::SDYNA::makePlanning().
|
inlinestaticinherited |
Definition at line 80 of file structuredPlaner.h.
|
inlinevirtualinherited |
Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.
Definition at line 97 of file IDecisionStrategy.h.
References gum::IDecisionStrategy::_allActions, and gum::IDecisionStrategy::_optPol.
Referenced by gum::E_GreedyDecider::stateOptimalPolicy(), and gum::SDYNA::takeAction().
|
inlinestaticinherited |
Definition at line 94 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner(), and gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner().
|
inlinestatic |
Definition at line 79 of file adaptiveRMaxPlaner.h.
References __clearTables(), __makeRMaxFunctionGraphs(), __visitLearner(), _evalPolicy(), _initVFunction(), _valueIteration(), AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::fmdp(), initialize(), makePlanning(), and ~AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxTreeInstance().
|
inlineinherited |
Returns a const ptr on the value function computed so far.
Definition at line 142 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::_vFunction.
|
inlinevirtualinherited |
Returns the current size of the value function computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 149 of file structuredPlaner.h.
References gum::StructuredPlaner< GUM_SCALAR >::_vFunction, and gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::realSize().
|
private |
Definition at line 189 of file adaptiveRMaxPlaner.h.
Referenced by __clearTables(), __makeRMaxFunctionGraphs(), _evalPolicy(), and _valueIteration().
|
private |
Definition at line 188 of file adaptiveRMaxPlaner.h.
Referenced by __clearTables(), __makeRMaxFunctionGraphs(), _evalPolicy(), and _valueIteration().
|
private |
Definition at line 210 of file adaptiveRMaxPlaner.h.
Referenced by __makeRMaxFunctionGraphs(), checkState(), initialize(), and ~AdaptiveRMaxPlaner().
|
private |
Definition at line 190 of file adaptiveRMaxPlaner.h.
Referenced by __makeRMaxFunctionGraphs().
|
private |
Definition at line 213 of file adaptiveRMaxPlaner.h.
Referenced by initialize().
Definition at line 211 of file adaptiveRMaxPlaner.h.
Referenced by checkState(), and initialize().
|
private |
Definition at line 193 of file adaptiveRMaxPlaner.h.
Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().
|
private |
Definition at line 192 of file adaptiveRMaxPlaner.h.
Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().
|
protectedinherited |
Definition at line 107 of file IDecisionStrategy.h.
Referenced by gum::IDecisionStrategy::initialize(), gum::RandomDecider::stateOptimalPolicy(), gum::E_GreedyDecider::stateOptimalPolicy(), and gum::IDecisionStrategy::stateOptimalPolicy().
|
protectedinherited |
Discount Factor used for infinite horizon planning.
Definition at line 363 of file structuredPlaner.h.
Referenced by __makeRMaxFunctionGraphs().
|
protectedinherited |
A Set to eliminate primed variables.
Definition at line 358 of file structuredPlaner.h.
|
protectedinherited |
The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).
Definition at line 338 of file structuredPlaner.h.
Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().
|
protectedinherited |
Definition at line 365 of file structuredPlaner.h.
Referenced by __makeRMaxFunctionGraphs(), _evalPolicy(), _initVFunction(), and _valueIteration().
|
protectedinherited |
The associated optimal policy.
Definition at line 353 of file structuredPlaner.h.
|
protectedinherited |
Definition at line 104 of file IDecisionStrategy.h.
Referenced by gum::IDecisionStrategy::initialize(), gum::IDecisionStrategy::setOptimalStrategy(), and gum::IDecisionStrategy::stateOptimalPolicy().
|
protectedinherited |
Boolean indicating whether iteration information should be displayed on the terminal.
Definition at line 371 of file structuredPlaner.h.
|
protectedinherited |
The Value Function computed iteratively.
Definition at line 343 of file structuredPlaner.h.
Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().