aGrUM
0.16.0
<agrum/FMDP/planning/structuredPlaner.h>
#include <structuredPlaner.h>
Public Member Functions
Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * | fmdp ()
Returns a const pointer to the Factored Markov Decision Process on which we are planning.
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * | vFunction ()
Returns a const pointer to the value function computed so far.
virtual Size | vFunctionSize ()
Returns the current size of the value function computed so far.
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy ()
Returns the best policy obtained so far.
virtual Size | optimalPolicySize ()
Returns the current size of the optimal policy computed so far.
std::string | optimalPolicy2String ()
Provides a more readable toDot output for the optimal policy, in which the leaves carry the action names instead of their ids.
Planning Methods
virtual void | initialize (const FMDP< GUM_SCALAR > *fmdp)
Initializes the data structures needed for planning.
virtual void | makePlanning (Idx nbStep=1000000)
Performs value iteration (at most nbStep steps).
Static Public Member Functions
static StructuredPlaner< GUM_SCALAR > * | spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
static StructuredPlaner< GUM_SCALAR > * | sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
Protected Attributes
const FMDP< GUM_SCALAR > * | _fmdp
The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).
MultiDimFunctionGraph< GUM_SCALAR > * | _vFunction
The value function, computed iteratively.
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | _optimalPolicy
The associated optimal policy.
Set< const DiscreteVariable *> | _elVarSeq
A Set used to eliminate primed variables.
GUM_SCALAR | _discountFactor
Discount factor used for infinite-horizon planning.
IOperatorStrategy< GUM_SCALAR > * | _operator
bool | _verbose
Boolean indicating whether or not iteration information should be displayed on the terminal.
Protected Member Functions
Value Iteration Methods
virtual void | _initVFunction ()
Initializes the value function.
virtual MultiDimFunctionGraph< GUM_SCALAR > * | _valueIteration ()
Performs a single step of value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * | _evalQaction (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * | _maximiseQactions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
Performs max_a Q(s,a).
virtual MultiDimFunctionGraph< GUM_SCALAR > * | _minimiseFunctions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
Performs min_i F_i.
virtual MultiDimFunctionGraph< GUM_SCALAR > * | _addReward (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
Performs R(s) + gamma . function.
Constructor & destructor.
StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
Default constructor.
virtual | ~StructuredPlaner ()
Default destructor.
Optimal policy extraction methods
virtual void | _evalPolicy ()
Performs the tasks required to extract an optimal policy.
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * | _makeArgMax (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
Creates a copy of the given Qaction that can be exploited by an argmax.
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * | _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
Performs argmax_a Q(s,a).
void | _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
NodeId | __recurArgMaxCopy (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
Recursion part for the createArgMaxCopy.
NodeId | __recurExtractOptPol (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
Recursion part for the optimal policy extraction.
void | __transferActionIds (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
Extracts from an ArgMaxSet the associated ActionSet.
A class to find an optimal policy for a given FMDP.
Performs structured value iteration planning.
Pure virtual functions: _regress, _maximize, _argmaximize, _add and _subtract are a priori the ones to be respecified according to the data structure used (MDDs, DTs, BNs, ...).
Definition at line 70 of file structuredPlaner.h.
StructuredPlaner()
protected
Default constructor.
Definition at line 64 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::sviInstance().
~StructuredPlaner()
virtual
Default destructor.
Definition at line 82 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::sviInstance().
__recurArgMaxCopy()
private
Recursion part for the createArgMaxCopy.
Definition at line 507 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
__recurExtractOptPol()
private
Recursion part for the optimal policy extraction.
Definition at line 589 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
__transferActionIds()
private
Extracts from an ArgMaxSet the associated ActionSet.
Definition at line 619 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_addReward()
protected virtual
Performs R(s) + gamma . function.
Definition at line 408 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_argmaximiseQactions()
protected virtual
Performs argmax_a Q(s,a).
Definition at line 540 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_evalPolicy()
protected virtual
Performs the tasks required to extract an optimal policy.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 437 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_evalQaction()
protected virtual
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
Definition at line 353 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
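For orientation, the briefs of _evalQaction, _addReward and _maximiseQactions read like the three pieces of the standard Bellman backup. The block below only restates that textbook formula in the notation used on this page. The order in which the class actually composes these calls is an assumption, not something this page states.

```latex
\begin{align}
  Q_a^{t}(s) &= R(s) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{t-1}(s') \\
  V^{t}(s)   &= \max_{a} Q_a^{t}(s)
\end{align}
```

Here the sum over s' corresponds to the part attributed to _evalQaction, the "R(s) + gamma ." wrapper to _addReward, and the maximum over actions to _maximiseQactions.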
_extractOptimalPolicy()
protected
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
Definition at line 564 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_initVFunction()
protected virtual
Initializes the value function.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 298 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_makeArgMax()
protected
Creates a copy of the given Qaction that can be exploited by an argmax.
This step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.
Qaction | the function graph we want to transform
actionId | the action id associated with that graph
Definition at line 482 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
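The leaf-tagging idea described above can be illustrated on plain vectors instead of function graphs. The sketch below is not aGrUM code: `ArgMaxLeaf`, `makeArgMax`, `argmaximise` and `extractPolicy` are hypothetical names, one vector entry stands in for one leaf, and it assumes that ties keep every maximising action.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an ArgMaxSet leaf: a value plus the ids of the
// actions that reach this value.
struct ArgMaxLeaf {
  double                value;
  std::set<std::size_t> actions;
};

// Analogue of the tagging step: pair every value of Q_a with the id of a.
std::vector<ArgMaxLeaf> makeArgMax(const std::vector<double>& qAction,
                                   std::size_t                actionId) {
  std::vector<ArgMaxLeaf> tagged;
  tagged.reserve(qAction.size());
  for (double q : qAction) tagged.push_back({q, {actionId}});
  return tagged;
}

// Analogue of argmax_a Q(s,a): per state, keep the best value and every
// action achieving it (exact equality is used for ties in this sketch).
std::vector<ArgMaxLeaf>
argmaximise(const std::vector<std::vector<ArgMaxLeaf>>& tagged) {
  assert(!tagged.empty());
  std::vector<ArgMaxLeaf> best = tagged.front();
  for (std::size_t a = 1; a < tagged.size(); ++a) {
    assert(tagged[a].size() == best.size());
    for (std::size_t s = 0; s < best.size(); ++s) {
      const ArgMaxLeaf& candidate = tagged[a][s];
      if (candidate.value > best[s].value) {
        best[s] = candidate;
      } else if (candidate.value == best[s].value) {
        best[s].actions.insert(candidate.actions.begin(),
                               candidate.actions.end());
      }
    }
  }
  return best;
}

// Analogue of the policy extraction: drop the values, keep the action sets.
std::vector<std::set<std::size_t>>
extractPolicy(const std::vector<ArgMaxLeaf>& best) {
  std::vector<std::set<std::size_t>> policy;
  policy.reserve(best.size());
  for (const ArgMaxLeaf& leaf : best) policy.push_back(leaf.actions);
  return policy;
}
```

Carrying the action id alongside the value is what lets the final pass read the policy off the leaves without re-evaluating any Q function.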
_maximiseQactions()
protected virtual
Performs max_a Q(s,a).
Definition at line 369 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_minimiseFunctions()
protected virtual
Performs min_i F_i.
Definition at line 389 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
_valueIteration()
protected virtual
Performs a single step of value iteration.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 316 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
fmdp()
inline
Returns a const pointer to the Factored Markov Decision Process on which we are planning.
Definition at line 137 of file structuredPlaner.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
initialize()
virtual
Initializes the data structures needed for planning.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 229 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::initialize(), and gum::StructuredPlaner< double >::optimalPolicySize().
makePlanning()
virtual
Performs value iteration (at most nbStep steps).
nbStep | the maximum number of value iterations to perform. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 251 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::makePlanning(), and gum::StructuredPlaner< double >::optimalPolicySize().
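The two stopping conditions of makePlanning (convergence or nbStep iterations) can be illustrated on a small tabular MDP. This is a minimal sketch, not the aGrUM implementation: `TabularMdp` and `valueIteration` are hypothetical names, the value function is a plain vector rather than a function graph, and measuring convergence with the max norm is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical tabular MDP (not an aGrUM type):
// P[a][s][s2] = P(s2 | s, a), R[s] = reward of state s.
struct TabularMdp {
  std::vector<std::vector<std::vector<double>>> P;
  std::vector<double>                           R;
};

// Sweeps until max_s |V^{n+1}(s) - V^n(s)| < threshold or until nbStep sweeps
// have been done. If stepsDone is non-null it receives the sweep count.
std::vector<double> valueIteration(const TabularMdp& mdp,
                                   double            discountFactor,
                                   double            threshold,
                                   std::size_t       nbStep,
                                   std::size_t*      stepsDone = nullptr) {
  assert(!mdp.P.empty());
  assert(discountFactor >= 0.0 && discountFactor < 1.0);
  const std::size_t   nbStates = mdp.R.size();
  std::vector<double> V(nbStates, 0.0);
  std::size_t         step = 0;
  double              gap = threshold;  // forces a first sweep when nbStep > 0
  while (step < nbStep && gap >= threshold) {
    std::vector<double> next(nbStates);
    gap = 0.0;
    for (std::size_t s = 0; s < nbStates; ++s) {
      double best = -std::numeric_limits<double>::infinity();
      for (const auto& Pa : mdp.P) {
        double expected = 0.0;  // sum over s2 of P(s2 | s, a) * V(s2)
        for (std::size_t s2 = 0; s2 < nbStates; ++s2)
          expected += Pa[s][s2] * V[s2];
        best = std::max(best, mdp.R[s] + discountFactor * expected);
      }
      next[s] = best;
      gap = std::max(gap, std::fabs(best - V[s]));
    }
    V = std::move(next);
    ++step;
  }
  if (stepsDone != nullptr) *stepsDone = step;
  return V;
}
```

With a generous cap such as the default nbStep=1000000 listed above, the threshold is normally what ends the loop. A small cap turns the same routine into a bounded number of backups.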
optimalPolicy()
inline virtual
Returns the best policy obtained so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 157 of file structuredPlaner.h.
optimalPolicy2String()
virtual
Provides a more readable toDot output for the optimal policy, in which the leaves carry the action names instead of their ids.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 105 of file structuredPlaner_tpl.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicySize().
optimalPolicySize()
inline virtual
Returns the current size of the optimal policy computed so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 164 of file structuredPlaner.h.
spumddInstance()
inline static
Definition at line 80 of file structuredPlaner.h.
Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().
sviInstance()
inline static
Definition at line 94 of file structuredPlaner.h.
Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().
vFunction()
inline
Returns a const pointer to the value function computed so far.
Definition at line 142 of file structuredPlaner.h.
vFunctionSize()
inline virtual
Returns the current size of the value function computed so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 149 of file structuredPlaner.h.
|
private |
Definition at line 380 of file structuredPlaner.h.
|
private |
The threshold value. Whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.
Definition at line 379 of file structuredPlaner.h.
_discountFactor
protected
Discount factor used for infinite-horizon planning.
Definition at line 363 of file structuredPlaner.h.
_elVarSeq
protected
A Set used to eliminate primed variables.
Definition at line 358 of file structuredPlaner.h.
_fmdp
protected
The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).
Definition at line 338 of file structuredPlaner.h.
Referenced by gum::StructuredPlaner< double >::fmdp().
_operator
protected
Definition at line 365 of file structuredPlaner.h.
_optimalPolicy
protected
The associated optimal policy.
Definition at line 353 of file structuredPlaner.h.
Referenced by gum::StructuredPlaner< double >::optimalPolicy(), and gum::StructuredPlaner< double >::optimalPolicySize().
_verbose
protected
Boolean indicating whether or not iteration information should be displayed on the terminal.
Definition at line 371 of file structuredPlaner.h.
_vFunction
protected
The value function, computed iteratively.
Definition at line 343 of file structuredPlaner.h.
Referenced by gum::StructuredPlaner< double >::vFunction(), and gum::StructuredPlaner< double >::vFunctionSize().