aGrUM  0.20.2
a C++ library for (probabilistic) graphical models
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h>

#include <structuredPlaner.h>

Inheritance diagram for gum::StructuredPlaner< GUM_SCALAR >
Collaboration diagram for gum::StructuredPlaner< GUM_SCALAR >

Public Member Functions

Data structure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we are planning.

INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const pointer to the value function computed so far.

virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far.

INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far.

virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far.

std::string optimalPolicy2String ()
 Provides a toDot of the optimal policy in which the leaves carry the action name instead of its id.
 
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning.

virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration.
 

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< GUM_SCALAR > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must use function graphs for its transition and reward functions).

MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
 The value function computed iteratively.

MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy.

Set< const DiscreteVariable *> elVarSeq_
 A Set used to eliminate primed variables.

GUM_SCALAR discountFactor_
 Discount factor used for infinite-horizon planning.

IOperatorStrategy< GUM_SCALAR > * operator_

bool verbose_
 Boolean indicating whether iteration information should be displayed on the terminal.
 

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function.

virtual MultiDimFunctionGraph< GUM_SCALAR > * valueIteration_ ()
 Performs a single step of value iteration.

virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_ (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a).

virtual MultiDimFunctionGraph< GUM_SCALAR > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i.

virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_ (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Performs R(s) + gamma . function.
 

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor.

virtual ~StructuredPlaner ()
 Default destructor.
 

Optimal policy extraction methods

virtual void evalPolicy_ ()
 Performs the required tasks to extract an optimal policy.

MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax.

virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a).

void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), extracts pi*(s). This mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

NodeId recurArgMaxCopy__ (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursive part of makeArgMax_.

NodeId recurExtractOptPol__ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursive part of extractOptimalPolicy_.

void transferActionIds__ (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extracts from an ArgMaxSet the associated ActionSet.
 

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

The pure virtual functions regress_, maximize_, argmaximize_, add_ and subtract_ are a priori the ones to be re-specified according to the data structure used (MDDs, DTs, BNs, ...).

Definition at line 70 of file structuredPlaner.h.

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > *  opi,
GUM_SCALAR  discountFactor,
GUM_SCALAR  epsilon,
bool  verbose 
)
protected

Default constructor.

Definition at line 64 of file structuredPlaner_tpl.h.

StructuredPlaner(IOperatorStrategy< GUM_SCALAR >* opi,
                 GUM_SCALAR discountFactor,
                 GUM_SCALAR epsilon,
                 bool       verbose) :
    discountFactor_(discountFactor), operator_(opi), verbose_(verbose) {
  GUM_CONSTRUCTOR(StructuredPlaner);

  threshold__    = epsilon;
  vFunction_     = nullptr;
  optimalPolicy_ = nullptr;
}

◆ ~StructuredPlaner()

template<typename GUM_SCALAR >
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 82 of file structuredPlaner_tpl.h.

~StructuredPlaner() {
  GUM_DESTRUCTOR(StructuredPlaner);

  if (vFunction_) { delete vFunction_; }

  if (optimalPolicy_) delete optimalPolicy_;

  delete operator_;
}

Member Function Documentation

◆ addReward_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::addReward_ ( MultiDimFunctionGraph< GUM_SCALAR > *  function,
Idx  actionId = 0 
)
protectedvirtual

Performs R(s) + gamma . function.

Warning
the input function is deleted; a new one is returned

Definition at line 409 of file structuredPlaner_tpl.h.

{
  // ... we multiply the result by the discount factor, ...
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction
      = operator_->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
  delete Vold;

  // ... and finally add the reward
  newVFunction = operator_->add(newVFunction, RECAST(fmdp_->reward(actionId)));

  return newVFunction;
}

◆ argmaximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a)

Warning
also deallocates the QActions

Definition at line 548 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::fmdp(), gum::StructuredPlaner< double >::optimalPolicy(), gum::StructuredPlaner< double >::optimalPolicySize(), gum::StructuredPlaner< double >::vFunction(), and gum::StructuredPlaner< double >::vFunctionSize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
      newVFunction
      = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
        qAction
        = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ evalPolicy_()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::evalPolicy_ ( )
protectedvirtual

Performs the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 439 of file structuredPlaner_tpl.h.

{
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction
      = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >,
                                      SetTerminalNodePolicy >* >
      argMaxQActionsSet;

  // For each action
  for (auto actionIter = fmdp_->beginActions();
       actionIter != fmdp_->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction
        = this->evalQaction_(newVFunction, *actionIter);

    qAction = this->addReward_(qAction);

    argMaxQActionsSet.push_back(makeArgMax_(qAction, *actionIter));
  }
  delete newVFunction;

  // To evaluate the main value function, we take the argmax over all action
  // values, ...
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
      argMaxVFunction
      = argmaximiseQactions_(argMaxQActionsSet);

  // ... and extract the optimal policy from it.
  extractOptimalPolicy_(argMaxVFunction);
}

◆ evalQaction_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::evalQaction_ ( const MultiDimFunctionGraph< GUM_SCALAR > *  Vold,
Idx  actionId 
)
protectedvirtual

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 353 of file structuredPlaner_tpl.h.

{
  // Initialisation:
  // Create a copy of the last value function to deduce the new Qaction from,
  // and find the first variable to eliminate (the one at the end)

  return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
}

◆ extractOptimalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protected

From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
deallocates the argmax optimal value function

Definition at line 574 of file structuredPlaner_tpl.h.

{
  optimalPolicy_->clear();

  // Insert the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    optimalPolicy_->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  optimalPolicy_->manager()->setRootNode(
      recurExtractOptPol__(argMaxOptimalValueFunction->root(),
                           argMaxOptimalValueFunction,
                           src2dest));

  delete argMaxOptimalValueFunction;
}

◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const ptr on the Factored Markov Decision Process on which we're planning.

Definition at line 137 of file structuredPlaner.h.

{ return fmdp_; }

◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Not calling this method before the first call to makePlanning will surely result in a crash.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 229 of file structuredPlaner_tpl.h.

{
  fmdp_ = fmdp;

  // Determination of the threshold value

  // Establishment of the variable elimination sequence
  for (auto varIter = fmdp_->beginVariables(); varIter != fmdp_->endVariables();
       ++varIter)
    elVarSeq_ << fmdp_->main2prime(*varIter);

  // Initialisation of the value function
  vFunction_     = operator_->getFunctionInstance();
  optimalPolicy_ = operator_->getAggregatorInstance();
  firstTime__    = true;
}

◆ initVFunction_()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 298 of file structuredPlaner_tpl.h.

{
  vFunction_->copy(*(RECAST(fmdp_->reward())));
}

◆ makeArgMax_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::makeArgMax_ ( const MultiDimFunctionGraph< GUM_SCALAR > *  Qaction,
Idx  actionId 
)
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
deletes the original Qaction and returns its conversion

Definition at line 485 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
      amcpy
      = operator_->getArgMaxFunctionInstance();

  // Insert the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
      recurArgMaxCopy__(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ makePlanning()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: the maximum number of value iterations to perform. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 251 of file structuredPlaner_tpl.h.

{
  if (firstTime__) {
    this->initVFunction_();
    firstTime__ = false;
  }

  // *****************************************************************************************
  // Main loop
  // *****************************************************************************************
  Idx        nbIte = 0;
  GUM_SCALAR gap   = threshold__ + 1;
  while ((gap > threshold__) && (nbIte < nbStep)) {
    ++nbIte;

    MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = this->valueIteration_();

    // Then we compare the new value function and the old one
    MultiDimFunctionGraph< GUM_SCALAR >* deltaV
        = operator_->subtract(newVFunction, vFunction_);
    gap = 0;

    for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
      if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
    delete deltaV;

    if (verbose_)
      std::cout << " ------------------- Fin itération n° " << nbIte << std::endl
                << " Gap : " << gap << " - " << threshold__ << std::endl;

    // And eventually we update the pointers for the next loop
    delete vFunction_;
    vFunction_ = newVFunction;
  }

  // *****************************************************************************************
  // Search for the policy matching the value function
  // *****************************************************************************************
  this->evalPolicy_();
}

◆ maximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs max_a Q(s,a)

Warning
Also deallocates the QActions.

Definition at line 370 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ minimiseFunctions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Also deallocates the F_i.

Definition at line 390 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 157 of file structuredPlaner.h.

{
  return optimalPolicy_;
}

◆ optimalPolicy2String()

template<typename GUM_SCALAR >
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provides a toDot of the optimal policy in which the leaves carry the action name instead of its id.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 105 of file structuredPlaner_tpl.h.

{
  // ************************************************************************
  // Discarding the case where no \pi* has been computed yet
  if (!optimalPolicy_ || optimalPolicy_->root() == 0)
    return "NO OPTIMAL POLICY CALCULATED YET";

  // ************************************************************************
  // Initialisation

  // Declaration of the needed string streams
  std::stringstream output;
  std::stringstream terminalStream;
  std::stringstream nonTerminalStream;
  std::stringstream arcstream;

  // First line of the toDot
  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

  // Form lines for the internal node stream and the terminal node stream
  terminalStream << "node [shape = box];" << std::endl;
  nonTerminalStream << "node [shape = ellipse];" << std::endl;

  // For some clarity in the final string
  std::string tab = "\t";

  // To know whether we already checked a node or not
  Set< NodeId > visited;

  // FIFO of nodes to visit
  std::queue< NodeId > fifo;

  // Loading the FIFO
  fifo.push(optimalPolicy_->root());
  visited << optimalPolicy_->root();


  // ************************************************************************
  // Main loop
  while (!fifo.empty()) {
    // Node to visit
    NodeId currentNodeId = fifo.front();
    fifo.pop();

    // Checking if it is terminal
    if (optimalPolicy_->isTerminalNode(currentNodeId)) {
      // Get back the associated ActionSet
      ActionSet ase = optimalPolicy_->nodeValue(currentNodeId);

      // Creating a line for this node
      terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                     << " [label=\"" << currentNodeId << " - ";

      // Enumerating and adding to the line the associated optimal actions
      for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
           valIter != ase.endSafe();
           ++valIter)
        terminalStream << fmdp_->actionName(*valIter) << " ";

      // Terminating the line
      terminalStream << "\"];" << std::endl;
      continue;
    }

    // Otherwise
    {
      // Getting back the associated internal node
      const InternalNode* currentNode = optimalPolicy_->node(currentNodeId);

      // Creating a line in the internal node stream for this node
      nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                        << " [label=\"" << currentNodeId << " - "
                        << currentNode->nodeVar()->name() << "\"];" << std::endl;

      // Going through the sons and aggregating them according to the son ids
      HashTable< NodeId, LinkedList< Idx >* > sonMap;
      for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
        if (!visited.exists(currentNode->son(sonIter))) {
          fifo.push(currentNode->son(sonIter));
          visited << currentNode->son(sonIter);
        }
        if (!sonMap.exists(currentNode->son(sonIter)))
          sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
        sonMap[currentNode->son(sonIter)]->addLink(sonIter);
      }

      // Adding to the arc stream
      for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
           ++sonIter) {
        arcstream << tab << currentNodeId << " -> " << sonIter.key()
                  << " [label=\" ";
        Link< Idx >* modaIter = sonIter.val()->list();
        while (modaIter) {
          arcstream << currentNode->nodeVar()->label(modaIter->element());
          if (modaIter->nextLink()) arcstream << ", ";
          modaIter = modaIter->nextLink();
        }
        arcstream << "\",color=\"#00ff00\"];" << std::endl;
        delete sonIter.val();
      }
    }
  }

  // Terminating
  output << terminalStream.str() << std::endl
         << nonTerminalStream.str() << std::endl
         << arcstream.str() << std::endl
         << "}" << std::endl;

  return output.str();
}

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns optimalPolicy computed so far current size.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 164 of file structuredPlaner.h.

{
  return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
}

◆ recurArgMaxCopy__()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::recurArgMaxCopy__ ( NodeId  currentNodeId,
Idx  actionId,
const MultiDimFunctionGraph< GUM_SCALAR > *  src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxCpy,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursive part of makeArgMax_.

Definition at line 512 of file structuredPlaner_tpl.h.

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (src->isTerminalNode(currentNodeId)) {
    ArgMaxSet< GUM_SCALAR, Idx > leaf(src->nodeValue(currentNodeId), actionId);
    nody = argMaxCpy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = src->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
        SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = recurArgMaxCopy__(currentNode->son(moda),
                                        actionId,
                                        src,
                                        argMaxCpy,
                                        visitedNodes);
    nody
        = argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}

◆ recurExtractOptPol__()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::recurExtractOptPol__ ( NodeId  currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxOptVFunc,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursion part for the optimal policy extraction.

Definition at line 601 of file structuredPlaner_tpl.h.

605  {
606  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];
607 
608  NodeId nody;
609  if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
610  ActionSet leaf;
611  transferActionIds__(argMaxOptVFunc->nodeValue(currentNodeId), leaf);
612  nody = optimalPolicy_->manager()->addTerminalNode(leaf);
613  } else {
614  const InternalNode* currentNode = argMaxOptVFunc->node(currentNodeId);
615  NodeId* sonsMap = static_cast< NodeId* >(
616  SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
617  for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
618  sonsMap[moda] = recurExtractOptPol__(currentNode->son(moda),
619  argMaxOptVFunc,
620  visitedNodes);
621  nody = optimalPolicy_->manager()->addInternalNode(currentNode->nodeVar(),
622  sonsMap);
623  }
624  visitedNodes.insert(currentNodeId, nody);
625  return nody;
626  }
void transferActionIds__(const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
Extract from an ArgMaxSet the associated ActionSet.
bool exists(const Key &key) const
Checks whether there exists an element with a given key in the hashtable.
NodeId recurExtractOptPol__(NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
Recursion part for the optimal policy extraction.
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
The associated optimal policy.
value_type & insert(const Key &key, const Val &val)
Adds a new element (actually a copy of this element) into the hash table.
Size NodeId
Type for node ids.
Definition: graphElements.h:97
#define SOA_ALLOCATE(x)

◆ spumddInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 80 of file structuredPlaner.h.

82  {
83  return new StructuredPlaner< GUM_SCALAR >(
84  new MDDOperatorStrategy< GUM_SCALAR >(),
85  discountFactor,
86  epsilon,
87  verbose);
88  }

◆ sviInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 94 of file structuredPlaner.h.

96  {
97  return new StructuredPlaner< GUM_SCALAR >(
98  new TreeOperatorStrategy< GUM_SCALAR >(),
99  discountFactor,
100  epsilon,
101  verbose);
102  }

◆ transferActionIds__()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::transferActionIds__ ( const ArgMaxSet< GUM_SCALAR, Idx > &  src,
ActionSet dest 
)
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 632 of file structuredPlaner_tpl.h.

634  {
635  for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
636  dest += *idi;
637  }

◆ valueIteration_()

template<typename GUM_SCALAR >
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 316 of file structuredPlaner_tpl.h.

316  {
317  // *****************************************************************************************
318  // Loop reset
319  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction
320  = operator_->getFunctionInstance();
321  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());
322 
323  // *****************************************************************************************
324  // For each action
325  std::vector< MultiDimFunctionGraph< GUM_SCALAR >* > qActionsSet;
326  for (auto actionIter = fmdp_->beginActions();
327  actionIter != fmdp_->endActions();
328  ++actionIter) {
329  MultiDimFunctionGraph< GUM_SCALAR >* qAction
330  = this->evalQaction_(newVFunction, *actionIter);
331  qActionsSet.push_back(qAction);
332  }
333  delete newVFunction;
334 
335  // *****************************************************************************************
336  // Next, to evaluate the main value function, we maximise over all the
337  // action values, ...
338  newVFunction = this->maximiseQactions_(qActionsSet);
339 
340  // *******************************************************************************************
341  // Next, we evaluate the new value function
342  newVFunction = this->addReward_(newVFunction);
343 
344  return newVFunction;
345  }
void copyAndReassign(const MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy > &src, const Bijection< const DiscreteVariable *, const DiscreteVariable * > &reassign)
Copies the src diagram's structure into this diagram.
const FMDP< GUM_SCALAR > * fmdp_
The Factored Markov Decision Process describing our planning situation (NB : this one must have funct...
virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_(const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_(MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
Performs R(s) + gamma . function, i.e. adds the reward to the discounted given function.
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
The Value Function computed iteratively.
virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_(std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
Performs max_a Q(s,a)
IOperatorStrategy< GUM_SCALAR > * operator_

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 142 of file structuredPlaner.h.

142  {
143  return vFunction_;
144  }
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
The Value Function computed iteratively.

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns the current size of the vFunction computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 149 of file structuredPlaner.h.

149  {
150  return vFunction_ != nullptr ? vFunction_->realSize() : 0;
151  }
virtual Size realSize() const
Returns the real number of parameters used for this table.
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
The Value Function computed iteratively.

Member Data Documentation

◆ discountFactor_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::discountFactor_
protected

Discount Factor used for infinite horizon planning.

Definition at line 363 of file structuredPlaner.h.

◆ elVarSeq_

template<typename GUM_SCALAR>
Set< const DiscreteVariable* > gum::StructuredPlaner< GUM_SCALAR >::elVarSeq_
protected

A Set used to eliminate primed variables.

Definition at line 358 of file structuredPlaner.h.

◆ firstTime__

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::firstTime__
private

Definition at line 380 of file structuredPlaner.h.

◆ fmdp_

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp_
protected

The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).

Definition at line 338 of file structuredPlaner.h.

◆ operator_

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::operator_
protected

Definition at line 365 of file structuredPlaner.h.

◆ optimalPolicy_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy_
protected

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient for exploitation, some translation from the fmdp_ is required for the policy to be understood by a human. optimalPolicy2String does this job.

Definition at line 353 of file structuredPlaner.h.

◆ threshold__

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::threshold__
private

The threshold value. Whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.

Definition at line 379 of file structuredPlaner.h.

◆ verbose_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::verbose_
protected

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 371 of file structuredPlaner.h.

◆ vFunction_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction_
protected

The Value Function computed iteratively.

Definition at line 343 of file structuredPlaner.h.


The documentation for this class was generated from the following files: