aGrUM  0.20.3
a C++ library for (probabilistic) graphical models
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h> More...

#include <structuredPlaner.h>

Inheritance diagram for gum::StructuredPlaner< GUM_SCALAR >:
Collaboration diagram for gum::StructuredPlaner< GUM_SCALAR >:

Public Member Functions

Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const ptr on the Factored Markov Decision Process on which we're planning. More...
 
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const ptr on the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provide a better toDot for the optimal policy where the leaves have the action name instead of its id. More...
 
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning. More...
 
virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< GUM_SCALAR > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions). More...
 
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy. More...
 
Set< const DiscreteVariable *> elVarSeq_
 A Set used to eliminate primed variables. More...
 
GUM_SCALAR discountFactor_
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< GUM_SCALAR > * operator_
 
bool verbose_
 Boolean indicating whether or not iteration information should be displayed on the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * valueIteration_ ()
 Performs a single step of value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_ (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_ (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Performs the R(s) + gamma * function operation. More...
 

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor. More...
 
virtual ~StructuredPlaner ()
 Default destructor. More...
 

Optimal policy extraction methods

virtual void evalPolicy_ ()
 Perform the required tasks to extract an optimal policy. More...
 
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a) More...
 
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 
NodeId _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for the createArgMaxCopy. More...
 
NodeId _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for the optimal policy extraction. More...
 
void _transferActionIds_ (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extract from an ArgMaxSet the associated ActionSet. More...
 

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions: regress_, maximize_, argmaximize_, add_ and subtract_ are a priori the ones to re-specify according to the data structure used (MDDs, DTs, BNs, ...).

Definition at line 70 of file structuredPlaner.h.

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > *  opi,
GUM_SCALAR  discountFactor,
GUM_SCALAR  epsilon,
bool  verbose 
)
protected

Default constructor.

Definition at line 64 of file structuredPlaner_tpl.h.

template < typename GUM_SCALAR >
INLINE StructuredPlaner< GUM_SCALAR >::StructuredPlaner(IOperatorStrategy< GUM_SCALAR >* opi,
                                                        GUM_SCALAR discountFactor,
                                                        GUM_SCALAR epsilon,
                                                        bool       verbose) :
    discountFactor_(discountFactor), operator_(opi), verbose_(verbose) {
  GUM_CONSTRUCTOR(StructuredPlaner);

  _threshold_    = epsilon;
  vFunction_     = nullptr;
  optimalPolicy_ = nullptr;
}

◆ ~StructuredPlaner()

template<typename GUM_SCALAR >
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 81 of file structuredPlaner_tpl.h.

template < typename GUM_SCALAR >
INLINE StructuredPlaner< GUM_SCALAR >::~StructuredPlaner() {
  GUM_DESTRUCTOR(StructuredPlaner);

  if (vFunction_) { delete vFunction_; }
  if (optimalPolicy_) delete optimalPolicy_;

  delete operator_;
}

Member Function Documentation

◆ _recurArgMaxCopy_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurArgMaxCopy_ ( NodeId  currentNodeId,
Idx  actionId,
const MultiDimFunctionGraph< GUM_SCALAR > *  src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxCpy,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursion part for the createArgMaxCopy.

Definition at line 488 of file structuredPlaner_tpl.h.

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (src->isTerminalNode(currentNodeId)) {
    ArgMaxSet< GUM_SCALAR, Idx > leaf(src->nodeValue(currentNodeId), actionId);
    nody = argMaxCpy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = src->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
        SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda]
          = _recurArgMaxCopy_(currentNode->son(moda), actionId, src, argMaxCpy, visitedNodes);
    nody = argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}

◆ _recurExtractOptPol_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurExtractOptPol_ ( NodeId  currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxOptVFunc,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursion part for the optimal policy extraction.

Definition at line 567 of file structuredPlaner_tpl.h.

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
    ActionSet leaf;
    _transferActionIds_(argMaxOptVFunc->nodeValue(currentNodeId), leaf);
    nody = optimalPolicy_->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = argMaxOptVFunc->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
        SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = _recurExtractOptPol_(currentNode->son(moda), argMaxOptVFunc, visitedNodes);
    nody = optimalPolicy_->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}

◆ _transferActionIds_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::_transferActionIds_ ( const ArgMaxSet< GUM_SCALAR, Idx > &  src,
ActionSet dest 
)
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 595 of file structuredPlaner_tpl.h.

{
  for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
    dest += *idi;
}

◆ addReward_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::addReward_ ( MultiDimFunctionGraph< GUM_SCALAR > *  function,
Idx  actionId = 0 
)
protectedvirtual

Performs the R(s) + gamma * function operation.

Warning
function is deleted; a new one is returned

Definition at line 395 of file structuredPlaner_tpl.h.

{
  // *****************************************************************************************
  // ... we multiply the result by the discount factor, ...
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = operator_->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
  delete Vold;

  // *****************************************************************************************
  // ... and finally add the reward
  newVFunction = operator_->add(newVFunction, RECAST(fmdp_->reward(actionId)));

  return newVFunction;
}

◆ argmaximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a)

Warning
Also performs the deallocation of the QActions

Definition at line 519 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::fmdp(), gum::StructuredPlaner< double >::optimalPolicy(), gum::StructuredPlaner< double >::optimalPolicySize(), gum::StructuredPlaner< double >::vFunction(), and gum::StructuredPlaner< double >::vFunctionSize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* newVFunction
      = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* qAction
        = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ evalPolicy_()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::evalPolicy_ ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 423 of file structuredPlaner_tpl.h.

{
  // *****************************************************************************************
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* >
      argMaxQActionsSet;
  // *****************************************************************************************
  // For each action
  for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = this->evalQaction_(newVFunction, *actionIter);

    qAction = this->addReward_(qAction);

    argMaxQActionsSet.push_back(makeArgMax_(qAction, *actionIter));
  }
  delete newVFunction;

  // *****************************************************************************************
  // To evaluate the main value function, we maximise over all action values, ...
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* argMaxVFunction
      = argmaximiseQactions_(argMaxQActionsSet);

  // *****************************************************************************************
  // ... and then we extract the optimal policy from it.
  extractOptimalPolicy_(argMaxVFunction);
}

◆ evalQaction_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::evalQaction_ ( const MultiDimFunctionGraph< GUM_SCALAR > *  Vold,
Idx  actionId 
)
protectedvirtual

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 341 of file structuredPlaner_tpl.h.

{
  // ******************************************************************************
  // Initialisation:
  // Creating a copy of the last Vfunction to deduce the new Qaction from,
  // and finding the first variable to eliminate (the one at the end)

  return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
}

◆ extractOptimalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protected

From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
deallocate the argmax optimal value function

Definition at line 542 of file structuredPlaner_tpl.h.

{
  optimalPolicy_->clear();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    optimalPolicy_->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  optimalPolicy_->manager()->setRootNode(_recurExtractOptPol_(argMaxOptimalValueFunction->root(),
                                                              argMaxOptimalValueFunction,
                                                              src2dest));

  delete argMaxOptimalValueFunction;
}

◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const ptr on the Factored Markov Decision Process on which we're planning.

Definition at line 133 of file structuredPlaner.h.

{ return fmdp_; }

◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Not calling this method before starting the first makePlanning will surely and definitely result in a crash

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 224 of file structuredPlaner_tpl.h.

{
  fmdp_ = fmdp;

  // Determination of the threshold value

  // Establishment of the variable elimination sequence
  for (auto varIter = fmdp_->beginVariables(); varIter != fmdp_->endVariables(); ++varIter)
    elVarSeq_ << fmdp_->main2prime(*varIter);

  // Initialisation of the value function
  vFunction_     = operator_->getFunctionInstance();
  optimalPolicy_ = operator_->getAggregatorInstance();
  _firstTime_    = true;
}

◆ initVFunction_()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 291 of file structuredPlaner_tpl.h.

{
  vFunction_->copy(*(RECAST(fmdp_->reward())));
}

◆ makeArgMax_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::makeArgMax_ ( const MultiDimFunctionGraph< GUM_SCALAR > *  Qaction,
Idx  actionId 
)
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action Id associated to that graph
Warning
deletes the original Qaction and returns its conversion

Definition at line 463 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* amcpy
      = operator_->getArgMaxFunctionInstance();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
      _recurArgMaxCopy_(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ makePlanning()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: enables you to specify how many value iterations you wish to perform. makePlanning then stops either when the optimal value function is reached or when nbStep iterations have been performed

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 245 of file structuredPlaner_tpl.h.

{
  if (_firstTime_) {
    this->initVFunction_();
    _firstTime_ = false;
  }

  // *****************************************************************************************
  // Main loop
  // *****************************************************************************************
  Idx        nbIte = 0;
  GUM_SCALAR gap   = _threshold_ + 1;
  while ((gap > _threshold_) && (nbIte < nbStep)) {
    ++nbIte;

    MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = this->valueIteration_();

    // *****************************************************************************************
    // Then we compare the new value function and the old one
    MultiDimFunctionGraph< GUM_SCALAR >* deltaV = operator_->subtract(newVFunction, vFunction_);
    gap = 0;

    for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
      if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
    delete deltaV;

    if (verbose_)
      std::cout << " ------------------- Fin itération n° " << nbIte << std::endl
                << " Gap : " << gap << " - " << _threshold_ << std::endl;

    // *****************************************************************************************
    // And eventually we update the pointers for the next loop
    delete vFunction_;
    vFunction_ = newVFunction;
  }

  // *****************************************************************************************
  // Search for the policy matching the value function
  // *****************************************************************************************
  this->evalPolicy_();
}

◆ maximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs max_a Q(s,a)

Warning
Also performs the deallocation of the QActions

Definition at line 356 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ minimiseFunctions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i

Definition at line 375 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 148 of file structuredPlaner.h.

{
  return optimalPolicy_;
}

◆ optimalPolicy2String()

template<typename GUM_SCALAR >
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provide a better toDot for the optimal policy where the leaves have the action name instead of its id.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 104 of file structuredPlaner_tpl.h.

104  {
105  // ************************************************************************
106  // Discarding the case where no \pi* have been computed
107  if (!optimalPolicy_ || optimalPolicy_->root() == 0) return "NO OPTIMAL POLICY CALCULATED YET";
108 
109  // ************************************************************************
110  // Initialisation
111 
112  // Declaration of the needed string stream
113  std::stringstream output;
114  std::stringstream terminalStream;
115  std::stringstream nonTerminalStream;
116  std::stringstream arcstream;
117 
118  // First line for the toDot
119  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;
120 
121  // Form line for the internal node stream en the terminal node stream
122  terminalStream << "node [shape = box];" << std::endl;
123  nonTerminalStream << "node [shape = ellipse];" << std::endl;
124 
125  // For somme clarity in the final string
126  std::string tab = "\t";
127 
128  // To know if we already checked a node or not
129  Set< NodeId > visited;
130 
131  // FIFO of nodes to visit
132  std::queue< NodeId > fifo;
133 
134  // Loading the FIFO
135  fifo.push(optimalPolicy_->root());
136  visited << optimalPolicy_->root();
137 
138 
139  // ************************************************************************
140  // Main loop
141  while (!fifo.empty()) {
142  // Node to visit
143  NodeId currentNodeId = fifo.front();
144  fifo.pop();
145 
146  // Checking if it is terminal
147  if (optimalPolicy_->isTerminalNode(currentNodeId)) {
148  // Get back the associated ActionSet
149  ActionSet ase = optimalPolicy_->nodeValue(currentNodeId);
150 
151  // Creating a line for this node
152  terminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
153  << currentNodeId << " - ";
154 
155  // Enumerating and adding to the line the associated optimal actions
156  for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe(); valIter != ase.endSafe();
157  ++valIter)
158  terminalStream << fmdp_->actionName(*valIter) << " ";
159 
160  // Terminating line
161  terminalStream << "\"];" << std::endl;
162  continue;
163  }
164 
165  // Otherwise
166  {
167  // Getting back the associated internal node
168  const InternalNode* currentNode = optimalPolicy_->node(currentNodeId);
169 
170  // Creating a line in internalnode stream for this node
171  nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
172  << currentNodeId << " - " << currentNode->nodeVar()->name() << "\"];"
173  << std::endl;
174 
175  // Going through the sons and aggregating them according to the sons' ids
176  HashTable< NodeId, LinkedList< Idx >* > sonMap;
177  for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
178  if (!visited.exists(currentNode->son(sonIter))) {
179  fifo.push(currentNode->son(sonIter));
180  visited << currentNode->son(sonIter);
181  }
182  if (!sonMap.exists(currentNode->son(sonIter)))
183  sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
184  sonMap[currentNode->son(sonIter)]->addLink(sonIter);
185  }
186 
187  // Adding to the arc stream
188  for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe(); ++sonIter) {
189  arcstream << tab << currentNodeId << " -> " << sonIter.key() << " [label=\" ";
190  Link< Idx >* modaIter = sonIter.val()->list();
191  while (modaIter) {
192  arcstream << currentNode->nodeVar()->label(modaIter->element());
193  if (modaIter->nextLink()) arcstream << ", ";
194  modaIter = modaIter->nextLink();
195  }
196  arcstream << "\",color=\"#00ff00\"];" << std::endl;
197  delete sonIter.val();
198  }
199  }
200  }
201 
202  // Terminating
203  output << terminalStream.str() << std::endl
204  << nonTerminalStream.str() << std::endl
205  << arcstream.str() << std::endl
206  << "}" << std::endl;
207 
208  return output.str();
209  }
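The export above follows a simple pattern: a breadth-first traversal of the decision diagram, writing terminal nodes, internal nodes, and arcs to three separate streams that are concatenated at the end. A standalone sketch of that pattern (the `Node` struct and `toDot` helper below are hypothetical simplifications, not aGrUM types):

```cpp
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified node: a terminal holds a label (e.g. action
// names); an internal node holds a variable name and its child ids.
struct Node {
  bool isTerminal = false;
  std::string label;      // terminal: action names; internal: variable name
  std::vector<int> sons;  // child node ids (internal nodes only)
};

// BFS over the diagram, mirroring optimalPolicy2String: terminals go to a
// box-shaped stream, internals to an ellipse-shaped one, arcs to a third.
std::string toDot(const std::map<int, Node>& graph, int root) {
  std::stringstream output, terminalStream, nonTerminalStream, arcStream;
  output << "digraph \"OPTIMAL POLICY\" {\n";
  terminalStream << "node [shape = box];\n";
  nonTerminalStream << "node [shape = ellipse];\n";

  std::set<int> visited{root};
  std::queue<int> fifo;
  fifo.push(root);

  while (!fifo.empty()) {
    int id = fifo.front();
    fifo.pop();
    const Node& n = graph.at(id);
    if (n.isTerminal) {
      terminalStream << '\t' << id << " [label=\"" << id << " - " << n.label << "\"];\n";
      continue;
    }
    nonTerminalStream << '\t' << id << " [label=\"" << id << " - " << n.label << "\"];\n";
    for (int son : n.sons) {
      if (visited.insert(son).second) fifo.push(son);  // enqueue unseen sons
      arcStream << '\t' << id << " -> " << son << ";\n";
    }
  }
  output << terminalStream.str() << nonTerminalStream.str() << arcStream.str() << "}\n";
  return output.str();
}
```

In the real method, terminal labels come from `fmdp_->actionName(...)` and arcs are labelled with the variable's modalities; the sketch keeps only the traversal and stream layout.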

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns optimalPolicy computed so far current size.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 155 of file structuredPlaner.h.

155  {
156  return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
157  }

◆ spumddInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 79 of file structuredPlaner.h.

81  {
82  return new StructuredPlaner< GUM_SCALAR >(new MDDOperatorStrategy< GUM_SCALAR >(),
83  discountFactor,
84  epsilon,
85  verbose);
86  }

◆ sviInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 91 of file structuredPlaner.h.

93  {
94  return new StructuredPlaner< GUM_SCALAR >(new TreeOperatorStrategy< GUM_SCALAR >(),
95  discountFactor,
96  epsilon,
97  verbose);
98  }
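Both factories build the same planner class and differ only in the operator strategy they inject (`MDDOperatorStrategy` for SPUMDD, `TreeOperatorStrategy` for SVI). The pattern can be sketched standalone; the minimal `Planer`, `IOperatorStrategy`, and concrete strategies below are hypothetical stand-ins for the aGrUM classes, not their real interfaces:

```cpp
#include <memory>
#include <string>

// Strategy interface: the planner delegates diagram operations to it.
struct IOperatorStrategy {
  virtual ~IOperatorStrategy() = default;
  virtual std::string name() const = 0;
};
struct MDDStrategy : IOperatorStrategy {
  std::string name() const override { return "spumdd"; }
};
struct TreeStrategy : IOperatorStrategy {
  std::string name() const override { return "svi"; }
};

class Planer {
  std::unique_ptr<IOperatorStrategy> op_;
  double discount_, epsilon_;

  public:
  Planer(std::unique_ptr<IOperatorStrategy> op, double discount, double epsilon) :
      op_(std::move(op)), discount_(discount), epsilon_(epsilon) {}

  // Factories mirroring spumddInstance / sviInstance (defaults from the doc).
  static Planer spumddInstance(double discount = 0.9, double epsilon = 0.00001) {
    return Planer(std::make_unique<MDDStrategy>(), discount, epsilon);
  }
  static Planer sviInstance(double discount = 0.9, double epsilon = 0.00001) {
    return Planer(std::make_unique<TreeStrategy>(), discount, epsilon);
  }

  std::string strategyName() const { return op_->name(); }
};
```

In aGrUM the factories return raw owning pointers; the sketch uses `std::unique_ptr` and value returns purely for self-containment.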

◆ valueIteration_()

template<typename GUM_SCALAR >
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 308 of file structuredPlaner_tpl.h.

308  {
309  // *****************************************************************************************
310  // Loop reset
311  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = operator_->getFunctionInstance();
312  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());
313 
314  // *****************************************************************************************
315  // For each action
316  std::vector< MultiDimFunctionGraph< GUM_SCALAR >* > qActionsSet;
317  for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
318  MultiDimFunctionGraph< GUM_SCALAR >* qAction = this->evalQaction_(newVFunction, *actionIter);
319  qActionsSet.push_back(qAction);
320  }
321  delete newVFunction;
322 
323  // *****************************************************************************************
324  // Next, to evaluate the main value function, we maximise over all
325  // action values, ...
326  newVFunction = this->maximiseQactions_(qActionsSet);
327 
328  // *******************************************************************************************
329  // Next, we add the reward to obtain the new value function
330  newVFunction = this->addReward_(newVFunction);
331 
332  return newVFunction;
333  }
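`valueIteration_` performs one Bellman backup: re-map V^{n} onto the primed variables, evaluate Q(s,a) = P(s'|s,a)·V^{n}(s') for each action, maximise over actions, then add the discounted reward. The same step in plain tabular form, as a standalone sketch (vectors instead of function graphs; all names illustrative, not aGrUM code):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// One step of tabular value iteration, mirroring valueIteration_:
//   Vnew(s) = R(s) + gamma * max_a sum_{s'} P[a][s][s'] * V(s')
std::vector<double> valueIterationStep(
    const std::vector<std::vector<std::vector<double>>>& P,  // P[a][s][s']
    const std::vector<double>& R,                            // reward R(s)
    const std::vector<double>& V,                            // current V^{n}
    double gamma) {                                          // discount factor
  const std::size_t nStates = V.size();
  std::vector<double> Vnew(nStates, 0.0);
  for (std::size_t s = 0; s < nStates; ++s) {
    double best = -std::numeric_limits<double>::infinity();
    for (const auto& Pa : P) {  // "for each action" loop: evalQaction_
      double qa = 0.0;          // Q(s,a) = E[V(s') | s, a]
      for (std::size_t sp = 0; sp < nStates; ++sp) qa += Pa[s][sp] * V[sp];
      best = std::max(best, qa);  // maximiseQactions_
    }
    Vnew[s] = R[s] + gamma * best;  // addReward_: R(s) + gamma * max_a Q(s,a)
  }
  return Vnew;
}
```

The structured planner performs exactly these operations, but on `MultiDimFunctionGraph` diagrams, where the copy onto primed variables (`copyAndReassign` with `fmdp_->mapMainPrime()`) replaces the explicit sum over next states.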

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 138 of file structuredPlaner.h.

138 { return vFunction_; }

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns vFunction computed so far current size.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 143 of file structuredPlaner.h.

143 { return vFunction_ != nullptr ? vFunction_->realSize() : 0; }

Member Data Documentation

◆ _firstTime_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::_firstTime_
private

Definition at line 367 of file structuredPlaner.h.

◆ _threshold_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::_threshold_
private

The threshold value. Whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.

Definition at line 366 of file structuredPlaner.h.

◆ discountFactor_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::discountFactor_
protected

Discount Factor used for infinite horizon planning.

Definition at line 350 of file structuredPlaner.h.

◆ elVarSeq_

template<typename GUM_SCALAR>
Set< const DiscreteVariable* > gum::StructuredPlaner< GUM_SCALAR >::elVarSeq_
protected

A Set used to eliminate primed variables.

Definition at line 345 of file structuredPlaner.h.

◆ fmdp_

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp_
protected

The Factored Markov Decision Process describing our planning situation (NB : this one must have function graph as transitions and reward functions )

Definition at line 325 of file structuredPlaner.h.

◆ operator_

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::operator_
protected

Definition at line 352 of file structuredPlaner.h.

◆ optimalPolicy_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy_
protected

The associated optimal policy.

Warning
Leaves are ActionSets containing the ids of the best actions. While this is sufficient to be exploited, some translation from the fmdp_ is required for it to be understood by a human; optimalPolicy2String does this job.

Definition at line 340 of file structuredPlaner.h.

◆ verbose_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::verbose_
protected

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 358 of file structuredPlaner.h.

◆ vFunction_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction_
protected

The Value Function computed iteratively.

Definition at line 330 of file structuredPlaner.h.


The documentation for this class was generated from the following files: