aGrUM  0.14.2
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h> More...

#include <structuredPlaner.h>


Public Member Functions

Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning. More...
 
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const pointer to the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids. More...
 
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning. More...
 
virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< GUM_SCALAR > * _fmdp
 The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs). More...
 
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
 The associated optimal policy. More...
 
Set< const DiscreteVariable *> _elVarSeq
 A Set used to eliminate primed variables. More...
 
GUM_SCALAR _discountFactor
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< GUM_SCALAR > * _operator
 
bool _verbose
 Boolean indicating whether iteration information should be displayed on the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void _initVFunction ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _valueIteration ()
 Performs a single step of value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _evalQaction (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _maximiseQactions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _minimiseFunctions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _addReward (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Performs R(s) + gamma * function. More...
 

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor. More...
 
virtual ~StructuredPlaner ()
 Default destructor. More...
 

Optimal policy extraction methods

virtual void _evalPolicy ()
 Performs the required tasks to extract an optimal policy. More...
 
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * _makeArgMax (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a) More...
 
void _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = max_a Q*(s,a), this function extracts pi*(s) = argmax_a Q*(s,a). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 
NodeId __recurArgMaxCopy (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursive part of _makeArgMax. More...
 
NodeId __recurExtractOptPol (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursive part of _extractOptimalPolicy. More...
 
void __transferActionIds (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extract from an ArgMaxSet the associated ActionSet. More...
 

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions: _regress, _maximize, _argmaximize, _add and _subtract are a priori the ones to re-specify according to the data structure used (MDDs, DTs, BNs, ...).

Definition at line 67 of file structuredPlaner.h.
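The planning scheme this class implements can be illustrated, independently of aGrUM's function graphs, with a plain tabular value iteration. This is a minimal sketch: the 2-state, 2-action MDP (`P`, `R`), the function name `valueIterate` and the default parameters are hypothetical stand-ins, not part of the library.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Tabular sketch of the value-iteration scheme StructuredPlaner implements,
// with plain arrays standing in for MultiDimFunctionGraphs.
std::array<double, 2> valueIterate(double gamma = 0.9, double epsilon = 1e-5) {
  // P[a][s][s'] : transition probabilities, R[s] : reward (hypothetical MDP)
  const double P[2][2][2] = {{{0.8, 0.2}, {0.1, 0.9}},
                             {{0.5, 0.5}, {0.3, 0.7}}};
  const double R[2] = {1.0, 0.0};

  std::array<double, 2> V = {R[0], R[1]};   // _initVFunction : V^0 = R
  double gap = epsilon + 1;
  while (gap > epsilon) {                   // makePlanning's main loop
    std::array<double, 2> newV{};
    for (int s = 0; s < 2; ++s) {
      double best = -1e300;
      for (int a = 0; a < 2; ++a) {         // _evalQaction : sum_s' P(s'|s,a).V(s')
        double q = 0;
        for (int sp = 0; sp < 2; ++sp) q += P[a][s][sp] * V[sp];
        best = std::max(best, q);           // _maximiseQactions : max_a Q(s,a)
      }
      newV[s] = R[s] + gamma * best;        // _addReward : R(s) + gamma * max_a Q
    }
    gap = 0;                                // gap = max_s |V^{n+1}(s) - V^n(s)|
    for (int s = 0; s < 2; ++s) gap = std::max(gap, std::fabs(newV[s] - V[s]));
    V = newV;                               // swap value functions for next step
  }
  return V;
}
```

Calling `valueIterate()` converges to the fixed point of the Bellman optimality equation for this toy MDP; the library does the same over decision diagrams instead of arrays.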

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > *  opi,
GUM_SCALAR  discountFactor,
GUM_SCALAR  epsilon,
bool  verbose 
)
protected

Default constructor.

Definition at line 61 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::sviInstance().

    :
    _discountFactor(discountFactor),
    _operator(opi), _verbose(verbose) {
  GUM_CONSTRUCTOR(StructuredPlaner);

  __threshold = epsilon;
  _vFunction = nullptr;
  _optimalPolicy = nullptr;
}

◆ ~StructuredPlaner()

template<typename GUM_SCALAR >
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 79 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::sviInstance().

{
  GUM_DESTRUCTOR(StructuredPlaner);

  if (_vFunction) { delete _vFunction; }

  if (_optimalPolicy) delete _optimalPolicy;

  delete _operator;
}

Member Function Documentation

◆ __recurArgMaxCopy()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::__recurArgMaxCopy ( NodeId  currentNodeId,
Idx  actionId,
const MultiDimFunctionGraph< GUM_SCALAR > *  src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxCpy,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursive part of _makeArgMax.

Definition at line 504 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (src->isTerminalNode(currentNodeId)) {
    ArgMaxSet< GUM_SCALAR, Idx > leaf(src->nodeValue(currentNodeId), actionId);
    nody = argMaxCpy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = src->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
       SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = __recurArgMaxCopy(
         currentNode->son(moda), actionId, src, argMaxCpy, visitedNodes);
    nody =
       argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}
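The recursion pattern above — memoising already-copied nodes in a hash table so that shared sub-diagrams are copied exactly once — can be sketched on a plain DAG. The `Node` struct and `recurCopy` below are simplified stand-ins for aGrUM's node and graph types, not the library's API:

```cpp
#include <unordered_map>
#include <vector>

// Simplified stand-in for a function-graph node: terminal nodes hold a value,
// internal nodes hold indices of their children ("sons").
struct Node {
  bool terminal;
  int value;              // used when terminal
  std::vector<int> sons;  // used when internal
};

// Memoised recursive copy: visitedNodes maps source ids to copy ids, so a
// sub-graph reachable through several parents is copied only once,
// exactly as __recurArgMaxCopy does with its HashTable.
int recurCopy(int current, const std::vector<Node>& src, std::vector<Node>& dst,
              std::unordered_map<int, int>& visitedNodes) {
  auto hit = visitedNodes.find(current);
  if (hit != visitedNodes.end()) return hit->second;  // already copied

  Node copy = src[current];
  if (!copy.terminal)  // copy the sons first, bottom-up
    for (auto& son : copy.sons) son = recurCopy(son, src, dst, visitedNodes);
  dst.push_back(copy);
  int id = int(dst.size()) - 1;
  visitedNodes.emplace(current, id);
  return id;
}
```

On a diamond-shaped graph (two internal nodes sharing the same terminal child), the shared child is materialised once in the copy, which is what keeps decision diagrams compact.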

◆ __recurExtractOptPol()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::__recurExtractOptPol ( NodeId  currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxOptVFunc,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursive part of _extractOptimalPolicy.

Definition at line 586 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
    ActionSet leaf;
    __transferActionIds(argMaxOptVFunc->nodeValue(currentNodeId), leaf);
    nody = _optimalPolicy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = argMaxOptVFunc->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
       SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = __recurExtractOptPol(
         currentNode->son(moda), argMaxOptVFunc, visitedNodes);
    nody = _optimalPolicy->manager()->addInternalNode(currentNode->nodeVar(),
                                                      sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}

◆ __transferActionIds()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::__transferActionIds ( const ArgMaxSet< GUM_SCALAR, Idx > &  src,
ActionSet dest 
)
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 616 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
    dest += *idi;
}

◆ _addReward()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_addReward ( MultiDimFunctionGraph< GUM_SCALAR > *  function,
Idx  actionId = 0 
)
protectedvirtual

Performs R(s) + gamma * function.

Warning
The given function is deleted; a new one is returned.

Definition at line 405 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // ... we multiply the result by the discount factor, ...
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->_discountFactor);
  delete Vold;

  // ... and finally add the reward
  newVFunction = _operator->add(newVFunction, RECAST(_fmdp->reward(actionId)));

  return newVFunction;
}

◆ _argmaximiseQactions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::_argmaximiseQactions ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 537 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _evalPolicy()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::_evalPolicy ( )
protectedvirtual

Performs the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 434 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >,
                                      SetTerminalNodePolicy >* >
     argMaxQActionsSet;

  // For each action
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction =
       this->_evalQaction(newVFunction, *actionIter);

    qAction = this->_addReward(qAction);

    argMaxQActionsSet.push_back(_makeArgMax(qAction, *actionIter));
  }
  delete newVFunction;

  // To evaluate the main value function, we take the argmax over all action
  // values, ...
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     argMaxVFunction = _argmaximiseQactions(argMaxQActionsSet);

  // ... and extract the optimal policy from it
  _extractOptimalPolicy(argMaxVFunction);
}

◆ _evalQaction()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_evalQaction ( const MultiDimFunctionGraph< GUM_SCALAR > *  Vold,
Idx  actionId 
)
protectedvirtual

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 350 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Initialisation:
  // Creating a copy of the last Vfunction to deduce the new Qaction from it,
  // and finding the first var to eliminate (the one at the end)
  return _operator->regress(Vold, actionId, this->_fmdp, this->_elVarSeq);
}

◆ _extractOptimalPolicy()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::_extractOptimalPolicy ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protected

From V*(s) = max_a Q*(s,a), this function extracts pi*(s) = argmax_a Q*(s,a). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
deallocate the argmax optimal value function

Definition at line 561 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  _optimalPolicy->clear();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    _optimalPolicy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  _optimalPolicy->manager()->setRootNode(__recurExtractOptPol(
     argMaxOptimalValueFunction->root(), argMaxOptimalValueFunction, src2dest));

  delete argMaxOptimalValueFunction;
}
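The extraction step above maps each leaf's ArgMaxSet to an ActionSet. Its essence can be sketched with plain tables: for each state, keep the *set* of actions reaching the optimal Q-value, so ties between equally good actions are preserved. The function name and table layout below are illustrative stand-ins, not aGrUM's API:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of the pi* extraction step: from Q*(s,a), keep for each state the
// set of optimal actions (the ArgMaxSet/ActionSet analogue), so that ties
// are preserved. Plain tables stand in for function graphs.
std::vector<std::vector<int>> extractOptimalPolicy(
    const std::vector<std::vector<double>>& Q) {
  std::vector<std::vector<int>> pi;
  for (const auto& qs : Q) {
    double best = qs[0];
    for (double q : qs) best = std::max(best, q);
    std::vector<int> actionSet;  // all actions whose Q-value reaches the max
    for (int a = 0; a < int(qs.size()); ++a)
      if (std::fabs(qs[a] - best) < 1e-12) actionSet.push_back(a);
    pi.push_back(actionSet);
  }
  return pi;
}
```

A state where two actions tie yields a two-element action set, mirroring how an ArgMaxSet at a leaf can carry several action ids.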

◆ _initVFunction()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::_initVFunction ( )
protectedvirtual

Initializes the value function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 295 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  _vFunction->copy(*(RECAST(_fmdp->reward())));
}

◆ _makeArgMax()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::_makeArgMax ( const MultiDimFunctionGraph< GUM_SCALAR > *  Qaction,
Idx  actionId 
)
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf with an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action Id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 479 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     amcpy = _operator->getArgMaxFunctionInstance();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
     __recurArgMaxCopy(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ _maximiseQactions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_maximiseQactions ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs max_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 366 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->maximize(newVFunction, qAction);
  }

  return newVFunction;
}
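The pop-and-combine reduction used by _maximiseQactions (and its min/argmax siblings) generalises to any binary operator over the stack of Q-functions. A minimal sketch, with plain vectors standing in for MultiDimFunctionGraphs and a hypothetical function name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Pop-and-combine reduction mirroring _maximiseQactions: repeatedly pop a
// Q-function off the set and fold it into the accumulator with an
// element-wise max; the input set is consumed, as in the library's version.
std::vector<double> maximiseQactions(std::vector<std::vector<double>>& qSet) {
  std::vector<double> newV = qSet.back();
  qSet.pop_back();

  while (!qSet.empty()) {
    const std::vector<double>& qAction = qSet.back();
    for (std::size_t s = 0; s < newV.size(); ++s)
      newV[s] = std::max(newV[s], qAction[s]);
    qSet.pop_back();
  }

  return newV;
}
```

Substituting `std::min` (or an argmax-tracking merge) for `std::max` gives the _minimiseFunctions and _argmaximiseQactions variants of the same loop.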

◆ _minimiseFunctions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_minimiseFunctions ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Performs also the deallocation of the F_i

Definition at line 386 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _valueIteration()

template<typename GUM_SCALAR >
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_valueIteration ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 313 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  // For each action
  std::vector< MultiDimFunctionGraph< GUM_SCALAR >* > qActionsSet;
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction =
       this->_evalQaction(newVFunction, *actionIter);
    qActionsSet.push_back(qAction);
  }
  delete newVFunction;

  // To evaluate the main value function, we maximise over all action
  // values, ...
  newVFunction = this->_maximiseQactions(qActionsSet);

  // ... then we add the reward to get the new value function
  newVFunction = this->_addReward(newVFunction);

  return newVFunction;
}

◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 134 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{ return _fmdp; }

◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Not calling this method before the first call to makePlanning will result in a crash.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 226 of file structuredPlaner_tpl.h.

Referenced by gum::AdaptiveRMaxPlaner::initialize(), and gum::StructuredPlaner< double >::optimalPolicySize().

{
  _fmdp = fmdp;

  // Determination of the threshold value

  // Establishment of the sequence of variables to eliminate
  for (auto varIter = _fmdp->beginVariables(); varIter != _fmdp->endVariables();
       ++varIter)
    _elVarSeq << _fmdp->main2prime(*varIter);

  // Initialisation of the value function
  _vFunction = _operator->getFunctionInstance();
  _optimalPolicy = _operator->getAggregatorInstance();
  __firstTime = true;
}

◆ makePlanning()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: lets you specify how many value iterations to perform at most. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 248 of file structuredPlaner_tpl.h.

Referenced by gum::AdaptiveRMaxPlaner::makePlanning(), and gum::StructuredPlaner< double >::optimalPolicySize().

248  {
249  if (__firstTime) {
250  this->_initVFunction();
251  __firstTime = false;
252  }
253 
254  // *****************************************************************************************
255  // Main loop
256  // *****************************************************************************************
257  Idx nbIte = 0;
258  GUM_SCALAR gap = __threshold + 1;
259  while ((gap > __threshold) && (nbIte < nbStep)) {
260  ++nbIte;
261 
263 
264  // *****************************************************************************************
265  // Then we compare new value function and the old one
267  _operator->subtract(newVFunction, _vFunction);
268  gap = 0;
269 
270  for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
271  if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
272  delete deltaV;
273 
274  if (_verbose)
275  std::cout << " ------------------- End of iteration no. " << nbIte << std::endl
276  << " Gap : " << gap << " - " << __threshold << std::endl;
277 
278  // *****************************************************************************************
279  // Finally, we update the pointers for the next loop
280  delete _vFunction;
281  _vFunction = newVFunction;
282  }
283 
284  // *****************************************************************************************
285  // Policy matching value function research
286  // *****************************************************************************************
287  this->_evalPolicy();
288  }
void nextValue() const
Increments the constant safe iterator.
void beginValues() const
Initializes the constant safe iterator on terminal nodes.
virtual void _evalPolicy()
Perform the required tasks to extract an optimal policy.
IOperatorStrategy< GUM_SCALAR > * _operator
bool _verbose
Boolean used to indicate whether or not iteration information should be displayed on the terminal.
virtual MultiDimFunctionGraph< GUM_SCALAR > * _valueIteration()
Performs a single step of value iteration.
virtual void _initVFunction()
Performs the initialization of the value function.
GUM_SCALAR __threshold
The threshold value. Whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.
bool hasValue() const
Indicates whether the constant safe iterator has reached the end of the terminal nodes list.
const GUM_SCALAR & value() const
Returns the value of the current terminal node pointed to by the constant safe iterator.
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 154 of file structuredPlaner.h.

154  {
155  return _optimalPolicy;
156  }
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.

◆ optimalPolicy2String()

template<typename GUM_SCALAR >
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 102 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

102  {
103  // ************************************************************************
104  // Discarding the case where no \pi* has been computed yet
105  if (!_optimalPolicy || _optimalPolicy->root() == 0)
106  return "NO OPTIMAL POLICY CALCULATED YET";
107 
108  // ************************************************************************
109  // Initialisation
110 
111  // Declaration of the needed string stream
112  std::stringstream output;
113  std::stringstream terminalStream;
114  std::stringstream nonTerminalStream;
115  std::stringstream arcstream;
116 
117  // First line for the toDot
118  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;
119 
120  // Form lines for the internal node stream and the terminal node stream
121  terminalStream << "node [shape = box];" << std::endl;
122  nonTerminalStream << "node [shape = ellipse];" << std::endl;
123 
124  // For some clarity in the final string
125  std::string tab = "\t";
126 
127  // To know if we already checked a node or not
128  Set< NodeId > visited;
129 
130  // FIFO of nodes to visit
131  std::queue< NodeId > fifo;
132 
133  // Loading the FIFO
134  fifo.push(_optimalPolicy->root());
135  visited << _optimalPolicy->root();
136 
137 
138  // ************************************************************************
139  // Main loop
140  while (!fifo.empty()) {
141  // Node to visit
142  NodeId currentNodeId = fifo.front();
143  fifo.pop();
144 
145  // Checking if it is terminal
146  if (_optimalPolicy->isTerminalNode(currentNodeId)) {
147  // Get back the associated ActionSet
148  ActionSet ase = _optimalPolicy->nodeValue(currentNodeId);
149 
150  // Creating a line for this node
151  terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
152  << " [label=\"" << currentNodeId << " - ";
153 
154  // Enumerating and adding to the line the associated optimal actions
155  for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
156  valIter != ase.endSafe();
157  ++valIter)
158  terminalStream << _fmdp->actionName(*valIter) << " ";
159 
160  // Terminating line
161  terminalStream << "\"];" << std::endl;
162  continue;
163  }
164 
165  // Otherwise
166  {
167  // Getting back the associated internal node
168  const InternalNode* currentNode = _optimalPolicy->node(currentNodeId);
169 
170  // Creating a line in internalnode stream for this node
171  nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
172  << " [label=\"" << currentNodeId << " - "
173  << currentNode->nodeVar()->name() << "\"];" << std::endl;
174 
175  // Going through the sons and aggregating them according to the sons' ids
176  HashTable< NodeId, LinkedList< Idx >* > sonMap;
177  for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
178  if (!visited.exists(currentNode->son(sonIter))) {
179  fifo.push(currentNode->son(sonIter));
180  visited << currentNode->son(sonIter);
181  }
182  if (!sonMap.exists(currentNode->son(sonIter)))
183  sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
184  sonMap[currentNode->son(sonIter)]->addLink(sonIter);
185  }
186 
187  // Adding to the arc stream
188  for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
189  ++sonIter) {
190  arcstream << tab << currentNodeId << " -> " << sonIter.key()
191  << " [label=\" ";
192  Link< Idx >* modaIter = sonIter.val()->list();
193  while (modaIter) {
194  arcstream << currentNode->nodeVar()->label(modaIter->element());
195  if (modaIter->nextLink()) arcstream << ", ";
196  modaIter = modaIter->nextLink();
197  }
198  arcstream << "\",color=\"#00ff00\"];" << std::endl;
199  delete sonIter.val();
200  }
201  }
202  }
203 
204  // Terminating
205  output << terminalStream.str() << std::endl
206  << nonTerminalStream.str() << std::endl
207  << arcstream.str() << std::endl
208  << "}" << std::endl;
209 
210  return output.str();
211  }
const FMDP< GUM_SCALAR > * _fmdp
The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).
bool exists(const Key &k) const
Indicates whether a given element belongs to the set.
Definition: set_tpl.h:604
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.
Size NodeId
Type for node ids.
Definition: graphElements.h:97
void insert(const Key &k)
Inserts a new element into the set.
Definition: set_tpl.h:610

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 161 of file structuredPlaner.h.

161  {
162  return _optimalPolicy != nullptr ? _optimalPolicy->realSize() : 0;
163  }
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.

◆ spumddInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 77 of file structuredPlaner.h.

Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().

79  {
80  return new StructuredPlaner< GUM_SCALAR >(
81  new MDDOperatorStrategy< GUM_SCALAR >(),
82  discountFactor,
83  epsilon,
84  verbose);
85  }

◆ sviInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 91 of file structuredPlaner.h.

Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().

93  {
94  return new StructuredPlaner< GUM_SCALAR >(
95  new TreeOperatorStrategy< GUM_SCALAR >(),
96  discountFactor,
97  epsilon,
98  verbose);
99  }

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 139 of file structuredPlaner.h.

139  {
140  return _vFunction;
141  }
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 146 of file structuredPlaner.h.

146  {
147  return _vFunction != nullptr ? _vFunction->realSize() : 0;
148  }
virtual Size realSize() const
Returns the real number of parameters used for this table.
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.

Member Data Documentation

◆ __firstTime

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::__firstTime
private

Definition at line 377 of file structuredPlaner.h.

◆ __threshold

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::__threshold
private

The threshold value. Whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.

Definition at line 376 of file structuredPlaner.h.

◆ _discountFactor

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::_discountFactor
protected

Discount Factor used for infinite horizon planning.

Definition at line 360 of file structuredPlaner.h.

◆ _elVarSeq

template<typename GUM_SCALAR>
Set< const DiscreteVariable* > gum::StructuredPlaner< GUM_SCALAR >::_elVarSeq
protected

A set used to eliminate primed variables.

Definition at line 355 of file structuredPlaner.h.

◆ _fmdp

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_fmdp
protected

The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).

Definition at line 335 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::fmdp().

◆ _operator

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_operator
protected

Definition at line 362 of file structuredPlaner.h.

◆ _optimalPolicy

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy
protected

The associated optimal policy.

Warning
Leaves are ActionSets containing the ids of the best actions. While this is sufficient to be exploited, some translation from the _fmdp is required for it to be understood by a human. optimalPolicy2String does this job.

Definition at line 350 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicy(), and gum::StructuredPlaner< double >::optimalPolicySize().

◆ _verbose

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::_verbose
protected

Boolean used to indicate whether or not iteration information should be displayed on the terminal.

Definition at line 368 of file structuredPlaner.h.

◆ _vFunction

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_vFunction
protected

The Value Function computed iteratively.

Definition at line 340 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::vFunction(), and gum::StructuredPlaner< double >::vFunctionSize().


The documentation for this class was generated from the following files: