aGrUM  0.16.0
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h> More...

#include <structuredPlaner.h>


Public Member Functions

Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const ptr on the Factored Markov Decision Process on which we're planning. More...
 
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const ptr on the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provides a more readable toDot for the optimal policy, where leaves display action names instead of ids. More...
 
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning. More...
 
virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< GUM_SCALAR > * _fmdp
 The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs). More...
 
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
 The associated optimal policy. More...
 
Set< const DiscreteVariable *> _elVarSeq
 A set of primed variables to eliminate. More...
 
GUM_SCALAR _discountFactor
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< GUM_SCALAR > * _operator
 
bool _verbose
 Boolean indicating whether iteration information should be displayed on the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void _initVFunction ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _valueIteration ()
 Performs a single step of value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _evalQaction (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a)·V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _maximiseQactions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _minimiseFunctions (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< GUM_SCALAR > * _addReward (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Performs the R(s) + γ·function operation. More...
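Read together, these value-iteration methods implement one Bellman backup over function graphs. In standard MDP notation (stated here as background, not taken from the source):

```latex
Q^{t}_{a}(s) \;=\; R(s) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V^{t-1}(s'),
\qquad
V^{t}(s) \;=\; \max_{a} Q^{t}_{a}(s)
```

_evalQaction computes the expectation term, _addReward multiplies by γ and adds R(s), and _maximiseQactions takes the maximum over actions.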
 

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor. More...
 
virtual ~StructuredPlaner ()
 Default destructor. More...
 

Optimal policy extraction methods

virtual void _evalPolicy ()
 Perform the required tasks to extract an optimal policy. More...
 
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * _makeArgMax (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a) More...
 
void _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), this function extracts π*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 
NodeId __recurArgMaxCopy (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for the createArgMaxCopy. More...
 
NodeId __recurExtractOptPol (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for the optimal policy extraction. More...
 
void __transferActionIds (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extract from an ArgMaxSet the associated ActionSet. More...
 

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions _regress, _maximize, _argmaximize, _add and _subtract are the ones to re-specify according to the datastructure used (MDDs, DTs, BNs, ...).

Definition at line 70 of file structuredPlaner.h.
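As a rough illustration of what makePlanning computes, here is a self-contained tabular analogue of the same value-iteration loop. This is hypothetical toy code, not the aGrUM API: StructuredPlaner operates on MultiDimFunctionGraph diagrams instead of arrays, but the loop structure and stopping rule are the same.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Toy deterministic MDP with 2 states and 2 actions; the reward depends
// on the state only, mirroring _addReward's default actionId = 0.
struct TinyMdp {
  std::array<std::array<int, 2>, 2> next;   // next[s][a]: successor state
  std::array<double, 2>             reward; // reward[s]
};

// Tabular analogue of StructuredPlaner::makePlanning: iterate Bellman
// backups until the sup-norm gap between successive value functions
// falls under the threshold, or nbStep iterations have been performed.
inline std::array<double, 2> tabularMakePlanning(const TinyMdp& m, double gamma,
                                                 double threshold,
                                                 int nbStep = 1000000) {
  std::array<double, 2> v{0.0, 0.0};
  double gap = threshold + 1;               // same trick as in the real loop
  for (int ite = 0; ite < nbStep && gap > threshold; ++ite) {
    std::array<double, 2> nv{};
    for (int s = 0; s < 2; ++s) {
      double best = -1e300;
      for (int a = 0; a < 2; ++a)           // _evalQaction + _maximiseQactions
        best = std::max(best, gamma * v[m.next[s][a]]);
      nv[s] = m.reward[s] + best;           // _addReward
    }
    gap = 0;                                // sup-norm of the difference
    for (int s = 0; s < 2; ++s) gap = std::max(gap, std::fabs(nv[s] - v[s]));
    v = nv;                                 // swap old and new value functions
  }
  return v;
}
```

With gamma = 0.9, reward = {0, 1} and an action that swaps the two states, the fixed point is V = {9, 10}: the planner learns to move to state 1 and stay there.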

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > *  opi,
GUM_SCALAR  discountFactor,
GUM_SCALAR  epsilon,
bool  verbose 
)
protected

Default constructor.

Definition at line 64 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::sviInstance().

   :
    _discountFactor(discountFactor),
    _operator(opi), _verbose(verbose) {
  GUM_CONSTRUCTOR(StructuredPlaner);

  __threshold = epsilon;
  _vFunction = nullptr;
  _optimalPolicy = nullptr;
}

◆ ~StructuredPlaner()

template<typename GUM_SCALAR >
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 82 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::sviInstance().

{
  GUM_DESTRUCTOR(StructuredPlaner);

  if (_vFunction) { delete _vFunction; }

  if (_optimalPolicy) delete _optimalPolicy;

  delete _operator;
}

Member Function Documentation

◆ __recurArgMaxCopy()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::__recurArgMaxCopy ( NodeId  currentNodeId,
Idx  actionId,
const MultiDimFunctionGraph< GUM_SCALAR > *  src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxCpy,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursion part for the createArgMaxCopy.

Definition at line 507 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (src->isTerminalNode(currentNodeId)) {
    ArgMaxSet< GUM_SCALAR, Idx > leaf(src->nodeValue(currentNodeId), actionId);
    nody = argMaxCpy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = src->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
       SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = __recurArgMaxCopy(
         currentNode->son(moda), actionId, src, argMaxCpy, visitedNodes);
    nody =
       argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}
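The memoisation pattern above (the visitedNodes HashTable) is what keeps shared sub-diagrams from being duplicated during the copy. A self-contained sketch of the same traversal on a plain DAG (hypothetical toy code, not the aGrUM API):

```cpp
#include <unordered_map>
#include <vector>

// A node is just the list of indices of its sons within the graph vector.
struct Node {
  std::vector<int> sons;
};

// Analogue of __recurArgMaxCopy's recursion: copy the DAG rooted at `cur`
// from `src` into `dst`, memoising already-copied nodes in `visited`
// so each shared sub-diagram is copied exactly once.
inline int recurCopy(int cur, const std::vector<Node>& src,
                     std::vector<Node>& dst,
                     std::unordered_map<int, int>& visited) {
  auto it = visited.find(cur);
  if (it != visited.end()) return it->second;   // already copied: reuse it

  std::vector<int> sonsMap;                     // plays the role of sonsMap
  for (int son : src[cur].sons)
    sonsMap.push_back(recurCopy(son, src, dst, visited));

  dst.push_back({sonsMap});
  int nody = static_cast<int>(dst.size()) - 1;
  visited.emplace(cur, nody);
  return nody;
}
```

On a diamond-shaped DAG (two paths to the same leaf), the copy contains four nodes, not five: the shared leaf is copied once and reused.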

◆ __recurExtractOptPol()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::__recurExtractOptPol ( NodeId  currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  argMaxOptVFunc,
HashTable< NodeId, NodeId > &  visitedNodes 
)
private

Recursion part for the optimal policy extraction.

Definition at line 589 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  if (visitedNodes.exists(currentNodeId)) return visitedNodes[currentNodeId];

  NodeId nody;
  if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
    ActionSet leaf;
    __transferActionIds(argMaxOptVFunc->nodeValue(currentNodeId), leaf);
    nody = _optimalPolicy->manager()->addTerminalNode(leaf);
  } else {
    const InternalNode* currentNode = argMaxOptVFunc->node(currentNodeId);
    NodeId* sonsMap = static_cast< NodeId* >(
       SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
    for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
      sonsMap[moda] = __recurExtractOptPol(
         currentNode->son(moda), argMaxOptVFunc, visitedNodes);
    nody = _optimalPolicy->manager()->addInternalNode(currentNode->nodeVar(),
                                                      sonsMap);
  }
  visitedNodes.insert(currentNodeId, nody);
  return nody;
}

◆ __transferActionIds()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::__transferActionIds ( const ArgMaxSet< GUM_SCALAR, Idx > &  src,
ActionSet dest 
)
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 619 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
    dest += *idi;
}

◆ _addReward()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_addReward ( MultiDimFunctionGraph< GUM_SCALAR > *  function,
Idx  actionId = 0 
)
protectedvirtual

Performs the R(s) + γ·function operation.

Warning
The input function is deleted; a new one is returned.

Definition at line 408 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // ... we multiply the result by the discount factor, ...
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->_discountFactor);
  delete Vold;

  // ... and finally add the reward
  newVFunction = _operator->add(newVFunction, RECAST(_fmdp->reward(actionId)));

  return newVFunction;
}

◆ _argmaximiseQactions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::_argmaximiseQactions ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 540 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _evalPolicy()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::_evalPolicy ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 437 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >,
                                      SetTerminalNodePolicy >* >
     argMaxQActionsSet;

  // For each action ...
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction =
       this->_evalQaction(newVFunction, *actionIter);

    qAction = this->_addReward(qAction);

    argMaxQActionsSet.push_back(_makeArgMax(qAction, *actionIter));
  }
  delete newVFunction;

  // To evaluate the main value function, we maximise over all action
  // values, ...
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     argMaxVFunction = _argmaximiseQactions(argMaxQActionsSet);

  // ... then we extract the optimal policy from the result
  _extractOptimalPolicy(argMaxVFunction);
}

◆ _evalQaction()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_evalQaction ( const MultiDimFunctionGraph< GUM_SCALAR > *  Vold,
Idx  actionId 
)
protectedvirtual

Performs the P(s'|s,a)·V^{t-1}(s') part of the value iteration.

Definition at line 353 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Initialisation:
  // Create a copy of the last Vfunction from which to deduce the new Qaction,
  // and find the first variable to eliminate (the one at the end)

  return _operator->regress(Vold, actionId, this->_fmdp, this->_elVarSeq);
}

◆ _extractOptimalPolicy()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::_extractOptimalPolicy ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protected

From V*(s) = argmax_a Q*(s,a), this function extracts π*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 564 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  _optimalPolicy->clear();

  // Insert the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    _optimalPolicy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  _optimalPolicy->manager()->setRootNode(__recurExtractOptPol(
     argMaxOptimalValueFunction->root(), argMaxOptimalValueFunction, src2dest));

  delete argMaxOptimalValueFunction;
}

◆ _initVFunction()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::_initVFunction ( )
protectedvirtual

Initializes the value function with the reward function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 298 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  _vFunction->copy(*(RECAST(_fmdp->reward())));
}

◆ _makeArgMax()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::_makeArgMax ( const MultiDimFunctionGraph< GUM_SCALAR > *  Qaction,
Idx  actionId 
)
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 482 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     amcpy = _operator->getArgMaxFunctionInstance();

  // Insert the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
     __recurArgMaxCopy(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ _maximiseQactions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_maximiseQactions ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs max_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 369 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _minimiseFunctions()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_minimiseFunctions ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &  qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Performs also the deallocation of the F_i

Definition at line 389 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _valueIteration()

template<typename GUM_SCALAR >
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::_valueIteration ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 316 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{
  // Loop reset
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  // For each action ...
  std::vector< MultiDimFunctionGraph< GUM_SCALAR >* > qActionsSet;
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction =
       this->_evalQaction(newVFunction, *actionIter);
    qActionsSet.push_back(qAction);
  }
  delete newVFunction;

  // To evaluate the main value function, we maximise over all action
  // values, ...
  newVFunction = this->_maximiseQactions(qActionsSet);

  // ... then we add the reward to get the new value function
  newVFunction = this->_addReward(newVFunction);

  return newVFunction;
}

◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const ptr on the Factored Markov Decision Process on which we're planning.

Definition at line 137 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

{ return _fmdp; }

◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Failing to call this method before the first call to makePlanning will result in a crash.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 229 of file structuredPlaner_tpl.h.

Referenced by gum::AdaptiveRMaxPlaner::initialize(), and gum::StructuredPlaner< double >::optimalPolicySize().

{
  _fmdp = fmdp;

  // Determination of the threshold value
  // ...

  // Establishment of the variable elimination sequence
  for (auto varIter = _fmdp->beginVariables(); varIter != _fmdp->endVariables();
       ++varIter)
    _elVarSeq << _fmdp->main2prime(*varIter);

  // Initialisation of the value function
  _vFunction = _operator->getFunctionInstance();
  _optimalPolicy = _operator->getAggregatorInstance();
  __firstTime = true;
}

◆ makePlanning()

template<typename GUM_SCALAR >
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: lets you specify how many value iterations to perform at most. makePlanning stops either when the optimal value function has been reached or once nbStep iterations have been performed.
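The stopping test compares the sup-norm gap between successive value functions with the internal threshold. This relies on the standard contraction bound for value iteration (stated here as background, not taken from the source):

```latex
\|V^{n+1} - V^{*}\|_{\infty} \;\le\; \frac{\gamma}{1-\gamma}\,\|V^{n+1} - V^{n}\|_{\infty}
```

so once the gap is small, the current value function is provably close to the optimal one.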

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 251 of file structuredPlaner_tpl.h.

Referenced by gum::AdaptiveRMaxPlaner::makePlanning(), and gum::StructuredPlaner< double >::optimalPolicySize().

{
  if (__firstTime) {
    this->_initVFunction();
    __firstTime = false;
  }

  // Main loop
  Idx nbIte = 0;
  GUM_SCALAR gap = __threshold + 1;
  while ((gap > __threshold) && (nbIte < nbStep)) {
    ++nbIte;

    MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = this->_valueIteration();

    // Then we compare the new value function with the old one
    MultiDimFunctionGraph< GUM_SCALAR >* deltaV =
       _operator->subtract(newVFunction, _vFunction);
    gap = 0;

    for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
      if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
    delete deltaV;

    if (_verbose)
      std::cout << " ------------------- Fin itération n° " << nbIte << std::endl
                << " Gap : " << gap << " - " << __threshold << std::endl;

    // And eventually we update the pointers for the next loop
    delete _vFunction;
    _vFunction = newVFunction;
  }

  // Search for the policy matching the value function
  this->_evalPolicy();
}
void nextValue() const
Increments the constant safe iterator.
void beginValues() const
Initializes the constant safe iterator on terminal nodes.
virtual void _evalPolicy()
Perform the required tasks to extract an optimal policy.
IOperatorStrategy< GUM_SCALAR > * _operator
bool _verbose
Boolean indicating whether iteration information should be displayed on the terminal.
virtual MultiDimFunctionGraph< GUM_SCALAR > * _valueIteration()
Performs a single step of value iteration.
virtual void _initVFunction()
Initializes the value function (performed once, before the first value iteration).
GUM_SCALAR __threshold
The threshold value. Whenever | V^{n} - V^{n+1} | < threshold, we consider that V ~ V*.
bool hasValue() const
Indicates if constant safe iterator has reach end of terminal nodes list.
const GUM_SCALAR & value() const
Returns the value of the current terminal nodes pointed by the constant safe iterator.
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.
+ Here is the caller graph for this function:

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 157 of file structuredPlaner.h.

157  {
158  return _optimalPolicy;
159  }
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.

◆ optimalPolicy2String()

template<typename GUM_SCALAR >
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 105 of file structuredPlaner_tpl.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicySize().

105  {
106  // ************************************************************************
107  // Discarding the case where no \pi* has been computed yet
108  if (!_optimalPolicy || _optimalPolicy->root() == 0)
109  return "NO OPTIMAL POLICY CALCULATED YET";
110 
111  // ************************************************************************
112  // Initialisation
113 
114  // Declaration of the needed string stream
115  std::stringstream output;
116  std::stringstream terminalStream;
117  std::stringstream nonTerminalStream;
118  std::stringstream arcstream;
119 
120  // First line for the toDot
121  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;
122 
123  // Opening lines for the internal node stream and the terminal node stream
124  terminalStream << "node [shape = box];" << std::endl;
125  nonTerminalStream << "node [shape = ellipse];" << std::endl;
126 
127  // For some clarity in the final string
128  std::string tab = "\t";
129 
130  // To know if we already checked a node or not
131  Set< NodeId > visited;
132 
133  // FIFO of nodes to visit
134  std::queue< NodeId > fifo;
135 
136  // Loading the FIFO
137  fifo.push(_optimalPolicy->root());
138  visited << _optimalPolicy->root();
139 
140 
141  // ************************************************************************
142  // Main loop
143  while (!fifo.empty()) {
144  // Node to visit
145  NodeId currentNodeId = fifo.front();
146  fifo.pop();
147 
148  // Checking if it is terminal
149  if (_optimalPolicy->isTerminalNode(currentNodeId)) {
150  // Get back the associated ActionSet
151  ActionSet ase = _optimalPolicy->nodeValue(currentNodeId);
152 
153  // Creating a line for this node
154  terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
155  << " [label=\"" << currentNodeId << " - ";
156 
157  // Enumerating and adding to the line the associated optimal actions
158  for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
159  valIter != ase.endSafe();
160  ++valIter)
161  terminalStream << _fmdp->actionName(*valIter) << " ";
162 
163  // Terminating line
164  terminalStream << "\"];" << std::endl;
165  continue;
166  }
167 
168  // Otherwise
169  {
170  // Getting back the associated internal node
171  const InternalNode* currentNode = _optimalPolicy->node(currentNodeId);
172 
173  // Creating a line in internalnode stream for this node
174  nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
175  << " [label=\"" << currentNodeId << " - "
176  << currentNode->nodeVar()->name() << "\"];" << std::endl;
177 
178  // Going through the sons and aggregating them according to their ids
179  HashTable< NodeId, LinkedList< Idx >* > sonMap;
180  for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
181  if (!visited.exists(currentNode->son(sonIter))) {
182  fifo.push(currentNode->son(sonIter));
183  visited << currentNode->son(sonIter);
184  }
185  if (!sonMap.exists(currentNode->son(sonIter)))
186  sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
187  sonMap[currentNode->son(sonIter)]->addLink(sonIter);
188  }
189 
190  // Adding to the arc stream
191  for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
192  ++sonIter) {
193  arcstream << tab << currentNodeId << " -> " << sonIter.key()
194  << " [label=\" ";
195  Link< Idx >* modaIter = sonIter.val()->list();
196  while (modaIter) {
197  arcstream << currentNode->nodeVar()->label(modaIter->element());
198  if (modaIter->nextLink()) arcstream << ", ";
199  modaIter = modaIter->nextLink();
200  }
201  arcstream << "\",color=\"#00ff00\"];" << std::endl;
202  delete sonIter.val();
203  }
204  }
205  }
206 
207  // Terminating
208  output << terminalStream.str() << std::endl
209  << nonTerminalStream.str() << std::endl
210  << arcstream.str() << std::endl
211  << "}" << std::endl;
212 
213  return output.str();
214  }
const FMDP< GUM_SCALAR > * _fmdp
The Factored Markov Decision Process describing our planning situation (NB : this one must have funct...
bool exists(const Key &k) const
Indicates whether a given element belongs to the set.
Definition: set_tpl.h:607
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.
Size NodeId
Type for node ids.
Definition: graphElements.h:98
void insert(const Key &k)
Inserts a new element into the set.
Definition: set_tpl.h:613
+ Here is the caller graph for this function:

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 164 of file structuredPlaner.h.

164  {
165  return _optimalPolicy != nullptr ? _optimalPolicy->realSize() : 0;
166  }
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.

◆ spumddInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 80 of file structuredPlaner.h.

Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().

82  {
83  return new StructuredPlaner< GUM_SCALAR >(
84  new MDDOperatorStrategy< GUM_SCALAR >(),
85  discountFactor,
86  epsilon,
87  verbose);
88  }
+ Here is the caller graph for this function:

◆ sviInstance()

template<typename GUM_SCALAR>
static StructuredPlaner< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR  discountFactor = 0.9,
GUM_SCALAR  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 94 of file structuredPlaner.h.

Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().

96  {
97  return new StructuredPlaner< GUM_SCALAR >(
98  new TreeOperatorStrategy< GUM_SCALAR >(),
99  discountFactor,
100  epsilon,
101  verbose);
102  }
+ Here is the caller graph for this function:

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 142 of file structuredPlaner.h.

142  {
143  return _vFunction;
144  }
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 149 of file structuredPlaner.h.

149  {
150  return _vFunction != nullptr ? _vFunction->realSize() : 0;
151  }
virtual Size realSize() const
Returns the real number of parameters used for this table.
MultiDimFunctionGraph< GUM_SCALAR > * _vFunction
The Value Function computed iteratively.

Member Data Documentation

◆ __firstTime

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::__firstTime
private

Definition at line 380 of file structuredPlaner.h.

◆ __threshold

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::__threshold
private

The threshold value. Whenever | V^{n} - V^{n+1} | < threshold, we consider that V ~ V*.

Definition at line 379 of file structuredPlaner.h.
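In symbols, using the notation of the description above, the stopping rule reads:

```latex
\max_{s}\,\bigl|\,V^{n+1}(s) - V^{n}(s)\,\bigr| \;<\; \text{threshold}
\quad\Longrightarrow\quad V^{n} \approx V^{*}
```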

◆ _discountFactor

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::_discountFactor
protected

Discount Factor used for infinite horizon planning.

Definition at line 363 of file structuredPlaner.h.
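For reference, the discount factor enters the textbook value-iteration update (standard MDP form, not taken from this page):

```latex
V^{n+1}(s) \;=\; \max_{a}\Bigl[\, R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V^{n}(s') \Bigr]
```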

◆ _elVarSeq

template<typename GUM_SCALAR>
Set< const DiscreteVariable* > gum::StructuredPlaner< GUM_SCALAR >::_elVarSeq
protected

A set used to eliminate primed variables.

Definition at line 358 of file structuredPlaner.h.

◆ _fmdp

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_fmdp
protected

The Factored Markov Decision Process describing our planning situation (NB : this one must have function graph as transitions and reward functions )

Definition at line 338 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::fmdp().

◆ _operator

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_operator
protected

Definition at line 365 of file structuredPlaner.h.

◆ _optimalPolicy

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy
protected

The associated optimal policy.

Warning
Leaves are ActionSets containing the ids of the best actions. While this is sufficient to be exploited, some translation from the _fmdp is required for a human to understand it; optimalPolicy2String does this job.

Definition at line 353 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::optimalPolicy(), and gum::StructuredPlaner< double >::optimalPolicySize().

◆ _verbose

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::_verbose
protected

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 371 of file structuredPlaner.h.

◆ _vFunction

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::_vFunction
protected

The Value Function computed iteratively.

Definition at line 343 of file structuredPlaner.h.

Referenced by gum::StructuredPlaner< double >::vFunction(), and gum::StructuredPlaner< double >::vFunctionSize().


The documentation for this class was generated from the following files: