aGrUM  0.20.2
a C++ library for (probabilistic) graphical models
gum::AdaptiveRMaxPlaner Class Reference

<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...

#include <adaptiveRMaxPlaner.h>


Public Member Functions

Planning Methods
void initialize (const FMDP< double > *fmdp)
 Initializes the data structures needed for planning. More...
 
void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 
Data structure access methods
INLINE const FMDP< double > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning. More...
 
INLINE const MultiDimFunctionGraph< double > * vFunction ()
 Returns a const pointer to the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids. More...
 

Static Public Member Functions

static AdaptiveRMaxPlaner * ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static AdaptiveRMaxPlaner * TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< double > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must use function graphs for its transition and reward functions). More...
 
MultiDimFunctionGraph< double > * vFunction_
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy. More...
 
Set< const DiscreteVariable * > elVarSeq_
 A set used to eliminate primed variables. More...
 
double discountFactor_
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< double > * operator_
 
bool verbose_
 Boolean indicating whether iteration information should be displayed in the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< double > * valueIteration_ ()
 Performs a single step of value iteration. More...
 
Optimal policy extraction methods
virtual void evalPolicy_ ()
 Perform the required tasks to extract an optimal policy. More...
 
Value Iteration Methods
virtual MultiDimFunctionGraph< double > * evalQaction_ (const MultiDimFunctionGraph< double > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< double > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< double > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< double > * addReward_ (MultiDimFunctionGraph< double > *function, Idx actionId=0)
 Computes R(s) + gamma * function. More...
 
Optimal policy extraction methods
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< double > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *> &)
 Performs argmax_a Q(s,a) More...
 
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 

Constructor & destructor.

 AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
 Default constructor. More...
 
 ~AdaptiveRMaxPlaner ()
 Default destructor. More...
 

Incremental methods

HashTable< Idx, StatesCounter *> counterTable__
 
HashTable< Idx, bool > initializedTable__
 
bool initialized__
 
void checkState (const Instantiation &newState, Idx actionId)
 

Incremental methods

void setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol)
 
virtual ActionSet stateOptimalPolicy (const Instantiation &curState)
 
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optPol_
 
ActionSet allActions_
 

Detailed Description

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

A class to find optimal policy for a given FMDP.

Performs RMax planning on the factored Markov decision process given as a parameter.

Definition at line 53 of file adaptiveRMaxPlaner.h.
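A minimal usage sketch (assumptions, not part of the API shown on this page: `learner` points to some ILearningStrategy implementation, e.g. the learner driving an SDYNA agent, and `fmdp` points to an FMDP< double > whose transition and reward tables are function graphs):

// `learner` and `fmdp` are assumed to exist; see the factory and
// initialize() documentation below for their exact types.
gum::AdaptiveRMaxPlaner* planner
   = gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance(learner,
                                                        0.9,      // discount factor
                                                        0.00001,  // epsilon
                                                        true);    // verbose
planner->initialize(fmdp);     // mandatory before the first makePlanning()
planner->makePlanning(10000);  // at most 10000 value-iteration steps
std::cout << planner->optimalPolicy2String() << std::endl;
delete planner;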

Constructor & Destructor Documentation

◆ AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::AdaptiveRMaxPlaner ( IOperatorStrategy< double > *  opi,
double  discountFactor,
double  epsilon,
const ILearningStrategy *  learner,
bool  verbose 
)
private

Default constructor.

Definition at line 63 of file adaptiveRMaxPlaner.cpp.


    :
    StructuredPlaner(opi, discountFactor, epsilon, verbose),
    IDecisionStrategy(), fmdpLearner__(learner), initialized__(false) {
  GUM_CONSTRUCTOR(AdaptiveRMaxPlaner);
}

◆ ~AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner ( )

Default destructor.

Definition at line 76 of file adaptiveRMaxPlaner.cpp.


{
  GUM_DESTRUCTOR(AdaptiveRMaxPlaner);

  for (HashTableIteratorSafe< Idx, StatesCounter* > scIter
       = counterTable__.beginSafe();
       scIter != counterTable__.endSafe();
       ++scIter)
    delete scIter.val();
}

Member Function Documentation

◆ addReward_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::addReward_ ( MultiDimFunctionGraph< double > *  function,
Idx  actionId = 0 
)
protectedvirtualinherited

Computes R(s) + gamma * function.

Warning
the input function is deleted; a new one is returned
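In Bellman-backup terms, with f the regressed function returned by evalQaction_, this step computes leaf-wise on the function graph (a sketch):

$$Q_a(s) \;=\; R(s,a) + \gamma \, f(s)$$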

Definition at line 409 of file structuredPlaner_tpl.h.


{
  // *****************************************************************************************
  // ... we multiply the regressed function by the discount factor, ...
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction
     = operator_->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
  delete Vold;

  // *****************************************************************************************
  // ... and finally add the reward
  newVFunction = operator_->add(newVFunction, RECAST(fmdp_->reward(actionId)));

  return newVFunction;
}

◆ argmaximiseQactions_()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtualinherited

Performs argmax_a Q(s,a)

Warning
Also performs the deallocation of the QActions.

Definition at line 548 of file structuredPlaner_tpl.h.


{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     newVFunction
     = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       qAction
       = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ checkState()

void gum::AdaptiveRMaxPlaner::checkState ( const Instantiation newState,
Idx  actionId 
)
inlinevirtual

Implements gum::IDecisionStrategy.

Definition at line 201 of file adaptiveRMaxPlaner.h.

{
  if (!initializedTable__[actionId]) {
    counterTable__[actionId]->reset(newState);
    initializedTable__[actionId] = true;
  } else
    counterTable__[actionId]->incState(newState);
}

◆ clearTables__()

void gum::AdaptiveRMaxPlaner::clearTables__ ( )
private

Definition at line 350 of file adaptiveRMaxPlaner.cpp.


{
  for (auto actionIter = this->fmdp()->beginActions();
       actionIter != this->fmdp()->endActions();
       ++actionIter) {
    delete actionsBoolTable__[*actionIter];
    delete actionsRMaxTable__[*actionIter];
  }
  actionsRMaxTable__.clear();
  actionsBoolTable__.clear();
}

◆ evalPolicy_()

void gum::AdaptiveRMaxPlaner::evalPolicy_ ( )
protectedvirtual

Performs the required tasks to extract an optimal policy.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 195 of file adaptiveRMaxPlaner.cpp.


{
  // *****************************************************************************************
  // Loop reset
  MultiDimFunctionGraph< double >* newVFunction
     = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  std::vector<
     MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* >
     argMaxQActionsSet;
  // *****************************************************************************************
  // For each action
  for (auto actionIter = fmdp_->beginActions();
       actionIter != fmdp_->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< double >* qAction
       = this->evalQaction_(newVFunction, *actionIter);

    qAction = this->addReward_(qAction, *actionIter);

    qAction = this->operator_->maximize(
       actionsRMaxTable__[*actionIter],
       this->operator_->multiply(qAction, actionsBoolTable__[*actionIter], 1),
       2);

    argMaxQActionsSet.push_back(makeArgMax_(qAction, *actionIter));
  }
  delete newVFunction;

  // *****************************************************************************************
  // Next, to evaluate the main value function, we maximise over all action
  // values, ...
  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
     argMaxVFunction
     = argmaximiseQactions_(argMaxQActionsSet);

  // *****************************************************************************************
  // ... and from this argmax value function we extract the optimal policy
  extractOptimalPolicy_(argMaxVFunction);
}

◆ evalQaction_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::evalQaction_ ( const MultiDimFunctionGraph< double > *  Vold,
Idx  actionId 
)
protectedvirtualinherited

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
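Written out (a sketch; in the factored setting the sum is computed by regressing variable by variable through the function graphs):

$$f_a(s) \;=\; \sum_{s'} P(s' \mid s, a)\, V^{t-1}(s')$$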

Definition at line 353 of file structuredPlaner_tpl.h.


{
  // ******************************************************************************
  // Initialisation:
  // Creating a copy of the last Vfunction to deduce the new Qaction from,
  // and finding the first var to eliminate (the one at the end)

  return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
}

◆ extractOptimalPolicy_()

void gum::StructuredPlaner< double >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protectedinherited

From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
deallocates the argmax optimal value function
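Schematically, the extraction implements:

$$\pi^*(s) \;=\; \{\, a \;:\; a \in \operatorname{arg\,max}_{a'} Q^*(s, a') \,\}$$

with ties kept together in an ActionSet at each leaf rather than broken arbitrarily.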

Definition at line 574 of file structuredPlaner_tpl.h.


{
  optimalPolicy_->clear();

  // Inserting the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    optimalPolicy_->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  optimalPolicy_->manager()->setRootNode(
     recurExtractOptPol__(argMaxOptimalValueFunction->root(),
                          argMaxOptimalValueFunction,
                          src2dest));

  delete argMaxOptimalValueFunction;
}

◆ fmdp()

INLINE const FMDP< double >* gum::StructuredPlaner< double >::fmdp ( )
inlineinherited

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 137 of file structuredPlaner.h.


{ return fmdp_; }

◆ initialize()

void gum::AdaptiveRMaxPlaner::initialize ( const FMDP< double > *  fmdp)
virtual

Initializes the data structures needed for planning.

Warning
Not calling this method before the first call to makePlanning() will result in a crash.

Reimplemented from gum::IDecisionStrategy.

Definition at line 97 of file adaptiveRMaxPlaner.cpp.


{
  if (!initialized__) {
    StructuredPlaner< double >::initialize(fmdp);
    IDecisionStrategy::initialize(fmdp);
    for (auto actionIter = fmdp->beginActions();
         actionIter != fmdp->endActions();
         ++actionIter) {
      counterTable__.insert(*actionIter, new StatesCounter());
      initializedTable__.insert(*actionIter, false);
    }
    initialized__ = true;
  }
}

◆ initVFunction_()

void gum::AdaptiveRMaxPlaner::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 133 of file adaptiveRMaxPlaner.cpp.


{
  vFunction_->manager()->setRootNode(
     vFunction_->manager()->addTerminalNode(0.0));
  for (auto actionIter = fmdp_->beginActions();
       actionIter != fmdp_->endActions();
       ++actionIter)
    vFunction_ = this->operator_->add(vFunction_,
                                      RECASTED(this->fmdp_->reward(*actionIter)),
                                      1);
}

◆ makeArgMax_()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::makeArgMax_ ( const MultiDimFunctionGraph< double > *  Qaction,
Idx  actionId 
)
protectedinherited

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action Id associated to that graph
Warning
deletes the original Qaction and returns its conversion

Definition at line 485 of file structuredPlaner_tpl.h.


{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     amcpy
     = operator_->getArgMaxFunctionInstance();

  // Inserting the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
     recurArgMaxCopy__(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ makePlanning()

void gum::AdaptiveRMaxPlaner::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: specifies how many value-iteration steps to perform at most. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 114 of file adaptiveRMaxPlaner.cpp.


{
  makeRMaxFunctionGraphs__();

  StructuredPlaner< double >::makePlanning(nbStep);

  clearTables__();
}

◆ makeRMaxFunctionGraphs__()

void gum::AdaptiveRMaxPlaner::makeRMaxFunctionGraphs__ ( )
private

Definition at line 240 of file adaptiveRMaxPlaner.cpp.


{
  rThreshold__
     = fmdpLearner__->modaMax() * 5 > 30 ? fmdpLearner__->modaMax() * 5 : 30;
  rmax__ = fmdpLearner__->rMax() / (1.0 - this->discountFactor_);

  for (auto actionIter = this->fmdp()->beginActions();
       actionIter != this->fmdp()->endActions();
       ++actionIter) {
    std::vector< MultiDimFunctionGraph< double >* > rmaxs;
    std::vector< MultiDimFunctionGraph< double >* > boolQs;

    for (auto varIter = this->fmdp()->beginVariables();
         varIter != this->fmdp()->endVariables();
         ++varIter) {
      const IVisitableGraphLearner* visited = counterTable__[*actionIter];

      MultiDimFunctionGraph< double >* varRMax
         = this->operator_->getFunctionInstance();
      MultiDimFunctionGraph< double >* varBoolQ
         = this->operator_->getFunctionInstance();

      visited->insertSetOfVars(varRMax);
      visited->insertSetOfVars(varBoolQ);

      std::pair< NodeId, NodeId > rooty
         = visitLearner__(visited, visited->root(), varRMax, varBoolQ);
      varRMax->manager()->setRootNode(rooty.first);
      varRMax->manager()->reduce();
      varRMax->manager()->clean();
      varBoolQ->manager()->setRootNode(rooty.second);
      varBoolQ->manager()->reduce();
      varBoolQ->manager()->clean();

      rmaxs.push_back(varRMax);
      boolQs.push_back(varBoolQ);
    }

    actionsRMaxTable__.insert(*actionIter, this->maximiseQactions_(rmaxs));
    actionsBoolTable__.insert(*actionIter, this->minimiseFunctions_(boolQs));
  }
}
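The two constants computed at the top of this routine drive the exploration bonus (read directly off the code above; this is not a formula stated elsewhere in the documentation):

$$\text{rThreshold\_\_} = \max(5 \cdot \text{modaMax()},\ 30), \qquad \text{rmax\_\_} = \frac{\text{rMax()}}{1 - \gamma}$$

A state counted fewer than rThreshold__ times under an action is treated as unknown and credited the optimistic value rmax__ (see visitLearner__ below).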

◆ maximiseQactions_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs max_a Q(s,a)

Warning
Also performs the deallocation of the QActions.

Definition at line 370 of file structuredPlaner_tpl.h.


{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ minimiseFunctions_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i.

Definition at line 390 of file structuredPlaner_tpl.h.


{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ optimalPolicy()

INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy ( )
inlinevirtualinherited

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 157 of file structuredPlaner.h.


{
  return optimalPolicy_;
}

◆ optimalPolicy2String()

std::string gum::StructuredPlaner< double >::optimalPolicy2String ( )
virtualinherited

Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids.

Implements gum::IPlanningStrategy< double >.

Definition at line 105 of file structuredPlaner_tpl.h.


{
  // ************************************************************************
  // Discarding the case where no \pi* has been computed
  if (!optimalPolicy_ || optimalPolicy_->root() == 0)
    return "NO OPTIMAL POLICY CALCULATED YET";

  // ************************************************************************
  // Initialisation

  // Declaration of the needed string streams
  std::stringstream output;
  std::stringstream terminalStream;
  std::stringstream nonTerminalStream;
  std::stringstream arcstream;

  // First line for the toDot
  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

  // Form lines for the internal node stream and the terminal node stream
  terminalStream << "node [shape = box];" << std::endl;
  nonTerminalStream << "node [shape = ellipse];" << std::endl;

  // For some clarity in the final string
  std::string tab = "\t";

  // To know if we already checked a node or not
  Set< NodeId > visited;

  // FIFO of nodes to visit
  std::queue< NodeId > fifo;

  // Loading the FIFO
  fifo.push(optimalPolicy_->root());
  visited << optimalPolicy_->root();


  // ************************************************************************
  // Main loop
  while (!fifo.empty()) {
    // Node to visit
    NodeId currentNodeId = fifo.front();
    fifo.pop();

    // Checking if it is terminal
    if (optimalPolicy_->isTerminalNode(currentNodeId)) {
      // Get back the associated ActionSet
      ActionSet ase = optimalPolicy_->nodeValue(currentNodeId);

      // Creating a line for this node
      terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                     << " [label=\"" << currentNodeId << " - ";

      // Enumerating and adding to the line the associated optimal actions
      for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
           valIter != ase.endSafe();
           ++valIter)
        terminalStream << fmdp_->actionName(*valIter) << " ";

      // Terminating line
      terminalStream << "\"];" << std::endl;
      continue;
    }

    // Otherwise
    {
      // Getting back the associated internal node
      const InternalNode* currentNode = optimalPolicy_->node(currentNodeId);

      // Creating a line in the internal node stream for this node
      nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                        << " [label=\"" << currentNodeId << " - "
                        << currentNode->nodeVar()->name() << "\"];" << std::endl;

      // Going through the sons and aggregating them according to the son ids
      HashTable< NodeId, LinkedList< Idx >* > sonMap;
      for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
        if (!visited.exists(currentNode->son(sonIter))) {
          fifo.push(currentNode->son(sonIter));
          visited << currentNode->son(sonIter);
        }
        if (!sonMap.exists(currentNode->son(sonIter)))
          sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
        sonMap[currentNode->son(sonIter)]->addLink(sonIter);
      }

      // Adding to the arc stream
      for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
           ++sonIter) {
        arcstream << tab << currentNodeId << " -> " << sonIter.key()
                  << " [label=\" ";
        Link< Idx >* modaIter = sonIter.val()->list();
        while (modaIter) {
          arcstream << currentNode->nodeVar()->label(modaIter->element());
          if (modaIter->nextLink()) arcstream << ", ";
          modaIter = modaIter->nextLink();
        }
        arcstream << "\",color=\"#00ff00\"];" << std::endl;
        delete sonIter.val();
      }
    }
  }

  // Terminating
  output << terminalStream.str() << std::endl
         << nonTerminalStream.str() << std::endl
         << arcstream.str() << std::endl
         << "}" << std::endl;

  return output.str();
}
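A typical use of the returned string (a sketch; the planner pointer and the output path are assumptions) is to dump it to a file and render it with Graphviz:

std::ofstream dotFile("optimalPolicy.dot");  // hypothetical output path
dotFile << planner->optimalPolicy2String();
dotFile.close();  // then e.g.: dot -Tpng optimalPolicy.dot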

◆ optimalPolicySize()

virtual Size gum::StructuredPlaner< double >::optimalPolicySize ( )
inlinevirtualinherited

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 164 of file structuredPlaner.h.


{
  return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
}

◆ ReducedAndOrderedInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 65 of file adaptiveRMaxPlaner.h.

{
  return new AdaptiveRMaxPlaner(new MDDOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ setOptimalStrategy()

void gum::IDecisionStrategy::setOptimalStrategy ( const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *  optPol)
inlineinherited

Definition at line 90 of file IDecisionStrategy.h.

{
  optPol_ = const_cast<
     MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* >(optPol);
}

◆ spumddInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::spumddInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 80 of file structuredPlaner.h.

{
  return new StructuredPlaner< GUM_SCALAR >(
     new MDDOperatorStrategy< GUM_SCALAR >(),
     discountFactor,
     epsilon,
     verbose);
}

◆ stateOptimalPolicy()

virtual ActionSet gum::IDecisionStrategy::stateOptimalPolicy ( const Instantiation curState)
inlinevirtualinherited

Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.

Definition at line 96 of file IDecisionStrategy.h.

{
  return (optPol_ && optPol_->realSize() != 0) ? optPol_->get(curState)
                                               : allActions_;
}

◆ sviInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::sviInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 94 of file structuredPlaner.h.

{
  return new StructuredPlaner< GUM_SCALAR >(
     new TreeOperatorStrategy< GUM_SCALAR >(),
     discountFactor,
     epsilon,
     verbose);
}

◆ TreeInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::TreeInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 79 of file adaptiveRMaxPlaner.h.

{
  return new AdaptiveRMaxPlaner(new TreeOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ valueIteration_()

MultiDimFunctionGraph< double > * gum::AdaptiveRMaxPlaner::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 147 of file adaptiveRMaxPlaner.cpp.


{
  // *****************************************************************************************
  // Loop reset
  MultiDimFunctionGraph< double >* newVFunction
     = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  // *****************************************************************************************
  // For each action
  std::vector< MultiDimFunctionGraph< double >* > qActionsSet;
  for (auto actionIter = fmdp_->beginActions();
       actionIter != fmdp_->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< double >* qAction
       = evalQaction_(newVFunction, *actionIter);

    // *******************************************************************************************
    // Next, we add the reward
    qAction = addReward_(qAction, *actionIter);

    qAction = this->operator_->maximize(
       actionsRMaxTable__[*actionIter],
       this->operator_->multiply(qAction, actionsBoolTable__[*actionIter], 1),
       2);

    qActionsSet.push_back(qAction);
  }
  delete newVFunction;

  // *****************************************************************************************
  // Next, to evaluate the main value function, we maximise over all action
  // values, ...
  newVFunction = maximiseQactions_(qActionsSet);

  return newVFunction;
}
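Reading the body above: each Q-function is masked by the 0/1 "known state" indicator b_a (stored in actionsBoolTable__) and clamped from below by the RMax graph (actionsRMaxTable__), so one step computes, in sketch form:

$$V^{t}(s) \;=\; \max_a \, \max\!\Big( \mathrm{RMax}_a(s),\; b_a(s) \cdot \big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{t-1}(s') \big] \Big)$$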

◆ vFunction()

INLINE const MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction ( )
inlineinherited

Returns a const pointer to the value function computed so far.

Definition at line 142 of file structuredPlaner.h.


{
  return vFunction_;
}

◆ vFunctionSize()

virtual Size gum::StructuredPlaner< double >::vFunctionSize ( )
inlinevirtualinherited

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 149 of file structuredPlaner.h.


{
  return vFunction_ != nullptr ? vFunction_->realSize() : 0;
}

◆ visitLearner__()

std::pair< NodeId, NodeId > gum::AdaptiveRMaxPlaner::visitLearner__ ( const IVisitableGraphLearner visited,
NodeId  currentNodeId,
MultiDimFunctionGraph< double > *  rmax,
MultiDimFunctionGraph< double > *  boolQ 
)
private

Definition at line 311 of file adaptiveRMaxPlaner.cpp.


{
  std::pair< NodeId, NodeId > rep;
  if (visited->isTerminal(currentNodeId)) {
    rep.first = rmax->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < rThreshold__ ? rmax__ : 0.0);
    rep.second = boolQ->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < rThreshold__ ? 0.0 : 1.0);
    return rep;
  }

  NodeId* rmaxsons = static_cast< NodeId* >(SOA_ALLOCATE(
     sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
  NodeId* bqsons = static_cast< NodeId* >(SOA_ALLOCATE(
     sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));

  for (Idx moda = 0; moda < visited->nodeVar(currentNodeId)->domainSize();
       ++moda) {
    std::pair< NodeId, NodeId > sonp
       = visitLearner__(visited,
                        visited->nodeSon(currentNodeId, moda),
                        rmax,
                        boolQ);
    rmaxsons[moda] = sonp.first;
    bqsons[moda] = sonp.second;
  }

  rep.first = rmax->manager()->addInternalNode(visited->nodeVar(currentNodeId),
                                               rmaxsons);
  rep.second = boolQ->manager()->addInternalNode(visited->nodeVar(currentNodeId),
                                                 bqsons);
  return rep;
}

Member Data Documentation

◆ actionsBoolTable__

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::actionsBoolTable__
private

Definition at line 189 of file adaptiveRMaxPlaner.h.

◆ actionsRMaxTable__

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::actionsRMaxTable__
private

Definition at line 188 of file adaptiveRMaxPlaner.h.

◆ allActions_

ActionSet gum::IDecisionStrategy::allActions_
protectedinherited

Definition at line 106 of file IDecisionStrategy.h.

◆ counterTable__

HashTable< Idx, StatesCounter* > gum::AdaptiveRMaxPlaner::counterTable__
private

Definition at line 210 of file adaptiveRMaxPlaner.h.

◆ discountFactor_

double gum::StructuredPlaner< double >::discountFactor_
protectedinherited

Discount Factor used for infinite horizon planning.

Definition at line 363 of file structuredPlaner.h.

◆ elVarSeq_

Set< const DiscreteVariable* > gum::StructuredPlaner< double >::elVarSeq_
protectedinherited

A set used to eliminate primed variables.

Definition at line 358 of file structuredPlaner.h.

◆ fmdp_

const FMDP< double >* gum::StructuredPlaner< double >::fmdp_
protectedinherited

The Factored Markov Decision Process describing our planning situation (NB: it must use function graphs for its transition and reward functions).

Definition at line 338 of file structuredPlaner.h.

◆ fmdpLearner__

const ILearningStrategy* gum::AdaptiveRMaxPlaner::fmdpLearner__
private

Definition at line 190 of file adaptiveRMaxPlaner.h.

◆ initialized__

bool gum::AdaptiveRMaxPlaner::initialized__
private

Definition at line 213 of file adaptiveRMaxPlaner.h.

◆ initializedTable__

HashTable< Idx, bool > gum::AdaptiveRMaxPlaner::initializedTable__
private

Definition at line 211 of file adaptiveRMaxPlaner.h.

◆ operator_

IOperatorStrategy< double >* gum::StructuredPlaner< double >::operator_
protectedinherited

Definition at line 365 of file structuredPlaner.h.

◆ optimalPolicy_

MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy_
protectedinherited

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient to be exploited, some translation from the fmdp_ is required for a human to understand it. optimalPolicy2String does this job.

Definition at line 353 of file structuredPlaner.h.

◆ optPol_

const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::IDecisionStrategy::optPol_
protectedinherited

Definition at line 103 of file IDecisionStrategy.h.

◆ rmax__

double gum::AdaptiveRMaxPlaner::rmax__
private

Definition at line 193 of file adaptiveRMaxPlaner.h.

◆ rThreshold__

double gum::AdaptiveRMaxPlaner::rThreshold__
private

Definition at line 192 of file adaptiveRMaxPlaner.h.

◆ verbose_

bool gum::StructuredPlaner< double >::verbose_
protectedinherited

Boolean indicating whether iteration information should be displayed in the terminal.

Definition at line 371 of file structuredPlaner.h.

◆ vFunction_

MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction_
protectedinherited

The Value Function computed iteratively.

Definition at line 343 of file structuredPlaner.h.


The documentation for this class was generated from the following files:

agrum/FMDP/planning/adaptiveRMaxPlaner.h
agrum/FMDP/planning/adaptiveRMaxPlaner.cpp