aGrUM 0.20.3
a C++ library for (probabilistic) graphical models
gum::AdaptiveRMaxPlaner Class Reference

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

#include <adaptiveRMaxPlaner.h>


Public Member Functions

Planning Methods
void initialize (const FMDP< double > *fmdp)
 Initializes the data structures needed for planning.
 
void makePlanning (Idx nbStep=1000000)
 Performs a value iteration.
 
Data structure access methods
INLINE const FMDP< double > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning.
 
INLINE const MultiDimFunctionGraph< double > * vFunction ()
 Returns a const pointer to the value function computed so far.
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far.
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far.
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far.
 
std::string optimalPolicy2String ()
 Provides a more readable toDot for the optimal policy, where leaves show action names instead of ids.
 

Static Public Member Functions

static AdaptiveRMaxPlaner * ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static AdaptiveRMaxPlaner * TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< double > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must use function graphs as transition and reward functions).
 
MultiDimFunctionGraph< double > * vFunction_
 The Value Function computed iteratively.
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy.
 
Set< const DiscreteVariable * > elVarSeq_
 A Set used to eliminate primed variables.
 
double discountFactor_
 Discount Factor used for infinite horizon planning.
 
IOperatorStrategy< double > * operator_
 
bool verbose_
 Boolean indicating whether iteration information should be displayed on the terminal.
 

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function.
 
virtual MultiDimFunctionGraph< double > * valueIteration_ ()
 Performs a single step of value iteration.
 
Optimal policy extraction methods
virtual void evalPolicy_ ()
 Performs the required tasks to extract an optimal policy.
 
Value Iteration Methods
virtual MultiDimFunctionGraph< double > * evalQaction_ (const MultiDimFunctionGraph< double > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
 
virtual MultiDimFunctionGraph< double > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs max_a Q(s,a).
 
virtual MultiDimFunctionGraph< double > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs min_i F_i.
 
virtual MultiDimFunctionGraph< double > * addReward_ (MultiDimFunctionGraph< double > *function, Idx actionId=0)
 Performs the R(s) + gamma * function operation.
 
Optimal policy extraction methods
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< double > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax.
 
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *> &)
 Performs argmax_a Q(s,a).
 
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), extracts pi*(s); this mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
 

Constructor & destructor.

 AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
 Default constructor.
 
 ~AdaptiveRMaxPlaner ()
 Default destructor.
 

Incremental methods

HashTable< Idx, StatesCounter *> _counterTable_
 
HashTable< Idx, bool > _initializedTable_
 
bool _initialized_
 
void checkState (const Instantiation &newState, Idx actionId)
 

Incremental methods

void setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol)
 
virtual ActionSet stateOptimalPolicy (const Instantiation &curState)
 
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optPol_
 
ActionSet allActions_
 

Detailed Description

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs RMax planning on the factored Markov decision process given as parameter.

Definition at line 53 of file adaptiveRMaxPlaner.h.
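
A minimal usage sketch (not taken from the aGrUM documentation itself): it assumes an already-built gum::FMDP< double >* named fmdp and a learner implementing gum::ILearningStrategy named learner, and it assumes the caller owns the instance returned by the factory.

#include <iostream>
#include <agrum/FMDP/planning/adaptiveRMaxPlaner.h>

void planWithRMax(const gum::ILearningStrategy* learner, const gum::FMDP< double >* fmdp) {
  // Build a planner backed by reduced and ordered function graphs
  // (discountFactor = 0.9 and epsilon = 1e-5 by default).
  gum::AdaptiveRMaxPlaner* planner
     = gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance(learner);

  planner->initialize(fmdp);      // mandatory before the first makePlanning
  planner->makePlanning(10000);   // at most 10000 value-iteration steps

  // Print the resulting policy in dot format.
  std::cout << planner->optimalPolicy2String() << std::endl;

  delete planner;                 // assumption: the caller owns the factory-created planner
}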

Constructor & Destructor Documentation

◆ AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::AdaptiveRMaxPlaner ( IOperatorStrategy< double > *  opi,
double  discountFactor,
double  epsilon,
const ILearningStrategy *  learner,
bool  verbose 
)
private

Default constructor.

Definition at line 63 of file adaptiveRMaxPlaner.cpp.

AdaptiveRMaxPlaner::AdaptiveRMaxPlaner(IOperatorStrategy< double >* opi,
                                       double                       discountFactor,
                                       double                       epsilon,
                                       const ILearningStrategy*     learner,
                                       bool                         verbose) :
    StructuredPlaner(opi, discountFactor, epsilon, verbose),
    IDecisionStrategy(), _fmdpLearner_(learner), _initialized_(false) {
  GUM_CONSTRUCTOR(AdaptiveRMaxPlaner);
}

◆ ~AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner ( )

Default destructor.

Definition at line 76 of file adaptiveRMaxPlaner.cpp.

{
  GUM_DESTRUCTOR(AdaptiveRMaxPlaner);

  for (HashTableIteratorSafe< Idx, StatesCounter* > scIter = _counterTable_.beginSafe();
       scIter != _counterTable_.endSafe();
       ++scIter)
    delete scIter.val();
}

Member Function Documentation

◆ _clearTables_()

void gum::AdaptiveRMaxPlaner::_clearTables_ ( )
private

Definition at line 321 of file adaptiveRMaxPlaner.cpp.

{
  for (auto actionIter = this->fmdp()->beginActions(); actionIter != this->fmdp()->endActions();
       ++actionIter) {
    delete _actionsBoolTable_[*actionIter];
    delete _actionsRMaxTable_[*actionIter];
  }
  _actionsRMaxTable_.clear();
  _actionsBoolTable_.clear();
}

◆ _makeRMaxFunctionGraphs_()

void gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_ ( )
private

Definition at line 222 of file adaptiveRMaxPlaner.cpp.

{
  _rThreshold_ = _fmdpLearner_->modaMax() * 5 > 30 ? _fmdpLearner_->modaMax() * 5 : 30;
  _rmax_       = _fmdpLearner_->rMax() / (1.0 - this->discountFactor_);

  for (auto actionIter = this->fmdp()->beginActions(); actionIter != this->fmdp()->endActions();
       ++actionIter) {
    std::vector< MultiDimFunctionGraph< double >* > rmaxs;
    std::vector< MultiDimFunctionGraph< double >* > boolQs;

    for (auto varIter = this->fmdp()->beginVariables(); varIter != this->fmdp()->endVariables();
         ++varIter) {
      const IVisitableGraphLearner* visited = _counterTable_[*actionIter];

      MultiDimFunctionGraph< double >* varRMax  = this->operator_->getFunctionInstance();
      MultiDimFunctionGraph< double >* varBoolQ = this->operator_->getFunctionInstance();

      visited->insertSetOfVars(varRMax);
      visited->insertSetOfVars(varBoolQ);

      std::pair< NodeId, NodeId > rooty
         = _visitLearner_(visited, visited->root(), varRMax, varBoolQ);
      varRMax->manager()->setRootNode(rooty.first);
      varRMax->manager()->reduce();
      varRMax->manager()->clean();
      varBoolQ->manager()->setRootNode(rooty.second);
      varBoolQ->manager()->reduce();
      varBoolQ->manager()->clean();

      rmaxs.push_back(varRMax);
      boolQs.push_back(varBoolQ);

      // (commented-out debug output omitted)
    }

    _actionsRMaxTable_.insert(*actionIter, this->maximiseQactions_(rmaxs));
    _actionsBoolTable_.insert(*actionIter, this->minimiseFunctions_(boolQs));
  }
}
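Reading the listing above: two scalars drive the exploration bonus, the visit threshold _rThreshold_ = max(5 * modaMax, 30) and the optimistic value _rmax_ = rMax / (1 - discountFactor_), i.e. the value of collecting the maximal one-step reward forever. _visitLearner_ (below) then maps every leaf observed fewer than _rThreshold_ times to this optimistic value, while sufficiently visited leaves keep their learned estimate.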

◆ _visitLearner_()

std::pair< NodeId, NodeId > gum::AdaptiveRMaxPlaner::_visitLearner_ ( const IVisitableGraphLearner *  visited,
NodeId  currentNodeId,
MultiDimFunctionGraph< double > *  rmax,
MultiDimFunctionGraph< double > *  boolQ 
)
private

Definition at line 288 of file adaptiveRMaxPlaner.cpp.

{
  std::pair< NodeId, NodeId > rep;
  if (visited->isTerminal(currentNodeId)) {
    rep.first = rmax->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < _rThreshold_ ? _rmax_ : 0.0);
    rep.second = boolQ->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < _rThreshold_ ? 0.0 : 1.0);
    return rep;
  }

  NodeId* rmaxsons = static_cast< NodeId* >(
     SOA_ALLOCATE(sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
  NodeId* bqsons = static_cast< NodeId* >(
     SOA_ALLOCATE(sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));

  for (Idx moda = 0; moda < visited->nodeVar(currentNodeId)->domainSize(); ++moda) {
    std::pair< NodeId, NodeId > sonp
       = _visitLearner_(visited, visited->nodeSon(currentNodeId, moda), rmax, boolQ);
    rmaxsons[moda] = sonp.first;
    bqsons[moda]   = sonp.second;
  }

  rep.first  = rmax->manager()->addInternalNode(visited->nodeVar(currentNodeId), rmaxsons);
  rep.second = boolQ->manager()->addInternalNode(visited->nodeVar(currentNodeId), bqsons);
  return rep;
}

◆ addReward_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::addReward_ ( MultiDimFunctionGraph< double > *  function,
Idx  actionId = 0 
)
protectedvirtualinherited

Performs the R(s) + gamma * function operation.

Warning
The given function is deleted; a new one is returned.

Definition at line 395 of file structuredPlaner_tpl.h.

{
  // ... we multiply the result by the discount factor, ...
  MultiDimFunctionGraph< double >* newVFunction = operator_->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
  delete Vold;

  // ... and finally add the reward
  newVFunction = operator_->add(newVFunction, RECAST(fmdp_->reward(actionId)));

  return newVFunction;
}

◆ argmaximiseQactions_()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtualinherited

Performs argmax_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 519 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* newVFunction
     = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* qAction
       = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ checkState()

void gum::AdaptiveRMaxPlaner::checkState ( const Instantiation &  newState,
Idx  actionId 
)
inlinevirtual

Implements gum::IDecisionStrategy.

Definition at line 198 of file adaptiveRMaxPlaner.h.

{
  if (!_initializedTable_[actionId]) {
    _counterTable_[actionId]->reset(newState);
    _initializedTable_[actionId] = true;
  } else
    _counterTable_[actionId]->incState(newState);
}
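
A hedged sketch of how these incremental methods cooperate in a learn-plan-act loop; planner, currentState, nextState and chooseFrom are placeholder names, not part of the aGrUM API:

// Ask the current policy for candidate actions (falls back to all actions
// when no policy has been computed yet):
gum::ActionSet candidates = planner->stateOptimalPolicy(currentState);
gum::Idx actionId = chooseFrom(candidates);   // hypothetical action-selection helper

// After acting and observing nextState, update the visit counters:
planner->checkState(nextState, actionId);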

◆ evalPolicy_()

void gum::AdaptiveRMaxPlaner::evalPolicy_ ( )
protectedvirtual

Performs the required tasks to extract an optimal policy.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 183 of file adaptiveRMaxPlaner.cpp.

{
  // *****************************************************************************************
  // Loop reset
  MultiDimFunctionGraph< double >* newVFunction = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* >
     argMaxQActionsSet;
  // *****************************************************************************************
  // For each action
  for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
    MultiDimFunctionGraph< double >* qAction = this->evalQaction_(newVFunction, *actionIter);

    qAction = this->addReward_(qAction, *actionIter);

    qAction = this->operator_->maximize(
       _actionsRMaxTable_[*actionIter],
       this->operator_->multiply(qAction, _actionsBoolTable_[*actionIter], 1),
       2);

    argMaxQActionsSet.push_back(makeArgMax_(qAction, *actionIter));
  }
  delete newVFunction;

  // *****************************************************************************************
  // To evaluate the main value function, we maximise over all action values, ...
  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* argMaxVFunction
     = argmaximiseQactions_(argMaxQActionsSet);

  // *****************************************************************************************
  // ... and finally extract the optimal policy from the argmax.
  extractOptimalPolicy_(argMaxVFunction);
}

◆ evalQaction_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::evalQaction_ ( const MultiDimFunctionGraph< double > *  Vold,
Idx  actionId 
)
protectedvirtualinherited

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 341 of file structuredPlaner_tpl.h.

{
  // ******************************************************************************
  // Initialisation:
  // creating a copy of the last Vfunction to deduce the new Qaction from it,
  // and finding the first variable to eliminate (the one at the end)

  return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
}
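
Together with addReward_, this realises the usual Bellman backup over function graphs: evalQaction_ returns the regression W_a(s) = sum_{s'} P(s'|s,a) . V^{t-1}(s'), which addReward_ then turns into Q_a(s) = R(s,a) + gamma . W_a(s).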

◆ extractOptimalPolicy_()

void gum::StructuredPlaner< double >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protectedinherited

From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 542 of file structuredPlaner_tpl.h.

{
  optimalPolicy_->clear();

  // Inserting the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    optimalPolicy_->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  optimalPolicy_->manager()->setRootNode(_recurExtractOptPol_(argMaxOptimalValueFunction->root(),
                                                              argMaxOptimalValueFunction,
                                                              src2dest));

  delete argMaxOptimalValueFunction;
}

◆ fmdp()

INLINE const FMDP< double >* gum::StructuredPlaner< double >::fmdp ( )
inlineinherited

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 133 of file structuredPlaner.h.

{ return fmdp_; }

◆ initialize()

void gum::AdaptiveRMaxPlaner::initialize ( const FMDP< double > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Not calling this method before the first call to makePlanning will definitely result in a crash.

Reimplemented from gum::IDecisionStrategy.

Definition at line 96 of file adaptiveRMaxPlaner.cpp.

{
  if (!_initialized_) {
    StructuredPlaner< double >::initialize(fmdp);
    IDecisionStrategy::initialize(fmdp);
    for (auto actionIter = fmdp->beginActions(); actionIter != fmdp->endActions(); ++actionIter) {
      _counterTable_.insert(*actionIter, new StatesCounter());
      _initializedTable_.insert(*actionIter, false);
    }
    _initialized_ = true;
  }
}

◆ initVFunction_()

void gum::AdaptiveRMaxPlaner::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 130 of file adaptiveRMaxPlaner.cpp.

{
  vFunction_->manager()->setRootNode(vFunction_->manager()->addTerminalNode(0));
  for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter)
    vFunction_ = this->operator_->add(vFunction_, RECASTED(this->fmdp_->reward(*actionIter)), 1);
}
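In other words, the value function starts as a single zero-valued terminal node to which every action's reward is added, i.e. V_0(s) = sum_a R(s,a).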

◆ makeArgMax_()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::makeArgMax_ ( const MultiDimFunctionGraph< double > *  Qaction,
Idx  actionId 
)
protectedinherited

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 463 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >* amcpy
     = operator_->getArgMaxFunctionInstance();

  // Inserting the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter
       = qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;
  amcpy->manager()->setRootNode(
     _recurArgMaxCopy_(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ makePlanning()

void gum::AdaptiveRMaxPlaner::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: the maximum number of value iterations to perform. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 111 of file adaptiveRMaxPlaner.cpp.

{
  _makeRMaxFunctionGraphs_();

  StructuredPlaner< double >::makePlanning(nbStep);

  _clearTables_();
}

◆ maximiseQactions_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs max_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 356 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ minimiseFunctions_()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i.

Definition at line 375 of file structuredPlaner_tpl.h.

{
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = operator_->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ optimalPolicy()

INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy ( )
inlinevirtualinherited

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 148 of file structuredPlaner.h.

{
  return optimalPolicy_;
}

◆ optimalPolicy2String()

std::string gum::StructuredPlaner< double >::optimalPolicy2String ( )
virtualinherited

Provides a more readable toDot for the optimal policy, where leaves show action names instead of ids.

Implements gum::IPlanningStrategy< double >.

Definition at line 104 of file structuredPlaner_tpl.h.

{
  // ************************************************************************
  // Discarding the case where no pi* has been computed yet
  if (!optimalPolicy_ || optimalPolicy_->root() == 0) return "NO OPTIMAL POLICY CALCULATED YET";

  // ************************************************************************
  // Initialisation

  // Declaration of the needed string streams
  std::stringstream output;
  std::stringstream terminalStream;
  std::stringstream nonTerminalStream;
  std::stringstream arcstream;

  // First line of the toDot
  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

  // Opening lines for the internal node stream and the terminal node stream
  terminalStream << "node [shape = box];" << std::endl;
  nonTerminalStream << "node [shape = ellipse];" << std::endl;

  // For some clarity in the final string
  std::string tab = "\t";

  // To know whether we already checked a node or not
  Set< NodeId > visited;

  // FIFO of nodes to visit
  std::queue< NodeId > fifo;

  // Loading the FIFO
  fifo.push(optimalPolicy_->root());
  visited << optimalPolicy_->root();

  // ************************************************************************
  // Main loop
  while (!fifo.empty()) {
    // Node to visit
    NodeId currentNodeId = fifo.front();
    fifo.pop();

    // Checking if it is terminal
    if (optimalPolicy_->isTerminalNode(currentNodeId)) {
      // Get back the associated ActionSet
      ActionSet ase = optimalPolicy_->nodeValue(currentNodeId);

      // Creating a line for this node
      terminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                     << currentNodeId << " - ";

      // Enumerating and adding to the line the associated optimal actions
      for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe(); valIter != ase.endSafe();
           ++valIter)
        terminalStream << fmdp_->actionName(*valIter) << " ";

      // Terminating the line
      terminalStream << "\"];" << std::endl;
      continue;
    }

    // Otherwise
    {
      // Getting back the associated internal node
      const InternalNode* currentNode = optimalPolicy_->node(currentNodeId);

      // Creating a line in the internal node stream for this node
      nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                        << currentNodeId << " - " << currentNode->nodeVar()->name() << "\"];"
                        << std::endl;

      // Going through the sons and aggregating them according to the sons' ids
      HashTable< NodeId, LinkedList< Idx >* > sonMap;
      for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
        if (!visited.exists(currentNode->son(sonIter))) {
          fifo.push(currentNode->son(sonIter));
          visited << currentNode->son(sonIter);
        }
        if (!sonMap.exists(currentNode->son(sonIter)))
          sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
        sonMap[currentNode->son(sonIter)]->addLink(sonIter);
      }

      // Adding to the arc stream
      for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe(); ++sonIter) {
        arcstream << tab << currentNodeId << " -> " << sonIter.key() << " [label=\" ";
        Link< Idx >* modaIter = sonIter.val()->list();
        while (modaIter) {
          arcstream << currentNode->nodeVar()->label(modaIter->element());
          if (modaIter->nextLink()) arcstream << ", ";
          modaIter = modaIter->nextLink();
        }
        arcstream << "\",color=\"#00ff00\"];" << std::endl;
        delete sonIter.val();
      }
    }
  }

  // Terminating
  output << terminalStream.str() << std::endl
         << nonTerminalStream.str() << std::endl
         << arcstream.str() << std::endl
         << "}" << std::endl;

  return output.str();
}
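
A small sketch of how the returned dot string can be rendered with Graphviz (the file name is arbitrary and planner is the hypothetical instance from the earlier sketch):

#include <fstream>

std::ofstream dotFile("optimalPolicy.dot");
dotFile << planner->optimalPolicy2String();
dotFile.close();
// Then render it, e.g.:  dot -Tpng optimalPolicy.dot -o optimalPolicy.png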

◆ optimalPolicySize()

virtual Size gum::StructuredPlaner< double >::optimalPolicySize ( )
inlinevirtualinherited

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 155 of file structuredPlaner.h.

{
  return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
}

◆ ReducedAndOrderedInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 62 of file adaptiveRMaxPlaner.h.

{
  return new AdaptiveRMaxPlaner(new MDDOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ setOptimalStrategy()

void gum::IDecisionStrategy::setOptimalStrategy ( const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *  optPol)
inlineinherited

Definition at line 89 of file IDecisionStrategy.h.

{
  optPol_ = const_cast< MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* >(optPol);
}

◆ spumddInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::spumddInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 79 of file structuredPlaner.h.

{
  return new StructuredPlaner< GUM_SCALAR >(new MDDOperatorStrategy< GUM_SCALAR >(),
                                            discountFactor,
                                            epsilon,
                                            verbose);
}

◆ stateOptimalPolicy()

virtual ActionSet gum::IDecisionStrategy::stateOptimalPolicy ( const Instantiation &  curState)
inlinevirtualinherited

Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.

Definition at line 93 of file IDecisionStrategy.h.

{
  return (optPol_ && optPol_->realSize() != 0) ? optPol_->get(curState) : allActions_;
}

◆ sviInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::sviInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 91 of file structuredPlaner.h.

{
  return new StructuredPlaner< GUM_SCALAR >(new TreeOperatorStrategy< GUM_SCALAR >(),
                                            discountFactor,
                                            epsilon,
                                            verbose);
}

◆ TreeInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::TreeInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 76 of file adaptiveRMaxPlaner.h.

{
  return new AdaptiveRMaxPlaner(new TreeOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ valueIteration_()

MultiDimFunctionGraph< double > * gum::AdaptiveRMaxPlaner::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 139 of file adaptiveRMaxPlaner.cpp.

{
  // *****************************************************************************************
  // Loop reset
  MultiDimFunctionGraph< double >* newVFunction = operator_->getFunctionInstance();
  newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

  // *****************************************************************************************
  // For each action
  std::vector< MultiDimFunctionGraph< double >* > qActionsSet;
  for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
    MultiDimFunctionGraph< double >* qAction = evalQaction_(newVFunction, *actionIter);

    // Next, we add the reward
    qAction = addReward_(qAction, *actionIter);

    qAction = this->operator_->maximize(
       _actionsRMaxTable_[*actionIter],
       this->operator_->multiply(qAction, _actionsBoolTable_[*actionIter], 1),
       2);

    qActionsSet.push_back(qAction);
  }
  delete newVFunction;

  // *****************************************************************************************
  // To evaluate the main value function, we maximise over all action values
  newVFunction = maximiseQactions_(qActionsSet);

  return newVFunction;
}
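With the tables built by _makeRMaxFunctionGraphs_, the backup above can be restated (in the document's own notation) as Q~_a(s) = max( RMax_a(s), boolQ_a(s) . [ R(s,a) + gamma . sum_{s'} P(s'|s,a) . V^{t-1}(s') ] ), where boolQ_a(s) is 1 on leaves observed at least _rThreshold_ times (and 0 otherwise) and RMax_a(s) is _rmax_ on under-explored leaves (and 0 otherwise). Well-explored regions therefore keep their Bellman value, while under-explored ones are driven toward the optimistic RMax value.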

◆ vFunction()

INLINE const MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction ( )
inlineinherited

Returns a const pointer to the value function computed so far.

Definition at line 138 of file structuredPlaner.h.

{ return vFunction_; }

◆ vFunctionSize()

virtual Size gum::StructuredPlaner< double >::vFunctionSize ( )
inlinevirtualinherited

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 143 of file structuredPlaner.h.

{ return vFunction_ != nullptr ? vFunction_->realSize() : 0; }

Member Data Documentation

◆ _actionsBoolTable_

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::_actionsBoolTable_
private

Definition at line 186 of file adaptiveRMaxPlaner.h.

◆ _actionsRMaxTable_

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::_actionsRMaxTable_
private

Definition at line 185 of file adaptiveRMaxPlaner.h.

◆ _counterTable_

HashTable< Idx, StatesCounter* > gum::AdaptiveRMaxPlaner::_counterTable_
private

Definition at line 207 of file adaptiveRMaxPlaner.h.

◆ _fmdpLearner_

const ILearningStrategy* gum::AdaptiveRMaxPlaner::_fmdpLearner_
private

Definition at line 187 of file adaptiveRMaxPlaner.h.

◆ _initialized_

bool gum::AdaptiveRMaxPlaner::_initialized_
private

Definition at line 210 of file adaptiveRMaxPlaner.h.

◆ _initializedTable_

HashTable< Idx, bool > gum::AdaptiveRMaxPlaner::_initializedTable_
private

Definition at line 208 of file adaptiveRMaxPlaner.h.

◆ _rmax_

double gum::AdaptiveRMaxPlaner::_rmax_
private

Definition at line 190 of file adaptiveRMaxPlaner.h.

◆ _rThreshold_

double gum::AdaptiveRMaxPlaner::_rThreshold_
private

Definition at line 189 of file adaptiveRMaxPlaner.h.

◆ allActions_

ActionSet gum::IDecisionStrategy::allActions_
protectedinherited

Definition at line 102 of file IDecisionStrategy.h.

◆ discountFactor_

double gum::StructuredPlaner< double >::discountFactor_
protectedinherited

Discount Factor used for infinite horizon planning.

Definition at line 350 of file structuredPlaner.h.

◆ elVarSeq_

Set< const DiscreteVariable* > gum::StructuredPlaner< double >::elVarSeq_
protectedinherited

A Set used to eliminate primed variables.

Definition at line 345 of file structuredPlaner.h.

◆ fmdp_

const FMDP< double >* gum::StructuredPlaner< double >::fmdp_
protectedinherited

The Factored Markov Decision Process describing our planning situation (NB: it must use function graphs as transition and reward functions).

Definition at line 325 of file structuredPlaner.h.

◆ operator_

IOperatorStrategy< double >* gum::StructuredPlaner< double >::operator_
protectedinherited

Definition at line 352 of file structuredPlaner.h.

◆ optimalPolicy_

MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy_
protectedinherited

The associated optimal policy.

Warning
Leaves are ActionSets containing the ids of the best actions. While this is sufficient for exploitation, some translation from the fmdp_ is required for a human to understand it; optimalPolicy2String does this job.

Definition at line 340 of file structuredPlaner.h.

◆ optPol_

const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::IDecisionStrategy::optPol_
protectedinherited

Definition at line 99 of file IDecisionStrategy.h.

◆ verbose_

bool gum::StructuredPlaner< double >::verbose_
protectedinherited

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 358 of file structuredPlaner.h.

◆ vFunction_

MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction_
protectedinherited

The Value Function computed iteratively.

Definition at line 330 of file structuredPlaner.h.


The documentation for this class was generated from the following files:

agrum/FMDP/planning/adaptiveRMaxPlaner.h
agrum/FMDP/planning/adaptiveRMaxPlaner.cpp