aGrUM  0.16.0
gum::AdaptiveRMaxPlaner Class Reference

<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...

#include <adaptiveRMaxPlaner.h>

+ Inheritance diagram for gum::AdaptiveRMaxPlaner:
+ Collaboration diagram for gum::AdaptiveRMaxPlaner:

Public Member Functions

Planning Methods
void initialize (const FMDP< double > *fmdp)
 Initializes the data structures needed for planning. More...
 
void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 
Data structure access methods
INLINE const FMDP< double > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning. More...
 
INLINE const MultiDimFunctionGraph< double > * vFunction ()
 Returns a const pointer to the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provides a better toDot for the optimal policy, where the leaves display the action names instead of their ids. More...
 

Static Public Member Functions

static AdaptiveRMaxPlaner * ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static AdaptiveRMaxPlaner * TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< double > * _fmdp
 The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs). More...
 
MultiDimFunctionGraph< double > * _vFunction
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
 The associated optimal policy. More...
 
Set< const DiscreteVariable * > _elVarSeq
 A set used to eliminate primed variables. More...
 
double _discountFactor
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< double > * _operator
 
bool _verbose
 Boolean indicating whether iteration information should be displayed on the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void _initVFunction ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< double > * _valueIteration ()
 Performs a single step of value iteration. More...
 
Optimal policy extraction methods
virtual void _evalPolicy ()
 Perform the required tasks to extract an optimal policy. More...
 
Value Iteration Methods
virtual MultiDimFunctionGraph< double > * _evalQaction (const MultiDimFunctionGraph< double > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< double > * _maximiseQactions (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< double > * _minimiseFunctions (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< double > * _addReward (MultiDimFunctionGraph< double > *function, Idx actionId=0)
 Performs the R(s) + gamma · function operation. More...
 
Optimal policy extraction methods
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _makeArgMax (const MultiDimFunctionGraph< double > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *> &)
 Performs argmax_a Q(s,a) More...
 
void _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). This mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 

Constructor & destructor.

 AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
 Default constructor. More...
 
 ~AdaptiveRMaxPlaner ()
 Default destructor. More...
 

Incremental methods

HashTable< Idx, StatesCounter *> __counterTable
 
HashTable< Idx, bool > __initializedTable
 
bool __initialized
 
void checkState (const Instantiation &newState, Idx actionId)
 

Incremental methods

void setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol)
 
virtual ActionSet stateOptimalPolicy (const Instantiation &curState)
 
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optPol
 
ActionSet _allActions
 

Detailed Description

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs RMax planning on the factored Markov decision process given as parameter.

Definition at line 53 of file adaptiveRMaxPlaner.h.
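
A minimal usage sketch, assuming an already-built FMDP fmdp and an ILearningStrategy learner (the surrounding function and both names are hypothetical; only the documented API below is used):

#include <agrum/FMDP/planning/adaptiveRMaxPlaner.h>

#include <iostream>

void planWithRMax(const gum::FMDP< double >*    fmdp,
                  const gum::ILearningStrategy* learner) {
  // Build a planner working on reduced and ordered function graphs (MDDs).
  gum::AdaptiveRMaxPlaner* planner =
     gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance(learner, 0.9, 0.00001);

  planner->initialize(fmdp);    // mandatory before the first makePlanning()
  planner->makePlanning(1000);  // at most 1000 value-iteration steps

  std::cout << planner->optimalPolicy2String() << std::endl;
  delete planner;
}

TreeInstance() can be substituted for ReducedAndOrderedInstance() to plan on decision trees instead of MDDs.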

Constructor & Destructor Documentation

◆ AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::AdaptiveRMaxPlaner ( IOperatorStrategy< double > *  opi,
double  discountFactor,
double  epsilon,
const ILearningStrategy *  learner,
bool  verbose 
)
private

Default constructor.

Definition at line 63 of file adaptiveRMaxPlaner.cpp.

Referenced by ReducedAndOrderedInstance(), and TreeInstance().

AdaptiveRMaxPlaner::AdaptiveRMaxPlaner(IOperatorStrategy< double >* opi,
                                       double                   discountFactor,
                                       double                   epsilon,
                                       const ILearningStrategy* learner,
                                       bool                     verbose) :
    StructuredPlaner(opi, discountFactor, epsilon, verbose),
    IDecisionStrategy(), __fmdpLearner(learner), __initialized(false) {
  GUM_CONSTRUCTOR(AdaptiveRMaxPlaner);
}

◆ ~AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner ( )

Default destructor.

Definition at line 76 of file adaptiveRMaxPlaner.cpp.

References __counterTable.

Referenced by TreeInstance().

AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner() {
  GUM_DESTRUCTOR(AdaptiveRMaxPlaner);

  for (HashTableIteratorSafe< Idx, StatesCounter* > scIter =
          __counterTable.beginSafe();
       scIter != __counterTable.endSafe();
       ++scIter)
    delete scIter.val();
}

Member Function Documentation

◆ __clearTables()

void gum::AdaptiveRMaxPlaner::__clearTables ( )
private

Definition at line 345 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().

Referenced by makePlanning(), and TreeInstance().

void AdaptiveRMaxPlaner::__clearTables() {
  for (auto actionIter = this->fmdp()->beginActions();
       actionIter != this->fmdp()->endActions();
       ++actionIter) {
    delete __actionsBoolTable[*actionIter];
    delete __actionsRMaxTable[*actionIter];
  }
  __actionsRMaxTable.clear();
  __actionsBoolTable.clear();
}

◆ __makeRMaxFunctionGraphs()

void gum::AdaptiveRMaxPlaner::__makeRMaxFunctionGraphs ( )
private

Definition at line 238 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, __counterTable, __fmdpLearner, __rmax, __rThreshold, __visitLearner(), gum::StructuredPlaner< double >::_discountFactor, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_minimiseFunctions(), gum::StructuredPlaner< double >::_operator, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::beginVariables(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::ILearningStrategy::modaMax(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::ILearningStrategy::rMax(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by makePlanning(), and TreeInstance().

void AdaptiveRMaxPlaner::__makeRMaxFunctionGraphs() {
  __rThreshold =
     __fmdpLearner->modaMax() * 5 > 30 ? __fmdpLearner->modaMax() * 5 : 30;
  __rmax = __fmdpLearner->rMax() / (1.0 - this->_discountFactor);

  for (auto actionIter = this->fmdp()->beginActions();
       actionIter != this->fmdp()->endActions();
       ++actionIter) {
    std::vector< MultiDimFunctionGraph< double >* > rmaxs;
    std::vector< MultiDimFunctionGraph< double >* > boolQs;

    for (auto varIter = this->fmdp()->beginVariables();
         varIter != this->fmdp()->endVariables();
         ++varIter) {
      const IVisitableGraphLearner* visited = __counterTable[*actionIter];

      // (reconstructed from the reference list: both graphs are obtained
      // from the operator strategy)
      MultiDimFunctionGraph< double >* varRMax =
         _operator->getFunctionInstance();
      MultiDimFunctionGraph< double >* varBoolQ =
         _operator->getFunctionInstance();

      visited->insertSetOfVars(varRMax);
      visited->insertSetOfVars(varBoolQ);

      std::pair< NodeId, NodeId > rooty =
         __visitLearner(visited, visited->root(), varRMax, varBoolQ);
      varRMax->manager()->setRootNode(rooty.first);
      varRMax->manager()->reduce();
      varRMax->manager()->clean();
      varBoolQ->manager()->setRootNode(rooty.second);
      varBoolQ->manager()->reduce();
      varBoolQ->manager()->clean();

      rmaxs.push_back(varRMax);
      boolQs.push_back(varBoolQ);

      // (commented-out debug output elided)
    }

    __actionsRMaxTable.insert(*actionIter, this->_maximiseQactions(rmaxs));
    __actionsBoolTable.insert(*actionIter, this->_minimiseFunctions(boolQs));
  }
}
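Reading off the snippet above, with modaMax() the learner's maximal domain size and rMax() its maximal observed reward, the visit threshold and the optimistic value bound are

    n_0 = \max(5 \cdot \mathrm{modaMax},\ 30),
    \qquad
    V_{\max} = \frac{R_{\max}}{1 - \gamma}.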

◆ __visitLearner()

std::pair< NodeId, NodeId > gum::AdaptiveRMaxPlaner::__visitLearner ( const IVisitableGraphLearner *  visited,
NodeId  currentNodeId,
MultiDimFunctionGraph< double > *  rmax,
MultiDimFunctionGraph< double > *  boolQ 
)
private

Definition at line 309 of file adaptiveRMaxPlaner.cpp.

References __rmax, __rThreshold, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.

Referenced by __makeRMaxFunctionGraphs(), and TreeInstance().

std::pair< NodeId, NodeId > AdaptiveRMaxPlaner::__visitLearner(
   const IVisitableGraphLearner*    visited,
   NodeId                           currentNodeId,
   MultiDimFunctionGraph< double >* rmax,
   MultiDimFunctionGraph< double >* boolQ) {
  std::pair< NodeId, NodeId > rep;
  if (visited->isTerminal(currentNodeId)) {
    rep.first = rmax->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < __rThreshold ? __rmax : 0.0);
    rep.second = boolQ->manager()->addTerminalNode(
       visited->nodeNbObservation(currentNodeId) < __rThreshold ? 0.0 : 1.0);
    return rep;
  }

  NodeId* rmaxsons = static_cast< NodeId* >(SOA_ALLOCATE(
     sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
  NodeId* bqsons = static_cast< NodeId* >(SOA_ALLOCATE(
     sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));

  for (Idx moda = 0; moda < visited->nodeVar(currentNodeId)->domainSize();
       ++moda) {
    std::pair< NodeId, NodeId > sonp = __visitLearner(
       visited, visited->nodeSon(currentNodeId, moda), rmax, boolQ);
    rmaxsons[moda] = sonp.first;
    bqsons[moda]   = sonp.second;
  }

  rep.first =
     rmax->manager()->addInternalNode(visited->nodeVar(currentNodeId), rmaxsons);
  rep.second =
     boolQ->manager()->addInternalNode(visited->nodeVar(currentNodeId), bqsons);
  return rep;
}

◆ _addReward()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_addReward ( MultiDimFunctionGraph< double > *  function,
Idx  actionId = 0 
)
protectedvirtualinherited

Performs the R(s) + gamma · function operation.

Warning
The input function is deleted; a new one is returned.

Definition at line 408 of file structuredPlaner_tpl.h.

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), and RECAST.

Referenced by _evalPolicy(), and _valueIteration().

MultiDimFunctionGraph< GUM_SCALAR >* StructuredPlaner< GUM_SCALAR >::_addReward(
   MultiDimFunctionGraph< GUM_SCALAR >* Vold, Idx actionId) {
  // ... we multiply the result by the discount factor, ...
  // (the fresh graph comes from the operator strategy -- reconstructed
  // from the reference list)
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndMultiplyByScalar(*Vold, this->_discountFactor);
  delete Vold;

  // ... and finally add the reward
  newVFunction = _operator->add(newVFunction, RECAST(_fmdp->reward(actionId)));

  return newVFunction;
}
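Together with _evalQaction, which computes the expectation term, this realizes the standard Bellman backup for one action:

    Q_a(s) \;=\; R_a(s) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V^{t-1}(s').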

◆ _argmaximiseQactions()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::_argmaximiseQactions ( std::vector< MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protectedvirtualinherited

Performs argmax_a Q(s,a)

Warning
Also performs the deallocation of the QActions.

Definition at line 540 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy().

MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
   StructuredPlaner< GUM_SCALAR >::_argmaximiseQactions(
      std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >,
                                          SetTerminalNodePolicy >* >& qActionsSet) {
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->argmaximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _evalPolicy()

void gum::AdaptiveRMaxPlaner::_evalPolicy ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 194 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_argmaximiseQactions(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_extractOptimalPolicy(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_makeArgMax(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

void AdaptiveRMaxPlaner::_evalPolicy() {
  // *************************************************************************
  // Loop reset (the fresh graph comes from the operator strategy --
  // reconstructed from the reference list)
  MultiDimFunctionGraph< double >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  std::vector<
     MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* >
     argMaxQActionsSet;
  // *************************************************************************
  // For each action ...
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< double >* qAction =
       this->_evalQaction(newVFunction, *actionIter);

    qAction = this->_addReward(qAction, *actionIter);

    qAction = this->_operator->maximize(
       __actionsRMaxTable[*actionIter],
       this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
       2);

    argMaxQActionsSet.push_back(_makeArgMax(qAction, *actionIter));
  }
  delete newVFunction;

  // *************************************************************************
  // ... we take the argmax over all action values, ...
  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
     argMaxVFunction = _argmaximiseQactions(argMaxQActionsSet);

  // *************************************************************************
  // ... and finally extract the optimal policy from it.
  _extractOptimalPolicy(argMaxVFunction);
}

◆ _evalQaction()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_evalQaction ( const MultiDimFunctionGraph< double > *  Vold,
Idx  actionId 
)
protectedvirtualinherited

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 353 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy(), and _valueIteration().

MultiDimFunctionGraph< GUM_SCALAR >* StructuredPlaner< GUM_SCALAR >::_evalQaction(
   const MultiDimFunctionGraph< GUM_SCALAR >* Vold, Idx actionId) {
  // Initialisation: delegate the regression (multiplication/projection) to
  // the operator, eliminating the primed variables on the way.
  return _operator->regress(Vold, actionId, this->_fmdp, this->_elVarSeq);
}

◆ _extractOptimalPolicy()

void gum::StructuredPlaner< double >::_extractOptimalPolicy ( const MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protectedinherited

From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s). This mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 564 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy().

void StructuredPlaner< GUM_SCALAR >::_extractOptimalPolicy(
   const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >,
                                SetTerminalNodePolicy >* argMaxOptimalValueFunction) {
  _optimalPolicy->clear();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          argMaxOptimalValueFunction->variablesSequence().beginSafe();
       varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
       ++varIter)
    _optimalPolicy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;   // (reconstructed declaration)
  _optimalPolicy->manager()->setRootNode(__recurExtractOptPol(
     argMaxOptimalValueFunction->root(), argMaxOptimalValueFunction, src2dest));

  delete argMaxOptimalValueFunction;
}
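In other words, the extracted policy keeps, in each leaf, every action attaining the maximum:

    \pi^*(s) \;=\; \{\, a \;:\; a \in \operatorname{argmax}_{a'} Q^*(s, a') \,\}.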

◆ _initVFunction()

void gum::AdaptiveRMaxPlaner::_initVFunction ( )
protectedvirtual

Initializes the value function.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 133 of file adaptiveRMaxPlaner.cpp.

References gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::IOperatorStrategy< GUM_SCALAR >::add(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), RECASTED, gum::FMDP< GUM_SCALAR >::reward(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by TreeInstance().

void AdaptiveRMaxPlaner::_initVFunction() {
  // (reconstructed from the reference list: start from a single 0-valued
  // terminal node, ...)
  _vFunction->manager()->setRootNode(
     _vFunction->manager()->addTerminalNode(0.0));
  // ... then add each action's reward
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter)
    _vFunction = this->_operator->add(
       _vFunction, RECASTED(this->_fmdp->reward(*actionIter)), 1);
}

◆ _makeArgMax()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::_makeArgMax ( const MultiDimFunctionGraph< double > *  Qaction,
Idx  actionId 
)
protectedinherited

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action Id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 482 of file structuredPlaner_tpl.h.

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().

Referenced by _evalPolicy().

MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
   StructuredPlaner< GUM_SCALAR >::_makeArgMax(
      const MultiDimFunctionGraph< GUM_SCALAR >* qAction, Idx actionId) {
  // (reconstructed from the reference list: the copy comes from the
  // operator strategy)
  MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
     amcpy = _operator->getArgMaxFunctionInstance();

  // Insertion of the new variables
  for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
          qAction->variablesSequence().beginSafe();
       varIter != qAction->variablesSequence().endSafe();
       ++varIter)
    amcpy->add(**varIter);

  HashTable< NodeId, NodeId > src2dest;   // (reconstructed declaration)
  amcpy->manager()->setRootNode(
     __recurArgMaxCopy(qAction->root(), actionId, qAction, amcpy, src2dest));

  delete qAction;
  return amcpy;
}

◆ _maximiseQactions()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_maximiseQactions ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs max_a Q(s,a)

Warning
Also performs the deallocation of the QActions.

Definition at line 369 of file structuredPlaner_tpl.h.

Referenced by __makeRMaxFunctionGraphs(), and _valueIteration().

MultiDimFunctionGraph< GUM_SCALAR >*
   StructuredPlaner< GUM_SCALAR >::_maximiseQactions(
      std::vector< MultiDimFunctionGraph< GUM_SCALAR >* >& qActionsSet) {
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->maximize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _minimiseFunctions()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_minimiseFunctions ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protectedvirtualinherited

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i.

Definition at line 389 of file structuredPlaner_tpl.h.

Referenced by __makeRMaxFunctionGraphs().

MultiDimFunctionGraph< GUM_SCALAR >*
   StructuredPlaner< GUM_SCALAR >::_minimiseFunctions(
      std::vector< MultiDimFunctionGraph< GUM_SCALAR >* >& qActionsSet) {
  MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
  qActionsSet.pop_back();

  while (!qActionsSet.empty()) {
    MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
    qActionsSet.pop_back();
    newVFunction = _operator->minimize(newVFunction, qAction);
  }

  return newVFunction;
}

◆ _valueIteration()

MultiDimFunctionGraph< double > * gum::AdaptiveRMaxPlaner::_valueIteration ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 146 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

MultiDimFunctionGraph< double >* AdaptiveRMaxPlaner::_valueIteration() {
  // *************************************************************************
  // Loop reset (the fresh graph comes from the operator strategy --
  // reconstructed from the reference list)
  MultiDimFunctionGraph< double >* newVFunction =
     _operator->getFunctionInstance();
  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

  // *************************************************************************
  // For each action ...
  std::vector< MultiDimFunctionGraph< double >* > qActionsSet;
  for (auto actionIter = _fmdp->beginActions();
       actionIter != _fmdp->endActions();
       ++actionIter) {
    MultiDimFunctionGraph< double >* qAction =
       _evalQaction(newVFunction, *actionIter);

    // Next, we add the reward ...
    qAction = _addReward(qAction, *actionIter);

    // ... and apply the RMax modification on under-visited states.
    qAction = this->_operator->maximize(
       __actionsRMaxTable[*actionIter],
       this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
       2);

    qActionsSet.push_back(qAction);
  }
  delete newVFunction;

  // *************************************************************************
  // To evaluate the main value function, we maximise over all action values.
  newVFunction = _maximiseQactions(qActionsSet);

  return newVFunction;
}
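Given the terminal values built by __visitLearner (the rmax graph holds V_max on under-visited leaves and 0 elsewhere; the boolQ graph holds the complementary 0/1 indicator), the maximize/multiply combination above implements the RMax-modified Q-value, with n_a(s) the number of observations of s under action a:

    \tilde{Q}_a(s) \;=\; \max\big( \mathbf{1}[\,n_a(s) < n_0\,]\, V_{\max},\;\; \mathbf{1}[\,n_a(s) \ge n_0\,]\, Q_a(s) \big).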

◆ checkState()

void gum::AdaptiveRMaxPlaner::checkState ( const Instantiation &  newState,
Idx  actionId 
)
inlinevirtual

Implements gum::IDecisionStrategy.

Definition at line 201 of file adaptiveRMaxPlaner.h.

References __counterTable, and __initializedTable.

void AdaptiveRMaxPlaner::checkState(const Instantiation& newState,
                                    Idx                  actionId) {
  if (!__initializedTable[actionId]) {
    __counterTable[actionId]->reset(newState);
    __initializedTable[actionId] = true;
  } else
    __counterTable[actionId]->incState(newState);
}

◆ fmdp()

INLINE const FMDP< double >* gum::StructuredPlaner< double >::fmdp ( )
inlineinherited

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 137 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_fmdp.

Referenced by __clearTables(), __makeRMaxFunctionGraphs(), and TreeInstance().

{ return _fmdp; }

◆ initialize()

void gum::AdaptiveRMaxPlaner::initialize ( const FMDP< double > *  fmdp)
virtual

Initializes the data structures needed for planning.

Warning
Not calling this method before the first makePlanning will surely and definitely result in a crash.

Reimplemented from gum::IDecisionStrategy.

Definition at line 97 of file adaptiveRMaxPlaner.cpp.

References __counterTable, __initialized, __initializedTable, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IDecisionStrategy::initialize(), gum::StructuredPlaner< GUM_SCALAR >::initialize(), and gum::HashTable< Key, Val, Alloc >::insert().

Referenced by TreeInstance().

void AdaptiveRMaxPlaner::initialize(const FMDP< double >* fmdp) {
  if (!__initialized) {
    // (reconstructed from the reference list: initialize both base classes)
    StructuredPlaner< double >::initialize(fmdp);
    IDecisionStrategy::initialize(fmdp);
    for (auto actionIter = fmdp->beginActions();
         actionIter != fmdp->endActions();
         ++actionIter) {
      __counterTable.insert(*actionIter, new StatesCounter());
      __initializedTable.insert(*actionIter, false);
    }
    __initialized = true;
  }
}

◆ makePlanning()

void gum::AdaptiveRMaxPlaner::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: specifies the maximum number of value iterations to perform. makePlanning then stops either when the optimal value function has been reached or when nbStep iterations have been performed.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 114 of file adaptiveRMaxPlaner.cpp.

References __clearTables(), __makeRMaxFunctionGraphs(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().

Referenced by TreeInstance().

void AdaptiveRMaxPlaner::makePlanning(Idx nbStep) {
  // (body reconstructed from the reference list)
  __makeRMaxFunctionGraphs();

  StructuredPlaner< double >::makePlanning(nbStep);

  __clearTables();
}

◆ optimalPolicy()

INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy ( )
inlinevirtualinherited

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 157 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy.

{
  return _optimalPolicy;
}

◆ optimalPolicy2String()

std::string gum::StructuredPlaner< double >::optimalPolicy2String ( )
virtualinherited

Provides a better toDot for the optimal policy, where the leaves display the action names instead of their ids.

Implements gum::IPlanningStrategy< double >.

Definition at line 105 of file structuredPlaner_tpl.h.

References gum::ActionSet::beginSafe(), gum::HashTable< Key, Val, Alloc >::beginSafe(), gum::Link< T >::element(), gum::ActionSet::endSafe(), gum::HashTable< Key, Val, Alloc >::endSafe(), gum::Set< Key, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::insert(), gum::HashTable< Key, Val, Alloc >::key(), gum::DiscreteVariable::label(), gum::Variable::name(), gum::InternalNode::nbSons(), gum::Link< T >::nextLink(), gum::InternalNode::nodeVar(), and gum::InternalNode::son().

std::string StructuredPlaner< GUM_SCALAR >::optimalPolicy2String() {
  // ************************************************************************
  // Discarding the case where no pi* has been computed
  if (!_optimalPolicy || _optimalPolicy->root() == 0)
    return "NO OPTIMAL POLICY CALCULATED YET";

  // ************************************************************************
  // Initialisation

  // Declaration of the needed string streams
  std::stringstream output;
  std::stringstream terminalStream;
  std::stringstream nonTerminalStream;
  std::stringstream arcstream;

  // First line of the toDot
  output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

  // Header lines for the internal node stream and the terminal node stream
  terminalStream << "node [shape = box];" << std::endl;
  nonTerminalStream << "node [shape = ellipse];" << std::endl;

  // For some clarity in the final string
  std::string tab = "\t";

  // To know if we already checked a node or not
  Set< NodeId > visited;

  // FIFO of nodes to visit
  std::queue< NodeId > fifo;

  // Loading the FIFO
  fifo.push(_optimalPolicy->root());
  visited << _optimalPolicy->root();

  // ************************************************************************
  // Main loop
  while (!fifo.empty()) {
    // Node to visit
    NodeId currentNodeId = fifo.front();
    fifo.pop();

    // Checking if it is terminal
    if (_optimalPolicy->isTerminalNode(currentNodeId)) {
      // Get back the associated ActionSet
      ActionSet ase = _optimalPolicy->nodeValue(currentNodeId);

      // Creating a line for this node
      terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                     << " [label=\"" << currentNodeId << " - ";

      // Enumerating and adding to the line the associated optimal actions
      for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
           valIter != ase.endSafe();
           ++valIter)
        terminalStream << _fmdp->actionName(*valIter) << " ";

      // Terminating the line
      terminalStream << "\"];" << std::endl;
      continue;
    }

    // Otherwise
    {
      // Getting back the associated internal node
      const InternalNode* currentNode = _optimalPolicy->node(currentNodeId);

      // Creating a line in the internal node stream for this node
      nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                        << " [label=\"" << currentNodeId << " - "
                        << currentNode->nodeVar()->name() << "\"];" << std::endl;

      // Going through the sons and aggregating them according to the sons' ids
      HashTable< NodeId, LinkedList< Idx >* > sonMap;
      for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
        if (!visited.exists(currentNode->son(sonIter))) {
          fifo.push(currentNode->son(sonIter));
          visited << currentNode->son(sonIter);
        }
        if (!sonMap.exists(currentNode->son(sonIter)))
          sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
        sonMap[currentNode->son(sonIter)]->addLink(sonIter);
      }

      // Adding to the arc stream
      for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
           ++sonIter) {
        arcstream << tab << currentNodeId << " -> " << sonIter.key()
                  << " [label=\" ";
        Link< Idx >* modaIter = sonIter.val()->list();
        while (modaIter) {
          arcstream << currentNode->nodeVar()->label(modaIter->element());
          if (modaIter->nextLink()) arcstream << ", ";
          modaIter = modaIter->nextLink();
        }
        arcstream << "\",color=\"#00ff00\"];" << std::endl;
        delete sonIter.val();
      }
    }
  }

  // Terminating
  output << terminalStream.str() << std::endl
         << nonTerminalStream.str() << std::endl
         << arcstream.str() << std::endl
         << "}" << std::endl;

  return output.str();
}
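A small sketch of how the returned string can be used (the helper and default file name are hypothetical; rendering relies on the standard Graphviz dot tool):

#include <fstream>
#include <string>

// Writes the optimal policy as a Graphviz file, e.g. for
//   dot -Tpng policy.dot -o policy.png
void dumpPolicy(gum::StructuredPlaner< double >* planner,
                const std::string& path = "policy.dot") {
  std::ofstream out(path);
  out << planner->optimalPolicy2String();
}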

◆ optimalPolicySize()

virtual Size gum::StructuredPlaner< double >::optimalPolicySize ( )
inlinevirtualinherited

Returns the current size of the optimal policy computed so far.

◆ ReducedAndOrderedInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 65 of file adaptiveRMaxPlaner.h.

References AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxMDDInstance().

{
  return new AdaptiveRMaxPlaner(new MDDOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ setOptimalStrategy()

void gum::IDecisionStrategy::setOptimalStrategy ( const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *  optPol)
inlineinherited

Definition at line 90 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_optPol.

Referenced by gum::SDYNA::makePlanning().

{
  _optPol =
     const_cast< MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* >(
        optPol);
}

◆ spumddInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::spumddInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 80 of file structuredPlaner.h.

82  {
83  return new StructuredPlaner< GUM_SCALAR >(
84  new MDDOperatorStrategy< GUM_SCALAR >(),
85  discountFactor,
86  epsilon,
87  verbose);
88  }

◆ stateOptimalPolicy()

virtual ActionSet gum::IDecisionStrategy::stateOptimalPolicy ( const Instantiation curState)
inlinevirtualinherited

Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.

Definition at line 97 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_allActions, and gum::IDecisionStrategy::_optPol.

Referenced by gum::E_GreedyDecider::stateOptimalPolicy(), and gum::SDYNA::takeAction().

{
  return (_optPol && _optPol->realSize() != 0) ? _optPol->get(curState)
                                               : _allActions;
}
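A hedged sketch of the intended calling sequence (planner and curState are assumed to exist; chooseActions is hypothetical):

// After planning, publish the policy to the decision strategy, then query it.
gum::ActionSet chooseActions(gum::AdaptiveRMaxPlaner*  planner,
                             const gum::Instantiation& curState) {
  planner->setOptimalStrategy(planner->optimalPolicy());
  return planner->stateOptimalPolicy(curState);  // falls back to _allActions
}                                                // while no policy is set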

◆ sviInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::sviInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestaticinherited

Definition at line 94 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner(), and gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner().

{
  return new StructuredPlaner< GUM_SCALAR >(
     new TreeOperatorStrategy< GUM_SCALAR >(),
     discountFactor,
     epsilon,
     verbose);
}

◆ TreeInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::TreeInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inlinestatic

Definition at line 79 of file adaptiveRMaxPlaner.h.

References __clearTables(), __makeRMaxFunctionGraphs(), __visitLearner(), _evalPolicy(), _initVFunction(), _valueIteration(), AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::fmdp(), initialize(), makePlanning(), and ~AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxTreeInstance().

{
  return new AdaptiveRMaxPlaner(new TreeOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}

◆ vFunction()

INLINE const MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction ( )
inlineinherited

Returns a const pointer to the value function computed so far.

Definition at line 142 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction.

{
  return _vFunction;
}

◆ vFunctionSize()

virtual Size gum::StructuredPlaner< double >::vFunctionSize ( )
inlinevirtualinherited

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 149 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction, and gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::realSize().

{
  return _vFunction != nullptr ? _vFunction->realSize() : 0;
}

Member Data Documentation

◆ __actionsBoolTable

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsBoolTable
private

◆ __actionsRMaxTable

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsRMaxTable
private

◆ __counterTable

HashTable< Idx, StatesCounter* > gum::AdaptiveRMaxPlaner::__counterTable
private

◆ __fmdpLearner

const ILearningStrategy* gum::AdaptiveRMaxPlaner::__fmdpLearner
private

Definition at line 190 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

◆ __initialized

bool gum::AdaptiveRMaxPlaner::__initialized
private

Definition at line 213 of file adaptiveRMaxPlaner.h.

Referenced by initialize().

◆ __initializedTable

HashTable< Idx, bool > gum::AdaptiveRMaxPlaner::__initializedTable
private

Definition at line 211 of file adaptiveRMaxPlaner.h.

Referenced by checkState(), and initialize().

◆ __rmax

double gum::AdaptiveRMaxPlaner::__rmax
private

Definition at line 193 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

◆ __rThreshold

double gum::AdaptiveRMaxPlaner::__rThreshold
private

Definition at line 192 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

◆ _allActions

ActionSet gum::IDecisionStrategy::_allActions
protectedinherited

◆ _discountFactor

double gum::StructuredPlaner< double >::_discountFactor
protectedinherited

Discount Factor used for infinite horizon planning.

Definition at line 363 of file structuredPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

◆ _elVarSeq

Set< const DiscreteVariable* > gum::StructuredPlaner< double >::_elVarSeq
protectedinherited

A set used to eliminate primed variables.

Definition at line 358 of file structuredPlaner.h.

◆ _fmdp

const FMDP< double >* gum::StructuredPlaner< double >::_fmdp
protectedinherited

The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).

Definition at line 338 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().

◆ _operator

IOperatorStrategy< double >* gum::StructuredPlaner< double >::_operator
protectedinherited

◆ _optimalPolicy

MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::_optimalPolicy
protectedinherited

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient for exploitation, some translation from the _fmdp is required for a human to understand it; optimalPolicy2String does this job.

Definition at line 353 of file structuredPlaner.h.

◆ _optPol

const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::IDecisionStrategy::_optPol
protectedinherited

◆ _verbose

bool gum::StructuredPlaner< double >::_verbose
protectedinherited

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 371 of file structuredPlaner.h.

◆ _vFunction

MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_vFunction
protectedinherited

The Value Function computed iteratively.

Definition at line 343 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().


The documentation for this class was generated from the following files:

agrum/FMDP/planning/adaptiveRMaxPlaner.h
agrum/FMDP/planning/adaptiveRMaxPlaner.cpp