aGrUM  0.14.2
gum::AdaptiveRMaxPlaner Class Reference

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

#include <adaptiveRMaxPlaner.h>


Public Member Functions

Planning Methods
void initialize (const FMDP< double > *fmdp)
 Initializes the data structures needed for planning.
 
void makePlanning (Idx nbStep=1000000)
 Performs a value iteration.
 
Datastructure access methods
INLINE const FMDP< double > * fmdp ()
 Returns a const ptr on the Factored Markov Decision Process on which we're planning.
 
INLINE const MultiDimFunctionGraph< double > * vFunction ()
 Returns a const ptr on the value function computed so far.
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far.
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far.
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far.
 
std::string optimalPolicy2String ()
 Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids.
 

Static Public Member Functions

static AdaptiveRMaxPlaner * ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static AdaptiveRMaxPlaner * TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< double > * _fmdp
 The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).
 
MultiDimFunctionGraph< double > * _vFunction
 The Value Function computed iteratively.
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
 The associated optimal policy.
 
Set< const DiscreteVariable * > _elVarSeq
 A Set used to eliminate primed variables.
 
double _discountFactor
 Discount Factor used for infinite horizon planning.
 
IOperatorStrategy< double > * _operator
 
bool _verbose
 Boolean indicating whether iteration information should be displayed on the terminal.
 

Protected Member Functions

Value Iteration Methods
virtual void _initVFunction ()
 Initializes the value function.
 
virtual MultiDimFunctionGraph< double > * _valueIteration ()
 Performs a single step of value iteration.
 
Optimal policy extraction methods
virtual void _evalPolicy ()
 Performs the required tasks to extract an optimal policy.
 
Value Iteration Methods
virtual MultiDimFunctionGraph< double > * _evalQaction (const MultiDimFunctionGraph< double > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
 
virtual MultiDimFunctionGraph< double > * _maximiseQactions (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs max_a Q(s,a).
 
virtual MultiDimFunctionGraph< double > * _minimiseFunctions (std::vector< MultiDimFunctionGraph< double > *> &)
 Performs min_i F_i.
 
virtual MultiDimFunctionGraph< double > * _addReward (MultiDimFunctionGraph< double > *function, Idx actionId=0)
 Performs R(s) + gamma . function.
 
Optimal policy extraction methods
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _makeArgMax (const MultiDimFunctionGraph< double > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax.
 
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *> &)
 Performs argmax_a Q(s,a).
 
void _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From the argmax value function, whose leaves hold argmax_a Q*(s,a), extracts pi*(s): from each ArgMaxSet present at the leaves, the associated ActionSet is extracted.
 

Constructor & destructor.

 AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
 Default constructor.
 
 ~AdaptiveRMaxPlaner ()
 Default destructor.
 

Incremental methods

HashTable< Idx, StatesCounter *> __counterTable
 
HashTable< Idx, bool > __initializedTable
 
bool __initialized
 
void checkState (const Instantiation &newState, Idx actionId)
 

Incremental methods

void setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol)
 
virtual ActionSet stateOptimalPolicy (const Instantiation &curState)
 
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optPol
 
ActionSet _allActions
 

Detailed Description

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs RMax planning on the factored Markov decision process given as parameter.

Definition at line 50 of file adaptiveRMaxPlaner.h.
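
As a quick, hedged illustration of the intended lifecycle (the learner argument and the FMDP below are placeholders: any ILearningStrategy implementation and any FMDP with function-graph transitions and rewards would do; header paths are assumed):

    #include <iostream>
    #include <agrum/FMDP/fmdp.h>
    #include <agrum/FMDP/planning/adaptiveRMaxPlaner.h>

    void planSketch(const gum::ILearningStrategy* myLearner,
                    const gum::FMDP< double >*    myFmdp) {
      // Factory method: builds a planner backed by a TreeOperatorStrategy
      gum::AdaptiveRMaxPlaner* planner =
         gum::AdaptiveRMaxPlaner::TreeInstance(myLearner, 0.9, 1e-5, false);

      planner->initialize(myFmdp);   // must precede the first makePlanning()
      planner->makePlanning(1000);   // at most 1000 value-iteration steps

      // Dump the optimal policy in dot format, with action names at the leaves
      std::cout << planner->optimalPolicy2String() << std::endl;
      delete planner;
    }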

Constructor & Destructor Documentation

◆ AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::AdaptiveRMaxPlaner ( IOperatorStrategy< double > *  opi,
double  discountFactor,
double  epsilon,
const ILearningStrategy *  learner,
bool  verbose 
)
private

Default constructor.

Definition at line 60 of file adaptiveRMaxPlaner.cpp.

Referenced by ReducedAndOrderedInstance(), and TreeInstance().

  AdaptiveRMaxPlaner::AdaptiveRMaxPlaner(IOperatorStrategy< double >* opi,
                                         double                   discountFactor,
                                         double                   epsilon,
                                         const ILearningStrategy* learner,
                                         bool                     verbose) :
      StructuredPlaner(opi, discountFactor, epsilon, verbose),
      IDecisionStrategy(), __fmdpLearner(learner), __initialized(false) {
    GUM_CONSTRUCTOR(AdaptiveRMaxPlaner);
  }

◆ ~AdaptiveRMaxPlaner()

gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner ( )

Default destructor.

Definition at line 73 of file adaptiveRMaxPlaner.cpp.

References __counterTable.

Referenced by TreeInstance().

  AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner() {
    GUM_DESTRUCTOR(AdaptiveRMaxPlaner);

    for (HashTableIteratorSafe< Idx, StatesCounter* > scIter =
            __counterTable.beginSafe();
         scIter != __counterTable.endSafe();
         ++scIter)
      delete scIter.val();
  }

Member Function Documentation

◆ __clearTables()

void gum::AdaptiveRMaxPlaner::__clearTables ( )
private

Definition at line 342 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().

Referenced by makePlanning(), and TreeInstance().

  void AdaptiveRMaxPlaner::__clearTables() {
    for (auto actionIter = this->fmdp()->beginActions();
         actionIter != this->fmdp()->endActions();
         ++actionIter) {
      delete __actionsBoolTable[*actionIter];
      delete __actionsRMaxTable[*actionIter];
    }
    __actionsRMaxTable.clear();
    __actionsBoolTable.clear();
  }

◆ __makeRMaxFunctionGraphs()

void gum::AdaptiveRMaxPlaner::__makeRMaxFunctionGraphs ( )
private

Definition at line 235 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, __counterTable, __fmdpLearner, __rmax, __rThreshold, __visitLearner(), gum::StructuredPlaner< double >::_discountFactor, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_minimiseFunctions(), gum::StructuredPlaner< double >::_operator, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::beginVariables(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::ILearningStrategy::modaMax(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::ILearningStrategy::rMax(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by makePlanning(), and TreeInstance().

  void AdaptiveRMaxPlaner::__makeRMaxFunctionGraphs() {
    __rThreshold =
       __fmdpLearner->modaMax() * 5 > 30 ? __fmdpLearner->modaMax() * 5 : 30;
    __rmax = __fmdpLearner->rMax() / (1.0 - this->_discountFactor);

    for (auto actionIter = this->fmdp()->beginActions();
         actionIter != this->fmdp()->endActions();
         ++actionIter) {
      std::vector< MultiDimFunctionGraph< double >* > rmaxs;
      std::vector< MultiDimFunctionGraph< double >* > boolQs;

      for (auto varIter = this->fmdp()->beginVariables();
           varIter != this->fmdp()->endVariables();
           ++varIter) {
        const IVisitableGraphLearner* visited = __counterTable[*actionIter];

        // (reconstructed: the generated listing elided these declarations;
        // getFunctionInstance() is listed in the references above)
        MultiDimFunctionGraph< double >* varRMax =
           _operator->getFunctionInstance();
        MultiDimFunctionGraph< double >* varBoolQ =
           _operator->getFunctionInstance();

        visited->insertSetOfVars(varRMax);
        visited->insertSetOfVars(varBoolQ);

        std::pair< NodeId, NodeId > rooty =
           __visitLearner(visited, visited->root(), varRMax, varBoolQ);
        varRMax->manager()->setRootNode(rooty.first);
        varRMax->manager()->reduce();
        varRMax->manager()->clean();
        varBoolQ->manager()->setRootNode(rooty.second);
        varBoolQ->manager()->reduce();
        varBoolQ->manager()->clean();

        rmaxs.push_back(varRMax);
        boolQs.push_back(varBoolQ);

        // (commented-out debug dumps of the transition and the two graphs
        // elided for readability)
      }

      __actionsRMaxTable.insert(*actionIter, this->_maximiseQactions(rmaxs));
      __actionsBoolTable.insert(*actionIter, this->_minimiseFunctions(boolQs));
    }
  }

◆ __visitLearner()

std::pair< NodeId, NodeId > gum::AdaptiveRMaxPlaner::__visitLearner ( const IVisitableGraphLearner *  visited,
NodeId  currentNodeId,
MultiDimFunctionGraph< double > *  rmax,
MultiDimFunctionGraph< double > *  boolQ 
)
private

Definition at line 306 of file adaptiveRMaxPlaner.cpp.

References __rmax, __rThreshold, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.

Referenced by __makeRMaxFunctionGraphs(), and TreeInstance().

  std::pair< NodeId, NodeId >
     AdaptiveRMaxPlaner::__visitLearner(const IVisitableGraphLearner* visited,
                                        NodeId                        currentNodeId,
                                        MultiDimFunctionGraph< double >* rmax,
                                        MultiDimFunctionGraph< double >* boolQ) {
    std::pair< NodeId, NodeId > rep;
    if (visited->isTerminal(currentNodeId)) {
      rep.first = rmax->manager()->addTerminalNode(
         visited->nodeNbObservation(currentNodeId) < __rThreshold ? __rmax : 0.0);
      rep.second = boolQ->manager()->addTerminalNode(
         visited->nodeNbObservation(currentNodeId) < __rThreshold ? 0.0 : 1.0);
      return rep;
    }

    NodeId* rmaxsons = static_cast< NodeId* >(SOA_ALLOCATE(
       sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
    NodeId* bqsons = static_cast< NodeId* >(SOA_ALLOCATE(
       sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));

    for (Idx moda = 0; moda < visited->nodeVar(currentNodeId)->domainSize();
         ++moda) {
      std::pair< NodeId, NodeId > sonp = __visitLearner(
         visited, visited->nodeSon(currentNodeId, moda), rmax, boolQ);
      rmaxsons[moda] = sonp.first;
      bqsons[moda] = sonp.second;
    }

    rep.first =
       rmax->manager()->addInternalNode(visited->nodeVar(currentNodeId), rmaxsons);
    rep.second =
       boolQ->manager()->addInternalNode(visited->nodeVar(currentNodeId), bqsons);
    return rep;
  }
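
The thresholding in __visitLearner is the heart of the RMax optimism. A leaf whose state has been observed fewer than __rThreshold times receives the optimistic value __rmax in the rmax graph and 0 in the boolean graph; a sufficiently explored leaf receives 0 and 1 respectively. Since __makeRMaxFunctionGraphs() sets

    __rmax = rMax() / (1 - _discountFactor)

(the discounted value of earning the maximal reward forever), under-explored states look at least as attractive as anything achievable, which steers value iteration toward exploring them.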

◆ _addReward()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_addReward ( MultiDimFunctionGraph< double > *  function,
Idx  actionId = 0 
)
protected, virtual, inherited

Performs R(s) + gamma . function.

Warning
The input function is deleted; a new one is returned.

Definition at line 405 of file structuredPlaner_tpl.h.

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), and RECAST.

Referenced by _evalPolicy(), and _valueIteration().

  MultiDimFunctionGraph< double >*
     StructuredPlaner< double >::_addReward(MultiDimFunctionGraph< double >* Vold,
                                            Idx actionId) {
    // (reconstructed: the generated listing elided this declaration;
    // getFunctionInstance() appears in the references above)
    MultiDimFunctionGraph< double >* newVFunction =
       _operator->getFunctionInstance();

    // ... we multiply the result by the discount factor, ...
    newVFunction->copyAndMultiplyByScalar(*Vold, this->_discountFactor);
    delete Vold;

    // ... and finally add the reward
    newVFunction = _operator->add(newVFunction, RECAST(_fmdp->reward(actionId)));

    return newVFunction;
  }

◆ _argmaximiseQactions()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::_argmaximiseQactions ( std::vector< MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * > &  qActionsSet)
protected, virtual, inherited

Performs argmax_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 537 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy().

  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
     StructuredPlaner< double >::_argmaximiseQactions(
        std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >,
                                            SetTerminalNodePolicy >* >& qActionsSet) {
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       newVFunction = qActionsSet.back();
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
         qAction = qActionsSet.back();
      qActionsSet.pop_back();
      newVFunction = _operator->argmaximize(newVFunction, qAction);
    }

    return newVFunction;
  }

◆ _evalPolicy()

void gum::AdaptiveRMaxPlaner::_evalPolicy ( )
protected, virtual

Performs the required tasks to extract an optimal policy.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 191 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_argmaximiseQactions(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_extractOptimalPolicy(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_makeArgMax(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

  void AdaptiveRMaxPlaner::_evalPolicy() {
    // *****************************************************************************************
    // Loop reset
    MultiDimFunctionGraph< double >* newVFunction =
       _operator->getFunctionInstance();   // (call elided in the generated listing)
    newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

    std::vector<
       MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* >
       argMaxQActionsSet;
    // *****************************************************************************************
    // For each action
    for (auto actionIter = _fmdp->beginActions();
         actionIter != _fmdp->endActions();
         ++actionIter) {
      MultiDimFunctionGraph< double >* qAction =
         this->_evalQaction(newVFunction, *actionIter);

      qAction = this->_addReward(qAction, *actionIter);

      // RMax substitution: zero the Q values of under-explored regions and
      // replace them by the optimistic RMax graph
      qAction = this->_operator->maximize(
         __actionsRMaxTable[*actionIter],
         this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
         2);

      argMaxQActionsSet.push_back(_makeArgMax(qAction, *actionIter));
    }
    delete newVFunction;

    // *****************************************************************************************
    // To evaluate the main value function, we maximise over all action values...
    MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
       argMaxVFunction = _argmaximiseQactions(argMaxQActionsSet);

    // *****************************************************************************************
    // ... and extract the optimal policy from the argmax
    _extractOptimalPolicy(argMaxVFunction);
  }

◆ _evalQaction()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_evalQaction ( const MultiDimFunctionGraph< double > *  Vold,
Idx  actionId 
)
protected, virtual, inherited

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 350 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy(), and _valueIteration().

  MultiDimFunctionGraph< double >* StructuredPlaner< double >::_evalQaction(
     const MultiDimFunctionGraph< double >* Vold, Idx actionId) {
    // Initialisation:
    // Creating a copy of the last Vfunction to deduce the new Qaction from,
    // and finding the first var to eliminate (the one at the end)
    return _operator->regress(Vold, actionId, this->_fmdp, this->_elVarSeq);
  }
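
Taken together, _evalQaction, _addReward and _maximiseQactions implement one classical value-iteration (Bellman) backup over the function graphs:

    V^t(s) = max_a [ R_a(s) + gamma . sum_{s'} P(s'|s,a).V^{t-1}(s') ]

_evalQaction computes the expectation term by regression, _addReward adds R_a(s) after discounting, and _maximiseQactions folds the per-action results with pairwise maximize() calls.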

◆ _extractOptimalPolicy()

void gum::StructuredPlaner< double >::_extractOptimalPolicy ( const MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protected, inherited

From the argmax value function, whose leaves hold argmax_a Q*(s,a), this function extracts pi*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 561 of file structuredPlaner_tpl.h.

Referenced by _evalPolicy().

  void StructuredPlaner< double >::_extractOptimalPolicy(
     const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
        argMaxOptimalValueFunction) {
    _optimalPolicy->clear();

    // Inserting the new variables
    for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
            argMaxOptimalValueFunction->variablesSequence().beginSafe();
         varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
         ++varIter)
      _optimalPolicy->add(**varIter);

    // (reconstructed: declaration elided in the generated listing)
    HashTable< NodeId, NodeId > src2dest;
    _optimalPolicy->manager()->setRootNode(__recurExtractOptPol(
       argMaxOptimalValueFunction->root(), argMaxOptimalValueFunction, src2dest));

    delete argMaxOptimalValueFunction;
  }

◆ _initVFunction()

void gum::AdaptiveRMaxPlaner::_initVFunction ( )
protected, virtual

Initializes the value function.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 130 of file adaptiveRMaxPlaner.cpp.

References gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::IOperatorStrategy< GUM_SCALAR >::add(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), RECASTED, gum::FMDP< GUM_SCALAR >::reward(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by TreeInstance().

  void AdaptiveRMaxPlaner::_initVFunction() {
    // (reconstructed: the generated listing elided these lines; manager(),
    // addTerminalNode() and setRootNode() appear in the references above)
    _vFunction->manager()->setRootNode(
       _vFunction->manager()->addTerminalNode(0.0));
    for (auto actionIter = _fmdp->beginActions();
         actionIter != _fmdp->endActions();
         ++actionIter)
      _vFunction = this->_operator->add(
         _vFunction, RECASTED(this->_fmdp->reward(*actionIter)), 1);
  }

◆ _makeArgMax()

MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< double >::_makeArgMax ( const MultiDimFunctionGraph< double > *  Qaction,
Idx  actionId 
)
protected, inherited

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 479 of file structuredPlaner_tpl.h.

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().

Referenced by _evalPolicy().

  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
     StructuredPlaner< double >::_makeArgMax(
        const MultiDimFunctionGraph< double >* qAction, Idx actionId) {
    // (reconstructed: the generated listing elided this call; see
    // getArgMaxFunctionInstance() in the references above)
    MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy >*
       amcpy = _operator->getArgMaxFunctionInstance();

    // Inserting the new variables
    for (SequenceIteratorSafe< const DiscreteVariable* > varIter =
            qAction->variablesSequence().beginSafe();
         varIter != qAction->variablesSequence().endSafe();
         ++varIter)
      amcpy->add(**varIter);

    // (reconstructed: declaration elided in the generated listing)
    HashTable< NodeId, NodeId > src2dest;
    amcpy->manager()->setRootNode(
       __recurArgMaxCopy(qAction->root(), actionId, qAction, amcpy, src2dest));

    delete qAction;
    return amcpy;
  }

◆ _maximiseQactions()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_maximiseQactions ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protected, virtual, inherited

Performs max_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Definition at line 366 of file structuredPlaner_tpl.h.

Referenced by __makeRMaxFunctionGraphs(), and _valueIteration().

  MultiDimFunctionGraph< double >* StructuredPlaner< double >::_maximiseQactions(
     std::vector< MultiDimFunctionGraph< double >* >& qActionsSet) {
    MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
      qActionsSet.pop_back();
      newVFunction = _operator->maximize(newVFunction, qAction);
    }

    return newVFunction;
  }

◆ _minimiseFunctions()

MultiDimFunctionGraph< double > * gum::StructuredPlaner< double >::_minimiseFunctions ( std::vector< MultiDimFunctionGraph< double > * > &  qActionsSet)
protected, virtual, inherited

Performs min_i F_i.

Warning
Performs also the deallocation of the F_i

Definition at line 386 of file structuredPlaner_tpl.h.

Referenced by __makeRMaxFunctionGraphs().

  MultiDimFunctionGraph< double >* StructuredPlaner< double >::_minimiseFunctions(
     std::vector< MultiDimFunctionGraph< double >* >& qActionsSet) {
    MultiDimFunctionGraph< GUM_SCALAR >* newVFunction = qActionsSet.back();
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      MultiDimFunctionGraph< GUM_SCALAR >* qAction = qActionsSet.back();
      qActionsSet.pop_back();
      newVFunction = _operator->minimize(newVFunction, qAction);
    }

    return newVFunction;
  }

◆ _valueIteration()

MultiDimFunctionGraph< double > * gum::AdaptiveRMaxPlaner::_valueIteration ( )
protected, virtual

Performs a single step of value iteration.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 143 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

  MultiDimFunctionGraph< double >* AdaptiveRMaxPlaner::_valueIteration() {
    // *****************************************************************************************
    // Loop reset
    MultiDimFunctionGraph< double >* newVFunction =
       _operator->getFunctionInstance();   // (call elided in the generated listing)
    newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());

    // *****************************************************************************************
    // For each action
    std::vector< MultiDimFunctionGraph< double >* > qActionsSet;
    for (auto actionIter = _fmdp->beginActions();
         actionIter != _fmdp->endActions();
         ++actionIter) {
      MultiDimFunctionGraph< double >* qAction =
         _evalQaction(newVFunction, *actionIter);

      // Next, we add the reward
      qAction = _addReward(qAction, *actionIter);

      // RMax substitution: mask under-explored regions and replace them by
      // the optimistic RMax values
      qAction = this->_operator->maximize(
         __actionsRMaxTable[*actionIter],
         this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
         2);

      qActionsSet.push_back(qAction);
    }
    delete newVFunction;

    // *****************************************************************************************
    // To evaluate the main value function, we maximise over all action values
    newVFunction = _maximiseQactions(qActionsSet);

    return newVFunction;
  }
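
The maximize/multiply combination above is where the exploration bonus enters each backup. Writing boolQ_a for the 0/1 exploration-indicator graph and rmax_a for the optimistic graph built by __makeRMaxFunctionGraphs(), each action's Q function is effectively replaced by

    Q~_a(s) = max( rmax_a(s), boolQ_a(s) . Q_a(s) )

so a state counted fewer than __rThreshold times keeps the optimistic value rMax()/(1 - _discountFactor), while a well-explored state keeps its learned Q value.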

◆ checkState()

void gum::AdaptiveRMaxPlaner::checkState ( const Instantiation &  newState,
Idx  actionId 
)
inline, virtual

Implements gum::IDecisionStrategy.

Definition at line 198 of file adaptiveRMaxPlaner.h.

References __counterTable, and __initializedTable.

  void AdaptiveRMaxPlaner::checkState(const Instantiation& newState, Idx actionId) {
    if (!__initializedTable[actionId]) {
      __counterTable[actionId]->reset(newState);
      __initializedTable[actionId] = true;
    } else
      __counterTable[actionId]->incState(newState);
  }
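
A minimal, hedged sketch of where checkState fits in an incremental loop (in aGrUM this loop is normally driven by gum::SDYNA; everything here except the planner calls is an assumption):

    // Hypothetical glue code: `newState` is the state just observed in the
    // environment and `lastActionId` the action that led to it.
    void incrementalStep(gum::AdaptiveRMaxPlaner* planner,
                         const gum::Instantiation& newState,
                         gum::Idx lastActionId) {
      planner->checkState(newState, lastActionId);   // update the visit counters
      planner->makePlanning(100);                    // replan with a step bound
      gum::ActionSet best = planner->stateOptimalPolicy(newState);
      gum::Idx nextAction = *best.beginSafe();       // pick any optimal action
      // ... execute nextAction in the environment ...
      (void)nextAction;
    }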

◆ fmdp()

INLINE const FMDP< double >* gum::StructuredPlaner< double >::fmdp ( )
inline, inherited

Returns a const ptr on the Factored Markov Decision Process on which we're planning.

Definition at line 134 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_fmdp.

Referenced by __clearTables(), __makeRMaxFunctionGraphs(), and TreeInstance().

  { return _fmdp; }

◆ initialize()

void gum::AdaptiveRMaxPlaner::initialize ( const FMDP< double > *  fmdp)
virtual

Initializes the data structures needed for planning.

Warning
Failing to call this method before the first call to makePlanning will result in a crash.

Reimplemented from gum::IDecisionStrategy.

Definition at line 94 of file adaptiveRMaxPlaner.cpp.

References __counterTable, __initialized, __initializedTable, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IDecisionStrategy::initialize(), gum::StructuredPlaner< GUM_SCALAR >::initialize(), and gum::HashTable< Key, Val, Alloc >::insert().

Referenced by TreeInstance().

  void AdaptiveRMaxPlaner::initialize(const FMDP< double >* fmdp) {
    if (!__initialized) {
      // (reconstructed: the generated listing elided these two calls; both
      // base-class initialize() overloads appear in the references above)
      StructuredPlaner< double >::initialize(fmdp);
      IDecisionStrategy::initialize(fmdp);
      for (auto actionIter = fmdp->beginActions();
           actionIter != fmdp->endActions();
           ++actionIter) {
        __counterTable.insert(*actionIter, new StatesCounter());
        __initializedTable.insert(*actionIter, false);
      }
      __initialized = true;
    }
  }

◆ makePlanning()

void gum::AdaptiveRMaxPlaner::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: enables you to specify how many value iterations you wish to perform. makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 111 of file adaptiveRMaxPlaner.cpp.

References __clearTables(), __makeRMaxFunctionGraphs(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().

Referenced by TreeInstance().

  void AdaptiveRMaxPlaner::makePlanning(Idx nbStep) {
    // (reconstructed: the generated listing elided these calls; they follow
    // from the references above)
    __makeRMaxFunctionGraphs();
    StructuredPlaner< double >::makePlanning(nbStep);
    __clearTables();
  }

◆ optimalPolicy()

INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy ( )
inline, virtual, inherited

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 154 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy.

  { return _optimalPolicy; }

◆ optimalPolicy2String()

std::string gum::StructuredPlaner< double >::optimalPolicy2String ( )
virtual, inherited

Provides a better toDot for the optimal policy, where the leaves show the action names instead of their ids.

Implements gum::IPlanningStrategy< double >.

Definition at line 102 of file structuredPlaner_tpl.h.

References gum::ActionSet::beginSafe(), gum::HashTable< Key, Val, Alloc >::beginSafe(), gum::Link< T >::element(), gum::ActionSet::endSafe(), gum::HashTable< Key, Val, Alloc >::endSafe(), gum::Set< Key, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::exists(), gum::HashTable< Key, Val, Alloc >::insert(), gum::HashTable< Key, Val, Alloc >::key(), gum::DiscreteVariable::label(), gum::Variable::name(), gum::InternalNode::nbSons(), gum::Link< T >::nextLink(), gum::InternalNode::nodeVar(), and gum::InternalNode::son().

  std::string StructuredPlaner< double >::optimalPolicy2String() {
    // ************************************************************************
    // Discarding the case where no \pi* has been computed
    if (!_optimalPolicy || _optimalPolicy->root() == 0)
      return "NO OPTIMAL POLICY CALCULATED YET";

    // ************************************************************************
    // Initialisation

    // Declaration of the needed string streams
    std::stringstream output;
    std::stringstream terminalStream;
    std::stringstream nonTerminalStream;
    std::stringstream arcstream;

    // First line of the toDot
    output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

    // Form lines for the internal node stream and the terminal node stream
    terminalStream << "node [shape = box];" << std::endl;
    nonTerminalStream << "node [shape = ellipse];" << std::endl;

    // For some clarity in the final string
    std::string tab = "\t";

    // To know if we already checked a node or not
    Set< NodeId > visited;

    // FIFO of nodes to visit
    std::queue< NodeId > fifo;

    // Loading the FIFO
    fifo.push(_optimalPolicy->root());
    visited << _optimalPolicy->root();


    // ************************************************************************
    // Main loop
    while (!fifo.empty()) {
      // Node to visit
      NodeId currentNodeId = fifo.front();
      fifo.pop();

      // Checking if it is terminal
      if (_optimalPolicy->isTerminalNode(currentNodeId)) {
        // Get back the associated ActionSet
        ActionSet ase = _optimalPolicy->nodeValue(currentNodeId);

        // Creating a line for this node
        terminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                       << " [label=\"" << currentNodeId << " - ";

        // Enumerating and adding to the line the associated optimal actions
        for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe();
             valIter != ase.endSafe();
             ++valIter)
          terminalStream << _fmdp->actionName(*valIter) << " ";

        // Terminating the line
        terminalStream << "\"];" << std::endl;
        continue;
      }

      // Otherwise
      {
        // Getting back the associated internal node
        const InternalNode* currentNode = _optimalPolicy->node(currentNodeId);

        // Creating a line in the internal node stream for this node
        nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId
                          << " [label=\"" << currentNodeId << " - "
                          << currentNode->nodeVar()->name() << "\"];" << std::endl;

        // Going through the sons and aggregating them according to the sons' ids
        HashTable< NodeId, LinkedList< Idx >* > sonMap;
        for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
          if (!visited.exists(currentNode->son(sonIter))) {
            fifo.push(currentNode->son(sonIter));
            visited << currentNode->son(sonIter);
          }
          if (!sonMap.exists(currentNode->son(sonIter)))
            sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
          sonMap[currentNode->son(sonIter)]->addLink(sonIter);
        }

        // Adding to the arc stream
        for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe();
             ++sonIter) {
          arcstream << tab << currentNodeId << " -> " << sonIter.key()
                    << " [label=\" ";
          Link< Idx >* modaIter = sonIter.val()->list();
          while (modaIter) {
            arcstream << currentNode->nodeVar()->label(modaIter->element());
            if (modaIter->nextLink()) arcstream << ", ";
            modaIter = modaIter->nextLink();
          }
          arcstream << "\",color=\"#00ff00\"];" << std::endl;
          delete sonIter.val();
        }
      }
    }

    // Terminating
    output << terminalStream.str() << std::endl
           << nonTerminalStream.str() << std::endl
           << arcstream.str() << std::endl
           << "}" << std::endl;

    return output.str();
  }

◆ optimalPolicySize()

virtual Size gum::StructuredPlaner< double >::optimalPolicySize ( )
inline, virtual, inherited

Returns the current size of the optimal policy computed so far.

◆ ReducedAndOrderedInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline, static

Definition at line 62 of file adaptiveRMaxPlaner.h.

References AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxMDDInstance().

  static AdaptiveRMaxPlaner* ReducedAndOrderedInstance(const ILearningStrategy* learner,
                                                       double discountFactor = 0.9,
                                                       double epsilon = 0.00001,
                                                       bool   verbose = true) {
    return new AdaptiveRMaxPlaner(new MDDOperatorStrategy< double >(),
                                  discountFactor,
                                  epsilon,
                                  learner,
                                  verbose);
  }

◆ setOptimalStrategy()

void gum::IDecisionStrategy::setOptimalStrategy ( const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *  optPol)
inline, inherited

Definition at line 87 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_optPol.

Referenced by gum::SDYNA::makePlanning().

  void IDecisionStrategy::setOptimalStrategy(
     const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* optPol) {
    _optPol =
       const_cast< MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* >(
          optPol);
  }

◆ spumddInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::spumddInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline, static, inherited

Definition at line 77 of file structuredPlaner.h.

  static StructuredPlaner< double >* spumddInstance(double discountFactor = 0.9,
                                                    double epsilon = 0.00001,
                                                    bool   verbose = true) {
    return new StructuredPlaner< GUM_SCALAR >(
       new MDDOperatorStrategy< GUM_SCALAR >(),
       discountFactor,
       epsilon,
       verbose);
  }

◆ stateOptimalPolicy()

virtual ActionSet gum::IDecisionStrategy::stateOptimalPolicy ( const Instantiation &  curState)
inline, virtual, inherited

Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.

Definition at line 94 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_allActions, and gum::IDecisionStrategy::_optPol.

Referenced by gum::E_GreedyDecider::stateOptimalPolicy(), and gum::SDYNA::takeAction().

  virtual ActionSet stateOptimalPolicy(const Instantiation& curState) {
    return (_optPol && _optPol->realSize() != 0) ? _optPol->get(curState)
                                                 : _allActions;
  }

◆ sviInstance()

static StructuredPlaner< double >* gum::StructuredPlaner< double >::sviInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline, static, inherited

Definition at line 91 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner(), and gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner().

  static StructuredPlaner< double >* sviInstance(double discountFactor = 0.9,
                                                 double epsilon = 0.00001,
                                                 bool   verbose = true) {
    return new StructuredPlaner< GUM_SCALAR >(
       new TreeOperatorStrategy< GUM_SCALAR >(),
       discountFactor,
       epsilon,
       verbose);
  }

◆ TreeInstance()

static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::TreeInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline, static

Definition at line 76 of file adaptiveRMaxPlaner.h.

References __clearTables(), __makeRMaxFunctionGraphs(), __visitLearner(), _evalPolicy(), _initVFunction(), _valueIteration(), AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::fmdp(), initialize(), makePlanning(), and ~AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxTreeInstance().

  static AdaptiveRMaxPlaner* TreeInstance(const ILearningStrategy* learner,
                                          double discountFactor = 0.9,
                                          double epsilon = 0.00001,
                                          bool   verbose = true) {
    return new AdaptiveRMaxPlaner(new TreeOperatorStrategy< double >(),
                                  discountFactor,
                                  epsilon,
                                  learner,
                                  verbose);
  }

◆ vFunction()

INLINE const MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction ( )
inline, inherited

Returns a const ptr on the value function computed so far.

Definition at line 139 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction.

  { return _vFunction; }

◆ vFunctionSize()

virtual Size gum::StructuredPlaner< double >::vFunctionSize ( )
inline, virtual, inherited

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 146 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction, and gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::realSize().

  { return _vFunction != nullptr ? _vFunction->realSize() : 0; }

Member Data Documentation

◆ __actionsBoolTable

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsBoolTable
private

◆ __actionsRMaxTable

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsRMaxTable
private

◆ __counterTable

HashTable< Idx, StatesCounter* > gum::AdaptiveRMaxPlaner::__counterTable
private

◆ __fmdpLearner

const ILearningStrategy* gum::AdaptiveRMaxPlaner::__fmdpLearner
private

Definition at line 187 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

◆ __initialized

bool gum::AdaptiveRMaxPlaner::__initialized
private

Definition at line 210 of file adaptiveRMaxPlaner.h.

Referenced by initialize().

◆ __initializedTable

HashTable< Idx, bool > gum::AdaptiveRMaxPlaner::__initializedTable
private

Definition at line 208 of file adaptiveRMaxPlaner.h.

Referenced by checkState(), and initialize().

◆ __rmax

double gum::AdaptiveRMaxPlaner::__rmax
private

Definition at line 190 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

◆ __rThreshold

double gum::AdaptiveRMaxPlaner::__rThreshold
private

Definition at line 189 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

◆ _allActions

ActionSet gum::IDecisionStrategy::_allActions
inherited

◆ _discountFactor

double gum::StructuredPlaner< double >::_discountFactor
protected, inherited

Discount Factor used for infinite horizon planning.

Definition at line 360 of file structuredPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

◆ _elVarSeq

Set< const DiscreteVariable* > gum::StructuredPlaner< double >::_elVarSeq
protected, inherited

A Set used to eliminate primed variables.

Definition at line 355 of file structuredPlaner.h.

◆ _fmdp

const FMDP< double >* gum::StructuredPlaner< double >::_fmdp
protected, inherited

The Factored Markov Decision Process describing our planning situation (NB: its transition and reward functions must be function graphs).

Definition at line 335 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().

◆ _operator

IOperatorStrategy< double >* gum::StructuredPlaner< double >::_operator
protected, inherited

◆ _optimalPolicy

MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::_optimalPolicy
protected, inherited

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient for exploitation, some translation from the _fmdp is required for a human to understand it; optimalPolicy2String() does this job.

Definition at line 350 of file structuredPlaner.h.

◆ _optPol

const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::IDecisionStrategy::_optPol
inherited

◆ _verbose

bool gum::StructuredPlaner< double >::_verbose
protected, inherited

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 368 of file structuredPlaner.h.

◆ _vFunction

MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_vFunction
protected, inherited

The Value Function computed iteratively.

Definition at line 340 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().


The documentation for this class was generated from the following files:
agrum/FMDP/planning/adaptiveRMaxPlaner.h
agrum/FMDP/planning/adaptiveRMaxPlaner.cpp