aGrUM  0.13.2
gum::AdaptiveRMaxPlaner Class Reference

<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...

#include <adaptiveRMaxPlaner.h>

+ Inheritance diagram for gum::AdaptiveRMaxPlaner:
+ Collaboration diagram for gum::AdaptiveRMaxPlaner:

Public Member Functions

Planning Methods
void initialize (const FMDP< double > *fmdp)
 Initializes the data structures needed for planning. More...
 
void makePlanning (Idx nbStep=1000000)
 Performs a value iteration. More...
 
Datastructure access methods
INLINE const FMDP< double > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we are planning. More...
 
INLINE const MultiDimFunctionGraph< double > * vFunction ()
 Returns a const ptr on the value function computed so far. More...
 
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far. More...
 
INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far. More...
 
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far. More...
 
std::string optimalPolicy2String ()
 Provides a better toDot for the optimal policy, in which leaves show action names instead of action ids. More...
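Conceptually, makePlanning(nbStep) runs value iteration until either nbStep sweeps have been performed or the value function changes by less than the planner's epsilon. The sketch below illustrates that stopping rule on a hypothetical 2-state, 2-action tabular MDP; it is self-contained and does not use the aGrUM API (all names and the toy MDP are illustrative).

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Tabular stand-in for makePlanning(): value iteration with discount gamma,
// stopped when the largest change drops below epsilon or nbStep is reached.
std::array<double, 2> valueIteration(double gamma, double epsilon,
                                     std::size_t nbStep) {
  // Toy MDP: R[s][a] is the reward, next[s][a] the deterministic successor.
  const double R[2][2] = {{0.0, 1.0}, {2.0, 0.0}};
  const int next[2][2] = {{0, 1}, {0, 1}};
  std::array<double, 2> V{0.0, 0.0};
  for (std::size_t step = 0; step < nbStep; ++step) {
    std::array<double, 2> newV{};
    for (int s = 0; s < 2; ++s) {
      double best = R[s][0] + gamma * V[next[s][0]];
      for (int a = 1; a < 2; ++a)  // max over actions of R(s,a) + gamma V(s')
        best = std::max(best, R[s][a] + gamma * V[next[s][a]]);
      newV[s] = best;
    }
    double delta = std::max(std::fabs(newV[0] - V[0]),
                            std::fabs(newV[1] - V[1]));
    V = newV;
    if (delta < epsilon) break;  // the role epsilon plays in makePlanning
  }
  return V;
}
```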
 

Static Public Member Functions

static AdaptiveRMaxPlaner * ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static AdaptiveRMaxPlaner * TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 
static StructuredPlaner< double > * sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true)
 

Protected Attributes

const FMDP< double > * _fmdp
 The Factored Markov Decision Process describing our planning situation (NB: its transitions and reward functions must be function graphs). More...
 
MultiDimFunctionGraph< double > * _vFunction
 The Value Function computed iteratively. More...
 
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
 The associated optimal policy. More...
 
Set< const DiscreteVariable * > _elVarSeq
 A Set used to eliminate primed variables. More...
 
double _discountFactor
 Discount Factor used for infinite horizon planning. More...
 
IOperatorStrategy< double > * _operator
 
bool _verbose
 Boolean indicating whether iteration information should be displayed on the terminal. More...
 

Protected Member Functions

Value Iteration Methods
virtual void _initVFunction ()
 Initializes the value function. More...
 
virtual MultiDimFunctionGraph< double > * _valueIteration ()
 Performs a single step of value iteration. More...
 
Optimal policy extraction methods
virtual void _evalPolicy ()
 Perform the required tasks to extract an optimal policy. More...
 
Value Iteration Methods
virtual MultiDimFunctionGraph< double > * _evalQaction (const MultiDimFunctionGraph< double > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. More...
 
virtual MultiDimFunctionGraph< double > * _maximiseQactions (std::vector< MultiDimFunctionGraph< double > * > &)
 Performs max_a Q(s,a) More...
 
virtual MultiDimFunctionGraph< double > * _minimiseFunctions (std::vector< MultiDimFunctionGraph< double > * > &)
 Performs min_i F_i. More...
 
virtual MultiDimFunctionGraph< double > * _addReward (MultiDimFunctionGraph< double > *function, Idx actionId=0)
 Performs R(s) + gamma * function. More...
 
Optimal policy extraction methods
MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _makeArgMax (const MultiDimFunctionGraph< double > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax. More...
 
virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * _argmaximiseQactions (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a) More...
 
void _extractOptimalPolicy (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = max_a Q*(s,a), this function extracts pi*(s). This mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet. More...
 

Constructor & destructor.

 AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
 Default constructor. More...
 
 ~AdaptiveRMaxPlaner ()
 Default destructor. More...
 

Incremental methods

HashTable< Idx, StatesCounter * > __counterTable
 
HashTable< Idx, bool > __initializedTable
 
bool __initialized
 
void checkState (const Instantiation &newState, Idx actionId)
 

Incremental methods

void setOptimalStrategy (const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol)
 
virtual ActionSet stateOptimalPolicy (const Instantiation &curState)
 
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optPol
 
ActionSet _allActions
 

Detailed Description

<agrum/FMDP/planning/adaptiveRMaxPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs R-max planning on the factored Markov decision process given as parameter.

Definition at line 50 of file adaptiveRMaxPlaner.h.
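The R-max principle behind this planner: state-action pairs that have not been observed often enough are treated as unknown and credited the optimistic value rMax/(1 - gamma), which drives exploration toward them. A minimal scalar sketch of that rule follows; the function names and the scalar framing are illustrative, not aGrUM API.

```cpp
#include <cmath>

// Optimistic value of an unknown state-action pair: receiving the maximal
// reward rMax forever, discounted by gamma (a geometric series).
double rmaxOptimisticValue(double rMax, double gamma) {
  return rMax / (1.0 - gamma);
}

// Keep the learned Q-value only once the pair has been observed at least
// `threshold` times; otherwise substitute the optimistic bound.
double rmaxShapedQ(double q, long nbObs, long threshold,
                   double rMax, double gamma) {
  return nbObs < threshold ? rmaxOptimisticValue(rMax, gamma) : q;
}
```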

Constructor & Destructor Documentation

gum::AdaptiveRMaxPlaner::AdaptiveRMaxPlaner ( IOperatorStrategy< double > *  opi,
double  discountFactor,
double  epsilon,
const ILearningStrategy *  learner,
bool  verbose 
)
private

Default constructor.

Definition at line 60 of file adaptiveRMaxPlaner.cpp.

Referenced by ReducedAndOrderedInstance(), and TreeInstance().

64  :
65  StructuredPlaner(opi, discountFactor, epsilon, verbose),
66  IDecisionStrategy(), __fmdpLearner(learner), __initialized(false) {
67  GUM_CONSTRUCTOR(AdaptiveRMaxPlaner);
68  }


gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner ( )

Default destructor.

Definition at line 73 of file adaptiveRMaxPlaner.cpp.

References __counterTable.

Referenced by TreeInstance().

73  {
74  GUM_DESTRUCTOR(AdaptiveRMaxPlaner);
75 
76  for (HashTableIteratorSafe< Idx, StatesCounter* > scIter =
77  __counterTable.beginSafe();
78  scIter != __counterTable.endSafe();
79  ++scIter)
80  delete scIter.val();
81  }


Member Function Documentation

void gum::AdaptiveRMaxPlaner::__clearTables ( )
private

Definition at line 342 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().

Referenced by makePlanning(), and TreeInstance().

342  {
343  for (auto actionIter = this->fmdp()->beginActions();
344  actionIter != this->fmdp()->endActions();
345  ++actionIter) {
346  delete __actionsBoolTable[*actionIter];
347  delete __actionsRMaxTable[*actionIter];
348  }
349  __actionsRMaxTable.clear();
350  __actionsBoolTable.clear();
351  }


void gum::AdaptiveRMaxPlaner::__makeRMaxFunctionGraphs ( )
private

Definition at line 235 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, __counterTable, __fmdpLearner, __rmax, __rThreshold, __visitLearner(), gum::StructuredPlaner< double >::_discountFactor, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_minimiseFunctions(), gum::StructuredPlaner< double >::_operator, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::beginVariables(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::ILearningStrategy::modaMax(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::ILearningStrategy::rMax(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by makePlanning(), and TreeInstance().

235  {
236  __rThreshold =
237  __fmdpLearner->modaMax() * 5 > 30 ? __fmdpLearner->modaMax() * 5 : 30;
238  __rmax = __fmdpLearner->rMax() / (1.0 - this->_discountFactor);
239 
240  for (auto actionIter = this->fmdp()->beginActions();
241  actionIter != this->fmdp()->endActions();
242  ++actionIter) {
243  std::vector< MultiDimFunctionGraph< double >* > rmaxs;
244  std::vector< MultiDimFunctionGraph< double >* > boolQs;
245 
246  for (auto varIter = this->fmdp()->beginVariables();
247  varIter != this->fmdp()->endVariables();
248  ++varIter) {
249  const IVisitableGraphLearner* visited = __counterTable[*actionIter];
250 
251  MultiDimFunctionGraph< double >* varRMax =
252  this->_operator->getFunctionInstance();
253  MultiDimFunctionGraph< double >* varBoolQ =
254  this->_operator->getFunctionInstance();
255 
256  visited->insertSetOfVars(varRMax);
257  visited->insertSetOfVars(varBoolQ);
258 
259  std::pair< NodeId, NodeId > rooty =
260  __visitLearner(visited, visited->root(), varRMax, varBoolQ);
261  varRMax->manager()->setRootNode(rooty.first);
262  varRMax->manager()->reduce();
263  varRMax->manager()->clean();
264  varBoolQ->manager()->setRootNode(rooty.second);
265  varBoolQ->manager()->reduce();
266  varBoolQ->manager()->clean();
267 
268  rmaxs.push_back(varRMax);
269  boolQs.push_back(varBoolQ);
270 
271  // std::cout << RECASTED(this->_fmdp->transition(*actionIter,
272  // *varIter))->toDot() << std::endl;
273  // for( auto varIter2 =
274  // RECASTED(this->_fmdp->transition(*actionIter,
275  // *varIter))->variablesSequence().beginSafe(); varIter2 !=
276  // RECASTED(this->_fmdp->transition(*actionIter,
277  // *varIter))->variablesSequence().endSafe(); ++varIter2 )
278  // std::cout << (*varIter2)->name() << " | ";
279  // std::cout << std::endl;
280 
281  // std::cout << varRMax->toDot() << std::endl;
282  // for( auto varIter =
283  // varRMax->variablesSequence().beginSafe(); varIter !=
284  // varRMax->variablesSequence().endSafe(); ++varIter )
285  // std::cout << (*varIter)->name() << " | ";
286  // std::cout << std::endl;
287 
288  // std::cout << varBoolQ->toDot() << std::endl;
289  // for( auto varIter =
290  // varBoolQ->variablesSequence().beginSafe(); varIter !=
291  // varBoolQ->variablesSequence().endSafe(); ++varIter )
292  // std::cout << (*varIter)->name() << " | ";
293  // std::cout << std::endl;
294  }
295 
296  // std::cout << "Maximising" << std::endl;
297  __actionsRMaxTable.insert(*actionIter, this->_maximiseQactions(rmaxs));
298  __actionsBoolTable.insert(*actionIter, this->_minimiseFunctions(boolQs));
299  }
300  }


std::pair< NodeId, NodeId > gum::AdaptiveRMaxPlaner::__visitLearner ( const IVisitableGraphLearner *  visited,
NodeId  currentNodeId,
MultiDimFunctionGraph< double > *  rmax,
MultiDimFunctionGraph< double > *  boolQ 
)
private

Definition at line 306 of file adaptiveRMaxPlaner.cpp.

References __rmax, __rThreshold, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.

Referenced by __makeRMaxFunctionGraphs(), and TreeInstance().

309  {
310  std::pair< NodeId, NodeId > rep;
311  if (visited->isTerminal(currentNodeId)) {
312  rep.first = rmax->manager()->addTerminalNode(
313  visited->nodeNbObservation(currentNodeId) < __rThreshold ? __rmax : 0.0);
314  rep.second = boolQ->manager()->addTerminalNode(
315  visited->nodeNbObservation(currentNodeId) < __rThreshold ? 0.0 : 1.0);
316  return rep;
317  }
318 
319  NodeId* rmaxsons = static_cast< NodeId* >(SOA_ALLOCATE(
320  sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
321  NodeId* bqsons = static_cast< NodeId* >(SOA_ALLOCATE(
322  sizeof(NodeId) * visited->nodeVar(currentNodeId)->domainSize()));
323 
324  for (Idx moda = 0; moda < visited->nodeVar(currentNodeId)->domainSize();
325  ++moda) {
326  std::pair< NodeId, NodeId > sonp = __visitLearner(
327  visited, visited->nodeSon(currentNodeId, moda), rmax, boolQ);
328  rmaxsons[moda] = sonp.first;
329  bqsons[moda] = sonp.second;
330  }
331 
332  rep.first =
333  rmax->manager()->addInternalNode(visited->nodeVar(currentNodeId), rmaxsons);
334  rep.second =
335  boolQ->manager()->addInternalNode(visited->nodeVar(currentNodeId), bqsons);
336  return rep;
337  }
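The terminal case above (lines 312-315) is the heart of the construction: each leaf of the learner's counter graph becomes a pair of leaves, an R-max bonus and a boolean "known" mask. Isolated as a scalar function below; the name `leafValues` and the scalar framing are ours, not aGrUM API.

```cpp
#include <utility>

// Leaf rule from __visitLearner: an under-visited leaf (nbObs < threshold)
// yields the optimistic bonus rmax and a 0.0 "known" mask; a sufficiently
// visited leaf yields no bonus and a 1.0 mask.
std::pair<double, double> leafValues(long nbObs, long threshold, double rmax) {
  return nbObs < threshold ? std::make_pair(rmax, 0.0)
                           : std::make_pair(0.0, 1.0);
}
```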


virtual MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_addReward ( MultiDimFunctionGraph< double > *  function,
Idx  actionId = 0 
)
protectedvirtualinherited

Performs R(s) + gamma * function.

Warning
function is deleted, new one is returned

Referenced by _evalPolicy(), and _valueIteration().

virtual MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::_argmaximiseQactions ( std::vector< MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > * > &  )
protectedvirtualinherited

Performs argmax_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Referenced by _evalPolicy().

void gum::AdaptiveRMaxPlaner::_evalPolicy ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 191 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_argmaximiseQactions(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_extractOptimalPolicy(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_makeArgMax(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

191  {
192  // *****************************************************************************************
193  // Loop reset
194  MultiDimFunctionGraph< double >* newVFunction =
195  this->_operator->getFunctionInstance();
196  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());
197 
198  std::vector<
199  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >* >
200  argMaxQActionsSet;
201  // *****************************************************************************************
202  // For each action
203  for (auto actionIter = _fmdp->beginActions();
204  actionIter != _fmdp->endActions();
205  ++actionIter) {
206  MultiDimFunctionGraph< double >* qAction =
207  this->_evalQaction(newVFunction, *actionIter);
208 
209  qAction = this->_addReward(qAction, *actionIter);
210 
211  qAction = this->_operator->maximize(
212  __actionsRMaxTable[*actionIter],
213  this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
214  2);
215 
216  argMaxQActionsSet.push_back(_makeArgMax(qAction, *actionIter));
217  }
218  delete newVFunction;
219 
220  // *****************************************************************************************
221  // Next to evaluate main value function, we take maximise over all action
222  // value, ...
223  MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy >*
224  argMaxVFunction = _argmaximiseQactions(argMaxQActionsSet);
225 
226  // *****************************************************************************************
227  // Next to evaluate main value function, we take maximise over all action
228  // value, ...
229  _extractOptimalPolicy(argMaxVFunction);
230  }


virtual MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_evalQaction ( const MultiDimFunctionGraph< double > *  ,
Idx   
)
protectedvirtualinherited

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Referenced by _evalPolicy(), and _valueIteration().

void gum::StructuredPlaner< double >::_extractOptimalPolicy ( const MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy > *  optimalValueFunction)
protectedinherited

From V*(s) = max_a Q*(s,a), this function extracts pi*(s). This mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
deallocate the argmax optimal value function

Referenced by _evalPolicy().

void gum::AdaptiveRMaxPlaner::_initVFunction ( )
protectedvirtual

Initializes the value function.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 130 of file adaptiveRMaxPlaner.cpp.

References gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::IOperatorStrategy< GUM_SCALAR >::add(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), RECASTED, gum::FMDP< GUM_SCALAR >::reward(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().

Referenced by TreeInstance().

130  {
131  _vFunction->manager()->setRootNode(
132  _vFunction->manager()->addTerminalNode(0.0));
133  for (auto actionIter = _fmdp->beginActions();
134  actionIter != _fmdp->endActions();
135  ++actionIter)
136  _vFunction = this->_operator->add(
137  _vFunction, RECASTED(this->_fmdp->reward(*actionIter)), 1);
138  }


MultiDimFunctionGraph< ArgMaxSet< double , Idx >, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::_makeArgMax ( const MultiDimFunctionGraph< double > *  Qaction,
Idx  actionId 
)
protectedinherited

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action Id associated to that graph
Warning
delete the original Qaction, returns its conversion

Referenced by _evalPolicy().
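The ArgMaxSet leaves produced here can be pictured as (value, set-of-action-ids) pairs, which a later argmax merges by keeping the higher-valued pair and taking the union of action sets on ties. A small self-contained analogue; `ArgMaxLeaf` and both helpers are hypothetical stand-ins, not aGrUM types.

```cpp
#include <set>
#include <utility>

// A leaf value tagged with the set of actions that achieve it.
using ArgMaxLeaf = std::pair<double, std::set<long>>;

ArgMaxLeaf makeArgMaxLeaf(double value, long actionId) {
  return {value, {actionId}};
}

// Merge two leaves under argmax: keep the larger value; on a tie,
// union the action sets (the behaviour an ArgMaxSet models).
ArgMaxLeaf argmaxCombine(const ArgMaxLeaf& a, const ArgMaxLeaf& b) {
  if (a.first > b.first) return a;
  if (b.first > a.first) return b;
  ArgMaxLeaf r = a;
  r.second.insert(b.second.begin(), b.second.end());
  return r;
}
```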

virtual MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_maximiseQactions ( std::vector< MultiDimFunctionGraph< double > * > &  )
protectedvirtualinherited

Performs max_a Q(s,a)

Warning
Performs also the deallocation of the QActions

Referenced by __makeRMaxFunctionGraphs(), and _valueIteration().

virtual MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_minimiseFunctions ( std::vector< MultiDimFunctionGraph< double > * > &  )
protectedvirtualinherited

Performs min_i F_i.

Warning
Performs also the deallocation of the F_i

Referenced by __makeRMaxFunctionGraphs().

MultiDimFunctionGraph< double > * gum::AdaptiveRMaxPlaner::_valueIteration ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 143 of file adaptiveRMaxPlaner.cpp.

References __actionsBoolTable, __actionsRMaxTable, gum::StructuredPlaner< double >::_addReward(), gum::StructuredPlaner< double >::_evalQaction(), gum::StructuredPlaner< double >::_fmdp, gum::StructuredPlaner< double >::_maximiseQactions(), gum::StructuredPlaner< double >::_operator, gum::StructuredPlaner< double >::_vFunction, gum::FMDP< GUM_SCALAR >::beginActions(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IOperatorStrategy< GUM_SCALAR >::getFunctionInstance(), gum::FMDP< GUM_SCALAR >::mapMainPrime(), gum::IOperatorStrategy< GUM_SCALAR >::maximize(), and gum::IOperatorStrategy< GUM_SCALAR >::multiply().

Referenced by TreeInstance().

143  {
144  // *****************************************************************************************
145  // Loop reset
146  MultiDimFunctionGraph< double >* newVFunction =
147  this->_operator->getFunctionInstance();
148  newVFunction->copyAndReassign(*_vFunction, _fmdp->mapMainPrime());
149 
150  // *****************************************************************************************
151  // For each action
152  std::vector< MultiDimFunctionGraph< double >* > qActionsSet;
153  for (auto actionIter = _fmdp->beginActions();
154  actionIter != _fmdp->endActions();
155  ++actionIter) {
156  MultiDimFunctionGraph< double >* qAction =
157  _evalQaction(newVFunction, *actionIter);
158 
159  // *******************************************************************************************
160  // Next, we add the reward
161  qAction = _addReward(qAction, *actionIter);
162 
163  qAction = this->_operator->maximize(
164  __actionsRMaxTable[*actionIter],
165  this->_operator->multiply(qAction, __actionsBoolTable[*actionIter], 1),
166  2);
167 
168  qActionsSet.push_back(qAction);
169  }
170  delete newVFunction;
171 
172  // *****************************************************************************************
173  // Next to evaluate main value function, we take maximise over all action
174  // value, ...
175  newVFunction = _maximiseQactions(qActionsSet);
176 
177  return newVFunction;
178  }
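The update at lines 163-166 combines three graphs leaf-wise: the learned Q is multiplied by the boolean "known" mask, then maximised with the R-max table, so unknown regions take the optimistic value while known regions keep their Q (the R-max table holds 0 at known leaves, so known negative Q-values are clamped at 0 by the max). A scalar sketch, with illustrative names:

```cpp
#include <algorithm>

// Scalar analogue of maximize(RMax_a, multiply(Q_a, Bool_a)):
// a known leaf carries (rmaxTerm = 0, mask = 1); an unknown leaf (rmax, 0).
double shapedQ(double q, bool known, double rmax) {
  double rmaxTerm = known ? 0.0 : rmax;  // ~ __actionsRMaxTable leaf
  double mask     = known ? 1.0 : 0.0;   // ~ __actionsBoolTable leaf
  return std::max(rmaxTerm, q * mask);
}
```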


void gum::AdaptiveRMaxPlaner::checkState ( const Instantiation newState,
Idx  actionId 
)
inlinevirtual

Implements gum::IDecisionStrategy.

Definition at line 198 of file adaptiveRMaxPlaner.h.

References __counterTable, and __initializedTable.

198  {
199  if (!__initializedTable[actionId]) {
200  __counterTable[actionId]->reset(newState);
201  __initializedTable[actionId] = true;
202  } else
203  __counterTable[actionId]->incState(newState);
204  }
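The logic above lazily resets a per-action state counter on its first observation and increments it afterwards. The stand-in below mirrors that pattern with std::unordered_map in place of gum::HashTable and a plain visit count in place of StatesCounter; both simplifications (and the helper `observeN`) are ours, not aGrUM API.

```cpp
#include <unordered_map>

struct CounterTable {
  std::unordered_map<long, long> counts;       // ~ __counterTable
  std::unordered_map<long, bool> initialized;  // ~ __initializedTable
  void checkState(long actionId) {
    if (!initialized[actionId]) {  // first observation: reset the counter
      counts[actionId] = 1;
      initialized[actionId] = true;
    } else {                       // subsequent observations: increment it
      ++counts[actionId];
    }
  }
};

// Observe `n` states for one action and return the resulting count.
long observeN(long n) {
  CounterTable t;
  for (long i = 0; i < n; ++i) t.checkState(7);
  return t.counts[7];
}
```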
INLINE const FMDP< double >* gum::StructuredPlaner< double >::fmdp ( )
inlineinherited

Returns a const pointer to the Factored Markov Decision Process on which we are planning.

Definition at line 131 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_fmdp.

Referenced by __clearTables(), __makeRMaxFunctionGraphs(), and TreeInstance().

{ return _fmdp; }
const FMDP< double > * _fmdp
The Factored Markov Decision Process describing our planning situation (NB: this one must have function graphs as transition and reward functions).
void gum::AdaptiveRMaxPlaner::initialize ( const FMDP< double > *  fmdp)
virtual

Initializes data structure needed for making the planning.

Warning
Failing to call this method before the first call to makePlanning() will result in a crash.

Reimplemented from gum::IDecisionStrategy.

Definition at line 94 of file adaptiveRMaxPlaner.cpp.

References __counterTable, __initialized, __initializedTable, gum::FMDP< GUM_SCALAR >::beginActions(), gum::FMDP< GUM_SCALAR >::endActions(), gum::IDecisionStrategy::initialize(), gum::StructuredPlaner< GUM_SCALAR >::initialize(), and gum::HashTable< Key, Val, Alloc >::insert().

Referenced by TreeInstance().

{
  if (!__initialized) {
    StructuredPlaner< double >::initialize(fmdp);
    IDecisionStrategy::initialize(fmdp);
    for (auto actionIter = fmdp->beginActions();
         actionIter != fmdp->endActions();
         ++actionIter) {
      __counterTable.insert(*actionIter, new StatesCounter());
      __initializedTable.insert(*actionIter, false);
    }
    __initialized = true;
  }
}
HashTable< Idx, StatesCounter * > __counterTable
virtual void initialize(const FMDP< double > *fmdp)
Initializes the learner.
HashTable< Idx, bool > __initializedTable
SequenceIteratorSafe< Idx > beginActions() const
Returns an iterator reference to the beginning of the list of actions.
Definition: fmdp.h:134
virtual void initialize(const FMDP< GUM_SCALAR > *fmdp)
Initializes data structure needed for making the planning.
SequenceIteratorSafe< Idx > endActions() const
Returns an iterator reference to the end of the list of actions.
Definition: fmdp.h:141
value_type & insert(const Key &key, const Val &val)
Adds a new element (actually a copy of this element) into the hash table.
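initialize() performs a one-shot, per-action setup: one fresh counter and one "seen" flag per action, guarded against double initialization. A hedged sketch of that setup, with std::unordered_map and int action ids standing in for gum::HashTable and Idx (hypothetical names, not aGrUM's):

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct RMaxTables {
  std::unordered_map<int, int> counterTable;       // action id -> counter (stub)
  std::unordered_map<int, bool> initializedTable;  // action id -> first state seen?
  bool initialized = false;

  // Mirrors initialize(): run once, insert one entry per action.
  void initialize(const std::vector<int>& actions) {
    if (!initialized) {  // guard: a second call must be a no-op
      for (int a : actions) {
        counterTable.emplace(a, 0);
        initializedTable.emplace(a, false);
      }
      initialized = true;
    }
  }
};
```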


void gum::AdaptiveRMaxPlaner::makePlanning ( Idx  nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: enables you to specify how many value iterations you wish to perform. makePlanning will then stop either when the optimal value function is reached or when nbStep iterations have been performed.

Reimplemented from gum::StructuredPlaner< double >.

Definition at line 111 of file adaptiveRMaxPlaner.cpp.

References __clearTables(), __makeRMaxFunctionGraphs(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().

Referenced by TreeInstance().

{
  __makeRMaxFunctionGraphs();
  StructuredPlaner< double >::makePlanning(nbStep);
  __clearTables();
}
virtual void makePlanning(Idx nbStep=1000000)
Performs a value iteration.
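The stopping rule described above (converge or run out of steps) can be sketched on a toy one-state MDP: repeat the Bellman backup until the change in the value falls below epsilon or nbStep iterations elapse. The reward and discount values here are made up for illustration; this is not aGrUM's implementation.

```cpp
#include <cassert>
#include <cmath>

// Returns the number of backups performed; writes the final value to *vOut.
int valueIteration(double reward, double gamma, double epsilon, int nbStep,
                   double* vOut) {
  double v = 0.0;
  int steps = 0;
  while (steps < nbStep) {
    double vNew = reward + gamma * v;  // trivial one-state Bellman backup
    ++steps;
    if (std::fabs(vNew - v) < epsilon) {  // optimal value function reached
      v = vNew;
      break;
    }
    v = vNew;
  }
  *vOut = v;
  return steps;
}
```

With reward 1 and gamma 0.9 the fixed point is 1 / (1 - 0.9) = 10, and convergence takes far fewer than the default 1000000 steps.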


INLINE const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< double >::optimalPolicy ( )
inline virtual inherited

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< double >.

Definition at line 151 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_optimalPolicy.

{
  return _optimalPolicy;
}
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optimalPolicy
The associated optimal policy.
std::string gum::StructuredPlaner< double >::optimalPolicy2String ( )
virtual inherited

Provide a better toDot for the optimal policy where the leaves have the action name instead of its id.

Implements gum::IPlanningStrategy< double >.

virtual Size gum::StructuredPlaner< double >::optimalPolicySize ( )
inline virtual inherited
static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline static

Definition at line 62 of file adaptiveRMaxPlaner.h.

References AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxMDDInstance().

{
  return new AdaptiveRMaxPlaner(new MDDOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}
AdaptiveRMaxPlaner(IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
Default constructor.
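ReducedAndOrderedInstance() and TreeInstance() are static factories that build the same planner type while injecting a different operator strategy (MDD-based vs. tree-based). A minimal sketch of that design, with all type names below as illustrative stand-ins rather than aGrUM classes:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Strategy interface and two concrete strategies (stand-ins).
struct OperatorStrategy {
  virtual ~OperatorStrategy() = default;
  virtual std::string name() const = 0;
};
struct MDDStrategy : OperatorStrategy {
  std::string name() const override { return "mdd"; }
};
struct TreeStrategy : OperatorStrategy {
  std::string name() const override { return "tree"; }
};

struct Planner {
  std::unique_ptr<OperatorStrategy> op;
  double discountFactor;
  double epsilon;

  // Two factories, one planner type: only the injected strategy differs.
  static Planner reducedAndOrderedInstance(double gamma = 0.9, double eps = 1e-5) {
    return Planner{std::make_unique<MDDStrategy>(), gamma, eps};
  }
  static Planner treeInstance(double gamma = 0.9, double eps = 1e-5) {
    return Planner{std::make_unique<TreeStrategy>(), gamma, eps};
  }
};
```

Keeping the strategy behind an interface lets the planning loop stay identical while the data structure used for the computations (decision diagram or tree) is swapped at construction time.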


void gum::IDecisionStrategy::setOptimalStrategy ( const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *  optPol)
inline inherited

Definition at line 87 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_optPol.

Referenced by gum::SDYNA::makePlanning().

{
  _optPol =
    const_cast< MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* >(
      optPol);
}
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optPol
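setOptimalStrategy() caches a non-owning pointer to a policy that is handed in through a const pointer, dropping const with const_cast so later non-const member calls are possible. A stripped-down sketch of that caching pattern, with stub types in place of aGrUM's:

```cpp
#include <cassert>

struct Policy {  // stand-in for the optimal-policy function graph
  int size = 0;
};

struct DecisionStrategy {
  Policy* optPol = nullptr;  // non-owning; lifetime managed by the caller

  void setOptimalStrategy(const Policy* p) {
    // Mirrors the documented const_cast: store a mutable view of the
    // caller's const pointer without taking ownership.
    optPol = const_cast<Policy*>(p);
  }
};
```

The const_cast is safe only because the pointed-to object was not originally declared const; the strategy merely observes a policy owned elsewhere.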

static StructuredPlaner< double >* gum::StructuredPlaner< double >::spumddInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline static inherited

Definition at line 77 of file structuredPlaner.h.

{
  return new StructuredPlaner< GUM_SCALAR >(
    new MDDOperatorStrategy< GUM_SCALAR >(), discountFactor, epsilon, verbose);
}
virtual ActionSet gum::IDecisionStrategy::stateOptimalPolicy ( const Instantiation &  curState)
inline virtual inherited

Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.

Definition at line 94 of file IDecisionStrategy.h.

References gum::IDecisionStrategy::_allActions, and gum::IDecisionStrategy::_optPol.

Referenced by gum::E_GreedyDecider::stateOptimalPolicy(), and gum::SDYNA::takeAction().

{
  return (_optPol && _optPol->realSize() != 0) ? _optPol->get(curState)
                                               : _allActions;
}
const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * _optPol
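The body above implements a fallback: if an optimal policy has been stored and is non-empty, return its action set for the current state; otherwise return the full action set. A self-contained sketch of that rule, with std::map/std::set and int states standing in for aGrUM's function graph and Instantiation (hypothetical names throughout):

```cpp
#include <cassert>
#include <map>
#include <set>

using ActionSet = std::set<int>;

struct Decider {
  std::map<int, ActionSet> optPol;  // state -> best actions (may be empty)
  ActionSet allActions;             // fallback when no policy is available

  ActionSet stateOptimalPolicy(int curState) const {
    auto it = optPol.find(curState);
    // Use the stored policy only when it actually prescribes something
    // for this state; otherwise every action remains eligible.
    return (!optPol.empty() && it != optPol.end() && !it->second.empty())
               ? it->second
               : allActions;
  }
};
```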


static StructuredPlaner< double >* gum::StructuredPlaner< double >::sviInstance ( double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline static inherited

Definition at line 88 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner(), and gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner().

{
  return new StructuredPlaner< GUM_SCALAR >(
    new TreeOperatorStrategy< GUM_SCALAR >(),
    discountFactor,
    epsilon,
    verbose);
}
static AdaptiveRMaxPlaner* gum::AdaptiveRMaxPlaner::TreeInstance ( const ILearningStrategy *  learner,
double  discountFactor = 0.9,
double  epsilon = 0.00001,
bool  verbose = true 
)
inline static

Definition at line 76 of file adaptiveRMaxPlaner.h.

References __clearTables(), __makeRMaxFunctionGraphs(), __visitLearner(), _evalPolicy(), _initVFunction(), _valueIteration(), AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::fmdp(), initialize(), makePlanning(), and ~AdaptiveRMaxPlaner().

Referenced by gum::SDYNA::RMaxTreeInstance().

{
  return new AdaptiveRMaxPlaner(new TreeOperatorStrategy< double >(),
                                discountFactor,
                                epsilon,
                                learner,
                                verbose);
}
AdaptiveRMaxPlaner(IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose)
Default constructor.


INLINE const MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::vFunction ( )
inline inherited

Returns a const ptr on the value function computed so far.

Definition at line 136 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction.

{
  return _vFunction;
}
MultiDimFunctionGraph< double > * _vFunction
The Value Function computed iteratively.
virtual Size gum::StructuredPlaner< double >::vFunctionSize ( )
inline virtual inherited

Returns vFunction computed so far current size.

Implements gum::IPlanningStrategy< double >.

Definition at line 143 of file structuredPlaner.h.

References gum::StructuredPlaner< GUM_SCALAR >::_vFunction, and gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::realSize().

{
  return _vFunction != nullptr ? _vFunction->realSize() : 0;
}
virtual Size realSize() const
Returns the real number of parameters used for this table.
MultiDimFunctionGraph< double > * _vFunction
The Value Function computed iteratively.

Member Data Documentation

HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsBoolTable
private
HashTable< Idx, MultiDimFunctionGraph< double >* > gum::AdaptiveRMaxPlaner::__actionsRMaxTable
private
HashTable< Idx, StatesCounter* > gum::AdaptiveRMaxPlaner::__counterTable
private
const ILearningStrategy* gum::AdaptiveRMaxPlaner::__fmdpLearner
private

Definition at line 187 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

bool gum::AdaptiveRMaxPlaner::__initialized
private

Definition at line 210 of file adaptiveRMaxPlaner.h.

Referenced by initialize().

HashTable< Idx, bool > gum::AdaptiveRMaxPlaner::__initializedTable
private

Definition at line 208 of file adaptiveRMaxPlaner.h.

Referenced by checkState(), and initialize().

double gum::AdaptiveRMaxPlaner::__rmax
private

Definition at line 190 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

double gum::AdaptiveRMaxPlaner::__rThreshold
private

Definition at line 189 of file adaptiveRMaxPlaner.h.

Referenced by __makeRMaxFunctionGraphs(), and __visitLearner().

double gum::StructuredPlaner< double >::_discountFactor
protectedinherited

Discount Factor used for infinite horizon planning.

Definition at line 357 of file structuredPlaner.h.

Referenced by __makeRMaxFunctionGraphs().

Set< const DiscreteVariable* > gum::StructuredPlaner< double >::_elVarSeq
protectedinherited

A set used to eliminate primed variables.

Definition at line 352 of file structuredPlaner.h.

const FMDP< double >* gum::StructuredPlaner< double >::_fmdp
protectedinherited

The Factored Markov Decision Process describing our planning situation (NB: this one must have function graphs as transition and reward functions).

Definition at line 332 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().

IOperatorStrategy< double >* gum::StructuredPlaner< double >::_operator
protectedinherited

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient to be exploited, some translation from the _fmdp is required for a human to understand it. optimalPolicy2String() does this job.

Definition at line 347 of file structuredPlaner.h.

bool gum::StructuredPlaner< double >::_verbose
protectedinherited

Boolean indicating whether iteration information should be displayed on the terminal.

Definition at line 365 of file structuredPlaner.h.

MultiDimFunctionGraph< double >* gum::StructuredPlaner< double >::_vFunction
protectedinherited

The Value Function computed iteratively.

Definition at line 337 of file structuredPlaner.h.

Referenced by _evalPolicy(), _initVFunction(), and _valueIteration().


The documentation for this class was generated from the following files: