aGrUM  0.20.3
a C++ library for (probabilistic) graphical models
gum::SDYNA Class Reference

The general SDyna architecture abstract class. More...

#include <agrum/FMDP/SDyna/sdyna.h>


Public Member Functions

std::string toString ()
 Returns a string describing the learned FMDP and the associated optimal policy, both in DOT format.
 
std::string optimalPolicy2String ()
 Returns a string describing the optimal policy in DOT format.
 
Problem specification methods
void addAction (const Idx actionId, const std::string &actionName)
 Inserts a new action in the SDyna instance.
 
void addVariable (const DiscreteVariable *var)
 Inserts a new variable in the SDyna instance.
 
Initialization
void initialize ()
 Initializes the SDyna instance.
 
void initialize (const Instantiation &initialState)
 Initializes the SDyna instance at the given state.
 
Incremental methods
void setCurrentState (const Instantiation &currentState)
 Sets the last visited state to the given state.
 
Idx takeAction (const Instantiation &curState)
 Returns the id of the action to perform in the given state.
 
Idx takeAction ()
 Returns the id of the action to perform in the last known state.
 
void feedback (const Instantiation &originalState, const Instantiation &reachedState, Idx performedAction, double obtainedReward)
 Performs a feedback on the last transition.
 
void feedback (const Instantiation &reachedState, double obtainedReward)
 Performs a feedback on the last transition.
 
void makePlanning (Idx nbStep)
 Starts a new planning phase.
 
Size methods

Methods giving the size of the different data structures, for performance evaluation purposes only.

Size learnerSize ()
 Returns the current size of the learner's data structures.
 
Size modelSize ()
 Returns the current size of the learned FMDP.
 
Size valueFunctionSize ()
 Returns the current size of the value function.
 
Size optimalPolicySize ()
 Returns the current size of the optimal policy.
 

Static Public Member Functions

static SDYNA * spitiInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance implementing SPITI: a tree-based FMDP learner (chi2 tests), structured value iteration, and epsilon-greedy action selection.
 
static SDYNA * spimddiInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance implementing SPIMDDI: an MDD-based FMDP learner (IMDDI), SPUMDD planning, and epsilon-greedy action selection.
 
static SDYNA * RMaxMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an MDD-based learner with the Adaptive R-Max planner and decider (reduced and ordered instance).
 
static SDYNA * RMaxTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining a tree-based learner with the Adaptive R-Max planner and decider (tree instance).
 
static SDYNA * RandomMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance with an MDD-based learner, SPUMDD planning, and random action selection.
 
static SDYNA * RandomTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance with a tree-based learner, structured value iteration, and random action selection.
 

Protected Attributes

FMDP< double > * fmdp_
 The learned factored Markov Decision Process.
 
Instantiation lastState_
 The state in which the system is before we perform a new action.
 

Constructor & destructor.

 SDYNA (ILearningStrategy *learner, IPlanningStrategy< double > *planer, IDecisionStrategy *decider, Idx observationPhaseLenght, Idx nbValueIterationStep, bool actionReward, bool verbose=true)
 Constructor.
 
 ~SDYNA ()
 Destructor.
 

Detailed Description

The general SDyna architecture abstract class.

Implementations of the SDyna architecture should inherit from this class.

Definition at line 66 of file sdyna.h.
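
A minimal usage sketch (not taken from the library documentation): it assumes a hypothetical Environment class exposing its state variables, its action set, its current state as a gum::Instantiation, and a step() method returning the obtained reward. Only the gum::SDYNA members documented on this page are used.

#include <iostream>
#include <agrum/FMDP/SDyna/sdyna.h>

// 'Environment' is a hypothetical simulator, not part of aGrUM.
void runEpisode(Environment& env, gum::Idx nbSteps) {
  // SPITI instantiation: tree-based learner, structured value iteration,
  // epsilon-greedy action selection (see spitiInstance() below).
  gum::SDYNA* agent = gum::SDYNA::spitiInstance();

  // Problem specification: variables and actions are registered on the FMDP
  // but only take effect once initialize() is called.
  for (const auto var : env.variables())     // hypothetical accessor
    agent->addVariable(var);
  for (const auto& act : env.actions())      // hypothetical accessor
    agent->addAction(act.first, act.second);

  agent->initialize(env.state());            // start from the environment's current state

  for (gum::Idx i = 0; i < nbSteps; ++i) {
    gum::Idx action = agent->takeAction();   // action chosen for the cached state
    double   reward = env.step(action);      // hypothetical transition + reward
    agent->feedback(env.state(), reward);    // learn from the observed transition
  }

  std::cout << agent->toString() << std::endl;   // learned FMDP + policy in DOT format
  delete agent;
}

Planning is triggered from feedback() every observationPhaseLenght observations (see feedback() below), so this loop needs no explicit call to makePlanning().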

Constructor & Destructor Documentation

◆ SDYNA()

gum::SDYNA::SDYNA ( ILearningStrategy *  learner,
IPlanningStrategy< double > *  planer,
IDecisionStrategy *  decider,
Idx  observationPhaseLenght,
Idx  nbValueIterationStep,
bool  actionReward,
bool  verbose = true 
)
private

Constructor.

Returns
an instance of the SDyna architecture

Definition at line 57 of file sdyna.cpp.


gum::SDYNA::SDYNA(ILearningStrategy*           learner,
                  IPlanningStrategy< double >* planer,
                  IDecisionStrategy*           decider,
                  Idx                          observationPhaseLenght,
                  Idx                          nbValueIterationStep,
                  bool                         actionReward,
                  bool                         verbose) :
    _learner_(learner), _planer_(planer), _decider_(decider),
    _observationPhaseLenght_(observationPhaseLenght),
    _nbValueIterationStep_(nbValueIterationStep), _actionReward_(actionReward),
    verbose_(verbose) {
  GUM_CONSTRUCTOR(SDYNA);

  fmdp_ = new FMDP< double >();

  _nbObservation_ = 1;
}

◆ ~SDYNA()

gum::SDYNA::~SDYNA ( )

Destructor.

Definition at line 78 of file sdyna.cpp.


gum::SDYNA::~SDYNA() {
  delete _decider_;

  delete _learner_;

  delete _planer_;

  for (auto obsIter = _bin_.beginSafe(); obsIter != _bin_.endSafe(); ++obsIter)
    delete *obsIter;

  delete fmdp_;

  GUM_DESTRUCTOR(SDYNA);
}

Member Function Documentation

◆ addAction()

void gum::SDYNA::addAction ( const Idx  actionId,
const std::string &  actionName 
)
inline

Inserts a new action in the SDyna instance.

Warning
Has no effect until the initialize() method is called.
Parameters
actionId: an id identifying the action
actionName: its human-readable name

Definition at line 238 of file sdyna.h.

{
  fmdp_->addAction(actionId, actionName);
}

◆ addVariable()

void gum::SDYNA::addVariable ( const DiscreteVariable *  var)
inline

Inserts a new variable in the SDyna instance.

Warning
Has no effect until the initialize() method is called.
Parameters
var: the variable to add. The variable need not have all its modalities specified; missing modalities will be discovered by the SDyna architecture during the process.

Definition at line 252 of file sdyna.h.

{ fmdp_->addVariable(var); }
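
A short problem-specification sketch. The traffic-light variable, the action ids and the include path are illustrative assumptions, not taken from this page; only addVariable(), addAction() and initialize() are the documented SDyna calls.

#include <agrum/tools/variables/labelizedVariable.h>   // include path may differ across aGrUM versions
#include <agrum/FMDP/SDyna/sdyna.h>

void describeProblem(gum::SDYNA& agent) {
  // A state variable with two modalities. The pointer passed to addVariable()
  // is kept by the underlying FMDP, so the variable must outlive the agent;
  // it is made static here only for the sake of the sketch.
  static gum::LabelizedVariable light("light", "traffic light colour", 0);
  light.addLabel("red");
  light.addLabel("green");

  agent.addVariable(&light);    // no effect until initialize() is called
  agent.addAction(1, "stop");   // action ids are arbitrary but must be distinct
  agent.addAction(2, "go");

  agent.initialize();           // makes the declared variables and actions effective
}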

◆ feedback() [1/2]

void gum::SDYNA::feedback ( const Instantiation &  originalState,
const Instantiation &  reachedState,
Idx  performedAction,
double  obtainedReward 
)

Performs a feedback on the last transition.

Incremental methods.

That is, learn from the transition.

Parameters
originalState: the state we were in before the transition
reachedState: the state we reached after
performedAction: the action we performed
obtainedReward: the reward we obtained

Definition at line 129 of file sdyna.cpp.


{
  _lastAction_ = lastAction;
  lastState_   = prevState;
  feedback(curState, reward);
}

◆ feedback() [2/2]

void gum::SDYNA::feedback ( const Instantiation &  reachedState,
double  obtainedReward 
)

Performs a feedback on the last transition.

That is, learn from the transition.

Parameters
reachedState: the state reached after the transition
obtainedReward: the reward obtained during the transition
Warning
Uses the originalState and performedAction stored in cache. To specify the original state and the performed action explicitly, use the four-argument overload.

Definition at line 149 of file sdyna.cpp.


{
  Observation* obs = new Observation();

  for (auto varIter = lastState_.variablesSequence().beginSafe();
       varIter != lastState_.variablesSequence().endSafe();
       ++varIter)
    obs->setModality(*varIter, lastState_.val(**varIter));

  for (auto varIter = newState.variablesSequence().beginSafe();
       varIter != newState.variablesSequence().endSafe();
       ++varIter) {
    obs->setModality(fmdp_->main2prime(*varIter), newState.val(**varIter));

    if (this->_actionReward_)
      obs->setRModality(*varIter, lastState_.val(**varIter));
    else
      obs->setRModality(*varIter, newState.val(**varIter));
  }

  obs->setReward(reward);

  // (the call passing the observation to the learner is elided in this extract)
  _bin_.insert(obs);

  setCurrentState(newState);
  // (the lines checking the state with the decider and triggering makePlanning
  //  every _observationPhaseLenght_ observations are elided in this extract)

  _nbObservation_++;
}
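
A brief sketch contrasting the two overloads (the replayTransition helper is illustrative): the four-argument feedback() is the variant to use when the transition was not produced by the agent's own takeAction() call, for instance when replaying logged experience, while the two-argument variant relies on the cached state and action.

#include <agrum/FMDP/SDyna/sdyna.h>

// Illustrative helper: feed one externally observed transition to the agent.
void replayTransition(gum::SDYNA&               agent,
                      const gum::Instantiation& stateBefore,
                      const gum::Instantiation& stateAfter,
                      gum::Idx                  performedAction,
                      double                    obtainedReward) {
  // Caches stateBefore and performedAction, then defers to
  // feedback(stateAfter, obtainedReward), which records the observation and
  // periodically triggers a new planning phase.
  agent.feedback(stateBefore, stateAfter, performedAction, obtainedReward);
}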

◆ initialize() [1/2]

void gum::SDYNA::initialize ( )

Initializes the SDyna instance.

Definition at line 97 of file sdyna.cpp.


{
  // (body elided in this extract: the learner, planer and decider are
  //  initialised with the fmdp_ member)
}

◆ initialize() [2/2]

void gum::SDYNA::initialize ( const Instantiation &  initialState)

Initializes the SDyna instance at the given state.

Parameters
initialState: the state of the studied system from which we will begin the explore, learn and exploit process

Definition at line 110 of file sdyna.cpp.


{
  initialize();
  setCurrentState(initialState);
}

◆ learnerSize()

Size gum::SDYNA::learnerSize ( )
inline

learnerSize

Returns
the current size of the learner's data structures
Definition at line 379 of file sdyna.h.

{ return _learner_->size(); }

◆ makePlanning()

void gum::SDYNA::makePlanning ( Idx  nbStep)

Starts a new planning.

Parameters
nbStep: the maximal number of value iteration steps performed during this planning phase

Definition at line 188 of file sdyna.cpp.


{
  if (verbose_) std::cout << "Updating decision trees ..." << std::endl;
  // (the call updating the FMDP through the learner is elided in this extract)
  // std::cout << "Done" << std::endl;

  if (verbose_) std::cout << "Planning ..." << std::endl;
  _planer_->makePlanning(nbValueIterationStep);
  // std::cout << "Done" << std::endl;

  // (the call handing the resulting optimal policy to the decider is elided in this extract)
}
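
Planning normally happens from feedback() on the observation schedule, but a phase can also be forced by hand; a small sketch (the value 20 and the refreshPolicy wrapper are arbitrary illustrations):

#include <iostream>
#include <agrum/FMDP/SDyna/sdyna.h>

// Illustrative: force an extra planning phase outside the automatic schedule.
void refreshPolicy(gum::SDYNA& agent) {
  agent.makePlanning(20);   // at most 20 value iteration steps
  std::cout << "optimal policy size: " << agent.optimalPolicySize() << std::endl
            << "value function size: " << agent.valueFunctionSize() << std::endl;
}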

◆ modelSize()

Size gum::SDYNA::modelSize ( )
inline

modelSize

Returns
the current size of the learned FMDP
Definition at line 387 of file sdyna.h.

{ return fmdp_->size(); }

◆ optimalPolicy2String()

std::string gum::SDYNA::optimalPolicy2String ( )
inline

Returns a string describing the optimal policy in DOT format.

Definition at line 363 of file sdyna.h.

{ return _planer_->optimalPolicy2String(); }

◆ optimalPolicySize()

Size gum::SDYNA::optimalPolicySize ( )
inline

optimalPolicySize

Returns
the current size of the optimal policy computed so far
Definition at line 403 of file sdyna.h.

{ return _planer_->optimalPolicySize(); }

◆ RandomMDDInstance()

static SDYNA* gum::SDYNA::RandomMDDInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance with an MDD-based learner (IMDDI), SPUMDD planning, and random action selection.

Definition at line 157 of file sdyna.h.

{
  bool actionReward = true;
  ILearningStrategy* ls
     = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                     actionReward,
                                                     similarityThreshold);
  IPlanningStrategy< double >* ps   // (declaration line elided in the Doxygen extract)
     = StructuredPlaner< double >::spumddInstance(discountFactor, epsilon);
  IDecisionStrategy* ds = new RandomDecider();
  return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
}

◆ RandomTreeInstance()

static SDYNA* gum::SDYNA::RandomTreeInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance with a tree-based learner (ITI, chi2 tests), structured value iteration, and random action selection.

Definition at line 177 of file sdyna.h.

{
  bool actionReward = true;
  ILearningStrategy* ls
     = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(attributeSelectionThreshold,
                                                         actionReward);
  IPlanningStrategy< double >* ps   // (declaration line elided in the Doxygen extract)
     = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
  IDecisionStrategy* ds = new RandomDecider();
  return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
}

◆ RMaxMDDInstance()

static SDYNA* gum::SDYNA::RMaxMDDInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an MDD-based learner (IMDDI) with the Adaptive R-Max planner and decider (reduced and ordered instance).

Definition at line 119 of file sdyna.h.

{
  bool actionReward = true;
  ILearningStrategy* ls
     = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                     actionReward,
                                                     similarityThreshold);
  AdaptiveRMaxPlaner* rm
     = AdaptiveRMaxPlaner::ReducedAndOrderedInstance(ls, discountFactor, epsilon);
  IPlanningStrategy< double >* ps = rm;   // (line elided in the Doxygen extract; rm also serves as the planner)
  IDecisionStrategy* ds = rm;
  return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
}

◆ RMaxTreeInstance()

static SDYNA* gum::SDYNA::RMaxTreeInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining a tree-based learner (ITI) with the Adaptive R-Max planner and decider (tree instance).

Definition at line 140 of file sdyna.h.

{
  bool actionReward = true;
  ILearningStrategy* ls
     = new FMDPLearner< GTEST, GTEST, ITILEARNER >(attributeSelectionThreshold, actionReward);
  AdaptiveRMaxPlaner* rm = AdaptiveRMaxPlaner::TreeInstance(ls, discountFactor, epsilon);
  IPlanningStrategy< double >* ps = rm;   // (line elided in the Doxygen extract; rm also serves as the planner)
  IDecisionStrategy* ds = rm;
  return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
}

◆ setCurrentState()

void gum::SDYNA::setCurrentState ( const Instantiation &  currentState)
inline

Sets last state visited to the given state.

During the learning process, the system is considered to have been in this state before the transition.

Parameters
currentState: the state

Definition at line 294 of file sdyna.h.

{ lastState_ = currentState; }

◆ spimddiInstance()

static SDYNA* gum::SDYNA::spimddiInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance implementing SPIMDDI: an MDD-based learner (IMDDI), SPUMDD planning, and epsilon-greedy action selection.

Definition at line 93 of file sdyna.h.

{
  bool actionReward = false;
  ILearningStrategy* ls
     = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                     actionReward,
                                                     similarityThreshold);
  IPlanningStrategy< double >* ps   // (declaration line elided in the Doxygen extract)
     = StructuredPlaner< double >::spumddInstance(discountFactor, epsilon, false);
  IDecisionStrategy* ds = new E_GreedyDecider();
  return new SDYNA(ls,
                   ps,
                   ds,
                   observationPhaseLenght,
                   nbValueIterationStep,
                   actionReward,
                   false);
}

◆ spitiInstance()

static SDYNA* gum::SDYNA::spitiInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance implementing SPITI: a tree-based learner (ITI, chi2 tests), structured value iteration, and epsilon-greedy action selection.

Definition at line 75 of file sdyna.h.

{
  bool actionReward = false;
  ILearningStrategy* ls
     = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(attributeSelectionThreshold,
                                                         actionReward);
  IPlanningStrategy< double >* ps   // (declaration line elided in the Doxygen extract)
     = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
  IDecisionStrategy* ds = new E_GreedyDecider();
  return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
}

◆ takeAction() [1/2]

Idx gum::SDYNA::takeAction ( const Instantiation &  curState)

Returns
the id of the action the SDyna instance wishes to perform

Parameters
curState: the state we are currently in

Definition at line 206 of file sdyna.cpp.


{
  lastState_ = curState;
  return takeAction();
}

◆ takeAction() [2/2]

Idx gum::SDYNA::takeAction ( )
Returns
the id of the action the SDyna instance wishes to perform

Definition at line 216 of file sdyna.cpp.


{
  ActionSet actionSet = _decider_->stateOptimalPolicy(lastState_);
  if (actionSet.size() == 1) {
    _lastAction_ = actionSet[0];
  } else {
    Idx randy    = (Idx)((double)std::rand() / (double)RAND_MAX * actionSet.size());
    _lastAction_ = actionSet[randy == actionSet.size() ? 0 : randy];
  }
  return _lastAction_;
}

◆ toString()

std::string gum::SDYNA::toString ( )

Returns a description of the learned FMDP and the associated optimal policy.

Returns
a string describing the learned FMDP, and the associated optimal policy. Both in DOT language.

Definition at line 230 of file sdyna.cpp.


{
  std::stringstream description;

  description << fmdp_->toString() << std::endl;
  description << _planer_->optimalPolicy2String() << std::endl;

  return description.str();
}

◆ valueFunctionSize()

Size gum::SDYNA::valueFunctionSize ( )
inline

valueFunctionSize

Returns
the current size of the value function computed so far
Definition at line 395 of file sdyna.h.

{ return _planer_->vFunctionSize(); }

Member Data Documentation

◆ _actionReward_

bool gum::SDYNA::_actionReward_
private

Definition at line 441 of file sdyna.h.

◆ _bin_

Set< Observation* > gum::SDYNA::_bin_
private

Since SDYNA created these observations, it has to delete them when it is destroyed.

Definition at line 439 of file sdyna.h.

◆ _decider_

IDecisionStrategy* gum::SDYNA::_decider_
private

The decider.

Definition at line 423 of file sdyna.h.

◆ _lastAction_

Idx gum::SDYNA::_lastAction_
private

The last performed action.

Definition at line 436 of file sdyna.h.

◆ _learner_

ILearningStrategy* gum::SDYNA::_learner_
private

The learner used to learn the FMDP.

Definition at line 417 of file sdyna.h.

◆ _nbObservation_

Idx gum::SDYNA::_nbObservation_
private

The total number of observations made so far.

Definition at line 430 of file sdyna.h.

◆ _nbValueIterationStep_

Idx gum::SDYNA::_nbValueIterationStep_
private

The number of value iteration steps performed.

Definition at line 433 of file sdyna.h.

◆ _observationPhaseLenght_

Idx gum::SDYNA::_observationPhaseLenght_
private

The number of observations made before the planner is used again.

Definition at line 427 of file sdyna.h.

◆ _planer_

IPlanningStrategy< double >* gum::SDYNA::_planer_
private

The planner used to compute an optimal strategy.

Definition at line 420 of file sdyna.h.

◆ fmdp_

FMDP< double >* gum::SDYNA::fmdp_
protected

The learned factored Markov Decision Process.

Definition at line 410 of file sdyna.h.

◆ lastState_

Instantiation gum::SDYNA::lastState_
protected

The state in which the system is before we perform a new action.

Definition at line 413 of file sdyna.h.

◆ verbose_

bool gum::SDYNA::verbose_
private

Definition at line 443 of file sdyna.h.


The documentation for this class was generated from the following files: