aGrUM  0.20.2
a C++ library for (probabilistic) graphical models
gum::SDYNA Class Reference

The general SDyna architecture abstract class. More...

#include <agrum/FMDP/SDyna/sdyna.h>


Public Member Functions

std::string toString ()
 Returns a string describing the learned FMDP and the associated optimal policy, both in DOT format. More...
 
std::string optimalPolicy2String ()
 Returns a string describing the optimal policy in DOT format. More...
 
Problem specification methods
void addAction (const Idx actionId, const std::string &actionName)
 Inserts a new action in the SDyna instance. More...
 
void addVariable (const DiscreteVariable *var)
 Inserts a new variable in the SDyna instance. More...
 
Initialization
void initialize ()
 Initializes the Sdyna instance. More...
 
void initialize (const Instantiation &initialState)
 Initializes the Sdyna instance at given state. More...
 
Incremental methods
void setCurrentState (const Instantiation &currentState)
 Sets last state visited to the given state. More...
 
Idx takeAction (const Instantiation &curState)
 Returns the id of the action the SDyna instance wants to perform in the given state. More...
 
Idx takeAction ()
 Returns the id of the action the SDyna instance wants to perform. More...
 
void feedback (const Instantiation &originalState, const Instantiation &reachedState, Idx performedAction, double obtainedReward)
 Performs a feedback on the last transition. More...
 
void feedback (const Instantiation &reachedState, double obtainedReward)
 Performs a feedback on the last transition. More...
 
void makePlanning (Idx nbStep)
 Starts a new planning. More...
 
Size methods

Methods to get the size of the different data structures, for performance evaluation purposes only.

Size learnerSize ()
 Returns the size of the data structure used by the learner. More...
 
Size modelSize ()
 Returns the size of the learnt FMDP. More...
 
Size valueFunctionSize ()
 Returns the size of the value function computed so far. More...
 
Size optimalPolicySize ()
 Returns the size of the optimal policy computed so far. More...
 

Static Public Member Functions

static SDYNA * spitiInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an ITI learner, structured value iteration and an epsilon-greedy decision strategy. More...
 
static SDYNA * spimddiInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an IMDDI learner, SPUMDD structured planning and an epsilon-greedy decision strategy. More...
 
static SDYNA * RMaxMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an IMDDI learner with an adaptive RMax planner over MDDs. More...
 
static SDYNA * RMaxTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an ITI learner with a tree-based adaptive RMax planner. More...
 
static SDYNA * RandomMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an IMDDI learner, SPUMDD structured planning and a random decision strategy. More...
 
static SDYNA * RandomTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
 Builds an SDyna instance combining an ITI learner, structured value iteration and a random decision strategy. More...
 

Protected Attributes

FMDP< double > * fmdp_
 The learnt Markovian Decision Process. More...
 
Instantiation lastState_
 The state in which the system is before we perform a new action. More...
 

Constructor & destructor.

 SDYNA (ILearningStrategy *learner, IPlanningStrategy< double > *planer, IDecisionStrategy *decider, Idx observationPhaseLenght, Idx nbValueIterationStep, bool actionReward, bool verbose=true)
 Constructor. More...
 
 ~SDYNA ()
 Destructor. More...
 

Detailed Description

The general SDyna architecture abstract class.

Instantiations of the SDyna architecture should inherit from this class.

Definition at line 66 of file sdyna.h.
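
The snippet below is a minimal usage sketch, not part of the generated documentation: the variable, the action ids and the toy transition/reward rule are made up for illustration, and the header path for LabelizedVariable may differ between aGrUM versions. It only illustrates the intended call order: build an instance through a factory, specify the problem, initialize, then alternate takeAction() and feedback().

  // Hypothetical example: a single boolean state variable and two actions.
  #include <iostream>
  #include <agrum/FMDP/SDyna/sdyna.h>
  #include <agrum/tools/variables/labelizedVariable.h>   // path may vary with the aGrUM version

  int main() {
    // Build the agent with the SPITI configuration (ITI learner + structured value iteration).
    gum::SDYNA* agent = gum::SDYNA::spitiInstance();

    // Problem specification (no effect until initialize() is called).
    gum::LabelizedVariable light("light", "is the light on ?", 2);
    agent->addVariable(&light);
    agent->addAction(1, "toggle");
    agent->addAction(2, "wait");

    // Initial state of the studied system.
    gum::Instantiation state;
    state.add(light);
    state.chgVal(light, 0);
    agent->initialize(state);

    // Explore / learn / exploit loop against a toy deterministic environment.
    for (gum::Idx step = 0; step < 200; ++step) {
      gum::Idx action = agent->takeAction();
      if (action == 1) state.chgVal(light, 1 - state.val(light));   // "toggle" flips the light
      double reward = (state.val(light) == 1) ? 1.0 : 0.0;          // reward when the light is on
      agent->feedback(state, reward);
    }

    std::cout << agent->optimalPolicy2String() << std::endl;
    delete agent;
    return 0;
  }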

Constructor & Destructor Documentation

◆ SDYNA()

gum::SDYNA::SDYNA ( ILearningStrategy *  learner,
IPlanningStrategy< double > *  planer,
IDecisionStrategy *  decider,
Idx  observationPhaseLenght,
Idx  nbValueIterationStep,
bool  actionReward,
bool  verbose = true 
)
private

Constructor.

Returns
an instance of SDyna architecture

Definition at line 57 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

63  :
64  learner__(learner),
65  planer__(planer), decider__(decider),
66  observationPhaseLenght__(observationPhaseLenght),
67  nbValueIterationStep__(nbValueIterationStep), actionReward__(actionReward),
68  verbose_(verbose) {
69  GUM_CONSTRUCTOR(SDYNA);
70 
71  fmdp_ = new FMDP< double >();
72 
73  nbObservation__ = 1;
74  }

◆ ~SDYNA()

gum::SDYNA::~SDYNA ( )

Destructor.

Definition at line 79 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

79  {
80  delete decider__;
81 
82  delete learner__;
83 
84  delete planer__;
85 
86  for (auto obsIter = bin__.beginSafe(); obsIter != bin__.endSafe(); ++obsIter)
87  delete *obsIter;
88 
89  delete fmdp_;
90 
91  GUM_DESTRUCTOR(SDYNA);
92  }

Member Function Documentation

◆ addAction()

void gum::SDYNA::addAction ( const Idx  actionId,
const std::string &  actionName 
)
inline

Inserts a new action in the SDyna instance.

Warning
Without effect until method initialize is called
Parameters
actionId: an id to identify the action
actionName: its human name

Definition at line 269 of file sdyna.h.

269  {
270  fmdp_->addAction(actionId, actionName);
271  }

◆ addVariable()

void gum::SDYNA::addVariable ( const DiscreteVariable *  var)
inline

Inserts a new variable in the SDyna instance.

Warning
Without effect until method initialize is called
Parameters
var: the variable to be added. Note that the variable may or may not have all its modalities given; if not, they will be discovered by the SDyna architecture during the process.

Definition at line 283 of file sdyna.h.

283 { fmdp_->addVariable(var); }
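
A short sketch of the expected call order (the variable name, labels and action id are made up for the example): as the warning above states, addVariable() and addAction() only take effect once initialize() is called.

  gum::SDYNA* agent = gum::SDYNA::spimddiInstance();

  // Declare the problem first: these calls only take effect once initialize() is called.
  gum::LabelizedVariable status("status", "machine status", 0);   // modalities may be incomplete;
  status.addLabel("idle");                                        // missing ones will be discovered
  agent->addVariable(&status);                                    // by the SDyna architecture
  agent->addAction(1, "repair");

  agent->initialize();   // the variables and actions declared above are now actually inserted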

◆ feedback() [1/2]

void gum::SDYNA::feedback ( const Instantiation &  originalState,
const Instantiation &  reachedState,
Idx  performedAction,
double  obtainedReward 
)

Performs a feedback on the last transition.

Incremental methods.

In other words, learn from the transition.

Parameters
originalState: the state we were in before the transition
reachedState: the state we reached after
performedAction: the action we performed
obtainedReward: the reward we obtained

Definition at line 130 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

133  {
134  lastAction__ = lastAction;
135  lastState_ = prevState;
136  feedback(curState, reward);
137  }

◆ feedback() [2/2]

void gum::SDYNA::feedback ( const Instantiation &  reachedState,
double  obtainedReward 
)

Performs a feedback on the last transition.

In other words, learn from the transition.

Parameters
reachedState: the state reached after the transition
obtainedReward: the reward obtained during the transition
Warning
Uses the original state and the performed action stored in cache. If you want to specify the original state and the performed action explicitly, see the other overload above.

Definition at line 150 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

150  {
151  Observation* obs = new Observation();
152 
153  for (auto varIter = lastState_.variablesSequence().beginSafe();
154  varIter != lastState_.variablesSequence().endSafe();
155  ++varIter)
156  obs->setModality(*varIter, lastState_.val(**varIter));
157 
158  for (auto varIter = newState.variablesSequence().beginSafe();
159  varIter != newState.variablesSequence().endSafe();
160  ++varIter) {
161  obs->setModality(fmdp_->main2prime(*varIter), newState.val(**varIter));
162 
163  if (this->actionReward__)
164  obs->setRModality(*varIter, lastState_.val(**varIter));
165  else
166  obs->setRModality(*varIter, newState.val(**varIter));
167  }
168 
169  obs->setReward(reward);
170 
171  learner__->addObservation(lastAction__, obs);
172  bin__.insert(obs);
173 
174  setCurrentState(newState);
175  decider__->checkState(newState, lastAction__);
176 
177  if (nbObservation__ % observationPhaseLenght__ == 0)
178  makePlanning(nbValueIterationStep__);
179 
180  nbObservation__++;
181  }
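
A small sketch of how the two overloads relate (agent, stateA, stateB and the reward value 1.0 are placeholders): the two-argument version relies on the state and action cached by the previous takeAction() / feedback() calls, while the four-argument version lets the caller provide them explicitly.

  // Usual online loop: the agent remembers where it was and which action it chose.
  gum::Idx a = agent->takeAction();         // caches the chosen action and the current state
  agent->feedback(stateB, 1.0);             // learn from (cached state, a) -> stateB

  // Replaying a transition recorded elsewhere: provide everything explicitly.
  agent->feedback(stateA, stateB, a, 1.0);  // (originalState, reachedState, performedAction, obtainedReward)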

◆ initialize() [1/2]

void gum::SDYNA::initialize ( )

Initializes the Sdyna instance.

Definition at line 98 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

98  {
99  learner__->initialize(fmdp_);
100  planer__->initialize(fmdp_);
101  decider__->initialize(fmdp_);
102  }

◆ initialize() [2/2]

void gum::SDYNA::initialize ( const Instantiation &  initialState)

Initializes the Sdyna instance at given state.

Parameters
initialState: the state of the studied system from which we will begin the explore, learn and exploit process

Definition at line 111 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

111  {
112  initialize();
113  setCurrentState(initialState);
114  }
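
As the code above shows, initialize(initialState) is equivalent to calling initialize() followed by setCurrentState(initialState). A short sketch (var is a placeholder for a DiscreteVariable already given to addVariable()):

  gum::Instantiation initialState;
  initialState.add(var);            // add the state variables to the instantiation
  initialState.chgVal(var, 0);      // and give each of them its initial value

  agent->initialize(initialState);  // same as: agent->initialize(); agent->setCurrentState(initialState);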

◆ learnerSize()

Size gum::SDYNA::learnerSize ( )
inline

learnerSize

Returns
the size of the data structure used by the learner.

Definition at line 412 of file sdyna.h.

412 { return learner__->size(); }

◆ makePlanning()

void gum::SDYNA::makePlanning ( Idx  nbStep)

Starts a new planning.

Parameters
nbStep: the maximal number of value iteration steps performed during this planning

Definition at line 190 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

190  {
191  if (verbose_) std::cout << "Updating decision trees ..." << std::endl;
192  learner__->updateFMDP();
193  // std::cout << << "Done" << std::endl;
194 
195  if (verbose_) std::cout << "Planning ..." << std::endl;
196  planer__->makePlanning(nbValueIterationStep);
197  // std::cout << << "Done" << std::endl;
198 
199  decider__->setOptimalStrategy(planer__->optimalPolicy());
200  }
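
A short usage sketch (agent is a placeholder for an initialized SDYNA instance that has already received some feedback): planning is normally interleaved with learning by the SDyna loop, but makePlanning() can also be called explicitly to force a bounded planning phase.

  agent->makePlanning(20);   // update the learnt model, then run at most 20 value iteration steps

  // The resulting policy can then be inspected.
  std::cout << "policy size: " << agent->optimalPolicySize() << std::endl;
  std::cout << agent->optimalPolicy2String() << std::endl;   // DOT description of the policy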

◆ modelSize()

Size gum::SDYNA::modelSize ( )
inline

modelSize

Returns
the size of the learnt FMDP.

Definition at line 420 of file sdyna.h.

420 { return fmdp_->size(); }

◆ optimalPolicy2String()

std::string gum::SDYNA::optimalPolicy2String ( )
inline

Definition at line 396 of file sdyna.h.

396 { return planer__->optimalPolicy2String(); }

◆ optimalPolicySize()

Size gum::SDYNA::optimalPolicySize ( )
inline

optimalPolicySize

Returns
the size of the optimal policy computed so far.

Definition at line 436 of file sdyna.h.

436 { return planer__->optimalPolicySize(); }

◆ RandomMDDInstance()

static SDYNA* gum::SDYNA::RandomMDDInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an IMDDI learner, SPUMDD structured planning and a random decision strategy.

Definition at line 178 of file sdyna.h.

183  {
184  bool actionReward = true;
185  ILearningStrategy* ls = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(
186  attributeSelectionThreshold,
187  actionReward,
188  similarityThreshold);
189  IPlanningStrategy< double >* ps
190  = StructuredPlaner< double >::spumddInstance(discountFactor, epsilon);
191  IDecisionStrategy* ds = new RandomDecider();
192  return new SDYNA(ls,
193  ps,
194  ds,
195  observationPhaseLenght,
196  nbValueIterationStep,
197  actionReward);
198  }

◆ RandomTreeInstance()

static SDYNA* gum::SDYNA::RandomTreeInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an ITI learner, structured value iteration and a random decision strategy.

Definition at line 203 of file sdyna.h.

207  {
208  bool actionReward = true;
209  ILearningStrategy* ls = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(
210  attributeSelectionThreshold,
211  actionReward);
212  IPlanningStrategy< double >* ps
213  = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
214  IDecisionStrategy* ds = new RandomDecider();
215  return new SDYNA(ls,
216  ps,
217  ds,
218  observationPhaseLenght,
219  nbValueIterationStep,
220  actionReward);
221  }

◆ RMaxMDDInstance()

static SDYNA* gum::SDYNA::RMaxMDDInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an IMDDI learner with an adaptive RMax planner over MDDs.

Definition at line 126 of file sdyna.h.

131  {
132  bool actionReward = true;
133  ILearningStrategy* ls = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(
134  attributeSelectionThreshold,
135  actionReward,
136  similarityThreshold);
137  AdaptiveRMaxPlaner* rm
138  = AdaptiveRMaxPlaner::ReducedAndOrderedInstance(ls,
139  discountFactor,
140  epsilon);
141  IPlanningStrategy< double >* ps = rm;
142  IDecisionStrategy* ds = rm;
143  return new SDYNA(ls,
144  ps,
145  ds,
146  observationPhaseLenght,
147  nbValueIterationStep,
148  actionReward);
149  }

◆ RMaxTreeInstance()

static SDYNA* gum::SDYNA::RMaxTreeInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an ITI learner with a tree-based adaptive RMax planner.

Definition at line 154 of file sdyna.h.

158  {
159  bool actionReward = true;
160  ILearningStrategy* ls
161  = new FMDPLearner< GTEST, GTEST, ITILEARNER >(attributeSelectionThreshold,
162  actionReward);
163  AdaptiveRMaxPlaner* rm
164  = AdaptiveRMaxPlaner::TreeInstance(ls, discountFactor, epsilon);
165  IPlanningStrategy< double >* ps = rm;
166  IDecisionStrategy* ds = rm;
167  return new SDYNA(ls,
168  ps,
169  ds,
170  observationPhaseLenght,
171  nbValueIterationStep,
172  actionReward);
173  }

◆ setCurrentState()

void gum::SDYNA::setCurrentState ( const Instantiation currentState)
inline

Sets last state visited to the given state.

During the learning process, we will consider that we were in this state before the transition.

Parameters
currentState: the state

Definition at line 325 of file sdyna.h.

325  {
326  lastState_ = currentState;
327  }

◆ spimddiInstance()

static SDYNA* gum::SDYNA::spimddiInstance ( double  attributeSelectionThreshold = 0.99,
double  similarityThreshold = 0.3,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an IMDDI learner, SPUMDD structured planning and an epsilon-greedy decision strategy.

Definition at line 98 of file sdyna.h.

103  {
104  bool actionReward = false;
105  ILearningStrategy* ls = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(
106  attributeSelectionThreshold,
107  actionReward,
108  similarityThreshold);
109  IPlanningStrategy< double >* ps
110  = StructuredPlaner< double >::spumddInstance(discountFactor,
111  epsilon,
112  false);
113  IDecisionStrategy* ds = new E_GreedyDecider();
114  return new SDYNA(ls,
115  ps,
116  ds,
117  observationPhaseLenght,
118  nbValueIterationStep,
119  actionReward,
120  false);
121  }

◆ spitiInstance()

static SDYNA* gum::SDYNA::spitiInstance ( double  attributeSelectionThreshold = 0.99,
double  discountFactor = 0.9,
double  epsilon = 1,
Idx  observationPhaseLenght = 100,
Idx  nbValueIterationStep = 10 
)
inlinestatic

Builds an SDyna instance combining an ITI learner, structured value iteration and an epsilon-greedy decision strategy.

Definition at line 75 of file sdyna.h.

79  {
80  bool actionReward = false;
81  ILearningStrategy* ls = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(
82  attributeSelectionThreshold,
83  actionReward);
84  IPlanningStrategy< double >* ps
85  = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
86  IDecisionStrategy* ds = new E_GreedyDecider();
87  return new SDYNA(ls,
88  ps,
89  ds,
90  observationPhaseLenght,
91  nbValueIterationStep,
92  actionReward);
93  }
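
All the factory methods above return a fully wired SDYNA instance (learner, planner and decision strategy already chosen); only the parameters passed on to these components differ. A small sketch with arbitrary, non-default values (the numbers carry no particular meaning):

  // SPITI agent: attributeSelectionThreshold = 0.999, discountFactor = 0.9, epsilon = 0.5,
  // 200 observations per phase, 20 value iteration steps per planning.
  gum::SDYNA* spiti = gum::SDYNA::spitiInstance(0.999, 0.9, 0.5, 200, 20);

  // RMax agent over MDDs: default parameters except similarityThreshold = 0.5 instead of 0.3.
  gum::SDYNA* rmax = gum::SDYNA::RMaxMDDInstance(0.99, 0.5);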

◆ takeAction() [1/2]

Idx gum::SDYNA::takeAction ( const Instantiation &  curState)

Returns
the id of the action the SDyna instance wants to perform
Parameters
curState: the state in which we currently are

Definition at line 208 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

208  {
209  lastState_ = curState;
210  return takeAction();
211  }

◆ takeAction() [2/2]

Idx gum::SDYNA::takeAction ( )

Returns
the id of the action the SDyna instance wants to perform

Definition at line 218 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

218  {
219  ActionSet actionSet = decider__->stateOptimalPolicy(lastState_);
220  if (actionSet.size() == 1) {
221  lastAction__ = actionSet[0];
222  } else {
223  Idx randy = (Idx)((double)std::rand() / (double)RAND_MAX * actionSet.size());
224  lastAction__ = actionSet[randy == actionSet.size() ? 0 : randy];
225  }
226  return lastAction__;
227  }

◆ toString()

std::string gum::SDYNA::toString ( )

Returns a description of the learned FMDP and of the associated optimal policy.

Returns
a string describing the learned FMDP and the associated optimal policy, both in DOT format.

Definition at line 232 of file sdyna.cpp.

References gum::Set< Key, Alloc >::emplace().

232  {
233  std::stringstream description;
234 
235  description << fmdp_->toString() << std::endl;
236  description << planer__->optimalPolicy2String() << std::endl;
237 
238  return description.str();
239  }

◆ valueFunctionSize()

Size gum::SDYNA::valueFunctionSize ( )
inline

valueFunctionSize

Returns
the size of the value function computed so far.

Definition at line 428 of file sdyna.h.

428 { return planer__->vFunctionSize(); }
IPlanningStrategy< double > * planer__
The planer used to plan an optimal strategy.
Definition: sdyna.h:453
virtual Size vFunctionSize()=0
Returns vFunction computed so far current size.

Member Data Documentation

◆ actionReward__

bool gum::SDYNA::actionReward__
private

Definition at line 474 of file sdyna.h.

◆ bin__

Set< Observation* > gum::SDYNA::bin__
private

Since SDYNA made these observations, it has to delete them when it is destroyed.

Definition at line 472 of file sdyna.h.

◆ decider__

IDecisionStrategy* gum::SDYNA::decider__
private

The decider.

Definition at line 456 of file sdyna.h.

◆ fmdp_

FMDP< double >* gum::SDYNA::fmdp_
protected

The learnt Markovian Decision Process.

Definition at line 443 of file sdyna.h.

◆ lastAction__

Idx gum::SDYNA::lastAction__
private

The last performed action.

Definition at line 469 of file sdyna.h.

◆ lastState_

Instantiation gum::SDYNA::lastState_
protected

The state in which the system is before we perform a new action.

Definition at line 446 of file sdyna.h.

◆ learner__

ILearningStrategy* gum::SDYNA::learner__
private

The learner used to learn the FMDP.

Definition at line 450 of file sdyna.h.

◆ nbObservation__

Idx gum::SDYNA::nbObservation__
private

The total number of observations made so far.

Definition at line 463 of file sdyna.h.

◆ nbValueIterationStep__

Idx gum::SDYNA::nbValueIterationStep__
private

The number of value iteration steps we perform.

Definition at line 466 of file sdyna.h.

◆ observationPhaseLenght__

Idx gum::SDYNA::observationPhaseLenght__
private

The number of observations to make before using the planner again.

Definition at line 460 of file sdyna.h.

◆ planer__

IPlanningStrategy< double >* gum::SDYNA::planer__
private

The planner used to compute an optimal strategy.

Definition at line 453 of file sdyna.h.

◆ verbose_

bool gum::SDYNA::verbose_
private

Definition at line 476 of file sdyna.h.


The documentation for this class was generated from the following files:

agrum/FMDP/SDyna/sdyna.h
sdyna.cpp