aGrUM  0.20.3
a C++ library for (probabilistic) graphical models
gum::learning::DBRowGeneratorSet< ALLOC > Class Template Reference

The class used to pack sets of generators.

#include <agrum/tools/database/DBRowGeneratorSet.h>

Public Member Functions

Constructors / Destructors
 DBRowGeneratorSet (const allocator_type &alloc=allocator_type())
 default constructor

 DBRowGeneratorSet (const DBRowGeneratorSet< ALLOC > &from)
 copy constructor

 DBRowGeneratorSet (const DBRowGeneratorSet< ALLOC > &from, const allocator_type &alloc)
 copy constructor with a given allocator

 DBRowGeneratorSet (DBRowGeneratorSet< ALLOC > &&from)
 move constructor

 DBRowGeneratorSet (DBRowGeneratorSet< ALLOC > &&from, const allocator_type &alloc)
 move constructor with a given allocator

virtual DBRowGeneratorSet< ALLOC > * clone () const
 virtual copy constructor

virtual DBRowGeneratorSet< ALLOC > * clone (const allocator_type &alloc) const
 virtual copy constructor with a given allocator

virtual ~DBRowGeneratorSet ()
 destructor
 
Operators
DBRowGeneratorSet< ALLOC > & operator= (const DBRowGeneratorSet< ALLOC > &from)
 copy operator

DBRowGeneratorSet< ALLOC > & operator= (DBRowGeneratorSet< ALLOC > &&from)
 move operator

DBRowGenerator< ALLOC > & operator[] (const std::size_t i)
 returns the ith generator

const DBRowGenerator< ALLOC > & operator[] (const std::size_t i) const
 returns the ith generator
 
Accessors / Modifiers
template<template< template< typename > class > class Generator>
void insertGenerator (const Generator< ALLOC > &generator)
 inserts a new generator at the end of the set

template<template< template< typename > class > class Generator>
void insertGenerator (const Generator< ALLOC > &generator, const std::size_t i)
 inserts a new generator at the ith position of the set

std::size_t nbGenerators () const noexcept
 returns the number of generators

std::size_t size () const noexcept
 returns the number of generators (alias for nbGenerators)

bool hasRows ()
 returns true if there are still rows that can be output by the set of generators

bool setInputRow (const DBRow< DBTranslatedValue, ALLOC > &input_row)
 sets the input row from which the generators will create new rows

const DBRow< DBTranslatedValue, ALLOC > & generate ()
 generates a new output row from the input row

template<typename GUM_SCALAR >
void setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
 assigns a new Bayes net to all the generators that depend on a BN

void reset ()
 resets all the generators

void clear ()
 removes all the generators

void setColumnsOfInterest (const std::vector< std::size_t, ALLOC< std::size_t > > &cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns

void setColumnsOfInterest (std::vector< std::size_t, ALLOC< std::size_t > > &&cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns

const std::vector< std::size_t, ALLOC< std::size_t > > & columnsOfInterest () const
 returns the current set of columns of interest

allocator_type getAllocator () const
 returns the allocator used
 

Public Types

using allocator_type = ALLOC< DBTranslatedValue >
 type for the allocators passed in arguments of methods
 

Detailed Description

template<template< typename > class ALLOC = std::allocator>
class gum::learning::DBRowGeneratorSet< ALLOC >

The class used to pack sets of generators.

When learning Bayesian networks, the records of the training dataset are used to construct contingency tables that are exploited either in statistical conditional independence tests or in scores. To achieve this, the values of the DatabaseTable's records must all be observed, i.e., there should be no missing value. When this is not the case, we need to decide what to do with the records (actually the DBRows) that contain missing values. Should we discard them? Should we use an EM algorithm to substitute them by several fully-observed DBRows weighted by their probability of occurrence? Should we use a K-means algorithm to substitute them by only one DBRow of highest probability of occurrence?

DBRowGenerator classes are used to perform these substitutions. From one input DBRow, they can produce from 0 to several output DBRows. DBRowGenerator instances can be used in sequences: for example, a first DBRowGenerator can apply an EM algorithm to produce many output DBRows, and these DBRows can then feed another DBRowGenerator that only keeps those whose weight is higher than a given threshold.

The purpose of class DBRowGeneratorSet is to contain this sequence of DBRowGenerator instances. The key idea is that it makes it easier to iterate over the generated output DBRows. For instance, if we want to use a sequence of 2 generators, outputting 3 times and 4 times the DBRows they get in input respectively, we could use the following code:

gum::learning::DBRowGeneratorDuplicate<> generator3 ( col_types, 3 );
gum::learning::DBRowGeneratorDuplicate<> generator4 ( col_types, 4 );
for ( auto dbrow : database ) {
  generator3.setInputRow ( dbrow );
  while ( generator3.hasRows () ) {
    const auto& output3_dbrow = generator3.generate ();
    generator4.setInputRow ( output3_dbrow );
    while ( generator4.hasRows () ) {
      const auto& output4_dbrow = generator4.generate ();
      // do something with output4_dbrow
    }
  }
}

For each input DBRow of the DatabaseTable, these while loops output 3 x 4 = 12 identical DBRows. As can be seen, when several DBRowGenerator instances are to be used in sequence, the code is not very easy to write. The DBRowGeneratorSet simplifies the coding as follows:

gum::learning::DBRowGeneratorDuplicate<> generator3 ( col_types, 3 );
gum::learning::DBRowGeneratorDuplicate<> generator4 ( col_types, 4 );
DBRowGeneratorSet<> genset;
genset.insertGenerator ( generator3 );
genset.insertGenerator ( generator4 );
for ( auto dbrow : database ) {
  genset.setInputRow ( dbrow );
  while ( genset.hasRows () ) {
    const auto& output_dbrow = genset.generate ();
    // do something with output_dbrow
  }
}

As can be seen, whatever the number of DBRowGenerator instances packed into the DBRowGeneratorSet, only one while loop is needed to parse all the generated output DBRow instances.
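To make the chaining idea concrete, here is a minimal, self-contained sketch of how a set of generators could expose a single-loop interface. It does not use aGrUM and is not the library's implementation: ToyRow, ToyDuplicator, ToyGeneratorSet and countOutputRows are hypothetical names introduced only for illustration, and the backtracking strategy is an assumption about one possible way to flatten the nested loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DBRow: a plain vector of integer-coded values.
using ToyRow = std::vector<int>;

// Hypothetical generator that outputs each input row `nb_copies` times.
class ToyDuplicator {
public:
  explicit ToyDuplicator(std::size_t nb_copies) : nb_copies_(nb_copies) {}

  // Stores the input row; returns true if at least one output row will follow.
  bool setInputRow(const ToyRow& row) {
    row_ = row;
    remaining_ = nb_copies_;
    return remaining_ > 0;
  }

  bool hasRows() const { return remaining_ > 0; }

  const ToyRow& generate() {
    assert(remaining_ > 0);
    --remaining_;
    return row_;
  }

private:
  std::size_t nb_copies_;
  std::size_t remaining_ = 0;
  ToyRow row_;
};

// Hypothetical set that chains generators behind a single-loop interface.
class ToyGeneratorSet {
public:
  void insertGenerator(const ToyDuplicator& gen) { gens_.push_back(gen); }

  bool setInputRow(const ToyRow& row) {
    if (gens_.empty()) return ready_ = false;  // toy choice: empty set yields nothing
    gens_[0].setInputRow(row);
    return ready_ = advance(0);
  }

  bool hasRows() const { return ready_; }

  const ToyRow& generate() {
    assert(ready_);
    output_ = gens_.back().generate();  // copy: advancing may overwrite the source
    ready_ = advance(gens_.size() - 1);
    return output_;
  }

private:
  // Pushes rows down the chain from generator `i` until the last generator
  // holds a row, backtracking to earlier generators when one runs dry.
  bool advance(std::size_t i) {
    while (true) {
      if (!gens_[i].hasRows()) {
        if (i == 0) return false;  // whole chain exhausted
        --i;
      } else if (i + 1 == gens_.size()) {
        return true;               // last generator is ready
      } else {
        gens_[i + 1].setInputRow(gens_[i].generate());
        ++i;
      }
    }
  }

  std::vector<ToyDuplicator> gens_;
  ToyRow output_;
  bool ready_ = false;
};

// Counts the rows produced for a whole database using a single while loop.
std::size_t countOutputRows(ToyGeneratorSet& set, const std::vector<ToyRow>& database) {
  std::size_t count = 0;
  for (const auto& row : database) {
    set.setInputRow(row);
    while (set.hasRows()) {
      set.generate();
      ++count;
    }
  }
  return count;
}
```

With a duplicator of 3 followed by a duplicator of 4, countOutputRows yields 12 rows per input row, which matches the 3 x 4 count given above for the nested loops.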

Definition at line 112 of file DBRowGeneratorSet.h.

Member Typedef Documentation

◆ allocator_type

template<template< typename > class ALLOC = std::allocator>
using gum::learning::DBRowGeneratorSet< ALLOC >::allocator_type = ALLOC< DBTranslatedValue >

type for the allocators passed in arguments of methods

Definition at line 115 of file DBRowGeneratorSet.h.

Constructor & Destructor Documentation

◆ DBRowGeneratorSet() [1/5]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGeneratorSet< ALLOC >::DBRowGeneratorSet ( const allocator_type &  alloc = allocator_type())

default constructor

◆ DBRowGeneratorSet() [2/5]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGeneratorSet< ALLOC >::DBRowGeneratorSet ( const DBRowGeneratorSet< ALLOC > &  from)

copy constructor

◆ DBRowGeneratorSet() [3/5]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGeneratorSet< ALLOC >::DBRowGeneratorSet ( const DBRowGeneratorSet< ALLOC > &  from,
const allocator_type &  alloc 
)

copy constructor with a given allocator

◆ DBRowGeneratorSet() [4/5]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGeneratorSet< ALLOC >::DBRowGeneratorSet ( DBRowGeneratorSet< ALLOC > &&  from)

move constructor

◆ DBRowGeneratorSet() [5/5]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGeneratorSet< ALLOC >::DBRowGeneratorSet ( DBRowGeneratorSet< ALLOC > &&  from,
const allocator_type &  alloc 
)

move constructor with a given allocator

◆ ~DBRowGeneratorSet()

template<template< typename > class ALLOC = std::allocator>
virtual gum::learning::DBRowGeneratorSet< ALLOC >::~DBRowGeneratorSet ( )
virtual

destructor

Member Function Documentation

◆ clear()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBRowGeneratorSet< ALLOC >::clear ( )

removes all the generators

◆ clone() [1/2]

template<template< typename > class ALLOC = std::allocator>
virtual DBRowGeneratorSet< ALLOC >* gum::learning::DBRowGeneratorSet< ALLOC >::clone ( ) const
virtual

virtual copy constructor

◆ clone() [2/2]

template<template< typename > class ALLOC = std::allocator>
virtual DBRowGeneratorSet< ALLOC >* gum::learning::DBRowGeneratorSet< ALLOC >::clone ( const allocator_type &  alloc) const
virtual

virtual copy constructor with a given allocator

◆ columnsOfInterest()

template<template< typename > class ALLOC = std::allocator>
const std::vector< std::size_t, ALLOC< std::size_t > >& gum::learning::DBRowGeneratorSet< ALLOC >::columnsOfInterest ( ) const

returns the current set of columns of interest

◆ generate()

template<template< typename > class ALLOC = std::allocator>
const DBRow< DBTranslatedValue, ALLOC >& gum::learning::DBRowGeneratorSet< ALLOC >::generate ( )

generates a new output row from the input row

◆ getAllocator()

template<template< typename > class ALLOC = std::allocator>
allocator_type gum::learning::DBRowGeneratorSet< ALLOC >::getAllocator ( ) const

returns the allocator used

◆ hasRows()

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBRowGeneratorSet< ALLOC >::hasRows ( )

returns true if there are still rows that can be output by the set of generators

◆ insertGenerator() [1/2]

template<template< typename > class ALLOC = std::allocator>
template<template< template< typename > class > class Generator>
void gum::learning::DBRowGeneratorSet< ALLOC >::insertGenerator ( const Generator< ALLOC > &  generator)

inserts a new generator at the end of the set

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).

◆ insertGenerator() [2/2]

template<template< typename > class ALLOC = std::allocator>
template<template< template< typename > class > class Generator>
void gum::learning::DBRowGeneratorSet< ALLOC >::insertGenerator ( const Generator< ALLOC > &  generator,
const std::size_t  i 
)

inserts a new generator at the ith position of the set

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).

◆ nbGenerators()

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::DBRowGeneratorSet< ALLOC >::nbGenerators ( ) const
noexcept

returns the number of generators

◆ operator=() [1/2]

template<template< typename > class ALLOC = std::allocator>
DBRowGeneratorSet< ALLOC >& gum::learning::DBRowGeneratorSet< ALLOC >::operator= ( const DBRowGeneratorSet< ALLOC > &  from)

copy operator

◆ operator=() [2/2]

template<template< typename > class ALLOC = std::allocator>
DBRowGeneratorSet< ALLOC >& gum::learning::DBRowGeneratorSet< ALLOC >::operator= ( DBRowGeneratorSet< ALLOC > &&  from)

move operator

◆ operator[]() [1/2]

template<template< typename > class ALLOC = std::allocator>
DBRowGenerator< ALLOC >& gum::learning::DBRowGeneratorSet< ALLOC >::operator[] ( const std::size_t  i)

returns the ith generator

Warning
this operator assumes that there are at least i+1 generators. So, it won't check that the ith generator actually exists. If unsure, use method generatorSafe that performs this check.
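Since the operator performs no bounds check, one option is to guard the access with size() on the caller's side. The sketch below is a generic helper, not part of aGrUM: tryGet is a hypothetical name, and the example runs on a plain std::vector standing in for the generator set. It only assumes that the container provides size() and operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a pointer to element `i` of `container`, or nullptr when `i` is out
// of range. Only size() and operator[] are required of the container.
template <typename Container>
auto tryGet(Container& container, std::size_t i) -> decltype(&container[i]) {
  return i < container.size() ? &container[i] : nullptr;
}

// Usage on a plain std::vector standing in for the generator set.
inline void tryGetExample() {
  std::vector<int> items{10, 20, 30};
  assert(*tryGet(items, 1) == 20);      // valid index
  assert(tryGet(items, 3) == nullptr);  // out of range: operator[] is never called
}
```

Returning a pointer rather than throwing keeps the helper usable in code paths where an out-of-range index is an expected case and not an error.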

◆ operator[]() [2/2]

template<template< typename > class ALLOC = std::allocator>
const DBRowGenerator< ALLOC >& gum::learning::DBRowGeneratorSet< ALLOC >::operator[] ( const std::size_t  i) const

returns the ith generator

Warning
this operator assumes that there are at least i+1 generators. So, it won't check that the ith generator actually exists. If unsure, use method generatorSafe that performs this check.

◆ reset()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBRowGeneratorSet< ALLOC >::reset ( )

resets all the generators

◆ setBayesNet()

template<template< typename > class ALLOC = std::allocator>
template<typename GUM_SCALAR >
void gum::learning::DBRowGeneratorSet< ALLOC >::setBayesNet ( const BayesNet< GUM_SCALAR > &  new_bn)

assign a new Bayes net to all the generators that depend on a BN

Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet allows their BN model to be updated.

◆ setColumnsOfInterest() [1/2]

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBRowGeneratorSet< ALLOC >::setColumnsOfInterest ( const std::vector< std::size_t, ALLOC< std::size_t > > &  cols_of_interest)

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need be filled. In this case, the DBRowGenerator instances contained in the DBRowGeneratorSet still output DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed in argument to Method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and Method setColumnsOfInterest() is applied with vector<> { 0, 3, 4 }, then the DBRowGenerator instances contained in the DBRowGeneratorSet will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are always indexed, starting from 0).

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).
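The practical consequence for consumers of the output rows is that only the columns of interest should be read. The following standalone sketch illustrates this with the 10-column example and columns { 0, 3, 4 }. It does not use aGrUM types: a std::vector<int> stands in for a DBRow, and extractColumnsOfInterest is a hypothetical helper introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies only the columns of interest out of a full-width row.
// `row` stands in for a DBRow; values in the other columns are ignored because
// they are not guaranteed to be meaningful.
std::vector<int> extractColumnsOfInterest(const std::vector<int>& row,
                                          const std::vector<std::size_t>& cols) {
  std::vector<int> values;
  values.reserve(cols.size());
  for (std::size_t col : cols) {
    assert(col < row.size());  // columns are indexed starting from 0
    values.push_back(row[col]);
  }
  return values;
}
```

For a 10-column row and columns { 0, 3, 4 }, the helper returns three values and never touches the seven columns whose content may be arbitrary.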

◆ setColumnsOfInterest() [2/2]

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBRowGeneratorSet< ALLOC >::setColumnsOfInterest ( std::vector< std::size_t, ALLOC< std::size_t > > &&  cols_of_interest)

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need be filled. In this case, the DBRowGenerator instances contained in the DBRowGeneratorSet still output DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed in argument to Method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and Method setColumnsOfInterest() is applied with vector<> { 0, 3, 4 }, then the DBRowGenerator instances contained in the DBRowGeneratorSet will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are always indexed, starting from 0).

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).

◆ setInputRow()

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBRowGeneratorSet< ALLOC >::setInputRow ( const DBRow< DBTranslatedValue, ALLOC > &  input_row)

sets the input row from which the generators will create new rows

Returns
true if the set of generators is able to generate output rows from the input row passed in argument
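A return value of false corresponds to the case where an input row yields no output at all, for example when rows with missing values are discarded. The sketch below shows one way such behavior could look. It is a toy model, not an aGrUM class: ToyCompleteRowFilter and the sentinel kMissing are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sentinel marking a missing value in a toy row.
const int kMissing = -1;

// Hypothetical generator that passes complete rows through unchanged and
// produces no output for rows containing a missing value.
class ToyCompleteRowFilter {
public:
  // Returns true only if an output row can be generated from `row`.
  bool setInputRow(const std::vector<int>& row) {
    has_row_ = std::find(row.begin(), row.end(), kMissing) == row.end();
    if (has_row_) row_ = row;
    return has_row_;
  }

  bool hasRows() const { return has_row_; }

  const std::vector<int>& generate() {
    assert(has_row_);
    has_row_ = false;  // each complete row is output exactly once
    return row_;
  }

private:
  std::vector<int> row_;
  bool has_row_ = false;
};
```

In this toy model the boolean returned by setInputRow always agrees with a subsequent call to hasRows, so a caller may test either one before entering the generation loop.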

◆ size()

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::DBRowGeneratorSet< ALLOC >::size ( ) const
noexcept

returns the number of generators (alias for nbGenerators)

