aGrUM  0.13.2
gum::learning::DBRowGenerator< ALLOC > Class Template Reference abstract

The base class for all DBRow generators.

#include <agrum/learning/database/DBRowGenerator.h>

Public Member Functions

Constructors / Destructors
 DBRowGenerator (const std::vector< DBTranslatedValueType, ALLOC< DBTranslatedValueType > > column_types, const DBRowGeneratorGoal goal, const allocator_type &alloc=allocator_type())
 default constructor
 
 DBRowGenerator (const DBRowGenerator< ALLOC > &from)
 copy constructor
 
 DBRowGenerator (const DBRowGenerator< ALLOC > &from, const allocator_type &alloc)
 copy constructor with a given allocator
 
 DBRowGenerator (DBRowGenerator< ALLOC > &&from)
 move constructor
 
 DBRowGenerator (DBRowGenerator< ALLOC > &&from, const allocator_type &alloc)
 move constructor with a given allocator
 
virtual DBRowGenerator< ALLOC > * clone () const =0
 virtual copy constructor
 
virtual DBRowGenerator< ALLOC > * clone (const allocator_type &alloc) const =0
 virtual copy constructor with a given allocator
 
virtual ~DBRowGenerator ()
 destructor
 
Accessors / Modifiers
bool hasRows ()
 returns true if there are still rows that can be output by the DBRowGenerator
 
bool setInputRow (const DBRow< DBTranslatedValue, ALLOC > &row)
 sets the input row from which the generator will create its output rows
 
virtual const DBRow< DBTranslatedValue, ALLOC > & generate ()=0
 generates new rows from the input row
 
void decreaseRemainingRows ()
 decreases the number of remaining output rows
 
virtual void reset ()
 resets the generator. There are therefore no more output rows to generate
 
virtual void setColumnsOfInterest (const std::vector< std::size_t, ALLOC< std::size_t > > &cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns
 
virtual void setColumnsOfInterest (std::vector< std::size_t, ALLOC< std::size_t > > &&cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns
 
const std::vector< std::size_t, ALLOC< std::size_t > > & columnsOfInterest () const
 returns the current set of columns of interest
 
allocator_type getAllocator () const
 returns the allocator used
 
DBRowGeneratorGoal goal () const
 returns the goal of the DBRowGenerator
 

Public Types

using allocator_type = ALLOC< DBTranslatedValue >
 type for the allocators passed in arguments of methods
 

Protected Attributes

std::size_t _nb_remaining_output_rows {std::size_t(0)}
 the number of output rows still to retrieve through the generate method
 
std::vector< DBTranslatedValueType, ALLOC< DBTranslatedValueType > > _column_types
 the types of the columns in the DatabaseTable
 
std::vector< std::size_t, ALLOC< std::size_t > > _columns_of_interest
 the set of columns of interest
 
DBRowGeneratorGoal _goal
 the goal of the DBRowGenerator (just remove missing values or not)
 

Protected Member Functions

DBRowGenerator< ALLOC > & operator= (const DBRowGenerator< ALLOC > &)
 copy operator
 
DBRowGenerator< ALLOC > & operator= (DBRowGenerator< ALLOC > &&)
 move operator
 
virtual std::size_t _computeRows (const DBRow< DBTranslatedValue, ALLOC > &row)=0
 the method that computes the set of DBRow instances to output after method setInputRow has been called
 

Detailed Description

template<template< typename > class ALLOC = std::allocator>
class gum::learning::DBRowGenerator< ALLOC >

The base class for all DBRow generators.

A DBRowGenerator instance takes as input a DBRow of DBTranslatedValue instances, provided either directly by a DatabaseTable or by another DBRowGenerator, and produces from it zero or more DBRows of DBTranslatedValue. This is mainly useful for dealing with missing values: during learning, when a DBRow contains missing values, what should be done with it? Should it be discarded? Should an EM algorithm produce several DBRows weighted by their probability of occurrence? Should a K-means algorithm produce only the single DBRow of highest probability of occurrence? With the appropriate DBRowGenerator, any of these rules can be applied while the learning algorithm parses the DatabaseTable. You only need to indicate which DBRowGenerator to use; no line of code needs to be changed in the high-level learning algorithm.

As an example of how a DBRowGenerator works, an "Identity" DBRowGenerator takes as input a DBRow and returns it without any further processing, so it "produces" only one output DBRow. An EM DBRowGenerator takes as input a DBRow in which some cells may be missing. In this case, it produces all the possible combinations of values that the missing cells may take and assigns to each combination a weight proportional to its probability of occurrence according to a given model. As such, it usually produces several output DBRows.
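
To make the EM behaviour described above more concrete, here is a minimal sketch that expands one row with missing cells into weighted complete rows. It does not use aGrUM: `WeightedRow`, `expandMissing`, the `-1` missing marker and the per-column probability tables are hypothetical stand-ins, and the columns are assumed to be independent, whereas a real EM generator would obtain the weights from inference in the current model.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a weighted DBRow: discrete values plus a weight.
struct WeightedRow {
  std::vector<int> values;   // -1 marks a missing cell (assumption)
  double           weight;
};

// Expands `input` into every completion of its missing cells.
// probs[c][v] is the assumed probability that column c takes value v.
std::vector<WeightedRow> expandMissing(
    const WeightedRow&                      input,
    const std::vector<std::vector<double>>& probs) {
  std::vector<WeightedRow> rows{input};
  for (std::size_t col = 0; col < input.values.size(); ++col) {
    if (input.values[col] != -1) continue;   // observed cell: keep as is
    std::vector<WeightedRow> next;
    for (const WeightedRow& partial : rows) {
      for (std::size_t v = 0; v < probs[col].size(); ++v) {
        WeightedRow completed = partial;
        completed.values[col] = static_cast<int>(v);
        completed.weight *= probs[col][v];   // weight follows the probability
        next.push_back(completed);
      }
    }
    rows = std::move(next);
  }
  return rows;
}
```

A row with two missing binary cells therefore yields four output rows whose weights sum to the weight of the input row, while a fully observed row is returned unchanged, which is what an "Identity" generator would do.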

The standard usage of a DBRowGenerator is the following:

// create a DatabaseTable and fill it
for ( int i = 0; i < 10; ++i )
// fill the database
// keep in a vector the types of the columns in the database
const std::vector<gum::learning::DBTranslatedValueType>
// create the generator
// parse the database and produce output rows
for ( auto dbrow : database ) {
  generator.setInputRow ( dbrow );
  while ( generator.hasRows () ) {
    const auto& output_dbrow = generator.generate ();
    // do something with the output dbrow
  }
}

All DBRowGenerator classes should derive from this class, which takes care of the interaction with the RecordCounter / Score classes. A user who wishes to create a new DBRowGenerator, say one that outputs the input row k times, just has to define the following class. Not all the constructors/destructors are required, but they are provided for completeness; the important part starts at the "Accessors / Modifiers" section:

template <template<typename> class ALLOC = std::allocator>
class DuplicateGenerator : public DBRowGenerator<ALLOC> {
public:
  using allocator_type = ALLOC<DBTranslatedValue>;
  // ######################################################################
  // Constructors / Destructors
  // ######################################################################
  DuplicateGenerator( const std::vector<DBTranslatedValueType,
                        ALLOC<DBTranslatedValueType>> column_types,
                      const DBRowGeneratorGoal goal,
                      const std::size_t nb_duplicates,
                      const allocator_type& alloc = allocator_type () )
    // the base class constructor also expects the goal of the generator
    : DBRowGenerator<ALLOC> ( column_types, goal, alloc )
    , __nb_duplicates ( nb_duplicates ) {}
  DuplicateGenerator( const DuplicateGenerator<ALLOC>& from,
                      const allocator_type& alloc )
    : DBRowGenerator<ALLOC>( from, alloc )
    , __input_row( from.__input_row )
    , __nb_duplicates ( from.__nb_duplicates ) {}
  DuplicateGenerator( const DuplicateGenerator<ALLOC>& from )
    : DuplicateGenerator<ALLOC> ( from, from.getAllocator () ) {}
  DuplicateGenerator( DuplicateGenerator<ALLOC>&& from,
                      const allocator_type& alloc )
    : DBRowGenerator<ALLOC> ( std::move( from ), alloc )
    , __input_row( from.__input_row )
    , __nb_duplicates ( from.__nb_duplicates ) {}
  DuplicateGenerator( DuplicateGenerator<ALLOC>&& from )
    : DuplicateGenerator<ALLOC> ( std::move(from), from.getAllocator() ) {}
  virtual DuplicateGenerator<ALLOC>*
  clone ( const allocator_type& alloc ) const {
    ALLOC<DuplicateGenerator<ALLOC>> allocator ( alloc );
    DuplicateGenerator<ALLOC>* generator = allocator.allocate(1);
    // release the memory if the copy construction throws
    try { allocator.construct ( generator, *this, alloc ); }
    catch ( ... ) {
      allocator.deallocate ( generator, 1 );
      throw;
    }
    return generator;
  }
  virtual DuplicateGenerator<ALLOC>* clone () const {
    return clone ( this->getAllocator () );
  }
  ~DuplicateGenerator() {}
  // ######################################################################
  // Operators
  // ######################################################################
  DuplicateGenerator<ALLOC>&
  operator=( const DuplicateGenerator<ALLOC>& from ) {
    // copy the base class part too, as the move operator does
    DBRowGenerator<ALLOC>::operator=( from );
    __input_row = from.__input_row;
    __nb_duplicates = from.__nb_duplicates;
    return *this;
  }
  DuplicateGenerator<ALLOC>& operator=( DuplicateGenerator<ALLOC>&& from ) {
    DBRowGenerator<ALLOC>::operator=( std::move( from ) );
    __input_row = from.__input_row;
    __nb_duplicates = from.__nb_duplicates;
    return *this;
  }
  // ######################################################################
  // Accessors / Modifiers
  // ######################################################################
  virtual const DBRow<DBTranslatedValue,ALLOC>& generate() final {
    // one output row is consumed, so that hasRows() eventually returns false
    this->decreaseRemainingRows();
    return *__input_row;
  }
protected:
  virtual std::size_t
  _computeRows( const DBRow<DBTranslatedValue,ALLOC>& row ) final {
    __input_row = &row;
    return __nb_duplicates;
  }
private:
  const DBRow<DBTranslatedValue,ALLOC>* __input_row { nullptr };
  std::size_t __nb_duplicates { std::size_t(1) };
};

Definition at line 232 of file DBRowGenerator.h.

Member Typedef Documentation

template<template< typename > class ALLOC = std::allocator>
using gum::learning::DBRowGenerator< ALLOC >::allocator_type = ALLOC< DBTranslatedValue >

type for the allocators passed in arguments of methods

Definition at line 235 of file DBRowGenerator.h.

Constructor & Destructor Documentation

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGenerator< ALLOC >::DBRowGenerator ( const std::vector< DBTranslatedValueType, ALLOC< DBTranslatedValueType > >  column_types,
const DBRowGeneratorGoal  goal,
const allocator_type &  alloc = allocator_type() 
)

default constructor

Parameters
column_types  indicates for each column whether it is continuous or discrete
goal  the goal of the DBRowGenerator (just remove missing values or not)
alloc  the allocator used by all the methods

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGenerator< ALLOC >::DBRowGenerator ( const DBRowGenerator< ALLOC > &  from)

copy constructor

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGenerator< ALLOC >::DBRowGenerator ( const DBRowGenerator< ALLOC > &  from,
const allocator_type &  alloc 
)

copy constructor with a given allocator

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGenerator< ALLOC >::DBRowGenerator ( DBRowGenerator< ALLOC > &&  from)

move constructor

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBRowGenerator< ALLOC >::DBRowGenerator ( DBRowGenerator< ALLOC > &&  from,
const allocator_type &  alloc 
)

move constructor with a given allocator

template<template< typename > class ALLOC = std::allocator>
virtual gum::learning::DBRowGenerator< ALLOC >::~DBRowGenerator ( )
virtual

destructor

Member Function Documentation

template<template< typename > class ALLOC = std::allocator>
virtual std::size_t gum::learning::DBRowGenerator< ALLOC >::_computeRows ( const DBRow< DBTranslatedValue, ALLOC > &  row)
protected pure virtual

the method that computes the set of DBRow instances to output after method setInputRow has been called

Implemented in gum::learning::DBRowGeneratorIdentity< ALLOC >.

template<template< typename > class ALLOC = std::allocator>
virtual DBRowGenerator< ALLOC >* gum::learning::DBRowGenerator< ALLOC >::clone ( ) const
pure virtual

virtual copy constructor

Implemented in gum::learning::DBRowGeneratorIdentity< ALLOC >.

template<template< typename > class ALLOC = std::allocator>
virtual DBRowGenerator< ALLOC >* gum::learning::DBRowGenerator< ALLOC >::clone ( const allocator_type &  alloc) const
pure virtual

virtual copy constructor with a given allocator

Implemented in gum::learning::DBRowGeneratorIdentity< ALLOC >.

template<template< typename > class ALLOC = std::allocator>
const std::vector< std::size_t, ALLOC< std::size_t > >& gum::learning::DBRowGenerator< ALLOC >::columnsOfInterest ( ) const

returns the current set of columns of interest

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBRowGenerator< ALLOC >::decreaseRemainingRows ( )

decrease the number of remaining output rows

When method setInputRow is called, the DBRowGenerator determines how many output rows it will be able to generate. Each call to decreaseRemainingRows decrements this number. When it reaches 0, there is no output row left to generate.
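
The counting protocol described above can be sketched with toy stand-ins. None of the names below (`ToyRow`, `ToyGenerator`, `ToyDuplicator`, `countOutputRows`) belong to aGrUM, and the sketch assumes that each call to generate() consumes one row by calling decreaseRemainingRows(); this page does not state who is responsible for that call.

```cpp
#include <cstddef>
#include <vector>

using ToyRow = std::vector<int>;   // stand-in for DBRow<DBTranslatedValue>

// Toy base class mirroring only the row-counting logic.
class ToyGenerator {
public:
  virtual ~ToyGenerator() = default;

  // Stores how many output rows the input row will yield.
  bool setInputRow(const ToyRow& row) {
    remaining_ = computeRows(row);
    return hasRows();
  }
  bool hasRows() const { return remaining_ != 0; }
  void decreaseRemainingRows() { --remaining_; }
  virtual const ToyRow& generate() = 0;

protected:
  virtual std::size_t computeRows(const ToyRow& row) = 0;

private:
  std::size_t remaining_{0};
};

// Outputs each input row k times.
class ToyDuplicator : public ToyGenerator {
public:
  explicit ToyDuplicator(std::size_t k) : k_(k) {}
  const ToyRow& generate() override {
    decreaseRemainingRows();   // assumption: generate() consumes one row
    return *input_;
  }

protected:
  std::size_t computeRows(const ToyRow& row) override {
    input_ = &row;
    return k_;
  }

private:
  const ToyRow* input_{nullptr};
  std::size_t   k_;
};

// Drives a generator over a table and counts the rows it emits.
std::size_t countOutputRows(ToyGenerator&              gen,
                            const std::vector<ToyRow>& table) {
  std::size_t n = 0;
  for (const ToyRow& row : table) {
    gen.setInputRow(row);
    while (gen.hasRows()) {
      gen.generate();
      ++n;
    }
  }
  return n;
}
```

With a table of three rows, `ToyDuplicator(2)` emits six rows, and a generator whose computeRows returns 0 makes setInputRow return false, so the inner loop is skipped for that row.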

template<template< typename > class ALLOC = std::allocator>
virtual const DBRow< DBTranslatedValue, ALLOC >& gum::learning::DBRowGenerator< ALLOC >::generate ( )
pure virtual

generate new rows from the input row

Implemented in gum::learning::DBRowGeneratorIdentity< ALLOC >.

template<template< typename > class ALLOC = std::allocator>
allocator_type gum::learning::DBRowGenerator< ALLOC >::getAllocator ( ) const

returns the allocator used

template<template< typename > class ALLOC = std::allocator>
DBRowGeneratorGoal gum::learning::DBRowGenerator< ALLOC >::goal ( ) const

returns the goal of the DBRowGenerator

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBRowGenerator< ALLOC >::hasRows ( )

returns true if there are still rows that can be output by the DBRowGenerator

template<template< typename > class ALLOC = std::allocator>
DBRowGenerator< ALLOC >& gum::learning::DBRowGenerator< ALLOC >::operator= ( const DBRowGenerator< ALLOC > &  )
protected

copy operator

template<template< typename > class ALLOC = std::allocator>
DBRowGenerator< ALLOC >& gum::learning::DBRowGenerator< ALLOC >::operator= ( DBRowGenerator< ALLOC > &&  )
protected

move operator

template<template< typename > class ALLOC = std::allocator>
virtual void gum::learning::DBRowGenerator< ALLOC >::reset ( )
virtual

resets the generator. There are therefore no more output rows to generate

template<template< typename > class ALLOC = std::allocator>
virtual void gum::learning::DBRowGenerator< ALLOC >::setColumnsOfInterest ( const std::vector< std::size_t, ALLOC< std::size_t > > &  cols_of_interest)
virtual

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need to be filled. In this case, the DBRowGenerator still outputs DBRows with the same columns as the DatabaseTable, but only the columns passed in argument to setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and setColumnsOfInterest() is called with vector<> { 0, 3, 4 }, then the DBRowGenerator outputs DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (column indices always start from 0).

template<template< typename > class ALLOC = std::allocator>
virtual void gum::learning::DBRowGenerator< ALLOC >::setColumnsOfInterest ( std::vector< std::size_t, ALLOC< std::size_t > > &&  cols_of_interest)
virtual

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need to be filled. In this case, the DBRowGenerator still outputs DBRows with the same columns as the DatabaseTable, but only the columns passed in argument to setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and setColumnsOfInterest() is called with vector<> { 0, 3, 4 }, then the DBRowGenerator outputs DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (column indices always start from 0).
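
The 10-column example can be illustrated with a small sketch that does not depend on aGrUM. The function `imputeColumnsOfInterest`, the `kMissing` marker and the constant fill value are hypothetical; a real generator would compute the filled values from its model rather than use a default.

```cpp
#include <cstddef>
#include <vector>

constexpr int kMissing = -1;   // hypothetical marker for a missing cell

// Returns a row with as many columns as `input`, in which only the columns
// listed in `cols_of_interest` are guaranteed to be non-missing.
std::vector<int> imputeColumnsOfInterest(
    const std::vector<int>&         input,
    const std::vector<std::size_t>& cols_of_interest,
    int                             default_value) {
  std::vector<int> output = input;   // same width as the table
  for (std::size_t col : cols_of_interest) {
    if (output.at(col) == kMissing) output[col] = default_value;
  }
  return output;                     // other columns are left untouched
}
```

Calling it on a 10-column row with columns { 0, 3, 4 } returns a 10-column row in which columns 0, 3 and 4 hold usable values, while any other column may still contain the missing marker.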

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBRowGenerator< ALLOC >::setInputRow ( const DBRow< DBTranslatedValue, ALLOC > &  row)

sets the input row from which the generator will create its output rows

Returns
a Boolean indicating whether, from this input DBRow, the DBRowGenerator is capable of outputting at least one row

Member Data Documentation

template<template< typename > class ALLOC = std::allocator>
std::vector< DBTranslatedValueType, ALLOC< DBTranslatedValueType > > gum::learning::DBRowGenerator< ALLOC >::_column_types
protected

the types of the columns in the DatabaseTable

This is useful to determine whether we need to use the .discr_val field or the .cont_val field in DBTranslatedValue instances.
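
As a rough illustration of why the column types are needed, the sketch below reads a row of union-like values and uses a type vector to pick the active field. Only the field names discr_val and cont_val come from the text above; `ToyTranslatedValue`, `ToyValueType`, their member types and `sumContinuous` are assumed stand-ins rather than the aGrUM definitions.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the column type tag (assumed enumerators).
enum class ToyValueType { DISCRETE, CONTINUOUS };

// Stand-in for DBTranslatedValue: the value itself does not record
// which member is active, hence the separate vector of column types.
union ToyTranslatedValue {
  std::size_t discr_val;   // index of a discrete value
  float       cont_val;    // continuous value
};

// Sums the continuous cells of a row, using column_types to decide
// which union member may be read for each column.
double sumContinuous(const std::vector<ToyTranslatedValue>& row,
                     const std::vector<ToyValueType>&       column_types) {
  double sum = 0.0;
  for (std::size_t col = 0; col < row.size(); ++col) {
    if (column_types[col] == ToyValueType::CONTINUOUS) {
      sum += row[col].cont_val;
    }
  }
  return sum;
}
```

Reading cont_val on a column that actually stores discr_val would be meaningless, which is why a generator keeps the column types next to the rows it processes.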

Definition at line 363 of file DBRowGenerator.h.

template<template< typename > class ALLOC = std::allocator>
std::vector< std::size_t, ALLOC< std::size_t > > gum::learning::DBRowGenerator< ALLOC >::_columns_of_interest
protected

the set of columns of interest

Definition at line 366 of file DBRowGenerator.h.

template<template< typename > class ALLOC = std::allocator>
DBRowGeneratorGoal gum::learning::DBRowGenerator< ALLOC >::_goal
protected

the goal of the DBRowGenerator (just remove missing values or not)

Definition at line 369 of file DBRowGenerator.h.

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::DBRowGenerator< ALLOC >::_nb_remaining_output_rows {std::size_t(0)}
protected

the number of output rows still to retrieve through the generate method

Definition at line 357 of file DBRowGenerator.h.


The documentation for this class was generated from the following file: