aGrUM  0.20.3
a C++ library for (probabilistic) graphical models
gum::learning::DBTranslator< ALLOC > Class Template Reference (abstract)

The base class for all the tabular database cell translators.

#include <agrum/tools/database/DBTranslator.h>

Public Member Functions

Constructors / Destructors
template<template< typename > class XALLOC>
 DBTranslator (DBTranslatedValueType val_type, const std::vector< std::string, XALLOC< std::string > > &missing_symbols, const bool editable_dictionary=true, std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max(), const allocator_type &alloc=allocator_type())
 default constructor
 
 DBTranslator (DBTranslatedValueType val_type, const bool editable_dictionary=true, std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max(), const allocator_type &alloc=allocator_type())
 default constructor without missing symbols
 
 DBTranslator (const DBTranslator< ALLOC > &from)
 copy constructor
 
 DBTranslator (const DBTranslator< ALLOC > &from, const allocator_type &alloc)
 copy constructor with a given allocator
 
 DBTranslator (DBTranslator< ALLOC > &&from)
 move constructor
 
 DBTranslator (DBTranslator< ALLOC > &&from, const allocator_type &alloc)
 move constructor with a given allocator
 
virtual DBTranslator< ALLOC > * clone () const =0
 virtual copy constructor
 
virtual DBTranslator< ALLOC > * clone (const allocator_type &alloc) const =0
 virtual copy constructor with a given allocator
 
virtual ~DBTranslator ()
 destructor
 
Operators
DBTranslatedValue operator<< (const std::string &str)
 alias for method translate
 
std::string operator>> (const DBTranslatedValue translated_val)
 alias for method translateBack
 
Accessors / Modifiers
virtual DBTranslatedValue translate (const std::string &str)=0
 returns the translation of a string
 
virtual std::string translateBack (const DBTranslatedValue translated_val) const =0
 returns the original value for a given translation
 
virtual std::size_t domainSize () const =0
 returns the domain size of a variable corresponding to the translations
 
virtual bool hasEditableDictionary () const
 indicates whether the translator has an editable dictionary or not
 
virtual void setEditableDictionaryMode (bool new_mode)
 sets/unsets the editable dictionary mode
 
virtual bool needsReordering () const =0
 indicates whether a reordering is needed to make the translations sorted
 
virtual HashTable< std::size_t, std::size_t, ALLOC< std::pair< std::size_t, std::size_t > > > reorder ()=0
 performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones.
 
const Set< std::string, ALLOC< std::string > > & missingSymbols () const
 returns the set of missing symbols taken into account by the translator
 
bool isMissingSymbol (const std::string &str) const
 indicates whether a string corresponds to a missing symbol
 
virtual const Variable * variable () const =0
 returns the variable stored into the translator
 
void setVariableName (const std::string &str) const
 sets the name of the variable stored into the translator
 
void setVariableDescription (const std::string &str) const
 sets the description of the variable stored into the translator
 
DBTranslatedValueType getValType () const
 returns the type of values handled by the translator
 
allocator_type getAllocator () const
 returns the allocator used by the translator
 
bool isMissingValue (const DBTranslatedValue &val) const
 indicates whether a translated value corresponds to a missing value
 
virtual DBTranslatedValue missingValue () const =0
 returns the translation of a missing value
 

Public Types

using allocator_type = ALLOC< DBTranslatedValue >
 type for the allocators passed in arguments of methods
 

Protected Attributes

bool is_dictionary_dynamic_
 indicates whether the dictionary can be updated or not
 
std::size_t max_dico_entries_
 the maximum number of entries that the dictionary is allowed to contain
 
Set< std::string, ALLOC< std::string > > missing_symbols_
 the set of missing symbols
 
Bijection< std::size_t, std::string, ALLOC< std::pair< float, std::string > > > back_dico_
 the bijection relating back translated values and their original strings.
 
DBTranslatedValueType val_type_
 the type of the values translated by the translator
 

Protected Member Functions

Protected Operators
DBTranslator< ALLOC > & operator= (const DBTranslator< ALLOC > &from)
 copy operator
 
DBTranslator< ALLOC > & operator= (DBTranslator< ALLOC > &&from)
 move operator
 

Detailed Description

template<template< typename > class ALLOC = std::allocator>
class gum::learning::DBTranslator< ALLOC >

The base class for all the tabular database cell translators.

Translators are used by DatabaseTable instances to transform the strings of a dataset into DBTranslatedValue instances. Strings are not adequate for fast learning: they need to be preprocessed into a type that can be analyzed quickly (the so-called DBTranslatedValue type). The DBTranslator class is the abstract base class for all the translators used in aGrUM.

Here is an example of how to use it, illustrated with the DBTranslator4ContinuousVariable class:

// create the translator, with possible missing symbols: "N/A" and "???"
// i.e., each time the translator reads a "N/A" or a "???" string, it
// won't translate it into a number but into a missing value.
std::vector<std::string> missing { "N/A", "???" };
gum::learning::DBTranslator4ContinuousVariable<> translator ( missing );
// gets the DBTranslatedValue corresponding to some strings
auto val1 = translator.translate("5"); // val1 = DBTranslatedValue {5.0f}
auto val2 = translator.translate("4.2"); // val2 = DBTranslatedValue {4.2f}
auto val3 = translator << "3.4"; // val3 = DBTranslatedValue {3.4f}
// add the numbers assigned to val1, val2, val3
float sum = val1.cont_val + val2.cont_val + val3.cont_val;
// translate missing values: val4 and val5 will be equal to:
// DBTranslatedValue { std::numeric_limits<float>::max () }
auto val4 = translator << "N/A";
auto val5 = translator.translate ( "???" );
// the following instructions raise TypeError exceptions because the
// strings cannot be translated into real numbers
auto val6 = translator << "4.22x";
auto val7 = translator.translate ( "xxx" );
// given a DBTranslatedValue that is supposed to contain a float, get
// the corresponding string. The strings should be equivalent to those
// indicated below (maybe they could contain more zeroes after the dot).
std::string str;
str = translator.translateBack ( val1 ); // str ~ "5.0"
str = translator >> val2; // str ~ "4.2"
str = translator >> gum::learning::DBTranslatedValue {7.2e3f};
// str ~ "7.2 e3"
// translate back missing values: the string will correspond to one of
// the missing symbols known to the translator
str = translator >> val4; // str = "N/A" or "???"
str = translator >> val5; // str = "N/A" or "???"
// get the domain size of the variable stored into the translator
// This size is only useful for translators with discrete variables
std::size_t size = translator.domainSize ();
// get the variable stored within the translator
const gum::ContinuousVariable<float>* var =
  dynamic_cast<const gum::ContinuousVariable<float>*>
  ( translator.variable () );

Definition at line 116 of file DBTranslator.h.

Member Typedef Documentation

◆ allocator_type

template<template< typename > class ALLOC = std::allocator>
using gum::learning::DBTranslator< ALLOC >::allocator_type = ALLOC< DBTranslatedValue >

type for the allocators passed in arguments of methods

Definition at line 119 of file DBTranslator.h.

Constructor & Destructor Documentation

◆ DBTranslator() [1/6]

template<template< typename > class ALLOC = std::allocator>
template<template< typename > class XALLOC>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( DBTranslatedValueType  val_type,
const std::vector< std::string, XALLOC< std::string > > &  missing_symbols,
const bool  editable_dictionary = true,
std::size_t  max_dico_entries = std::numeric_limits< std::size_t >::max(),
const allocator_type &  alloc = allocator_type() 
)

default constructor

Parameters
val_type: indicates whether the DBTranslator deals with discrete or continuous variables
editable_dictionary: indicates whether the dictionary used for translations can be updated dynamically when new strings are observed or whether it should remain constant. To see how this parameter is handled, see the child classes inheriting from DBTranslator
missing_symbols: the set of symbols in the database representing missing values
max_dico_entries: the maximum number of entries that the dictionary can contain. Trying to add entries beyond this limit is considered an error and raises a SizeError exception
alloc: the allocator used to allocate memory for all the fields of the DBTranslator

◆ DBTranslator() [2/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( DBTranslatedValueType  val_type,
const bool  editable_dictionary = true,
std::size_t  max_dico_entries = std::numeric_limits< std::size_t >::max(),
const allocator_type &  alloc = allocator_type() 
)

default constructor without missing symbols

Parameters
val_type: indicates whether the DBTranslator deals with discrete or continuous variables
editable_dictionary: indicates whether the dictionary used for translations can be updated dynamically when new strings are observed or whether it should remain constant. To see how this parameter is handled, see the child classes inheriting from DBTranslator
max_dico_entries: the maximum number of entries that the dictionary can contain. Trying to add entries beyond this limit is considered an error and raises a SizeError exception
alloc: the allocator used to allocate memory for all the fields of the DBTranslator

◆ DBTranslator() [3/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( const DBTranslator< ALLOC > &  from)

copy constructor

◆ DBTranslator() [4/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( const DBTranslator< ALLOC > &  from,
const allocator_type &  alloc 
)

copy constructor with a given allocator

◆ DBTranslator() [5/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( DBTranslator< ALLOC > &&  from)

move constructor

◆ DBTranslator() [6/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::DBTranslator< ALLOC >::DBTranslator ( DBTranslator< ALLOC > &&  from,
const allocator_type &  alloc 
)

move constructor with a given allocator

◆ ~DBTranslator()

template<template< typename > class ALLOC = std::allocator>
virtual gum::learning::DBTranslator< ALLOC >::~DBTranslator ( )
virtual

destructor

Member Function Documentation

◆ clone() [1/2]

template<template< typename > class ALLOC = std::allocator>
virtual DBTranslator< ALLOC >* gum::learning::DBTranslator< ALLOC >::clone ( ) const
pure virtual

◆ clone() [2/2]

template<template< typename > class ALLOC = std::allocator>
virtual DBTranslator< ALLOC >* gum::learning::DBTranslator< ALLOC >::clone ( const allocator_type &  alloc) const
pure virtual

◆ domainSize()

template<template< typename > class ALLOC = std::allocator>
virtual std::size_t gum::learning::DBTranslator< ALLOC >::domainSize ( ) const
pure virtual

returns the domain size of a variable corresponding to the translations

Assume that the translator has been fed with the observed values of a random variable. Then it has produced a set of translated values. The latter define the domain of the variable. When the variable is discrete, values are assumed to span from 0 to a number n-1. In this case, the domain size of the variable is n. When the variable is continuous, the domain size should be infinite and we return std::numeric_limits<std::size_t>::max() to represent it. Note that missing values are encoded as std::numeric_limits<>::max () and are not taken into account in the domain sizes.

Implemented in gum::learning::DBTranslator4ContinuousVariable< ALLOC >, gum::learning::DBTranslator4RangeVariable< ALLOC >, gum::learning::DBTranslator4LabelizedVariable< ALLOC >, and gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.

◆ getAllocator()

template<template< typename > class ALLOC = std::allocator>
allocator_type gum::learning::DBTranslator< ALLOC >::getAllocator ( ) const

returns the allocator used by the translator

◆ getValType()

template<template< typename > class ALLOC = std::allocator>
DBTranslatedValueType gum::learning::DBTranslator< ALLOC >::getValType ( ) const

returns the type of values handled by the translator

Returns
either DBTranslatedValueType::DISCRETE if the translator includes a discrete variable or DBTranslatedValueType::CONTINUOUS if it contains a continuous variable. This is convenient to know how to interpret the DBTranslatedValue instances produced by the DBTranslator: either using their discr_val field or their cont_val field.
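The following self-contained sketch illustrates how the value type can drive the choice of field. It does not use aGrUM: SketchValueType, SketchValue, makeDiscrete, makeContinuous and describeValue are hypothetical names introduced only for this illustration, and only the field names discr_val and cont_val are taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for DBTranslatedValueType.
enum class SketchValueType { DISCRETE, CONTINUOUS };

// Hypothetical stand-in for DBTranslatedValue: one storage, two readings.
union SketchValue {
  std::size_t discr_val;  // meaningful when the type is DISCRETE
  float       cont_val;   // meaningful when the type is CONTINUOUS
};

SketchValue makeDiscrete(std::size_t index) {
  SketchValue v;
  v.discr_val = index;
  return v;
}

SketchValue makeContinuous(float x) {
  SketchValue v;
  v.cont_val = x;
  return v;
}

// Reads the field that matches the value type; reading the other field
// would not be meaningful.
std::string describeValue(SketchValueType type, SketchValue val) {
  switch (type) {
    case SketchValueType::DISCRETE:
      return "index " + std::to_string(val.discr_val);
    case SketchValueType::CONTINUOUS:
      return "real " + std::to_string(val.cont_val);
  }
  return "unknown";
}
```

With these definitions, describeValue(SketchValueType::DISCRETE, makeDiscrete(3)) yields "index 3". In real code the type passed to such a function would presumably come from getValType().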

◆ hasEditableDictionary()

template<template< typename > class ALLOC = std::allocator>
virtual bool gum::learning::DBTranslator< ALLOC >::hasEditableDictionary ( ) const
virtual

indicates whether the translator has an editable dictionary or not

Reimplemented in gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.

◆ isMissingSymbol()

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBTranslator< ALLOC >::isMissingSymbol ( const std::string &  str) const

indicates whether a string corresponds to a missing symbol

◆ isMissingValue()

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBTranslator< ALLOC >::isMissingValue ( const DBTranslatedValue &  val) const

indicates whether a translated value corresponds to a missing value

◆ missingSymbols()

template<template< typename > class ALLOC = std::allocator>
const Set< std::string, ALLOC< std::string > >& gum::learning::DBTranslator< ALLOC >::missingSymbols ( ) const

returns the set of missing symbols taken into account by the translator

◆ missingValue()

template<template< typename > class ALLOC = std::allocator>
virtual DBTranslatedValue gum::learning::DBTranslator< ALLOC >::missingValue ( ) const
pure virtual

◆ needsReordering()

template<template< typename > class ALLOC = std::allocator>
virtual bool gum::learning::DBTranslator< ALLOC >::needsReordering ( ) const
pure virtual

indicates whether a reordering is needed to make the translations sorted

If the strings represented by the translations are only numbers, translations are considered to be sorted if and only if they are sorted by increasing number. If the strings do not only represent numbers, then translations are considered to be sorted if and only if they are sorted lexicographically.

When constructing its dictionary dynamically, the translator may assign DBTranslatedValue values to strings in an undesired order. For instance, a translator reading sequentially integer strings 4, 1, 3, may map 4 into DBTranslatedValue{std::size_t(0)}, 1 into DBTranslatedValue{std::size_t(1)} and 3 into DBTranslatedValue{std::size_t(2)}, resulting in random variables having domain {4,1,3}. The user may prefer having domain {1,3,4}, i.e., a domain specified with increasing values. This requires a reordering. Method needsReordering() returns a Boolean indicating whether such a reordering should be performed or whether the current order is OK.

Implemented in gum::learning::DBTranslator4ContinuousVariable< ALLOC >, gum::learning::DBTranslator4LabelizedVariable< ALLOC >, gum::learning::DBTranslator4RangeVariable< ALLOC >, and gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.
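The sorting rule described above can be sketched without aGrUM as follows. This is only an illustration under stated assumptions: looksNumeric and sketchNeedsReordering are hypothetical helpers, the dictionary is modelled as a plain vector in which labels[i] is the string mapped to translated value i, and "being a number" is approximated by what std::stod accepts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// True if the whole string parses as a floating-point number.
bool looksNumeric(const std::string& s) {
  if (s.empty()) return false;
  try {
    std::size_t pos = 0;
    std::stod(s, &pos);
    return pos == s.size();
  } catch (const std::logic_error&) {  // invalid_argument or out_of_range
    return false;
  }
}

// labels[i] is the string mapped to translated value i.
// Returns true when the labels are not in sorted order: numeric order if
// every label is a number, lexicographic order otherwise.
bool sketchNeedsReordering(const std::vector<std::string>& labels) {
  const bool all_numeric =
      std::all_of(labels.begin(), labels.end(), looksNumeric);
  if (all_numeric) {
    return !std::is_sorted(labels.begin(), labels.end(),
                           [](const std::string& a, const std::string& b) {
                             return std::stod(a) < std::stod(b);
                           });
  }
  return !std::is_sorted(labels.begin(), labels.end());
}
```

For the sequence 4, 1, 3 used in the text, sketchNeedsReordering({"4", "1", "3"}) returns true, whereas {"9", "10"} is reported as already sorted because the comparison is numeric rather than lexicographic.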

◆ operator<<()

template<template< typename > class ALLOC = std::allocator>
DBTranslatedValue gum::learning::DBTranslator< ALLOC >::operator<< ( const std::string &  str)

alias for method translate

◆ operator=() [1/2]

template<template< typename > class ALLOC = std::allocator>
DBTranslator< ALLOC >& gum::learning::DBTranslator< ALLOC >::operator= ( const DBTranslator< ALLOC > &  from)
protected

copy operator

◆ operator=() [2/2]

template<template< typename > class ALLOC = std::allocator>
DBTranslator< ALLOC >& gum::learning::DBTranslator< ALLOC >::operator= ( DBTranslator< ALLOC > &&  from)
protected

move operator

◆ operator>>()

template<template< typename > class ALLOC = std::allocator>
std::string gum::learning::DBTranslator< ALLOC >::operator>> ( const DBTranslatedValue  translated_val)

alias for method translateBack

◆ reorder()

template<template< typename > class ALLOC = std::allocator>
virtual HashTable< std::size_t, std::size_t, ALLOC< std::pair< std::size_t, std::size_t > > > gum::learning::DBTranslator< ALLOC >::reorder ( )
pure virtual

performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones.

When a reordering is needed, i.e., string values must be translated differently, method reorder() computes how the translations should be changed. It updates the dictionary accordingly and returns the mapping that enables changing the old dictionary values into the new ones. Note that the hash table returned is expressed in terms of std::size_t because only the translations for discrete random variables need to be reordered; those for continuous random variables are identity mappings.

Warning
If there is no reordering to perform, the method returns an empty hashtable.

Implemented in gum::learning::DBTranslator4LabelizedVariable< ALLOC >, gum::learning::DBTranslator4RangeVariable< ALLOC >, gum::learning::DBTranslator4ContinuousVariable< ALLOC >, and gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.
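As a minimal sketch of the behavior described above (not the aGrUM implementation), the function below sorts a toy dictionary and returns the old-to-new mapping. The name sketchReorder is hypothetical, std::unordered_map stands in for HashTable, labels[i] is assumed to be the string mapped to translated value i, and only lexicographic ordering is handled for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sorts the labels in place and returns the mapping
// old translated value -> new translated value.
// An empty map is returned when there is nothing to reorder.
std::unordered_map<std::size_t, std::size_t>
sketchReorder(std::vector<std::string>& labels) {
  std::unordered_map<std::size_t, std::size_t> mapping;
  if (std::is_sorted(labels.begin(), labels.end())) return mapping;

  // order[new_index] == old_index once sorted by label
  std::vector<std::size_t> order(labels.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::sort(order.begin(), order.end(),
            [&labels](std::size_t a, std::size_t b) {
              return labels[a] < labels[b];
            });

  std::vector<std::string> sorted(labels.size());
  for (std::size_t new_idx = 0; new_idx < order.size(); ++new_idx) {
    mapping[order[new_idx]] = new_idx;
    sorted[new_idx] = labels[order[new_idx]];
  }
  labels = std::move(sorted);
  assert(mapping.size() == labels.size());  // every old value is remapped
  return mapping;
}
```

Applied to the labels 4, 1, 3, the sketch leaves the dictionary as 1, 3, 4 and returns the mapping 0 to 2, 1 to 0, 2 to 1. A caller would then presumably apply this mapping to every value already stored for that column.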

◆ setEditableDictionaryMode()

template<template< typename > class ALLOC = std::allocator>
virtual void gum::learning::DBTranslator< ALLOC >::setEditableDictionaryMode ( bool  new_mode)
virtual

sets/unsets the editable dictionary mode

Reimplemented in gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.

◆ setVariableDescription()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBTranslator< ALLOC >::setVariableDescription ( const std::string &  str) const

sets the description of the variable stored into the translator

◆ setVariableName()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::DBTranslator< ALLOC >::setVariableName ( const std::string &  str) const

sets the name of the variable stored into the translator

◆ translate()

template<template< typename > class ALLOC = std::allocator>
virtual DBTranslatedValue gum::learning::DBTranslator< ALLOC >::translate ( const std::string &  str)
pure virtual

returns the translation of a string

This method tries to translate a given string into the DBTranslatedValue that should be stored into a DatabaseTable. If the translator cannot find the translation in its current dictionary, then two situations can arise:

  1. if the translator is not in an editable dictionary mode, then the translator raises a NotFound exception.
  2. if the translator is in an editable dictionary mode, i.e., it is allowed to update its dictionary, then it tries to add the string as a new value in the dictionary. Upon success, it returns the translated value, otherwise, it raises either:
    • a SizeError exception if the number of entries in the dictionary has already reached its maximum,
    • a TypeError exception if the string cannot be converted into a value that can be inserted into the dictionary
    • an OperationNotAllowed exception if the translation would induce incoherent behavior (e.g., a DBTranslator4ContinuousVariable that contains a variable whose domain is [x,y] as well as a missing value symbol z \(\in\) [x,y]).
Warning
Note that missing values (i.e., strings encoded as missing symbols) are translated as std::numeric_limits<>::max ().
Parameters
str: the string that the DBTranslator will try to translate
Returns
the translated value of the string to be stored into a DatabaseTable
Exceptions
UnknownLabelInDatabase: raised if the translation cannot be found and the translator is not in an editable dictionary mode.
SizeError: raised if the number of entries in the dictionary has already reached its maximum.
OperationNotAllowed: raised if the translation cannot be found and the insertion of the string into the translator's dictionary fails because it would induce incoherent behavior (e.g., a DBTranslator4ContinuousVariable that contains a variable whose domain is [x,y] as well as a missing value symbol z \(\in\) [x,y]).
TypeError: raised if the translation cannot be found and the insertion of the string into the translator's dictionary fails because str cannot be converted into an appropriate type.

Implemented in gum::learning::DBTranslator4ContinuousVariable< ALLOC >, gum::learning::DBTranslator4RangeVariable< ALLOC >, gum::learning::DBTranslator4LabelizedVariable< ALLOC >, and gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.
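The lookup and insertion logic described above can be sketched for discrete labels as follows. This is a toy model, not aGrUM code: the class SketchLabelTranslator and its members are hypothetical, and standard exceptions stand in for the aGrUM ones (std::out_of_range for the unknown-label case, std::length_error for SizeError). The TypeError and OperationNotAllowed cases are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

class SketchLabelTranslator {
 public:
  SketchLabelTranslator(std::set<std::string> missing_symbols,
                        bool editable_dictionary,
                        std::size_t max_dico_entries)
      : missing_(std::move(missing_symbols)),
        editable_(editable_dictionary),
        max_entries_(max_dico_entries) {}

  std::size_t translate(const std::string& str) {
    // missing symbols are translated as the largest std::size_t
    if (missing_.count(str) != 0)
      return std::numeric_limits<std::size_t>::max();

    // known string: return its existing translation
    const auto it = dico_.find(str);
    if (it != dico_.end()) return it->second;

    // unknown string and fixed dictionary: error
    if (!editable_) throw std::out_of_range("unknown label: " + str);

    // unknown string and editable dictionary: insert if there is room
    if (dico_.size() >= max_entries_)
      throw std::length_error("dictionary is full");
    const std::size_t new_val = dico_.size();
    dico_.emplace(str, new_val);
    return new_val;
  }

  void setEditableDictionaryMode(bool new_mode) { editable_ = new_mode; }

 private:
  std::set<std::string> missing_;
  bool editable_;
  std::size_t max_entries_;
  std::unordered_map<std::string, std::size_t> dico_;
};
```

With a translator built as SketchLabelTranslator t({"N/A"}, true, 2), the first two distinct strings receive the values 0 and 1, a third distinct string triggers the size error, and "N/A" is always translated as the largest std::size_t.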

◆ translateBack()

template<template< typename > class ALLOC = std::allocator>
virtual std::string gum::learning::DBTranslator< ALLOC >::translateBack ( const DBTranslatedValue  translated_val) const
pure virtual

returns the original value for a given translation

Parameters
translated_val: a value that should result from a translation and for which we are looking for the corresponding DBTranslator's variable's label (a string)
Returns
the string that was translated into a given DBTranslatedValue.
Warning
when the translator is not a proper bijection, like, e.g., DBTranslator4DiscretizedVariable, the method returns the value of the random variable corresponding to translated_val (i.e., for a discretized variable, it would return the interval corresponding to translated_val).
Exceptions
UnknownLabelInDatabase: raised if this original value cannot be found

Implemented in gum::learning::DBTranslator4ContinuousVariable< ALLOC >, gum::learning::DBTranslator4RangeVariable< ALLOC >, gum::learning::DBTranslator4LabelizedVariable< ALLOC >, and gum::learning::DBTranslator4DiscretizedVariable< ALLOC >.
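To illustrate the warning above about translators that are not bijections, here is a small sketch of a discretization, written without aGrUM. The functions sketchDiscretize and sketchIntervalLabel are hypothetical, the interval label format "[a;b[" is an assumption made for this example, and std::out_of_range stands in for UnknownLabelInDatabase.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// ticks t0 < t1 < ... < tn define the n intervals [t_i; t_{i+1}[.
// Returns the index of the interval containing x.
std::size_t sketchDiscretize(const std::vector<double>& ticks, double x) {
  if (ticks.size() < 2 || x < ticks.front() || x >= ticks.back())
    throw std::out_of_range("value outside the discretization");
  const auto it = std::upper_bound(ticks.begin(), ticks.end(), x);
  return static_cast<std::size_t>(it - ticks.begin()) - 1;
}

// Translating back yields the interval, not the number originally read.
std::string sketchIntervalLabel(const std::vector<double>& ticks,
                                std::size_t idx) {
  if (ticks.size() < 2 || idx >= ticks.size() - 1)
    throw std::out_of_range("unknown translated value");
  std::ostringstream out;
  out << '[' << ticks[idx] << ';' << ticks[idx + 1] << '[';
  return out.str();
}
```

With ticks 0, 10, 20, both 3.0 and 7.5 are translated into index 0, and translating index 0 back gives "[0;10[" in either case, so the original strings cannot be recovered.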

◆ variable()

template<template< typename > class ALLOC = std::allocator>
virtual const Variable* gum::learning::DBTranslator< ALLOC >::variable ( ) const
pure virtual

Member Data Documentation

◆ back_dico_

template<template< typename > class ALLOC = std::allocator>
Bijection< std::size_t, std::string, ALLOC< std::pair< float, std::string > > > gum::learning::DBTranslator< ALLOC >::back_dico_
mutable protected

the bijection relating back translated values and their original strings.

Note that the translated values considered here are of type std::size_t because only the values for discrete variables need to be stored; those for continuous variables are actually identity mappings.

Warning
only the values of the random variable are stored into this bijection. Missing values are not considered here.

Definition at line 388 of file DBTranslator.h.

◆ is_dictionary_dynamic_

template<template< typename > class ALLOC = std::allocator>
bool gum::learning::DBTranslator< ALLOC >::is_dictionary_dynamic_
protected

indicates whether the dictionary can be updated or not

Definition at line 373 of file DBTranslator.h.

◆ max_dico_entries_

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::DBTranslator< ALLOC >::max_dico_entries_
protected

the maximum number of entries that the dictionary is allowed to contain

Definition at line 376 of file DBTranslator.h.

◆ missing_symbols_

template<template< typename > class ALLOC = std::allocator>
Set< std::string, ALLOC< std::string > > gum::learning::DBTranslator< ALLOC >::missing_symbols_
protected

the set of missing symbols

Definition at line 379 of file DBTranslator.h.

◆ val_type_

template<template< typename > class ALLOC = std::allocator>
DBTranslatedValueType gum::learning::DBTranslator< ALLOC >::val_type_
protected

the type of the values translated by the translator

Definition at line 391 of file DBTranslator.h.


The documentation for this class was generated from the following file: