aGrUM  0.13.2
gum::learning::genericBNLearner::Database Class Reference

a helper to easily read databases

#include <genericBNLearner.h>


Public Member Functions

template<typename GUM_SCALAR >
 Database (const std::string &filename, const BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
 
template<typename GUM_SCALAR >
 Database (const std::string &filename, Database &score_database, const BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
 
Constructors / Destructors
 Database (const std::string &file, const std::vector< std::string > &missing_symbols)
 default constructor

 Database (const DatabaseTable<> &db)
 default constructor

 Database (const std::string &filename, Database &score_database, const std::vector< std::string > &missing_symbols)
 default constructor with defined modalities for some variables

template<typename GUM_SCALAR >
 Database (const std::string &filename, const gum::BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
 default constructor for the aprioris

template<typename GUM_SCALAR >
 Database (const std::string &filename, Database &score_database, const gum::BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
 default constructor

 Database (const Database &from)
 copy constructor

 Database (Database &&from)
 move constructor

 ~Database ()
 destructor

Operators
Database & operator= (const Database &from)
 copy operator

Database & operator= (Database &&from)
 move operator
 
Accessors / Modifiers
DBRowGeneratorParser & parser ()
 returns the parser for the database

std::vector< Size > & modalities () noexcept
 returns the modalities of the variables

const std::vector< std::string > & names () const noexcept
 returns the names of the variables in the database

NodeId idFromName (const std::string &var_name) const
 returns the node id corresponding to a variable name

const std::string & nameFromId (NodeId id) const
 returns the variable name corresponding to a given node id

const DatabaseTable & databaseTable () const
 returns the internal database table

const std::vector< std::string > & missingSymbols () const
 returns the set of missing symbols taken into account
 

Protected Attributes

DatabaseTable __database
 the database itself

DBRowGeneratorParser * __parser {nullptr}
 the parser used for reading the database

std::vector< Size > __modalities
 the modalities of the variables

Bijection< std::string, NodeId > __name2nodeId
 a hashtable assigning to each variable name its NodeId

Size __max_threads_number {1}
 the max number of threads authorized

Size __min_nb_rows_per_thread {100}
 the minimal number of rows to parse (on average) by thread
 

Detailed Description

a helper to easily read databases

Definition at line 121 of file genericBNLearner.h.

Constructor & Destructor Documentation

gum::learning::genericBNLearner::Database::Database ( const std::string &  file,
const std::vector< std::string > &  missing_symbols 
)
explicit

default constructor

Definition at line 63 of file genericBNLearner.cpp.

65  :
66  Database(genericBNLearner::__readFile(filename, missing_symbols)) {}
gum::learning::genericBNLearner::Database::Database ( const DatabaseTable<> &  db)
explicit

default constructor

Definition at line 45 of file genericBNLearner.cpp.

References __database, __modalities, __name2nodeId, __parser, gum::learning::DatabaseTable< ALLOC >::domainSizes(), gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler(), gum::BijectionImplementation< T1, T2, Alloc, Gen >::insert(), and gum::learning::IDatabaseTable< T_DATA, ALLOC >::variableNames().

45  :
46  __database(db) {
47  // get the variables names
48  const auto& var_names = __database.variableNames();
49  const std::size_t nb_vars = var_names.size();
50  __modalities.resize(nb_vars);
51  const auto domainSizes = __database.domainSizes();
52  for (std::size_t i = 0; i < nb_vars; ++i) {
53  __name2nodeId.insert(var_names[i], NodeId(i));
54  __modalities[i] = Size(domainSizes[i]);
55  }
56 
57  // create the parser
58  __parser =
59  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
60  }
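
The indexing step of this constructor can be illustrated outside aGrUM. What follows is a minimal sketch, not the library's code: VariableIndex and buildVariableIndex are hypothetical names, and a std::unordered_map stands in for gum::Bijection (so only the name-to-id direction is modelled). It shows that column i of the table becomes node id i and that its domain size becomes its number of modalities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the state filled in by the constructor.
struct VariableIndex {
  std::unordered_map<std::string, std::size_t> name_to_id;  // role of __name2nodeId
  std::vector<std::size_t> modalities;                       // role of __modalities
};

// Column i becomes node id i; its domain size becomes its modality count.
inline VariableIndex buildVariableIndex(
    const std::vector<std::string>& var_names,
    const std::vector<std::size_t>& domain_sizes) {
  // Both vectors describe the same columns, so their lengths must agree.
  assert(var_names.size() == domain_sizes.size());

  VariableIndex index;
  index.modalities.resize(var_names.size());
  for (std::size_t i = 0; i < var_names.size(); ++i) {
    index.name_to_id.emplace(var_names[i], i);
    index.modalities[i] = domain_sizes[i];
  }
  return index;
}
```

With the names {"A", "B", "C"} and the domain sizes {2, 3, 2}, the sketch maps "B" to id 1 and records 3 modalities for it.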

gum::learning::genericBNLearner::Database::Database ( const std::string &  filename,
Database &  score_database,
const std::vector< std::string > &  missing_symbols 
)

default constructor with defined modalities for some variables

Parameters
filename        The file to read.
modalities      Indicates, for some nodes (not necessarily all the nodes of the BN), which modalities they should have and in which order these modalities should be stored into the nodes. For instance, if modalities = { 1 -> {True, False, Big} }, then the node of id 1 in the BN will have 3 modalities, the first one being True, the second one being False, and the third being Big.
check_database  If true, the database will be checked.

We must ensure that, when reading the apriori database, if the "apriori" rowFilter says that a given variable has value i (given by its fast translator), the corresponding "raw" value in the apriori database is the same as in the score/parameter database read before creating the apriori. This is compulsory to have aprioris that make sense.

Definition at line 99 of file genericBNLearner.cpp.

References __database, GUM_ERROR, gum::learning::IDatabaseTable< T_DATA, ALLOC >::nbVariables(), and gum::learning::IDatabaseTable< T_DATA, ALLOC >::variableNames().

102  :
103  __database(genericBNLearner::__readFile(filename, missing_symbols)) {
104  // check that there are at least as many variables in the a priori
105  // database as those in the score_database
106  if (__database.nbVariables() < apriori_database.__database.nbVariables()) {
107  GUM_ERROR(InvalidArgument,
108  "the a priori seems to have fewer variables "
109  "than the observed database");
110  }
111 
112  const std::vector< std::string >& apriori_vars =
113  apriori_database.__database.variableNames();
114  const std::vector< std::string >& score_vars = __database.variableNames();
115 
116  Size size = Size(apriori_vars.size());
117  for (Idx i = 0; i < size; ++i) {
118  if (apriori_vars[i] != score_vars[i]) {
119  GUM_ERROR(InvalidArgument,
120  "some a priori variables do not match "
121  "their counterpart in the score database");
122  }
123  }
124 
125  /*
126  ##### TODO: see what is the point of passing in argument score_database
127 
128  __raw_translators = score_database.__raw_translators;
129  auto raw_filter =
130  make_DB_row_filter(__database, __raw_translators, __generators);
131  __raw_translators = raw_filter.translatorSet();
132  score_database.__raw_translators = raw_filter.translatorSet();
133  */
134  }
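
The two checks performed by this constructor can be sketched independently of aGrUM. The helper below is hypothetical (checkSameLeadingVariables is not a library function) and std::invalid_argument stands in for the InvalidArgument error raised through GUM_ERROR. Here new_vars denotes the columns of the file just read and reference_vars those of the database passed as argument.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper mirroring the checks of the listing above:
// 1. the file just read must have at least as many columns as the reference;
// 2. the first reference_vars.size() column names must match, in order.
inline void checkSameLeadingVariables(
    const std::vector<std::string>& new_vars,
    const std::vector<std::string>& reference_vars) {
  if (new_vars.size() < reference_vars.size()) {
    throw std::invalid_argument(
        "the a priori seems to have fewer variables than the observed database");
  }
  for (std::size_t i = 0; i < reference_vars.size(); ++i) {
    assert(i < new_vars.size());  // guaranteed by the size check above
    if (reference_vars[i] != new_vars[i]) {
      throw std::invalid_argument(
          "some a priori variables do not match their counterpart "
          "in the score database");
    }
  }
}
```

Extra trailing columns in the file just read are accepted by this sketch, as they are by the listing; only the leading columns are compared.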

template<typename GUM_SCALAR >
gum::learning::genericBNLearner::Database::Database ( const std::string &  filename,
const gum::BayesNet< GUM_SCALAR > &  bn,
const std::vector< std::string > &  missing_symbols 
)

default constructor for the aprioris

We must ensure that, when reading the apriori database, if the "apriori" rowFilter says that a given variable has value i (given by its fast translator), the corresponding "raw" value in the apriori database is the same as in the score/parameter database read before creating the apriori. This is compulsory to have aprioris that make sense.

Parameters
filename        The file to read.
score_database  The score database.
modalities      Indicates, for some nodes (not necessarily all the nodes of the BN), which modalities they should have and in which order these modalities should be stored into the nodes. For instance, if modalities = { 1 -> {True, False, Big} }, then the node of id 1 in the BN will have 3 modalities, the first one being True, the second one being False, and the third being Big.
template<typename GUM_SCALAR >
gum::learning::genericBNLearner::Database::Database ( const std::string &  filename,
Database &  score_database,
const gum::BayesNet< GUM_SCALAR > &  bn,
const std::vector< std::string > &  missing_symbols 
)

default constructor

gum::learning::genericBNLearner::Database::Database ( const Database &  from)

copy constructor

Definition at line 137 of file genericBNLearner.cpp.

References __database, __parser, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler().

137  :
138  __database(from.__database), __modalities(from.__modalities),
139  __name2nodeId(from.__name2nodeId) {
140  // create the parser
141  __parser =
142  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
143  }

gum::learning::genericBNLearner::Database::Database ( Database &&  from)

move constructor

Definition at line 146 of file genericBNLearner.cpp.

References __database, __parser, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler().

146  :
147  __database(std::move(from.__database)),
148  __modalities(std::move(from.__modalities)),
149  __name2nodeId(std::move(from.__name2nodeId)) {
150  // create the parser
151  __parser =
152  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
153  }

gum::learning::genericBNLearner::Database::~Database ( )

destructor

Definition at line 156 of file genericBNLearner.cpp.

References __parser, and operator=().

156 { delete __parser; }

template<typename GUM_SCALAR >
gum::learning::genericBNLearner::Database::Database ( const std::string &  filename,
const BayesNet< GUM_SCALAR > &  bn,
const std::vector< std::string > &  missing_symbols 
)

Definition at line 28 of file genericBNLearner_tpl.h.

References gum::learning::genericBNLearner::__checkFileName(), __database, __modalities, __name2nodeId, __parser, gum::DAGmodel::dag(), gum::learning::DatabaseTable< ALLOC >::domainSizes(), gum::learning::IDBInitializer< ALLOC >::fillDatabase(), GUM_ERROR, gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler(), gum::learning::IDatabaseTable< T_DATA, ALLOC >::hasMissingValues(), gum::BijectionImplementation< T1, T2, Alloc, Gen >::insert(), gum::HashTable< Key, Val, Alloc >::insert(), gum::learning::DatabaseTable< ALLOC >::insertTranslator(), gum::Variable::name(), gum::learning::IDatabaseTable< T_DATA, ALLOC >::nbVariables(), gum::BayesNet< GUM_SCALAR >::variable(), gum::learning::DatabaseTable< ALLOC >::variable(), and gum::learning::IDBInitializer< ALLOC >::variableNames().

31  {
32  // assign to each column name in the database its position
33  genericBNLearner::__checkFileName(filename);
34  DBInitializerFromCSV<> initializer(filename);
35  const auto& xvar_names = initializer.variableNames();
36  std::size_t nb_vars = xvar_names.size();
37  HashTable< std::string, std::size_t > var_names(nb_vars);
38  for (std::size_t i = std::size_t(0); i < nb_vars; ++i)
39  var_names.insert(xvar_names[i], i);
40 
41  // we use the bn to insert the translators into the database table
42  try {
43  for (auto node : bn.dag()) {
44  const Variable& var = bn.variable(node);
45  __database.insertTranslator(var, var_names[var.name()], missing_symbols);
46  }
47  } catch (NotFound&) {
48  GUM_ERROR(MissingVariableInDatabase,
49  "the database does not contain variable ");
50  }
51 
52  // fill the database
53  initializer.fillDatabase(__database);
54 
55  // check that the database does not contain any missing value
56  if (__database.hasMissingValues())
57  GUM_ERROR(MissingValueInDatabase,
58  "For the moment, the BNLearner is unable to cope "
59  "with missing values in databases");
60 
61  // get the domain sizes of the variables
62  for (auto dom : __database.domainSizes())
63  __modalities.push_back(dom);
64 
65  nb_vars = __database.nbVariables();
66  for (std::size_t i = std::size_t(0); i < nb_vars; ++i)
67  __name2nodeId.insert(__database.variable(i).name(), i);
68 
69  // create the parser
70  __parser =
71  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
72  }
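
The first half of this constructor (hash the CSV column names by position, then look up each variable of the BN) can be sketched without aGrUM. The function below is a hypothetical illustration: columnsForVariables is not part of the library, std::out_of_range stands in for MissingVariableInDatabase, and, unlike the listing, the sketch appends the missing name to the error message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// For each model variable, return the index of the CSV column that holds it.
inline std::vector<std::size_t> columnsForVariables(
    const std::vector<std::string>& csv_header,
    const std::vector<std::string>& model_vars) {
  // Assign to each column name its position in the file.
  std::unordered_map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < csv_header.size(); ++i) {
    position.emplace(csv_header[i], i);
  }

  // Use the model variables to select the columns to translate.
  std::vector<std::size_t> columns;
  columns.reserve(model_vars.size());
  for (const std::string& name : model_vars) {
    const auto it = position.find(name);
    if (it == position.end()) {
      throw std::out_of_range("the database does not contain variable " + name);
    }
    columns.push_back(it->second);
  }
  assert(columns.size() == model_vars.size());
  return columns;
}
```

For a header {"c", "a", "b"} and model variables {"a", "b"}, the sketch returns the column indices 1 and 2; a variable absent from the header raises the error.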

template<typename GUM_SCALAR >
gum::learning::genericBNLearner::Database::Database ( const std::string &  filename,
Database &  score_database,
const BayesNet< GUM_SCALAR > &  bn,
const std::vector< std::string > &  missing_symbols 
)

Definition at line 76 of file genericBNLearner_tpl.h.

80  :
81  __database(genericBNLearner::__readFile(filename, bn, missing_symbols)) {}

Member Function Documentation

template<typename GUM_SCALAR >
BayesNet< GUM_SCALAR > gum::learning::genericBNLearner::Database::__BNVars ( ) const
private

Definition at line 85 of file genericBNLearner_tpl.h.

References __database, gum::BayesNet< GUM_SCALAR >::add(), gum::learning::IDatabaseTable< T_DATA, ALLOC >::nbVariables(), and gum::learning::DatabaseTable< ALLOC >::variable().

85  {
86  BayesNet< GUM_SCALAR > bn;
87  const std::size_t nb_vars = __database.nbVariables();
88  for (std::size_t i = 0; i < nb_vars; ++i) {
89  const DiscreteVariable& var =
90  dynamic_cast< const DiscreteVariable& >(__database.variable(i));
91  bn.add(var);
92  }
93  return bn;
94  }

INLINE const DatabaseTable & gum::learning::genericBNLearner::Database::databaseTable ( ) const

returns the internal database table

Definition at line 71 of file genericBNLearner_inl.h.

References __database.

71  {
72  return __database;
73  }
INLINE NodeId gum::learning::genericBNLearner::Database::idFromName ( const std::string &  var_name) const

returns the node id corresponding to a variable name

Definition at line 54 of file genericBNLearner_inl.h.

References __name2nodeId, GUM_ERROR, and gum::BijectionImplementation< T1, T2, Alloc, Gen >::second().

Referenced by gum::learning::genericBNLearner::addForbiddenArc(), gum::learning::genericBNLearner::addMandatoryArc(), gum::learning::genericBNLearner::eraseForbiddenArc(), gum::learning::genericBNLearner::eraseMandatoryArc(), and gum::learning::genericBNLearner::idFromName().

54  {
55  try {
56  return __name2nodeId.second(const_cast< std::string& >(var_name));
57  } catch (gum::NotFound) {
58  GUM_ERROR(MissingVariableInDatabase, "for variable " << var_name);
59  }
60  }
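
The lookup pattern used here, a two-way name/id table whose low-level "not found" error is translated into a domain-specific one, can be sketched with the standard library. Everything below is an assumption made for illustration: NameIdMap and MissingVariable are hypothetical names, two std::unordered_map objects stand in for gum::Bijection, and rejecting duplicates on insertion is a design choice of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical domain error, standing in for MissingVariableInDatabase.
struct MissingVariable : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Minimal two-way table between variable names and ids.
class NameIdMap {
 public:
  void insert(const std::string& name, std::size_t id) {
    if (name_to_id_.count(name) != 0 || id_to_name_.count(id) != 0) {
      throw std::invalid_argument("duplicate name or id: " + name);
    }
    name_to_id_.emplace(name, id);
    id_to_name_.emplace(id, name);
    assert(name_to_id_.size() == id_to_name_.size());  // both sides stay in sync
  }

  // Translate the container's error into the domain-specific one.
  std::size_t idFromName(const std::string& name) const {
    try {
      return name_to_id_.at(name);
    } catch (const std::out_of_range&) {
      throw MissingVariable("for variable " + name);
    }
  }

  // Reverse direction; an unknown id propagates std::out_of_range unchanged.
  const std::string& nameFromId(std::size_t id) const {
    return id_to_name_.at(id);
  }

 private:
  std::unordered_map<std::string, std::size_t> name_to_id_;
  std::unordered_map<std::size_t, std::string> id_to_name_;
};
```

Callers then only need to handle MissingVariable for name lookups and never see the error type of the underlying container.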

INLINE const std::vector< std::string > & gum::learning::genericBNLearner::Database::missingSymbols ( ) const

returns the set of missing symbols taken into account

Definition at line 78 of file genericBNLearner_inl.h.

References __database, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::missingSymbols().

Referenced by gum::learning::genericBNLearner::__createApriori().

78  {
79  return __database.missingSymbols();
80  }

INLINE std::vector< Size > & gum::learning::genericBNLearner::Database::modalities ( )
noexcept
INLINE const std::string & gum::learning::genericBNLearner::Database::nameFromId ( NodeId  id) const

returns the variable name corresponding to a given node id

Definition at line 64 of file genericBNLearner_inl.h.

References __name2nodeId, and gum::BijectionImplementation< T1, T2, Alloc, Gen >::first().

Referenced by gum::learning::genericBNLearner::nameFromId().

64  {
65  return __name2nodeId.first(id);
66  }

INLINE const std::vector< std::string > & gum::learning::genericBNLearner::Database::names ( ) const
noexcept

returns the names of the variables in the database

Definition at line 48 of file genericBNLearner_inl.h.

References __database, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::variableNames().

Referenced by gum::learning::genericBNLearner::names().

48  {
49  return __database.variableNames();
50  }

genericBNLearner::Database & gum::learning::genericBNLearner::Database::operator= ( const Database &  from)

copy operator

Definition at line 159 of file genericBNLearner.cpp.

References __database, __modalities, __name2nodeId, __parser, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler().

Referenced by ~Database().

159  {
160  if (this != &from) {
161  delete __parser;
162  __database = from.__database;
163  __modalities = from.__modalities;
164  __name2nodeId = from.__name2nodeId;
165 
166  // create the parser
167  __parser =
168  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
169  }
170 
171  return *this;
172  }

genericBNLearner::Database & gum::learning::genericBNLearner::Database::operator= ( Database &&  from)

move operator

Definition at line 175 of file genericBNLearner.cpp.

References __database, __modalities, __name2nodeId, __parser, and gum::learning::IDatabaseTable< T_DATA, ALLOC >::handler().

175  {
176  if (this != &from) {
177  delete __parser;
178  __database = std::move(from.__database);
179  __modalities = std::move(from.__modalities);
180  __name2nodeId = std::move(from.__name2nodeId);
181 
182  // create the parser
183  __parser =
184  new DBRowGeneratorParser<>(__database.handler(), DBRowGeneratorSet<>());
185  }
186 
187  return *this;
188  }
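
The copy/move constructors, the destructor and the two assignment operators share one ownership rule: the parser is never copied or shared, each object rebuilds its own parser on its own table. The sketch below illustrates that rule with hypothetical stand-ins (Table for DatabaseTable, Reader for DBRowGeneratorParser, Holder for Database). It is an illustration under those assumptions, not the library's implementation, and it deviates from the listings in one respect: the new reader is allocated before the old one is deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for DatabaseTable.
struct Table {
  std::vector<int> rows;
};

// Hypothetical stand-in for the parser: it remembers which table it reads.
class Reader {
 public:
  explicit Reader(const Table& table) : table_(&table) {}
  const Table* table() const { return table_; }
  std::size_t nbRows() const { return table_->rows.size(); }

 private:
  const Table* table_;
};

class Holder {
 public:
  explicit Holder(Table table)
      : table_(std::move(table)), reader_(new Reader(table_)) {}

  // Copy and move never share a reader: each object builds one on its own table.
  Holder(const Holder& from)
      : table_(from.table_), reader_(new Reader(table_)) {}
  Holder(Holder&& from)
      : table_(std::move(from.table_)), reader_(new Reader(table_)) {}

  Holder& operator=(const Holder& from) {
    if (this != &from) {
      table_ = from.table_;
      resetReader();
    }
    return *this;
  }
  Holder& operator=(Holder&& from) {
    if (this != &from) {
      table_ = std::move(from.table_);
      resetReader();
    }
    return *this;
  }

  ~Holder() { delete reader_; }

  const Table& table() const { return table_; }
  const Reader& reader() const {
    assert(reader_ != nullptr);
    return *reader_;
  }

 private:
  // Allocate first, so reader_ never dangles if `new` throws.
  void resetReader() {
    Reader* fresh = new Reader(table_);
    delete reader_;
    reader_ = fresh;
  }

  Table table_;     // declared before reader_, so it is initialized first
  Reader* reader_;  // owned raw pointer, playing the role of __parser
};
```

After a copy, the reader of the new object points at the new object's table, never at the source's table, which is the property the listings above maintain for __parser.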

INLINE DBRowGeneratorParser & gum::learning::genericBNLearner::Database::parser ( )

returns the parser for the database

Definition at line 37 of file genericBNLearner_inl.h.

References __parser.

Referenced by gum::learning::genericBNLearner::__createApriori(), gum::learning::genericBNLearner::__createParamEstimator(), gum::learning::genericBNLearner::__createScore(), gum::learning::genericBNLearner::useMDL(), gum::learning::genericBNLearner::useNML(), and gum::learning::genericBNLearner::useNoCorr().

37  {
38  return *__parser;
39  }

Member Data Documentation

DatabaseTable gum::learning::genericBNLearner::Database::__database
protected

the database itself

Definition at line 244 of file genericBNLearner.h.

Referenced by __BNVars(), Database(), databaseTable(), missingSymbols(), names(), and operator=().

Size gum::learning::genericBNLearner::Database::__max_threads_number {1}
protected

the max number of threads authorized

Definition at line 259 of file genericBNLearner.h.

Size gum::learning::genericBNLearner::Database::__min_nb_rows_per_thread {100}
protected

the minimal number of rows to parse (on average) by thread

Definition at line 263 of file genericBNLearner.h.

std::vector< Size > gum::learning::genericBNLearner::Database::__modalities
protected

the modalities of the variables

Definition at line 250 of file genericBNLearner.h.

Referenced by Database(), modalities(), and operator=().

Bijection< std::string, NodeId > gum::learning::genericBNLearner::Database::__name2nodeId
protected

a hashtable assigning to each variable name its NodeId

Definition at line 253 of file genericBNLearner.h.

Referenced by Database(), idFromName(), nameFromId(), and operator=().

DBRowGeneratorParser* gum::learning::genericBNLearner::Database::__parser {nullptr}
protected

the parser used for reading the database

Definition at line 247 of file genericBNLearner.h.

Referenced by Database(), operator=(), parser(), and ~Database().


The documentation for this class was generated from the following files: