aGrUM
0.20.3
a C++ library for (probabilistic) graphical models
The class that computes counts of observations from the database.
#include <agrum/BN/learning/scores_and_tests/recordCounter.h>
Public Member Functions
Constructors / Destructors
RecordCounter (const DBRowGeneratorParser< ALLOC > &parser, const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > &ranges, const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &nodeId2columns=Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(), const allocator_type &alloc=allocator_type())
default constructor
RecordCounter (const DBRowGeneratorParser< ALLOC > &parser, const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &nodeId2columns=Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(), const allocator_type &alloc=allocator_type())
default constructor
RecordCounter (const RecordCounter< ALLOC > &from)
copy constructor
RecordCounter (const RecordCounter< ALLOC > &from, const allocator_type &alloc)
copy constructor with a given allocator
RecordCounter (RecordCounter< ALLOC > &&from)
move constructor
RecordCounter (RecordCounter< ALLOC > &&from, const allocator_type &alloc)
move constructor with a given allocator
virtual RecordCounter< ALLOC > * | clone () const
virtual copy constructor
virtual RecordCounter< ALLOC > * | clone (const allocator_type &alloc) const
virtual copy constructor with a given allocator
virtual | ~RecordCounter ()
destructor
Operators
RecordCounter< ALLOC > & | operator= (const RecordCounter< ALLOC > &from)
copy operator
RecordCounter< ALLOC > & | operator= (RecordCounter< ALLOC > &&from)
move operator
Accessors / Modifiers
void | clear ()
clears from memory the counts obtained by the last database parse
void | setMaxNbThreads (const std::size_t nb) const
changes the maximum number of threads used to parse the database
std::size_t | nbThreads () const
returns the number of threads used to parse the database
void | setMinNbRowsPerThread (const std::size_t nb) const
changes the minimum number of rows a thread should process in a multithreading context
std::size_t | minNbRowsPerThread () const
returns the minimum number of rows that each thread should process
const std::vector< double, ALLOC< double > > & | counts (const IdCondSet< ALLOC > &ids, const bool check_discrete_vars=false)
returns the counts over all the variables in an IdCondSet
template<template< typename > class XALLOC>
void | setRanges (const std::vector< std::pair< std::size_t, std::size_t >, XALLOC< std::pair< std::size_t, std::size_t > > > &new_ranges)
sets new ranges over which the counts are performed
void | clearRanges ()
resets the ranges to the single range corresponding to the whole database
const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > & | ranges () const
returns the current ranges
template<typename GUM_SCALAR >
void | setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
assigns a new Bayes net to all the counter's generators that depend on a BN
allocator_type | getAllocator () const
returns the allocator used
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > & | nodeId2Columns () const
returns the mapping from node ids to column positions in the database
const DatabaseTable< ALLOC > & | database () const
returns the database on which the counts are performed
Public Types
using | allocator_type = ALLOC< NodeId >
type of the allocators passed as arguments to methods
The class that computes counts of observations from the database.
This class is the one called by scores and independence tests to compute, from tabular datasets, the counts of observations they need. The counts are computed as follows. When asked for the counts over a set X = {X_1,...,X_n} of variables, the RecordCounter first checks whether it already holds counts over some set Y of variables that contains X. If so, it derives the counts over X from those over Y by summing over the variables of Y that are not in X, which is usually much faster than parsing the database. Otherwise, it computes the counts over X by parsing the database in parallel. Only the result of the last database parse is kept for deriving counts over subsets.

For example, if we create a RecordCounter and ask it for the counts over {A,B,C}, it parses the database and returns them. If we then ask for the counts over B, it derives them from the table over {A,B,C}; the counts over {A,C} are obtained the same way. Asking for the counts over {B,C,D}, however, requires a new database parse. Finally, asking for the counts over A triggers yet another parse, because the RecordCounter now only holds the counts over {B,C,D}.
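The reuse step described above amounts to a marginalization: the counts over X are obtained from the counts over Y by summing over the values of the variables of Y that are not in X. The following standalone sketch illustrates this idea; it is not aGrUM's implementation. The helper names `canReuse` and `marginalize` are hypothetical, variables are identified by plain `std::size_t` ids, and the counts are assumed to be stored in a flat vector in which the first variable varies fastest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: can the counts over `requested` be derived from the
// cached counts over `cached`? True iff every requested variable is cached.
bool canReuse(const std::vector<std::size_t>& cached,
              const std::vector<std::size_t>& requested) {
  return std::all_of(requested.begin(), requested.end(), [&](std::size_t v) {
    return std::find(cached.begin(), cached.end(), v) != cached.end();
  });
}

// Hypothetical helper: sums out of `counts` (a flat table over `vars`, whose
// domain sizes are `domSizes`) every variable that is not listed in `keep`.
// Layout assumption: in every table, the first variable varies fastest.
std::vector<double> marginalize(const std::vector<double>& counts,
                                const std::vector<std::size_t>& vars,
                                const std::vector<std::size_t>& domSizes,
                                const std::vector<std::size_t>& keep) {
  assert(canReuse(vars, keep));
  std::size_t total = 1;
  for (std::size_t d : domSizes) total *= d;
  assert(total == counts.size());

  // Positions of the kept variables inside `vars`, and size of the result.
  std::vector<std::size_t> pos;
  std::size_t outSize = 1;
  for (std::size_t v : keep) {
    const auto p = static_cast<std::size_t>(
        std::find(vars.begin(), vars.end(), v) - vars.begin());
    pos.push_back(p);
    outSize *= domSizes[p];
  }

  std::vector<double> out(outSize, 0.0);
  std::vector<std::size_t> values(vars.size(), 0);  // current cell of `counts`
  for (std::size_t i = 0; i < counts.size(); ++i) {
    // Offset of the current cell in the result (first kept variable fastest).
    std::size_t offset = 0, stride = 1;
    for (std::size_t p : pos) {
      offset += values[p] * stride;
      stride *= domSizes[p];
    }
    out[offset] += counts[i];
    // Advance `values` like an odometer whose first digit turns fastest.
    for (std::size_t j = 0; j < values.size(); ++j) {
      if (++values[j] < domSizes[j]) break;
      values[j] = 0;
    }
  }
  return out;
}
```

With the counts over {A,B,C} cached, `canReuse` tells whether a request over {A,C} can be served without parsing the database, and `marginalize` then produces the table over {A,C}, in the order given by `keep`.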
Definition at line 112 of file recordCounter.h.
using gum::learning::RecordCounter< ALLOC >::allocator_type = ALLOC< NodeId >
type of the allocators passed as arguments to methods
Definition at line 115 of file recordCounter.h.
gum::learning::RecordCounter< ALLOC >::RecordCounter (
    const DBRowGeneratorParser< ALLOC > & parser,
    const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > & ranges,
    const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > & nodeId2columns = Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(),
    const allocator_type & alloc = allocator_type() )
default constructor
parser | the parser used to parse the database
ranges | a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counts are then computed only over the union of the row intervals [Xi,Yi), i in {1,...,n}. This is useful, e.g., for cross-validation tasks, in which part of the database must be ignored. An empty set of ranges is equivalent to a single interval [X,Y) spanning the whole database.
nodeId2columns | a mapping from the ids of the nodes in the graphical model to the corresponding columns of the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has NodeId 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of its column in the DatabaseTable.
alloc | the allocator used to allocate the structures within the RecordCounter.
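The effect of the ranges parameter can be illustrated on a single column. In this minimal sketch, the hypothetical function `countOverRanges` counts the values of one already-discretized column (values in [0, domainSize)) over the rows selected by a set of half-open ranges, treating an empty set of ranges as the whole database. It assumes the ranges are pairwise disjoint, so that summing over each range yields the counts over their union; overlapping ranges would count the shared rows twice.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical helper: counts of each value of `column` over the rows selected
// by `ranges`. Assumes the ranges are pairwise disjoint.
std::vector<double> countOverRanges(const std::vector<std::size_t>& column,
                                    std::size_t domainSize,
                                    std::vector<Range> ranges) {
  // An empty set of ranges stands for the single range [0, nbRows).
  if (ranges.empty()) ranges.push_back({0, column.size()});

  std::vector<double> counts(domainSize, 0.0);
  for (const Range& r : ranges) {
    assert(r.first <= r.second && r.second <= column.size());
    for (std::size_t row = r.first; row < r.second; ++row) {
      assert(column[row] < domainSize);
      counts[column[row]] += 1.0;
    }
  }
  return counts;
}
```

For instance, excluding a block of rows for validation simply means passing the ranges that surround it.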
gum::learning::RecordCounter< ALLOC >::RecordCounter (
    const DBRowGeneratorParser< ALLOC > & parser,
    const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > & nodeId2columns = Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(),
    const allocator_type & alloc = allocator_type() )
default constructor
parser | the parser used to parse the database
nodeId2columns | a mapping from the ids of the nodes in the graphical model to the corresponding columns of the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has NodeId 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of its column in the DatabaseTable.
alloc | the allocator used to allocate the structures within the RecordCounter.
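The identity fallback of nodeId2columns can be sketched as follows. Here `std::unordered_map` stands in for gum::Bijection, and `makeMapping` and `columnOf` are hypothetical helpers, not aGrUM functions; `makeMapping` only checks the one-to-one property that a bijection guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using NodeId = std::size_t;  // stand-in for gum::NodeId (assumption)

// Hypothetical helper: builds a NodeId -> column mapping and checks that it is
// one-to-one, as a bijection would be.
std::unordered_map<NodeId, std::size_t> makeMapping(
    const std::vector<std::pair<NodeId, std::size_t>>& pairs) {
  std::unordered_map<NodeId, std::size_t> mapping;
  std::unordered_set<std::size_t> usedColumns;
  for (const auto& p : pairs) {
    const bool newId = mapping.emplace(p.first, p.second).second;
    const bool newColumn = usedColumns.insert(p.second).second;
    assert(newId && newColumn && "each id must map to a distinct column");
  }
  return mapping;
}

// Hypothetical helper: returns the database column holding node `id`.
// An empty mapping is interpreted as the identity (NodeId == column index).
std::size_t columnOf(const std::unordered_map<NodeId, std::size_t>& nodeId2columns,
                     NodeId id) {
  if (nodeId2columns.empty()) return id;
  return nodeId2columns.at(id);  // throws std::out_of_range for unmapped ids
}
```

With the example above, `makeMapping({{5, 1}})` maps NodeId 5 to column index 1, i.e., the 2nd column when indices start at zero.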
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const RecordCounter< ALLOC > & from )
copy constructor
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const RecordCounter< ALLOC > & from, const allocator_type & alloc )
copy constructor with a given allocator
gum::learning::RecordCounter< ALLOC >::RecordCounter ( RecordCounter< ALLOC > && from )
move constructor
gum::learning::RecordCounter< ALLOC >::RecordCounter ( RecordCounter< ALLOC > && from, const allocator_type & alloc )
move constructor with a given allocator
virtual gum::learning::RecordCounter< ALLOC >::~RecordCounter ( )
destructor
void gum::learning::RecordCounter< ALLOC >::clear ( )
clears from memory the counts obtained by the last database parse
void gum::learning::RecordCounter< ALLOC >::clearRanges ( )
resets the ranges to the single range corresponding to the whole database
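virtual RecordCounter< ALLOC >* gum::learning::RecordCounter< ALLOC >::clone ( ) const
virtual copy constructor
virtual RecordCounter< ALLOC >* gum::learning::RecordCounter< ALLOC >::clone ( const allocator_type & alloc ) const
virtual copy constructor with a given allocator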
const std::vector< double, ALLOC< double > >& gum::learning::RecordCounter< ALLOC >::counts ( const IdCondSet< ALLOC > & ids, const bool check_discrete_vars = false )
returns the counts over all the variables in an IdCondSet
ids | the IdCondSet of the variables over which the counts are computed.
check_discrete_vars | the record counter can only produce correct results on sets of discrete variables. By default, the method does not check whether the variables corresponding to the IdCondSet are actually discrete. If check_discrete_vars is set to true, this check is performed before the counting vector is computed; in that case, if a variable is not discrete, a TypeError exception is raised.
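When the counts cannot be derived from cached ones, they are computed by parsing the rows. The sketch below shows such a pass over a set of columns, together with the optional discreteness check. Everything in it is hypothetical: `ColumnInfo` and `jointCounts` are illustrative names, rows are assumed to hold already-encoded discrete values, the flat layout lets the first selected column vary fastest, and `std::invalid_argument` stands in for aGrUM's TypeError.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical per-column metadata.
struct ColumnInfo {
  bool discrete;
  std::size_t domainSize;  // meaningful only when `discrete` is true
};

// Hypothetical helper: joint counts over the columns `cols` of `rows`.
std::vector<double> jointCounts(const std::vector<std::vector<std::size_t>>& rows,
                                const std::vector<ColumnInfo>& info,
                                const std::vector<std::size_t>& cols,
                                bool check_discrete_vars = false) {
  // Optional check, mirroring check_discrete_vars: reject non-discrete columns.
  if (check_discrete_vars) {
    for (std::size_t c : cols)
      if (!info[c].discrete)
        throw std::invalid_argument("column is not discrete");  // stand-in for TypeError
  }

  std::size_t size = 1;
  for (std::size_t c : cols) size *= info[c].domainSize;
  std::vector<double> counts(size, 0.0);

  for (const auto& row : rows) {
    // Offset of this row's cell (first selected column varies fastest).
    std::size_t offset = 0, stride = 1;
    for (std::size_t c : cols) {
      assert(row[c] < info[c].domainSize);
      offset += row[c] * stride;
      stride *= info[c].domainSize;
    }
    counts[offset] += 1.0;
  }
  return counts;
}
```

Without the check, a non-discrete column silently yields meaningless results (here, an assertion failure in debug builds), which is why enabling the check is worthwhile when the variable types are not known in advance.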
const DatabaseTable< ALLOC >& gum::learning::RecordCounter< ALLOC >::database ( ) const
returns the database on which the counts are performed
allocator_type gum::learning::RecordCounter< ALLOC >::getAllocator ( ) const
returns the allocator used
std::size_t gum::learning::RecordCounter< ALLOC >::minNbRowsPerThread ( ) const
returns the minimum number of rows that each thread should process
std::size_t gum::learning::RecordCounter< ALLOC >::nbThreads ( ) const
returns the number of threads used to parse the database
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > >& gum::learning::RecordCounter< ALLOC >::nodeId2Columns ( ) const
returns the mapping from node ids to column positions in the database
RecordCounter< ALLOC >& gum::learning::RecordCounter< ALLOC >::operator= ( const RecordCounter< ALLOC > & from )
copy operator
RecordCounter< ALLOC >& gum::learning::RecordCounter< ALLOC >::operator= ( RecordCounter< ALLOC > && from )
move operator
const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > >& gum::learning::RecordCounter< ALLOC >::ranges ( ) const
returns the current ranges
template<typename GUM_SCALAR >
void gum::learning::RecordCounter< ALLOC >::setBayesNet ( const BayesNet< GUM_SCALAR > & new_bn )
assigns a new Bayes net to all the counter's generators that depend on a BN
Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet updates the BN model used by these generators.
void gum::learning::RecordCounter< ALLOC >::setMaxNbThreads ( const std::size_t nb ) const
changes the maximum number of threads used to parse the database
void gum::learning::RecordCounter< ALLOC >::setMinNbRowsPerThread ( const std::size_t nb ) const
changes the minimum number of rows a thread should process in a multithreading context
When method counts runs several threads to perform the counts on the rows of the database, MinNbRowsPerThread indicates the minimum number of rows each thread should process. It is used to compute the number of threads actually run, which equals the minimum of the maximum number of threads allowed and the number of records in the database divided by nb.
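The rule above can be written as a small function. `nbThreadsToRun` is a hypothetical name, and clamping the result to at least one thread (for databases with fewer than nb rows) is an assumption of this sketch; the text does not say what happens in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of threads actually run, i.e.
// min(maxNbThreads, nbRecords / minNbRowsPerThread), clamped to at least 1
// (the clamp is an assumption, not taken from the documentation).
std::size_t nbThreadsToRun(std::size_t maxNbThreads, std::size_t nbRecords,
                           std::size_t minNbRowsPerThread) {
  assert(maxNbThreads > 0 && minNbRowsPerThread > 0);
  const std::size_t byRows = nbRecords / minNbRowsPerThread;  // integer division
  return std::max<std::size_t>(1, std::min(maxNbThreads, byRows));
}
```

For example, with at most 8 threads and nb = 500, a database of 1000 records is parsed by 2 threads, while one of 10000 records uses all 8.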
template<template< typename > class XALLOC>
void gum::learning::RecordCounter< ALLOC >::setRanges ( const std::vector< std::pair< std::size_t, std::size_t >, XALLOC< std::pair< std::size_t, std::size_t > > > & new_ranges )
sets new ranges over which the counts are performed
new_ranges | a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counts are then computed only over the union of the row intervals [Xi,Yi), i in {1,...,n}. This is useful, e.g., for cross-validation tasks, in which part of the database must be ignored. An empty set of ranges is equivalent to a single interval [X,Y) spanning the whole database.
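For cross-validation, the ranges passed to setRanges are typically the complement of a test fold. The following sketch, with hypothetical helpers `testFold` and `trainingRanges`, builds the training ranges for k contiguous folds; it only computes the index pairs and does not call aGrUM.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical helper: rows of test fold `fold` out of `k` contiguous folds.
Range testFold(std::size_t nbRows, std::size_t k, std::size_t fold) {
  assert(k > 0 && fold < k);
  return {fold * nbRows / k, (fold + 1) * nbRows / k};
}

// Hypothetical helper: training ranges = all rows outside the test fold.
// Empty pieces (before the first fold or after the last one) are dropped.
std::vector<Range> trainingRanges(std::size_t nbRows, std::size_t k, std::size_t fold) {
  const Range test = testFold(nbRows, k, fold);
  std::vector<Range> ranges;
  if (test.first > 0) ranges.push_back({0, test.first});
  if (test.second < nbRows) ranges.push_back({test.second, nbRows});
  return ranges;
}
```

Since `std::vector<std::pair<std::size_t, std::size_t>>` uses `std::allocator`, the result should be usable as new_ranges with XALLOC = std::allocator. Note that with k = 1 the training ranges are empty, which the RecordCounter interprets as the whole database rather than as no rows at all.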