aGrUM  0.20.3
a C++ library for (probabilistic) graphical models
gum::learning::RecordCounter< ALLOC > Class Template Reference

The class that computes countings of observations from the database.

#include <agrum/BN/learning/scores_and_tests/recordCounter.h>

Public Member Functions

Constructors / Destructors
 RecordCounter (const DBRowGeneratorParser< ALLOC > &parser, const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > &ranges, const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &nodeId2columns=Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(), const allocator_type &alloc=allocator_type())
 default constructor
 
 RecordCounter (const DBRowGeneratorParser< ALLOC > &parser, const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &nodeId2columns=Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(), const allocator_type &alloc=allocator_type())
 default constructor
 
 RecordCounter (const RecordCounter< ALLOC > &from)
 copy constructor
 
 RecordCounter (const RecordCounter< ALLOC > &from, const allocator_type &alloc)
 copy constructor with a given allocator
 
 RecordCounter (RecordCounter< ALLOC > &&from)
 move constructor
 
 RecordCounter (RecordCounter< ALLOC > &&from, const allocator_type &alloc)
 move constructor with a given allocator
 
virtual RecordCounter< ALLOC > * clone () const
 virtual copy constructor
 
virtual RecordCounter< ALLOC > * clone (const allocator_type &alloc) const
 virtual copy constructor with a given allocator
 
virtual ~RecordCounter ()
 destructor
 
Operators
RecordCounter< ALLOC > & operator= (const RecordCounter< ALLOC > &from)
 copy operator
 
RecordCounter< ALLOC > & operator= (RecordCounter< ALLOC > &&from)
 move operator
 
Accessors / Modifiers
void clear ()
 clears all the last database-parsed countings from memory
 
void setMaxNbThreads (const std::size_t nb) const
 changes the max number of threads used to parse the database
 
std::size_t nbThreads () const
 returns the number of threads used to parse the database
 
void setMinNbRowsPerThread (const std::size_t nb) const
 changes the minimum number of rows a thread should process in a multithreading context
 
std::size_t minNbRowsPerThread () const
 returns the minimum number of rows that each thread should process
 
const std::vector< double, ALLOC< double > > & counts (const IdCondSet< ALLOC > &ids, const bool check_discrete_vars=false)
 returns the counts over all the variables in an IdCondSet
 
template<template< typename > class XALLOC>
void setRanges (const std::vector< std::pair< std::size_t, std::size_t >, XALLOC< std::pair< std::size_t, std::size_t > > > &new_ranges)
 sets new ranges to perform the countings
 
void clearRanges ()
 resets the ranges to the single range covering the whole database
 
const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > & ranges () const
 returns the current ranges
 
template<typename GUM_SCALAR >
void setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
 assigns a new Bayes net to all the counter's generators that depend on a BN
 
allocator_type getAllocator () const
 returns the allocator used
 
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > & nodeId2Columns () const
 returns the mapping from ids to column positions in the database
 
const DatabaseTable< ALLOC > & database () const
 returns the database on which we perform the counts
 

Public Types

using allocator_type = ALLOC< NodeId >
 type for the allocators passed in arguments of methods
 

Detailed Description

template<template< typename > class ALLOC = std::allocator>
class gum::learning::RecordCounter< ALLOC >

The class that computes countings of observations from the database.

This class is meant to be called by scores and independence tests to compute the countings of observations they need from tabular datasets. The countings are performed as follows: when asked for the countings over a set X = {X_1,...,X_n} of variables, the RecordCounter first checks whether it already contains countings over a set Y of variables containing X. If so, it extracts the countings over X from those over Y (this is usually much faster than determining the countings by parsing the database). Otherwise, it determines the countings over X by parsing the database in parallel. Only the result of the last database-parsed countings is available for this subset extraction.

As an example, if we create a RecordCounter and ask it for the countings over {A,B,C}, it parses the database and provides the countings. If we then ask for the countings over B, it uses the table over {A,B,C} to produce them. Asking for the countings over {A,C} is handled the same way. Asking for the countings over {B,C,D}, however, requires another database parsing. Finally, if we ask for the countings over A, a new database parsing is performed because the RecordCounter now contains only the countings over {B,C,D}.
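
The subset extraction described above amounts to marginalizing the cached table. The following is a minimal, library-independent sketch of that idea, not aGrUM's actual implementation: the function marginalCounts and its parameters are hypothetical, and it assumes the flat layout documented for counts(), in which the first variable varies fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): sums a flat joint count table over
// all the variables except those listed in `kept`.
// `domain_sizes[i]` is the domain size of the i-th variable of the joint table;
// the first variable varies fastest in `joint`.
// `kept` lists the positions of the variables to keep, in output order.
std::vector<double> marginalCounts(const std::vector<double>& joint,
                                   const std::vector<std::size_t>& domain_sizes,
                                   const std::vector<std::size_t>& kept) {
  std::size_t joint_size = 1;
  for (std::size_t d : domain_sizes) joint_size *= d;
  assert(joint.size() == joint_size);

  std::size_t out_size = 1;
  for (std::size_t k : kept) {
    assert(k < domain_sizes.size());
    out_size *= domain_sizes[k];
  }
  std::vector<double> out(out_size, 0.0);

  std::vector<std::size_t> values(domain_sizes.size(), 0);
  for (std::size_t offset = 0; offset < joint_size; ++offset) {
    // decode the joint offset into one value per variable
    std::size_t rest = offset;
    for (std::size_t i = 0; i < domain_sizes.size(); ++i) {
      values[i] = rest % domain_sizes[i];
      rest /= domain_sizes[i];
    }
    // encode the values of the kept variables into an offset of the output
    std::size_t out_offset = 0;
    std::size_t stride = 1;
    for (std::size_t k : kept) {
      out_offset += values[k] * stride;
      stride *= domain_sizes[k];
    }
    out[out_offset] += joint[offset];
  }
  return out;
}
```

With a table over {A,B,C}, calling marginalCounts with kept = {1} yields the countings over B, and kept = {0,2} yields those over {A,C}, without touching the database again.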

Here is an example of how to use the RecordCounter class:
// here, write the code to construct your database, e.g.:
gum::learning::DBInitializerFromCSV<> initializer( "file.csv" );
const auto& var_names = initializer.variableNames();
const std::size_t nb_vars = var_names.size();
gum::learning::DBTranslatorSet<> translator_set;
gum::learning::DBTranslator4LabelizedVariable<> translator;
for (std::size_t i = 0; i < nb_vars; ++i) {
  translator_set.insertTranslator(translator, i);
}
gum::learning::DatabaseTable<> database(translator_set);
database.setVariableNames(var_names);
initializer.fillDatabase(database);
// create the parser of the database
gum::learning::DBRowGeneratorSet<> genset;
gum::learning::DBRowGeneratorParser<> parser(database.handler(), genset);
// create the record counter
gum::learning::RecordCounter<> counter(parser);
// get the counts:
gum::learning::IdCondSet<> ids ( 0, std::vector<gum::NodeId> {2,1} );
const std::vector< double >& counts1 = counter.counts ( ids );
// change the rows from which we compute the counts:
// they should now be made on rows [500,600) U [1050,1125) U [100,150)
std::vector<std::pair<std::size_t,std::size_t>> new_ranges
  { std::pair<std::size_t,std::size_t>(500,600),
    std::pair<std::size_t,std::size_t>(1050,1125),
    std::pair<std::size_t,std::size_t>(100,150) };
counter.setRanges ( new_ranges );
const std::vector< double >& counts2 = counter.counts ( ids );

Definition at line 112 of file recordCounter.h.

Member Typedef Documentation

◆ allocator_type

template<template< typename > class ALLOC = std::allocator>
using gum::learning::RecordCounter< ALLOC >::allocator_type = ALLOC< NodeId >

type for the allocators passed in arguments of methods

Definition at line 115 of file recordCounter.h.

Constructor & Destructor Documentation

◆ RecordCounter() [1/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const DBRowGeneratorParser< ALLOC > &  parser,
const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > > &  ranges,
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &  nodeId2columns = Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(),
const allocator_type &  alloc = allocator_type() 
)

default constructor

Parameters
parser: the parser used to parse the database
ranges: a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. The countings are then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
nodeId2columns: a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible, for instance, to estimate the parameters of a BN in which variable A has a NodeId of 5 from a database in which variable A corresponds to the 2nd column. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.
alloc: the allocator used to allocate the structures within the RecordCounter.
Warning
If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.
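
As an illustration of the cross validation use case mentioned for the ranges parameter, here is a small sketch that builds the half-open training ranges of a k-fold split. The helper trainingRanges and the way folds are cut are assumptions made for this example; they are not part of aGrUM.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not part of aGrUM): returns the ranges [Xi,Yi) covering
// all the rows except those of the test fold number `fold`.
// Folds are assumed to be contiguous blocks of (almost) equal size.
std::vector<Range> trainingRanges(std::size_t nb_rows,
                                  std::size_t nb_folds,
                                  std::size_t fold) {
  assert(nb_folds >= 2 && fold < nb_folds);
  const std::size_t test_begin = (nb_rows * fold) / nb_folds;
  const std::size_t test_end   = (nb_rows * (fold + 1)) / nb_folds;

  std::vector<Range> ranges;
  if (test_begin > 0) ranges.emplace_back(0, test_begin);       // rows before the fold
  if (test_end < nb_rows) ranges.emplace_back(test_end, nb_rows);  // rows after the fold
  return ranges;
}
```

The resulting vector has the type expected by the ranges argument when ALLOC is std::allocator. Since an empty set of ranges means "the whole database", a caller would want to check that the result is not empty (which can happen in this sketch when nb_rows is very small) before passing it on.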

◆ RecordCounter() [2/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const DBRowGeneratorParser< ALLOC > &  parser,
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > > &  nodeId2columns = Bijection< NodeId, std::size_t, ALLOC< std::size_t > >(),
const allocator_type &  alloc = allocator_type() 
)

default constructor

Parameters
parser: the parser used to parse the database
nodeId2columns: a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible, for instance, to estimate the parameters of a BN in which variable A has a NodeId of 5 from a database in which variable A corresponds to the 2nd column. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.
alloc: the allocator used to allocate the structures within the RecordCounter.
Warning
If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.
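
The lookup rule stated for nodeId2columns (identity when empty, error for unknown ids otherwise) can be sketched as follows. This is only an illustration with standard containers: columnOf is a hypothetical function, std::map stands in for gum::Bijection, std::out_of_range stands in for the NotFound exception, and column indices are assumed to be 0-based (so that the "2nd column" has index 1).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>

using NodeId = std::size_t;  // stand-in for gum::NodeId in this sketch

// Hypothetical helper (not part of aGrUM): returns the database column
// associated with a node id, following the rule documented above.
std::size_t columnOf(const std::map<NodeId, std::size_t>& nodeId2columns,
                     NodeId id) {
  // an empty mapping is interpreted as the identity
  if (nodeId2columns.empty()) return id;

  // otherwise, only the ids that belong to the mapping can be used
  const auto it = nodeId2columns.find(id);
  if (it == nodeId2columns.end())
    throw std::out_of_range("node id not found in nodeId2columns");
  return it->second;
}
```

For instance, with the mapping {5 -> 1}, node 5 is read from column 1, whereas asking for node 3 fails, which mirrors the warning above.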

◆ RecordCounter() [3/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const RecordCounter< ALLOC > &  from)

copy constructor

◆ RecordCounter() [4/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( const RecordCounter< ALLOC > &  from,
const allocator_type &  alloc 
)

copy constructor with a given allocator

◆ RecordCounter() [5/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( RecordCounter< ALLOC > &&  from)

move constructor

◆ RecordCounter() [6/6]

template<template< typename > class ALLOC = std::allocator>
gum::learning::RecordCounter< ALLOC >::RecordCounter ( RecordCounter< ALLOC > &&  from,
const allocator_type &  alloc 
)

move constructor with a given allocator

◆ ~RecordCounter()

template<template< typename > class ALLOC = std::allocator>
virtual gum::learning::RecordCounter< ALLOC >::~RecordCounter ( )

destructor

Member Function Documentation

◆ clear()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::RecordCounter< ALLOC >::clear ( )

clears all the last database-parsed countings from memory

◆ clearRanges()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::RecordCounter< ALLOC >::clearRanges ( )

resets the ranges to the single range covering the whole database

◆ clone() [1/2]

template<template< typename > class ALLOC = std::allocator>
virtual RecordCounter< ALLOC >* gum::learning::RecordCounter< ALLOC >::clone ( ) const

virtual copy constructor

◆ clone() [2/2]

template<template< typename > class ALLOC = std::allocator>
virtual RecordCounter< ALLOC >* gum::learning::RecordCounter< ALLOC >::clone ( const allocator_type &  alloc) const

virtual copy constructor with a given allocator

◆ counts()

template<template< typename > class ALLOC = std::allocator>
const std::vector< double, ALLOC< double > >& gum::learning::RecordCounter< ALLOC >::counts ( const IdCondSet< ALLOC > &  ids,
const bool  check_discrete_vars = false 
)

returns the counts over all the variables in an IdCondSet

Parameters
ids: the idset of the variables over which we perform countings.
check_discrete_vars: The record counter can only produce correct results on sets of discrete variables. By default, the method does not check whether the variables corresponding to the IdCondSet are actually discrete. If check_discrete_vars is set to true, then this check is performed before computing the counting vector. In this case, if a variable is not discrete, a TypeError exception is raised.
Returns
a vector containing the multidimensional contingency table over all the variables corresponding to the ids passed in argument (both at the left-hand side and right-hand side of the conditioning bar of the IdCondSet). The first dimension is that of the first variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector also increases by 1. The second dimension is that of the second variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector increases by the domain size of the first variable. For the third variable, the offset corresponds to the product of the domain sizes of the first two variables, and so on.
Warning
The vector returned by the function may differ from one call to another, so care must be taken. E.g., a code like:
const std::vector< double, ALLOC<double> >&
counts = counter.counts(ids);
counts = counter.counts(other_ids);
may be erroneous because the two calls to method counts() may return references to different vectors. The correct way of using method counts() is always to call it declaring a new reference variable:
const std::vector< double, ALLOC<double> >& counts =
counter.counts(ids);
const std::vector< double, ALLOC<double> >& other_counts =
counter.counts(other_ids);
Exceptions
TypeError: is raised if check_discrete_vars is set to true (i.e., we check that all variables in the IdCondSet are discrete) and if at least one variable is not of a discrete nature.
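
The layout of the returned vector described under Returns can be made concrete with a short offset computation. The function countsOffset below is a hypothetical helper written for this illustration, not an aGrUM function; it only restates the rule that the stride of each variable is the product of the domain sizes of the variables preceding it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): offset, in the flat vector returned
// by counts(), of the cell in which the i-th variable of the IdCondSet takes
// value `values[i]`, given the domain size of each variable.
std::size_t countsOffset(const std::vector<std::size_t>& values,
                         const std::vector<std::size_t>& domain_sizes) {
  assert(values.size() == domain_sizes.size());
  std::size_t offset = 0;
  std::size_t stride = 1;  // the first variable has a stride of 1
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(values[i] < domain_sizes[i]);
    offset += values[i] * stride;
    stride *= domain_sizes[i];  // stride of the next variable
  }
  return offset;
}
```

For three variables of domain sizes 2, 3 and 4, the strides are 1, 2 and 6, so the cell (1,2,3) sits at offset 1*1 + 2*2 + 3*6 = 23.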

◆ database()

template<template< typename > class ALLOC = std::allocator>
const DatabaseTable< ALLOC >& gum::learning::RecordCounter< ALLOC >::database ( ) const

returns the database on which we perform the counts

◆ getAllocator()

template<template< typename > class ALLOC = std::allocator>
allocator_type gum::learning::RecordCounter< ALLOC >::getAllocator ( ) const

returns the allocator used

◆ minNbRowsPerThread()

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::RecordCounter< ALLOC >::minNbRowsPerThread ( ) const

returns the minimum number of rows that each thread should process

◆ nbThreads()

template<template< typename > class ALLOC = std::allocator>
std::size_t gum::learning::RecordCounter< ALLOC >::nbThreads ( ) const

returns the number of threads used to parse the database

◆ nodeId2Columns()

template<template< typename > class ALLOC = std::allocator>
const Bijection< NodeId, std::size_t, ALLOC< std::size_t > >& gum::learning::RecordCounter< ALLOC >::nodeId2Columns ( ) const

returns the mapping from ids to column positions in the database

Warning
An empty nodeId2Columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

◆ operator=() [1/2]

template<template< typename > class ALLOC = std::allocator>
RecordCounter< ALLOC >& gum::learning::RecordCounter< ALLOC >::operator= ( const RecordCounter< ALLOC > &  from)

copy operator

◆ operator=() [2/2]

template<template< typename > class ALLOC = std::allocator>
RecordCounter< ALLOC >& gum::learning::RecordCounter< ALLOC >::operator= ( RecordCounter< ALLOC > &&  from)

move operator

◆ ranges()

template<template< typename > class ALLOC = std::allocator>
const std::vector< std::pair< std::size_t, std::size_t >, ALLOC< std::pair< std::size_t, std::size_t > > >& gum::learning::RecordCounter< ALLOC >::ranges ( ) const

returns the current ranges

◆ setBayesNet()

template<template< typename > class ALLOC = std::allocator>
template<typename GUM_SCALAR >
void gum::learning::RecordCounter< ALLOC >::setBayesNet ( const BayesNet< GUM_SCALAR > &  new_bn)

assigns a new Bayes net to all the counter's generators that depend on a BN

Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet updates the BN model they rely on.

◆ setMaxNbThreads()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::RecordCounter< ALLOC >::setMaxNbThreads ( const std::size_t  nb) const

changes the max number of threads used to parse the database

◆ setMinNbRowsPerThread()

template<template< typename > class ALLOC = std::allocator>
void gum::learning::RecordCounter< ALLOC >::setMinNbRowsPerThread ( const std::size_t  nb) const

changes the minimum number of rows a thread should process in a multithreading context

When method counts() runs several threads to perform countings on the rows of the database, MinNbRowsPerThread indicates the minimum number of rows each thread should process. This is used to compute the number of threads actually run: this number is equal to the minimum of the max number of threads allowed and the number of records in the database divided by nb.
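
The rule above can be written down as a one-line computation. The sketch below is only an illustration of the stated formula: nbThreadsUsed is a hypothetical function, and the use of integer division as well as the lower bound of one thread are assumptions of this example, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of aGrUM): number of threads actually run,
// i.e., min(max number of threads, number of records / min rows per thread).
std::size_t nbThreadsUsed(std::size_t max_nb_threads,
                          std::size_t nb_records,
                          std::size_t min_nb_rows_per_thread) {
  assert(min_nb_rows_per_thread > 0);
  // assumption: the division is an integer division (rounded down)
  const std::size_t by_rows = nb_records / min_nb_rows_per_thread;
  // assumption: at least one thread is always used
  return std::max<std::size_t>(1, std::min(max_nb_threads, by_rows));
}
```

Under these assumptions, a database of 350 records with at most 8 threads and at least 100 rows per thread would be parsed by 3 threads.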

◆ setRanges()

template<template< typename > class ALLOC = std::allocator>
template<template< typename > class XALLOC>
void gum::learning::RecordCounter< ALLOC >::setRanges ( const std::vector< std::pair< std::size_t, std::size_t >, XALLOC< std::pair< std::size_t, std::size_t > > > &  new_ranges)

sets new ranges to perform the countings

Parameters
new_ranges: a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. The countings are then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
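
To make the half-open range semantics concrete, here is a toy, single-threaded sketch that counts the values of one discrete column over a set of ranges. The function countsOverRanges and its data representation are hypothetical and unrelated to aGrUM's internals; note that this sketch would count a row twice if two ranges overlapped, a case the description above does not address.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of aGrUM): counts the occurrences of each value
// of a single discrete column, restricted to the rows [Xi,Yi) of `ranges`.
// As in the description above, an empty set of ranges means the whole column.
std::vector<double> countsOverRanges(
    const std::vector<std::size_t>& column,
    std::size_t domain_size,
    const std::vector<std::pair<std::size_t, std::size_t>>& ranges) {
  std::vector<double> counts(domain_size, 0.0);

  // counts the rows of the half-open interval [begin, end)
  auto countRange = [&](std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= column.size());
    for (std::size_t row = begin; row < end; ++row) {
      assert(column[row] < domain_size);
      counts[column[row]] += 1.0;
    }
  };

  if (ranges.empty()) {
    countRange(0, column.size());
  } else {
    for (const auto& range : ranges) countRange(range.first, range.second);
  }
  return counts;
}
```

Row Yi itself is never counted: with ranges {(0,2),(4,6)}, only rows 0, 1, 4 and 5 contribute.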

The documentation for this class was generated from the following file: