aGrUM
0.13.2

The class that computes countings of observations from the database.
#include <recordCounter.h>
Public Member Functions
Constructors / Destructors
template<typename RowGeneratorParser >
RecordCounter (const RowGeneratorParser &parser, const std::vector< Size > &var_modalities, Size min_range=0, Size max_range=std::numeric_limits< Size >::max())
default constructor
RecordCounter (const RecordCounter< IdSetAlloc, CountAlloc > &from)
copy constructor
RecordCounter (RecordCounter< IdSetAlloc, CountAlloc > &&from)
move constructor
~RecordCounter ()
destructor
Accessors / Modifiers
Idx  addNodeSet (const std::vector< Idx, IdSetAlloc > &ids)
adds a new nodeset to count
Size  DBParsedSize () noexcept
returns the size of the database taken into account by the counter
void  setRange (Size min_range, Size max_range)
sets the range of records taken into account by the counter
void  countOnSubDatabase ()
performs countings from the database by cutting it into several pieces
void  countSubsets ()
performs the countings of the ids' subsets from those of their supersets
void  count ()
performs the countings of all the sets of ids
const std::vector< double, CountAlloc > &  getCounts (Idx idset) const noexcept
returns the counts performed for a given idSet
void  clearNodeSets () noexcept
resets the counter, i.e., removes all its sets of ids and counting vectors
const std::vector< Size > &  modalities () const
returns the modalities of the variables in the database
void  setMaxNbThreads (Size nb) noexcept
sets the maximum number of threads used to perform countings
Friends  
template<typename I , typename C >  
class  Counter 
The class that computes countings of observations from the database.
This class is the one to be called by scores and independence tests to compute countings of observations from tabular databases. It calls as many RecordCounterThreads as possible to do the job in parallel.
Definition at line 247 of file recordCounter.h.

private 
the possible states of a set of ids
Enumerator  

NOT_SUBSET  
STRICT_SUBSET  
COPY_SET  
EMPTY_SET 
Definition at line 357 of file recordCounter.h.
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter  (  const RowGeneratorParser &  parser, const std::vector< Size > &  var_modalities, Size  min_range = 0, Size  max_range = std::numeric_limits< Size >::max()  )
default constructor
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter  (  const RecordCounter< IdSetAlloc, CountAlloc > &  from  ) 
copy constructor
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter  (  RecordCounter< IdSetAlloc, CountAlloc > &&  from  ) 
move constructor
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::~RecordCounter  (  ) 
destructor

private 
determine which sets are subsets

private 

private noexcept
returns the counting performed
Idx gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::addNodeSet  (  const std::vector< Idx, IdSetAlloc > &  ids  ) 
add a new nodeset to count

noexcept 
resets the counter, i.e., removes all its sets of ids and counting vectors
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::count  (  ) 
performs the countings of all the sets of ids
This method selects the most appropriate parallel counting method and performs it.
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::countOnSubDatabase  (  ) 
performs countings from the database by cutting it into several pieces
This method implements a parallel counting strategy that cuts the database into a set of pieces of roughly equal size and calls one RecordCounterThread for each piece. Each thread then performs countings for all the sets of ids that are not included in other sets (i.e., the proper supersets). Once the whole database has been parsed, the countings are aggregated into countings over the whole database.
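The partition-and-aggregate strategy described above can be illustrated with a minimal standalone sketch. It does not use aGrUM: the Row alias and the functions countSlice and countInParallel are hypothetical names introduced for illustration, the database is assumed to be an in-memory vector of rows, and only one variable is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tabular database: each row holds one
// value index per variable.
using Row = std::vector<std::size_t>;

// Counts the values of variable `var` over the rows [first, last).
std::vector<double> countSlice(const std::vector<Row>& db, std::size_t var,
                               std::size_t modality,
                               std::size_t first, std::size_t last) {
  std::vector<double> counts(modality, 0.0);
  for (std::size_t r = first; r < last; ++r) counts[db[r][var]] += 1.0;
  return counts;
}

// Cuts the database into nb_threads pieces of roughly equal size, counts
// each piece in its own thread, then sums the partial counts.
std::vector<double> countInParallel(const std::vector<Row>& db, std::size_t var,
                                    std::size_t modality,
                                    std::size_t nb_threads) {
  assert(nb_threads > 0);
  std::vector<std::vector<double>> partial(nb_threads);
  std::vector<std::thread> workers;
  const std::size_t n = db.size();
  for (std::size_t t = 0; t < nb_threads; ++t) {
    const std::size_t first = n * t / nb_threads;
    const std::size_t last = n * (t + 1) / nb_threads;
    // Each worker writes only to its own slot of `partial`.
    workers.emplace_back([&, t, first, last] {
      partial[t] = countSlice(db, var, modality, first, last);
    });
  }
  for (auto& w : workers) w.join();

  // Aggregation step: sum the per-piece counts.
  std::vector<double> total(modality, 0.0);
  for (const auto& p : partial)
    for (std::size_t i = 0; i < modality; ++i) total[i] += p[i];
  return total;
}
```

Because each thread owns a separate partial vector, no locking is needed until the final aggregation, which runs after all threads have been joined.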
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::countSubsets  (  ) 
performs the countings of the ids' subsets from those of their supersets
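Deriving the countings of a subset from those of a superset amounts to summing out the variables that are not in the subset. The sketch below is a standalone illustration for a superset of two variables X and Y; the function names and the flat storage layout (X varying fastest) are assumptions of the example, not a description of aGrUM's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Joint counts over (X, Y) are assumed to be stored in a flat vector with
// X varying fastest: index = x + y * mod_x.

// Counts of {X}, obtained by summing the joint counts over Y.
std::vector<double> marginalizeOnX(const std::vector<double>& joint,
                                   std::size_t mod_x, std::size_t mod_y) {
  assert(joint.size() == mod_x * mod_y);
  std::vector<double> counts_x(mod_x, 0.0);
  for (std::size_t y = 0; y < mod_y; ++y)
    for (std::size_t x = 0; x < mod_x; ++x)
      counts_x[x] += joint[x + y * mod_x];
  return counts_x;
}

// Counts of {Y}, obtained by summing the joint counts over X.
std::vector<double> marginalizeOnY(const std::vector<double>& joint,
                                   std::size_t mod_x, std::size_t mod_y) {
  assert(joint.size() == mod_x * mod_y);
  std::vector<double> counts_y(mod_y, 0.0);
  for (std::size_t y = 0; y < mod_y; ++y)
    for (std::size_t x = 0; x < mod_x; ++x)
      counts_y[y] += joint[x + y * mod_x];
  return counts_y;
}
```

This only touches the superset's counting vector, so the database does not need to be parsed again for the subset.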

noexcept 
returns the size of the database taken into account by the counter

noexcept 
returns the counts performed for a given idSet
const std::vector< Size >& gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::modalities  (  )  const 
returns the modalities of the variables in the database

noexcept 
sets the maximum number of threads used to perform countings
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::setRange  (  Size  min_range, Size  max_range  )
sets the range of records taken into account by the counter
min_range  the number of the first record to be taken into account during learning 
max_range  the number of the record after the last one taken into account 
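The two parameters describe a half-open range: min_range is included and max_range is excluded. The following standalone sketch shows one way such a range could be clamped to the actual database size. The helpers effectiveRange and parsedSize are hypothetical, and the clamping rule is an assumption suggested by the constructor's default max_range of std::numeric_limits< Size >::max(), not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>

// Hypothetical helper: clamps the half-open range [min_range, max_range)
// to a database holding db_size records.
std::pair<std::size_t, std::size_t> effectiveRange(std::size_t min_range,
                                                   std::size_t max_range,
                                                   std::size_t db_size) {
  assert(min_range <= max_range);
  const std::size_t last = std::min(max_range, db_size);
  const std::size_t first = std::min(min_range, last);
  return {first, last};
}

// Number of records actually covered by the clamped range.
std::size_t parsedSize(std::size_t min_range, std::size_t max_range,
                       std::size_t db_size) {
  const auto range = effectiveRange(min_range, max_range, db_size);
  return range.second - range.first;
}
```

Under these assumptions, a range of (10, 20) on a database of 100 records covers records 10 through 19, i.e., 10 records.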

friend 
Definition at line 326 of file recordCounter.h.

private 
a vector for computing the countings of the IdSets which are subsets
These countings are derived from the countings of the supersets
Definition at line 369 of file recordCounter.h.

private 
a hashtable associating to each IdSet its index in __set2thread_id
This table indicates for each distinct set its index in __set2thread_id or, equivalently, in __nodesets. By distinct set, we mean that, when several sets are identical, only the first one inserted by addNodeSet is put into the hashtable. This table is used as a helper in __computeSubsets to quickly determine the indices of supersets in __nodesets (the content of __idset2index is similar to that of __idsets, except that it is faster to search hashtables for pointers to idsets than for idsets themselves).
Definition at line 380 of file recordCounter.h.

private 
the set of ordered vectors of ids + their indices in __nodesets
The goal of this structure is essentially to store the vectors of ordered ids as IdSet<> in such a way that the memory locations of these idsets never change, even if we add new IdSets (this feature is guaranteed by aGrUM's hashtables). In addition, it makes it easy to detect identical IdSets (such detection allows for fast computations). As such, the indices stored as values in the hashtable are the indices in __nodesets of ONLY one copy of identical sets; the others are deduced from it. IdSets are used to quickly determine which sets are included in others.
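The detection of identical sets can be sketched with the standard library. This is a standalone illustration: firstOccurrences is a hypothetical function, and std::map is used as a stand-in for aGrUM's hashtable because std::vector provides an ordering out of the box.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Ids = std::vector<std::size_t>;

// For each node set, returns the index of the first inserted set that
// contains exactly the same ids (regardless of their order).
std::vector<std::size_t> firstOccurrences(const std::vector<Ids>& nodesets) {
  std::map<Ids, std::size_t> seen;  // ordered ids -> index of first copy
  std::vector<std::size_t> first(nodesets.size());
  for (std::size_t i = 0; i < nodesets.size(); ++i) {
    Ids ordered = nodesets[i];
    std::sort(ordered.begin(), ordered.end());
    // emplace leaves the map unchanged if the key is already present.
    const auto res = seen.emplace(std::move(ordered), i);
    first[i] = res.first->second;
    assert(first[i] <= i);
  }
  return first;
}
```

Only the sets that map to their own index need to be counted; the counts of their duplicates can be taken from that first copy.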
Definition at line 341 of file recordCounter.h.

private 
the number of the record after the last one taken into account
Definition at line 417 of file recordCounter.h.

private 
the maximum number of threads allowed
Definition at line 406 of file recordCounter.h.

private 
the minimum number of rows each thread should parse (on average)
Definition at line 410 of file recordCounter.h.

private 
the number of the first record to be taken into account during learning
Definition at line 414 of file recordCounter.h.

private 
the modalities of the variables
Definition at line 329 of file recordCounter.h.

private 
the number of thread counters used by the last count()
Definition at line 400 of file recordCounter.h.

private 
the vector of the unordered ids' vectors used to generate the idsets
When the user adds node sets (i.e., vectors of ids), those are unordered and will be processed (counted) as such. However, in order to determine which vectors of ids are contained in other vectors, we create IdSets from them (those are ordered vectors that enable fast subset computations).
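The benefit of ordering the ids can be seen in a short standalone sketch: once both vectors are sorted, inclusion can be tested in a single linear pass with std::includes. The names Ids, makeOrdered and isSubset are hypothetical and are not part of aGrUM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Ids = std::vector<std::size_t>;

// Returns a sorted copy of the ids; the caller's vector keeps its order.
Ids makeOrdered(Ids ids) {
  std::sort(ids.begin(), ids.end());
  return ids;
}

// Returns true if every id of `sub` also belongs to `super`.
// Both vectors must be sorted in ascending order.
bool isSubset(const Ids& sub, const Ids& super) {
  assert(std::is_sorted(sub.begin(), sub.end()));
  assert(std::is_sorted(super.begin(), super.end()));
  return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}
```

For instance, isSubset(makeOrdered({3, 1}), makeOrdered({2, 3, 1})) holds, whereas replacing the first set by {4, 1} makes the test fail.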
Definition at line 349 of file recordCounter.h.

private 
a table associating to each IdSet its index in the threadRecordCounters
For the IdSets which are subsets of other IdSets, the index corresponds to that of its superset in __set2thread_id
Definition at line 387 of file recordCounter.h.

private 
a table indicating whether each IdSet is a subset of another idSet
Definition at line 365 of file recordCounter.h.

private 
a partial lattice indicating the relations between subsets and supersets
In this lattice, an arc X -> Y indicates that X is a superset of Y. As such, we should perform countings of X before deducing those of Y.
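Processing supersets before their subsets amounts to visiting the lattice in topological order. The standalone sketch below uses Kahn's algorithm on a list of (superset, subset) arcs; the function processingOrder and the arc representation are hypothetical and do not reflect how aGrUM stores the lattice.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// arcs holds (superset, subset) pairs over the sets 0 .. nb_sets - 1.
// Returns an order in which every superset appears before its subsets.
std::vector<std::size_t> processingOrder(
    std::size_t nb_sets,
    const std::vector<std::pair<std::size_t, std::size_t>>& arcs) {
  std::vector<std::vector<std::size_t>> children(nb_sets);
  std::vector<std::size_t> nb_parents(nb_sets, 0);
  for (const auto& arc : arcs) {
    assert(arc.first < nb_sets && arc.second < nb_sets);
    children[arc.first].push_back(arc.second);
    ++nb_parents[arc.second];
  }

  // Sets without any superset can be counted directly from the database.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < nb_sets; ++i)
    if (nb_parents[i] == 0) ready.push(i);

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t x = ready.front();
    ready.pop();
    order.push_back(x);
    // A subset becomes ready once all of its supersets have been handled.
    for (std::size_t y : children[x])
      if (--nb_parents[y] == 0) ready.push(y);
  }
  return order;
}
```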
Definition at line 393 of file recordCounter.h.

private 
the set of ThreadCounters
Definition at line 397 of file recordCounter.h.

private 
a table associating to each variable the IdSets that contain it
This table is used to quickly compute the IdSets that are contained in other IdSets
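Such a table behaves like an inverted index. As a standalone illustration (the names VarIndex, buildVarIndex and supersetCandidates are hypothetical, and std::unordered_map stands in for aGrUM's hashtable), the sketch below builds the index and uses it to restrict the search for supersets of a given set to the sets sharing its first variable.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using Ids = std::vector<std::size_t>;
using VarIndex = std::unordered_map<std::size_t, std::vector<std::size_t>>;

// Maps each variable id to the indices of the id sets that contain it.
VarIndex buildVarIndex(const std::vector<Ids>& sets) {
  VarIndex index;
  for (std::size_t s = 0; s < sets.size(); ++s)
    for (std::size_t var : sets[s]) index[var].push_back(s);
  return index;
}

// Indices of the sets containing the first variable of `set`. Any
// superset of `set` must be among them (as is `set` itself, if indexed).
std::vector<std::size_t> supersetCandidates(const VarIndex& index,
                                            const Ids& set) {
  assert(!set.empty());
  const auto it = index.find(set.front());
  return it == index.end() ? std::vector<std::size_t>{} : it->second;
}
```

With this index, a set only has to be compared against the few sets that share one of its variables instead of against every registered set.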
Definition at line 354 of file recordCounter.h.