aGrUM  0.13.2
gum::learning::RecordCounter< IdSetAlloc, CountAlloc > Class Template Reference

The class that computes countings of observations from the database.

#include <recordCounter.h>

Public Member Functions

Constructors / Destructors
template<typename RowGeneratorParser >
 RecordCounter (const RowGeneratorParser &parser, const std::vector< Size > &var_modalities, Size min_range=0, Size max_range=std::numeric_limits< Size >::max())
 default constructor
 
 RecordCounter (const RecordCounter< IdSetAlloc, CountAlloc > &from)
 copy constructor
 
 RecordCounter (RecordCounter< IdSetAlloc, CountAlloc > &&from)
 move constructor
 
 ~RecordCounter ()
 destructor
 
Accessors / Modifiers
Idx addNodeSet (const std::vector< Idx, IdSetAlloc > &ids)
 add a new nodeset to count
 
Size DBParsedSize () noexcept
 returns the size of the database taken into account by the counter
 
void setRange (Size min_range, Size max_range)
 sets the range of records taken into account by the counter
 
void countOnSubDatabase ()
 performs countings from the database by cutting it into several pieces
 
void countSubsets ()
 performs the countings of the ids' subsets from those of their supersets
 
void count ()
 performs the countings of all the sets of ids
 
const std::vector< double, CountAlloc > & getCounts (Idx idset) const noexcept
 returns the counts performed for a given idSet
 
void clearNodeSets () noexcept
 resets the counter, i.e., removes all its sets of ids and counting vectors
 
const std::vector< Size > & modalities () const
 returns the modalities of the variables in the database
 
void setMaxNbThreads (Size nb) noexcept
 sets the maximum number of threads used to perform countings
 

Friends

template<typename I , typename C >
class Counter
 

Detailed Description

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
class gum::learning::RecordCounter< IdSetAlloc, CountAlloc >

The class that computes countings of observations from the database.

This class is the one to be called by scores and independence tests to compute countings of observations from tabular databases. It calls as many RecordCounterThreads as possible to do the job in parallel.

Definition at line 247 of file recordCounter.h.

Member Enumeration Documentation

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
enum gum::learning::RecordCounter::SetState
private

the possible states of a set of ids

Enumerator
NOT_SUBSET 
STRICT_SUBSET 
COPY_SET 
EMPTY_SET 

Definition at line 357 of file recordCounter.h.

{
  NOT_SUBSET,     // this is a proper nonempty superset
  STRICT_SUBSET,  // the set is included into another one
  COPY_SET,       // this set is a copy of another one
  EMPTY_SET       // the set is empty
};

Constructor & Destructor Documentation

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
template<typename RowGeneratorParser >
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter ( const RowGeneratorParser &  parser,
const std::vector< Size > &  var_modalities,
Size  min_range = 0,
Size  max_range = std::numeric_limits< Size >::max() 
)

default constructor

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter ( const RecordCounter< IdSetAlloc, CountAlloc > &  from)

copy constructor

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::RecordCounter ( RecordCounter< IdSetAlloc, CountAlloc > &&  from)

move constructor

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::~RecordCounter ( )

destructor

Member Function Documentation

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__computeSubsets ( )
private

determine which sets are subsets

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__countOneSubset ( Idx  i)
private

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< std::vector< double, CountAlloc > >& gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__getCounts ( )
private noexcept

returns the counting performed

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Idx gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::addNodeSet ( const std::vector< Idx, IdSetAlloc > &  ids)

add a new nodeset to count

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::clearNodeSets ( )
noexcept

resets the counter, i.e., removes all its sets of ids and counting vectors

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::count ( )

performs the countings of all the sets of ids

This method selects the most appropriate parallel counting method and performs it.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::countOnSubDatabase ( )

performs countings from the database by cutting it into several pieces

This method implements a parallel counting strategy that cuts the database into pieces of roughly equal size and calls one RecordCounterThread on each piece. Each thread then performs the countings for all the sets of ids that are not included in other sets (i.e., the proper supersets). Once the whole database has been parsed, the per-piece countings are aggregated into countings over the whole database.
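
As an illustration only, the split/count/aggregate strategy described for countOnSubDatabase can be sketched with plain std::thread. The function countInChunks and its parameters are hypothetical and are not part of aGrUM; the sketch counts a single discrete column rather than arbitrary sets of ids, and it does not claim to reproduce how RecordCounterThread works internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of aGrUM: counts the values of one discrete
// column by cutting the rows into near-equal chunks, running one thread per
// chunk, and summing the per-thread partial counts.
// Precondition: every value in `column` is strictly less than `modality`.
std::vector<double> countInChunks(const std::vector<std::size_t>& column,
                                  std::size_t modality,
                                  std::size_t nb_threads) {
  assert(modality > 0 && nb_threads > 0);
  const std::size_t n = column.size();
  // never use more threads than rows, but always at least one
  nb_threads = std::max<std::size_t>(1, std::min(nb_threads, n));

  // one private counting vector per thread, so no locking is needed
  std::vector<std::vector<double>> partial(
      nb_threads, std::vector<double>(modality, 0.0));
  std::vector<std::thread> workers;

  const std::size_t base = n / nb_threads;
  const std::size_t extra = n % nb_threads;
  std::size_t begin = 0;
  for (std::size_t t = 0; t < nb_threads; ++t) {
    // the first `extra` chunks receive one additional row
    const std::size_t end = begin + base + (t < extra ? 1 : 0);
    workers.emplace_back([&column, &partial, t, begin, end] {
      for (std::size_t r = begin; r < end; ++r) partial[t][column[r]] += 1.0;
    });
    begin = end;
  }
  for (auto& w : workers) w.join();

  // aggregation step: sum the partial counts over all chunks
  std::vector<double> total(modality, 0.0);
  for (const auto& p : partial)
    for (std::size_t v = 0; v < modality; ++v) total[v] += p[v];
  return total;
}
```

Giving each thread its own counting vector avoids synchronization during parsing; the only sequential part is the final sum, whose cost does not depend on the number of rows.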

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::countSubsets ( )

performs the countings of the ids' subsets from those of their supersets
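
Deducing the countings of a subset from those of a superset amounts to summing out the variables that are not in the subset. The sketch below illustrates that idea under stated assumptions: marginalize is a hypothetical helper, not an aGrUM function, and the memory layout (first variable varying fastest) is assumed for the example and may differ from the layout actually used by RecordCounter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not aGrUM code. `counts` is a joint contingency table
// over variables with the given modalities, stored with the FIRST variable
// varying fastest (an assumed layout). `keep[i]` tells whether variable i
// belongs to the subset. The result uses the same layout over the kept
// variables only.
std::vector<double> marginalize(const std::vector<double>& counts,
                                const std::vector<std::size_t>& modalities,
                                const std::vector<bool>& keep) {
  assert(modalities.size() == keep.size());
  std::size_t joint_size = 1, sub_size = 1;
  for (std::size_t i = 0; i < modalities.size(); ++i) {
    joint_size *= modalities[i];
    if (keep[i]) sub_size *= modalities[i];
  }
  assert(counts.size() == joint_size);

  std::vector<double> result(sub_size, 0.0);
  std::vector<std::size_t> value(modalities.size(), 0);  // current joint cell
  for (std::size_t cell = 0; cell < joint_size; ++cell) {
    // offset of the current cell once projected onto the kept variables
    std::size_t offset = 0, stride = 1;
    for (std::size_t i = 0; i < modalities.size(); ++i) {
      if (keep[i]) {
        offset += value[i] * stride;
        stride *= modalities[i];
      }
    }
    result[offset] += counts[cell];

    // advance `value` like an odometer, first variable fastest
    for (std::size_t i = 0; i < modalities.size(); ++i) {
      if (++value[i] < modalities[i]) break;
      value[i] = 0;
    }
  }
  return result;
}
```

With this approach the database is parsed only for the proper supersets; every included set is obtained from an already computed table.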

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::DBParsedSize ( )
noexcept

returns the size of the database taken into account by the counter

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
const std::vector< double, CountAlloc >& gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::getCounts ( Idx  idset) const
noexcept

returns the counts performed for a given idSet

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
const std::vector< Size >& gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::modalities ( ) const

returns the modalities of the variables in the database

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::setMaxNbThreads ( Size  nb)
noexcept

sets the maximum number of threads used to perform countings

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
void gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::setRange ( Size  min_range,
Size  max_range 
)

sets the range of records taken into account by the counter

Parameters
min_range  the number of the first record to be taken into account during learning
max_range  the number of the record after the last one taken into account
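
Since max_range designates the record after the last one taken into account, the two parameters describe a half-open interval [min_range, max_range). The sketch below shows how many records such a range selects. The helper nbSelectedRecords is hypothetical, and clamping the bounds to the database size is an assumption made for the example, not documented behavior of setRange.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper, not part of aGrUM: number of records selected by the
// half-open range [min_range, max_range) in a database of db_size records.
// ASSUMPTION: bounds beyond the end of the database are clamped to db_size.
std::size_t nbSelectedRecords(
    std::size_t db_size,
    std::size_t min_range = 0,
    std::size_t max_range = std::numeric_limits<std::size_t>::max()) {
  assert(min_range <= max_range);
  const std::size_t first = std::min(min_range, db_size);
  const std::size_t last = std::min(max_range, db_size);
  return last - first;  // max_range itself is excluded
}
```

With the default arguments, which mirror the defaults of the constructor, the whole database is selected.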

Friends And Related Function Documentation

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
template<typename I , typename C >
friend class Counter
friend

Definition at line 326 of file recordCounter.h.

Member Data Documentation

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< std::vector< double, CountAlloc > > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__countings
private

a vector for computing the countings of the IdSets which are subsets

These countings are derived from the countings of the supersets

Definition at line 369 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
HashTable< const IdSet< IdSetAlloc >*, Idx > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__idset2index
private

a hashtable associating to each IdSet its index in __set2thread_id

This table indicates, for each distinct set, its index in __set2thread_id or, equivalently, in __nodesets. By distinct set, we mean that, when several sets are identical, only the first one inserted by addNodeSet is put into the hashtable. This table is used as a helper in __computeSubsets to quickly determine the indices of supersets in __nodesets. The content of __idset2index is similar to that of __idsets, except that it is faster to search a hashtable keyed by pointers to idsets than one keyed by the idsets themselves.

Definition at line 380 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Bijection< IdSet< IdSetAlloc >, Idx > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__idsets
private

the set of ordered vectors of ids + their indices in __nodesets

The goal of this structure is essentially to store the vectors of ordered ids as IdSet<> in such a way that the memory locations of these idsets never change, even if we add new IdSets (this feature is guaranteed by aGrUM's hashtables). In addition, it makes it easy to detect identical IdSets (such detection allows for fast computations). As such, the indices stored as values in the hashtable are the indices in __nodesets of ONLY one copy of identical sets; the others are deduced from it. IdSets are used to quickly determine which sets are included in others.

Definition at line 341 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__max_range
private

the number of the record after the last one taken into account

Definition at line 417 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__max_threads_number {1}
private

the maximum number of threads allowed

Definition at line 406 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__min_nb_rows_per_thread {100}
private

the minimum number of rows each thread should parse (on average)

Definition at line 410 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__min_range
private

the number of the first record to be taken into account during learning

Definition at line 414 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
const std::vector< Size >* gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__modalities {nullptr}
private

the modalities of the variables

Definition at line 329 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
Size gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__nb_thread_counters {0}
private

the number of thread counters used by the last count()

Definition at line 400 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< const std::vector< Idx, IdSetAlloc >* > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__nodesets
private

the vector of the unordered ids' vectors used to generate the idsets

When the user adds node sets (i.e., vectors of ids), these are unordered and are processed (counted) as such. However, to determine which vectors of ids are contained in other vectors, we create IdSets from them (ordered vectors that enable fast subset computations).

Definition at line 349 of file recordCounter.h.
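
To illustrate why ordered vectors make inclusion tests fast, here is a minimal sketch using the standard library only. The functions makeOrderedIds and isSubset are hypothetical stand-ins; they do not reproduce the interface of aGrUM's IdSet class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for building an ordered id set from the unordered
// vector of ids supplied by the user (taken by value, so the caller's vector
// keeps its original order).
std::vector<std::size_t> makeOrderedIds(std::vector<std::size_t> ids) {
  std::sort(ids.begin(), ids.end());
  return ids;
}

// Returns true when every id of `sub` also appears in `super`.
// Both ranges must be sorted; the test then runs in linear time.
bool isSubset(const std::vector<std::size_t>& sub,
              const std::vector<std::size_t>& super) {
  assert(std::is_sorted(sub.begin(), sub.end()));
  assert(std::is_sorted(super.begin(), super.end()));
  return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}
```

On unordered vectors the same test would require a search per element, whereas std::includes walks both sorted ranges once.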

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< std::pair< const IdSet< IdSetAlloc >*, Idx > > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__set2thread_id
private

a table associating to each IdSet its index in the threadRecordCounters

For the IdSets which are subsets of other IdSets, the index corresponds to that of its superset in __set2thread_id

Definition at line 387 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< SetState > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__set_state
private

a table indicating whether each IdSet is a subset of another idSet

Definition at line 365 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
DAG gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__subset_lattice
private

a partial lattice indicating the relations between subsets and supersets

In this lattice, an arc X -> Y indicates that X is a superset of Y. As such, we should perform countings of X before deducing those of Y.

Definition at line 393 of file recordCounter.h.
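
The constraint that X must be counted before Y whenever there is an arc X -> Y is a topological ordering constraint. The sketch below computes such an order with Kahn's algorithm on a plain adjacency list. The function supersetFirstOrder is hypothetical and does not use aGrUM's DAG class; whether RecordCounter traverses the lattice this way is not stated in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper, not aGrUM code. children[x] lists the nodes y such
// that there is an arc x -> y, i.e., x is a superset of y. The returned order
// places every superset before all of its subsets (Kahn's algorithm).
std::vector<std::size_t> supersetFirstOrder(
    const std::vector<std::vector<std::size_t>>& children) {
  const std::size_t n = children.size();

  // number of supersets (parents) still to be processed for each node
  std::vector<std::size_t> nb_parents(n, 0);
  for (const auto& ys : children) {
    for (std::size_t y : ys) {
      assert(y < n);
      ++nb_parents[y];
    }
  }

  // nodes without parents are the proper supersets: they can be counted first
  std::queue<std::size_t> ready;
  for (std::size_t x = 0; x < n; ++x)
    if (nb_parents[x] == 0) ready.push(x);

  std::vector<std::size_t> order;
  order.reserve(n);
  while (!ready.empty()) {
    const std::size_t x = ready.front();
    ready.pop();
    order.push_back(x);
    // a subset becomes ready once all of its supersets have been processed
    for (std::size_t y : children[x])
      if (--nb_parents[y] == 0) ready.push(y);
  }
  assert(order.size() == n);  // fails only if the graph contains a cycle
  return order;
}
```

Processing the sets in this order guarantees that the countings of a superset are available when those of its subsets are deduced.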

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
std::vector< RecordCounterThreadBase< IdSetAlloc, CountAlloc >* > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__thread_counters
private

the set of ThreadCounters

Definition at line 397 of file recordCounter.h.

template<typename IdSetAlloc = std::allocator< Idx >, typename CountAlloc = std::allocator< double >>
HashTable< Idx, std::vector< const IdSet< IdSetAlloc >* > > gum::learning::RecordCounter< IdSetAlloc, CountAlloc >::__var2idsets
private

a table associating to each variable the IdSets that contain it

This table is used to quickly compute the IdSets that are contained in other IdSets

Definition at line 354 of file recordCounter.h.


The documentation for this class was generated from the following file: