aGrUM  0.13.2
How to use Bayesian Networks

Bayesian Networks are probabilistic graphical models in which nodes are random variables and the joint probability distribution is defined by the product:

\(P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | \pi(X_i))\),

where \(\pi(X_i)\) is the set of parents of \(X_i\).
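As a worked instance, the product can be written out for the Asia network used throughout this page. The single-letter symbols are shorthand introduced here for readability, not names from aGrUM: A for Visit to Asia, S for Smoker, T for Has Tuberculosis, L for Has Lung Cancer, B for Has Bronchitis, E for Tuberculosis or Cancer, X for XRay Result and D for Dyspnea. With the arcs declared in the sections below, the factorization reads:

\(P(A, S, T, L, B, E, X, D) = P(A)\, P(S)\, P(T | A)\, P(L | S)\, P(B | S)\, P(E | T, L)\, P(X | E)\, P(D | E, B)\)

Each factor corresponds to one conditional probability table of the network.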

The Bayesian Network module in aGrUM can help you do the following operations:

  • Model Bayesian Networks, from graph to local distributions.
  • Execute probabilistic inference from a wide range of algorithms.
  • Load and save Bayesian Networks in different file formats.

The Bayesian Networks module lists all classes for using Bayesian Networks with aGrUM.

We will use the classic Asia network to illustrate how the gum::BayesNet and gum::BayesNetFactory classes work.

Creating Bayesian Networks

The following code illustrates how to create the Asia network using the gum::BayesNet class. To create an instance of a Bayesian Network, you simply need to call the gum::BayesNet class constructor.

auto bn = gum::BayesNet<double>("Asia");

Use the gum::BayesNet::add( const gum::DiscreteVariable& ) method to add variables to the Bayesian Network. The following variables are available in aGrUM:

// Variables are added by copy to the BayesNet, so you can use a single
// gum::LabelizedVariable to add all variables with the same domain
auto var = gum::LabelizedVariable(
"template", "A variable of the Asia Bayesian Network", 0 );
var.addLabel( "True" );
var.addLabel( "False" );
var.setName( "Visit to Asia" );
auto visitToAsia = bn.add( var );
var.setName( "Smoker" );
auto smoker = bn.add( var );
var.setName( "Has Tuberculosis" );
auto hasTuberculosis = bn.add( var );
var.setName( "Has Lung Cancer" );
auto hasLungCancer = bn.add( var );
var.setName( "Has Bronchitis" );
auto hasBronchitis = bn.add( var );
var.setName( "Tuberculosis or Cancer" );
auto tubOrCancer = bn.add( var );
var.setName( "XRay Result" );
auto xray = bn.add( var );
var.setName( "Dyspnea" );
auto dyspnea = bn.add( var );

You can also use the gum::BayesNet::idFromName( const std::string& ) method to retrieve a variable's id from its name.

Use the gum::BayesNet::addArc( gum::NodeId, gum::NodeId ) method to add arcs between nodes in the Bayesian Network.

bn.addArc( visitToAsia, hasTuberculosis );
bn.addArc( hasTuberculosis, tubOrCancer );
bn.addArc( smoker, hasLungCancer );
bn.addArc( smoker, hasBronchitis );
bn.addArc( hasLungCancer, tubOrCancer );
bn.addArc( tubOrCancer, xray );
bn.addArc( tubOrCancer, dyspnea );
bn.addArc( hasBronchitis, dyspnea );

Finally, use the gum::BayesNet::cpt( gum::NodeId ) method to access a variable's conditional probability table. See How to use the MultiDim hierarchy to learn how to fill a gum::Potential. Here we use the gum::Potential::fillWith( const std::vector& ) method.

bn.cpt( visitToAsia ).fillWith( { 0.1f, 0.9f } );
bn.cpt( smoker ).fillWith( { 0.7f, 0.3f } );
bn.cpt( hasTuberculosis ).fillWith( {
// True | False == hasTuberculosis
0.05f, 0.95f, // visitToAsia == True
0.01f, 0.99f // visitToAsia == False
} );
bn.cpt( hasLungCancer ).fillWith( {
// True | False == hasLungCancer
0.10f, 0.90f, // smoker == True
0.01f, 0.99f // smoker == False
} );
bn.cpt( tubOrCancer ).fillWith( {
// True | False == tubOrCancer
1.00f, 0.00f, // hasTuberculosis == True, hasLungCancer == True
1.00f, 0.00f, // hasTuberculosis == False, hasLungCancer == True
1.00f, 0.00f, // hasTuberculosis == True, hasLungCancer == False
0.00f, 1.00f // hasTuberculosis == False, hasLungCancer == False
} );
bn.cpt( xray ).fillWith( {
// True | False == xray
0.98f, 0.02f, // tubOrCancer == True
0.05f, 0.95f // tubOrCancer == False
} );
bn.cpt( dyspnea ).fillWith( {
// True | False == dyspnea
0.90f, 0.10f, // tubOrCancer == True, hasBronchitis == True
0.70f, 0.30f, // tubOrCancer == False, hasBronchitis == True
0.80f, 0.20f, // tubOrCancer == True, hasBronchitis == False
0.10f, 0.90f // tubOrCancer == False, hasBronchitis == False
} );

Filling conditional probability tables can be hard, and you should use the commenting trick shown above to help you with large tables. It is important to remember that the std::vector is used to fill a multidimensional table where each line should sum to 1, i.e. each line stores \(P(X_i | \pi(X_i))\).
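The "each line should sum to 1" rule can be checked mechanically before a vector is handed to fillWith. The helper below is a minimal sketch in plain C++ and is not part of aGrUM: the name rowsSumToOne and its parameters are hypothetical, and it assumes the layout used in the commented tables above, where the node's own variable varies fastest so that each consecutive block of domainSize values is one distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an aGrUM function).
// Returns true when every consecutive block of `domainSize` values sums to 1
// within `tolerance`. Assumes the node's own variable varies fastest in the
// flat vector, so each block is one conditional distribution.
bool rowsSumToOne( const std::vector<double>& values,
                   std::size_t domainSize,
                   double tolerance = 1e-6 ) {
  assert( domainSize > 0 );
  // A vector that cannot be split into whole lines is rejected outright.
  if ( values.size() % domainSize != 0 ) return false;
  for ( std::size_t start = 0; start < values.size(); start += domainSize ) {
    double sum = 0.0;
    for ( std::size_t k = 0; k < domainSize; ++k ) sum += values[start + k];
    if ( std::abs( sum - 1.0 ) > tolerance ) return false;
  }
  return true;
}
```

For a binary node such as dyspnea, calling rowsSumToOne( values, 2 ) on the eight values of its table would confirm that all four lines are valid distributions before the table is filled.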

Using the gum::BayesNetFactory class

The gum::BayesNetFactory class is useful when writing serializers and deserializers for the gum::BayesNet class. You can also use it to create a gum::BayesNet directly in C++; you may, however, find it simpler to use the gum::BayesNet class directly.

Instantiating the factory

The gum::BayesNetFactory expects a pointer to a gum::BayesNet. The factory will not release this pointer, so you should be careful to release it yourself.

auto asia = gum::BayesNet<double>();
auto factory = gum::BayesNetFactory<double>(&asia);

Most methods follow a start / end pattern. Until the end method is called, there is no guarantee that the element has been added, even partially, to the gum::BayesNet.

Adding nodes

To add a node, you must use the gum::BayesNetFactory::startVariableDeclaration() and gum::BayesNetFactory::endVariableDeclaration() methods. You must provide several pieces of information to correctly add a node to the gum::BayesNet, otherwise a gum::OperationNotAllowed will be raised.

When declaring a variable you must:

  • Have finished any previous declaration using the respective end method.
  • Give it a name using gum::BayesNetFactory::variableName(std::string).
  • Add at least two modalities using gum::BayesNetFactory::addModality(std::string).
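The three rules above can be pictured with a small stand-alone class. This is only a sketch of the start / end protocol in plain C++, not the factory's implementation: the class VariableDeclarationSketch and all of its members are hypothetical, std::logic_error stands in for gum::OperationNotAllowed, and the value returned by end() is simply the position of the variable in the sketch's own list.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the variable declaration protocol.
class VariableDeclarationSketch {
  public:
  // Rule 1: a previous declaration must have been ended.
  void start() {
    if ( open_ ) throw std::logic_error( "previous declaration not ended" );
    open_ = true;
    name_.clear();
    modalities_.clear();
  }

  void name( std::string n ) {
    requireOpen();
    name_ = std::move( n );
  }

  void addModality( std::string m ) {
    requireOpen();
    modalities_.push_back( std::move( m ) );
  }

  // Rules 2 and 3 are checked here: a name and at least two modalities.
  // Returns the position of the newly declared variable.
  std::size_t end() {
    requireOpen();
    if ( name_.empty() ) throw std::logic_error( "missing variable name" );
    if ( modalities_.size() < 2 )
      throw std::logic_error( "at least two modalities are required" );
    open_ = false;
    declared_.push_back( name_ );
    return declared_.size() - 1;
  }

  private:
  void requireOpen() const {
    if ( !open_ ) throw std::logic_error( "no declaration in progress" );
  }

  bool open_ = false;
  std::string name_;
  std::vector<std::string> modalities_;
  std::vector<std::string> declared_;
};
```

In this sketch nothing is recorded until end() succeeds, which mirrors the remark that an element is only guaranteed to be added once the end method has been called.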

Here is a list of legal method calls while declaring a variable:

Here is a code sample where we declare the variables of the Asia Network example, starting with "Visit To Asia":

// Visit to Asia
factory.startVariableDeclaration();
factory.variableName( "Visit To Asia" );
factory.variableDescription(
"True if patient visited Asia in the past months" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Smoker
factory.startVariableDeclaration();
factory.variableName( "Smoker" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Has Tuberculosis
factory.startVariableDeclaration();
factory.variableName( "Has Tuberculosis" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Has Lung Cancer
factory.startVariableDeclaration();
factory.variableName( "Has Lung Cancer" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Tuberculosis or Cancer
factory.startVariableDeclaration();
factory.variableName( "Tuberculosis or Cancer" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Has Bronchitis
factory.startVariableDeclaration();
factory.variableName( "Has Bronchitis" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// XRay Result
factory.startVariableDeclaration();
factory.variableName( "XRay Result" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
// Dyspnea
factory.startVariableDeclaration();
factory.variableName( "Dyspnea" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();

The gum::BayesNetFactory::endVariableDeclaration() method returns the variable's gum::NodeId in the gum::BayesNet.

Adding arcs

To add an arc you must use the gum::BayesNetFactory::startParentsDeclaration( const std::string& ) and gum::BayesNetFactory::endParentsDeclaration() methods.

Here is a list of legal method calls while declaring parents:

Note that you do not have to add all parents in a single declaration, and that calling the start and end methods without adding any parent will not result in an error.

// Parents of Has Tuberculosis
factory.startParentsDeclaration( "Has Tuberculosis" );
factory.addParent( "Visit To Asia" );
factory.endParentsDeclaration();
// Parents of Has Lung Cancer
factory.startParentsDeclaration( "Has Lung Cancer" );
factory.addParent( "Smoker" );
factory.endParentsDeclaration();
// Parents of Tuberculosis or Cancer
factory.startParentsDeclaration( "Tuberculosis or Cancer" );
factory.addParent( "Has Tuberculosis" );
factory.addParent( "Has Lung Cancer" );
factory.endParentsDeclaration();
// Parents of Has Bronchitis
factory.startParentsDeclaration( "Has Bronchitis" );
factory.addParent( "Smoker" );
factory.endParentsDeclaration();
// Parents of XRay Result
factory.startParentsDeclaration( "XRay Result" );
factory.addParent( "Tuberculosis or Cancer" );
factory.endParentsDeclaration();
// Parents of Dyspnea
factory.startParentsDeclaration( "Dyspnea" );
factory.addParent( "Tuberculosis or Cancer" );
factory.addParent( "Has Bronchitis" );
factory.endParentsDeclaration();

Defining Conditional Probability Tables

The gum::BayesNetFactory class offers three ways to define conditional probability tables (CPT): raw, factorized and delegated.

Raw CPT definition

From a user perspective, raw definitions are useful to define small CPTs, such as those of root nodes. However, they do not scale well when the CPT has many dimensions, and you should prefer a factorized CPT definition if you need to define large CPTs. On the other hand, raw definitions are very useful when automatically filling CPTs from some source (file, database, another CPT, ...).

Two methods can be used to define raw CPT:

Defining the conditional probability table for the root node "Visit To Asia" in the Asia Network example can be achieved as follows:

factory.startRawProbabilityDeclaration("Visit To Asia");
auto variables = std::vector<std::string>{ "Visit To Asia" };
auto values = std::vector<float>{ 0.01f, 0.99f };
factory.rawConditionalTable(variables, values);
factory.endRawProbabilityDeclaration();

Defining the conditional probability table for a node with parents:

factory.startRawProbabilityDeclaration("Tuberculosis or Cancer");
variables = std::vector<std::string>{
"Tuberculosis or Cancer",
"Has Tuberculosis",
"Has Lung Cancer"
};
values = std::vector<float>
// True || False => Has Lung Cancer
// True | False || True | False => Has Tuberculosis
{ 0.00f, 0.00f, 0.00f, 1.00f, // False
1.00f, 1.00f, 1.00f, 0.00f }; // True
factory.rawConditionalTable(variables, values);
factory.endRawProbabilityDeclaration();

Factorized CPT definition

Factorized definitions are useful when dealing with sparse CPTs. They can also be used when writing the raw CPT is error-prone. The gum::BayesNetFactory::startFactorizedProbabilityDeclaration(const std::string&) method is used to start a definition and gum::BayesNetFactory::endFactorizedProbabilityDeclaration() to end it.

A factorized definition is made of consecutive factorized entries. Each entry sets parent modalities and defines a distribution given those modalities. If some parents are left undefined, then the distribution will be assigned to each possible outcome of those parents.

To start declaring a factorized entry call the gum::BayesNetFactory::startFactorizedEntry() and to end it call gum::BayesNetFactory::endFactorizedEntry().

In the following example, we define the CPT for the "Tuberculosis or Cancer" variable in the Asia Network:

factory.startFactorizedProbabilityDeclaration("Tuberculosis or Cancer");
// Setting [ 1.00, 0.00 ] as the default distribution
factory.startFactorizedEntry();
// Tuberculosis or Cancer -> True | False
values = std::vector<float>{ 1.00f, 0.00f };
factory.setVariableValues( values );
factory.endFactorizedEntry();
factory.startFactorizedEntry();
factory.setParentModality( "Has Lung Cancer", "False" );
factory.setParentModality( "Has Tuberculosis", "False" );
// Tuberculosis or Cancer -> True | False
values = std::vector<float>{ 0.00f, 1.00f };
factory.setVariableValues( values );
factory.endFactorizedEntry();
factory.endFactorizedProbabilityDeclaration();
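To make the semantics of factorized entries concrete, the sketch below expands a list of entries into one distribution per parent configuration. It is plain C++ written for illustration and does not use aGrUM: the names FactorizedEntrySketch and expandFactorized are hypothetical, parents are identified by position rather than by name, and the rule that a later entry overwrites an earlier one is an assumption inferred from the example above, where a default distribution is set first and then overridden.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical representation of one factorized entry.
struct FactorizedEntrySketch {
  // Parent position -> modality index. Parents absent from the map are
  // left undefined, so the entry applies to all of their modalities.
  std::map<std::size_t, std::size_t> fixedParents;
  // Distribution of the child variable for the matching configurations.
  std::vector<double> distribution;
};

// Expands the entries into one distribution per parent configuration.
// Configurations are numbered with parent 0 varying fastest.
// Assumption: entries are applied in order, later ones overwrite earlier ones.
std::vector<std::vector<double>> expandFactorized(
    const std::vector<std::size_t>& parentDomainSizes,
    const std::vector<FactorizedEntrySketch>& entries ) {
  std::size_t configCount = 1;
  for ( std::size_t size : parentDomainSizes ) configCount *= size;

  std::vector<std::vector<double>> table( configCount );
  for ( const auto& entry : entries ) {
    for ( std::size_t config = 0; config < configCount; ++config ) {
      bool matches = true;
      std::size_t rest = config;
      // Decode the configuration index into one modality per parent.
      for ( std::size_t p = 0; p < parentDomainSizes.size(); ++p ) {
        const std::size_t modality = rest % parentDomainSizes[p];
        rest /= parentDomainSizes[p];
        const auto fixed = entry.fixedParents.find( p );
        if ( fixed != entry.fixedParents.end() && fixed->second != modality )
          matches = false;
      }
      if ( matches ) table[config] = entry.distribution;
    }
  }
  return table;
}
```

With two binary parents and the two entries of the example (a default of [ 1.00, 0.00 ], then [ 0.00, 1.00 ] when both parents are "False"), three of the four configurations keep the default and only the last one is overridden.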

While adding values in a factorized definition, two methods are available:

The unchecked version will not check whether the vector matches the variable's domain size. The checked version will raise a gum::OperationNotAllowed if it does not.

Delegated CPT definition

Delegated definitions let users define the gum::DiscreteVariable and gum::MultiDimAdressable added to the gum::BayesNet themselves. You should only use this method if you are familiar with the multidim hierarchy and require specific multidimensional arrays, like gum::MultiDimNoisyORCompound, gum::aggregator::Count, etc.

Probabilistic Inference

All inference algorithms implement the gum::BayesNetInference class. The main methods for inference are:

auto asia = gum::BayesNet<double>( "Asia" );
// Constructing the BayesNet...
// Choose one among available inference algorithms
auto inference = gum::ShaferShenoyInference<double>( asia );
auto id = asia.idFromName( "Has Lung Cancer" );
auto marginal = inference.posterior( id );
// We can add some evidence
// Index 0 is True, 1 is False (labels are indexed in the order they were added)
inference.addHardEvidence( asia.idFromName( "Visit to Asia"), 0 );
inference.addHardEvidence( asia.idFromName( "Dyspnea"), 0 );
auto updated_marginal = inference.posterior( id );

More advanced methods can be used for special use cases:

  • gum::BayesNetInference::insertEvidence( const List<const Potential<GUM_SCALAR>*>&).
  • gum::BayesNetInference::eraseEvidence( const Potential<GUM_SCALAR>* ).

Inference Algorithms

Here is a list of exact inference algorithms:

And this is the list of approximate inference algorithms:

  • gum::GibbsInference.

Finally, a list of utility algorithms used by some inference algorithms:

Serialization

Several file formats are currently supported for gum::BayesNet serialization and deserialization. They all implement either gum::BNReader for deserialization or gum::BNWriter for serialization.

The gum::BNReader class

The main methods for deserializing an instance of gum::BayesNet are:

auto asia = gum::BayesNet<double>("Asia");
// One implementation of the gum::BNReader class
auto reader = gum::BIFReader<double>( &asia, "/path/to/bif/file.bif" );
try {
reader.proceed();
} catch ( gum::IOError& e ) {
// A gum::IOError will be raised if an error occurred
}

The gum::BNWriter class

The main methods for serializing an instance of gum::BayesNet are:

auto asia = gum::BayesNet<double>("Asia");
// Constructing BayesNet...
// One implementation of the gum::BNWriter class
auto writer = gum::BIFWriter<double>();
try {
// This will print the asia BayesNet on the standard output stream
writer.write( std::cout, asia );
// This will write the asia BayesNet in the given file
writer.write( "/tmp/asiaNetwork.bif", asia );
} catch ( gum::IOError& e ) {
// A gum::IOError will be raised if an error occurred
}

Be aware that the file will be created if it does not exist. If it does exist, its content will be erased.

List of supported formats

The BIF format:

The BIF XML format:

The DSL format:

The CNF format (no reader in this format):

The NET format: