Abstract
Bayesian treatments of learning in neural networks are typically based either on local Gaussian approximations to a mode of the posterior weight distribution, or on Markov chain Monte Carlo simulations. A third approach, called ensemble learning, was introduced by Hinton and van Camp (1993). It aims to approximate the posterior distribution by minimizing the Kullback-Leibler divergence between the true posterior and a parametric approximating distribution. However, the derivation of a deterministic algorithm relied on the use of a Gaussian approximating distribution with a diagonal covariance matrix and so was unable to capture the posterior correlations between parameters. In this paper, we show how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable. We also extend the framework to deal with hyperparameters, leading to a simple re-estimation procedure. Initial results from a standard benchmark problem are encouraging.
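
As a rough sketch of the objective described above (the notation here is assumed for illustration and is not quoted from the paper), ensemble learning chooses an approximating distribution $Q(\mathbf{w})$ over the network weights $\mathbf{w}$ and minimizes its Kullback-Leibler divergence to the true posterior $P(\mathbf{w}\mid D)$ given data $D$; equivalently, it maximizes a lower bound on the log evidence. The full-covariance extension corresponds to taking $Q$ to be a Gaussian with an unrestricted covariance matrix.

```latex
% Minimal sketch of the ensemble-learning (variational) objective.
% Notation (Q, P, w, D, mu, Sigma) is assumed for illustration.
\[
  \mathrm{KL}\!\left(Q \,\middle\|\, P\right)
  = \int Q(\mathbf{w}) \,\ln \frac{Q(\mathbf{w})}{P(\mathbf{w}\mid D)} \, d\mathbf{w}
  \;\ge\; 0 .
\]
% Since ln P(D) = F[Q] + KL(Q || P), minimizing the divergence is equivalent
% to maximizing the lower bound
\[
  \mathcal{F}[Q]
  = \int Q(\mathbf{w}) \,\ln \frac{P(D\mid\mathbf{w})\,P(\mathbf{w})}{Q(\mathbf{w})} \, d\mathbf{w} .
\]
% Diagonal-covariance ensemble learning restricts Q to a factorized Gaussian;
% the full-covariance case discussed in the abstract instead takes
\[
  Q(\mathbf{w}) = \mathcal{N}\!\left(\mathbf{w} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}\right),
\]
% with a general (non-diagonal) covariance matrix Sigma, so that posterior
% correlations between parameters can be represented.
```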
| Original language | English |
| --- | --- |
| Title of host publication | Advances in Neural Information Processing Systems 10 (NIPS 1997) |
| Editors | M.I. Jordan, M.J. Kearns, S.A. Solla |
| Publisher | MIT Press |
| Pages | 395-401 |
| Number of pages | 7 |
| Publication status | Published - 1998 |