This is the last snapshot before I graduated from Caltech.
AdaBoost_ERP added; it is mentioned in my paper Multiclass Boosting
with Repartitioning.
Cross-validation classes CrossVal,
vFoldCrossVal, and HoldoutCrossVal added.
They can also be used as learning models. See test/testsvm.cpp
for a demonstration.
Ordinal_BLE added; it was developed in the early stage of
the paper Ordinal Regression by
Extended Binary Classification.
It is outdated and will probably be rewritten in
the future to keep up with the paper.
SVM changes: support vectors are copied out from LIBSVM
(which can save some memory);
Kernel and thus SVM can be saved/loaded.
DataFeeder can add flipping noise (set_train_noise()) and can be reset (reset()).
Conversion between some boosting models is now possible;
(multiclass) data sets may be loaded after an ECOC table is set.
MultiClass_ECOC and
AdaBoost_ECOC. See test/multi.cpp
for an example with one-vs-all.
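For illustration, the sketch below builds the one-vs-all coding table as a plain matrix of +1/-1 entries; it is conceptual (the function name one_vs_all_table is made up for this example) and does not use the actual MultiClass_ECOC/AdaBoost_ECOC interface, for which test/multi.cpp is the reference.

    // Conceptual sketch: the one-vs-all ECOC table for nclass classes.
    // Row c is the codeword of class c; column k is the k-th binary problem,
    // which treats class k as +1 and every other class as -1.
    #include <vector>

    std::vector< std::vector<int> > one_vs_all_table (unsigned nclass) {
        std::vector< std::vector<int> > tab(nclass, std::vector<int>(nclass, -1));
        for (unsigned c = 0; c < nclass; ++c)
            tab[c][c] = +1;
        return tab;
    }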
The margin()/signed_margin()
functions scattered in different classes were consolidated into four functions in
LearnModel: margin(), margin_of(),
min_margin(), and margin_norm().
The first three give unnormalized margins and the last is the
normalization term.
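As a usage sketch (the header name and the exact signatures are assumptions, not the documented interface), the normalized minimal margin of the training set would then be the unnormalized minimal margin divided by the normalization term:

    #include "learnmodel.h"    // header name assumed

    // Sketch only: assumes min_margin() and margin_norm() are const members
    // taking no arguments, and that lm has already been trained.
    double normalized_min_margin (const lemga::LearnModel& lm) {
        return lm.min_margin() / lm.margin_norm();
    }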
DataFeeder added. It is handy
when data splitting/normalization is needed.
SVM needs a small modification.
MultiClass_ECOC added.
Fixes: SVM::w_norm(),
invalid cache in Boosting::initialize(),
better Boosting::get_output() when no cache is used,
and a typo in RBF::matrix.
Perceptron added. It implements several
perceptron learning algorithms mentioned in my paper
Perceptron Learning with Random
Coordinate Descent.
LPBoost added.
Hsuan-Tien Lin
contributed the code which uses
GLPK.
Kernel class was added since kernels
can be used for algorithms other than SVM.
SVM: Can be cloned. More internal information can be obtained, such as the 2-norm of the weight vector, the support vectors, and the coefficients.
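A small sketch of what such queries could look like; w_norm() appears elsewhere in this changelog, while the header name, the constness, and the exact declaration of clone() are assumptions:

    #include "svm.h"    // header name assumed

    void inspect (const lemga::SVM& s) {
        double wn = s.w_norm();                                   // 2-norm of the weight vector
        lemga::SVM* dup = dynamic_cast<lemga::SVM*>(s.clone());   // polymorphic copy; cast assumed
        // support vectors and coefficients would be queried through the
        // version-specific accessors (names not fixed here); wn is the margin scale
        delete dup;
    }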
Boosting, including
AdaBoost and CGBoost.
load_data() can auto-detect the input dimension.
randn().
c_error() and r_error() are now in
LearnModel.
Pulse bug (introduced in 0.1 beta) fixed:
Pulse could fail to choose the optimal hypothesis under
some conditions.
MgnBoost (Breiman's arc-gv) added. Test code
is added to test/adabst.cpp.
boosting::margin() gives the margin of an
individual training example, or the minimal margin of the training set.
Stump. Incomplete.
I haven't tested the new code with Visual C++.NET.
Aggregation renamed to Aggregating.
CGBoost added. CGBoost is better
than AdaBoost in optimizing cost functions. For details please refer
to the CGBoost
technical report (note that small modifications are required
in _conjugate_gradient (optimize.h) in order to
set β=0 for the first several iterations).
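Below is a conceptual sketch of that modification, not the actual _conjugate_gradient code; the Polak-Ribiere form of β and the names (cg_direction, warmup) are illustrative assumptions. Forcing β=0 during the warm-up turns those iterations into plain gradient steps.

    #include <cstddef>
    #include <vector>

    // Update the search direction d in place from the current and previous
    // gradients; beta is zeroed for the first `warmup` iterations.
    void cg_direction (std::vector<double>& d,
                       const std::vector<double>& g,
                       const std::vector<double>& g_prev,
                       unsigned iter, unsigned warmup)
    {
        double num = 0, den = 0;
        for (std::size_t i = 0; i < g.size(); ++i) {
            num += g[i] * (g[i] - g_prev[i]);     // Polak-Ribiere numerator
            den += g_prev[i] * g_prev[i];
        }
        const double beta = (iter < warmup || den == 0) ? 0 : num / den;
        for (std::size_t i = 0; i < g.size(); ++i)
            d[i] = beta * d[i] - g[i];            // d = -g + beta * d_prev
    }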
SVM and test code testsvm added.
LIBSVM,
modified to support weighted training examples, is used for actual
work. Currently only SVM classification with RBF kernel is supported.
Serialization/unserialization has not been implemented yet.
lemga::cost (cost.h) added.
It is an attempt to separate the cost functions from the learning/optimization
methods, and a temporary solution before functors are used.
Pulse and Stump improved.
Training a pulse function now takes O(n) time.
Pulse parameters.
dataset::replace
and member Boosting::min_err.
save() and load() replaced by a much
better serialization/unserialization implementation. Operator >>
is used for saving models, and << for loading. create(istream)
can create an unknown-type object from an input stream. (Thus the base
model in class Aggregation is no longer needed when loading
models.)
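A minimal sketch of how the new serialization might be used; the operand order (model on the left, stream on the right), the header name, and the qualification of create() are assumptions based on the description above.

    #include <fstream>
    #include "learnmodel.h"    // header name assumed

    void roundtrip (lemga::LearnModel& lm) {
        std::ofstream ofs("model.lm");
        lm >> ofs;                         // operator>> saves the model (operand order assumed)
        ofs.close();

        std::ifstream ifs("model.lm");
        lemga::LearnModel* copy =
            (lemga::LearnModel*) lemga::create(ifs);   // type is read from the stream itself
        // ... so the base model of an Aggregation need not be known here ...
        delete copy;
        // an existing object could instead be loaded in place with  lm << ifs
    }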
load_data() accepts an input stream instead of a FILE*
handle.
Pulse added. It is a multi-transition
phase (step) function. The best hypothesis with number of transitions
equal to or less than a given limit is returned. When the limit is 1,
pulse is almost the same as stump (the only difference is that pulse
may return a hypothesis with no transitions at all). The code has been
tuned so that it is even faster than stump when the limit is 1.
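To make the description concrete, here is a conceptual evaluation of such a multi-transition step function; it is an illustration only (pulse_output is not the Pulse interface):

    #include <cstddef>
    #include <vector>

    // Output of a pulse hypothesis on one feature value x: start at `sign`
    // (+1 or -1) and flip at every transition point that x has passed.
    int pulse_output (double x, const std::vector<double>& transitions, int sign = 1)
    {
        for (std::size_t i = 0; i < transitions.size(); ++i)
            if (x >= transitions[i]) sign = -sign;
        return sign;
    }
    // With an empty transition list the hypothesis is constant; with exactly
    // one transition it behaves like a stump.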
_line_search will early stop if
a non-descending direction is met. This change affects conjugate
gradient, boosting in the functional space, and the training of neural
networks. For example, convex boosting now returns a very large number
as the cost when empty, to avoid a non-descending direction at the first step.
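A conceptual sketch of the check involved, not the actual _line_search in optimize.h (the name is_descending is illustrative): a direction d is non-descending when the directional derivative g·d is non-negative, and the search stops early in that case.

    #include <cstddef>
    #include <vector>

    // True if d is a descent direction for the current gradient g.
    bool is_descending (const std::vector<double>& g, const std::vector<double>& d)
    {
        double dot = 0;
        for (std::size_t i = 0; i < g.size(); ++i)
            dot += g[i] * d[i];
        return dot < 0;    // non-negative g.d means no descent: stop the line search early
    }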
REGISTER_CREATOR simplifies the object creator
registration.
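For example, a single line in the model's source file (the exact argument form is an assumption) registers the creator so that the class can be re-created by name when a model file is loaded:

    // Sketch only; argument form assumed.
    REGISTER_CREATOR(Stump);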
Gradient descent with weight decay (_gd_weightdecay) added.
id() returns a const string instead of
char*.
A demo program (AdaBoost with Pulse)
and a model file checker (showlm) added.
copy() renamed to clone().
Bagging bag(n_in, n_out) simply becomes Bagging bag.
Boosting (via BoostWgt and
_boost_gd) reimplemented so that conjugate gradient
and some variants of gradient descent are possible (in the functional
space).
Object creator registration added (_register_creator);
constructors accepting istream as argument added (interface
only); version() was renamed to id() and no
longer contains version information.
vectorop became lemga::op.
It serves only for generic optimization in Lemga; only a small set of
operations is needed for optimization.
_shared_ptr.
create() added as a virtual constructor.
AdaBoost
removed.
Cascade class added.
The code rewriting is almost done and I've tested Lemga in one project (alphaBoost) with GCC 2.96, 3.0.1, and 3.2.1. Models and algorithms currently coded are: