- Introduction:

Ordinal regression (ranking) occupies a unique position between multiclass classification and metric regression in supervised learning. An ordinal regression problem can be thought of as a multiclass problem with an ordering preference, or as a regression problem without much knowledge of the underlying metric. It has many applications in social science and information retrieval for matching human preferences, and has attracted much attention in the machine learning community in recent years. (See the introductory slides from my talks: MLSS '06, Caltech EE Pizza Meeting.)

- Thresholded ensemble model:

We propose a thresholded ensemble model for ordinal regression. The model consists of a weighted ensemble of confidence functions and an ordered vector of thresholds. We derive novel large-margin bounds for common error functions, such as the classification error and the absolute error. We also propose new and simple boosting approaches for constructing thresholded ensembles. These approaches perform comparably to state-of-the-art algorithms while enjoying faster training. (See our ALT '06 paper for details.)

- Reduction:

We propose a new reduction framework that systematically transforms ordinal regression problems into the better-studied binary classification problems. Any binary classification algorithm can then be automatically extended to ordinal regression, and improvements in binary classification are immediately inherited by ordinal regression. In addition, many existing ordinal regression algorithms and models, including the thresholded ensemble model, perceptron ranking, and support vector ordinal regression, can be viewed as special cases of the reduction framework. From a theoretical perspective, the framework yields simple and intuitive proofs of ordinal regression theorems; from a practical perspective, it performs very well on benchmark data sets. (See our NIPS '06 paper for details.)
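A minimal sketch of the core encode/decode idea, in Python (the function names are mine, purely illustrative, and a faithful implementation would also weight the binary examples as done in the paper): a rank y in {1, ..., K} is encoded as K-1 binary answers to "is y greater than k?", and a predicted rank is decoded by counting positive answers. A thresholded model is the special case where each answer compares a single ensemble score against an ordered threshold.

```python
def rank_to_binary(y, K):
    """Encode rank y in {1, ..., K} as K-1 binary labels answering
    "is y > k?" for k = 1, ..., K-1."""
    return [1 if y > k else -1 for k in range(1, K)]

def binary_to_rank(preds):
    """Decode: predicted rank = 1 + number of binary questions
    answered positively."""
    return 1 + sum(1 for p in preds if p > 0)

def thresholded_predict(score, thresholds):
    """Special case for a thresholded model: the k-th binary answer is
    "does the ensemble score exceed the k-th threshold?", so the rank is
    1 + the number of thresholds the score exceeds."""
    return 1 + sum(1 for t in sorted(thresholds) if score > t)
```

For example, with K = 5, a rank of 3 encodes to [1, 1, -1, -1], and counting the positives decodes it back to 3; any binary learner trained on such encoded examples can thus be reused for ordinal regression.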

- From Group Research Page:

Two phenomena are important in real-world learning problems. First, the data may contain noisy or mislabeled examples, which can mislead the learning algorithm. Second, the data may be too complex, making it hard for the algorithm to extract their essence. In both cases, the performance of the algorithm degrades because of the incorrect or overly complex data. To obtain better generalization, we want to prune such unfavorable data before launching the learning procedure. This would also help build analysis tools in areas where abundant data are available, such as computational finance.

We have found that several learning algorithms, such as the rho-Learning scenario, AdaBoost, and Support Vector Machines, can help identify unfavorable data (see our PKDD '05 paper). We are working to justify this data selection framework from a learning perspective, and to build useful selection tools by understanding how different learning algorithms behave in various environments.

- Some Studies:

I continue to work with neural networks for regression (function fitting) and to study their behavior in noisy environments. Traditional statistical studies provide useful tools for analyzing regression results. I hope that by understanding the behavior of neural networks through these tools, we can also understand the effect of unfavorable data, and thus find ways to deal with them.

- General:

The Support Vector Machine (SVM) is an algorithm for pattern classification and regression. Its two key ideas, "large margin" and "kernel mapping", make the SVM a powerful machine learning tool. I was a member of Prof. Lin's group at National Taiwan University, where we studied various aspects of SVMs both in theory and in practice. I remain interested in this area.

- Infinite Ensemble Learning:

Traditional ensemble learning algorithms, such as boosting, aggregate a finite number of hypotheses. However, it is not clear whether an ensemble with an infinite number of hypotheses should be used, and constructing such an ensemble is a challenging problem.

Our work applies SVM optimization machinery and the kernel trick to ensemble learning. The kernel trick makes it possible to embed an infinite number of hypotheses into a simple kernel computation. The result is a novel learning framework that constructs an infinite ensemble. The framework also provides insight into designing new kernels, explaining existing kernels, and comparing boosting with SVM.
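As a hedged illustration of how an infinite ensemble can collapse into a simple kernel computation: aggregating all decision stumps over a bounded input domain yields, up to scaling constants, a kernel of the form K(x, x') = Delta - ||x - x'||_1, where Delta is a domain-dependent constant chosen large enough to keep the kernel positive semi-definite. The exact constants below are assumptions for illustration, not the paper's precise formulation.

```python
def stump_kernel(x, xp, delta):
    """Illustrative "stump kernel": Delta minus the L1 distance.
    Integrating the products of all decision stumps
    s(x) = q * sign(x_d - a) over directions q, features d, and
    offsets a gives this closed form (up to constants), so an SVM
    with this kernel implicitly works with an infinite ensemble of
    stumps.  `delta` is a constant that must dominate the largest
    L1 distance on the input domain."""
    return delta - sum(abs(a - b) for a, b in zip(x, xp))
```

In practice, such a function would be plugged into an SVM solver that accepts a custom kernel; the point of the sketch is only that the infinite aggregation reduces to one cheap distance computation per pair of points.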

When our SVM-based framework and AdaBoost (a popular boosting algorithm) are built on the same set of hypotheses, our framework outperforms AdaBoost experimentally. Further analysis shows that the sparsity/finiteness property of AdaBoost is key to explaining the difference. The novel framework, on the other hand, aggregates an infinite number of hypotheses and does not suffer from the sparsity/finiteness limitation. (See my Master's Thesis and our JMLR '07 paper for details.)

Feel free to contact me: **"htlin" at "csie.ntu.edu.tw"**