In practice, learning from data is often hampered by the limited training examples. In this paper, as the size of training data varies, we empirically investigate several probability estimation tree algorithms over eighteen binary classification problems. Nine metrics are used to evaluate their performances. Our aggregated results show that ensemble trees consistently outperform single trees. Confusion factor trees(CFT) register poor calibration even as training size increases, which shows that CFTs are potentially biased if data sets have small noise. We also provide analysis on the observed performance of the tree algorithms.