An investigation of statistical learning curves: do we always need big data? (2017)
Authors: Li, Yang
Big Data technology has developed rapidly, attracted increasing attention, and is now widely used in many industries. It has brought substantial benefits to everyday life, but it also poses new challenges. In many situations, dealing with such large and complex data can be extremely difficult. But do we really always need big data?
This thesis investigated whether a large dataset is needed to build a model with acceptable accuracy and how the number of observations affects the performance of statistical predictive methods, using learning curves to describe this relationship. Several popular statistical learning methods were considered and applied to three large datasets. An efficient parallel coding strategy in R was also provided.
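To make the idea of a learning curve concrete, the following is a minimal sketch, not the thesis's own code. It is written in Python rather than R, and everything in it is an assumption made for illustration: the synthetic two-class data, the nearest-centroid classifier, and the names `make_data`, `fit_centroids`, `accuracy`, and `learning_curve`. It trains the same model on increasingly large subsets and records test accuracy at each size.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data (illustrative only): class 0 ~ N(0, 1), class 1 ~ N(2, 1).
    Labels alternate so every prefix of length >= 2 contains both classes."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def fit_centroids(train):
    """Nearest-centroid classifier: store the mean of x for each class."""
    centroids = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        centroids[c] = sum(xs) / len(xs)
    return centroids

def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid matches the true label."""
    hits = 0
    for x, y in test:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += predicted == y
    return hits / len(test)

def learning_curve(train_pool, test, sizes):
    """Fit on the first n training points for each n and score on a fixed test set."""
    return [(n, accuracy(fit_centroids(train_pool[:n]), test)) for n in sizes]

rng = random.Random(0)            # fixed seed so the run is repeatable
train_pool = make_data(5000, rng)
test = make_data(2000, rng)
sizes = [10, 50, 100, 500, 1000, 5000]

curve = learning_curve(train_pool, test, sizes)
for n, acc in curve:
    print(f"n={n:5d}  accuracy={acc:.3f}")
```

If accuracy stops improving well before the largest size, a smaller sample may be adequate for that model and dataset. The evaluations at different sizes are independent of one another, which is what makes this kind of loop a natural candidate for parallel execution.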