Douban Big Data Research
Yang, Jack and Brian Yecies, 2016. Mining Chinese social media UGC: a big-data framework for analyzing Douban movie reviews. Journal of Big Data 3 (3): 1-23.
Abstract: Analysis of online user-generated content is receiving attention for its wide applications from both academic researchers and industry stakeholders. In this pilot study, we address common Big Data problems of time constraints and memory costs involved with using standard single-machine hardware and software. A novel Big Data processing framework is proposed to investigate a niche subset of user-generated popular culture content on Douban, a well-known Chinese-language online social network. Huge data samples are harvested via an asynchronous scraping crawler. We also discuss how to manipulate heterogeneous features from raw samples to facilitate analysis of various film details, review comments, and user profiles on Douban with specific regard to a wave of South Korean films (2003–2014), which have increased in popularity among Chinese film fans. In addition, an improved Apriori algorithm based on MapReduce is proposed for content-mining functions. An exploratory simulation of results demonstrates the flexibility and applicability of the proposed framework for extracting relevant information from complex social media data, knowledge which can in turn be extended beyond this niche dataset and used to inform producers and distributors of films, television shows, and other digital media content.
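To make the content-mining step mentioned in the abstract more concrete, the following is a minimal single-machine sketch of the classic Apriori algorithm with its support counting split into map and reduce phases. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the toy review tags, and the `min_support` threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates):
    """Emit (itemset, 1) for every candidate itemset contained in one transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those that meet min_support."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: n for s, n in counts.items() if n >= min_support}


def next_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, pruning by the Apriori property."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))}


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    result, k = {}, 1
    while candidates:
        # In a real MapReduce job the map calls would run in parallel
        # across data partitions; here they run sequentially.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return result


# Hypothetical toy data: tags attached to four reviews.
reviews = [
    ["romance", "korean", "drama"],
    ["korean", "drama"],
    ["romance", "korean"],
    ["action", "korean", "drama"],
]
frequent_itemsets = apriori(reviews, min_support=2)
```

Each pass over the data corresponds to one map/reduce round: the map phase is independent per transaction, which is what makes the counting step easy to distribute, while candidate generation between rounds stays a small, centralized step.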