crawlingPeopleRelatedWeiboOriginalPost(pid, person_name)
Description
Extracts original Weibo posts related to a person from Sina Weibo and inserts them into the database.
Parameters
| Parameter | Necessity | Type | Description | 
|---|---|---|---|
| pid | required | int | person id | 
| person_name | required | string | name of the person to crawl | 
Output
| Parameter | Type | Description | 
|---|---|---|
| status | string | running status of the crawler | 
Implementation
- masterStart(). Creates multiple processes to crawl data in parallel.
 - wapLogIn(). Logs in to a Sina account.
 - weiBoWapSearch(person_name, pid). Searches for Weibo posts related to the person, using the person name and person id.
- extractTopic(person_name, person_id). Extracts the Weibo text and inserts it into the database.
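The call structure above could be sketched roughly as follows. This is only an illustration of distributing people across worker processes, not the crawler's actual code: `crawl_person` and `master_start` are hypothetical names, the worker is a stub that performs no network or database access, and the task data is a placeholder.

```python
from multiprocessing import Pool


def crawl_person(task):
    """Hypothetical worker for one person (stand-in for the real crawl steps)."""
    pid, person_name = task
    # A real worker would call wapLogIn(), then weiBoWapSearch(person_name, pid),
    # then extractTopic(person_name, pid). Here we only return a placeholder status.
    return pid, "crawled " + person_name


def master_start(tasks, processes=2):
    """Distribute (pid, person_name) tasks over a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return dict(pool.map(crawl_person, tasks))


if __name__ == "__main__":
    # Placeholder people; real ids and names would come from the database.
    print(master_start([(1, "person_a"), (2, "person_b")]))
```

Using a pool keeps the number of concurrent processes fixed, so the number of people to crawl can grow without starting one process per person.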
 
 
Related Work
None
Issues About The Crawler
- The Sina Weibo API is not very effective for this task: it requires authorization, and the crawler would not pass Sina's review and verification.
 - Use the browser's cookies to log in to the Sina account.
 - Crawl weibo.cn instead of www.weibo.com, because the latter embeds the post data in JavaScript, which makes it difficult to extract.
 - Use multiple proxies to prevent Sina from blocking our IP.
 - Use multiple processes and accounts to speed up the crawler.
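The cookie and proxy workarounds above might be combined along these lines. This is a minimal sketch using only the Python standard library, under stated assumptions: the cookie strings and proxy addresses are placeholders, and `make_opener`, `opener_rotation`, and `fetch` are hypothetical helper names rather than functions from the crawler itself.

```python
import itertools
import urllib.request

# Placeholder values only: not real credentials or proxy servers.
ACCOUNT_COOKIES = ["cookie-string-for-account-1", "cookie-string-for-account-2"]
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002", "http://127.0.0.1:8003"]


def make_opener(cookie, proxy):
    """Return an opener that routes through `proxy` and sends `cookie` on every request."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    # The Cookie header carries the logged-in browser session copied by hand.
    opener.addheaders = [("Cookie", cookie), ("User-Agent", "Mozilla/5.0")]
    return opener


def opener_rotation(cookies, proxies):
    """Yield openers forever, cycling through accounts and proxies independently."""
    for cookie, proxy in zip(itertools.cycle(cookies), itertools.cycle(proxies)):
        yield make_opener(cookie, proxy)


def fetch(opener, url, timeout=10):
    """Fetch one page as text. This performs a real network request."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Each worker process could take the next opener from `opener_rotation(ACCOUNT_COOKIES, PROXIES)` before every request, so that consecutive requests do not share the same account and IP address.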