API | Description |
---|---|
getRelatedPerson(personName) | Return a list of people whose names are related to personName |
getPersonName(pid) | |
getPersonAge(pid) | |
getPersonBirthday(pid) | |
getPersonGender(pid) | |
getPersonHometown(pid) | |
getPersonIntroduction(pid) | |
getPersonTimeline(pid, duration) | |
getPersonTimeline(pid, latestEventNumber) | |
getSchoolMatesSocialNetwork(pid, duration, neighbour_number, layer) | |
getWorkMatesSocialNetwork(pid, duration, neighbour_number, layer) | |
getFellowTownsmenMatesSocialNetwork(pid, duration, neighbour_number, layer) | |
getFriendsSocialNetwork(pid, duration, neighbour_number, layer) | |

API | Description |
---|---|
getRelatedCompany(companyName) | Return a list of companies whose names are related to companyName |
getCompanyStockName(stock_id) | |
getCompanyAddress(stock_id) | |
getCompanyFullName(stock_id) | |
getCompanyIndustry(stock_id) | |
getCompanyRevenue(stock_id, year) | |
getCompanyPerformance(stock_id, year) | |
getCompanyStockReturn(stock_id, year) | |
getCompanyEmployeeChangeTimeline(stock_id, position_rank, duration) | |
getEmployeesSocialNetwork(stock_id, position_list, duration, neighbour_number, layer) | |
getCompanySocialNetwork(stock_id, duration, neighbour_number, layer) | |
getSectorSocialNetwork(stock_id, sector, duration, neighbour_number, layer) | |

We use Lucene to build an index over our company and person data; the search function then queries that index.
Lucene – http://lucene.apache.org/
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
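To illustrate what such an index does, here is a minimal sketch of an inverted index over names, written in Python. It is not Lucene and not the project's code: the record IDs, the sample names, and the `tokenize`, `build_index`, and `search` helpers are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(records):
    """Map each token to the set of record IDs whose name contains it."""
    index = defaultdict(set)
    for record_id, name in records.items():
        for token in tokenize(name):
            index[token].add(record_id)
    return index


def search(index, query):
    """Return the IDs of records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data: record ID -> company name.
companies = {
    "c1": "Acme Steel Holdings",
    "c2": "Acme Software",
    "c3": "Northern Steel",
}
company_index = build_index(companies)
```

A real Lucene index adds analyzers, scoring, and on-disk storage; the sketch only shows the token-to-record lookup that makes name search fast.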
Named entity disambiguation has two main steps:
First, we use the training set to extract filter keywords for a given person.
Then, we use the extracted filter keywords to judge whether each post is related to that person in our database.
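The two steps above can be sketched as follows. The source does not say how keywords are scored or how a post is judged, so this sketch assumes a simple scheme: a word scores by how many more related training posts than unrelated ones contain it, and a post counts as related when it contains at least `min_hits` keywords. The function names, parameters, and sample posts are hypothetical.

```python
from collections import Counter


def _document_counts(posts):
    """Count, for each word, how many posts contain it at least once."""
    return Counter(w for post in posts for w in set(post.lower().split()))


def extract_filter_keywords(related_posts, unrelated_posts, top_k=2):
    """Step 1: pick the words most characteristic of the related posts.

    Assumed scoring: (related posts containing the word) minus
    (unrelated posts containing the word). Ties break alphabetically.
    """
    related = _document_counts(related_posts)
    unrelated = _document_counts(unrelated_posts)
    scores = {w: c - unrelated[w] for w, c in related.items()}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked[:top_k] if scores[w] > 0]


def is_related(post, keywords, min_hits=1):
    """Step 2: judge a post as related if it contains enough keywords."""
    words = set(post.lower().split())
    return sum(1 for k in keywords if k in words) >= min_hits


# Hypothetical training set for one person who shares a name with someone else.
related_posts = [
    "li wei appointed chairman of the bank",
    "bank chairman li wei spoke on loans",
    "li wei bank results announced",
]
unrelated_posts = [
    "li wei scored a goal",
    "li wei joins the football club",
]
keywords = extract_filter_keywords(related_posts, unrelated_posts)
```

Because the name itself appears in both the related and unrelated posts, it scores low and the context words end up as the filter keywords.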
Related work
Two main functions:
Two columns of search results, one for person and one for company.
Parse cookies from the browser's cookie SQLite database (Firefox and Chrome)
Operate on the database (select, insert)
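As a rough illustration of reading cookies from a browser's SQLite store, the sketch below queries a Firefox-style database. It assumes the Firefox layout, a `moz_cookies` table with `host`, `name`, and `value` columns. Chrome uses a different table layout and may encrypt cookie values, which this sketch does not handle. The helper names and the host used in the example are hypothetical.

```python
import sqlite3


def read_cookies(conn, host_suffix):
    """Return {name: value} for cookies whose host ends with host_suffix.

    `conn` is an open sqlite3 connection to a Firefox-style cookie database
    (assumed layout: table moz_cookies with host, name, value columns).
    """
    rows = conn.execute(
        "SELECT name, value FROM moz_cookies WHERE host LIKE ?",
        ("%" + host_suffix,),
    ).fetchall()
    return dict(rows)


def to_cookie_header(cookies):
    """Join cookies into the value of an HTTP Cookie request header."""
    return "; ".join("%s=%s" % (k, v) for k, v in sorted(cookies.items()))
```

A caller would open the database with `sqlite3.connect(path_to_cookie_file)`, pass the connection to `read_cookies`, and send the result of `to_cookie_header` with the crawler's requests. Copying the file first avoids lock conflicts while the browser is running.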
The crawler for companies
The crawler for people
The crawler master, which controls the crawlers and keeps them running continuously
Linux run command:
nohup /local/fdm/python27/bin/python /local/fdm/weibobug/sinaBug.py &
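One way a master process can keep a crawler running continuously is a restart loop around a child process. The sketch below is an assumption about that pattern, not the project's actual master: `supervise`, `restart_delay`, and `max_runs` are hypothetical names, and `max_runs` exists only so the loop can be bounded when trying it out.

```python
import subprocess
import sys
import time

# Script name taken from the run command above; the relative path is an assumption.
CRAWLER_COMMAND = [sys.executable, "sinaBug.py"]


def supervise(command, restart_delay=5.0, max_runs=None):
    """Run `command` repeatedly, restarting it each time it exits.

    With max_runs=None the loop never ends. Returns (runs, last_exit_code)
    when max_runs is reached.
    """
    runs = 0
    last_code = None
    while max_runs is None or runs < max_runs:
        last_code = subprocess.call(command)  # blocks until the child exits
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(restart_delay)  # back off before restarting
    return runs, last_code
```

Calling `supervise(CRAWLER_COMMAND)` under `nohup` would keep restarting the crawler after crashes. The delay between restarts avoids a tight loop when the crawler fails immediately.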
Describe the data and the data schema in the CLANS database.