Table of Contents
Query Categorization
Pre-processing
- Stemming
- Abbreviation expansion
- Stop-word filtering
- Use the stop-word list
- Misspelled words
- Location-based queries
- NER for location detection
- Part-of-speech (POS) tagging
- Named entity recognition (NER)
- Person (e.g., Bill Gates)
- Location (e.g., Hong Kong)
- Thing (e.g., Table)
Knowledge Base
- Lexicon (e.g., DBpedia person, location, organization, and product lists)
- Stop-word list (e.g., of, the)
- Abbreviation list (e.g., ad for advertisement)
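The three knowledge-base resources above can be pictured as simple lookup tables. The following is a minimal sketch, assuming tiny hand-written lists; every entry, and the helper names expand_abbreviations, remove_stop_words and lookup_entity_type, are illustrative assumptions rather than the actual resources (a real lexicon would be loaded from a source such as DBpedia).

```python
# Toy knowledge base. Every entry is an illustrative assumption.
STOP_WORDS = {"of", "the"}
ABBREVIATIONS = {"ad": "advertisement", "hk": "hong kong"}
LEXICON = {
    "person": {"bill gates"},
    "location": {"hong kong", "new york"},
}


def expand_abbreviations(tokens):
    """Replace each token found in the abbreviation list by its expansion."""
    expanded = []
    for token in tokens:
        # An expansion may consist of several words, so split it again.
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return expanded


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def lookup_entity_type(phrase):
    """Return the lexicon category of a phrase, or None if it is unknown."""
    for entity_type, names in LEXICON.items():
        if phrase in names:
            return entity_type
    return None
```

For instance, expand_abbreviations(["ad", "for", "hk"]) yields ["advertisement", "for", "hong", "kong"], and lookup_entity_type("hong kong") yields "location".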
Useful tools
Input Examples
- the chinese university of hk
- new york pizza
- How do I play mp3 using the java programming language
Crowdsourcing
- Top 1000 queries ⇒ label them into 32 categories
Centroid Method
- Function Query2Term(string query)
Input: a query, Output: terms of this query
Example1: the chinese university of hk → [the chinese university of hk]1 ([]i is the i-th term of this query)
Example2: new york pizza → [new york]1 [pizza]2
Example3: How do I play mp3 using the java programming language → [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6
- Function Term2Centroid(string terms)
Input: terms of a query, Output: centroid of this query
Example1: [the chinese university of hk]1 → the chinese university of hk
Example2: [new york]1 [pizza]2 → pizza
Example3: [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 → mp3
- Function synonym(string keyword)
Input: a word, Output: a set of synonyms of this term in WordNet
Example: synonym(car)
auto, automobile, machine, motorcar
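The three functions of the centroid method can be sketched as follows. This is a minimal sketch under stated assumptions, not the original implementation: the lexicon, stop-word list, stem table and verb list are toy stand-ins for the knowledge base, a stemmer and a POS tagger. The notes do not say how the centroid is chosen, so the rule in term2centroid (first term that is neither a verb nor a location) is only a guess that happens to reproduce the three examples. The synonym function uses a one-entry table copied from the example; a real version would query WordNet.

```python
# Toy resources. All entries are illustrative assumptions.
LEXICON = {
    "the chinese university of hk": "organization",
    "new york": "location",
}
STOP_WORDS = {"how", "do", "i", "the", "of"}
STEMS = {"using": "use", "programming": "program"}  # stand-in for a stemmer
VERBS = {"play", "use"}  # stand-in for a POS tagger
SYNONYMS = {"car": ["auto", "automobile", "machine", "motorcar"]}
MAX_ENTITY_LEN = max(len(name.split()) for name in LEXICON)


def query2term(query):
    """Split a query into terms: lexicon entities first, then stemmed words."""
    tokens = query.lower().split()
    terms = []
    i = 0
    while i < len(tokens):
        # Greedy longest match against the lexicon keeps entities whole.
        for length in range(min(MAX_ENTITY_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + length])
            if phrase in LEXICON:
                terms.append(phrase)
                i += length
                break
        else:
            # No entity starts here: keep the token unless it is a stop word.
            token = tokens[i]
            if token not in STOP_WORDS:
                terms.append(STEMS.get(token, token))
            i += 1
    return terms


def term2centroid(terms):
    """Pick one representative term (assumed rule, see the text above)."""
    if not terms:
        return None
    candidates = [
        term for term in terms
        if term not in VERBS and LEXICON.get(term) != "location"
    ]
    return candidates[0] if candidates else terms[0]


def synonym(keyword):
    """Look up synonyms in the toy table; returns [] for unknown words."""
    return SYNONYMS.get(keyword, [])
```

With these toy tables, query2term("new york pizza") returns ["new york", "pizza"], and term2centroid applied to that list returns "pizza".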
Similarity-based Method
- Function catURL(string category, string engine, int n)
Input: a category, a search engine, and a count n; Output: the top n URLs that the search engine (e.g., Google) returns for the category
Example: catURL(cuhk, Google, 3)
www.cuhk.edu.hk/
www.cuhk.edu.hk/chinese/
www.cuhk.edu.hk/gss/
- Function keywordsURL(string URL)
Input: a URL, Output: keywords of the Web page at this URL
Example: keywordsURL(http://www.cuhk.edu.hk/english/)
research, education, shatin, campus, college, etc.
- Function synonym(string keyword)
Input: a word, Output: a set of synonyms of this term in WordNet
Example: synonym(car)
auto, automobile, machine, motorcar
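The URL and keyword functions above can be sketched offline as follows. This is a minimal sketch under stated assumptions: the notes do not say how keywords are extracted, so keywords_from_html simply ranks words by frequency after removing a toy stop-word list, and it works on an HTML string rather than fetching the page. cat_url takes a caller-supplied search function (a hypothetical parameter) instead of calling a real search engine API.

```python
import re
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"the", "of", "and", "in", "to", "is", "for"}  # illustrative


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keywords_from_html(html, k=5):
    """Return the k most frequent non-stop-word tokens in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(
        word for word in words if word not in STOP_WORDS and len(word) > 2
    )
    return [word for word, _ in counts.most_common(k)]


def cat_url(category, search, n):
    """Return the top n URLs that the supplied search function yields."""
    return list(search(category))[:n]
```

A page whose text reads "Research and education. Research campus." gives "research" as its top keyword. To process a live URL, the HTML string could first be fetched with urllib.request.urlopen.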