Opinion Mining and Sentiment Analysis
Introduction
What is opinion mining?
Informally: Extract the opinions given in a piece of text.
Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques.
What's the big deal with opinion mining?
Motivating Scenario
- People who want to buy a camera
- Look for comments and reviews
- People who just bought a camera
- Comment on it
- Write down the usage experience
- Camera Manufacturer
- Get feedback from customers
- Improve their products
- Adjust Marketing Strategies
Big business, right?
Web 2.0 provides a great medium for people to share whatever they want to share. This yields a large source of unstructured information (especially opinions) that may be useful, and possibly profitable.
People
Research Issues
Opinion Extraction
Identify the segments of text that contain opinions.
e.g. In the review below, the second sentence carries the opinions:
I have just entered into dslr world with 400d, before I used slr cameras.
400d is extremely well made, precise and overall feeling is very good.
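A minimal sketch of opinion extraction, assuming a tiny hand-made set of opinion words (OPINION_WORDS and both function names are hypothetical, chosen for illustration). It splits a review into sentences and keeps those that contain at least one opinion word. A real system would likely need a proper sentence splitter and a much larger lexicon.

```python
import re

# Hypothetical, hand-picked opinion words; a real system would use a
# sentiment lexicon rather than this short list.
OPINION_WORDS = {"good", "great", "bad", "poor", "precise", "short", "well"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_opinion_sentences(text):
    """Return the sentences that contain at least one opinion word."""
    result = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & OPINION_WORDS:
            result.append(sentence)
    return result

review = ("I have just entered into dslr world with 400d, before I used slr cameras. "
          "400d is extremely well made, precise and overall feeling is very good.")
print(extract_opinion_sentences(review))
```

With this word list, only the second sentence of the review is returned, since the first one states a fact without any opinion word.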
Sentiment Classification / Subjectivity Analysis
Decide the sentiment orientation of a given piece of opinion.
What is Sentiment Orientation?
- Polarity
- Positive (e.g. This camera is great!)
- Negative (e.g. The battery life is too short.)
- Neutral
- Polarity Scale?
- (Most Negative) -10 … -5 … 0 (Neutral) … 5 … 10 (Most Positive)
e.g. The picture quality is good. (A positive opinion)
e.g. The battery life is short. (A negative opinion)
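As a rough illustration of polarity on the -10 to 10 scale described above, the sketch below sums per-word scores and clamps the total. The scores in WORD_SCORES are invented for this example and are not taken from any real lexicon.

```python
import re

# Hypothetical word scores on the -10 .. 10 scale (illustrative values only).
WORD_SCORES = {"great": 8, "good": 5, "short": -4, "bad": -6, "terrible": -9}

def polarity_score(text):
    """Sum the scores of known words and clamp the total to [-10, 10]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = sum(WORD_SCORES.get(tok, 0) for tok in tokens)
    return max(-10, min(10, total))

def polarity_label(text):
    """Map the numeric score to positive, negative or neutral."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sentence in ["This camera is great!",
                 "The battery life is too short.",
                 "I bought it last week."]:
    print(sentence, "->", polarity_score(sentence), polarity_label(sentence))
```

Note that treating "short" as negative only makes sense for features such as battery life; a short start-up time would be positive. This is one reason a domain-specific lexicon may be needed.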
Feature-Opinion Association
A problem proposed by Kam Tong CHAN. The problem is related to natural language processing:
Given a text with target features and opinions extracted, decide which opinions comment on which features.
It is known to be a difficult problem in natural language processing. Let's take a look at the following example (taken from http://en.wikipedia.org/wiki/Natural_language_processing):
Consider the phrase “pretty little girls' school”,
- Does the school look little?
- Do the girls look little?
- Do the girls look pretty?
- Does the school look pretty?
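One naive baseline for feature-opinion association is to attach each opinion word to the closest feature word in the sentence. The sketch below assumes the features and opinions have already been extracted; the function name `associate` and the sample sentence are made up for illustration.

```python
def associate(tokens, features, opinions):
    """Pair each opinion word with the nearest feature word (by token distance)."""
    feature_positions = [i for i, t in enumerate(tokens) if t in features]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in opinions and feature_positions:
            nearest = min(feature_positions, key=lambda j: abs(j - i))
            pairs.append((tok, tokens[nearest]))
    return pairs

tokens = "the lens is sharp but the battery is weak".split()
print(associate(tokens, {"lens", "battery"}, {"sharp", "weak"}))

# The ambiguous phrase from the example, with the apostrophe dropped.
phrase = "pretty little girls school".split()
print(associate(phrase, {"girls", "school"}, {"pretty", "little"}))
```

For the ambiguous phrase, the heuristic simply commits to one reading (both adjectives attach to "girls") and cannot represent the alternatives, which shows why distance alone is unlikely to be enough.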
Advanced Issues
Target Identification
Which object (or who) is being commented on?
e.g. He is a kind person.
Who is “he”?
e.g. The camera is great!
Which camera model are you talking about?
Source Identification
Given a review text, identify who made the comment.
Achieving this will allow us to build a Question-Answering System.
e.g. Who supports Obama as the next U.S. president?
Opinion Summarization and Visualization
Given a set of documents (crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed with respect to the target object.
e.g. For Camera
- Picture Quality (+ve: 290, -ve: 73)
- Ease of use (+ve: 57, -ve: 10)
- etc.
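A feature-level summary like the one above can be produced by counting classified opinions per feature. The sketch below assumes each opinion has already been reduced to a (feature, polarity) pair; the function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def summarize(opinions):
    """Count positive and negative opinions per feature.

    `opinions` is an iterable of (feature, polarity) pairs, where polarity
    is either "+ve" or "-ve".
    """
    summary = defaultdict(Counter)
    for feature, polarity in opinions:
        summary[feature][polarity] += 1
    return summary

def format_summary(summary):
    """Render the counts in the bullet style used in this section."""
    lines = []
    for feature in sorted(summary):
        counts = summary[feature]
        lines.append(f"- {feature} (+ve: {counts['+ve']}, -ve: {counts['-ve']})")
    return "\n".join(lines)

sample = [("Picture Quality", "+ve"), ("Picture Quality", "+ve"),
          ("Picture Quality", "-ve"), ("Ease of use", "+ve")]
print(format_summary(summarize(sample)))
```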
Opinion Spam Detection
Detect whether opinions are written by spammers.
Why is there opinion spam?
- Someone may write something to promote their own image or products
- Someone may write something to hurt their enemies
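One simple signal that may indicate opinion spam is near-duplicate reviews. The sketch below compares reviews by the Jaccard similarity of their word bigrams; the threshold of 0.6 and the sample reviews are assumptions for illustration. Duplication is only one possible cue, not a complete spam detector.

```python
import re
from itertools import combinations

def shingles(text, n=2):
    """Return the set of word n-grams in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(reviews, threshold=0.6):
    """Return index pairs of reviews whose shingle sets overlap heavily."""
    sets = [shingles(r) for r in reviews]
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]

reviews = [
    "Best camera ever, buy it now from our store!",
    "Best camera ever, buy it now from our shop!",
    "The battery life is too short for long trips.",
]
print(near_duplicates(reviews))
```

The first two reviews share 7 of 9 distinct bigrams, so they are flagged as a suspicious pair.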
Others
Linguistic Tools for Opinion Mining
[Domain-Specific] Sentiment lexicon
A lexicon that contains the sentiment orientation of each term. It may be a domain-specific one or a general one.
- Is there a way to generate it automatically from a large corpus?
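One possible direction for automatic generation is to start from a few seed words and score other words by how often they co-occur with positive versus negative seeds. The sketch below is a simplified co-occurrence heuristic under that assumption; the seed sets, the function name and the toy corpus are all hypothetical.

```python
import math
import re
from collections import Counter

POSITIVE_SEEDS = {"good", "great"}   # hypothetical seed words
NEGATIVE_SEEDS = {"bad", "poor"}

def induce_lexicon(sentences):
    """Score each non-seed word by its co-occurrence with each seed set."""
    with_pos, with_neg = Counter(), Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        others = tokens - POSITIVE_SEEDS - NEGATIVE_SEEDS
        if tokens & POSITIVE_SEEDS:
            with_pos.update(others)
        if tokens & NEGATIVE_SEEDS:
            with_neg.update(others)
    vocabulary = set(with_pos) | set(with_neg)
    # Add-one smoothing keeps the ratio defined when a count is zero.
    return {w: math.log2((with_pos[w] + 1) / (with_neg[w] + 1)) for w in vocabulary}

corpus = [
    "The lens is sharp and the pictures are great.",
    "Good camera, sharp images.",
    "The flash is poor and the photos are blurry.",
    "Bad and blurry results.",
]
lexicon = induce_lexicon(corpus)
print(lexicon["sharp"], lexicon["blurry"])
```

On this toy corpus "sharp" receives a positive score and "blurry" a negative one. Function words such as "and" also pick up scores, so a real pipeline would probably filter by part of speech or frequency.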
Ontology
An ontology is a structured description of concepts. It defines the terminology and hierarchical relationships of a domain.
- How can ontologies be incorporated in opinion mining? e.g.:
- Opinion Summarization
- Processing Comparative Statements
- Is there a way to generate them automatically?
- Which ontology elements are essential for opinion mining? In other words, what should the ontology for opinion mining look like?
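As one illustration of how an ontology might support opinion summarization, the sketch below stores a small concept hierarchy as a child-to-parent mapping and rolls opinion counts up to every ancestor. The camera hierarchy and the counts are invented for this example.

```python
from collections import Counter

# Hypothetical camera ontology: each concept maps to its parent concept.
PARENT = {
    "lens": "camera",
    "battery": "camera",
    "battery life": "battery",
    "zoom": "lens",
}

def ancestors(concept):
    """Yield the concept itself and then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)

def roll_up(counts):
    """Propagate per-concept opinion counts to every ancestor concept."""
    totals = Counter()
    for concept, n in counts.items():
        for node in ancestors(concept):
            totals[node] += n
    return totals

positive = {"zoom": 3, "battery life": 2, "lens": 1}
totals = roll_up(positive)
print(totals["camera"], totals["lens"], totals["battery"])
```

With the hierarchy in place, an opinion about "zoom" also counts toward "lens" and "camera", so a summary can be shown at whichever level of detail the user asks for.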
Scalability
- Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion?
Related Software Packages for Opinion Mining
- WordNet, SentiWordNet
- Thesaurus
- Python
- NLTK (Natural Language Toolkit)
- Numpy, Scipy
- Matplotlib
- Text Processing Tools
- Sentence Splitters
- POS (Part-of-speech) Taggers
- Stemmers
- Crawler
Opinion Mining Related Resources
Research Papers
- Sentiment Classification bibliography
http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html
- ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics
Datasets
- Movie Review Data
http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/
- Customer Review Data
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
- MPQA Opinion Corpus
Tools
- SentiWordNet
http://sentiwordnet.isti.cnr.it/
- NLTK - Natural Language Toolkit for Python
- WordNet
Web Resources
- The Sentiment & Affect Yahoo! Group
http://groups.yahoo.com/group/SentimentAI
- GI - General Inquirer
http://www.webuse.umd.edu:9090/
http://www.webuse.umd.edu:9090/tags/
- LDC Catalog
http://www.ldc.upenn.edu/Catalog/
- Opinmind
- Data Mining Resources
Related Conferences
- SIGIR - ACM SIGIR Special Interest Group on Information Retrieval
- CIKM - Conference on Information and Knowledge Management
- IDEAL - International Conference on Intelligent Data Engineering and Automated Learning
- SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- AAAI - Association for the Advancement of Artificial Intelligence
- WWW - International World Wide Web Conferences
- TREC - Text REtrieval Conference
- ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing
http://www.acl-ijcnlp-2009.org/
- WSDM - ACM International Conference on Web Search and Data Mining
- SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing
http://www.cs.jhu.edu/~yarowsky/sigdat.html
- WI - ACM International Conference on Web Intelligence
- SIGWEB