CSCI5510 Big Data Analytics
Breaking News
- September 3, 2013. The course homepage has moved permanently to https://www.cse.cuhk.edu.hk/csci5510/wiki/.
- September 2, 2013. The new semester begins.
- September 2, 2013. News group address: cuhk.cse.csci5510
- September 2, 2013. The first tutorial will be conducted on Sept. 10. There is no tutorial in the first week.
- September 3, 2013. The tutorial classroom is YIA LT7.
2013-14 Term 1
| | Lecture | Tutorial |
|---|---|---|
| Time | M2-4, 9:30 am - 12:30 pm | T3, 10:30 am - 11:15 am |
| Venue | KKB101 | YIA LT7 |
The Golden Rule of CSCI5510: No member of the CSCI5510 community shall take unfair advantage of any other member of the CSCI5510 community.
Course Description
This course teaches state-of-the-art big data analytics, including techniques, software, applications, and perspectives on massive data. The class will cover, but is not limited to, the following topics: distributed file systems such as Google File System, Hadoop Distributed File System, and CloudStore, together with map-reduce technology; similarity search techniques for big data such as minhash and locality-sensitive hashing; specialized processing and algorithms for data streams; big data search and query technology; big graph analysis; and recommendation systems for Web applications. The applications may involve business applications such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services. Scientific and astrophysics applications, such as environmental sensor applications and nebula search and query, are also covered.
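This course aims to teach students state-of-the-art analytics for big data, including techniques, software, applications, and perspectives. The course content will include, but is not limited to, the following: distributed file systems such as Google File System, Hadoop File System, and CloudStore, and Map-reduce technology; similarity search techniques for big data, such as minhash and locality-sensitive hashing; specialized processing methods and algorithms for data streams; search and query technology for big data; and advertising management and recommender systems in Internet applications. The applications covered may include business applications, such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services, as well as applications in science and astrophysics, such as environmental sensor applications and nebula search and query.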
Learning Objectives
- To understand the current key issues on big data and the associated business/scientific data applications
- To teach the fundamental techniques and principles for achieving big data analytics with scalability and streaming capability
- To interpret business models and scientific computing results
- To apply software tools for big data analytics
Learning Outcomes
At the end of the course of studies, students will have acquired the ability to
- Understand the key issues on big data and the associated applications in intelligent business and scientific computing.
- Acquire fundamental enabling techniques and scalable algorithms in big data analytics.
- Interpret business models and scientific computing paradigms, and apply software tools for big data analytics.
- Achieve adequate perspectives of big data analytics in marketing, financial services, health services, social networking, astrophysics exploration, and environmental sensor applications, etc.
Learning Activities
- Lectures
- Tutorials
- Web resources
- Projects
- Presentations
- Lab Reports
- Examinations
Personnel
| | Lecturer | Lecturer | Tutor | Tutor |
|---|---|---|---|---|
| Name | Irwin King | Michael R. Lyu | Guang Ling | Chen Cheng |
| Email | king AT cse.cuhk.edu.hk | lyu AT cse.cuhk.edu.hk | gling AT cse.cuhk.edu.hk | ccheng AT cse.cuhk.edu.hk |
| Office | Rm 908 | Rm 927 | Rm 1024 | Rm 1024 |
| Telephone | 3943 8398 | 3943 8429 | 3943 4252 | 3943 4252 |
| Office Hour(s) | TBA | 10:00-12:00 Tuesday | TBA | TBA |
Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.
Syllabus
The PDF files were created with Acrobat 6.0. Please obtain a compatible version of Acrobat Reader from Adobe.
| Week | Date | Topics | Tutorials | Homework & Events | Resources |
|---|---|---|---|---|---|
| 1 | 2/9 | Introduction and Motivation 01.pptx | No Tutorial | | Ch. 1 of MMDS |
| 2 | 9/9 | MapReduce 02-MapReduce.pdf | | | Ch. 2 of MMDS, Ch. 6 of MMDS |
| 3 | 16/9 | Locality Sensitive Hashing 03-lsh.pdf | | | Ch. 3 of MMDS |
| 4 | 23/9 | Mining Data Streams 04-stream.pdf | | | Ch. 4 of MMDS |
| 5 | 30/9 | Scalable Clustering 05-clustering.pdf | | | Ch. 7 of MMDS |
| 6 | 7/10 | Dimensionality Reduction 06-DR.pdf | | | Ch. 11 of MMDS |
| 7 | 14/10 | Public Holiday | | | |
| 8 | 21/10 | Recommender systems/Matrix Factorization 07-mf.pdf | | | Ch. 9 of MMDS |
| 9 | 28/10 | Massive Link Analysis 08-link.pdf | | | Ch. 5 of MMDS |
| 10 | 4/11 | Mid-term | | | |
| 11 | 11/11 | Analysis of Massive Graph 09-graph.pdf | | | Ch. 10 of MMDS |
| 12 | 18/11 | Large Scale SVM 10-svm.pdf | | | SVM tutorial |
| 13 | 25/11 | Online Learning 11-ol.pdf | | | Online learning survey |
Class Project
Class Project Presentation Schedule
- TBA
Class Project Presentation Requirements
Examination Matters
Examination Schedule
| | Time | Venue | Notes |
|---|---|---|---|
| Midterm Examination | Nov. 4, 9:30 am - 12:00 noon | TBA | TBA |
| Final Examination | TBA | TBA | TBA |
Written Midterm Matters
- The midterm will test your knowledge of the materials.
- Answer all questions in the answer booklet. Extra booklets will be available at the venue if needed.
- Write legibly. Anything we cannot decipher will be considered incorrect.
- You may bring one A4-sized cheat-sheet page.
Grade Assessment Scheme
Homework Assignments | Mid-term Examination | Project |
---|---|---|
20% | 30% | 50% |
- Assignments (20%)
  - Written assignments
  - Coding
- Mid-term Examination (30%)
- Project (50%)
  - Proposal
  - Presentations
  - Report
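As an illustration only, the weights above can be combined into a course total as in the following sketch. It assumes that each component is scored on a 0-100 scale and that the total is a plain weighted sum; the course page does not state either, and the names `WEIGHTS` and `course_total` are hypothetical.

```python
# Weights taken from the Grade Assessment Scheme table.
WEIGHTS = {"homework": 0.20, "midterm": 0.30, "project": 0.50}


def course_total(scores):
    """Return the weighted sum of component scores.

    Assumption (not stated on the course page): every component
    is scored on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 80 on homework, 70 on the mid-term, 90 on the project.
example_total = course_total({"homework": 80, "midterm": 70, "project": 90})
```

Under these assumptions the hypothetical student's total is 16 + 21 + 45 = 82 out of 100, which shows how heavily the project counts relative to the other components.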
Reference Books
FAQ
- Q: What is the departmental guideline on plagiarism?
A: If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying the whole or parts of written assignments, programming exercises, reports, quiz papers, and mid-term examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work was copied unwittingly. Furthermore, including others' works or results without citation in assignments and reports is also regarded as plagiarism and carries a similar penalty for the offender. A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and appropriate disciplinary authorities for further action, in addition to failing the course.