CSCI5510 Big Data Analytics

Breaking News

2013-14 Term 1

| | Lecture | Tutorial |
| --- | --- | --- |
| Time | M2-4, 9:30 am - 12:30 pm | T3, 10:30 am - 11:15 am |
| Venue | KKB101 | YIA LT7 |

The Golden Rule of CSCI5510: No member of the CSCI5510 community shall take unfair advantage of any other member of the CSCI5510 community.

Course Description

This course teaches the state of the art in big data analytics, including techniques, software, applications, and perspectives on massive data. The class will cover, but is not limited to, the following topics: distributed file systems such as the Google File System, the Hadoop Distributed File System, and CloudStore, together with MapReduce technology; similarity search techniques for big data such as minhash and locality-sensitive hashing; specialized processing and algorithms for data streams; big data search and query technology; big graph analysis; and recommendation systems for Web applications. The applications may involve business applications such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services. Scientific and astrophysics applications are also covered, such as environmental sensor applications and nebula search and query.

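This course aims to teach students state-of-the-art analysis of big data, including techniques, software, applications, and perspectives. The course content will include, but is not limited to, the following: distributed file systems such as the Google File System, the Hadoop file system, and CloudStore, together with MapReduce technology; similarity search techniques for big data, such as minhash and locality-sensitive hashing; specialized processing methods and algorithms for data streams; search and query technology for big data; and advertising management and recommender systems in Internet applications. The applications covered may include business applications such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services, as well as applications in science and astrophysics, such as environmental sensor applications and nebula search and query.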

Learning Objectives

  1. To understand the current key issues in big data and the associated business/scientific data applications
  2. To learn the fundamental techniques and principles for achieving big data analytics with scalability and streaming capability
  3. To interpret business models and scientific computing results
  4. To apply software tools for big data analytics

Learning Outcomes

At the end of the course of studies, students will have acquired the ability to

  1. Understand the key issues in big data and the associated applications in intelligent business and scientific computing.
  2. Acquire fundamental enabling techniques and scalable algorithms in big data analytics.
  3. Interpret business models and scientific computing paradigms, and apply software tools for big data analytics.
  4. Gain an adequate perspective on big data analytics in marketing, financial services, health services, social networking, astrophysics exploration, and environmental sensor applications.

Learning Activities

  1. Lectures
  2. Tutorials
  3. Web resources
  4. Projects
  5. Presentations
  6. Lab Reports
  7. Examinations

Personnel

| | Lecturer | Lecturer | Tutor | Tutor |
| --- | --- | --- | --- | --- |
| Name | Irwin King | Michael R. Lyu | Guang Ling | Chen Cheng |
| Email | king AT cse.cuhk.edu.hk | lyu AT cse.cuhk.edu.hk | gling AT cse.cuhk.edu.hk | ccheng AT cse.cuhk.edu.hk |
| Office | Rm 908 | Rm 927 | Rm 1024 | Rm 1024 |
| Telephone | 3943 8398 | 3943 8429 | 3943 4252 | 3943 4252 |
| Office Hour(s) | TBA | 10:00-12:00 Tuesday | TBA | TBA |

Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.

Syllabus

The PDF files were created in Acrobat 6.0. Please obtain the correct version of the Acrobat Reader from Adobe.

| Week | Date | Topics | Tutorials | Homework & Events | Resources |
| --- | --- | --- | --- | --- | --- |
| 1 | 2/9 | Introduction and Motivation (01.pptx) | No Tutorial | | Ch. 1 of MMDS |
| 2 | 9/9 | MapReduce (02-MapReduce.pdf) | | | Ch. 2 of MMDS; Ch. 6 of MMDS |
| 3 | 16/9 | Locality Sensitive Hashing (03-lsh.pdf) | | | Ch. 3 of MMDS |
| 4 | 23/9 | Mining Data Streams (04-stream.pdf) | | | Ch. 4 of MMDS |
| 5 | 30/9 | Scalable Clustering (05-clustering.pdf) | | | Ch. 7 of MMDS |
| 6 | 7/10 | Dimensionality Reduction (06-DR.pdf) | | | Ch. 11 of MMDS |
| 7 | 14/10 | Public Holiday | | | |
| 8 | 21/10 | Recommender systems/Matrix Factorization (07-mf.pdf) | | | Ch. 9 of MMDS |
| 9 | 28/10 | Massive Link Analysis (08-link.pdf) | | | Ch. 5 of MMDS |
| 10 | 4/11 | Mid-term | | | |
| 11 | 11/11 | Analysis of Massive Graph (09-graph.pdf) | | | Ch. 10 of MMDS |
| 12 | 18/11 | Large Scale SVM (10-svm.pdf) | | | SVM tutorial |
| 13 | 25/11 | Online Learning (11-ol.pdf) | | | Online learning survey |

Class Project

Class Project Presentation Schedule

Class Project Presentation Requirements

Examination Matters

Examination Schedule

| | Time | Venue | Notes |
| --- | --- | --- | --- |
| Midterm Examination | Nov. 4, 9:30 am - 12:00 noon | TBA | TBA |
| Final Examination | TBA | TBA | TBA |

Written Midterm Matters

  1. The midterm will test your knowledge of the course materials.
  2. Answer all questions in the answer booklet. More booklets will be available at the venue if needed.
  3. Write legibly. Anything we cannot decipher will be considered incorrect.
  4. You may bring one A4-sized cheat-sheet page.

Grade Assessment Scheme

| Homework Assignments | Mid-term Examination | Project |
| --- | --- | --- |
| 20% | 30% | 50% |
  1. Assignments (20%)
    1. Written assignments
    2. Coding
  2. Mid-term Examination (30%)
  3. Project (50%)
    1. Proposal
    2. Presentations
    3. Report

Reference Books

FAQ

  1. Q: What is the departmental guideline on plagiarism?
    A: If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. Plagiarism includes copying the whole or parts of written assignments, programming exercises, reports, quiz papers, and mid-term examinations. The penalty applies to both the one who copies the work and the one whose work is copied, unless the latter can prove that his/her work was copied without his/her knowledge. Furthermore, including others' work or results without citation in assignments and reports is also regarded as plagiarism and carries a similar penalty. A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and the appropriate disciplinary authorities for further action, in addition to failing the course.

Resources

Big Data Analytics

Graph Mining

Link Analysis

Learning to Rank

Recommender Systems

Human Computation/Social Games

Opinion Mining/Sentiment Analysis

Visualization

Programming