CS-GY 9223 or 6333

Massive Data Analysis

Juliana Freire

Polytechnic School of Engineering

Lecture time and location: Mondays, 1pm-3:25pm at 2 Metrotech Center, room 9.011
Office hours time and location: Mon 10:00am-11:00am @ 2 Metrotech Center, room 10.097
Broadway, room 1005

TA: Aditi Nakta (an1535 at nyu.edu)
Office hours: Tuesday, 1pm-3pm at 2 Metrotech Center, room 10.98D

Course Overview

Big Data requires the storage, organization, and processing of data at a scale and efficiency that go well beyond the capabilities of conventional information technologies. In this course, we will study the state of the art in big data management: we will learn about the algorithms, techniques, and tools needed to support big data processing. In addition, we will examine real applications that require massive data analysis and how they can be implemented on Big Data platforms. The course will consist of lectures based on both textbook material and scientific papers. It will also include programming assignments that give students hands-on experience in building data-intensive applications using existing Big Data platforms, including Amazon AWS. Besides lectures given by the instructor, we will also have guest lectures by experts in some of the topics we will cover.

Syllabus and Estimated Times

The course consists of three main modules where we will tentatively cover the following topics:

The schedule for classes, lecture notes, and required reading will be available at http://www.vistrails.org/index.php/Course:_Massive_Data_Analysis_2014

Principal Texts

The readings for this course will consist of research papers and two recent books that are freely available for download on the Web:

Workload and Requirements

The workload will consist of online quizzes using the Gradiance system, and programming assignments.

For programming assignments, the instructor and graders will run your code and your grade will depend on the correctness of the outputs. Therefore, you must strictly follow the guidelines given for the programming assignments to ensure we will be able to run them.

Programming assignments must be done individually, unless otherwise noted. Students must design and program their own solutions -- copying from other students or any other source is not acceptable.

Students are required to follow these rules about academic honesty: http://www.cs.nyu.edu/web/Academic/Graduate/academic_dishonesty.html

Lateness policy: Late quizzes, assignments, or projects will not be accepted without a note from your physician or from your employer.


The grade for the course will be based on:

Gradiance Quizzes

You will need to access Gradiance for your quizzes at http://www.newgradiance.com/services. Here's a link to a guide on how to use Gradiance: http://www.gradiance.com/pub/stud-guide.html.

Register and use the class token 1AEF5F24. Make sure to use your official NYU email address and ID when you register.

The quizzes appear to be sets of multiple-choice questions, but you should think of each question as if you were asked to work an ordinary, "long-answer" question. Work that question and keep the answer handy on a piece of paper. The multiple-choice question will typically sample your knowledge of the correct answer. You can try the work as many times as you like, and we hope everyone will eventually get 100%. Note that you have to wait 10 minutes between openings, so brute-force random guessing will not work.


If you need to reach the instructor, send email to bigdata.nyu AT gmail DOT com
The class mailing list is: http://www.cs.nyu.edu/mailman/listinfo/csci_ga_2568_001_sp14


We thank Amazon for the AWS in Education Coursework grant, which gives students taking this course access to their cloud infrastructure.
Juliana Freire
Last modified: Tue Oct 28 14:19:23 EDT 2014