CS9223 - Massive Data Analysis - Fall 2013
Instructors: Juliana Freire (juliana AT poly DOT edu) and Jerome Simeon (simeon AT us DOT ibm DOT com)
Class time and location: Monday 12:00pm-2:30pm @ 2MTC 9.007
Office hours time and location: Tuesday 1:00pm-2:00pm @ 2MTC room 10.097
Our class wiki is: http://www.vistrails.org/index.php/Course:_Big_Data_Analysis.
Big Data requires the storage,
organization, and processing of data at a scale and efficiency that go
well beyond the capabilities of conventional information
technologies. In this course, we will review the state of the art in
massive data analysis. In addition to covering the specifics of
different platforms, models, and languages, we will also look at real
applications that perform massive data analysis and how they can be
implemented on Big Data platforms. Topics we will discuss include:
MapReduce/Hadoop, NoSQL stores, languages such as Pig Latin and JAQL,
and large-scale data mining and visualization. The course will consist of
lectures based both on textbook material and scientific papers. It
will also include programming assignments that will provide students
with hands-on experience in building data-intensive applications using
existing Big Data tools and platforms.
Besides lectures given by the instructors, we will also have
guest lectures by experts in statistics, information retrieval and
The readings for this course will consist of research papers and two recent books that are freely available for download on the Web:
Prerequisites: a course in database systems, covering application programming in SQL and other database-related languages such as XQuery; a course on algorithms and data structures; and good programming skills.
The topics we will cover include:
- Map-Reduce and its ecosystem
- Database systems and big data
- Scalable algorithms and systems for mining tasks such as
similarity search, graph analysis, clustering, and frequent itemset mining, as
well as for visualization and information retrieval.
A preliminary schedule for the classes and required reading is available at http://www.vistrails.org/index.php/Course:_Big_Data_Analysis
Assignments handed in on or before the due time will be graded
for full credit. No late assignments will be accepted.
Programming assignments must follow the guidelines given and they
will be graded based on their outputs.
- Quizzes: We will have Gradiance quizzes.
- Programming assignments
The grade for the course will be based on:
- Programming Assignments (50%)
- Quizzes (15%)
- Final Exam (35%)
You will need to access Gradiance for your quizzes at http://www.newgradiance.com/services. Here's a link to a guide on how to use Gradiance: http://www.gradiance.com/pub/stud-guide.html.
Register and use the class token 00B06796
The quizzes appear to be sets of multiple-choice questions. But you
should think of the questions as if you were asked to work an
ordinary, "long-answer" question. Work that question and keep the
answer handy on a piece of paper. The multiple-choice question will
typically sample your knowledge of the correct answer.
You can try the work as many times as you like, and we hope everyone
will eventually get 100%. Also notice that you have to wait 10
minutes between openings, so brute-force random guessing will not work.
If you need to reach the instructors, send email to firstname.lastname@example.org
We thank Amazon for the AWS in Education Coursework grant that allowed the students to use their cloud infrastructure.
Last modified: Fri Oct 25 16:34:28 EDT 2013