STSCI 5065

Course information provided by the Courses of Study 2019-2020.

Concepts, challenges, and industry trends of big data, with a focus on the Hadoop system. Topics include: basics of the Apache Hadoop platform and the Hadoop ecosystem; the Hadoop Distributed File System (HDFS); MapReduce, a parallel programming model for distributed processing of large data sets, or an alternative to it; common big data tools, such as Pig (a procedural data processing language for Hadoop parallel computation), Hive (a declarative SQL-like language for handling Hadoop jobs), HBase (a widely used NoSQL database), and YARN; case studies; and integration of Hadoop with statistical software packages, e.g., SAS and R.

When Offered: Spring.

Permission Note: Enrollment preference given to students in the MPS program in Applied Statistics.
Prerequisites/Corequisites: Prerequisite: knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or at least concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 4520 or STSCI 4030 or basic knowledge of R programming.

Syllabi: none
  •   Regular Academic Session.  Choose one lecture and one laboratory.

  • 3 Credits, Student Option grading

  • 11722 STSCI 5065   LEC 001

    • MW Malott Hall 251
    • Jan 21 - May 5, 2020
    • Yang, X

  • Instruction Mode: Hybrid - Online & In Person
    Prerequisites: Knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or at least concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 3520 or STSCI 4030 or basic knowledge of R programming.

  • 11724 STSCI 5065   LAB 401

    • F Malott Hall 251
    • Jan 21 - May 5, 2020
    • Yang, X

  • Instruction Mode: Hybrid - Online & In Person