Links

Course Materials

https://legacy.gitbook.com/book/juheck/hadoop-and-big-data/details

Course S3 bucket (no public access)

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/

Databricks CE account creation page

https://accounts.cloud.databricks.com/registration.html#signup/community

Spark Demo Notebook

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/demo_spark.dbc

Spark Exercises

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/exercise_spark_rdd.dbc

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/exercise_spark_dataframes.dbc

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/exercise_spark_dataframes2.dbc

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/exercise_spark_dataframes3.dbc

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/Classroom-Setup.dbc

https://s3-us-west-1.amazonaws.com/julienheck/hadoop/7_spark/DBTest-Setup-Stub.dbc

Datasets

MovieLens 100k

  • u.data: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.data

  • u.item: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.item

  • u.user: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.user

  • u.genre: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.genre

  • u.occupation: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.occupation

Los Angeles crime data

  • Crime Data from 2010 to present: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/crime_data_la/Crime_Data_from_2010_to_Present.csv

  • crime-data-la: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/crime_data_la/crime_data_la.csv

  • crime-data-code-name: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/crime_data_la/crime_data_code_name.csv

  • crime-data-area-name: https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/crime_data_la/crime_data_area_name.csv


Last updated 6 years ago