About Course
- What's Spark? If you are an analyst or a data scientist, you're probably used to juggling multiple systems for working with data: SQL, Python, R, Java, and so on. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms, and then use the same system to productionize your code.
- Analytics: Using Spark and Python, you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and DataFrames to manipulate data with ease.
- Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms, like recommendations, in very few lines of code. We'll cover a variety of datasets and algorithms, including PageRank, MapReduce, and graph datasets.
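As a taste of the style of computation the course teaches, here is a minimal sketch of the classic MapReduce word count in plain Python (no Spark installation required). Spark's RDD API expresses the same two phases with `flatMap`/`map` and `reduceByKey`; the function names `map_phase` and `reduce_phase` below are illustrative, not part of any Spark API.

```python
from functools import reduce

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    def combine(counts, pair):
        word, n = pair
        counts[word] = counts.get(word, 0) + n
        return counts
    return reduce(combine, pairs, {})

lines = ["spark makes big data simple", "big data big results"]
counts = reduce_phase(map_phase(lines))
print(counts)  # e.g. "big" appears three times across the two lines
```

The same pipeline in PySpark would be roughly `rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)` — the course walks through that version in detail.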
Curriculum
Section 1: You, This Course and Us
- Lecture 1 (02:15)
Section 2: Introduction to Spark
- Lecture 2 (08:45)
- Lecture 3 (12:23)
- Lecture 4 (09:39)
- Lecture 5 (15:37)
- Lecture 6 (06:42)
- Lecture 7 (04:50)
- Lecture 8 (13:33)
- Lecture 9 (10:13)
Section 3: Resilient Distributed Datasets
- Lecture 10 (12:35)
- Lecture 11 (06:06)
- Lecture 12 (11:08)
- Lecture 13 (16:10)
- Lecture 14 (05:50)
- Lecture 15 (05:23)
- Lecture 16 (15:10)
- Lecture 17 (03:26)
- Lecture 18 (06:25)
Section 4: Advanced RDDs: Pair Resilient Distributed Datasets
- Lecture 19 (14:45)
- Lecture 20 (18:11)
- Lecture 21 (11:53)
- Lecture 22 (04:34)
- Lecture 23 (14:03)
- Lecture 24 (04:58)
Section 5: Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes
- Lecture 25 (13:35)
- Lecture 26: See it in Action: Using an Accumulator variable (02:40)
- Lecture 27 (05:58)
- Lecture 28: See it in Action: Running a Python script with Spark-Submit (03:58)
- Lecture 29 (14:30)
- Lecture 30 (13:44)
- Lecture 31: See it in Action: MapReduce with Spark (02:05)
Section 6: Java and Spark
- Lecture 32 (15:58)
- Lecture 33 (04:49)
- Lecture 34 (03:49)
- Lecture 35 (02:20)
- Lecture 36: See it in Action: Running a Spark Job with Java (05:08)
Section 7: PageRank: Ranking Search Results
- Lecture 37 (16:44)
- Lecture 38 (06:15)
- Lecture 39 (12:01)
- Lecture 40 (07:27)
- Lecture 41: See it in Action: The PageRank algorithm using Spark (03:46)
Section 8: Spark SQL
- Lecture 42 (16:04)
- Lecture 43: See it in Action: Dataframes and Spark SQL (04:49)
Section 9: MLlib in Spark: Build a recommendations engine
- Lecture 44 (12:19)
- Lecture 45 (11:39)
- Lecture 46 (07:51)
- Lecture 47 (16:05)
Section 10: Spark Streaming
- Lecture 48 (09:55)
- Lecture 49 (10:54)
- Lecture 50 (09:26)
- Lecture 51: See it in Action: Spark Streaming (04:17)
Section 11: Graph Libraries
- Lecture 52 (18:01)
What will you get from this course?
Who should buy this course?
- Analysts who want to leverage Spark for analyzing interesting datasets
- Data Scientists who want a single engine for analyzing and modelling data, as well as productionizing it
- Engineers who want to use a distributed computing engine for batch processing, stream processing, or both
- The course assumes knowledge of Python. You can write Python code directly in the PySpark shell. If you already have IPython Notebook installed, we'll show you how to configure it for Spark.
- For the Java section, we assume basic knowledge of Java. An IDE that supports Maven, such as IntelliJ IDEA or Eclipse, would be helpful.
- All examples work with or without Hadoop. If you would like to use Spark with Hadoop, you'll need Hadoop installed (in either pseudo-distributed or cluster mode).
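For reference, the two entry points mentioned above look like this from the command line. This is a sketch that assumes Spark is installed and its `bin/` directory is on your PATH; `my_script.py` is a hypothetical file name used only for illustration.

```shell
# Launch the interactive PySpark shell (a SparkContext is available as `sc`):
pyspark

# Run a standalone Python script on Spark with spark-submit
# (my_script.py is a placeholder for your own script):
spark-submit my_script.py
```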