Hands-On Big Data Analytics with PySpark [2019]

English | April 23rd, 2019 | ISBN: 183864413X | 182 pages | 5.36 MB

Copyright © 2019 Packt Publishing

Authors:

  • Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie.
  • Rudy Lai is the founder of QuantCopy, a sales acceleration start-up using AI to write sales emails to prospective customers.
  • Bartłomiej Potaczek is a software engineer working for Schibsted Tech.

What you will learn
• Get practical big data experience while working on messy datasets
• Analyze patterns with Spark SQL to improve your business intelligence
• Use PySpark’s interactive shell to speed up development time
• Create highly concurrent Spark programs by leveraging immutability
• Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation
• Re-design your jobs to use reduceByKey instead of groupBy
• Create robust processing pipelines by testing Apache Spark jobs
