
Setting up Spark 2.2 on CDH 5

I have not had the chance to install CDH 5.13 yet, but CDH 5.12 ships with a relatively old Spark 1.6.

It is a good idea to upgrade it to the latest Spark 2.2, and Cloudera provides a special package for this purpose. You can download it and view the installation instructions here and here. You can also download the installation file directly from my site. Spark 2.2 can also coexist with the older 1.6 version.


You will need CDH 5.8-5.12 (5.13 is not officially supported at the time of writing this post), JDK 8, Cloudera Manager 5.8.3 or above and Scala 2.11.

See here how to install Scala. If you have a large cluster, you may want to install it using a configuration management tool like Puppet or Chef.
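Before installing the parcel, it is worth confirming that the prerequisites are on the path of each host. The sketch below only checks that the binaries are present, not that the versions match the requirements above; the binary names are the standard ones:

```shell
# Check that the JDK and Scala prerequisites are present on this host.
# Presence-only check; verify versions (JDK 8, Scala 2.11) separately.
for tool in java scala; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: MISSING"
  fi
done
```

On a large cluster, the same loop can run across all hosts via your configuration management tool or a parallel-ssh wrapper.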


Copy the CSD file to /opt/cloudera/csd on the host where the Cloudera Manager server runs.

cd /opt/cloudera/csd
chown cloudera-scm:cloudera-scm SPARK2_ON_YARN-2.2.0.cloudera1.jar
chmod 644 SPARK2_ON_YARN-2.2.0.cloudera1.jar
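A quick way to confirm the mode bits came out right is stat. The sketch below uses a throwaway path so it is safe to run anywhere; run the same stat command against the real jar in /opt/cloudera/csd:

```shell
# Illustrative check of the 644 mode using a temporary file;
# on the CM host, point stat at the real CSD jar instead.
DEMO_DIR=$(mktemp -d)
JAR="$DEMO_DIR/SPARK2_ON_YARN-2.2.0.cloudera1.jar"
touch "$JAR"
chmod 644 "$JAR"
stat -c '%a %U' "$JAR"   # prints mode and owner; expect 644 and cloudera-scm on the CM host
```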

Restart the SCM server:

service cloudera-scm-server restart
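The restart can take a minute or two. A small poll loop against the Cloudera Manager web UI tells you when it is back; the host, port (7180 is the CM default) and retry counts below are assumptions you can adjust via the environment:

```shell
# Poll the Cloudera Manager web UI until it accepts HTTP requests again.
# CM_HOST/CM_PORT default to a standard local setup; raise the attempt
# count and sleep for a slow restart.
HOST=${CM_HOST:-localhost}
PORT=${CM_PORT:-7180}
STATUS=down
for attempt in 1 2 3; do
  if curl -sf -o /dev/null "http://$HOST:$PORT"; then
    STATUS=up
    break
  fi
  sleep 1
done
echo "Cloudera Manager is $STATUS"
```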

Also restart the Cloudera Management Service.

Now, add the parcel repository for Spark 2.2:

Go to Hosts -> Parcels. You will see a new line showing Spark2:


Download, distribute and activate it.

From the cluster page, click the actions button and choose “Add Service”:

Choose Spark2:


On the next page, choose the services that Spark2 depends on. I chose to include Hive in case I want to access Hive tables from Spark later:


The next page handles TLS encryption settings; if you did not set up TLS in your cluster, you can just skip it.

Now assign roles. You should assign gateway roles to all the hosts in the cluster:


After some processing, your new Spark2 service will start. Go back to the cluster page and restart any stale services.

Finally, you can see Spark and Spark2 running at the same time.
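Because the two services install separate client binaries (the Spark2 parcel ships spark2-submit and spark2-shell alongside the old spark-submit), scripts can pick a version explicitly. A minimal sketch, assuming the standard CDH command names:

```shell
# Prefer the Spark 2 launcher when it is on $PATH, otherwise fall
# back to the Spark 1.6 one; prints which binary would be used.
if command -v spark2-submit >/dev/null 2>&1; then
  SPARK_SUBMIT=spark2-submit
else
  SPARK_SUBMIT=spark-submit
fi
echo "Submitting jobs with: $SPARK_SUBMIT"
```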

You may delete Spark 1.6 now if you do not need it (note, however, that running Hive on Spark is only supported with Spark 1.6).
