Monday, September 4, 2017

spark job - Caused by: java.lang.OutOfMemoryError: Java heap space

A user was reading a large JSON file (280 GB) using sqlContext.jsonFile.

val loadJsonFile = sqlContext.jsonFile("/data/filename.txt") 

Error

org.apache.spark.SparkException: Job aborted
Caused by: java.lang.OutOfMemoryError: Java heap space 

Running the application with increased driver memory fixed the issue:

spark-shell --driver-memory 3g
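
With the extra driver memory in place, the file can be read inside that shell. On Spark 1.4 and later, the deprecated jsonFile call can also be replaced with the DataFrameReader API; a minimal sketch, assuming the sqlContext provided by spark-shell and the same path as above:

// Run inside the spark-shell launched with --driver-memory 3g
// read.json replaces the deprecated sqlContext.jsonFile(...) call
val loadJsonFile = sqlContext.read.json("/data/filename.txt")
loadJsonFile.printSchema()   // schema is inferred while the JSON is read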

Thursday, August 31, 2017

java.io.IOException: Incompatible clusterIDs


The DataNode service was not starting up, and the log showed the error below:

WARN org.apache.hadoop.hdfs.server.common.Storage 
java.io.IOException: Incompatible clusterIDs in /data/dn: namenode clusterID = cluster5; datanode clusterID = cluster3 
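
The two IDs in the message are read from the VERSION files under the storage directories, so the mismatch can be confirmed directly on each host. A quick check, using /data/dn from the log above and a hypothetical /data/nn for the NameNode (use whatever dfs.namenode.name.dir actually points to):

grep clusterID /data/dn/current/VERSION   # on the DataNode host
grep clusterID /data/nn/current/VERSION   # on the NameNode host (hypothetical path)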

This issue occurs because the DataNode was previously part of a different cluster and was then added to the present cluster. To resolve the issue:

Back up the existing current directory, just in case we need it, then restart the DataNode (a command-level sketch of these steps follows the list):
1. mv /data/dn/current /data/dn/current_bk
2. Restart the DataNode and check whether it joins the cluster OK.
3. If the DataNode joins the cluster OK, remove the current_bk directory to allow its space to be used by new blocks.
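
A minimal command sketch of the same steps, assuming a Hadoop 2.x install where hadoop-daemon.sh is on the PATH (on a packaged or managed cluster, restart the DataNode through your init system or cluster manager instead):

mv /data/dn/current /data/dn/current_bk   # step 1: back up the old storage state
hadoop-daemon.sh start datanode           # step 2: start the DataNode again
hdfs dfsadmin -report                     # confirm the node registered with the NameNode
rm -rf /data/dn/current_bk                # step 3: only after the node has joined cleanly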