Monday, September 4, 2017

spark job - Caused by: java.lang.OutOfMemoryError: Java heap space

A user was reading a large JSON file (280 GB) using sqlContext.jsonFile.

val loadJsonFile = sqlContext.jsonFile("/data/filename.txt") 

Error

org.apache.spark.SparkException: Job aborted
Caused by: java.lang.OutOfMemoryError: Java heap space 

Running the application with increased driver memory fixed the issue:

spark-shell --driver-memory 3g
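
With the extra driver memory in place, the file can be read inside that shell. On Spark 1.4 and later, the deprecated jsonFile call can also be replaced with the DataFrameReader API; a minimal sketch, assuming the sqlContext provided by spark-shell and the same path as above:

// Run inside the spark-shell launched with --driver-memory 3g
// read.json replaces the deprecated sqlContext.jsonFile(...) call
val loadJsonFile = sqlContext.read.json("/data/filename.txt")
loadJsonFile.printSchema()   // schema is inferred while the JSON is read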

Thursday, August 31, 2017

java.io.IOException: Incompatible clusterIDs


The DataNode service was not starting up, and the log showed the error below:

WARN org.apache.hadoop.hdfs.server.common.Storage 
java.io.IOException: Incompatible clusterIDs in /data/dn: namenode clusterID = cluster5; datanode clusterID = cluster3 
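
The two IDs in the message are read from the VERSION files under the storage directories, so the mismatch can be confirmed directly on each host. A quick check, using /data/dn from the log above and a hypothetical /data/nn for the NameNode (use whatever dfs.namenode.name.dir actually points to):

grep clusterID /data/dn/current/VERSION   # on the DataNode host
grep clusterID /data/nn/current/VERSION   # on the NameNode host (hypothetical path)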

This issue occurs because the DataNode was previously part of a different cluster and was then added to the present cluster. To resolve the issue:

Back up the existing current directory, just in case we need it, then restart the DataNode (a command-level sketch of these steps follows the list):
1. mv /data/dn/current /data/dn/current_bk
2. Restart the DataNode and check whether it joins the cluster OK.
3. If the DataNode joins the cluster OK, remove the current_bk directory to allow its space to be used by new blocks.
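
A minimal command sketch of the same steps, assuming a Hadoop 2.x install where hadoop-daemon.sh is on the PATH (on a packaged or managed cluster, restart the DataNode through your init system or cluster manager instead):

mv /data/dn/current /data/dn/current_bk   # step 1: back up the old storage state
hadoop-daemon.sh start datanode           # step 2: start the DataNode again
hdfs dfsadmin -report                     # confirm the node registered with the NameNode
rm -rf /data/dn/current_bk                # step 3: only after the node has joined cleanly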