Problem Scenario 92 : You have been given a Spark Scala application, which is bundled in a jar named hadoopexam.jar.
Your application class name is com.hadoopexam.MyTask.
When you submit the application, you want it to launch the driver on one of the cluster nodes.
Please complete the following command to submit the application.
spark-submit XXX --master yarn \
YYY $SPARK_HOME/lib/hadoopexam.jar 10
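One way to fill in the blanks, sketched as a Python argv list so the answer can be checked: XXX is the main-class option and YYY selects cluster deploy mode, in which YARN starts the driver inside the ApplicationMaster container on a cluster node rather than on the submitting machine. The helper name `problem92_command` is hypothetical, and the fallback to "." when SPARK_HOME is unset exists only so the sketch runs anywhere.

```python
import os
import shlex


def problem92_command(spark_home):
    """Hypothetical helper: the Problem 92 command with XXX and YYY filled in."""
    return [
        "spark-submit",
        "--class", "com.hadoopexam.MyTask",  # XXX: the application's main class
        "--master", "yarn",
        "--deploy-mode", "cluster",          # YYY: launch the driver on a cluster node
        f"{spark_home}/lib/hadoopexam.jar",
        "10",                                # argument passed to MyTask's main()
    ]


if __name__ == "__main__":
    # "." is only a placeholder fallback so the sketch runs without Spark installed.
    print(shlex.join(problem92_command(os.environ.get("SPARK_HOME", "."))))
```

With `--deploy-mode client` instead, the driver would run in the spark-submit process on the machine you submit from.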
Problem Scenario 95 : You have to run your Spark application on YARN with a maximum heap size of 512 MB for each executor and 1 processor core allocated to each executor. Your main application requires three values as input arguments: V1 V2 V3.
Please replace XXX, YYY, ZZZ.
./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m XXX YYY lib/hadoopexam.jar ZZZ
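A minimal sketch of one plausible completion, again as a Python argv list: `--executor-memory` sets each executor JVM's maximum heap, `--executor-cores` sets its cores, and the three values go after the jar because spark-submit passes everything after the application jar to `main`. The helper name `problem95_command` is hypothetical.

```python
import shlex


def problem95_command(v1, v2, v3):
    """Hypothetical helper: the Problem 95 command with XXX, YYY and ZZZ filled in."""
    return [
        "./bin/spark-submit",
        "--class", "com.hadoopexam.MyTask",
        # "yarn-cluster" is the older Spark 1.x spelling; it was deprecated in
        # favour of --master yarn --deploy-mode cluster.
        "--master", "yarn-cluster",
        "--num-executors", "3",
        "--driver-memory", "512m",
        "--executor-memory", "512m",  # XXX: maximum heap size of each executor
        "--executor-cores", "1",      # YYY: cores allocated to each executor
        "lib/hadoopexam.jar",         # spark-submit options must precede the jar...
        v1, v2, v3,                   # ZZZ: ...and what follows it goes to main(args)
    ]


print(shlex.join(problem95_command("V1", "V2", "V3")))
```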
Problem Scenario 96 : Your Spark application requires the following extra Java options: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Please replace the XXX value correctly.
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf XXX hadoopexam.jar
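A sketch of the completed command, following the pattern in Spark's configuration documentation: XXX is the `spark.executor.extraJavaOptions` property. Because the value contains a space, it must reach spark-submit as a single argument, which is why it needs quotes on a shell command line; building the argv list and printing it with `shlex.join` makes that quoting visible. The helper name `problem96_command` is hypothetical.

```python
import shlex

JAVA_OPTS = "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


def problem96_command():
    """Hypothetical helper: the Problem 96 command with XXX filled in."""
    return [
        "./bin/spark-submit",
        "--name", "My app",
        "--master", "local[4]",
        "--conf", "spark.eventLog.enabled=false",
        # XXX: one --conf argument; the embedded space must not split it in two.
        "--conf", f"spark.executor.extraJavaOptions={JAVA_OPTS}",
        "hadoopexam.jar",
    ]


print(shlex.join(problem96_command()))
```

One hedge on the scenario itself: with `local[4]` the executors run inside the driver JVM, so to see the GC output in a local run you would likely need `--driver-java-options` (or `spark.driver.extraJavaOptions`); the executor property matters once the job runs on a cluster.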
Problem Scenario 76 : You have been given a MySQL database with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
.....
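The copy in step 1 of the activities that follow does not name a tool; assuming Sqoop 1 (a common choice for this kind of exercise), the connection details above translate into an import like the one sketched here as a Python argv list. By default Sqoop writes comma-delimited text files, and a relative `--target-dir` resolves under the user's HDFS home directory. The helper name `sqoop_import_orders` is hypothetical.

```python
import shlex


def sqoop_import_orders():
    """Hypothetical helper: Sqoop 1 import of retail_db.orders into p91_orders."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://quickstart:3306/retail_db",
        "--username", "retail_dba",
        "--password", "cloudera",  # -P would prompt instead of exposing the password
        "--table", "orders",
        "--target-dir", "p91_orders",  # relative: lands under /user/<you>/p91_orders
    ]


print(shlex.join(sqoop_import_orders()))
```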
Please accomplish the following activities.
1. Copy the "retail_db.orders" table to HDFS in the directory p91_orders.
2. Once the data is copied to HDFS, use PySpark to calculate the number of orders for each status.
3. Use all of the following methods to calculate the number of orders for each status. (You need to know all these functions and their behavior for the real exam.)
- countByKey()
- groupByKey()
- reduceByKey()
- aggregateByKey()
- combineByKey()
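The five methods can be compared without a cluster using a small pure-Python model of their semantics; this is a teaching model, not PySpark itself. The equivalent PySpark call for each function is given in its comment, assuming an existing SparkContext `sc`, the comma-delimited layout Sqoop writes by default, and order_status as the fourth field. The sample rows and all helper names are made up for illustration.

```python
from collections import defaultdict
from operator import add

# Made-up rows: order_id,order_date,order_customer_id,order_status
SAMPLE_LINES = [
    "1,2014-01-01 00:00:00.0,101,CLOSED",
    "2,2014-01-01 00:00:00.0,102,COMPLETE",
    "3,2014-01-02 00:00:00.0,103,COMPLETE",
    "4,2014-01-02 00:00:00.0,104,PENDING",
    "5,2014-01-03 00:00:00.0,105,COMPLETE",
    "6,2014-01-03 00:00:00.0,106,CLOSED",
]


def to_pair(line):
    # PySpark: pairs = sc.textFile("p91_orders").map(lambda l: (l.split(",")[3], 1))
    return (line.split(",")[3], 1)


def split_partitions(pairs, n=2):
    # Model an RDD as n round-robin partitions so map-side combining is visible.
    return [pairs[i::n] for i in range(n)]


def count_by_key(pairs):
    # PySpark: pairs.countByKey()  (an action: returns a dict to the driver)
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return dict(counts)


def group_by_key(pairs):
    # PySpark: pairs.groupByKey().mapValues(len)  (every value crosses the shuffle)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: len(values) for key, values in groups.items()}


def combine_by_key(parts, create, merge_value, merge_combiners):
    # PySpark: pairs.combineByKey(lambda v: 1, lambda acc, v: acc + 1, add)
    # Phase 1: build one combiner per key inside each partition.
    local = []
    for part in parts:
        acc = {}
        for key, value in part:
            acc[key] = merge_value(acc[key], value) if key in acc else create(value)
        local.append(acc)
    # Phase 2: merge the per-partition combiners, as happens after the shuffle.
    merged = {}
    for acc in local:
        for key, comb in acc.items():
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged


def reduce_by_key(parts, func):
    # PySpark: pairs.reduceByKey(add)
    return combine_by_key(parts, lambda v: v, func, func)


def aggregate_by_key(parts, zero, seq_func, comb_func):
    # PySpark: pairs.aggregateByKey(0, lambda acc, v: acc + 1, add)
    return combine_by_key(parts, lambda v: seq_func(zero, v), seq_func, comb_func)


pairs = [to_pair(line) for line in SAMPLE_LINES]
parts = split_partitions(pairs)

results = {
    "countByKey": count_by_key(pairs),
    "groupByKey": group_by_key(pairs),
    "reduceByKey": reduce_by_key(parts, add),
    "aggregateByKey": aggregate_by_key(parts, 0, lambda acc, v: acc + 1, add),
    "combineByKey": combine_by_key(parts, lambda v: 1, lambda acc, v: acc + 1, add),
}
for name, counts in results.items():
    print(name, sorted(counts.items()))
```

All five produce the same counts. The model expresses reduceByKey and aggregateByKey through combineByKey, which mirrors how PySpark builds them. In PySpark, countByKey is an action that returns a dict to the driver, while the other four are transformations whose results still need `collect()`.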
Problem Scenario 94 : You have to run your Spark application on YARN with 20 GB of memory for each executor, and the number of executors should be 50. Please replace XXX, YYY, ZZZ.
export HADOOP_CONF_DIR=XXX
# --deploy-mode can be client for client mode
./bin/spark-submit \
--class com.hadoopexam.MyTask \
XXX \
--deploy-mode cluster \
YYY \
ZZZ \
/path/to/hadoopexam.jar \
1000
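A sketch of one plausible completion: XXX selects YARN as the master, YYY sets the per-executor memory and ZZZ the executor count. spark-submit needs HADOOP_CONF_DIR (or YARN_CONF_DIR) pointing at the Hadoop client configuration so it can reach HDFS and the ResourceManager; the path `/etc/hadoop/conf` below is an assumption (the usual location on CDH), and the helper names are hypothetical.

```python
import os
import subprocess

# Assumption: Hadoop client configs live here (typical on CDH); adjust as needed.
HADOOP_CONF_DIR = "/etc/hadoop/conf"


def problem94_command():
    """Hypothetical helper: the Problem 94 command with XXX, YYY and ZZZ filled in."""
    return [
        "./bin/spark-submit",
        "--class", "com.hadoopexam.MyTask",
        "--master", "yarn",            # XXX: run on YARN
        "--deploy-mode", "cluster",    # "client" keeps the driver on the local machine
        "--executor-memory", "20G",    # YYY: memory for each executor
        "--num-executors", "50",       # ZZZ: number of executors to request
        "/path/to/hadoopexam.jar",
        "1000",                        # argument passed to MyTask
    ]


def submit():
    # Not called here: actually submitting requires a Spark install and a YARN cluster.
    env = {**os.environ, "HADOOP_CONF_DIR": HADOOP_CONF_DIR}
    subprocess.run(problem94_command(), env=env, check=True)
```

Passing the environment explicitly to `subprocess.run` plays the same role as the `export HADOOP_CONF_DIR=...` line in the shell version.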