Problem Scenario 71 :
Write a Spark script using Python that reads a file 'Content.txt' (on HDFS) with the following content.
Then split each row into (key, value), where the key is the first word in the line and the value is the entire line.
Filter out the empty lines.
Save these key-value pairs in 'problem86' as a sequence file (on HDFS).
Part 2 : Save as a sequence file where the key is null and the value is the entire line, then read back the stored sequence files.
Content.txt
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
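A minimal sketch of one possible solution, assuming a pyspark shell where sc is already available; the subdirectory names under 'problem86' are assumptions used to keep the two outputs apart:
# Read the file and drop empty lines before splitting
content = sc.textFile("Content.txt")
nonEmpty = content.filter(lambda line: len(line.strip()) > 0)
# Key = first word of the line, value = entire line
pairs = nonEmpty.map(lambda line: (line.split(" ")[0], line))
pairs.saveAsSequenceFile("problem86/key_value")
# Part 2 : null key (None should map to NullWritable), entire line as value
nonEmpty.map(lambda line: (None, line)).saveAsSequenceFile("problem86/null_key")
# Read the stored sequence files back
print(sc.sequenceFile("problem86/key_value").collect())
print(sc.sequenceFile("problem86/null_key").collect())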
Problem Scenario 70 : Write a Spark application using Python that reads a file 'Content.txt' (on HDFS) with the following content. Do the word count and save the results in a directory called 'problem85' (on HDFS).
Content.txt
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
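One way to do it, again assuming a pyspark shell with sc defined:
# Classic word count: split into words, pair each with 1, sum per word
content = sc.textFile("Content.txt")
words = content.flatMap(lambda line: line.split(" "))
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("problem85")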
Problem Scenario 69 : Write a Spark application using Python
that reads a file 'Content.txt' (on HDFS) with the following content.
Filter out any word that is less than 2 characters long and ignore all empty lines.
Once done, store the filtered data in a directory called 'problem84' (on HDFS).
Content.txt
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
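A short sketch under the same pyspark-shell assumption; note that splitting an empty line yields a single empty string, so the length filter also discards empty lines:
content = sc.textFile("Content.txt")
words = content.flatMap(lambda line: line.split(" "))
# Keep only words of at least 2 characters
filtered = words.filter(lambda w: len(w) >= 2)
filtered.saveAsTextFile("problem84")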
Problem Scenario 9 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Import the departments table into a directory.
2. Import the departments table into the same directory again (the directory already exists, so the job should not override it but should append the results).
3. Also make sure the result fields are terminated by '|' and lines are terminated by '\n'.
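A possible solution with Sqoop; the target directory path is an assumption:
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--target-dir /user/cloudera/departments \
--fields-terminated-by '|' \
--lines-terminated-by '\n'
# Second import into the same directory: --append adds new files
# instead of failing because the directory already exists
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--target-dir /user/cloudera/departments \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--append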
Problem Scenario 41 : You have been given the below code snippet.
val au1 = sc.parallelize(List(("a", Array(1,2)), ("b", Array(1,2))))
val au2 = sc.parallelize(List(("a", Array(3)), ("b", Array(2))))
Apply the Spark method which will generate the below output.
Array[(String, Array[Int])] = Array((a,Array(1, 2)), (b,Array(1, 2)), (a,Array(3)), (b,Array(2)))
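The expected output keeps every pair from both RDDs without merging keys, which is what union produces; in the snippet's own Scala:
au1.union(au2).collect()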
Problem Scenario 67 : You have been given the below code snippet.
lines = sc.parallelize(['Its fun to have fun,', 'but you have to know how.'])
r1 = lines.map(lambda x: x.replace(',', ' ').replace('.', ' ').replace('-', ' ').lower())
r2 = r1.flatMap(lambda x: x.split())
r3 = r2.map(lambda x: (x, 1))
operation1
r5 = r4.map(lambda x:(x[1],x[0]))
r6 = r5.sortByKey(ascending=False)
r6.take(20)
Write a correct code snippet for operation1 which will produce the desired output, shown below.
[(2, 'fun'), (2, 'to'), (2, 'have'), (1, 'its'), (1, 'know'), (1, 'how'), (1, 'you'), (1, 'but')]
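operation1 must total the per-word counts from r3 before r5 swaps each pair to (count, word); reduceByKey does exactly that:
r4 = r3.reduceByKey(lambda x, y: x + y)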
Problem Scenario 94 : You have to run your Spark application on YARN with each executor using 20GB of memory and the number of executors set to 50. Please replace XXX, YYY, ZZZ.
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class com.hadoopexam.MyTask \
XXX \
--deploy-mode cluster \ # can be client for client mode
YYY \
ZZZ \
/path/to/hadoopexam.jar \
1000
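A plausible completion; the HADOOP_CONF_DIR path is an assumption that varies by install, while --master yarn, --executor-memory 20G and --num-executors 50 are the standard spark-submit flags for the stated requirements:
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit \
--class com.hadoopexam.MyTask \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/hadoopexam.jar \
1000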