You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this directory?
Files starting with '_' are considered 'hidden', like Unix files starting with '.'.
# characters are allowed in HDFS file names.
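The hidden-file rule above can be sketched in plain Java. This is only an illustration, not Hadoop's own code: the class and method names (HiddenFileFilterSketch, accept, visibleFiles) are hypothetical, and the check simply assumes that names beginning with '_' or '.' are skipped while everything else, including names beginning with '#', is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the hidden-file rule; it has no Hadoop dependency.
// All names here are hypothetical and chosen for illustration only.
class HiddenFileFilterSketch {

    // Assumed rule: a file is skipped if its name starts with '_' or '.'.
    static boolean accept(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    // Returns the names that survive the filter, in their original order.
    static List<String> visibleFiles(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (accept(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> jobdata =
            Arrays.asList("_first.txt", "second.txt", ".third.txt", "#data.txt");
        // prints [second.txt, #data.txt]
        System.out.println(visibleFiles(jobdata));
    }
}
```

Under these assumptions, two of the four files in jobdata (second.txt and #data.txt) would be processed.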
Analyze each scenario below and identify which best describes the behavior of the default partitioner.
The default partitioner computes a hash value for the key and assigns the partition based on this result.
The default Partitioner implementation is called HashPartitioner. It takes the hashCode() of the key object modulo the total number of partitions to determine which partition receives a given (key, value) pair.
In Hadoop, the default partitioner is HashPartitioner, which hashes a record's key to determine which partition (and thus which reducer) the record belongs in. The number of partitions is equal to the number of reduce tasks for the job.
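The hash-then-modulo step described above can be sketched without any Hadoop classes. The class name HashPartitionerSketch is hypothetical, and the example uses String and Integer keys as stand-ins for Hadoop key types such as Text, whose hashCode() values differ. Masking with Integer.MAX_VALUE clears the sign bit so that a negative hash code cannot yield a negative partition number.

```java
// Plain-Java sketch of hash partitioning; it has no Hadoop dependency.
// The class name is hypothetical; keys here are ordinary Java objects.
class HashPartitionerSketch {

    // Clear the sign bit of the hash code, then take it modulo the
    // number of reduce tasks to get a partition in [0, numReduceTasks).
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // Equal keys always land in the same partition.
            System.out.println(key + " -> partition " + getPartition(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key's hash code and the number of reduce tasks, every record with the same key is sent to the same reducer.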
You need to move a file titled "weblogs" into HDFS. When you try to copy the file, you can't. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
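The map step in this question can be sketched in plain Java to show why the intermediate data grows. This is not a real Hadoop Mapper: the class name CharFrequencyMapSketch and its map method are hypothetical, Character and Integer stand in for the Hadoop key and value types, and the sketch assumes every character (including spaces) is emitted with a count of 1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the per-character map step; no Hadoop dependency.
// Names are hypothetical; Character/Integer stand in for Hadoop types.
class CharFrequencyMapSketch {

    // Emits one (character, 1) pair for every character in the line.
    static List<Map.Entry<Character, Integer>> map(String line) {
        List<Map.Entry<Character, Integer>> output = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            output.add(new AbstractMap.SimpleEntry<>(line.charAt(i), 1));
        }
        return output;
    }

    public static void main(String[] args) {
        String line = "hadoop";
        List<Map.Entry<Character, Integer>> pairs = map(line);
        // One input line of 6 characters becomes 6 intermediate records.
        System.out.println(line.length() + " characters -> " + pairs.size() + " pairs");
    }
}
```

Each input character becomes its own key/value record, so the map output holds more data than the input it was derived from, and all of it must be written out and moved to the reducers.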
Which HDFS command copies an HDFS file named foo to the local filesystem as localFoo?