Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?
transactionsDf.select('storeId').dropDuplicates().count()
Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.
transactionsDf.select(count('storeId')).dropDuplicates()
No. transactionsDf.select(count('storeId')) returns a single-row DataFrame holding the number of non-null values in column storeId. dropDuplicates() has no effect on a single row.
transactionsDf.dropDuplicates().agg(count('storeId'))
Incorrect. transactionsDf.dropDuplicates() removes only rows that are duplicated across all columns; it does not deduplicate on column storeId alone. The following agg(count('storeId')) therefore counts the non-null storeId values in the remaining rows, which can still contain repeated values.
transactionsDf.distinct().select('storeId').count()
Wrong. transactionsDf.distinct() keeps rows that are unique across all columns, not rows that are unique with respect to column storeId. Duplicate values can remain in the column, so the count does not represent the number of unique values in that column.
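The difference between deduplicating a single column and deduplicating full rows can be sketched in plain Python. This is only a model of the row semantics, not Spark code: the (transactionId, storeId) rows and all variable names below are hypothetical and chosen for illustration.

```python
# Plain-Python model of the row semantics; this is not Spark code.
# Each tuple is a hypothetical (transactionId, storeId) row.
rows = [
    (1, 25),
    (2, 25),
    (2, 25),  # exact duplicate of the previous row
    (3, 3),
    (4, 3),
]

# Models select('storeId').dropDuplicates().count():
# project to the column first, then deduplicate, then count.
unique_store_ids = len({store_id for _, store_id in rows})

# Models distinct().select('storeId').count():
# deduplicate full rows first, then project and count what is left.
distinct_rows = set(rows)
store_ids_after_distinct = [store_id for _, store_id in distinct_rows]
count_after_distinct = len(store_ids_after_distinct)

print(unique_store_ids)      # 2
print(count_after_distinct)  # 4
```

In this sample only one full-row duplicate exists, so deduplicating rows first still leaves four storeId entries, while the column holds just two unique values.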
transactionsDf.select(distinct('storeId')).count()
False. There is no distinct function in pyspark.sql.functions; distinct() is a DataFrame method.