Google Professional Data Engineer Exam - Topic 2 Question 29 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 29
Topic #: 2

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After training the neural network and evaluating your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

Suggested Answer: D
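
To make the symptom in the question concrete, here is a minimal Python sketch of comparing train and test RMSE. The `diagnose` helper and its `tolerance` threshold are hypothetical rules of thumb invented for illustration, not part of the exam question or of any library. The one firm point is that overfitting shows up as a low train error with a noticeably higher test error, which is the opposite of what the question describes.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def diagnose(train_rmse, test_rmse, tolerance=0.1):
    """Hypothetical rule of thumb: compare train and test error.

    `tolerance` is an assumed relative margin, chosen only for illustration.
    """
    if test_rmse > train_rmse * (1 + tolerance):
        # Low train error, higher test error: the usual overfitting signature.
        return "possible overfitting"
    if train_rmse > test_rmse * (1 + tolerance):
        # The situation in the question: the model does not even fit
        # the training data well, so this is not the overfitting pattern.
        return "train error above test error: not the overfitting pattern"
    return "train and test errors are comparable"


# Illustrative numbers only: train RMSE twice the test RMSE, as in the question.
print(diagnose(train_rmse=2.0, test_rmse=1.0))
```

With train RMSE at twice the test RMSE, the sketch takes the second branch. A model that fits its own training data this poorly is commonly read as underfitting (or as a sign of a data issue), so remedies aimed at overfitting, such as regularization, would likely not address it.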

Contribute your Thoughts:

Joseph
4 months ago
More data might not fix the issue if it's overfitting.
upvoted 0 times
...
Ty
4 months ago
Regularization techniques could really help here!
upvoted 0 times
...
Verlene
4 months ago
Shouldn't increasing test size help? Not sure about that.
upvoted 0 times
...
Viva
4 months ago
Definitely sounds like overfitting to me.
upvoted 0 times
...
Misty
5 months ago
RMSE on train being higher than test is weird!
upvoted 0 times
...
Erick
5 months ago
I’m a bit confused about the train-test split; wouldn’t increasing the test size just make the training data smaller?
upvoted 0 times
...
Nickie
5 months ago
Collecting more data sounds good, but I feel like the model might just be overfitting instead.
upvoted 0 times
...
Ming
5 months ago
I think we practiced a question where increasing model complexity led to worse performance. Maybe regularization is the way to go?
upvoted 0 times
...
Lettie
5 months ago
I remember something about overfitting, but I'm not sure if that's the issue here since the test RMSE is lower.
upvoted 0 times
...
Asuncion
5 months ago
I'm a bit confused on how to approach this. The pairwise testing requirement is throwing me off, and I'm not sure which orthogonal array would be the best fit. I'll need to review my notes on this topic.
upvoted 0 times
...
Adria
5 months ago
This looks like a straightforward calculation problem. I'll need to figure out the total revenue and total costs, then find the difference to get the profit.
upvoted 0 times
...
Ma
5 months ago
Ah, I know this one! WMIC is the Windows Management Instrumentation Command-line tool, and it can definitely be used to inspect process details, including command-line arguments. I'm confident this is the right answer.
upvoted 0 times
...
Ryann
5 months ago
I'm a little confused by the options here. I know smoke detectors detect smoke, but what's VESDA? I'll have to guess on this one.
upvoted 0 times
...