
Professional-Data-Engineer Exam

Top Tips for the Updated Professional-Data-Engineer Test Questions




Master the Professional-Data-Engineer Google Professional Data Engineer Exam content and be ready for exam-day success quickly with this Ucertify Professional-Data-Engineer actual test. We guarantee it! We make it a reality and give you real Professional-Data-Engineer questions in our Google Professional-Data-Engineer braindumps. The latest 100% valid Google Professional-Data-Engineer exam question dumps are on the page below. You can use our Google Professional-Data-Engineer braindumps and pass your exam.

Online Google Professional-Data-Engineer free dumps demo below:

NEW QUESTION 1

Cloud Bigtable is Google's ______ Big Data database service.

  • A. Relational
  • B. mySQL
  • C. NoSQL
  • D. SQL Server

Answer: C

Explanation:
Cloud Bigtable is Google's NoSQL Big Data database service. It is the same database that Google uses for services, such as Search, Analytics, Maps, and Gmail.
It is used for requirements that are low latency and high throughput including Internet of Things (IoT), user analytics, and financial data analysis.
Reference: https://cloud.google.com/bigtable/
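
For illustration, a minimal sketch of writing and reading a row with the Cloud Bigtable Python client, showing its NoSQL, wide-column model; the project, instance, table, and column-family names are hypothetical.

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("iot-readings")

# Write one cell into column family "cf1", keyed by device and timestamp.
row = table.direct_row(b"device#42#2021-01-01T00:00:00")
row.set_cell("cf1", b"temperature", b"21.5")
row.commit()

# Read the row back by key.
result = table.read_row(b"device#42#2021-01-01T00:00:00")
if result is not None:
    print(result.cells["cf1"][b"temperature"][0].value)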

NEW QUESTION 2

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

  • A. Increase the share of the test sample in the train-test split.
  • B. Try to collect more data and increase the size of your dataset.
  • C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
  • D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of vocabularies or n-grams used.

Answer: D

NEW QUESTION 3

You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:
• No interaction by the user on the site for 1 hour
• Has added more than $30 worth of products to the basket
• Has not completed a transaction
You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

  • A. Use a fixed-time window with a duration of 60 minutes.
  • B. Use a sliding time window with a duration of 60 minutes.
  • C. Use a session window with a gap time duration of 60 minutes.
  • D. Use a global window with a time based trigger with a delay of 60 minutes.

Answer: D
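
For reference, the four windowing strategies named in the options correspond roughly to the following Apache Beam Python SDK constructs (the SDK behind Cloud Dataflow). This is a hedged sketch, not a full pipeline: the 60-minute values mirror the question, while the 5-minute sliding period and the processing-time trigger are illustrative assumptions.

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

fixed_60m   = beam.WindowInto(window.FixedWindows(60 * 60))            # option A
sliding_60m = beam.WindowInto(window.SlidingWindows(60 * 60, 5 * 60))  # option B (1 h windows every 5 min)
session_60m = beam.WindowInto(window.Sessions(gap_size=60 * 60))       # option C
global_60m  = beam.WindowInto(                                         # option D
    window.GlobalWindows(),
    trigger=trigger.Repeatedly(trigger.AfterProcessingTime(60 * 60)),
    accumulation_mode=trigger.AccumulationMode.DISCARDING)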

NEW QUESTION 4

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You’ve loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?

  • A. Select random samples from the tables using the RAND() function and compare the samples.
  • B. Select random samples from the tables using the HASH() function and compare the samples.
  • C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting.
  • D. Compare the hashes of each table.
  • E. Create stratified random samples using the OVER() function and compare equivalent samples from each table.

Answer: B

NEW QUESTION 5

Your company needs to upload their historic data to Cloud Storage. The security rules don’t allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing
on-premises applications every day. What should they do?

  • A. Execute gsutil rsync from the on-premises servers.
  • B. Use Cloud Dataflow and write the data to Cloud Storage.
  • C. Write a job template in Cloud Dataproc to perform the data transfer.
  • D. Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.

Answer: B

NEW QUESTION 6

You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

  • A. Use the TABLE_DATE_RANGE function
  • B. Use the WHERE_PARTITIONTIME pseudo column
  • C. Use WHERE date BETWEEN YYYY-MM-DD AND YYYY-MM-DD
  • D. Use SELECT IF(date >= YYYY-MM-DD AND date <= YYYY-MM-DD)

Answer: A

Explanation:
Reference:
https://cloud.google.com/blog/products/gcp/using-bigquery-and-firebase-analytics-to-understandyour-mobile-ap
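
As a hedged illustration of option A, the legacy SQL TABLE_DATE_RANGE function can query the last 30 daily Firebase export tables roughly as follows; the dataset and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT event_name, COUNT(*) AS events
FROM TABLE_DATE_RANGE([my_dataset.app_events_],
                      DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'),
                      CURRENT_TIMESTAMP())
GROUP BY event_name
"""
job = client.query(sql, job_config=bigquery.QueryJobConfig(use_legacy_sql=True))
for row in job.result():
    print(row.event_name, row.events)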

NEW QUESTION 7

You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available.
How should you use this data to train the model?

  • A. Continuously retrain the model on just the new data.
  • B. Continuously retrain the model on a combination of existing data and the new data.
  • C. Train on the existing data while using the new data as your test set.
  • D. Train on the new data while using the existing data as your test set.

Answer: D

NEW QUESTION 8

You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?

  • A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.
  • B. Create encryption keys in Cloud Key Management Service.
  • C. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  • D. Create encryption keys locally.
  • E. Upload your encryption keys to Cloud Key Management Service.
  • F. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  • G. Create encryption keys in Cloud Key Management Service.
  • H. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.

Answer: C
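
As a hedged sketch of creating, rotating, and using keys in Cloud Key Management Service with the Python client: the key ring is assumed to exist already, and the project, location, key names, and rotation schedule are hypothetical.

from google.cloud import kms

client = kms.KeyManagementServiceClient()
key_ring = client.key_ring_path("my-project", "us-central1", "kafka-redis-ring")

key = client.create_crypto_key(
    request={
        "parent": key_ring,
        "crypto_key_id": "disk-encryption-key",
        "crypto_key": {
            "purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
            "rotation_period": {"seconds": 60 * 60 * 24 * 90},  # rotate every 90 days
            "next_rotation_time": {"seconds": 1893456000},      # example epoch timestamp
        },
    }
)

# Envelope encryption: wrap a locally generated data-encryption key with the KMS key.
response = client.encrypt(request={"name": key.name, "plaintext": b"my-data-encryption-key"})
print(len(response.ciphertext))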

NEW QUESTION 9

What Dataflow concept determines when a Window's contents should be output based on certain criteria being met?

  • A. Sessions
  • B. OutputCriteria
  • C. Windows
  • D. Triggers

Answer: D

Explanation:
Triggers control when the elements for a specific key and window are output. As elements arrive, they are put into one or more windows by a Window transform and its associated WindowFn, and then passed to the associated Trigger to determine if the window's contents should be output.
Reference:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/Tri
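
The explanation and reference above describe the Java SDK; as a hedged illustration, the Beam Python SDK attaches a trigger to a windowing transform roughly like this, where the window size, early-firing interval, and allowed lateness are arbitrary example values.

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

windowed = beam.WindowInto(
    window.FixedWindows(5 * 60),                                         # 5-minute windows
    trigger=trigger.AfterWatermark(early=trigger.AfterProcessingTime(30)),
    accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
    allowed_lateness=60)                                                 # accept 60 s of late data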

NEW QUESTION 10

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

  • A. Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.
  • B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.
  • C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.
  • D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.

Answer: D
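
As a hedged sketch of option D, a legacy SQL snapshot (time) decorator can restore a monthly table to an earlier state; the dataset/table names and the 4-hour offset are hypothetical, and decorators only reach back over a limited retention window.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(
    use_legacy_sql=True,
    destination=bigquery.TableReference.from_string("my-project.analytics.sales_2021_03_restored"),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)

# @-14400000 selects the table as it existed 4 hours (14,400,000 ms) ago.
sql = "SELECT * FROM [my-project:analytics.sales_2021_03@-14400000]"
client.query(sql, job_config=job_config).result()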

NEW QUESTION 11

Which of these statements about exporting data from BigQuery is false?

  • A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
  • B. The only supported export destination is Google Cloud Storage.
  • C. Data can only be exported in JSON or Avro format.
  • D. The only compression option available is GZIP.

Answer: C

Explanation:
Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
Reference: https://cloud.google.com/bigquery/docs/exporting-data
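
For illustration, an export (extract) job with the BigQuery Python client looks roughly like this; the table and bucket names are hypothetical, the wildcard lets the export shard past 1 GB, and destination_format can also be NEWLINE_DELIMITED_JSON or AVRO.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP)

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-export-bucket/my_table-*.csv.gz",
    job_config=job_config)
extract_job.result()  # wait for the export to finish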

NEW QUESTION 12

When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser to connect through a ______ proxy.

  • A. HTTPS
  • B. VPN
  • C. SOCKS
  • D. HTTP

Answer: C

Explanation:
When using Cloud Dataproc clusters, configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through an SSH tunnel.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces

NEW QUESTION 13

You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?

  • A. Load the data every 30 minutes into a new partitioned table in BigQuery.
  • B. Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery
  • C. Store the data in Google Cloud Datastore.
  • D. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.
  • E. Store the data in a file in a regional Google Cloud Storage bucket.
  • F. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.

Answer: A
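
As a hedged sketch of option A, each 30-minute refresh could be loaded from Cloud Storage into an ingestion-time-partitioned BigQuery table as below; the file, dataset, and table names are hypothetical, and the scheduling itself would live in Cloud Scheduler, cron, or Composer rather than in this snippet.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(type_=bigquery.TimePartitioningType.DAY))

load_job = client.load_table_from_uri(
    "gs://price-feeds/latest/common_goods.csv",
    "my-project.economics.goods_prices",
    job_config=job_config)
load_job.result()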

NEW QUESTION 14

You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? Choose 2 answers.

  • A. Publisher throughput quota is too small.
  • B. Total outstanding messages exceed the 10-MB maximum.
  • C. Error handling in the subscriber code is not handling run-time errors properly.
  • D. The subscriber code cannot keep up with the messages.
  • E. The subscriber code does not acknowledge the messages that it pulls.

Answer: CD
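
To make the failure mode concrete, here is a hedged sketch of a pull subscriber that only acknowledges a message after the BigQuery insert succeeds and surfaces run-time errors instead of swallowing them; the project, subscription, and table names are hypothetical.

from google.cloud import bigquery, pubsub_v1

bq_client = bigquery.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "events-sub")

def callback(message):
    try:
        errors = bq_client.insert_rows_json(
            "my-project.analytics.events",
            [{"payload": message.data.decode("utf-8")}])
        if errors:
            raise RuntimeError(errors)
        message.ack()   # acknowledge only after a successful insert
    except Exception as exc:
        print(f"insert failed, message will be redelivered: {exc}")
        message.nack()  # let Pub/Sub redeliver rather than silently dropping data

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()  # block the main thread while messages are processed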

NEW QUESTION 15

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

  • A. Subsample your test dataset.
  • B. Subsample your training dataset.
  • C. Increase the number of input features to your model.
  • D. Increase the number of layers in your neural network.

Answer: D

Explanation:
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d

NEW QUESTION 16

Your United States-based company has created an application for assessing and responding to user actions. The primary table’s data volume grows by 250,000 records per second. Many third parties use your application’s APIs to build the functionality into their own frontend applications. Your application’s APIs should comply with the following requirements:
• Single global endpoint
• ANSI SQL support
• Consistent access to the most up-to-date data
What should you do?

  • A. Implement BigQuery with no region selected for storage or processing.
  • B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
  • C. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.
  • D. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.

Answer: B
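
For illustration, a hedged sketch of serving strongly consistent ANSI SQL reads from Cloud Spanner with the Python client; the instance, database, table, and column names are hypothetical.

from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("global-actions").database("user-actions")

with database.snapshot() as snapshot:  # strong (most up-to-date) read by default
    rows = snapshot.execute_sql(
        "SELECT user_id, action, create_time FROM UserActions WHERE user_id = @uid",
        params={"uid": "user-123"},
        param_types={"uid": spanner.param_types.STRING})
    for row in rows:
        print(row)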

NEW QUESTION 17

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

  • A. Create a file on a shared file and have the application servers write all bid events to that file.
  • B. Process the file with Apache Hadoop to identify which user bid first.
  • C. Have each application server write the bid events to Cloud Pub/Sub as they occur.
  • D. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.
  • E. Set up a MySQL database for each application server to write bid events into.
  • F. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.
  • G. Have each application server write the bid events to Google Cloud Pub/Sub as they occur.
  • H. Use a pull subscription to pull the bid events using Google Cloud Dataflow.
  • I. Give the bid for each item to the user in the bid event that is processed first.

Answer: C

NEW QUESTION 18

You’re training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you’ve discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you’d like to engineer a feature that incorporates this physical dependency.
What should you do?

  • A. Provide latitude and longitude as input vectors to your neural net.
  • B. Create a numeric column from a feature cross of latitude and longitude.
  • C. Create a feature cross of latitude and longitude, bucketize it at the minute level, and use L1 regularization during optimization.
  • D. Create a feature cross of latitude and longitude, bucketize it at the minute level, and use L2 regularization during optimization.

Answer: B

Explanation:
Reference https://cloud.google.com/bigquery/docs/gis-data
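
As a hedged sketch of the bucketize-and-cross approach described in the options, using the TensorFlow feature-column API; the bucket boundaries and hash bucket size are illustrative assumptions.

import numpy as np
import tensorflow as tf

lat = tf.feature_column.numeric_column("latitude")
lon = tf.feature_column.numeric_column("longitude")

lat_buckets = tf.feature_column.bucketized_column(lat, boundaries=list(np.linspace(32.0, 42.0, 20)))
lon_buckets = tf.feature_column.bucketized_column(lon, boundaries=list(np.linspace(-124.0, -114.0, 20)))

# Cross the bucketized columns so the model can learn location-specific price effects.
lat_x_lon = tf.feature_column.crossed_column([lat_buckets, lon_buckets], hash_bucket_size=1000)
lat_x_lon_embedding = tf.feature_column.embedding_column(lat_x_lon, dimension=8)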

NEW QUESTION 19

What are two of the characteristics of using online prediction rather than batch prediction?

  • A. It is optimized to handle a high volume of data instances in a job and to run more complex models.
  • B. Predictions are returned in the response message.
  • C. Predictions are written to output files in a Cloud Storage location that you specify.
  • D. It is optimized to minimize the latency of serving predictions.

Answer: BD

Explanation:
Online prediction
Optimized to minimize the latency of serving predictions. Predictions returned in the response message.
Batch prediction
Optimized to handle a high volume of instances in a job and to run more complex models. Predictions written to output files in a Cloud Storage location that you specify.
Reference:
https://cloud.google.com/ml-engine/docs/prediction-overview#online_prediction_versus_batch_prediction
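
For illustration, a hedged sketch of an online prediction request to AI Platform (ML Engine) through the Google API client library; the project, model, and instance fields are hypothetical. The predictions come back in the HTTP response body rather than being written to Cloud Storage.

from googleapiclient import discovery

service = discovery.build("ml", "v1")
name = "projects/my-project/models/housing_model"

response = service.projects().predict(
    name=name,
    body={"instances": [{"sqft": 1480, "bedrooms": 3}]},
).execute()

print(response["predictions"])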

NEW QUESTION 20

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once, and must be ordered within windows of 1 hour. How should you design the solution?

  • A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
  • B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
  • C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
  • D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.

Answer: C

NEW QUESTION 21

Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?

  • A. Create a Stackdriver Monitoring dashboard based on the BigQuery metric query/scanned_bytes
  • B. Create a Stackdriver Monitoring dashboard based on the BigQuery metric slots/allocated_for_project
  • C. Create a log export for each project, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric
  • D. Create an aggregated log export at the organization level, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric

Answer: D
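
As a hedged sketch, the per-project slot metric named in option B can be read through the Cloud Monitoring API as follows; the project ID and the one-hour lookback are hypothetical, and a dashboard built on the custom totalSlotMs metric would be queried the same way once that metric exists.

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-analytics-project"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}})

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "bigquery.googleapis.com/slots/allocated_for_project"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    })
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value)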

NEW QUESTION 22

You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change its data type to the TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

  • A. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type.
  • B. Reload the data.
  • C. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row.
  • D. Reference the column TS instead of the column DT from now on.
  • E. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values.
  • F. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
  • G. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type.
  • H. Reload all data in append mode.
  • I. For each appended row, set the value of IS_NEW to true.
  • J. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
  • K. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values.
  • L. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type.
  • M. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on.
  • N. In the future, new data is loaded into the table NEW_CLICK_STREAM.

Answer: D
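
As a hedged sketch of the rewrite-into-a-new-table approach described in the options, assuming DT stores epoch seconds as a numeric string; the project, dataset, and table names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my-project.web.NEW_CLICK_STREAM"),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)

sql = """
SELECT * EXCEPT (DT),
       TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS TS
FROM `my-project.web.CLICK_STREAM`
"""
client.query(sql, job_config=job_config).result()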

NEW QUESTION 23
......

100% Valid and Newest Version Professional-Data-Engineer Questions & Answers shared by 2passeasy, Get Full Dumps HERE: https://www.2passeasy.com/dumps/Professional-Data-Engineer/ (New 239 Q&As)