
Professional-Data-Engineer Exam

Top Tips Of Most Up-to-date Professional-Data-Engineer Free Exam Questions




Want to know the features of the Actualtests Professional-Data-Engineer practice test? Want to learn more about the Google Professional Data Engineer Exam certification experience? Study 100% guaranteed Google Professional-Data-Engineer answers to up-to-date Professional-Data-Engineer questions at Actualtests. Get an absolute guarantee of passing the Google Professional-Data-Engineer (Google Professional Data Engineer Exam) test on your first attempt.

Free Professional-Data-Engineer Demo Online For Google Certification:

NEW QUESTION 1

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

  • A. Grant the consultant the Viewer role on the project.
  • B. Grant the consultant the Cloud Dataflow Developer role on the project.
  • C. Create a service account and allow the consultant to log on with it.
  • D. Create an anonymized sample of the data for the consultant to work with in a different project.

Answer: C

NEW QUESTION 2

MJTelco is building a custom interface to share data. They have these requirements:
  • They need to do aggregations over their petabyte-scale datasets.
  • They need to scan rows in a specific time range with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?

  • A. Cloud Datastore and Cloud Bigtable
  • B. Cloud Bigtable and Cloud SQL
  • C. BigQuery and Cloud Bigtable
  • D. BigQuery and Cloud Storage

Answer: C
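
The combination in the answer pairs BigQuery (petabyte-scale aggregation) with Cloud Bigtable (millisecond scans over a row-key range). For illustration, a minimal Python sketch of both access patterns; the project, instance, table, and row-key format (a timestamp embedded in the key) are assumptions, not part of the question:

from google.cloud import bigquery, bigtable
from google.cloud.bigtable.row_set import RowSet

# BigQuery: large-scale aggregation over the full dataset.
bq = bigquery.Client(project="my-project")          # hypothetical project ID
agg = bq.query("""
    SELECT tower_id, AVG(latency_ms) AS avg_latency
    FROM `my-project.telemetry.events`
    GROUP BY tower_id
""").result()

# Bigtable: fast scan of a specific time range via the row-key range.
bt = bigtable.Client(project="my-project", admin=False)
table = bt.instance("telemetry-instance").table("events")
row_set = RowSet()
row_set.add_row_range_from_keys(                    # keys assumed to embed the timestamp
    start_key=b"tower42#20240101000000",
    end_key=b"tower42#20240101010000")
for row in table.read_rows(row_set=row_set):
    print(row.row_key)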

NEW QUESTION 3

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

  • A. Set a single global window to capture all the data.
  • B. Set sliding windows to capture all the lagged data.
  • C. Use watermarks and timestamps to capture the lagged data.
  • D. Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.

Answer: B
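
For context on the concepts behind options B and C, here is a minimal Apache Beam (Python) sketch of event-time windowing with a watermark trigger and allowed lateness, which is how Dataflow pipelines typically absorb late or out-of-order records; parse_event_time is a hypothetical helper, and the window and lateness values are illustrative:

import apache_beam as beam
from apache_beam.transforms import trigger, window

def window_events(events, parse_event_time):
    """events: a PCollection of raw records; parse_event_time: hypothetical helper
    returning the event timestamp (epoch seconds) for a record."""
    return (
        events
        | "StampEventTime" >> beam.Map(
            lambda rec: window.TimestampedValue(rec, parse_event_time(rec)))
        | "FixedWindows" >> beam.WindowInto(
            window.FixedWindows(60),                        # 1-minute event-time windows
            trigger=trigger.AfterWatermark(                 # fire when the watermark passes...
                late=trigger.AfterProcessingTime(60)),      # ...then again as late data arrives
            allowed_lateness=5 * 60,                        # keep windows open 5 extra minutes
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
        | "CountPerWindow" >> beam.combiners.Count.Globally().without_defaults())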

NEW QUESTION 4

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

  • A. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
  • B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
  • C. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
  • D. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.

Answer: A

NEW QUESTION 5

You’ve migrated a Hadoop job from an on-premises cluster to Cloud Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200–400 MB each). You see some performance degradation after the migration to Dataproc, so you’d like to optimize for it. Keep in mind that your organization is very cost-sensitive, so you’d like to continue using Dataproc on preemptibles (with only 2 non-preemptible workers) for this workload.
What should you do?

  • A. Increase the size of your Parquet files to ensure they are at least 1 GB each.
  • B. Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.
  • C. Switch from HDDs to SSDs, copy the initial data from GCS to HDFS, run the Spark job, and copy the results back to GCS.
  • D. Switch from HDDs to SSDs, and override the preemptible VMs’ configuration to increase the boot disk size.

Answer: C

NEW QUESTION 6

Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

  • A. Put the data into Google Cloud Storage.
  • B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
  • C. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
  • D. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.

Answer: B

NEW QUESTION 7

You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
  • Executing the transformations on a schedule
  • Enabling non-developer analysts to modify transformations
  • Providing a graphical tool for designing transformations
What should you do?

  • A. Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
  • B. Load each month’s CSV data into BigQuery, write a SQL query to transform the data to a standard schema, and merge the transformed tables together with a SQL query
  • C. Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation; the Python code should be stored in a revision control system and modified as the incoming data’s schema changes
  • D. Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a DataFrame, then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading it into BigQuery
Answer: D

NEW QUESTION 8

What are two methods that can be used to denormalize tables in BigQuery?

  • A. 1) Split table into multiple tables; 2) Use a partitioned table
  • B. 1) Join tables into one table; 2) Use nested repeated fields
  • C. 1) Use a partitioned table; 2) Join tables into one table
  • D. 1) Use nested repeated fields; 2) Use a partitioned table

Answer: B

Explanation:
The conventional method of denormalizing data involves simply writing a fact, along with all its dimensions, into a flat table structure. For example, if you are dealing with sales transactions, you would write each individual fact to a record, along with the accompanying dimensions such as order and customer information.
The other method for denormalizing data takes advantage of BigQuery’s native support for nested and repeated structures in JSON or Avro input data. Expressing records using nested and repeated structures can provide a more natural representation of the underlying data. In the case of the sales order, the outer part of a JSON structure would contain the order and customer information, and the inner part of the structure would contain the individual line items of the order, which would be represented as nested, repeated elements.
Reference: https://cloud.google.com/solutions/bigquery-data-warehouse#denormalizing_data
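
As an illustration of the second method described above, the following Python snippet defines a BigQuery table whose line items are a nested, repeated RECORD field; the project, dataset, and field names are made up:

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("order_id", "STRING"),
    bigquery.SchemaField("customer", "RECORD", fields=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("email", "STRING"),
    ]),
    # Nested, repeated structure: one row per order, many line items inside it.
    bigquery.SchemaField("line_items", "RECORD", mode="REPEATED", fields=[
        bigquery.SchemaField("sku", "STRING"),
        bigquery.SchemaField("quantity", "INTEGER"),
        bigquery.SchemaField("price", "NUMERIC"),
    ]),
]
table = client.create_table(bigquery.Table("my-project.sales.orders", schema=schema))

The nested line items can then be queried in place with UNNEST(line_items), avoiding a join back to a separate table.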

NEW QUESTION 9

When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?

  • A. 500 TB
  • B. 1 GB
  • C. 1 TB
  • D. 500 GB

Answer: C

Explanation:
Cloud Bigtable is not a relational database. It does not support SQL queries, joins, or multi-row transactions. It is not a good solution for less than 1 TB of data.
Reference: https://cloud.google.com/bigtable/docs/overview#title_short_and_other_storage_options

NEW QUESTION 10

You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?

  • A. Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
  • B. Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
  • C. Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.
  • D. Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.

Answer: A
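
As a sketch of the final step of option A (the Avro export and gsutil upload are assumed to have already happened), the files can be loaded from GCS with the BigQuery Python client instead of the web UI; the bucket and table names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")             # hypothetical project
job = client.load_table_from_uri(
    "gs://my-secure-bucket/patients/*.avro",               # files uploaded with gsutil
    "my-project.clinical.patient_records",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()   # wait for the load to finish
print(client.get_table("my-project.clinical.patient_records").num_rows, "rows loaded")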

NEW QUESTION 11

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30–90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?

  • A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE type.
  • B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
  • C. Modify your pipeline to maintain the last 30–90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
  • D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.

Answer: C
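
For reference, the partitioning approach described in option A can be expressed as a single DDL statement run through the BigQuery Python client; table and column names are illustrative:

from google.cloud import bigquery

client = bigquery.Client()
# Re-create the table partitioned on the date of the timestamp column.
client.query("""
    CREATE TABLE `my-project.analytics.events_partitioned`
    PARTITION BY DATE(event_timestamp)
    AS SELECT * FROM `my-project.analytics.events`
""").result()

Queries that filter on event_timestamp then scan only the matching daily partitions rather than the full three-year table.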

NEW QUESTION 12

You’re using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You’ve recently identified an additional use case and need to run an hourly analytical job to calculate certain statistics across the whole database. You need to ensure the reliability of your production application as well as of the analytical workload.
What should you do?

  • A. Export a Bigtable dump to GCS and run your analytical job on top of the exported files.
  • B. Add a second cluster to the existing instance with multi-cluster routing; use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.
  • C. Add a second cluster to the existing instance with single-cluster routing; use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.
  • D. Increase the size of your existing cluster twofold and execute your analytics workload on the newly resized cluster.

Answer: B
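
A hedged sketch of the second-cluster-plus-app-profiles setup using the Cloud Bigtable Python admin client; the instance, cluster, zone, and profile IDs are assumptions, and the routing choices shown (multi-cluster for live traffic, single-cluster pinned to the new cluster for batch analytics) are one way to isolate the two workloads:

from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("prod-instance")

# Add a second cluster in another zone to absorb the analytical load.
cluster2 = instance.cluster("prod-cluster-2", location_id="us-central1-f",
                            serve_nodes=3,
                            default_storage_type=enums.StorageType.SSD)
cluster2.create()

# Separate app profiles so each workload can be routed and monitored independently.
live = instance.app_profile("live-traffic",
                            routing_policy_type=enums.RoutingPolicyType.ANY)   # multi-cluster
batch = instance.app_profile("batch-analytics",
                             routing_policy_type=enums.RoutingPolicyType.SINGLE,
                             cluster_id="prod-cluster-2",
                             allow_transactional_writes=False)
live.create()
batch.create()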

NEW QUESTION 13

Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is imported successfully, but the imported data does not match the source file byte-for-byte. What is the most likely cause of this problem?

  • A. The CSV data loaded in BigQuery is not flagged as CSV.
  • B. The CSV data has invalid rows that were skipped on import.
  • C. The CSV data loaded in BigQuery is not using BigQuery’s default encoding.
  • D. The CSV data has not gone through an ETL phase before loading into BigQuery.

Answer: B
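
For illustration, the BigQuery load settings that the options hinge on can be set explicitly with the Python client; a minimal sketch with hypothetical bucket and table names (BigQuery assumes UTF-8 encoding unless told otherwise, and max_bad_records controls whether invalid rows are silently skipped):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    encoding="ISO-8859-1",   # override the default UTF-8 if the source uses another encoding
    max_bad_records=0,       # fail the load instead of silently dropping bad rows
)
job = client.load_table_from_uri(
    "gs://my-bucket/exports/data.csv",
    "my-project.staging.import_table",
    job_config=job_config)
job.result()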

NEW QUESTION 14

Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

  • A. Use Google Stackdriver Audit Logs to review data access.
  • B. Get the Identity and Access Management (IAM) policy of each table.
  • C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
  • D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Answer: C
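
A minimal sketch of reviewing BigQuery data-access audit log entries with the Cloud Logging Python client, as described in option A; the project ID is hypothetical and the filter shown is one common form for these logs:

from google.cloud import logging

client = logging.Client(project="my-project")      # hypothetical project ID
# BigQuery data-access audit entries record who ran which query against which table.
log_filter = (
    'resource.type="bigquery_resource" AND '
    'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access"'
)
for entry in client.list_entries(filter_=log_filter, page_size=50):
    # The payload carries the audit record, including the caller's principal email
    # and the query job details.
    print(entry.timestamp, entry.payload)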

NEW QUESTION 15

Cloud Dataproc charges you only for what you really use, with _____ billing.

  • A. month-by-month
  • B. minute-by-minute
  • C. week-by-week
  • D. hour-by-hour

Answer: B

Explanation:
One of the advantages of Cloud Dataproc is its low cost. Dataproc charges for what you really use with minute-by-minute billing and a low, ten-minute-minimum billing period.
Reference: https://cloud.google.com/dataproc/docs/concepts/overview

NEW QUESTION 16

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

  • A. Use federated data sources, and check data in the SQL query.
  • B. Enable BigQuery monitoring in Google Stackdriver and create an alert.
  • C. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.
  • D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.

Answer: D
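
A minimal Apache Beam (Python) sketch of the dead-letter pattern in option D: rows that fail parsing are tagged and written to a separate BigQuery table for analysis. File paths, table names, and schemas are illustrative:

import csv
import apache_beam as beam
from apache_beam.pvalue import TaggedOutput

class ParseCsvRow(beam.DoFn):
    def process(self, line):
        try:
            name, value = next(csv.reader([line]))
            yield {"name": name, "value": int(value)}
        except Exception as err:
            # Corrupted or mis-formatted rows go to the dead-letter output.
            yield TaggedOutput("dead_letter", {"raw_line": line, "error": str(err)})

with beam.Pipeline() as p:
    parsed = (p
              | beam.io.ReadFromText("gs://my-bucket/daily/*.csv")
              | beam.ParDo(ParseCsvRow()).with_outputs("dead_letter", main="valid"))
    parsed.valid | "LoadGood" >> beam.io.WriteToBigQuery(
        "my-project:imports.events", schema="name:STRING,value:INTEGER")
    parsed.dead_letter | "LoadBad" >> beam.io.WriteToBigQuery(
        "my-project:imports.events_dead_letter", schema="raw_line:STRING,error:STRING")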

NEW QUESTION 17

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channels log data. How should you set up the log data transfer into Google Cloud?

  • A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
  • B. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.
  • C. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
  • D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.

Answer: B

NEW QUESTION 18

Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

  • A. Weights
  • B. Biases
  • C. Continuous features
  • D. Input values

Answer: AB

Explanation:
A neural network is a simple mechanism that’s implemented with basic math. The only difference between the traditional programming model and a neural network is that you let the computer determine the parameters (weights and bias) by learning from training datasets.
Reference:
https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground
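
As a toy illustration of "the computer determines the weights and bias", here is a small NumPy sketch of gradient descent on a single linear neuron with synthetic data; note that the input values and targets stay fixed while only the weights and bias are adjusted:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # input values: fixed by the training data
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0        # synthetic targets

w = np.zeros(3)                                 # weights: learned
b = 0.0                                         # bias: learned
lr = 0.1
for _ in range(200):
    pred = X @ w + b
    err = pred - y
    w -= lr * (X.T @ err) / len(y)              # gradient of mean squared error w.r.t. weights
    b -= lr * err.mean()                        # gradient w.r.t. bias
print(w, b)                                     # approaches [2, -1, 0.5] and 3.0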

NEW QUESTION 19

All Google Cloud Bigtable client requests go through a front-end server _____ they are sent to a Cloud Bigtable node.

  • A. before
  • B. after
  • C. only if
  • D. once

Answer: A

Explanation:
In a Cloud Bigtable architecture all client requests go through a front-end server before they are sent to a Cloud Bigtable node.
The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.
When additional nodes are added to a cluster, you can increase the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.
Reference: https://cloud.google.com/bigtable/docs/overview

NEW QUESTION 20

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

  • A. Include ORDER BY DESC on the timestamp column and LIMIT to 1.
  • B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
  • C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
  • D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

Answer: D
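
A sketch of the query shape in option D, keeping only the latest row per unique ID; table and column names are illustrative, and the query is run here through the BigQuery Python client:

from google.cloud import bigquery

client = bigquery.Client()
dedup_sql = """
    SELECT * EXCEPT(row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY unique_id
                           ORDER BY event_timestamp DESC) AS row_num
      FROM `my-project.streaming.events`
    )
    WHERE row_num = 1
"""
for row in client.query(dedup_sql).result():
    print(dict(row))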

NEW QUESTION 21

Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are:
  • The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured
  • Support for publish/subscribe semantics on hundreds of topics
  • Retain per-key ordering
Which system should you choose?

  • A. Apache Kafka
  • B. Cloud Storage
  • C. Cloud Pub/Sub
  • D. Firebase Cloud Messaging

Answer: A
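
The first requirement (seeking back to an arbitrary offset, even the start of the topic) is what sets Apache Kafka apart here. A hedged sketch with the kafka-python client; the broker address, topic, and group ID are assumptions:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="broker:9092",
                         enable_auto_commit=False,
                         group_id="replay-consumer")
partition = TopicPartition("device-events", 0)
consumer.assign([partition])

consumer.seek_to_beginning(partition)     # replay all data ever captured...
# consumer.seek(partition, 123456)        # ...or jump to a specific offset
for message in consumer:
    print(message.offset, message.key, message.value)

Per-key ordering follows from Kafka's per-partition ordering when messages with the same key are routed to the same partition.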

NEW QUESTION 22

You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts.
What should you do?

  • A. Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.
  • B. Place the MariaDB instances in an Instance Group with a Health Check.
  • C. Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.
  • D. Install the StackDriver Agent and configure the MySQL plugin.

Answer: C

NEW QUESTION 23
......

100% Valid and Newest Version Professional-Data-Engineer Questions & Answers shared by Downloadfreepdf.net, Get Full Dumps HERE: https://www.downloadfreepdf.net/Professional-Data-Engineer-pdf-download.html (New 239 Q&As)