
Professional-Data-Engineer Exam

The Secret Of Google Professional-Data-Engineer Test




We provide simulation Google Professional-Data-Engineer material, which is the best for clearing the Professional-Data-Engineer test and getting certified in the Google Professional Data Engineer Exam. The Professional-Data-Engineer Questions & Answers cover all the knowledge points of the real Professional-Data-Engineer exam. Crack your Google Professional-Data-Engineer exam with the latest dumps, guaranteed!

Online Professional-Data-Engineer free questions and answers of New Version:

NEW QUESTION 1

What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

  • A. Include multiple time series values within the row key
  • B. Keep the row key as an 8-bit integer
  • C. Keep your row key reasonably short
  • D. Keep your row key as long as the field permits

Answer: C

Explanation:
A general guide is to keep your row keys reasonably short. Long row keys take up additional memory and storage and increase the time it takes to get responses from the Cloud Bigtable server.
Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys

NEW QUESTION 2

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

  • A. PCollection
  • B. Transform
  • C. Pipeline
  • D. Sink API

Answer: B

Explanation:
In Google Cloud, the Dataflow SDK provides a transform component. It is responsible for the data processing operation. You can use conditionals, for loops, and other complex programming structures to create a branching pipeline.
Reference: https://cloud.google.com/dataflow/model/programming-model
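
To make the idea concrete, here is a minimal Apache Beam (Dataflow SDK) sketch in Python; the element values and branch conditions are made up for illustration. Map and Filter are transforms that carry the processing logic, and applying two transforms to the same PCollection creates a branching pipeline.

import apache_beam as beam

with beam.Pipeline() as pipeline:
    numbers = pipeline | "Create" >> beam.Create([1, 2, 3, 4, 5])
    # Two transforms applied to the same PCollection produce two branches.
    evens = numbers | "KeepEvens" >> beam.Filter(lambda n: n % 2 == 0)
    doubled = numbers | "Double" >> beam.Map(lambda n: n * 2)
    evens | "PrintEvens" >> beam.Map(print)
    doubled | "PrintDoubled" >> beam.Map(print)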

NEW QUESTION 3

You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in Cloud Bigtable where the datetime of the stock trade is the beginning of the row key. Your application has thousands of concurrent users, and you notice that performance is starting to degrade as more stocks are added. What should you do to improve the performance of your application?

  • A. Change the row key syntax in your Cloud Bigtable table to begin with the stock symbol.
  • B. Change the row key syntax in your Cloud Bigtable table to begin with a random number per second.
  • C. Change the data pipeline to use BigQuery for storing stock trades, and update your application.
  • D. Use Cloud Dataflow to write summary of each day’s stock trades to an Avro file on Cloud Storage.Update your application to read from Cloud Storage and Cloud Bigtable to compute the responses.

Answer: A
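
A hedged sketch of the fix using the google-cloud-bigtable Python client; the project, instance, table, column family, and values are placeholders. Leading the row key with the stock symbol spreads writes across tablets instead of concentrating each second's trades on one node, while the datetime suffix still supports time-range scans per symbol.

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("trading-instance").table("stock-trades")

# Row key: symbol first, then timestamp, so writes fan out across the keyspace.
row_key = b"GOOG#2024-01-02T14:30:05.123"
row = table.direct_row(row_key)
row.set_cell("trade", b"price", b"141.80")
row.set_cell("trade", b"volume", b"250")
row.commit()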

NEW QUESTION 4

You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)

  • A. Get more training examples
  • B. Reduce the number of training examples
  • C. Use a smaller set of features
  • D. Use a larger set of features
  • E. Increase the regularization parameters
  • F. Decrease the regularization parameters

Answer: ACE
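
Overfitting is reduced by giving the model more training examples, fewer features, or stronger regularization; shrinking the training set or weakening regularization makes it worse. As an illustration of the regularization lever, a minimal Keras sketch (layer sizes and the penalty value are arbitrary assumptions, and a binary classifier is assumed):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A larger L2 regularization parameter penalizes the big weights that let the
# model memorize the training set.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])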

NEW QUESTION 5

An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

  • A. Create and share an authorized view that provides the aggregate results.
  • B. Create and share a new dataset and view that provides the aggregate results.
  • C. Create and share a new dataset and table that contains the aggregate results.
  • D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.

Answer: A

Explanation:
Reference: https://cloud.google.com/bigquery/docs/access-control
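
A hedged sketch of the authorized-view pattern with the google-cloud-bigquery Python client; the project, dataset, table, and column names are placeholders and both datasets are assumed to exist. The view lives in a separate, shareable dataset, and the source dataset then grants that view read access, so consumers never see the user-level rows, no data is duplicated, and their queries are billed to their own projects.

from google.cloud import bigquery

client = bigquery.Client(project="source-project")

# 1. Create a view over the private table that exposes only aggregates.
view = bigquery.Table("source-project.shared_views.user_aggregates")
view.view_query = """
    SELECT country, COUNT(*) AS user_count
    FROM `source-project.private_data.users`
    GROUP BY country
"""
view = client.create_table(view)

# 2. Authorize the view to read the private dataset.
private = client.get_dataset("source-project.private_data")
entries = list(private.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
private.access_entries = entries
client.update_dataset(private, ["access_entries"])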

NEW QUESTION 6

You are designing a cloud-native historical data processing system to meet the following conditions:
  • The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.
  • A streaming data pipeline stores new data daily.
  • Performance is not a factor in the solution.
  • The solution design should maximize availability.
How should you design data storage for this solution?

  • A. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
  • B. Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.
  • C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.
  • D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

Answer: D
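
For reference, a minimal google-cloud-storage sketch that creates a multi-regional bucket (the bucket name and the US multi-region are placeholder choices); Cloud Dataproc, BigQuery, and Compute Engine can then read the same gs:// objects directly.

from google.cloud import storage

client = storage.Client(project="my-project")
# "US" is a multi-region location, which maximizes availability of the data.
bucket = client.create_bucket("historical-analysis-data", location="US")
print(f"Created {bucket.name} in {bucket.location}")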

NEW QUESTION 7

If you're running a performance test that depends upon Cloud Bigtable, all of the choices below except one are recommended steps. Which is NOT a recommended step to follow?

  • A. Do not use a production instance.
  • B. Run your test for at least 10 minutes.
  • C. Before you test, run a heavy pre-test for several minutes.
  • D. Use at least 300 GB of data.

Answer: A

Explanation:
If you're running a performance test that depends upon Cloud Bigtable, be sure to follow these steps as you plan and execute your test:
Use a production instance. A development instance will not give you an accurate sense of how a production instance performs under load.
Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use 100 GB of data per node.
Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.
Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
Reference: https://cloud.google.com/bigtable/docs/performance

NEW QUESTION 8

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?

  • A. cron
  • B. Cloud Composer
  • C. Cloud Scheduler
  • D. Workflow Templates on Cloud Dataproc

Answer: B
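
A hedged Cloud Composer (Airflow) sketch of how such cross-service dependencies could be expressed; the operator names come from the Google provider package, but the project, region, cluster, bucket, and template paths are placeholders and the job payloads are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

with DAG("daily_data_pipeline", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    prepare = DataprocSubmitJobOperator(
        task_id="dataproc_prepare",
        project_id="my-project",
        region="us-central1",
        job={"placement": {"cluster_name": "etl-cluster"},
             "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/prepare.py"}},
    )
    load = DataflowTemplatedJobStartOperator(
        task_id="dataflow_load",
        project_id="my-project",
        location="us-central1",
        template="gs://my-bucket/templates/load_to_bq",
    )
    prepare >> load  # the Dataflow job starts only after the Dataproc job succeeds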

NEW QUESTION 9

Your company is using WILDCARD tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:
# Syntax error : Expected end of statement but got “-“ at [4:11]
SELECT age
FROM
  bigquery-public-data.noaa_gsod.gsod
WHERE
  age != 99
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  age DESC
Which table name will make the SQL statement work correctly?

  • A. ‘bigquery-public-data.noaa_gsod.gsod‘
  • B. bigquery-public-data.noaa_gsod.gsod*
  • C. ‘bigquery-public-data.noaa_gsod.gsod’*
  • D. `bigquery-public-data.noaa_gsod.gsod*`

Answer: D
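
For completeness, the corrected query as it could be run through the google-cloud-bigquery Python client; the backticks around the wildcard table name are what resolve the original syntax error.

from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT age
    FROM `bigquery-public-data.noaa_gsod.gsod*`
    WHERE age != 99
      AND _TABLE_SUFFIX = '1929'
    ORDER BY age DESC
"""
for row in client.query(query).result():
    print(row.age)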

NEW QUESTION 10

You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?

  • A. Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory
  • B. Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
  • C. Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up
  • D. Allocate additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage

Answer: B

NEW QUESTION 11

You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time. What should you do?

  • A. Send the data to Google Cloud Datastore and then export to BigQuery.
  • B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
  • C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
  • D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.

Answer: B
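
A hedged Apache Beam sketch of that Pub/Sub-to-BigQuery streaming path; the topic, table, and schema names are placeholders, and the real pipeline would be submitted with the Dataflow runner.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/device-temperatures")
     | "ParseJson" >> beam.Map(json.loads)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           "my-project:iot.temperatures",
           schema="device_id:STRING,temp_c:FLOAT,event_time:TIMESTAMP",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))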

NEW QUESTION 12

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity ‘Movie’ the property ‘actors’ and the property ‘tags’ have multiple values but the property ‘date released’ does not. A typical query would ask for all movies with actor=<actorname> ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?
(Exhibit: the four candidate index configurations, Options A through D, are provided as an image and are not reproduced here.)

  • A. Option A
  • B. Option B
  • C. Option C
  • D. Option D

Answer: A

NEW QUESTION 13

Which of these operations can you perform from the BigQuery Web UI?

  • A. Upload a file in SQL format.
  • B. Load data with nested and repeated fields.
  • C. Upload a 20 MB file.
  • D. Upload multiple files using a wildcard.

Answer: B

Explanation:
You can load data with nested and repeated fields using the Web UI. You cannot use the Web UI to:
- Upload a file greater than 10 MB in size
- Upload multiple files at the same time
- Upload a file in SQL format
All three of the above operations can be performed using the "bq" command.
Reference: https://cloud.google.com/bigquery/loading-data
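
As an illustration (not part of the original explanation), loading nested and repeated fields is also straightforward with the Python client, which, like the bq command-line tool, can load local files larger than the Web UI's 10 MB limit; the file and table names below are placeholders.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # nested and repeated fields are inferred from the JSON records
)
with open("movies.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file, "my-project.media.movies", job_config=job_config)
job.result()  # wait for the load job to finish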

NEW QUESTION 14

Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits the training data well. However, when tested against new data, it performs poorly. What method can you employ to address this?

  • A. Threading
  • B. Serialization
  • C. Dropout Methods
  • D. Dimensionality Reduction

Answer: C

Explanation:
Reference: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505
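
A minimal Keras sketch of dropout, assuming a binary classifier; the layer sizes and dropout rates are arbitrary. Randomly zeroing a fraction of activations during training keeps the network from relying on memorized co-adaptations, which narrows the gap between training and test performance.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # drop half the activations at training time only
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])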

NEW QUESTION 15

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

  • A. Add capacity (memory and disk space) to the database server by the order of 200.
  • B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
  • C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
  • D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.

Answer: C

NEW QUESTION 16

You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

  • A. Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
  • B. Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
  • C. Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
  • D. Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://

Answer: A

NEW QUESTION 17

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?

  • A. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.
  • B. Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.
  • C. Use the NOW() function in BigQuery to record the event’s time.
  • D. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Answer: B
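
A hedged sketch of what the publisher-side change could look like with the google-cloud-pubsub Python client; the topic, payload fields, and attribute names are placeholders. Attaching the event timestamp and package ID on the device means later analysis reflects when the event happened, not when Pub/Sub or the subscriber received it.

import json
import time
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "package-tracking")

payload = json.dumps({"package_id": "PKG-000123", "lat": 37.42, "lng": -122.08})
future = publisher.publish(
    topic_path,
    payload.encode("utf-8"),
    package_id="PKG-000123",
    event_timestamp=str(int(time.time())),  # set on the device, before publishing
)
print(future.result())  # message ID once Pub/Sub acknowledges the publish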

NEW QUESTION 18

What is the HBase Shell for Cloud Bigtable?

  • A. The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.
  • B. The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.
  • C. The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.
  • D. The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.

Answer: B

Explanation:
The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables. The Cloud Bigtable HBase client for Java makes it possible to use the HBase shell to connect to Cloud Bigtable.
Reference: https://cloud.google.com/bigtable/docs/installing-hbase-shell

NEW QUESTION 19

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud. You want to support transactions that scale horizontally. You also want to optimize data for range queries on nonkey columns. What should you do?

  • A. Use Cloud SQL for storage. Add secondary indexes to support query patterns.
  • B. Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.
  • C. Use Cloud Spanner for storage. Add secondary indexes to support query patterns.
  • D. Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.

Answer: C

Explanation:
Reference: https://cloud.google.com/solutions/data-lifecycle-cloud-platform
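
A hedged sketch of adding such a secondary index with the google-cloud-spanner Python client; the instance, database, table, and column names are placeholders.

from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("trades-instance").database("trades-db")

# A secondary index on a non-key column supports efficient range queries on it.
operation = database.update_ddl([
    "CREATE INDEX TradesByPrice ON Trades(Price)"
])
operation.result()  # DDL changes run as a long-running operation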

NEW QUESTION 20

Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?

  • A. Preemptible workers cannot use persistent disk.
  • B. Preemptible workers cannot store data.
  • C. If a preemptible worker is reclaimed, then a replacement worker must be added manually.
  • D. A Dataproc cluster cannot have only preemptible workers.

Answer: BD

Explanation:
The following rules apply when you use preemptible workers with a Cloud Dataproc cluster:
Processing only: Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
No preemptible-only clusters: To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
Persistent disk size: As a default, all preemptible workers are created with the smaller of 100 GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
Reference: https://cloud.google.com/dataproc/docs/concepts/preemptible-vms

NEW QUESTION 21

When you design a Google Cloud Bigtable schema, it is recommended that you ____________.

  • A. Avoid schema designs that are based on NoSQL concepts
  • B. Create schema designs that are based on a relational database design
  • C. Avoid schema designs that require atomicity across rows
  • D. Create schema designs that require atomicity across rows

Answer: C

Explanation:
All operations are atomic at the row level. For example, if you update two rows in a table, it's possible that one row will be updated successfully and the other update will fail. Avoid schema designs that require atomicity across rows.
Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys
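
To illustrate the point, a sketch with placeholder names using the google-cloud-bigtable Python client: all cells written in a single row commit succeed or fail together, so fields that must change atomically belong in one row rather than spread across rows.

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("orders-instance").table("orders")

# One row, one commit: both cells are applied atomically. There is no
# cross-row transaction, so avoid schemas that would need one.
row = table.direct_row(b"order#2024-01-02#000042")
row.set_cell("status", b"state", b"shipped")
row.set_cell("status", b"carrier", b"acme-logistics")
row.commit()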

NEW QUESTION 22

You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

  • A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
  • B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
  • C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
  • D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.

Answer: C
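
A hedged sketch of the relevant pipeline options for the Dataflow-with-autoscaling choice; the project, region, bucket, and worker cap are placeholders. Autoscaling resizes the worker pool as input volume changes, so no manual resizing is needed, and Stackdriver's system lag metric shows whether the pipeline is keeping up.

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    streaming=True,
    autoscaling_algorithm="THROUGHPUT_BASED",  # throughput-based autoscaling
    max_num_workers=20,                        # upper bound for the autoscaler
)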

NEW QUESTION 23
......

P.S. Certleader is now offering 100% pass-guaranteed Professional-Data-Engineer dumps! All Professional-Data-Engineer exam questions have been updated with correct answers: https://www.certleader.com/Professional-Data-Engineer-dumps.html (239 New Questions)