
Professional-Data-Engineer Exam

Updated Professional-Data-Engineer Samples For Google Professional Data Engineer Exam Certification




Master the Professional-Data-Engineer Google Professional Data Engineer Exam content and be ready for exam-day success quickly with these Exambible Professional-Data-Engineer sample questions. We guarantee it! We make it a reality and give you real Professional-Data-Engineer questions in our Google Professional-Data-Engineer braindumps. The latest 100% valid Google Professional-Data-Engineer exam questions and dumps are available on the page below. You can use our Google Professional-Data-Engineer braindumps to pass your exam.

Google Professional-Data-Engineer Free Dumps Questions Online, Read and Test Now.

NEW QUESTION 1

The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types of cluster nodes?

  • A. Workers
  • B. Masters, workers, and parameter servers
  • C. Workers and parameter servers
  • D. Parameter servers

Answer: C

Explanation:
The CUSTOM tier is not a set tier; rather, it enables you to use your own cluster specification. When you use this tier, set values to configure your processing cluster according to these guidelines:
You must set TrainingInput.masterType to specify the type of machine to use for your master node.
You may set TrainingInput.workerCount to specify the number of workers to use.
You may set TrainingInput.parameterServerCount to specify the number of parameter servers to use.
You can specify the type of machine for the master node, but you cannot specify more than one master node.
Reference: https://cloud.google.com/ml-engine/docs/training-overview#job_configuration_parameters
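As a rough, hypothetical sketch, a CUSTOM-tier job submitted through the Cloud ML Engine REST API (projects.jobs.create) via the Google API discovery client might carry a TrainingInput like the one below; the project, bucket, job ID, and machine types are placeholder values for illustration only.

# Hypothetical sketch: submitting a CUSTOM-tier training job to Cloud ML Engine.
# Project, bucket, job ID, and machine types below are placeholder values.
from googleapiclient import discovery

training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-highcpu-16",       # required: machine type for the single master
    "workerType": "n1-highcpu-16",
    "parameterServerType": "n1-standard-4",
    "workerCount": 5,                     # number of workers (optional)
    "parameterServerCount": 3,            # number of parameter servers (optional)
    "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
    "pythonModule": "trainer.task",
    "region": "us-central1",
}

job_spec = {"jobId": "my_training_job_001", "trainingInput": training_input}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(parent="projects/my-project", body=job_spec).execute()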

NEW QUESTION 2

You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?

  • A. Create a table in BigQuery, and append the new samples for CPU and memory to the table
  • B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second
  • C. Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second
  • D. Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.

Answer: C
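For context, a tall-and-narrow Bigtable design encodes the machine identifier and the sample timestamp in the row key and keeps only a few columns per row. Below is a minimal sketch using the google-cloud-bigtable client; the instance, table, and column-family names are hypothetical.

# Illustrative only: write one-second CPU/memory samples using a
# "<machine_id>#<timestamp>" row key (tall-and-narrow pattern).
import datetime
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("metrics-instance").table("machine_metrics")

def write_sample(machine_id, ts, cpu, memory):
    row_key = f"{machine_id}#{ts.strftime('%Y%m%d%H%M%S')}".encode()
    row = table.direct_row(row_key)
    row.set_cell("stats", b"cpu", str(cpu).encode(), timestamp=ts)
    row.set_cell("stats", b"memory", str(memory).encode(), timestamp=ts)
    row.commit()

write_sample("vm-0042", datetime.datetime.utcnow(), 0.63, 0.48)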

NEW QUESTION 3

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?

  • A. Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.
  • B. Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.
  • C. Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.
  • D. Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.

Answer: A

NEW QUESTION 4

Which of the following statements about Legacy SQL and Standard SQL is not true?

  • A. Standard SQL is the preferred query language for BigQuery.
  • B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
  • C. One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name).
  • D. You need to set a query language for each dataset and the default is Standard SQL.

Answer: D

Explanation:
You do not set a query language for each dataset. It is set each time you run a query and the default query language is Legacy SQL.
Standard SQL has been the preferred query language since BigQuery 2.0 was released.
In legacy SQL, to query a table with a project-qualified name, you use a colon, :, as a separator. In standard SQL, you use a period, ., instead.
Due to the differences in syntax between the two query languages (such as with project-qualified table names), if you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
Reference:
https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
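To make the naming difference concrete, here is a small sketch using the google-cloud-bigquery Python client; the table shown is a public BigQuery sample dataset, and both dialects are chosen per query rather than per dataset.

# Legacy SQL uses [project:dataset.table]; standard SQL uses `project.dataset.table`.
from google.cloud import bigquery

client = bigquery.Client()

legacy_sql = "SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 5"
standard_sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 5"

# The dialect is chosen per query, not per dataset.
legacy_job = client.query(legacy_sql, job_config=bigquery.QueryJobConfig(use_legacy_sql=True))
standard_job = client.query(standard_sql)  # standard SQL is the client library default

for row in standard_job:
    print(row.word)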

NEW QUESTION 5

Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?

  • A. Create a Google Cloud Dataflow job to process the data.
  • B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
  • C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
  • D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
  • E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Answer: D

NEW QUESTION 6

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

  • A. An hourly watermark
  • B. An event time trigger
  • C. The withAllowedLateness method
  • D. A processing time trigger

Answer: D

Explanation:
When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window.
Processing time triggers. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data
element. Beam’s default trigger is event time-based.
Reference: https://beam.apache.org/documentation/programming-guide/#triggers
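A minimal Beam (Python SDK) sketch of hourly aggregation driven by processing time follows; the Pub/Sub topic, keying logic, and trigger settings are illustrative assumptions, not a definitive implementation.

# Illustrative: emit aggregates roughly every hour of processing time.
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "KeyByType" >> beam.Map(lambda msg: (msg[:1], 1))   # placeholder keying logic
        | "Window" >> beam.WindowInto(
            window.GlobalWindows(),
            trigger=trigger.Repeatedly(trigger.AfterProcessingTime(60 * 60)),
            accumulation_mode=trigger.AccumulationMode.DISCARDING,
        )
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )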

NEW QUESTION 7

You are developing an application on Google Cloud that will automatically generate subject labels for users’ blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

  • A. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.
  • B. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.
  • C. Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.
  • D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.

Answer: A
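For reference, a minimal sketch of calling the Natural Language API's entity analysis from Python and treating the returned entity names as candidate labels; ranking by salience and capping the label count are illustrative choices.

# Illustrative: derive candidate subject labels from entity analysis.
from google.cloud import language_v1

def suggest_labels(post_text, max_labels=5):
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=post_text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})
    # Rank entities by salience and keep the most prominent ones as labels.
    entities = sorted(response.entities, key=lambda e: e.salience, reverse=True)
    return [entity.name for entity in entities[:max_labels]]

print(suggest_labels("Serverless data pipelines with Cloud Dataflow and BigQuery"))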

NEW QUESTION 8

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

  • A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.
  • B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
  • C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.
  • D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.

Answer: B

NEW QUESTION 9

What is the recommended action to do in order to switch between SSD and HDD storage for your Google Cloud Bigtable instance?

  • A. create a third instance and sync the data from the two storage types via batch jobs
  • B. export the data from the existing instance and import the data into a new instance
  • C. run parallel instances where one is HDD and the other is SSD
  • D. the selection is final and you must continue using the same storage type

Answer: B

Explanation:
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
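As a toy illustration of the copy approach (suitable only for small tables; an export/import or a Dataflow job is better at scale), rows could be streamed from an HDD instance to a new SSD instance with the google-cloud-bigtable client. The instance and table names here are hypothetical, and the destination table is assumed to exist with the same column families.

# Illustrative row-by-row copy between two Bigtable instances.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
src_table = client.instance("metrics-hdd").table("events")
dst_table = client.instance("metrics-ssd").table("events")  # pre-created, same families

for row in src_table.read_rows():
    new_row = dst_table.direct_row(row.row_key)
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            for cell in cells:
                new_row.set_cell(family, qualifier, cell.value, timestamp=cell.timestamp)
    new_row.commit()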

NEW QUESTION 10

You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

  • A. Add a SideInput that returns a Boolean if the element is corrupt.
  • B. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.
  • C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.
  • D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.

Answer: B
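Below is a minimal sketch of the ParDo approach, assuming a hypothetical is_corrupt() check that inspects each record; corrupt elements are simply not emitted (in practice they could also be routed to a dead-letter output).

# Illustrative: drop corrupt records inside the pipeline with a ParDo.
import apache_beam as beam

def is_corrupt(record):
    # Placeholder validation logic for this sketch.
    return not record or "device_id" not in record

class FilterCorrupt(beam.DoFn):
    def process(self, element):
        if not is_corrupt(element):
            yield element  # valid records continue downstream; corrupt ones are discarded

with beam.Pipeline() as p:
    (
        p
        | beam.Create([{"device_id": "a1", "temp": 21}, {}, {"temp": 3}])
        | beam.ParDo(FilterCorrupt())
        | beam.Map(print)
    )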

NEW QUESTION 11

As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.

  • A. Use Cloud Deployment Manager to automate access provision.
  • B. Introduce resource hierarchy to leverage access control policy inheritance.
  • C. Create distinct groups for various teams, and specify groups in Cloud IAM policies.
  • D. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
  • E. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.

Answer: BC

NEW QUESTION 12

When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and ____.

  • A. zone
  • B. node
  • C. label
  • D. type

Answer: A

Explanation:
At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
The project in which the cluster will be created
The region to use
The name of the cluster
The zone in which the cluster will be created
You can specify many more details beyond these minimum requirements. For example, you can
also specify the number of workers, whether preemptible compute should be used, and the network settings.
Reference:
https://cloud.google.com/dataproc/docs/tutorials/python-library-example#create_a_new_cloud_dataproc_cluste
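Following the style of the referenced Python tutorial, a minimal create request might look like the sketch below; the project, region, zone, and cluster name are placeholder values.

# Illustrative minimal clusters.create call via the Dataproc REST API.
from googleapiclient import discovery

project, region, zone = "my-project", "us-central1", "us-central1-b"
cluster_name = "demo-cluster"

cluster_data = {
    "projectId": project,
    "clusterName": cluster_name,
    "config": {
        "gceClusterConfig": {
            "zoneUri": f"https://www.googleapis.com/compute/v1/projects/{project}/zones/{zone}"
        }
    },
}

dataproc = discovery.build("dataproc", "v1")
dataproc.projects().regions().clusters().create(
    projectId=project, region=region, body=cluster_data
).execute()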

NEW QUESTION 13

Your company’s customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

  • A. Add a node to the MySQL cluster and build an OLAP cube there.
  • B. Use an ETL tool to load the data from MySQL into Google BigQuery.
  • C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
  • D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Answer: C

NEW QUESTION 14

Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other’s data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three.)

  • A. Load data into different partitions.
  • B. Load data into a different dataset for each client.
  • C. Put each client’s BigQuery dataset into a different table.
  • D. Restrict a client’s dataset to approved users.
  • E. Only allow a service account to access the datasets.
  • F. Use the appropriate identity and access management (IAM) roles for each client’s users.

Answer: BDF

NEW QUESTION 15

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

  • A. Cloud SQL
  • B. Cloud Bigtable
  • C. Cloud Spanner
  • D. Cloud Datastore

Answer: A

NEW QUESTION 16

You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this? Choose 2 answers.

  • A. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
  • B. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
  • C. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.
  • D. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
  • E. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.

Answer: AB
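For illustration, a managed export can be triggered from Python with the Datastore Admin API; the project ID and destination bucket below are placeholders, and for archival the bucket would typically use a Nearline or Coldline storage class.

# Illustrative: kick off a managed export of all Datastore entities.
from google.cloud import datastore_admin_v1

client = datastore_admin_v1.DatastoreAdminClient()
operation = client.export_entities(
    request={
        "project_id": "my-project",
        "output_url_prefix": "gs://my-datastore-archive",  # bucket on Nearline/Coldline
    }
)
response = operation.result()  # long-running operation; blocks until the export finishes
print(response.output_url)     # prefix that can later be passed to import_entities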

NEW QUESTION 17

Which of the following are feature engineering techniques? (Select 2 answers)

  • A. Hidden feature layers
  • B. Feature prioritization
  • C. Crossed feature columns
  • D. Bucketization of a continuous feature

Answer: CD

Explanation:
Selecting and crafting the right set of feature columns is key to learning an effective model. Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive
bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.
Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
Reference: https://www.tensorflow.org/tutorials/wide#selecting_and_engineering_features_for_the_model
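A short TensorFlow sketch of both techniques using the tf.feature_column API (as in the referenced tutorial); the column names, vocabulary, and bucket boundaries are made up for illustration.

# Illustrative: bucketize a continuous feature and cross it with a categorical one.
import tensorflow as tf

age = tf.feature_column.numeric_column("age")
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 50, 65]
)
education = tf.feature_column.categorical_column_with_vocabulary_list(
    "education", ["HS", "Bachelors", "Masters", "Doctorate"]
)
# A crossed column lets a linear model learn interactions between age bucket and education.
age_x_education = tf.feature_column.crossed_column(
    [age_buckets, education], hash_bucket_size=1000
)
feature_columns = [age_buckets, tf.feature_column.indicator_column(age_x_education)]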

NEW QUESTION 18

Which Google Cloud Platform service is an alternative to Hadoop with Hive?

  • A. Cloud Dataflow
  • B. Cloud Bigtable
  • C. BigQuery
  • D. Cloud Datastore

Answer: C

Explanation:
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis.
Google BigQuery is an enterprise data warehouse.
Reference: https://en.wikipedia.org/wiki/Apache_Hive

NEW QUESTION 19

When running a pipeline that has a BigQuery source on your local machine, you continue to get permission denied errors. What could be the reason for that?

  • A. Your gcloud does not have access to the BigQuery resources
  • B. BigQuery cannot be accessed from local machines
  • C. You are missing gcloud on your machine
  • D. Pipelines cannot be run locally

Answer: A

Explanation:
When reading from a Dataflow source or writing to a Dataflow sink using DirectPipelineRunner, the Cloud Platform account that you configured with the gcloud executable will need access to the corresponding source/sink
Reference:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRun
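Below is a sketch of the equivalent setup with the Beam Python SDK's DirectRunner (the Java DirectPipelineRunner referenced above behaves the same way): the local runner picks up the Application Default Credentials configured through gcloud, and the project, bucket, and query are placeholders.

# Illustrative: a locally run pipeline with a BigQuery source.
# Requires ADC from gcloud (e.g. `gcloud auth application-default login`)
# for an account that can read the BigQuery data and create query jobs.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(runner="DirectRunner", project="my-project",
                          temp_location="gs://my-temp-bucket/tmp")

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.io.ReadFromBigQuery(
            query="SELECT name FROM `my-project.my_dataset.my_table` LIMIT 10",
            use_standard_sql=True,
        )
        | beam.Map(print)
    )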

NEW QUESTION 20

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

  • A. There are very few occurrences of mutations relative to normal samples.
  • B. There are roughly equal occurrences of both normal and mutated samples in the database.
  • C. You expect future mutations to have different features from the mutated samples in the database.
  • D. You expect future mutations to have similar features to the mutated samples in the database.
  • E. You already have labels for which samples are mutated and which are normal in the database.

Answer: AC

NEW QUESTION 21

Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?

  • A. Use K-means Clustering to detect faces in the pixels.
  • B. Use feature engineering to add features for eyes, noses, and mouths to the input data.
  • C. Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.
  • D. Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.

Answer: C

Explanation:
Traditional machine learning relies on shallow nets, composed of one input and one output layer, and at most one hidden layer in between. More than three layers (including input and output) qualifies as “deep” learning. So deep is a strictly defined, technical term that means more than one hidden layer.
In deep-learning networks, each layer of nodes trains on a distinct set of features based on the previous layer’s output. The further you advance into the neural net, the more complex the features your nodes can recognize, since they aggregate and recombine features from the
previous layer.
A neural network with only one hidden layer would be unable to automatically recognize high-level features of faces, such as eyes, because it wouldn't be able to "build" these features using previous hidden layers that detect low-level features, such as lines.
Feature engineering is difficult to perform on raw image data.
K-means clustering is an unsupervised learning method used to categorize unlabeled data.
Reference: https://deeplearning4j.org/neuralnet-overview
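As a rough sketch, a face/no-face classifier with multiple hidden layers might look like the following in Keras; the input size and layer widths are arbitrary choices for illustration.

# Illustrative: a deep network with several hidden layers over raw pixels.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),     # raw pixel input
    tf.keras.layers.Dense(512, activation="relu"),        # hidden layers learn low-level
    tf.keras.layers.Dense(256, activation="relu"),        # ... then higher-level features
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),       # face / no face
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(images, labels, epochs=10)  # trained on the labeled image dataset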

NEW QUESTION 22

Which is not a valid reason for poor Cloud Bigtable performance?

  • A. The workload isn't appropriate for Cloud Bigtable.
  • B. The table's schema is not designed correctly.
  • C. The Cloud Bigtable cluster has too many nodes.
  • D. There are issues with the network connection.

Answer: C

Explanation:
The Cloud Bigtable cluster doesn't have enough nodes. If your Cloud Bigtable cluster is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.
Reference: https://cloud.google.com/bigtable/docs/performance
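If monitoring shows the cluster is overloaded, the node count can be raised programmatically; a sketch with the Bigtable admin client follows, where the project, instance, and cluster IDs are placeholders.

# Illustrative: scale a Cloud Bigtable cluster up to relieve CPU pressure.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("metrics-instance").cluster("metrics-instance-c1")
cluster.reload()            # fetch current settings, including serve_nodes
cluster.serve_nodes += 3    # add nodes; Bigtable rebalances data automatically
operation = cluster.update()
operation.result()          # wait for the resize to complete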

NEW QUESTION 23
......

P.S. Surepassexam is now offering 100% pass-guaranteed Professional-Data-Engineer dumps! All Professional-Data-Engineer exam questions have been updated with correct answers: https://www.surepassexam.com/Professional-Data-Engineer-exam-dumps.html (239 New Questions)