Top 20 GCP Vertex AI & Machine Learning Interview Questions
- What is Vertex AI?
- What is AutoML?
- What is Vertex AI Workbench?
- How do you train custom models?
- What are Vertex AI Pipelines?
- How do you deploy models?
- What is Feature Store?
- How do you use pre-built APIs?
- What is Model Registry?
- How do you implement batch predictions?
- What are Experiments and Metadata?
- How do you optimize training?
- What is Vertex AI Vector Search?
- How do you monitor models?
- What are custom containers?
- How do you implement MLOps?
- What is Generative AI on Vertex AI?
- How do you handle model versioning?
- What are training best practices?
- How do you design ML architecture?
1. What is Vertex AI?
Vertex AI is Google Cloud's unified ML platform for building, deploying, and managing ML models at scale.
Vertex AI Components:
+-------------------------------------------------------------+
|                          Vertex AI                          |
+-------------------------------------------------------------+
|  +-------------------------------------------------------+  |
|  | Build & Train                                         |  |
|  |  +-- Workbench (Jupyter notebooks)                    |  |
|  |  +-- AutoML (no-code ML)                              |  |
|  |  +-- Custom Training                                  |  |
|  |  +-- Pipelines (ML workflows)                         |  |
|  |  +-- Experiments (tracking)                           |  |
|  +-------------------------------------------------------+  |
|  +-------------------------------------------------------+  |
|  | Manage & Deploy                                       |  |
|  |  +-- Model Registry                                   |  |
|  |  +-- Feature Store                                    |  |
|  |  +-- Endpoints (online prediction)                    |  |
|  |  +-- Batch Prediction                                 |  |
|  |  +-- Model Monitoring                                 |  |
|  +-------------------------------------------------------+  |
|  +-------------------------------------------------------+  |
|  | Foundation Models                                     |  |
|  |  +-- Gemini (multimodal)                              |  |
|  |  +-- PaLM (text)                                      |  |
|  |  +-- Imagen (images)                                  |  |
|  |  +-- Codey (code)                                     |  |
|  +-------------------------------------------------------+  |
+-------------------------------------------------------------+
# Enable Vertex AI
gcloud services enable aiplatform.googleapis.com
# Python SDK setup
from google.cloud import aiplatform
aiplatform.init(
    project='my-project',
    location='us-central1',
    staging_bucket='gs://my-bucket'
)
2. What is AutoML?
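AutoML trains high-quality models on your own data with little ML expertise and no model code: you supply a labeled dataset and a training budget, and Vertex AI searches model architectures and hyperparameters for you.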
AutoML Types:
+-- AutoML Tabular - Structured data
+-- AutoML Image - Classification, detection
+-- AutoML Text - NLP tasks
+-- AutoML Video - Video analysis
+-- AutoML Forecasting - Time series
AutoML Tabular Example:
from google.cloud import aiplatform
# Create dataset
dataset = aiplatform.TabularDataset.create(
    display_name='customer_churn',
    gcs_source='gs://bucket/data.csv'
)
# Train AutoML model
job = aiplatform.AutoMLTabularTrainingJob(
    display_name='churn_prediction',
    optimization_prediction_type='classification',
    optimization_objective='maximize-au-roc'
)
model = job.run(
    dataset=dataset,
    target_column='churn',
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=1000,  # 1,000 milli node hours = 1 node hour
    model_display_name='churn_model'
)
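Two arguments to `job.run()` are easy to misread: `budget_milli_node_hours` is measured in thousandths of a node hour, and the three split fractions are shares of the dataset. A minimal sketch of both checks in plain Python follows; `milli_node_hours` and `check_fraction_splits` are hypothetical helpers, not part of the Vertex AI SDK, and the "fractions may sum to at most 1" rule reflects our reading of the Vertex AI `FractionSplit` documentation.

```python
def milli_node_hours(node_hours: float) -> int:
    """Convert node hours to the unit used by budget_milli_node_hours (hypothetical helper)."""
    return round(node_hours * 1000)


def check_fraction_splits(training: float, validation: float, test: float) -> None:
    """Validate split fractions (hypothetical helper).

    Each fraction must lie in [0, 1], and together they may sum to at most 1;
    as we read the FractionSplit docs, Vertex AI assigns any remainder itself.
    """
    fractions = {'training': training, 'validation': validation, 'test': test}
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f'{name} fraction {value} is outside [0, 1]')
    total = sum(fractions.values())
    if total > 1.0 + 1e-9:  # small tolerance for float rounding, e.g. 0.7 + 0.2 + 0.1
        raise ValueError(f'split fractions sum to {total:.3f}, more than 1')


if __name__ == '__main__':
    check_fraction_splits(0.8, 0.1, 0.1)  # the split used in the tabular job
    print(milli_node_hours(1))  # 1000, as in the tabular job
    print(milli_node_hours(8))  # 8000, as in the image job
```

Converting from node hours in code avoids the common off-by-1000 mistake of passing `budget_milli_node_hours=1` and expecting one hour.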
AutoML Image Classification:
# Create image dataset
dataset = aiplatform.ImageDataset.create(
    display_name='product_images',
    gcs_source='gs://bucket/images/import.csv',
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification
)
# Train model
job = aiplatform.AutoMLImageTrainingJob(
    display_name='product_classifier',
    prediction_type='classification',
    model_type='CLOUD'
)
model = job.run(
    dataset=dataset,
    model_display_name='product_model',
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=8000
)
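The `import.csv` passed as `gcs_source` lists one image per row. For single-label classification the Vertex AI import format is, as we understand it, `[ML_USE,]GCS_FILE_PATH,LABEL`, where the optional `ML_USE` column is `training`, `validation`, or `test`. The sketch below renders such a file with the standard `csv` module; `build_import_csv`, the image paths, and the labels are hypothetical.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

ML_USE_VALUES = {'training', 'validation', 'test'}


def build_import_csv(rows: Iterable[Tuple[Optional[str], str, str]]) -> str:
    """Render (ml_use, gcs_uri, label) rows as import-file text (hypothetical helper).

    ml_use may be None to omit the optional ML_USE column for that row.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    for ml_use, gcs_uri, label in rows:
        if not gcs_uri.startswith('gs://'):
            raise ValueError(f'expected a gs:// URI, got {gcs_uri!r}')
        if ml_use is None:
            writer.writerow([gcs_uri, label])
        elif ml_use in ML_USE_VALUES:
            writer.writerow([ml_use, gcs_uri, label])
        else:
            raise ValueError(f'unknown ML_USE value {ml_use!r}')
    return buf.getvalue()


if __name__ == '__main__':
    # Hypothetical product images; upload the result as gs://bucket/images/import.csv
    print(build_import_csv([
        ('training', 'gs://bucket/images/shoe_001.jpg', 'shoe'),
        (None, 'gs://bucket/images/bag_001.jpg', 'bag'),
    ]), end='')
```

Validating URIs and `ML_USE` values before upload surfaces mistakes locally instead of as a failed dataset import.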
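3. What is Vertex AI Workbench?
Vertex AI Workbench is a Jupyter-based development environment for data science and ML on Google Cloud.
Workbench Types:
+-- Managed Notebooks - Fully managed Jupyter (deprecated)
+-- User-managed Notebooks - More control over the VM (deprecated)
+-- Workbench Instances - Current offering; replaces both of the above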
Features:
+-- Pre-installed ML libraries
+-- GPU/TPU support
+-- Git integration
+-- BigQuery connector
+-- Scheduled executions
+-- Collaboration features
# Create a user-managed notebook instance with a T4 GPU (an image is required)
gcloud notebooks instances create my-notebook \
    --location=us-central1-a \
    --machine-type=n1-standard-4 \
    --vm-image-project=deeplearning-platform-release \
    --vm-image-family=tf-latest-gpu \
    --accelerator-type=NVIDIA_TESLA_T4 \
    --accelerator-core-count=1 \
    --install-gpu-driver
# Create with a specific image family
gcloud notebooks instances create ml-notebook \
    --location=us-central1-a \
    --machine-type=n1-standard-8 \
    --vm-image-project=deeplearning-platform-release \
    --vm-image-family=tf-latest-gpu
# Schedule notebook execution (illustrative; verify this command group and its
# flags against the gcloud reference for your SDK version before use)
gcloud notebooks executions create \
    --display-name="Daily Training" \
    --execution-template=execution-template.yaml \
    --input-notebook-file=gs://bucket/notebooks/train.ipynb \
    --output-notebook-folder=gs://bucket/outputs/ \
    --params='{"learning_rate": 0.01}' \
    --service-account=ml-sa@project.iam.gserviceaccount.com
Notebook Best Practices:
+-- Use parameterized notebooks
+-- Version control notebooks
+-- Separate experimentation from production
+-- Use idle shutdown
+-- Tag resources for cost tracking
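Parameterized notebooks typically follow the papermill convention: a cell tagged `parameters` holds defaults, and the executor injects a cell after it that overrides them, which is how a value such as `--params='{"learning_rate": 0.01}'` reaches the notebook code. The sketch below emulates that resolution step in plain Python so override rules can be tested outside a notebook; `resolve_params` and the default values are hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical defaults, standing in for a notebook's "parameters" cell
DEFAULTS: Dict[str, Any] = {'learning_rate': 0.001, 'epochs': 10}


def resolve_params(defaults: Dict[str, Any], params_json: str = '') -> Dict[str, Any]:
    """Apply JSON overrides on top of defaults, rejecting unknown names.

    Rejecting unknown keys catches typos such as "lr" for "learning_rate",
    which an injected override cell would otherwise add as an unused variable.
    """
    overrides = json.loads(params_json) if params_json else {}
    if not isinstance(overrides, dict):
        raise TypeError('params must be a JSON object')
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f'unknown parameters: {unknown}')
    # Build a new dict so the defaults stay untouched for the next run
    return {**defaults, **overrides}


if __name__ == '__main__':
    print(resolve_params(DEFAULTS, '{"learning_rate": 0.01}'))
```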
# Terraform
resource "google_notebooks_instance" "notebook" {
  name         = "ml-notebook"
  location     = "us-central1-a"
  machine_type = "n1-standard-4"
  vm_image {
    project      = "deeplearning-platform-release"
    image_family = "tf-latest-gpu"
  }
  install_gpu_driver = true
  accelerator_config {
    type       = "NVIDIA_TESLA_T4"
    core_count = 1
  }
}
4. How do you train custom models?
Custom Training Options:
+-- Pre-built containers
+-- Custom containers
+-- Local testing (gcloud ai custom-jobs local-run)
+-- Distributed training
# Pre-built container training
from google.cloud import aiplatform
job = aiplatform.CustomTrainingJob(
    display_name='custom_training',
    script_path='train.py',
    container_uri='us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-12:latest',
    requirements=['pandas', 'scikit-learn'],
    model_serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-12:latest'
)
model = job.run(
    replica_count=1,
    machine_type='n1-standard-8',
    accelerator_type='NVIDIA_TESLA_V100',
    accelerator_count=1,
    args=['--epochs=10', '--batch_size=32'],
    environment_variables={'MY_VAR': 'value'},
    base_output_dir='gs://bucket/output'
)
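# train.py
import argparse
import os
import tensorflow as tf
# Assumed TFRecord schema (hypothetical): a fixed-length float vector under
# 'features' and an integer class label under 'label'. Adjust to your data.
NUM_FEATURES = 20  # hypothetical feature width
FEATURE_SPEC = {
    'features': tf.io.FixedLenFeature([NUM_FEATURES], tf.float32),
    'label': tf.io.FixedLenFeature([], tf.int64),
}
def parse_example(serialized):
    # Decode one serialized tf.train.Example into (features, label)
    example = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    return example['features'], example['label']
def train(args):
    # Load data: parse records, then batch in the tf.data pipeline
    train_data = (
        tf.data.TFRecordDataset('gs://bucket/train.tfrecord')
        .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .shuffle(10_000)
        .batch(args.batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )
    # Build model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    # Train; batch size comes from the dataset, so fit() gets no batch_size
    model.fit(train_data, epochs=args.epochs)
    # Save directly into AIP_MODEL_DIR (SavedModel format in TF 2.12) so the
    # serving container finds the model at the artifact root
    model.save(args.model_dir)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--model_dir', default=os.environ.get('AIP_MODEL_DIR'))
    args = parser.parse_args()
    train(args)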
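`train.py` reads `AIP_MODEL_DIR` because Vertex AI derives it from `base_output_dir`. Per the Vertex AI custom training documentation, setting a base output directory makes the service export `AIP_MODEL_DIR=<base>/model/`, `AIP_CHECKPOINT_DIR=<base>/checkpoints/`, and `AIP_TENSORBOARD_LOG_DIR=<base>/logs/`. The sketch mirrors that mapping so the same script can also run on a laptop; `expected_aip_dirs`, `resolve_model_dir`, and the `./model` fallback are hypothetical.

```python
import os
from typing import Dict, Mapping


def expected_aip_dirs(base_output_dir: str) -> Dict[str, str]:
    """Directories Vertex AI derives from base_output_dir, mirrored for local runs."""
    base = base_output_dir.rstrip('/')
    return {
        'AIP_MODEL_DIR': f'{base}/model/',
        'AIP_CHECKPOINT_DIR': f'{base}/checkpoints/',
        'AIP_TENSORBOARD_LOG_DIR': f'{base}/logs/',
    }


def resolve_model_dir(env: Mapping[str, str], local_fallback: str = './model') -> str:
    """Prefer AIP_MODEL_DIR when running on Vertex AI, else a local directory."""
    return env.get('AIP_MODEL_DIR') or local_fallback


if __name__ == '__main__':
    print(expected_aip_dirs('gs://bucket/output')['AIP_MODEL_DIR'])
    print(resolve_model_dir(os.environ))
```

Passing `env` in explicitly, rather than reading `os.environ` inside the helper, keeps the fallback logic easy to test.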
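# Distributed training with Reduction Server (speeds up all-reduce for
# NCCL-based multi-GPU training)
job = aiplatform.CustomTrainingJob(...)
model = job.run(
    replica_count=4,  # 1 chief + 3 workers
    machine_type='n1-standard-16',
    accelerator_type='NVIDIA_TESLA_V100',
    accelerator_count=2,  # per replica: 8 GPUs in total
    reduction_server_replica_count=1,
    reduction_server_machine_type='n1-highcpu-16',
    reduction_server_container_uri='us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest'
)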
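In `CustomTrainingJob.run()`, `replica_count` includes the chief: the first replica becomes the chief and the rest form a worker pool, so the distributed call provisions 1 chief and 3 workers with 2 V100s each (8 GPUs), plus one reduction-server replica. Every replica runs the same `train.py`, so the script usually needs to know whether it is the chief; only the chief's copy of the model should end up in `AIP_MODEL_DIR`. The sketch assumes the standard TensorFlow `TF_CONFIG` JSON (which Vertex AI sets for distributed TensorFlow jobs); `worker_pool_layout` and `is_chief` are hypothetical helpers.

```python
import json
from typing import Dict


def worker_pool_layout(replica_count: int, accelerator_count: int,
                       reduction_server_replica_count: int = 0) -> Dict[str, int]:
    """Mirror how replica_count is split: one chief, the remainder as workers."""
    if replica_count < 1:
        raise ValueError('replica_count must be at least 1')
    return {
        'chief': 1,
        'workers': replica_count - 1,
        'reduction_servers': reduction_server_replica_count,
        # GPUs sit on the chief and workers; the example's reduction
        # servers (n1-highcpu-16) have none
        'total_gpus': replica_count * accelerator_count,
    }


def is_chief(tf_config_json: str) -> bool:
    """TF_CONFIG convention: the 'chief' task, or worker 0 when no chief is listed."""
    config = json.loads(tf_config_json)
    task = config.get('task', {})
    cluster = config.get('cluster', {})
    if task.get('type') == 'chief':
        return True
    return ('chief' not in cluster
            and task.get('type') == 'worker'
            and task.get('index') == 0)


if __name__ == '__main__':
    print(worker_pool_layout(4, 2, 1))
```

Inside `train.py`, a check like `is_chief(os.environ.get('TF_CONFIG', '{}'))` would decide whether to write to `AIP_MODEL_DIR` or to a scratch directory.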
5. What are Vertex AI Pipelines?
Pipelines:
+-- Orchestrate ML workflows
+-- Based on Kubeflow Pipelines
+-- Reusable components
+-- Automatic artifact tracking
+-- Integration with Vertex services
Pipeline Example:
# KFP SDK 2.x imports (kfp.v2 was the SDK 1.8 compatibility namespace)
from kfp import compiler, dsl
from kfp.dsl import component, Output, Input, Dataset, Model, Metrics
from google.cloud import aiplatform
@component(
    packages_to_install=['pandas', 'scikit-learn', 'gcsfs'],  # gcsfs lets pandas read gs:// paths
    base_image='python:3.9'
)
def preprocess_data(
    input_path: str,
    output_dataset: Output[Dataset]
):
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    df = pd.read_csv(input_path)
    # Scale the feature columns only, keeping column names and the 'target' label
    feature_cols = df.columns.drop('target')
    df[feature_cols] = StandardScaler().fit_transform(df[feature_cols])
    df.to_csv(output_dataset.path, index=False)
@component(
    packages_to_install=['scikit-learn', 'pandas'],
    base_image='python:3.9'
)
def train_model(
    dataset: Input[Dataset],
    model: Output[Model],
    metrics: Output[Metrics]
):
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    import pickle
    df = pd.read_csv(dataset.path)
    X = df.drop('target', axis=1)
    y = df['target']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    metrics.log_metric('accuracy', accuracy)
    with open(model.path, 'wb') as f:
        pickle.dump(clf, f)
@dsl.pipeline(
    name='ml-pipeline',
    description='End-to-end ML pipeline'
)
def ml_pipeline(input_path: str):
    preprocess_task = preprocess_data(input_path=input_path)
    train_task = train_model(dataset=preprocess_task.outputs['output_dataset'])
# Compile and run
compiler.Compiler().compile(ml_pipeline, 'pipeline.yaml')
aiplatform.init(project='my-project', location='us-central1')
job = aiplatform.PipelineJob(
    display_name='my-pipeline',
    template_path='pipeline.yaml',
    pipeline_root='gs://bucket/pipeline-root',  # where pipeline artifacts are written
    parameter_values={'input_path': 'gs://bucket/data.csv'}
)
job.run()
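Nothing in `ml_pipeline` states an explicit order: `train_model` runs after `preprocess_data` because it consumes `preprocess_task.outputs['output_dataset']`. Conceptually that is a topological sort of the task graph. The sketch below uses Kahn's algorithm to produce one valid sequential order; it is a conceptual model, not how the Kubeflow Pipelines backend is implemented (real runs execute independent branches in parallel), and the task names are hypothetical labels.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return tasks so that every task comes after the tasks it consumes from.

    deps maps each task to the set of upstream tasks whose outputs it uses.
    Raises ValueError on a cycle, which a pipeline DAG cannot contain.
    """
    tasks = set(deps) | {up for ups in deps.values() for up in ups}
    indegree = {t: 0 for t in tasks}
    downstream: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, ups in deps.items():
        for up in ups:
            indegree[task] += 1
            downstream[up].append(task)
    # Start with tasks that have no inputs; sort for a deterministic order
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError('dependency cycle detected')
    return order


if __name__ == '__main__':
    print(execution_order({'preprocess-data': set(), 'train-model': {'preprocess-data'}}))
    # ['preprocess-data', 'train-model']
```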