
What Evaluation Metrics Are Available in TensorFlow and How to Create Custom Metrics

February 18, 17:58

Evaluation metrics quantify how well a model performs, and they are essential tools for developing and tuning deep learning models.

Common Evaluation Metrics

1. Classification Metrics

Accuracy

```python
import tensorflow as tf
from tensorflow.keras.metrics import Accuracy

# Use accuracy metric
accuracy = Accuracy()

# Calculate accuracy
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
accuracy.update_state(y_true, y_pred)
result = accuracy.result()
print(result)  # 0.8

# Use in model compilation
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

Characteristics:

  • Intuitive and easy to understand
  • Suitable for balanced datasets
  • Sensitive to class imbalance (see the sketch below)

Use Cases:

  • Balanced classification tasks
  • Scenarios requiring simple evaluation
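
Accuracy's weakness on imbalanced data is easy to demonstrate: a model that always predicts the majority class scores high without learning anything. A minimal sketch (the 95/5 split here is an assumed toy example):

```python
import tensorflow as tf
from tensorflow.keras.metrics import Accuracy

# Toy imbalanced labels: 95 negatives, 5 positives
y_true = tf.concat([tf.zeros(95, dtype=tf.int32), tf.ones(5, dtype=tf.int32)], axis=0)
# A "model" that always predicts the majority class
y_pred = tf.zeros(100, dtype=tf.int32)

accuracy = Accuracy()
accuracy.update_state(y_true, y_pred)
print(accuracy.result())  # 0.95 -- looks strong, yet no positive was ever found
```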

Precision

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision

# Use precision metric
precision = Precision()

# Calculate precision
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
precision.update_state(y_true, y_pred)
result = precision.result()
print(result)  # 1.0

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[Precision()])
```

Characteristics:

  • Measures accuracy of positive predictions
  • Suitable for scenarios focusing on false positives
  • More informative than plain accuracy when classes are imbalanced

Use Cases:

  • Spam detection
  • Medical diagnosis
  • Scenarios requiring reduced false positives

Recall

```python
import tensorflow as tf
from tensorflow.keras.metrics import Recall

# Use recall metric
recall = Recall()

# Calculate recall
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
recall.update_state(y_true, y_pred)
result = recall.result()
print(result)  # 0.666...

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[Recall()])
```

Characteristics:

  • Measures ability to identify positive samples
  • Suitable for scenarios focusing on false negatives
  • Unaffected by the number of negative samples, so it stays meaningful under class imbalance

Use Cases:

  • Disease screening
  • Anomaly detection
  • Scenarios requiring reduced false negatives

F1 Score

```python
import tensorflow as tf
from tensorflow.keras.metrics import F1Score  # built into TF 2.13+; older versions use tensorflow_addons

# Use F1 score metric (average=None returns one score per class)
f1 = F1Score(average=None, threshold=0.5)

# Calculate F1 score
y_true = tf.constant([[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]], dtype=tf.float32)
y_pred = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.9, 0.1]])
f1.update_state(y_true, y_pred)
result = f1.result()
print(result)  # [0.8, 0.8]

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[F1Score(average=None, threshold=0.5)])
```

Characteristics:

  • Harmonic mean of precision and recall (worked example below)
  • Balances precision and recall
  • Suitable for imbalanced datasets

Use Cases:

  • Imbalanced classification tasks
  • Scenarios requiring balanced precision and recall
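
The harmonic mean penalizes imbalance between precision and recall more than an arithmetic mean would. A quick check using the precision (1.0) and recall (2/3) from the earlier examples:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall
p, r = 1.0, 2.0 / 3.0
f1 = 2 * p * r / (p + r)
print(f1)  # 0.8 -- below the arithmetic mean (0.833...), pulled toward the lower value
```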

AUC-ROC

```python
import tensorflow as tf
from tensorflow.keras.metrics import AUC

# Use AUC metric
auc = AUC()

# Calculate AUC
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.2, 0.7])
auc.update_state(y_true, y_pred)
result = auc.result()
print(result)  # 1.0 -- every positive is scored above every negative

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[AUC()])
```

Characteristics:

  • Measures overall performance of classifier
  • Not tied to a single decision threshold (see the sketch below)
  • Suitable for binary classification problems

Use Cases:

  • Binary classification tasks
  • Scenarios requiring overall performance evaluation
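
Because AUC is computed from the ranking of scores rather than from one cutoff, any monotonic rescaling of the predictions leaves it unchanged. A small sketch reusing the example above:

```python
import tensorflow as tf
from tensorflow.keras.metrics import AUC

y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.2, 0.7])

auc = AUC()
auc.update_state(y_true, y_pred)
print(auc.result())  # 1.0

# Squashing the scores preserves their order, so AUC is unchanged
auc.reset_state()
auc.update_state(y_true, y_pred * 0.5)
print(auc.result())  # still 1.0
```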

2. Regression Metrics

Mean Squared Error (MSE)

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanSquaredError

# Use MSE metric
mse = MeanSquaredError()

# Calculate MSE
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
mse.update_state(y_true, y_pred)
result = mse.result()
print(result)  # 0.0175

# Use in model compilation
model.compile(optimizer='adam',
              loss='mse',
              metrics=[MeanSquaredError()])
```

Characteristics:

  • Measures the average squared difference between predicted and true values
  • Sensitive to outliers
  • Suitable for continuous value prediction

Use Cases:

  • Regression tasks
  • Scenarios requiring precise prediction

Mean Absolute Error (MAE)

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanAbsoluteError

# Use MAE metric
mae = MeanAbsoluteError()

# Calculate MAE
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
mae.update_state(y_true, y_pred)
result = mae.result()
print(result)  # 0.125

# Use in model compilation
model.compile(optimizer='adam',
              loss='mae',
              metrics=[MeanAbsoluteError()])
```

Characteristics:

  • Measures the average absolute difference between predicted and true values
  • Less sensitive to outliers than MSE (see the comparison below)
  • Suitable for regression tasks with outliers

Use Cases:

  • Regression tasks
  • Data with outliers
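
The different outlier sensitivity of MSE and MAE is easy to see side by side: one bad prediction dominates MSE because its error is squared, while it enters MAE only linearly. A sketch with an assumed outlier:

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanSquaredError, MeanAbsoluteError

y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 14.0])  # last prediction is an outlier (error = 10)

mse = MeanSquaredError()
mae = MeanAbsoluteError()
mse.update_state(y_true, y_pred)
mae.update_state(y_true, y_pred)
print(mse.result())  # ~25.015 -- dominated by the squared outlier (100 / 4)
print(mae.result())  # 2.6 -- the outlier contributes only 10 / 4
```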

Mean Absolute Percentage Error (MAPE)

```python
import tensorflow as tf

# Custom MAPE metric
def mean_absolute_percentage_error(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    diff = tf.abs((y_true - y_pred) / y_true)
    return 100.0 * tf.reduce_mean(diff)

# Use MAPE
y_true = tf.constant([100.0, 200.0, 300.0])
y_pred = tf.constant([110.0, 190.0, 310.0])
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)  # 6.111...
```

Characteristics:

  • Measures percentage error of predictions
  • Intuitive and easy to understand
  • Unstable when true values are close to zero (see the sketch below)

Use Cases:

  • Scenarios requiring percentage error
  • Time series prediction
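
The near-zero instability is worth checking before adopting MAPE: a tiny absolute error on a target close to zero produces a huge percentage. Using the mean_absolute_percentage_error function defined above:

```python
import tensorflow as tf

# Same small absolute error (0.1) on very different target scales
y_true = tf.constant([100.0, 0.1])
y_pred = tf.constant([100.1, 0.2])
print(mean_absolute_percentage_error(y_true, y_pred))  # ~50.05 -- the near-zero target dominates
```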

R-squared (R²)

```python
import tensorflow as tf

# Custom R² metric
def r_squared(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    ss_res = tf.reduce_sum(tf.square(y_true - y_pred))
    ss_tot = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1 - ss_res / (ss_tot + tf.keras.backend.epsilon())

# Use R²
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
r2 = r_squared(y_true, y_pred)
print(r2)  # 0.986
```

Characteristics:

  • Measures proportion of variance explained by model
  • Range is (-∞, 1]; negative values are possible (see the sketch below)
  • 1 indicates perfect fit

Use Cases:

  • Regression tasks
  • Scenarios requiring evaluation of model explanatory power
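
Note that R² is not bounded below: a model whose predictions are worse than simply predicting the mean of y_true yields a negative score. Using the r_squared function defined above:

```python
import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([4.0, 3.0, 2.0, 1.0])  # anti-correlated predictions
print(r_squared(y_true, y_pred))  # -3.0 -- worse than predicting the mean (R² = 0)
```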

3. Other Metrics

Top-K Accuracy

```python
import tensorflow as tf
from tensorflow.keras.metrics import TopKCategoricalAccuracy

# Use Top-5 accuracy
top5_acc = TopKCategoricalAccuracy(k=5)

# Calculate Top-5 accuracy
y_true = tf.constant([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]])
y_pred = tf.constant([[0.1, 0.2, 0.3, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]])
top5_acc.update_state(y_true, y_pred)
result = top5_acc.result()
print(result)  # 1.0

# Use in model compilation
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[TopKCategoricalAccuracy(k=5)])
```

Characteristics:

  • Measures whether the true class is among the K highest-probability predictions
  • Suitable for multi-class tasks
  • Commonly used in image classification (a sparse-label variant is shown below)

Use Cases:

  • Large-scale multi-class tasks
  • Image classification
  • Recommendation systems
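
When labels are integer class indices rather than one-hot vectors, tf.keras provides SparseTopKCategoricalAccuracy with the same semantics:

```python
import tensorflow as tf
from tensorflow.keras.metrics import SparseTopKCategoricalAccuracy

# Same example as above, but with an integer label instead of a one-hot vector
top5_acc = SparseTopKCategoricalAccuracy(k=5)
y_true = tf.constant([2])
y_pred = tf.constant([[0.1, 0.2, 0.3, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]])
top5_acc.update_state(y_true, y_pred)
print(top5_acc.result())  # 1.0
```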

Confusion Matrix

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Calculate confusion matrix
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]

# Visualize confusion matrix
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
```

Characteristics:

  • Shows the full per-class breakdown of predictions (precision and recall can be derived from it, as shown below)
  • Suitable for multi-class tasks
  • Visualizes classification performance

Use Cases:

  • Multi-class tasks
  • Scenarios requiring detailed analysis of classification results
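
The threshold-based metrics above can be read directly off the confusion matrix. For the binary matrix [[TN, FP], [FN, TP]] printed above:

```python
# Derive precision and recall from the confusion matrix above
tn, fp, fn, tp = cm.ravel()   # [[3, 1], [1, 3]] -> 3, 1, 1, 3
precision = tp / (tp + fp)    # 3 / 4 = 0.75
recall = tp / (tp + fn)       # 3 / 4 = 0.75
print(precision, recall)
```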

Custom Evaluation Metrics

1. Basic Custom Metric

```python
import tensorflow as tf

# Define custom metric (here: mean absolute error)
def custom_metric(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_true - y_pred))

# Use custom metric
model.compile(optimizer='adam',
              loss='mse',
              metrics=[custom_metric])
```
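
One caveat: Keras evaluates function-based metrics per batch and averages the per-batch values over the epoch, which for ratio-like quantities can differ from computing them over all samples at once; the stateful Metric subclass in the next section avoids this. The function itself can also be called directly on tensors:

```python
import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 2.0, 2.0])
print(custom_metric(y_true, y_pred))  # 0.5 -- mean absolute error over the batch
```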

2. Class-based Custom Metric

```python
import tensorflow as tf

# Define class-based custom metric: fraction of predictions whose
# absolute error exceeds 0.5
class CustomMetric(tf.keras.metrics.Metric):
    def __init__(self, name='custom_metric', **kwargs):
        super().__init__(name=name, **kwargs)
        self.count = self.add_weight(name='count', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Accumulate state across batches
        diff = tf.abs(y_true - y_pred)
        if sample_weight is not None:
            diff = diff * sample_weight
        self.count.assign_add(tf.reduce_sum(tf.cast(diff > 0.5, tf.float32)))
        self.total.assign_add(tf.cast(tf.size(diff), tf.float32))

    def result(self):
        # Calculate result from accumulated state
        return self.count / self.total

    def reset_state(self):  # named reset_states in older TF releases
        self.count.assign(0.0)
        self.total.assign(0.0)

# Use custom metric
custom_metric = CustomMetric()
model.compile(optimizer='adam',
              loss='mse',
              metrics=[custom_metric])
```
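
A stateful metric can be exercised outside of fit() by calling update_state, result, and reset_state directly, which is a convenient way to unit-test the accumulation logic:

```python
import tensorflow as tf

metric = CustomMetric()
metric.update_state(tf.constant([1.0, 2.0, 3.0]), tf.constant([1.2, 2.9, 3.1]))
print(metric.result())  # 0.333... -- one of three errors exceeds the 0.5 threshold
metric.reset_state()    # clears the accumulated count and total between epochs
```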

3. Multi-label Classification Metric

```python
import tensorflow as tf

# Define multi-label (subset) accuracy
def multilabel_accuracy(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    # Convert probabilities to binary predictions
    y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
    # A sample counts as correct only if every label matches
    sample_accuracy = tf.reduce_all(tf.equal(y_true, y_pred_binary), axis=1)
    # Average over samples
    return tf.reduce_mean(tf.cast(sample_accuracy, tf.float32))

# Use multi-label accuracy
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[multilabel_accuracy])
```
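
Note that this is subset (exact-match) accuracy: a sample counts only if every label is right, which is stricter than per-label (Hamming) accuracy. A small comparison:

```python
import tensorflow as tf

y_true = tf.constant([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.9, 0.1, 0.4], [0.1, 0.8, 0.2]])  # first sample misses one label

print(multilabel_accuracy(y_true, y_pred))  # 0.5 -- only the second sample matches exactly

# Per-label (Hamming) accuracy counts each of the 6 label decisions separately
y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
hamming = tf.reduce_mean(tf.cast(tf.equal(y_true, y_pred_binary), tf.float32))
print(hamming)  # 0.833...
```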

4. IoU (Intersection over Union)

```python
import tensorflow as tf

# Define per-class IoU metric for one-hot segmentation targets
class IoU(tf.keras.metrics.Metric):
    def __init__(self, num_classes, name='iou', **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        self.intersection = self.add_weight(
            name='intersection', shape=(num_classes,), initializer='zeros'
        )
        self.union = self.add_weight(
            name='union', shape=(num_classes,), initializer='zeros'
        )

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert one-hot / probability tensors to class indices
        y_pred = tf.argmax(y_pred, axis=-1)
        y_true = tf.argmax(y_true, axis=-1)
        # Accumulate intersection and union for each class
        for i in range(self.num_classes):
            true_mask = tf.cast(y_true == i, tf.float32)
            pred_mask = tf.cast(y_pred == i, tf.float32)
            intersection = tf.reduce_sum(true_mask * pred_mask)
            union = tf.reduce_sum(true_mask + pred_mask) - intersection
            self.intersection[i].assign_add(intersection)
            self.union[i].assign_add(union)

    def result(self):
        # IoU per class
        return self.intersection / (self.union + tf.keras.backend.epsilon())

    def reset_state(self):  # named reset_states in older TF releases
        self.intersection.assign(tf.zeros_like(self.intersection))
        self.union.assign(tf.zeros_like(self.union))

# Use IoU metric
iou = IoU(num_classes=10)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[iou])
```
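
For the common case, tf.keras also ships a built-in tf.keras.metrics.MeanIoU; note that it expects class indices rather than one-hot or probability tensors, so predictions must be argmax-ed first:

```python
import tensorflow as tf

# Built-in mean IoU over class indices
miou = tf.keras.metrics.MeanIoU(num_classes=3)
y_true = tf.constant([0, 1, 2, 1])
y_pred = tf.constant([0, 1, 1, 1])  # already argmax-ed class indices
miou.update_state(y_true, y_pred)
print(miou.result())  # ~0.556 -- mean of per-class IoUs 1.0, 2/3, 0.0
```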

5. Dice Coefficient

```python
import tensorflow as tf

# Define Dice coefficient metric for binary masks
class DiceCoefficient(tf.keras.metrics.Metric):
    def __init__(self, name='dice_coefficient', **kwargs):
        super().__init__(name=name, **kwargs)
        self.intersection = self.add_weight(name='intersection', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert probabilities to a binary mask
        y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        # Accumulate intersection and total mask sizes
        intersection = tf.reduce_sum(y_true * y_pred_binary)
        total = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred_binary)
        self.intersection.assign_add(intersection)
        self.total.assign_add(total)

    def result(self):
        # Dice = 2 * |A ∩ B| / (|A| + |B|)
        return 2.0 * self.intersection / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):  # named reset_states in older TF releases
        self.intersection.assign(0.0)
        self.total.assign(0.0)

# Use Dice coefficient metric
dice = DiceCoefficient()
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[dice])
```
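
For binary masks, the Dice coefficient is algebraically identical to the F1 score, 2TP / (2TP + FP + FN), so the earlier intuition about balancing false positives and false negatives carries over to segmentation. A quick numeric check against the class above:

```python
import tensorflow as tf

dice = DiceCoefficient()
y_true = tf.constant([1.0, 1.0, 0.0, 1.0])
y_pred = tf.constant([0.9, 0.2, 0.1, 0.8])  # thresholded at 0.5 -> [1, 0, 0, 1]
dice.update_state(y_true, y_pred)
print(dice.result())  # 0.8 -- equals F1: TP=2, FP=0, FN=1 -> 2*2 / (2*2 + 0 + 1)
```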

Combining Evaluation Metrics

1. Multi-metric Evaluation

```python
import tensorflow as tf
from tensorflow.keras.metrics import (
    Precision, Recall, F1Score, TopKCategoricalAccuracy
)

# Combine multiple evaluation metrics
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=[
        'accuracy',
        Precision(name='precision'),
        Recall(name='recall'),
        F1Score(average='macro', name='f1_score'),  # TF 2.13+ API; no num_classes argument
        TopKCategoricalAccuracy(k=5, name='top5_accuracy')
    ]
)
```

2. Conditional Metrics

```python
import tensorflow as tf

# Define conditional metric: accuracy over samples selected by a condition
class ConditionalAccuracy(tf.keras.metrics.Metric):
    def __init__(self, condition_fn, name='conditional_accuracy', **kwargs):
        super().__init__(name=name, **kwargs)
        self.condition_fn = condition_fn
        self.correct = self.add_weight(name='correct', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Apply condition function to select samples
        mask = self.condition_fn(y_true, y_pred)
        # Calculate accuracy on the selected samples
        y_pred_class = tf.argmax(y_pred, axis=-1)
        y_true_class = tf.argmax(y_true, axis=-1)
        correct = tf.cast(tf.equal(y_pred_class, y_true_class), tf.float32)
        correct = correct * tf.cast(mask, tf.float32)
        self.correct.assign_add(tf.reduce_sum(correct))
        self.total.assign_add(tf.reduce_sum(tf.cast(mask, tf.float32)))

    def result(self):
        return self.correct / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):
        self.correct.assign(0.0)
        self.total.assign(0.0)

# Use conditional metric (e.g., only measure accuracy on positive samples)
positive_condition = lambda y_true, y_pred: tf.reduce_any(y_true > 0.5, axis=-1)
positive_accuracy = ConditionalAccuracy(positive_condition, name='positive_accuracy')
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', positive_accuracy]
)
```

Evaluation Metrics Best Practices

1. Choose Appropriate Metrics Based on Task

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision, Recall, AUC, F1Score

# Metric objects avoid ambiguity about which string aliases
# (e.g. 'precision', 'f1_score') a given Keras version accepts

# Classification task
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', Precision(), Recall(), F1Score(average='macro')]
)

# Regression task
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae', 'mse']
)

# Imbalanced classification task
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[Precision(), Recall(), AUC()]
)
```

2. Use Multiple Metrics for Comprehensive Evaluation

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision, Recall, AUC, TopKCategoricalAccuracy

# Combine multiple metrics
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=[
        'accuracy',
        Precision(name='precision'),
        Recall(name='recall'),
        AUC(name='auc'),
        TopKCategoricalAccuracy(k=5, name='top5_accuracy')
    ]
)
```

3. Monitor Metric Changes

```python
import tensorflow as tf

# Custom callback to monitor metrics
class MetricsMonitor(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"Epoch {epoch}:")
        print(f"  Accuracy: {logs['accuracy']:.4f}")
        print(f"  Precision: {logs['precision']:.4f}")
        print(f"  Recall: {logs['recall']:.4f}")
        print(f"  AUC: {logs['auc']:.4f}")

# Use monitoring callback (metric names must match those passed to compile)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          callbacks=[MetricsMonitor()])
```

4. Visualize Metrics

```python
import matplotlib.pyplot as plt

# Plot metric curves
def plot_metrics(history):
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    # Accuracy
    axes[0, 0].plot(history.history['accuracy'], label='Training Accuracy')
    axes[0, 0].plot(history.history['val_accuracy'], label='Validation Accuracy')
    axes[0, 0].set_title('Accuracy')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].set_ylabel('Accuracy')
    axes[0, 0].legend()

    # Precision
    axes[0, 1].plot(history.history['precision'], label='Training Precision')
    axes[0, 1].plot(history.history['val_precision'], label='Validation Precision')
    axes[0, 1].set_title('Precision')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].set_ylabel('Precision')
    axes[0, 1].legend()

    # Recall
    axes[1, 0].plot(history.history['recall'], label='Training Recall')
    axes[1, 0].plot(history.history['val_recall'], label='Validation Recall')
    axes[1, 0].set_title('Recall')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylabel('Recall')
    axes[1, 0].legend()

    # AUC
    axes[1, 1].plot(history.history['auc'], label='Training AUC')
    axes[1, 1].plot(history.history['val_auc'], label='Validation AUC')
    axes[1, 1].set_title('AUC')
    axes[1, 1].set_xlabel('Epoch')
    axes[1, 1].set_ylabel('AUC')
    axes[1, 1].legend()

    plt.tight_layout()
    plt.show()

# Use
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50)
plot_metrics(history)
```

Summary

TensorFlow provides rich evaluation metrics:

  • Classification Metrics: Accuracy, Precision, Recall, F1 Score, AUC-ROC
  • Regression Metrics: MSE, MAE, MAPE, R²
  • Other Metrics: Top-K Accuracy, Confusion Matrix, IoU, Dice
  • Custom Metrics: Can create custom evaluation metrics for specific needs
  • Metric Combination: Can combine multiple metrics for comprehensive model evaluation

Choosing appropriate evaluation metrics requires considering the task type, the characteristics of the data, and business requirements. Combining multiple metrics gives a more comprehensive picture of model performance.

Tags: Tensorflow