
What Evaluation Metrics Are Available in TensorFlow and How to Create Custom Metrics

February 18, 17:58

Evaluation metrics quantify how well a model performs, and they are essential tools for developing and tuning deep learning models.

Common Evaluation Metrics

1. Classification Metrics

Accuracy

```python
import tensorflow as tf
from tensorflow.keras.metrics import Accuracy

# Use accuracy metric
accuracy = Accuracy()

# Calculate accuracy
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
accuracy.update_state(y_true, y_pred)
result = accuracy.result()
print(result)  # 0.8

# Use in model compilation
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

Characteristics:

  • Intuitive and easy to understand
  • Suitable for balanced datasets
  • Sensitive to class imbalance (see the sketch below)

Use Cases:

  • Balanced classification tasks
  • Scenarios requiring simple evaluation
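
Accuracy's weakness on imbalanced data is easy to demonstrate: a model that always predicts the majority class scores high without learning anything. A minimal sketch (the 95/5 split here is an assumed toy example):

```python
import tensorflow as tf
from tensorflow.keras.metrics import Accuracy

# Toy imbalanced labels: 95 negatives, 5 positives
y_true = tf.concat([tf.zeros(95, dtype=tf.int32), tf.ones(5, dtype=tf.int32)], axis=0)
# A "model" that always predicts the majority class
y_pred = tf.zeros(100, dtype=tf.int32)

accuracy = Accuracy()
accuracy.update_state(y_true, y_pred)
print(accuracy.result())  # 0.95 -- looks strong, yet no positive was ever found
```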

Precision

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision

# Use precision metric
precision = Precision()

# Calculate precision
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
precision.update_state(y_true, y_pred)
result = precision.result()
print(result)  # 1.0

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[Precision()])
```

Characteristics:

  • Measures accuracy of positive predictions
  • Suitable for scenarios focusing on false positives
  • More informative than plain accuracy when classes are imbalanced

Use Cases:

  • Spam detection
  • Medical diagnosis
  • Scenarios requiring reduced false positives

Recall

```python
import tensorflow as tf
from tensorflow.keras.metrics import Recall

# Use recall metric
recall = Recall()

# Calculate recall
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0, 1, 0, 0, 1])
recall.update_state(y_true, y_pred)
result = recall.result()
print(result)  # 0.666...

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[Recall()])
```

Characteristics:

  • Measures ability to identify positive samples
  • Suitable for scenarios focusing on false negatives
  • Unaffected by the number of negative samples, so it stays meaningful under class imbalance

Use Cases:

  • Disease screening
  • Anomaly detection
  • Scenarios requiring reduced false negatives

F1 Score

```python
import tensorflow as tf
from tensorflow.keras.metrics import F1Score  # built into TF 2.13+; older versions use tensorflow_addons

# Use F1 score metric (average=None returns one score per class)
f1 = F1Score(average=None, threshold=0.5)

# Calculate F1 score
y_true = tf.constant([[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]], dtype=tf.float32)
y_pred = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.9, 0.1]])
f1.update_state(y_true, y_pred)
result = f1.result()
print(result)  # [0.8, 0.8]

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[F1Score(average=None, threshold=0.5)])
```

Characteristics:

  • Harmonic mean of precision and recall (worked example below)
  • Balances precision and recall
  • Suitable for imbalanced datasets

Use Cases:

  • Imbalanced classification tasks
  • Scenarios requiring balanced precision and recall
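
The harmonic mean penalizes imbalance between precision and recall more than an arithmetic mean would. A quick check using the precision (1.0) and recall (2/3) from the earlier examples:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall
p, r = 1.0, 2.0 / 3.0
f1 = 2 * p * r / (p + r)
print(f1)  # 0.8 -- below the arithmetic mean (0.833...), pulled toward the lower value
```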

AUC-ROC

```python
import tensorflow as tf
from tensorflow.keras.metrics import AUC

# Use AUC metric
auc = AUC()

# Calculate AUC
y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.2, 0.7])
auc.update_state(y_true, y_pred)
result = auc.result()
print(result)  # 1.0 -- every positive is scored above every negative

# Use in model compilation
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[AUC()])
```

Characteristics:

  • Measures overall performance of classifier
  • Not tied to a single decision threshold (see the sketch below)
  • Suitable for binary classification problems

Use Cases:

  • Binary classification tasks
  • Scenarios requiring overall performance evaluation
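
Because AUC is computed from the ranking of scores rather than from one cutoff, any monotonic rescaling of the predictions leaves it unchanged. A small sketch reusing the example above:

```python
import tensorflow as tf
from tensorflow.keras.metrics import AUC

y_true = tf.constant([0, 1, 1, 0, 1])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.2, 0.7])

auc = AUC()
auc.update_state(y_true, y_pred)
print(auc.result())  # 1.0

# Squashing the scores preserves their order, so AUC is unchanged
auc.reset_state()
auc.update_state(y_true, y_pred * 0.5)
print(auc.result())  # still 1.0
```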

2. Regression Metrics

Mean Squared Error (MSE)

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanSquaredError

# Use MSE metric
mse = MeanSquaredError()

# Calculate MSE
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
mse.update_state(y_true, y_pred)
result = mse.result()
print(result)  # 0.0175

# Use in model compilation
model.compile(optimizer='adam',
              loss='mse',
              metrics=[MeanSquaredError()])
```

Characteristics:

  • Measures the average squared difference between predicted and true values
  • Sensitive to outliers
  • Suitable for continuous value prediction

Use Cases:

  • Regression tasks
  • Scenarios requiring precise prediction

Mean Absolute Error (MAE)

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanAbsoluteError

# Use MAE metric
mae = MeanAbsoluteError()

# Calculate MAE
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
mae.update_state(y_true, y_pred)
result = mae.result()
print(result)  # 0.125

# Use in model compilation
model.compile(optimizer='adam',
              loss='mae',
              metrics=[MeanAbsoluteError()])
```

Characteristics:

  • Measures the average absolute difference between predicted and true values
  • Less sensitive to outliers than MSE (see the comparison below)
  • Suitable for regression tasks with outliers

Use Cases:

  • Regression tasks
  • Data with outliers
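
The different outlier sensitivity of MSE and MAE is easy to see side by side: one bad prediction dominates MSE because its error is squared, while it enters MAE only linearly. A sketch with an assumed outlier:

```python
import tensorflow as tf
from tensorflow.keras.metrics import MeanSquaredError, MeanAbsoluteError

y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 14.0])  # last prediction is an outlier (error = 10)

mse = MeanSquaredError()
mae = MeanAbsoluteError()
mse.update_state(y_true, y_pred)
mae.update_state(y_true, y_pred)
print(mse.result())  # ~25.015 -- dominated by the squared outlier (100 / 4)
print(mae.result())  # 2.6 -- the outlier contributes only 10 / 4
```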

Mean Absolute Percentage Error (MAPE)

```python
import tensorflow as tf

# Custom MAPE metric
def mean_absolute_percentage_error(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    diff = tf.abs((y_true - y_pred) / y_true)
    return 100.0 * tf.reduce_mean(diff)

# Use MAPE
y_true = tf.constant([100.0, 200.0, 300.0])
y_pred = tf.constant([110.0, 190.0, 310.0])
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)  # 6.111...
```

Characteristics:

  • Measures percentage error of predictions
  • Intuitive and easy to understand
  • Unstable when true values are close to zero (see the sketch below)

Use Cases:

  • Scenarios requiring percentage error
  • Time series prediction
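
The near-zero instability is worth checking before adopting MAPE: a tiny absolute error on a target close to zero produces a huge percentage. Using the mean_absolute_percentage_error function defined above:

```python
import tensorflow as tf

# Same small absolute error (0.1) on very different target scales
y_true = tf.constant([100.0, 0.1])
y_pred = tf.constant([100.1, 0.2])
print(mean_absolute_percentage_error(y_true, y_pred))  # ~50.05 -- the near-zero target dominates
```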

R-squared (R²)

```python
import tensorflow as tf

# Custom R² metric
def r_squared(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    ss_res = tf.reduce_sum(tf.square(y_true - y_pred))
    ss_tot = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1 - ss_res / (ss_tot + tf.keras.backend.epsilon())

# Use R²
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.1, 2.2, 2.9, 4.1])
r2 = r_squared(y_true, y_pred)
print(r2)  # 0.986
```

Characteristics:

  • Measures proportion of variance explained by model
  • Range is (-∞, 1]; negative values are possible (see the sketch below)
  • 1 indicates perfect fit

Use Cases:

  • Regression tasks
  • Scenarios requiring evaluation of model explanatory power
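
Note that R² is not bounded below: a model whose predictions are worse than simply predicting the mean of y_true yields a negative score. Using the r_squared function defined above:

```python
import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([4.0, 3.0, 2.0, 1.0])  # anti-correlated predictions
print(r_squared(y_true, y_pred))  # -3.0 -- worse than predicting the mean (R² = 0)
```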

3. Other Metrics

Top-K Accuracy

```python
import tensorflow as tf
from tensorflow.keras.metrics import TopKCategoricalAccuracy

# Use Top-5 accuracy
top5_acc = TopKCategoricalAccuracy(k=5)

# Calculate Top-5 accuracy
y_true = tf.constant([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]])
y_pred = tf.constant([[0.1, 0.2, 0.3, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]])
top5_acc.update_state(y_true, y_pred)
result = top5_acc.result()
print(result)  # 1.0

# Use in model compilation
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[TopKCategoricalAccuracy(k=5)])
```

Characteristics:

  • Measures whether the true class is among the K highest-probability predictions
  • Suitable for multi-class tasks
  • Commonly used in image classification (a sparse-label variant is shown below)

Use Cases:

  • Large-scale multi-class tasks
  • Image classification
  • Recommendation systems
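
When labels are integer class indices rather than one-hot vectors, tf.keras provides SparseTopKCategoricalAccuracy with the same semantics:

```python
import tensorflow as tf
from tensorflow.keras.metrics import SparseTopKCategoricalAccuracy

# Same example as above, but with an integer label instead of a one-hot vector
top5_acc = SparseTopKCategoricalAccuracy(k=5)
y_true = tf.constant([2])
y_pred = tf.constant([[0.1, 0.2, 0.3, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]])
top5_acc.update_state(y_true, y_pred)
print(top5_acc.result())  # 1.0
```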

Confusion Matrix

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Calculate confusion matrix
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]

# Visualize confusion matrix
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
```

Characteristics:

  • Shows the full per-class breakdown of predictions (precision and recall can be derived from it, as shown below)
  • Suitable for multi-class tasks
  • Visualizes classification performance

Use Cases:

  • Multi-class tasks
  • Scenarios requiring detailed analysis of classification results
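
The threshold-based metrics above can be read directly off the confusion matrix. For the binary matrix [[TN, FP], [FN, TP]] printed above:

```python
# Derive precision and recall from the confusion matrix above
tn, fp, fn, tp = cm.ravel()   # [[3, 1], [1, 3]] -> 3, 1, 1, 3
precision = tp / (tp + fp)    # 3 / 4 = 0.75
recall = tp / (tp + fn)       # 3 / 4 = 0.75
print(precision, recall)
```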

Custom Evaluation Metrics

1. Basic Custom Metric

```python
import tensorflow as tf

# Define custom metric (here: mean absolute error)
def custom_metric(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_true - y_pred))

# Use custom metric
model.compile(optimizer='adam',
              loss='mse',
              metrics=[custom_metric])
```
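
One caveat: Keras evaluates function-based metrics per batch and averages the per-batch values over the epoch, which for ratio-like quantities can differ from computing them over all samples at once; the stateful Metric subclass in the next section avoids this. The function itself can also be called directly on tensors:

```python
import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 2.0, 2.0])
print(custom_metric(y_true, y_pred))  # 0.5 -- mean absolute error over the batch
```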

2. Class-based Custom Metric

```python
import tensorflow as tf

# Define class-based custom metric: fraction of predictions whose
# absolute error exceeds 0.5
class CustomMetric(tf.keras.metrics.Metric):
    def __init__(self, name='custom_metric', **kwargs):
        super().__init__(name=name, **kwargs)
        self.count = self.add_weight(name='count', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Accumulate state across batches
        diff = tf.abs(y_true - y_pred)
        if sample_weight is not None:
            diff = diff * sample_weight
        self.count.assign_add(tf.reduce_sum(tf.cast(diff > 0.5, tf.float32)))
        self.total.assign_add(tf.cast(tf.size(diff), tf.float32))

    def result(self):
        # Calculate result from accumulated state
        return self.count / self.total

    def reset_state(self):  # named reset_states in older TF releases
        self.count.assign(0.0)
        self.total.assign(0.0)

# Use custom metric
custom_metric = CustomMetric()
model.compile(optimizer='adam',
              loss='mse',
              metrics=[custom_metric])
```
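
A stateful metric can be exercised outside of fit() by calling update_state, result, and reset_state directly, which is a convenient way to unit-test the accumulation logic:

```python
import tensorflow as tf

metric = CustomMetric()
metric.update_state(tf.constant([1.0, 2.0, 3.0]), tf.constant([1.2, 2.9, 3.1]))
print(metric.result())  # 0.333... -- one of three errors exceeds the 0.5 threshold
metric.reset_state()    # clears the accumulated count and total between epochs
```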

3. Multi-label Classification Metric

```python
import tensorflow as tf

# Define multi-label (subset) accuracy
def multilabel_accuracy(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    # Convert probabilities to binary predictions
    y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
    # A sample counts as correct only if every label matches
    sample_accuracy = tf.reduce_all(tf.equal(y_true, y_pred_binary), axis=1)
    # Average over samples
    return tf.reduce_mean(tf.cast(sample_accuracy, tf.float32))

# Use multi-label accuracy
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[multilabel_accuracy])
```
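
Note that this is subset (exact-match) accuracy: a sample counts only if every label is right, which is stricter than per-label (Hamming) accuracy. A small comparison:

```python
import tensorflow as tf

y_true = tf.constant([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.9, 0.1, 0.4], [0.1, 0.8, 0.2]])  # first sample misses one label

print(multilabel_accuracy(y_true, y_pred))  # 0.5 -- only the second sample matches exactly

# Per-label (Hamming) accuracy counts each of the 6 label decisions separately
y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
hamming = tf.reduce_mean(tf.cast(tf.equal(y_true, y_pred_binary), tf.float32))
print(hamming)  # 0.833...
```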

4. IoU (Intersection over Union)

```python
import tensorflow as tf

# Define per-class IoU metric for one-hot segmentation targets
class IoU(tf.keras.metrics.Metric):
    def __init__(self, num_classes, name='iou', **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        self.intersection = self.add_weight(
            name='intersection', shape=(num_classes,), initializer='zeros'
        )
        self.union = self.add_weight(
            name='union', shape=(num_classes,), initializer='zeros'
        )

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert one-hot / probability tensors to class indices
        y_pred = tf.argmax(y_pred, axis=-1)
        y_true = tf.argmax(y_true, axis=-1)
        # Accumulate intersection and union for each class
        for i in range(self.num_classes):
            true_mask = tf.cast(y_true == i, tf.float32)
            pred_mask = tf.cast(y_pred == i, tf.float32)
            intersection = tf.reduce_sum(true_mask * pred_mask)
            union = tf.reduce_sum(true_mask + pred_mask) - intersection
            self.intersection[i].assign_add(intersection)
            self.union[i].assign_add(union)

    def result(self):
        # IoU per class
        return self.intersection / (self.union + tf.keras.backend.epsilon())

    def reset_state(self):  # named reset_states in older TF releases
        self.intersection.assign(tf.zeros_like(self.intersection))
        self.union.assign(tf.zeros_like(self.union))

# Use IoU metric
iou = IoU(num_classes=10)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[iou])
```
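
For the common case, tf.keras also ships a built-in tf.keras.metrics.MeanIoU; note that it expects class indices rather than one-hot or probability tensors, so predictions must be argmax-ed first:

```python
import tensorflow as tf

# Built-in mean IoU over class indices
miou = tf.keras.metrics.MeanIoU(num_classes=3)
y_true = tf.constant([0, 1, 2, 1])
y_pred = tf.constant([0, 1, 1, 1])  # already argmax-ed class indices
miou.update_state(y_true, y_pred)
print(miou.result())  # ~0.556 -- mean of per-class IoUs 1.0, 2/3, 0.0
```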

5. Dice Coefficient

```python
import tensorflow as tf

# Define Dice coefficient metric for binary masks
class DiceCoefficient(tf.keras.metrics.Metric):
    def __init__(self, name='dice_coefficient', **kwargs):
        super().__init__(name=name, **kwargs)
        self.intersection = self.add_weight(name='intersection', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert probabilities to a binary mask
        y_pred_binary = tf.cast(y_pred > 0.5, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        # Accumulate intersection and total mask sizes
        intersection = tf.reduce_sum(y_true * y_pred_binary)
        total = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred_binary)
        self.intersection.assign_add(intersection)
        self.total.assign_add(total)

    def result(self):
        # Dice = 2 * |A ∩ B| / (|A| + |B|)
        return 2.0 * self.intersection / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):  # named reset_states in older TF releases
        self.intersection.assign(0.0)
        self.total.assign(0.0)

# Use Dice coefficient metric
dice = DiceCoefficient()
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[dice])
```
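
For binary masks, the Dice coefficient is algebraically identical to the F1 score, 2TP / (2TP + FP + FN), so the earlier intuition about balancing false positives and false negatives carries over to segmentation. A quick numeric check against the class above:

```python
import tensorflow as tf

dice = DiceCoefficient()
y_true = tf.constant([1.0, 1.0, 0.0, 1.0])
y_pred = tf.constant([0.9, 0.2, 0.1, 0.8])  # thresholded at 0.5 -> [1, 0, 0, 1]
dice.update_state(y_true, y_pred)
print(dice.result())  # 0.8 -- equals F1: TP=2, FP=0, FN=1 -> 2*2 / (2*2 + 0 + 1)
```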

Combining Evaluation Metrics

1. Multi-metric Evaluation

```python
import tensorflow as tf
from tensorflow.keras.metrics import (
    Precision, Recall, F1Score, TopKCategoricalAccuracy
)

# Combine multiple evaluation metrics
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=[
        'accuracy',
        Precision(name='precision'),
        Recall(name='recall'),
        F1Score(average='macro', name='f1_score'),  # TF 2.13+ API; no num_classes argument
        TopKCategoricalAccuracy(k=5, name='top5_accuracy')
    ]
)
```

2. Conditional Metrics

```python
import tensorflow as tf

# Define conditional metric: accuracy over samples selected by a condition
class ConditionalAccuracy(tf.keras.metrics.Metric):
    def __init__(self, condition_fn, name='conditional_accuracy', **kwargs):
        super().__init__(name=name, **kwargs)
        self.condition_fn = condition_fn
        self.correct = self.add_weight(name='correct', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Apply condition function to select samples
        mask = self.condition_fn(y_true, y_pred)
        # Calculate accuracy on the selected samples
        y_pred_class = tf.argmax(y_pred, axis=-1)
        y_true_class = tf.argmax(y_true, axis=-1)
        correct = tf.cast(tf.equal(y_pred_class, y_true_class), tf.float32)
        correct = correct * tf.cast(mask, tf.float32)
        self.correct.assign_add(tf.reduce_sum(correct))
        self.total.assign_add(tf.reduce_sum(tf.cast(mask, tf.float32)))

    def result(self):
        return self.correct / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):
        self.correct.assign(0.0)
        self.total.assign(0.0)

# Use conditional metric (e.g., only measure accuracy on positive samples)
positive_condition = lambda y_true, y_pred: tf.reduce_any(y_true > 0.5, axis=-1)
positive_accuracy = ConditionalAccuracy(positive_condition, name='positive_accuracy')
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', positive_accuracy]
)
```

Evaluation Metrics Best Practices

1. Choose Appropriate Metrics Based on Task

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision, Recall, AUC, F1Score

# Metric objects avoid ambiguity about which string aliases
# (e.g. 'precision', 'f1_score') a given Keras version accepts

# Classification task
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', Precision(), Recall(), F1Score(average='macro')]
)

# Regression task
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae', 'mse']
)

# Imbalanced classification task
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[Precision(), Recall(), AUC()]
)
```

2. Use Multiple Metrics for Comprehensive Evaluation

```python
import tensorflow as tf
from tensorflow.keras.metrics import Precision, Recall, AUC, TopKCategoricalAccuracy

# Combine multiple metrics
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=[
        'accuracy',
        Precision(name='precision'),
        Recall(name='recall'),
        AUC(name='auc'),
        TopKCategoricalAccuracy(k=5, name='top5_accuracy')
    ]
)
```

3. Monitor Metric Changes

```python
import tensorflow as tf

# Custom callback to monitor metrics
class MetricsMonitor(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"Epoch {epoch}:")
        print(f"  Accuracy: {logs['accuracy']:.4f}")
        print(f"  Precision: {logs['precision']:.4f}")
        print(f"  Recall: {logs['recall']:.4f}")
        print(f"  AUC: {logs['auc']:.4f}")

# Use monitoring callback (metric names must match those passed to compile)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          callbacks=[MetricsMonitor()])
```

4. Visualize Metrics

```python
import matplotlib.pyplot as plt

# Plot metric curves
def plot_metrics(history):
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    # Accuracy
    axes[0, 0].plot(history.history['accuracy'], label='Training Accuracy')
    axes[0, 0].plot(history.history['val_accuracy'], label='Validation Accuracy')
    axes[0, 0].set_title('Accuracy')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].set_ylabel('Accuracy')
    axes[0, 0].legend()

    # Precision
    axes[0, 1].plot(history.history['precision'], label='Training Precision')
    axes[0, 1].plot(history.history['val_precision'], label='Validation Precision')
    axes[0, 1].set_title('Precision')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].set_ylabel('Precision')
    axes[0, 1].legend()

    # Recall
    axes[1, 0].plot(history.history['recall'], label='Training Recall')
    axes[1, 0].plot(history.history['val_recall'], label='Validation Recall')
    axes[1, 0].set_title('Recall')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylabel('Recall')
    axes[1, 0].legend()

    # AUC
    axes[1, 1].plot(history.history['auc'], label='Training AUC')
    axes[1, 1].plot(history.history['val_auc'], label='Validation AUC')
    axes[1, 1].set_title('AUC')
    axes[1, 1].set_xlabel('Epoch')
    axes[1, 1].set_ylabel('AUC')
    axes[1, 1].legend()

    plt.tight_layout()
    plt.show()

# Use
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50)
plot_metrics(history)
```

Summary

TensorFlow provides rich evaluation metrics:

  • Classification Metrics: Accuracy, Precision, Recall, F1 Score, AUC-ROC
  • Regression Metrics: MSE, MAE, MAPE, R²
  • Other Metrics: Top-K Accuracy, Confusion Matrix, IoU, Dice
  • Custom Metrics: Can create custom evaluation metrics for specific needs
  • Metric Combination: Can combine multiple metrics for comprehensive model evaluation

Choosing appropriate evaluation metrics requires considering the task type, the characteristics of the data, and business requirements. Combining multiple metrics gives a more comprehensive picture of model performance.

Tags: Tensorflow