In TensorFlow, step and epoch are two terms that come up constantly in neural network training. They describe different units of data processing and iteration.
1. Step
A step is one training iteration on a single batch of data: one forward pass, one backward pass, and the weight update that follows. In other words, completing one step means processing exactly one batch.
Example:
Suppose you have a dataset with 1000 samples. If you set the batch size to 100, processing the entire dataset requires 10 steps (1000 / 100 = 10).
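The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `steps_per_epoch` is hypothetical and not part of TensorFlow, and it assumes a final, smaller batch is kept when the dataset size is not an exact multiple of the batch size.

```python
def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Return how many batches are needed to see every sample once.

    Hypothetical helper for illustration. Assumes the last, partial
    batch is kept, so the result is rounded up.
    """
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # Integer ceiling division: avoids floating-point rounding.
    return (num_samples + batch_size - 1) // batch_size


print(steps_per_epoch(1000, 100))  # 10, matching the example above
print(steps_per_epoch(1050, 100))  # 11, because the last batch holds only 50 samples
```

If the input pipeline is configured to drop the incomplete final batch instead, the count would be rounded down rather than up.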
2. Epoch
An epoch is one complete pass over the entire dataset, meaning the model processes every sample once. The number of steps per epoch therefore equals the total number of samples divided by the batch size, rounded up when the division is not exact and the final, smaller batch is kept.
Example:
Continuing with the previous example, if your dataset has 1000 samples and the batch size is set to 100, each epoch contains 10 steps. If you set the training process to 10 epochs, the total number of steps will be 100 (10 epochs * 10 steps/epoch).
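The relationship between epochs and steps can also be seen as a nested loop: the outer loop runs once per epoch, the inner loop once per batch. The sketch below uses plain Python rather than TensorFlow, and the names `iterate_batches` and `count_training_steps` are hypothetical. It only counts iterations; a real training loop would do the forward pass, backward pass, and weight update at the marked point.

```python
def iterate_batches(num_samples: int, batch_size: int):
    """Yield (start, end) index ranges, one range per step."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


def count_training_steps(num_samples: int, batch_size: int, num_epochs: int) -> int:
    """Count the total number of steps taken over all epochs."""
    global_step = 0
    for epoch in range(num_epochs):  # one epoch = one full pass over the data
        for start, end in iterate_batches(num_samples, batch_size):
            # A real loop would train on samples[start:end] here.
            global_step += 1  # one step = one batch processed
    return global_step


print(count_training_steps(1000, 100, 10))  # 100, i.e. 10 epochs * 10 steps/epoch
```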
Summary
- Step focuses on the process of a single iteration.
- Epoch focuses on the complete traversal of the entire dataset.
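These concepts help us understand and control the progress of model training. The batch size determines how many steps make up one epoch, and the number of epochs determines how many times the model sees the full dataset. Both choices affect how long training takes and how well the model ends up performing, which is why they matter in practice.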