7.4. Preprocessor
Section 7.3 introduces the general-purpose interfaces map() and map_batches(). For structured tabular data, Ray Data provides a higher-level API called the Preprocessor, which is built on top of map() and map_batches(). A Preprocessor encapsulates a series of feature-processing operations and integrates more closely with machine learning model training and inference. Its design is similar to scikit-learn's sklearn.preprocessing module, so scikit-learn users can pick it up quickly. For unstructured data such as images or videos, map() or map_batches() is still the recommended approach.