Preprocessor

7.4. Preprocessor

Section 7.3 introduces the general-purpose interfaces map() and map_batches(). For structured tabular data, Ray Data provides a higher-level API called the Preprocessor, built on top of map() and map_batches(). The Preprocessor API consists of a series of feature processing operations and integrates more closely with machine learning model training and inference. It is similar to scikit-learn's sklearn.preprocessing, which makes it easy for scikit-learn users to get started quickly. For unstructured data such as images or videos, map() or map_batches() is still recommended.
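
To make the idea concrete, the following is a minimal sketch in plain Python of the fit/transform pattern that a preprocessor follows. It is not Ray's implementation: the class `ToyStandardScaler`, its `transform_batch` method, and the dict-of-lists batch layout are all hypothetical names chosen for illustration. The sketch only shows how a feature operation can first compute statistics over the whole dataset and then be applied batch by batch, which is the kind of work map_batches() does.

```python
import math
from typing import Dict, List

# A column-oriented batch: column name -> list of values.
# This layout is an assumption made for the sketch.
Batch = Dict[str, List[float]]


class ToyStandardScaler:
    """Hypothetical stand-in for a preprocessor; not Ray's implementation."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        self.stats_: Dict[str, tuple] = {}

    def fit(self, batches: List[Batch]) -> "ToyStandardScaler":
        # Compute the mean and population standard deviation of each
        # selected column over all batches.
        for col in self.columns:
            values = [v for batch in batches for v in batch[col]]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            self.stats_[col] = (mean, math.sqrt(var))
        return self

    def transform_batch(self, batch: Batch) -> Batch:
        # Apply the fitted statistics to one batch; this is the function
        # one would hand to a batch-wise map operation.
        out = dict(batch)
        for col in self.columns:
            mean, std = self.stats_[col]
            out[col] = [(v - mean) / std if std > 0 else 0.0 for v in batch[col]]
        return out


batches = [
    {"x": [1.0, 2.0], "y": [10.0, 20.0]},
    {"x": [3.0, 4.0], "y": [30.0, 40.0]},
]
scaler = ToyStandardScaler(columns=["x"]).fit(batches)
scaled = [scaler.transform_batch(b) for b in batches]
```

With Ray Data itself, the corresponding usage is expected to look like `StandardScaler(columns=["x"]).fit_transform(ds)`, with `StandardScaler` imported from `ray.data.preprocessors`; check the exact signature against the Ray version you have installed.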