Generate a comprehensive ML model evaluation framework with multi-metric scoring, statistical significance testing, and visual reports.
Select your task type and list the models to compare. Prepare a test dataset with ground-truth labels. Run the benchmark with config.yaml to generate comparative reports.
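As a rough illustration of what multi-metric scoring with a significance check could look like for a classification task, here is a minimal sketch. It is not the framework this prompt generates: the function names (`accuracy`, `macro_f1`, `paired_bootstrap`), the toy labels, and the choice of a paired bootstrap as the significance test are all assumptions made for the example.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def paired_bootstrap(y_true, pred_a, pred_b, metric, n_resamples=2000, seed=0):
    """Paired bootstrap comparison of two models on the same test set.

    Returns the observed metric difference (A minus B) and the fraction of
    resampled test sets on which A does not score higher than B.
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = metric(y_true, pred_a) - metric(y_true, pred_b)
    not_better = 0
    for _ in range(n_resamples):
        # Resample test examples with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        diff = metric(yt, [pred_a[i] for i in idx]) - metric(yt, [pred_b[i] for i in idx])
        if diff <= 0:
            not_better += 1
    return observed, not_better / n_resamples


# Toy ground truth and predictions (hypothetical data, for illustration only).
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
model_a = [0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
model_b = [0, 1, 0, 0, 1, 1, 1, 0, 0, 1]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "accuracy:", accuracy(y_true, preds), "macro F1:", macro_f1(y_true, preds))

observed_diff, p_not_better = paired_bootstrap(y_true, model_a, model_b, accuracy)
print("accuracy difference (A - B):", observed_diff)
print("share of resamples where A does not beat B:", p_not_better)
```

A small share in the last line suggests the gap between the two models is unlikely to be an artifact of which test examples happened to be sampled. With a test set this small the estimate is coarse, so a real comparison would use far more labeled examples.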
Initial release