Is the performance of my deep network too good to be true? A direct approach to estimating the Bayes error in binary classification