Abstract:
The problem of singing voice extraction from mono audio recordings, i.e., single-microphone separation of voice and music, is studied. The approaches considered are based on a priori probabilistic models of the two sources. A comparative study of different models and estimators is carried out, together with a study of the impact of heterogeneity between the training data and the data to be separated. We show that adapting the music model on the non-vocal parts yields good results in realistic conditions.