Abstract:
We study singing voice extraction from mono audio recordings, i.e., single-microphone separation of voice and music. The approach is based on a priori probabilistic models of the two sources, specifically Gaussian Mixture Models (GMMs). We develop a method for adapting the models to the characteristics of the mixed sources and carry out a comparative study of different models and estimators. We show that adapting the music model from the non-vocal parts of songs yields good results in realistic conditions.