One microphone singing voice separation
One microphone source separation
Multichannel nonnegative matrix factorization for convolutive blind source separation
Factorial scaled hidden Markov model for single channel speech / music separation
SARAH project instrument extraction demos:
Sound examples for the paper: "One microphone singing voice separation using source-adapted models", A. Ozerov, P. Philippe, R. Gribonval and F. Bimbot, In IEEE Worksh. on Apps. of Signal Processing to Audio and Acoustics (WASPAA'05), pages 90-93, Mohonk, NY, Oct. 2005.
Example 1 (in English) | Mixture (= Music + Voice): WAV | Estimated Music: WAV | Estimated Voice: WAV |
Example 2 (in Russian) | Mixture (= Music + Voice): WAV | Estimated Music: WAV | Estimated Voice: WAV |
Sound examples for the paper: "Séparation voix / musique à partir d'enregistrements mono : quelques remarques sur le choix et l'adaptation des modèles", A. Ozerov, R. Gribonval, P. Philippe and F. Bimbot, In GRETSI'05 Symposium on Signal and Image Processing, Louvain-la-Neuve, Belgium, Sept. 2005.
Sound examples for the paper: "Adaptation of Bayesian models for single channel source separation and its application to voice / music separation in popular songs", A. Ozerov, P. Philippe, F. Bimbot and R. Gribonval, IEEE Trans. on Audio, Speech and Lang. Proc., special issue on Blind Signal Proc. for Speech and Audio Applications, vol. 15, no. 5, pp. 1564-1578, July 2007.
Note that, in contrast to examples 1 - 3, which were obtained using manual vocal/non-vocal segmentations, the examples below were obtained in a completely automatic manner.
Example | Language | Style | Mixture (= Music + Voice) | Estimated Music | Estimated Voice |
Example 4 | English | pop | WAV | WAV | WAV |
Example 5 | Spanish | pop/rock | WAV | WAV | WAV |
Example 6 | French | rock | WAV | WAV | WAV |
Example 7 | English | rock | WAV | WAV | WAV |
Example 8 | English | rock | WAV | WAV | WAV |
Example 9 | Spanish | pop/rock | WAV | WAV | WAV |
Example 10 | Russian | rock | WAV | WAV | WAV |
Example 11 | Russian | rock | WAV | WAV | WAV |
Example 12 | Spanish | rock | WAV | WAV | WAV |
Example 13 | Spanish | ska/rock | WAV | WAV | WAV |
Example 14 | English | rap/rock | WAV | WAV | WAV |
Example 15 | English | techno/rock | WAV | WAV | WAV |
Sound examples for the paper: "Single channel source separation using static and dynamic features in the power domain", I. Potamitis and A. Ozerov, In EUSIPCO, 16th European Signal Processing Conference, Lausanne, Switzerland, August 25-29, 2008.
Example | Separation method | Estimated Source 1 | Estimated Source 2 |
Example 1: Cicada / Cricket. Mixture (= Cicada + Cricket): WAV | Spectral | Estimated Cicada: WAV | Estimated Cricket: WAV |
| Power Static | Estimated Cicada: WAV | Estimated Cricket: WAV |
| Power Static+Dynamic | Estimated Cicada: WAV | Estimated Cricket: WAV |
Example 2: Speech 1 / Speech 2. Mixture (= Speech 1 + Speech 2): WAV | Spectral | Estimated Speech 1: WAV | Estimated Speech 2: WAV |
| Power Static | Estimated Speech 1: WAV | Estimated Speech 2: WAV |
| Power Static+Dynamic | Estimated Speech 1: WAV | Estimated Speech 2: WAV |
Example 3: Speech / Bubble noise. Mixture (= Speech + Bubble): WAV | Spectral | Estimated Speech: WAV | Estimated Bubble: WAV |
| Power Static | Estimated Speech: WAV | Estimated Bubble: WAV |
| Power Static+Dynamic | Estimated Speech: WAV | Estimated Bubble: WAV |
Example 4: Speech / Music. Mixture (= Speech + Music): WAV | Spectral | Estimated Speech: WAV | Estimated Music: WAV |
| Power Static | Estimated Speech: WAV | Estimated Music: WAV |
| Power Static+Dynamic | Estimated Speech: WAV | Estimated Music: WAV |
Sound examples for the paper: "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation", A. Ozerov and C. Févotte, IEEE Trans. on Audio, Speech and Lang. Proc., Special Issue on Signal Models and Representations of Musical and Environmental Sounds, vol. 18, no. 3, pp. 550-563, March 2010.
Note that this experiment was performed under laboratory (non-realistic) conditions, since the perturbed oracle initialization is obtained from the original separated sources (see the paper).
Musical sources are from Shannon Hurley
Mixing filters are from Source Separation Evaluation Campaign SiSEC 2008 (Under-determined speech and music mixtures task)
Mixture type | Linear instantaneous mixture | Convolutive mixture |
Test material |
Condition | perturbed oracle init. | after convergence | perturbed oracle init. | after convergence |
Test material is from Source Separation Evaluation Campaign SiSEC 2008 (Under-determined speech and music mixtures task)
Linear instantaneous mixtures |
Mixture | female4 | male4 | nodrums | wdrums |
Test material |
Synthetic convolutive mixtures (1m) |
Mixture | female4 | male4 | nodrums | wdrums |
Test material |
Live-recorded convolutive mixtures (1m) |
Mixture | female4 | male4 | nodrums | wdrums |
Test material |
Test material is from Source Separation Evaluation Campaign SiSEC 2008 (Professionally produced music recordings task)
Recording | "Que pena tanto faz" by Tamy | "Roads" by Bearlin |
Training source images |
Mix to separate |
Source images separated with |
Source images separated with |
| EM algorithm | MU rules |
Example 1 |
Example 2 |
Example 3 |
Sound examples for the paper: "Multichannel nonnegative matrix factorization in convolutive mixtures. With application to blind audio source separation", A. Ozerov and C. Févotte, In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09), pages 3137-3140, Taipei, Taiwan, April 19-24, 2009.
Musical sources are from Shannon Hurley
Mixing filters are from Source Separation Evaluation Campaign SiSEC 2008 (Under-determined speech and music mixtures task)
Original (mono) sources |
Convolutive mixing filters | Matlab .mat file |
Original (stereo) source images |
Stereo mix (to separate) | mix |
Algorithm | EM algorithm | MU rules |
Condition | best SDR | best cost | best SDR | best cost |
Estimated images |
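The "best SDR" condition refers to the signal-to-distortion ratio between a true source image and its estimate. As a rough illustration only, the sketch below computes a simplified SDR in decibels, assuming time-aligned signals of equal length. It is not the evaluation procedure used in the paper or in SiSEC, which this page does not describe; the function name `simple_sdr_db` is hypothetical.

```python
import numpy as np

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: energy of the reference over energy of the error.

    Assumption: both signals are time-aligned and have the same shape.
    This is a plain energy ratio, not a full evaluation toolkit metric.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite when the estimate is perfect or the reference is silent
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

if __name__ == "__main__":
    ref = np.array([1.0, 0.0, -1.0, 0.0])
    print(simple_sdr_db(ref, 0.9 * ref))
```

A higher value means the estimate is closer to the reference; an estimate scaled by 0.9 leaves an error of one tenth of the signal amplitude, which corresponds to about 20 dB.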
Separation into 3 sources was performed with 500 iterations of the EM/MU algorithm, using 18 NMF components (in total), and a non-random initialization.
Note that the convolutive mixing system model was used for all recordings, even if some recordings are in fact mixed using an instantaneous mixing system.
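To give a feel for what "MU rules" with a fixed number of NMF components and iterations look like, here is a minimal single-channel sketch. It is not the multichannel convolutive algorithm of the paper: it factorizes one nonnegative power spectrogram `V` with multiplicative updates, assuming the Itakura-Saito divergence as the cost (an assumption, since this page does not state the cost) and a random initialization (unlike the non-random one used for the examples). The names `is_nmf_mu` and `is_divergence` are hypothetical.

```python
import numpy as np

def is_divergence(V, V_hat, eps=1e-12):
    """Itakura-Saito divergence between V and its approximation V_hat."""
    R = (V + eps) / (V_hat + eps)
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf_mu(V, n_components=18, n_iter=500, seed=0, eps=1e-12):
    """Single-channel NMF, V ~ W @ H, with multiplicative updates (MU).

    V: strictly positive array of shape (n_freq, n_frames), e.g. a power spectrogram.
    Returns W (n_freq, n_components) and H (n_components, n_frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization (an assumption for this sketch)
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # Update the spectral patterns W
        W *= ((V / V_hat ** 2) @ H.T) / ((1.0 / V_hat) @ H.T)
        V_hat = W @ H + eps
        # Update the temporal activations H
        H *= (W.T @ (V / V_hat ** 2)) / (W.T @ (1.0 / V_hat))
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((64, 100)) + 0.1  # stand-in for a power spectrogram
    W, H = is_nmf_mu(V, n_components=18, n_iter=500)
    print(is_divergence(V, W @ H))
```

Because every update multiplies by a ratio of nonnegative terms, `W` and `H` stay nonnegative without any explicit projection, which is the main appeal of MU rules over gradient steps.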
| Original stereo mix | Stereo images estimated with EM algorithm | Stereo images estimated with MU rules |
Example 1 |
Example 2 |
Example 3 |
Example 4 |
Example 5 |
Example 6 |
Example 7 |
Sound examples for the paper: "Factorial scaled hidden Markov model for polyphonic audio representation and source separation", A. Ozerov, C. Févotte and M. Charbit, In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'09), Mohonk, NY, Oct. 18-21, 2009.
Speech model | S-HMM | S-HMM | NMF |
Music model | S-HMM | NMF | NMF |
Example | Speech | Music | Mixtures | Est. speech | Est. music | Est. speech | Est. music | Est. speech | Est. music |
Example 1 (male) |
Example 2 (female) |
Example 3 (male) |
Example 4 (female) |