Footnotes

(1)

Variable-resolution spectral analysis of a signal is presented in detail in C. Chouzenoux, Analyse spectrale à résolution variable: application au signal de parole, Ph.D. thesis, ENST Paris, 1982, where it is applied to speech coding.

(2)

Actually half of the FFT length.

(3)

In a sense, zeroing the last (highest-order) cepstral coefficients amounts to applying a low-pass filter to the log-magnitude of the original signal spectrum, i.e., to smoothing it.

(4)

SPHERE is the file format used by most NIST tools and databases. See http://www.nist.gov/speech for the SPHERE package.

(5)

Note that, as opposed to previous versions of SPro, the dimension in the header corresponds to the total feature vector dimension.

(6)

This is a known 'bug' that should be corrected someday. It is practically impossible to correct it for global normalization, which would require storing all of the data in memory. However, it is possible, and probably desirable, to correct it when a sliding window is specified.
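The difference in memory cost can be illustrated with a sliding-window mean normalization. In the hypothetical C sketch below, each output frame depends only on the frames within `h` frames of it, so a streaming version would only need to buffer 2h+1 frames, whereas a global mean needs every frame before the first output can be produced. For simplicity the sketch works on a single coefficient held in memory and truncates the window at the ends of the stream; both are assumptions for illustration, not SPro's behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Mean-normalize a 1-D stream over a centered window of 2h+1 frames:
   out[t] = in[t] - mean(in[t-h .. t+h]). Near the ends the window is
   truncated to the available frames. in and out must not overlap,
   because frames leaving the window are read again from in[]. */
void sliding_mean_norm(const double *in, double *out, size_t n, size_t h)
{
    double sum = 0.0;       /* running sum over in[lo .. hi-1] */
    size_t lo = 0, hi = 0;

    assert(n == 0 || in != out);
    for (size_t t = 0; t < n; t++) {
        size_t want_lo = t > h ? t - h : 0;
        size_t want_hi = t + h + 1 < n ? t + h + 1 : n;
        while (hi < want_hi) sum += in[hi++];   /* frame enters window */
        while (lo < want_lo) sum -= in[lo++];   /* frame leaves window */
        out[t] = in[t] - sum / (double)(hi - lo);
    }
}
```

With `h = 2` and the input 0, 1, ..., 9, interior frames come out as 0, while the first and last frames give -1 and 1 because their windows are truncated.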

(7)

Frames are duplicated at the (buffer) boundaries.
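To illustrate what duplicating boundary frames means, here is a hypothetical C sketch of one windowed computation, regression-based delta coefficients, in which any index falling outside the buffer is clamped to the first or last frame. Using deltas as the example, and the names `frame_at` and `delta_replicate`, are assumptions made for the sake of illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Frame t of an n-frame buffer; out-of-range indices reuse
   (i.e., duplicate) the first or last frame. */
static double frame_at(const double *x, long n, long t)
{
    if (t < 0) return x[0];
    if (t >= n) return x[n - 1];
    return x[t];
}

/* Regression delta over +/- d frames:
   delta[t] = sum_{k=1..d} k (x[t+k] - x[t-k]) / (2 sum_{k=1..d} k^2). */
void delta_replicate(const double *x, double *delta, long n, long d)
{
    double norm = 0.0;

    assert(n > 0 && d > 0);
    for (long k = 1; k <= d; k++) norm += (double)(k * k);
    norm *= 2.0;
    for (long t = 0; t < n; t++) {
        double s = 0.0;
        for (long k = 1; k <= d; k++)
            s += (double)k * (frame_at(x, n, t + k) - frame_at(x, n, t - k));
        delta[t] = s / norm;
    }
}
```

For the ramp 0, 1, ..., 9 and d = 1, interior deltas are 1 while the first and last are 0.5, since the duplicated edge frame contributes a zero difference on one side.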

(8)

HTK is a popular Hidden Markov Model Toolkit from Cambridge University, http://htk.eng.cam.ac.uk.

(9)

Sirocco is a free large-vocabulary speech recognition search engine, http://www.enst.fr/~sirocco

(10)

In HTK, this actually depends on whether or not NATURALREADORDER=T was specified in your configuration file.
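The setting matters because of byte order: as far as we know, HTK by default expects binary data files in big-endian order, and NATURALREADORDER=T tells it to read them in the host's native order instead. The C sketch below, with hypothetical function names, decodes the same 32-bit float both ways; it assumes IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit float stored most significant byte first (big-endian),
   independently of the host's own byte order. */
float read_float_be(const unsigned char b[4])
{
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    float f;

    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Decode a 32-bit float stored in the host's native ("natural") order. */
float read_float_native(const unsigned char b[4])
{
    float f;

    assert(sizeof f == 4);
    memcpy(&f, b, sizeof f);
    return f;
}
```

On a little-endian machine such as an x86 PC, the two functions disagree on the same four bytes, which is the mismatch the configuration option resolves.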

(11)

With the possible exception of scopy, which is a total mess.

(12)

For increased readability, error checking has been removed from the allocations.
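For reference, a checked allocation might look like the following wrapper. `checked_malloc` is a hypothetical name, not an SPro function: it prints a message and exits when `malloc()` fails, instead of letting a NULL pointer propagate.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* malloc() that reports the failing request and exits instead of
   returning NULL. `what` is a short label used in the error message. */
void *checked_malloc(size_t size, const char *what)
{
    void *p;

    assert(what != NULL);
    p = malloc(size);
    if (p == NULL && size != 0) {
        fprintf(stderr, "cannot allocate %zu bytes for %s\n", size, what);
        exit(EXIT_FAILURE);
    }
    return p;
}
```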

(13)

This will probably change in future versions, in which we should try to reuse as many of the input features as possible. Meanwhile, you will have to make do with things as they are...

This document was generated by Guillaume Gravier on March 5, 2004 using texi2html.