Implementation of the Cone Kernel for Speech Signal Analysis

by Kevin Fink

SPHSC 525 : Digital Signal Processing for Speech Signals

Prof. Eugene Buder

Spring Quarter, 1993


The predominant tool used in speech signal analysis today is the spectrogram. The spectrogram provides a picture of the energy in a signal as a function of time and frequency. It is normally displayed as a two-dimensional image, where the x-axis corresponds to time, the y-axis corresponds to frequency, and the intensity of the grey scale values corresponds to energy. By inspecting the spectrogram, important speech features can be observed, identified, and analyzed. However, due to limitations of the spectrogram, these features can be distorted or obscured. In addition, it is difficult to quantify these features based on the spectrogram display.

Other time-frequency representations (TFRs) of a signal are available, including the Wigner TFR, the Choi-Williams TFR, and the cone-kernel TFR. Each representation has different advantages and limitations, and the circumstances where the use of one in particular may be advantageous have not been studied in detail.

No bilinear TFR is perfect. The perfect TFR would give the exact amount of energy present in a signal as a function of time and frequency. However, no known TFR has that property. By breaking this ideal property up into its components, we can classify and rate various TFRs. These components are nonnegativity, time and frequency marginals, finite support, and artifact suppression.

Since the TFR represents the amount of energy present in a signal as a function of time and frequency, it should be nonnegative everywhere. A negative energy density doesn't make physical sense. An infinite number of nonnegative TFRs exist, but very little work has been done with them. In the class of bilinear TFRs (those in which the kernel is independent of the signal), only the spectrogram is nonnegative everywhere.

A TFR can be viewed as a bivariate distribution of signal energy per time/frequency. Thus if we sum all the values of the TFR over all time, we should obtain the signal's average energy spectral density. Similarly, if we sum all the values of the TFR over all frequencies, we should get the instantaneous signal energy density. These two quantities are called the frequency and time marginals, respectively. No bilinear TFR satisfies both marginals for every signal. By selecting the TFR to match the signal, however, sometimes the marginals can be met.

The spectrogram provides a useful illustration of this idea. Choosing a wide- band spectrogram results in good time resolution, whereas choosing a narrow-band spectrogram results in good frequency resolution. So for a signal with several constant frequency tones, a narrow-band spectrogram will satisfy the marginals. In practical terms, it will give an accurate representation of the signal. However, for a signal which has both tones and impulses, no spectrogram bandwidth can simultaneously resolve both the times and frequencies implicit in the signal.

The third property, finite support, is related to the marginals. For a signal which is zero over some time interval, the TFR should also be zero over that interval. Similarly, if the signal has no frequency content in some frequency band, then the TFR should be zero over that band. A TFR satisfying both these conditions is said to have strong finite support in both time and frequency.

A less restrictive concept, weak finite support, is satisfied when the TFR has the same finite extent in time or frequency that the signal does. In other words, a TFR exhibiting weak finite time support will be zero before and after the signal is present, but may not be zero during pauses within the signal.

The spectrogram does not exhibit either strong or weak finite support in either time or frequency. By choosing the bandwidth appropriately, however, the spectrogram can approximate strong finite support in either time or frequency, but not both simultaneously. The cone kernel TFR was explicitly designed to overcome the time-frequency tradeoff of the spectrogram, and does exhibit weak finite time support, but not strong finite time support.

The final property is that of artifact suppression. All bilinear TFRs are smoothed versions of the Wigner TFR. The Wigner distribution has good time and frequency resolution, but exhibits artifacts. These artifacts are cross-terms due to the multiple components of the signal which appear to be actual signal components. These cross-terms may contain useful information about the relationships between the different components in the signal. However, they appear at time/frequency points at which no actual signal energy may be present, giving the misleading interpretation of the signal. The magnitude of these artifacts is often larger than the magnitude of the actual individual signal components. The only distinguishing characteristic of the cross-terms is that they tend to be oscillatory, and thus may be smoothed by a low-pass filtering operation.

This smoothing is performed by convolving the Wigner distribution with a function referred to as the kernel of the TFR. By changing the kernel, the Wigner distribution can be smoothed into the spectrogram, the cone kernel TFR, or any other bilinear TFR. The cone kernel TFR is named that due to the shape of its kernel in the t-ç (time-lag) plane. The spectrogram has the shape of a diamond in the t-ç plane. The kernel's shape determines the amount and type of smoothing to perform.

For the spectrogram, the shape of the kernel is fixed, but the size can be varied by varying the spectrogram's bandwidth. A narrow-band spectrogram has a large kernel, whereas a wide-band spectrogram has a small kernel. The width of the kernel along the t-axis determines the amount of smoothing in time that will occur, and thus a narrow-band spectrogram smooths more in time than a wide-band spectrogram. The cone-kernel TFR has width 1 along the t-axis, and thus doesn't smooth the Wigner distribution in time at all (for zero lag). As the lag increases, the amount of smoothing increases, so artifacts are supressed, although not as well as the spectrogram does.

Convolving the Wigner distribution with the kernel function physically displaces the cross-terms as well as attenuating them. The spectrogram kernel moves the cross terms so that they appear at the same place as the true signal components, resulting in a less cluttered display. The cone kernel TFR also attenuates and moves the cross-terms, resulting in substantially fewer artifacts than the Wigner distribution. It does not smooth them as much, however, as the spectrogram.

In summary, the spectrogram has the properties of nonnegativity and good artifact supression, but requires a tradeoff between time and frequency resolution. The cone-kernel TFR sacrifices nonnegativity in favor of time support and artifact supression. It doesn't smooth cross terms as much as the spectrogram, but it preserves onset times, which provides the opportunity to make accurate measurements of voice onset times (VOTs), voicing durations, silence durations, and other transient features of speech. Since it also avoids the time-frequency tradeoff of the spectrogram, it also promises give higher frequency resolution information about short-time features of the signal.

In order to investigate the opportunities provided by the cone-kernel TFR, I implemented the algorithm in C. The source code is provided in Appendix B. My implementation uses the concept of a boxcar accumulator (Kooiman, 1989) to speed the calculations. It can calculate either the spectrogram or cone-kernel TFRs. The DFT size and window type, length, and increment can be specified. The program can also first difference the data to compensate for the spectral roll- off of speech data. The program reads ASCII data files, and the output is stored in a binary file. Another program is also provided to translate the binary file to an ASCII data file for an external image viewing program to read. This program also allows the data to be manipulated in several ways, then normalizes it for display purposes. A third program allows single time slices to be extracted from the binary output file to allow a display analogous to a short-time Fourier transform.

Appendix A includes a variety of images generated by the cone-kernel TFR and spectrograms for comparison purposes. Figure 1a is a synthetic signal consisting of a sinusoidal signal lasting for 600 samples, turning off for 200 samples, then turning back on again for 224 samples. A second sinusoidal signal of twice the frequency of the first is added from the 600th to the 1024th sample. In addition, the signal has an impulse at the 100th sample and another at the 500th sample.

Figure 1b shows what the perfect TFR would look like. The lower horizontal line corresponds to the first sinusoid, the higher horizontal line to the second sinusoid, and the two vertical lines to the impulses. Figure 1c shows a very narrow-band spectrogram (128 samples in time) of the signal. The positions of the impulses and the gate times of the sinusoids can not be determined due to smearing in time, but the frequencies of the sinusoids are very well-defined. Figure 1d shows a very wide-band spectrogram (8 samples in time). The impulse positions and gate times of the sinusoids can be determined fairly precisely, but the frequencies of the sinusoids are substantially smeared. Figure 1e shows a compromise spectrogram (32 samples), illustrating a medium amount of smearing in both time and frequency.

Figure 1f shows the cone-kernel TFR of the same signal. The impulse positions can be determined exactly, and the frequencies of the sinusoids are also well defined. The gate times of the sinusoids are a little harder to determine, but are still better defined than in the spectrogram. Note, however, the cone- kernel's artifacts due to lack of smoothing, such as the scalloping around the sinusoids and the grey smear just to the left of the higher tone where there should be no signal component. By reducing the cone kernel's extent in time, we can reduce these artifacts at the cost of increasing others and reducing the gating time resolution. This is illustrated in Figure 1g.

Figure 2a is another synthetic signal, consisting of two frequency chirps which are converging over time. Figure 2b shows the perfect TFR of this signal. Figure 2c shows a narrow-band spectrogram. The overall shape is correct, but the exact frequencies of the chirps as a function of time would be impossible to determine. Figure 2d shows a wide-band spectrogram of this signal. This representation is much more precise than the narrow-band spectrogram's, but shows substantial distortion from reality. In particular, the frequencies appear to be stepping closer together rather than uniformly converging. This is due to the limited frequency resolution of a wide-band spectrogram.

Figure 2e shows the cone-kernel representation of the signal. It shows the smooth, linear nature of the frequency variation of the tones with time and isolates the exact frequencies well. It does show a small amount of artifacts near the beginning and end of the signal. Although it is difficult to tell from the figures, the cone-kernel also starts exactly when the signal starts and ends exactly when the signal ends, as opposed to the spectrograms, both of which smear the starting and ending times.

Synthetic signals are good for showing the advantages, disadvantages, and characteristics of different TFRs. However, it is on real signals that the TFRs will ultimately be used, and thus their performance must be measured on realistic signals. Figures 3a through 3i are time histories of speech segments sampled at 22.05 kHz. Figure 3a is the word "tot", spoken by a male speaker. Figure 3b is the word "cot", and Figure 3c is the word "pot". Figures 3d-3f are the initial consonant bursts "t", "k", and "p", respectively, taken from the first part of each word. Figures 3g-3i are those same bursts after being first- differenced to remove low-frequency components.

Figure 4a is a narrow-band spectrogram of the word "tot". The formant tracks for the voiced vowel [a] are clearly visible. The syllable-final "t" burst is also visible, but its onset time cannot be precisely determined. The syllable- initial "t" burst is barely visible. Figure 4b shows a wide-band spectrogram of the same sample. The onset times of both "t" bursts are well-defined, and the individual glottal pulses during the voicing can also be seen. However, the formant tracks are almost completely smeared together. Figure 4c shows a cone- kernel TFR of the sample. The burst onset times are precisely defined, the glottal impulses are well-defined, and the formant tracks during voicing are visible. Figures 4d, 4e, and 4f are the narrow-band spectrogram, wide-band spectrogram, and cone-kernel TFR, respectively, of the word "kot", and Figures 4g, 4h, and 4i follow the same pattern for "pot". In each case, similar observations can be made as for "tot".

The next phase of the project was to determine whether the cone kernel could be used to provide information about transient bursts in speech signals which the spectrogram could not. I examined the bursts "p", "t", and "k" from a male speaker saying the words "pot", "tot", and "kot". Due to the short time intervals of these bursts, the spectrogram's bandwidth must be chosen very large, resulting in poor frequency resolution. The cone kernel should be able to overcome this problem.

The available literature indicates that these three bursts may be differentiated on the basis of their overall frequency content. The "t" burst has predominantly high-frequency content, the "k" burst has mid-frequency content, and the "p" burst has low-frequency content. However, these ranges have not been quantified and are discussed as "rules of thumb" more than accurate identifiers.

The results of this portion of the project are rather preliminary and do not by any stretch of the imagination exhaust the possibilities of using the cone kernel for this or other purposes. I was not able to clearly see the frequency content differentiation described above using the cone kernel. However, I also tried using traditional short-time Fourier transform methods on the signals that I had, and was unable to differentiate between the bursts on that basis, either. That leads me to conclude that either my samples were of too low quality to observe differences between, or that the overall frequency content is not a good method of differentiating the bursts. Since I was easily able to differentiate between the bursts based on hearing the signals, even without hearing the remainder of the syllable, I assume that the limitation lies in the method rather than the data.

In view of this, further work needs to be done to determine what parameters of the cone-kernel TFRs of these bursts may be used to differentiate between them. Examination of Figures 4a-4f results in a few preliminary observations. First, the overall energy level of "t" (Figure 4a) is much higher than that of "k" (Figure 5b), which is correspondingly higher than that of "p" (Figure 5c). This makes intuitive sense, since "p", in particular, sounds much softer than "t" or "k".

Another observation relies on the temporal accuracy of the cone kernel. Whereas the spectrogram would smear the energy content of the bursts over the entire burst, the cone kernel shows the temporal variation in energy content over the burst interval. After normalizing the burst amplitudes (Figures 5b, 5d, and 5e), we can see that most of the energy in the "t" and "k" bursts comes early in the burst, whereas the energy in the "p" burst is larger later in the burst than earlier.

A final observation differentiates the "t" and "k" bursts. The "k" burst seems to have frequency content grouped in two frequency bands, while the "t" burst's energy is confined to one band.

All of these observations are based on only one burst of each type and thus are not statistically significant, but are valuable in pointing out areas which may be fruitful to pursue further.


Bibliography

L. Atlas, J. Fang, P. Loughlin, W. Music, "Resolution advantages of quadratic signal processing," Proceedings of the SPIE - International Society for Optical Engineering, vol. 1566 Advanced Signal Processing Algorithms, Architectures, and Implementations II, pp. 134-143, 1991.

L. Atlas, J. Fang, "Quadratic Detectors for General Nonlinear Analysis of Speech," IEEE Proceedings ICASSP '92, pp. II-9 - II-12, 1992.

L. Atlas, Y. Zhao, R. Marks II, "Application of the generalized time-frequency representation to speech signal analysis," IEEE Proceedings of the Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 517-520, 1987.

L. Atlas, P. Loughlin, J. Pitton, "Truly nonstationary techniques for the analysis and display of voiced speech," IEEE Proceedings ICASSP '91, pp. 433-436, 1991.

L. Atlas, P. Loughlin, J. Pitton, "A theory of desirable properties for preprocessors for speech recognizers," IEEE Proceedings Pacific Rim Conference on Communications, Computers, and Signal Processing, pp. 803-806, 1991.

S. Cheung and J. Lim, "Combined multi-resolution (wideband/narrowband) spectrogram," IEEE Proceedings ICASSP '91, pp. 457-460, 1991.

T. Claasen and W. Mecklenbrauker, "The Wigner distribution - A tool for time-frequency analysis, a) Part I: continuous-time signals; b) Part II: discrete-time signals; c) Part III: relations with other time-frequency signal transformations," Phillips Journal of Resesarch, vol. 35, nos. 3-6, pp. 217-250, 276-300, 372-389, 1980.

R. D. Kent and C. Read, The Acoustic Analysis of Speech, Singular Publishing Group, 1992.

W. C. Kooiman, "Time-Frequency Speech Displays that are an Improvement over the Spectrogram", M.S. Thesis, University of Washington, 1989.

P. Loughlin, J. Pitton, L. Atlas, "New properties to alleviate interference in time-frequency representations," IEEE Proceedings ICASSP '91, pp. 3205-3208, 1991.

P. Louglin, L. Atlas, J. Pitton, "Advanced Time-Frequency Representations for Speech Processing," The European Speech Communication Association Workshop on Comparing Speech Signal Representations, 1992.

C. E. Reid and T. B. Passin, Signal Processing in C. John Wiley & Sons, Inc. 1992.

E. Velez and R. Absher, "Transient analysis of speech signals using the Wigner time-frequency representation," IEEE Proceedings ICASSP '89, pp. 2242-2245, 1989.

Y. Zhao, L. Atlas, R. Marks II, "The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals," IEEE Transactions ASSP, vol. 38, no. 7, pp. 1084-1091, 1990.


Appendix A - Figures


Appendix B - Source Code Listings

*** GTFR.C - Main Program ***

#include "headers.h" main(argc,argv) int argc; char **argv; { VectorDouble *input_data; int win2,flags; Arguments *args = new_Arguments(argc,argv); char input_file_name[80]; char output_file_name[80]; char log_file_name[84]; FILE *log_ptr; getargs(args); if(args->dft_size < args->window_size) { fprintf(stderr,"Window size must be at least as large as DFT size.\n"); exit(-1); } win2 = args->dft_size/2; flags = args->flags; strcpy(input_file_name,args->input_file_name); strcpy(output_file_name,args->output_file_name); strcpy(log_file_name,output_file_name); strcat(log_file_name,".log"); log_ptr = openfile(log_file_name,"w"); if(flags&4) { printf("Input File Name: %s\n",input_file_name); printf("Output File Name: %s\n",output_file_name); printf("GTFR Type: "); if(flags&16) printf("Cone Kernel\n"); else printf("Spectrogram\n"); printf("Window Size: %d\n",args->window_size); printf("DFT Size: %d\n",args->dft_size); printf("First Difference Data:"); if(flags&1) printf("Yes\n"); else printf("No\n"); printf("Hanning Window Data:"); if(flags&2) printf("Yes\n"); else printf("No\n"); printf("Jump Mode (Skip points between DFT calculations):"); if(flags&8) printf("Yes, skip %d points.\n",args->jump_size); else printf("No\n"); if(flags&32) { printf("Using Magnitude of DFT instead of Real Part (for "); printf("Cone Kernel only).\n"); } } fprintf(log_ptr,"Input File Name: %s\n",input_file_name); fprintf(log_ptr,"Output File Name: %s\n",output_file_name); fprintf(log_ptr,"GTFR Type: "); if(flags&16) fprintf(log_ptr,"Cone Kernel\n"); else fprintf(log_ptr,"Spectrogram\n"); fprintf(log_ptr,"Window Size: %d\n",args->window_size); fprintf(log_ptr,"DFT Size: %d\n",args->dft_size); fprintf(log_ptr,"First Difference Data:"); if(flags&1) fprintf(log_ptr,"Yes\n"); else fprintf(log_ptr,"No\n"); fprintf(log_ptr,"Hanning Window Data:"); if(flags&2) fprintf(log_ptr,"Yes\n"); else fprintf(log_ptr,"No\n"); fprintf(log_ptr,"Jump Mode (Skip points between DFT calculations):"); if(flags&8) fprintf(log_ptr,"Yes, skip %d points.\n",args->jump_size); else fprintf(log_ptr,"No\n"); if(flags&32) { fprintf(log_ptr,"Use Magnitude of DFT instead of Real Part (for "); fprintf(log_ptr,"Cone Kernel only).\n"); } if(flags&4) printf("\nReading Data...\n"); input_data=getdata(input_file_name); if(args->window_size>input_data->length) { printf("Window size greater than number of data points.\n"); exit(-1); } if(flags&4) printf("Number of Data Points: %d\n",input_data->length); { int rows = input_data->length; if(flags&8) rows = rows/(args->jump_size/2)-1; if(flags&4) printf("Output file size: %d rows by %d columns\n",rows,win2); fprintf(log_ptr,"Output file size: %d rows by %d columns\n",rows,win2); } if(flags&1) { if(flags&4) printf("First Differencing Data...\n"); input_data = firstdifference(input_data); } if(flags&4) { if(flags&16) printf("Computing Cone Kernel...\n"); else printf("Computing Spectrogram...\n"); } if(flags&16) cone(input_data,args); else spectrogram(input_data,args); old_VectorDouble(input_data); close(log_ptr); } <H2>*** CONE_KERNEL.C - Cone Kernel Implementation ***</H2> #include "headers.h" void cone(VectorDouble *input_data, Arguments *args) { int n,k, num_points=input_data->length, dft_size = args->dft_size, dft2=dft_size/2, skip=0, flags = args->flags; double *window, *tmpdata = (double *) malloc(dft_size*sizeof(double)); VectorDouble *work_data = paddata(input_data); double *data = work_data->buffer + work_data->first; VectorDouble *accumulator1 = new_VectorDouble(dft_size), *accumulator2 = new_VectorDouble(dft_size); double *acc1 = accumulator1->buffer + accumulator1->first, *acc2 = accumulator2->buffer + accumulator2->first; VectorComplex *df1 = new_VectorComplex(dft_size), *df2 = new_VectorComplex(dft_size); Complex *df1d = df1->buffer + df1->first, *df2d = df2->buffer + df2->first; FFT *control = new_fft(dft_size); FILE *fileptr = openfile(args->output_file_name,"wb"); if(flags&2) window = hanning(dft_size); else window = boxcar(dft_size); for(n=num_points;n<2*num_points;n+=2) { for(k=0;k<args->window_size;k++) { acc1[k] = acc2[k] + data[n] * data[n+k]; acc1[k] -= data[n-1] * data[n-k-1]; acc2[k] = acc1[k] + data[n+1] * data[n+k+1]; acc2[k] -= data[n] * data[n-k]; } skip+=2; if(!(flags&8) || skip==args->jump_size) { skip=0; fft_pair(control,accumulator1,df1,accumulator2,df2,window); for(k=0;k<dft2;k++) { if(flags&32) /* Use magnitude-squared of DFT */ { tmpdata[k]=(df1d[k].x*df1d[k].x+df1d[k].y*df1d[k].y); tmpdata[k+dft2] = (df2d[k].x*df2d[k].x+df2d[k].y*df2d[k].y); } else /* Use real part of DFT (and convert to dB) */ { tmpdata[k] = df1d[k].x; if(tmpdata[k]<=0) tmpdata[k] = -100.0; else tmpdata[k] = 10.0 * log10(tmpdata[k]); tmpdata[k+dft2] = df2d[k].x; if(tmpdata[k+dft2]<=0) tmpdata[k+dft2] = -100.0; else tmpdata[k+dft2] = 10.0 * log10(tmpdata[k+dft2]); } } fwrite(tmpdata,sizeof(double),dft_size,fileptr); } } fclose(fileptr); }

*** SPEC_GRAM.C - Spectrogram Implementation ***

#include "headers.h" void spectrogram(VectorDouble *input_data, Arguments *args) { int n,k, num_points=input_data->length, window_size = args->window_size, dft_size = args->dft_size, dft2=dft_size/2, skip=2, flags = args->flags; double *window; double *tmpdata = (double *) malloc(dft_size*sizeof(double)); VectorDouble *work_data = paddata(input_data); double *data = work_data->buffer + work_data->first; VectorDouble *accumulator1 = new_VectorDouble(dft_size), *accumulator2 = new_VectorDouble(dft_size); double *acc1 = accumulator1->buffer + accumulator1->first, *acc2 = accumulator2->buffer + accumulator2->first; VectorComplex *df1 = new_VectorComplex(dft_size), *df2 = new_VectorComplex(dft_size); Complex *df1d = df1->buffer + df1->first, *df2d = df2->buffer + df2->first; FFT *control = new_fft(dft_size); FILE *fileptr = openfile(args->output_file_name,"wb"); if(flags&2) window = hanning(window_size); else window = boxcar(window_size); if(flags&8) skip = args->jump_size; for(n=num_points;n<2*num_points;n+=skip) { accumulator1->buffer = &data[n]; accumulator2->buffer = &data[n+1]; fft_pair(control,accumulator1,df1,accumulator2,df2,window); for(k=0;k<dft2;k++) { tmpdata[k]=(df1d[k].x*df1d[k].x+df1d[k].y*df1d[k].y); tmpdata[k+dft2] = (df2d[k].x*df2d[k].x+df2d[k].y*df2d[k].y); } fwrite(tmpdata,sizeof(double),dft_size,fileptr); } fclose(fileptr); }

*** DATA_IO.C - File Utilities ***

#include "headers.h" FILE *openfile(char *output_file_name, char *mode) { FILE *OUT = malloc(sizeof(FILE)); if((OUT=fopen(output_file_name,mode))==NULL) { fprintf(stderr,"Error: Can't open file %s.\n",output_file_name); exit(-1); } return(OUT); } VectorDouble *getdata(input_file_name) char input_file_name[80]; { FILE *filept = NULL; int point_num=0; double time,value; int num_points=1024; VectorDouble *input_data = new_VectorDouble(num_points); double *data = input_data->buffer + input_data->first; double max = -MAXFLOAT, min = MAXFLOAT; filept=openfile(input_file_name,"r"); while(fscanf(filept,"%lg",&value)==1) { data[point_num++]=value; if(value>max) max=value; if(value<min) min=value; if(point_num>=num_points) { input_data=resize_VectorDouble(input_data,num_points+=1024); data = input_data->buffer + input_data->first; } } fclose(filept); input_data=resize_VectorDouble(input_data,point_num); input_data->max = max; input_data->min = min; return(input_data); } void putintdata(output_file_name,data_array) char *output_file_name; MatrixInt *data_array; { FILE *filept = NULL; int row_index, num_rows = data_array->rows, col_index, num_cols = data_array->columns; int *data = data_array->buffer; filept=openfile(output_file_name,"w"); for(row_index=0;row_index<num_rows;row_index++) { for(col_index=0;col_index<num_cols;col_index++) { fprintf(filept,"%d\t",data[row_index*num_rows+col_index]); } fprintf(filept,"\n"); } fclose(filept); }

*** DATA_UTIL.C - Miscellaneous Data Operations ***

#include "headers.h" /* PadData: Creates a new VectorDouble which has been padded with zeros. */ VectorDouble *paddata(VectorDouble *input_data) { int index, num_points = input_data->length; VectorDouble *work_data = new_VectorDouble(num_points*3); double *work = work_data->buffer + work_data->first; for(index=0;index<num_points;index++) { work[index]=0; work[num_points+index]=input_data->buffer[index]; work[2*num_points+index]=0; } work_data->max = input_data->max; work_data->min = input_data->min; return(work_data); } /* FirstDifference: Calculates the first difference of the data (a very simple high-pass filtering operation. This should only be done for speech signals. */ VectorDouble *firstdifference(VectorDouble *input_data) { int index, num_points = input_data->length; VectorDouble *work_data = new_VectorDouble(num_points); double *work = work_data->buffer + work_data->first; double *data = input_data->buffer + input_data->first; double tmp, max = -MAXFLOAT, min = MAXFLOAT; for(index=1;index<num_points;index++) { tmp = data[index]-data[index-1]; if(tmp>max) max=tmp; if(tmp<min) min=tmp; work[index] = tmp; } tmp = data[0]; if(tmp>max) max=tmp; if(tmp<min) min=tmp; work[0] = tmp; work_data->max = max; work_data->min = min; return(work_data); } /* BoxCar: Creates a boxcar window (all 1's). */ double *boxcar(int size) { double *window = (double *) malloc(size*sizeof(double)); int index; for(index=0;index<size;index++) window[index]=1; return(window); } /* Hanning: Creates a Hanning window */ double *hanning(int size) { double *window = (double *) malloc(size*sizeof(double)); int index; for(index=0;index<size;index++) { window[index]=0.5*(1.0-cos(2*M_PI*index/(size-1))); } return(window); }

*** GET_ARGS.C - Parse command line arguments ***

#include "headers.h" void getargs(Arguments *args) { intargc = args->argc, argnum, window_size = 0, dft_size = 0, jump_size = 0, flags = 48; char **argv = args->argv, input_file_name[80] = "", output_file_name[80] = ""; if(argc<2) { printf("Not enough arguments given.\nUsage: cone/spectrogram "); printf("-i input_file -o output_file -s window_size [-d dft_size] "); printf("[-f] [-w] [-V] [-j jump_size] [-t type] [-r]\n"); printf("Where:\n"); printf("\t-i input_file specifies the input file name to be processed\n"); printf("\t-o output_file specifies the output file name\n"); printf("\t-s window_size specifies the window size to be used\n"); printf("\t-d dft_size specifies the DFT size to be used (defaults "); printf("to window_size)\n"); printf("\t-f turns on First Differencing\n"); printf("\t-w turns on Hanning Windowing\n"); printf("\t-V turns on Verbose mode\n"); printf("\t-r Use real part of DFT instead of magnitude for output\n"); printf("\t-j turns on jump mode (skip jump_size points between "); printf("DFT computations)\n"); printf("\t-t spectrogram computes a spectrogram.\n"); printf("\t-t cone computes a GTFR with the cone kernel (default).\n"); exit(-1); } for(argnum=1;argnum<argc;argnum++) { switch(argv[argnum][1]) { case 'i': sprintf(input_file_name,argv[++argnum]); break; case 'o': sprintf(output_file_name,argv[++argnum]); break; case 's': window_size = atoi(argv[++argnum]); break; case 'd': dft_size = atoi(argv[++argnum]); break; case 't': if(strcmp(argv[++argnum],"spectrogram")==0) flags -= 16; else if(strcmp(argv[argnum],"cone")!=0) printf("Don't know type %s... Using Cone Kernel\n",argv[argnum]); break; case 'f': flags = flags | 1; break; case 'w': flags = flags | 2; break; case 'V': flags = flags | 4; break; case 'r': flags -= 32; break; case 'j': flags = flags | 8; jump_size = atoi(argv[++argnum]); if(jump_size==0) printf("Warning: Jump mode initiated, but jump_size set to 0.\n"); break; default : printf("Skipping unknown command line option: %s\n",argv[argnum]); break; } } if(input_file_name[0]=='\0') { fprintf(stderr,"No input file name specified.\n"); exit(-1); } if(output_file_name[0]=='\0') { fprintf(stderr,"No output file name specified.\n"); exit(-1); } if(window_size == 0) { fprintf(stderr,"No window size specified.\n"); exit(-1); } strcpy(args->input_file_name,input_file_name); strcpy(args->output_file_name,output_file_name); args->window_size = window_size; if(dft_size == 0) args->dft_size = window_size; else args->dft_size = dft_size; args->jump_size = jump_size; args->flags = flags; }

*** FFT.C - Fast Fourier Transform Algorithms ***

/* Adapted from Signal Processing in C. */ #include "numeric.inc" #include "factor.c" int fft_pair(FFT *control, VectorDouble *f1, VectorComplex *df1, VectorDouble *f2, VectorComplex *df2, double *window) { /* Make one complex vector from two real vectors. */ { Complex *c = df1->buffer + df1->first; double *real = f1->buffer + f1->first, *imag = f2->buffer + f2->first; int sc = df1->spacing, sr = f1->spacing, si = f2->spacing, k = f1->length; for(;k--;c+=sc,real+=sr,imag+=si,window++) { c->x = *real * *window; c->y = *imag * *window; } } /* Now do the FFT */ fft(control,df1,df1); /* Pull the two FFTs apart */ { int s1 = df1->spacing, s2 = df2->spacing, k = f1->length; Complex *up1 = df1->buffer + df1->first, *dn1 = up1 + s1*(k-1), *up2 = df2->buffer + df2->first, *dn2 = up2 + s2*(k-1); up2->x = up1->y; up2->y = 0; up2 += s2; up1->y = 0; up1 += s1; for(k=(k-1)/2;k--;up1+=s1,up2+=s2,dn1-=s1,dn2-=s2) { Complex u= *up1, d= *dn1; dn2->x = up2->x = (u.y + d.y)*0.5; dn2->y = -(up2->y = (d.x - u.x)*0.5); dn1->x = up1->x = (u.x + d.x)*0.5; dn1->y = -(up1->y = (u.y - d.y)*0.5); } if((f1->length & 1) == 0) { Complex u = *up1; up2->x = u.y; up2->y = 0; up1->x = u.x; up1->y = 0; } } return 1; } int fft(FFT *control, VectorComplex *f, VectorComplex *dft) { int prime_count = control->factors.number_of_primes; int flop = 0; int N = control->factors.N; int m = N; VectorComplex *t[2]; t[0] = new_VectorComplex(N); t[1]=dft; copy_VectorComplex(f,t[0]); while(prime_count--) { int p = control->factors.pp[prime_count].prime; int k = control->factors.pp[prime_count].power; for(;k--;flop ^= 1) { m /= p; iterate_fft(N,m,p,control->roots,t[flop],t[flop^1]); } } if(!flop) copy_VectorComplex(t[0],dft); old_VectorComplex(t[0]); return 1; } iterate_fft(int N, int m, int p, Complex *roots, VectorComplex *f0, VectorComplex *f1) { Complex *f1b = f1->buffer+f1->first, *f0b = f0->buffer+f0->first; int spacing_0 = f0->spacing, spacing_1 = f1->spacing; int mp = m*p; int g = -1, g_limit = N/m, gm = 0, gmp = -mp; for(;++g<g_limit;gm+=m) { int r=m; gmp += mp; if(gmp >= N) gmp -= N; while(r--) { Complex sum = {0,0}; int s=0, gs=0; for(;s<mp;s+=m) { Complex w = roots[gs], f = f0b[spacing_0*(r+s+gmp)]; sum.x += w.x * f.x + w.y * f.y; sum.y += w.x * f.y - w.y * f.x; gs += gm; if(gs >= N) gs -= N; } f1b[spacing_1*(r+gm)] = sum; } } }

*** GET_LINE.C - Pulls one line from TFR ***

#include "headers.h" main(argc,argv) int argc; char **argv; { char input_file[80], output_file[80]; int rows = atoi(argv[3]), cols = atoi(argv[4]), row_number = atoi(argv[5]), col_index; double *tmpdat = (double *) malloc(cols*sizeof(double)); FILE *in_ptr; FILE *out_ptr; if(argc!=6) { printf("Wrong number of arguments.\nUsage: getline input_file_name"); printf(" output_file_name rows columns row_number\n"); printf("\trows = number of time points\n"); printf("\tcolumns = number of frequency points\n"); exit(-1); } strcpy(input_file,argv[1]); in_ptr = openfile(input_file,"rb"); strcpy(output_file,argv[2]); if(strcmp(output_file,"stdout")==0) out_ptr = stdout; else out_ptr = openfile(output_file,"w"); if(fseek(in_ptr,8*cols*row_number,SEEK_SET)) { fprintf(stderr,"Error in fseek call. Probably indexed out of file.\n"); exit(1); } if((fread(tmpdat,sizeof(double),cols,in_ptr))==0) { fprintf(stderr,"Error in fread call. Probably ran out of data.\n"); exit(1); } fclose(in_ptr); for(col_index=0;col_index<cols;col_index++) { fprintf(out_ptr,"%lg\t",tmpdat[col_index]); } fprintf(out_ptr,"\n"); fclose(out_ptr); }

*** NORMALIZE.C - Convert output of GTFR to ASCII ***

#include "headers.h" main(argc,argv) int argc; char **argv; { char input_file[80]; char output_file[80]; int window_size,flags, rows = atoi(argv[3]), cols = atoi(argv[4]); double power = atof(argv[5]); int row_index,col_index; double *tmpdat = (double *) malloc(cols*sizeof(double)); FILE *in_ptr; FILE *out_ptr; if(argc!=6) { printf("Wrong number of arguments.\nUsage: normalize input_file_name"); printf(" output_file_name rows columns power\n"); exit(-1); } strcpy(input_file,argv[1]); if(strcmp(input_file,"stdin")==0) in_ptr = stdin; else in_ptr = openfile(input_file,"rb"); strcpy(output_file,argv[2]); if(strcmp(output_file,"stdout")==0) out_ptr = stdout; else out_ptr = openfile(output_file,"w"); for(row_index=0;row_index<rows;row_index++) { if((fread(tmpdat,sizeof(double),cols,in_ptr))!=cols) { fprintf(stderr,"Ran out of data.\n"); exit(-1); } for(col_index=0;col_index<cols;col_index++) fprintf(out_ptr,"%g\t",tmpdat[col_index]); fprintf(out_ptr,"\n"); } fclose(in_ptr); fclose(out_ptr); }

Kevin Fink's Home Page (http://www.fink.com/Kevin.html)

kevin@fink.com