Archive for the ‘Audio DSP’ Category

Musical Blackboard Video

Tuesday, February 20th, 2007

HRTF 3D Panning VST Plugin

Tuesday, June 28th, 2005

Introduction

By the power of HRTF, I present an HRTF panner… in VST form! Most of the code was originally created for a PortAudio implementation, and then adapted for the VST specification. The code to read the HRTF directory was taken and adapted from code in the MAT 240C codebase, by an anonymous author.

UI of the prototype HRTF panning plugin

The interface is nothing to write home about, but looks can be deceiving. This plugin will eat your mono signal’s soul and spit it back out as a glob of spatial HRTF audio.
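For readers who want to see what "spitting it back out" amounts to, here is a minimal sketch of HRTF panning for one fixed source position: the mono signal is convolved with a left-ear and a right-ear impulse response (HRIR). This is an illustration only, not the plugin's code. The plugin links against FFTW3, so it presumably works in the frequency domain, whereas this sketch uses plain time-domain convolution. The function names and the toy impulse responses are made up.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution; output has len(signal) + len(ir) - 1 samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def hrtf_pan(mono, hrir_left, hrir_right):
    """Return (left, right) channels for a single, static source position."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (hypothetical, not IRCAM data): the right ear hears
# the source one sample later and at half the level of the left ear.
left, right = hrtf_pan([1.0, 0.5], [1.0, 0.0], [0.0, 0.5])
print(left)   # [1.0, 0.5, 0.0]
print(right)  # [0.0, 0.5, 0.25]
```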

Download

Download the project file (Microsoft Visual Studio .Net). It should be cross-compatible, but may require some tweaking on your system to compile properly.

You will also need FFTW3 (compiled as a DLL), LibSndFile (compiled as a LIB), and the IRC_1026_R IRCAM HRTF files. Upon downloading the IRCAM files, put the “files” file included in the zip archive into the directory. This allows the code to load up all of the HRIR impulse responses. (Note: the use of another IRCAM data set is possible, but modification of both the C code and the “files” index file will be required.)

Requirements

To run the plugin on your machine, you will require:

  • hrtf.dll installed in your common VST Plugins directory
  • Support files: fftw3.dll and the IRC_1026_R directory installed in the same location as hrtf.dll
  • A reasonably fast computer (10% CPU usage on a 2.2 GHz P4, you do the math)
  • Patience for pre-alpha software

Though this plugin should be cross-platform and compatible with all VST hosts, this is not guaranteed. As of this writing, the only host this has been tested in is H. Seib’s VSTHost.

Future Improvements

There are many things that need to be improved. Let’s list them, shall we?

  • User Interface: A kick-booty GUI is essential for a good plugin. We’re talking 3D OpenGL Matrix-style graphics, with flying speaker sources and stuff…
  • Fix the Phi bug: Right now, setting Phi to anything outside the ranges 0 through 90 and 315 through 345 produces wrong results. Modifying the SelectHRTF() function should fix this.
  • HRTF interpolation: As of now, the plugin simply selects the closest HRTF. Fine for static transforms, not so good for moving sources. We need to calculate intermediate HRTFs to better accommodate moving sounds.
  • Lots’o testing: VST plugins and hosts are notorious for their idiosyncrasies. Thus, a regimented system of testing needs to be implemented to weed out bugs.
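To make the "closest HRTF" and interpolation items a little more concrete, here is a rough sketch of the two ideas. It is not the plugin's SelectHRTF() code. The 15-degree grid, the function names, and the use of a simple linear crossfade between two impulse responses are all assumptions for illustration. A plain crossfade ignores the interaural time delay, so it is only a crude stand-in for proper HRTF interpolation.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_angle(target, measured):
    """Pick the measured angle closest to target, honouring 360-degree wrap-around."""
    return min(measured, key=lambda m: angular_distance(target, m))


def crossfade_hrir(hrir_a, hrir_b, weight_b):
    """Linear blend of two equal-length impulse responses (weight_b in 0..1)."""
    if len(hrir_a) != len(hrir_b):
        raise ValueError("impulse responses must have equal length")
    return [(1.0 - weight_b) * a + weight_b * b for a, b in zip(hrir_a, hrir_b)]


# Hypothetical measurement grid in 15-degree steps.
grid = list(range(0, 360, 15))
print(nearest_angle(356.0, grid))  # 0 (wraps around instead of picking 345)
print(nearest_angle(352.0, grid))  # 345
```

Handling the wrap-around explicitly is the kind of detail that tends to bite when angles near 0 and 360 are compared naively.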

Final words

And that’s that. Happy? What, you mean you’re not? Then contact me and let me know about it.

The Musical Blackboard

Tuesday, June 28th, 2005

The prototype of the Musical Blackboard hacked Casio keyboard

Introduction

The virtual musical blackboard is an alternative musical interface device. Specifically, the blackboard serves as a “glitch interface” to a Casio MT-401 synthesizer.

The Musical Blackboard schematics (Note: The PIC Microcontroller includes all of the helper circuitry given by the CUI design by Dan Overholt)

In Action

Concept

The Casio MT-240 Keyboard

The original concept for the system came from an initial exploration of the circuitry of the Casio MT-401 keyboard. It was initially thought that because the keyboard is digital in its sound production, there would be limited opportunities for “circuit bending”, a procedure traditionally reserved for analog systems.

But in the process of probing the system, it was discovered that shorting various pins on the memory chip together produced a variety of digital noise effects.

From this discovery, the abstract blackboard interface was conceived. Because the digital noises produced from the memory pin shorting were loud and abrasive, it was reminiscent of the sound that fingernails on a blackboard produce.

The Casio MT-240 circuit board

Now, it just so happens that the main keyboard PCB contains an unused 40-pin chip slot (presumably for debugging early in-house system prototypes). Using this, we will be able to interface with the keyboard in whatever fashion we want.

Design

As noted above, shorting various pins on the memory chip together produces a variety of digital noise effects. There are several ways to replicate the effect of shorting memory chip pins. One of the simplest would be to connect all 40 pins to an external micro-controller. The micro-controller could pull individual pins high or low. However, this method does not allow us to short individual memory pins together. Nor does it replicate the analog nature of physically attaching electrodes to memory pins.

Thus, we will construct a 40-pin parallel analog switching circuit. The logic is simple: A serial stream of digital bits gets converted to a 40-bit parallel signal. These lines each control the enabling of an IC analog switch. The inputs to each of these switches are wired to a common point, so that an enabled switch will connect the corresponding memory pin with all other pins currently switched on.
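The serial-to-parallel logic can be simulated in a few lines. This is only a sketch of the idea, not the PIC firmware: the bit ordering (highest-numbered pin clocked out first) and all function names are assumptions.

```python
NUM_PINS = 40


def serialize(enabled_pins, num_pins=NUM_PINS):
    """Bits to clock out, highest-numbered pin first (assumed ordering)."""
    enabled = set(enabled_pins)
    if any(pin < 0 or pin >= num_pins for pin in enabled):
        raise ValueError("pin number out of range")
    return [1 if pin in enabled else 0 for pin in range(num_pins - 1, -1, -1)]


def shift_in(bits, num_pins=NUM_PINS):
    """Simulate a serial-in parallel-out register: each clock pushes one bit in."""
    register = [0] * num_pins
    for bit in bits:
        register = [bit] + register[:-1]
    return register  # what the latch would present to the analog switches


def shorted_pins(outputs):
    """Enabled switches all meet at the common point, so these pins are shorted together."""
    return [pin for pin, on in enumerate(outputs) if on]


outputs = shift_in(serialize({3, 17, 39}))
print(shorted_pins(outputs))  # [3, 17, 39]
```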

Sensor design

The prototype of the capacitive pressure sensor

In order to realize the blackboard metaphor, several different types of sensors were considered. The original goal was to construct a sensor that could detect a range of different gestures from a large surface. However, the complexity of the interface needed to remain low because of the limited time available for the project.

After much research, it was decided that a capacitive sensor would be the ideal method to gather input from the blackboard. Capacitive sensors can be found in many touch-sensitive lamps and radios, because they are cheap and relatively simple to produce.

However, most touch-sensitive switches are of the simple on-off variety - definitely not enough input variation for the kind of interface we need. So a pressure- and velocity-sensitive version was adapted from a circuit by John Simonton.

Construction

The prototype of the physical blackboard

The construction of the physical blackboard interface proved to be fairly simple. A wooden frame was constructed to hold a piece of scrap sheet metal. The unit was then soldered to a lead attached to the capacitive sensing circuit.

I actually squeezed this whole mess into the Casio keyboard itself!

The construction of the interface between the keyboard and the microcontroller proved to be a bit more labor-intensive. The sheer number of pins on the keyboard memory chip meant that a large Serial-In Parallel-Out (SIPO) latch circuit was needed to provide the 40 outputs required from the microcontroller.

Testing

With a large circuit comes a large debugging task. Though the initial design was theoretically simple, all circuit projects have unforeseen problems.

The first problem occurred when attempting to power the interface from the keyboard. For some unknown reason, the keyboard would not start up correctly if it was also powering the interface. Thus, the interface needed its own power, in the form of a USB cable from the PIC microcontroller board.

Beautification

With so much circuitry, it was difficult to find a way to integrate the project into the keyboard. Ideally, the entire interface save for the physical blackboard would be contained inside the keyboard. But this has two problems:

  1. Without designing a custom PCB, the circuit area of the entire project is too large to fit in the space available in the keyboard
  2. Though the initial hope was to power the interface from the keyboard, doing so introduced unforeseen problems related to the capacitive sensor.

So, the PIC microcontroller and the sensing circuitry were fitted into an external housing, while the interface between the PIC and the keyboard was fitted into the keyboard housing. A 5-pin ribbon cable is all that connects the two units.

Results

Through much testing and trial-and-error changes to the code, an algorithm was devised that produced a fairly extensive range of glitch sounds when the blackboard was touched. Much of the trial and error had to do with how the pressure/velocity data from the blackboard was translated to the memory pins, and how fast the system updated its readings. But with suitable values, the system retains much of the “analog pin-swiping” sound it exhibited originally.

It was initially thought that the capacitive sensing circuit would produce unstable results, causing glitches in the system when not being touched and not activating when touched. However, the sensor is surprisingly stable. This has quite a bit to do with the “noise gate” coded into the software: no input is transferred to the memory pins until the pressure on the sensor climbs above a certain threshold.
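A noise gate of this kind might look roughly like the following. The actual mapping from pressure to memory pins is not described in this post, so everything here is a guess for illustration: the normalized 0..1 pressure range, the threshold value, and the rule "more pressure enables more pins" are all assumptions.

```python
def pins_for_pressure(pressure, threshold=0.2, num_pins=40):
    """Return the set of pins to enable for a normalized pressure reading (0..1).

    Readings at or below the threshold are gated out entirely. Above it, the
    number of enabled pins grows linearly with pressure (hypothetical mapping).
    """
    if pressure <= threshold:
        return set()
    span = (min(pressure, 1.0) - threshold) / (1.0 - threshold)
    count = max(1, round(span * num_pins))
    return set(range(count))


print(len(pins_for_pressure(0.1)))  # 0: below the gate, nothing is shorted
print(len(pins_for_pressure(1.0)))  # 40: full pressure enables every pin
```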

Final Words

Though the actual design changed quite a bit from initial concept to final prototype, the end product was remarkably similar to what was envisioned. Next time around, more care needs to be taken to ensure all construction is resistant against damage. Things such as proper connectors, proper housing, and proper power need to be addressed in order to build a second, better prototype.

Investigation of LPC Synthesis Parameters

Friday, June 3rd, 2005

LPC Introduction

Linear Predictive Coding is a technique used to model speech and other similar systems. It has applications in the following areas:

  • Economic series modeling
  • Seismic exploration
  • Speech synthesis, coding, and recognition
  • Communications receivers

LPC is the process of predicting future samples in a sequence given a set of its N past values. Thus, for an LPC resynthesis of order N, the following equation is used:
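In the standard formulation of linear prediction (assumed here; the symbols $a_k$ and $\epsilon$ follow the names used in the surrounding text), the predicted sample and the residual are:

```latex
\hat{x}(n) = \sum_{k=1}^{N} a_k \, x(n-k),
\qquad
\epsilon(n) = x(n) - \hat{x}(n)
```

Here $x(n)$ is the input signal, $\hat{x}(n)$ is its prediction from the previous $N$ samples, and $\epsilon(n)$ is the prediction error.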

To determine the proper set of coefficients “a”, we need to predict what values would produce a resynthesis closest to the original signal. This prediction can be accomplished by various means, but the minimization of the mean-square error of the prediction is a common method.

For speech coding and synthesis, a window of samples is analyzed according to the above equation. Upon calculating the linear predictor coefficients, the residual signal (error signal) epsilon is calculated. The coefficients are then saved and the next window is analyzed. Upon analyzing the entire signal, the LPC coefficients are saved for later resynthesis, or transmitted across a channel for remote resynthesis.
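As a rough illustration of the per-window analysis step, the sketch below computes predictor coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. This is a textbook approach to minimizing the mean-square error. It is not taken from the Cook routine, and the function names are my own.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag], for lag = 0..max_lag."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients.

    Returns (a, error), where the prediction is
    x_hat[n] = sum(a[k - 1] * x[n - k] for k in 1..order)
    and error is the remaining prediction error energy.
    """
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        if error <= 0.0:
            break  # degenerate frame (e.g. silence): keep remaining coefficients at zero
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / error  # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        error *= (1.0 - k * k)
    return a, error


# Test frame: a decaying exponential, which a first-order predictor fits exactly.
frame = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
```

For this frame the first coefficient comes out very close to 0.9 and the second very close to 0, which is what we would expect from a signal where each sample is 0.9 times the previous one.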

Because only coefficients are saved, the coded signal has a much lower bitrate and thus is useful for applications such as cellular phones, internet audio and voice prompting, where high bitrates are not available.

It is interesting to note that if the residual signal is added to the resynthesized signal, the original signal can be perfectly reconstructed. This is because the residual is by its very nature the difference of the input and resynthesized signals. Though at first this sounds like lossless compression, the residual has a bitrate equal to the original signal, so transmitting the residual does not result in any data compression. Usually, the residual is simply thrown out after analysis.
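The perfect-reconstruction property is easy to check numerically. In this sketch the residual is computed with a fixed, arbitrary set of coefficients, then fed back through the matching all-pole filter, so each output sample is the prediction plus the residual. The coefficient values and function names are made up for the demonstration.

```python
def residual(signal, coeffs):
    """e[n] = x[n] - sum_k a_k * x[n - k], treating samples before n = 0 as zero."""
    out = []
    for n, x in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x - pred)
    return out


def reconstruct(res, coeffs):
    """Feed the residual through the all-pole filter 1 / (1 - sum_k a_k z^-k)."""
    out = []
    for n, e in enumerate(res):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


x = [1.0, -0.5, 0.25, 0.75, -1.0]
a = [0.5, -0.25]  # arbitrary coefficients; any set works for this identity
rebuilt = reconstruct(residual(x, a), a)
print(all(abs(p - q) < 1e-12 for p, q in zip(x, rebuilt)))  # True
```

Note that the residual has exactly as many samples as the input, which is why keeping it buys no compression.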

During signal resynthesis, the speech is modeled as a periodic impulse (glottal pulses) filtered through a vocal tract modeled by the linear prediction coefficients. Though this approximates the fundamentals of speech well, it does not model any of the complexities of the human voice, such as inharmonic frequencies and voiced/unvoiced-combination vocal utterances.
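A bare-bones version of this source-filter model could look as follows. It is a sketch, not the Cook code: the 8000 Hz sample rate, the 100 Hz pitch, the single coefficient, and the function names are all assumed values chosen to keep the example small.

```python
def impulse_train(num_samples, period):
    """One unit impulse every `period` samples, standing in for glottal pulses."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def all_pole_filter(excitation, coeffs, gain=1.0):
    """y[n] = gain * e[n] + sum_k a_k * y[n - k] (the vocal tract model)."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(gain * e + pred)
    return out


# 8000 Hz sample rate and a 100 Hz pitch give a period of 80 samples (assumed values).
voiced = all_pole_filter(impulse_train(160, 80), [0.5])
print(voiced[:3])  # [1.0, 0.5, 0.25]
```

Swapping the impulse train for a different excitation (noise, or pitch data from another recording) is what produces the whisper and hybrid effects that LPC is known for.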

Implementation

To investigate LPC synthesis in detail, we will use a computer routine originally developed by Perry R. Cook. The code has been updated to use LibSndFile for sound file I/O, and signal statistics have been added in order to qualitatively measure resynthesis quality.

The source code has been compiled on Microsoft Visual C++ .Net, but should be compiler- and platform-independent. (Note: source requires LibSndFile to compile.)

Effect of LPC order on resynthesis quality

As the order of the LPC analysis increases, the resynthesized signal approximates the original signal more closely. We can hear this from the following audio samples:

From these audio samples, it is clear that as order increases, the resynthesized LPC signal approximates the original file better. We can also hear the robotic nature of LPC, with its simplistic vocal tract model. However, we want to objectively measure the effect of LPC order on analysis quality.

We will look at plots of several signal statistics versus LPC order, to measure signal quality.

Average Block Error, a measure of the average mean square error of the synthesized output across all analysis blocks

Average Block Error vs. LPC Order

Input / Residual cross-correlation, a measure of how closely related the input and residual signals are to each other.

Cross Correlation of Residual and Input vs. LPC Order

Input / Output cross-correlation, a measure of how closely related the input and output signals are to each other.

Cross Correlation of Output and Input vs. LPC Order

Mean square error of residual, a measure of how much deviation exists between the residual and input signals.

Mean Square Error of Residual vs. LPC Order

Mean square error of output, a measure of how much deviation exists between the output and input signals

Mean Square Error of Output vs. LPC Order

Power of residual, a measure of signal strength. Notice that residual power decreases as order increases, as more of the signal is coded in the predictor coefficients.

Power of Residual vs. LPC Order

Power of output, a measure of signal strength.

Power of Output vs. LPC Order
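For reference, statistics of this kind could be computed along the following lines. The exact definitions used in the modified routine are not given in this post, so these are plausible textbook versions. In particular, the cross-correlation here is the normalized zero-lag value, which is an assumption.

```python
import math


def power(x):
    """Mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


def mean_square_error(x, y):
    """Mean of the squared sample-by-sample differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Normalized zero-lag cross-correlation, in the range -1..1."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0


print(power([1.0, -1.0, 1.0, -1.0]))              # 1.0
print(mean_square_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
```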

From the plots above, we can make several observations.

  1. The input/output correlation does not follow the general trend of the other statistics. This is to be expected, for LPC is not designed to reconstruct the time-domain signal. Thus, phase differences will lead to an output signal that is not correlated to its corresponding input.
  2. There exists a “knee” near orders 9 and 10, above which the statistics do not change much. Using an order above roughly 10 does not affect signal resynthesis enough to justify the added computing time.
  3. The LPC model will never reach perfect reconstruction, even with a very high order. The reason is related to the two points above: LPC is not designed to reconstruct a signal, but rather merely synthesize speech using a simplistic model of the vocal tract.
  4. Upon reflection, several of these statistics are meaningless for our explorations. Particularly, Input/Output Cross-Correlation, Output Mean Square Error, and Output Power are not relevant for our discussion, as LPC is not designed to recreate the time-domain version of a signal.

We can also view the time- and frequency-domain plots of the output signals for various LPC orders. We will see how LPC resynthesis of increasing order affects the quality of signal synthesis.

Animation of output in time-domain vs. LPC Order

A few observations can be made about this time-domain animation.

  1. The physical model of LPC speech synthesis is clearly seen in the time-domain. As previously discussed, speech is modeled as a pulse train filtered through a model of the vocal tract. The vocal tract model is governed by the LPC coefficients. As we see, as the LPC order increases, the time-domain response contains an increasing number of harmonics.
  2. We can see that even for order = 1, the base frequency of the synthesized signal matches the original. This is because accuracy of the base frequency is not dependent on LPC order. Interestingly, LPC resynthesis can use frequency parameters from other sources. Through this technique, interesting robotic effects, hybrid sounds, or whispers can be created.
  3. Though the time-domain signal resembles the original more closely at higher orders, it does not exhibit some of the subtleties that exist in the original signal. These nuances are properties of the original signal that cannot be modeled through the LPC vocal tract model, and are thrown away with the residual.

Animation (not working) of output in freq-domain vs. LPC Order

From the above plot, we notice several things:

  1. The base frequency and its harmonics of a resynthesis of any order are the same as the original signal. This is the effect of the pulse train used in the resynthesis.
  2. In the frequency domain, the LPC vocal tract model attempts to match the formant curve of the original signal. We can see that higher LPC orders produce responses closer to the original signal, since more poles are available to match the original response.
  3. After roughly order = 10, the frequency response does not change all that much. Thus, higher order LPC analyses can be considered a waste of processing power and bandwidth.

Stability of LPC coefficients

Careful observers might notice that for the time-domain and frequency-domain graphs above, orders N=2 and N=3 produce odd plots. Indeed, the Cook implementation used sometimes produces unstable responses.

Looking into the source of the unstable responses, we note that using autocorrelation to minimize the mean-square of the error-signal (which the Cook implementation uses) should result in guaranteed stability. However, further investigation reveals that precision and roundoff errors for coefficients near 0 or greater than 1 may cause the actual response to deviate from the required response, generating instabilities in the process.
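One way to check a coefficient set for stability without finding the poles is the step-down (backward Levinson) recursion: the all-pole filter is stable exactly when every reflection coefficient recovered this way has magnitude below 1. The sketch below is a generic test, not part of the Cook implementation, and assumes the predictor convention x_hat[n] = sum of a_k * x[n - k].

```python
def is_stable(coeffs):
    """True if 1 / (1 - sum_k a_k z^-k) has all of its poles inside the unit circle."""
    c = [-a for a in coeffs]  # A(z) = 1 + sum_k c_k z^-k
    while c:
        k = c[-1]  # reflection coefficient of the current order
        if abs(k) >= 1.0:
            return False
        m = len(c)
        # Step down one order: c_j <- (c_j - k * c_(m-j)) / (1 - k^2)
        c = [(c[i] - k * c[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    return True


print(is_stable([1.1, -0.3]))  # True: poles at 0.5 and 0.6
print(is_stable([1.7, -0.3]))  # False: poles at 1.5 and 0.2
```

A check like this could be run on each frame's coefficients to flag frames where roundoff has pushed a pole outside the unit circle.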

These errors result in improper frequency and time responses for the orders in question, as seen in the above animations.

Concluding remarks

LPC Synthesis is a powerful technique for speech analysis, coding, and resynthesis. By investigating various statistics, parameters and outputs related to the technique, we can better understand the effects of LPC on an input signal.

We have shown that LPC is not designed for reconstructing audio signals, and as a result does not work well for non-speech signal coding. But to transmit intelligible audio across a low bandwidth channel, LPC is a very useful technique.

We have shown that beyond 9 or 10 coefficients, the increase in quality of the LPC reconstruction does not justify the added computation. For this reason, LPC-10 (with -10 denoting the 10 prediction coefficients, and 180 samples per analysis frame) became an industry-standard codec for low-bandwidth speech transmission. We can calculate the estimated bitrate of the coded signal as follows:

  • Original signal: 8000 samples/second = 64000 bits/second for 8-bit audio
  • Coded LPC signal, order 10: 44.44 frames/sec * (8 bits/coefficient/frame * 10 coefficients + 8 bits/frame pitch + 8 bits/frame gain) ≈ 4267 bits/second
  • The LPC-10 specification includes slight changes in the bit allocation per frame, as follows:
  • Coded LPC-10 signal: 44.44 frames/sec * (42 bits/frame for coefficients + 7 bits/frame pitch + 5 bits/frame for gain) = 2400 bits/second

It is quite evident from the above calculations that LPC coding produces an extraordinarily low bitrate compared with sampled audio: LPC-10 needs roughly 3.75% of the bitrate of the original signal. Of course, the output will sound robotic and will be highly susceptible to other noise in the signal. But if transmitting intelligible speech reproduction is all that is needed, LPC is a wonderful tool.
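The bitrate arithmetic can be checked in a few lines. The frame rate is taken as 8000 samples/second divided by 180 samples/frame, and the bit allocations are the ones listed above. The variable names are only for this sketch.

```python
SAMPLE_RATE = 8000   # samples per second
FRAME_SIZE = 180     # samples per analysis frame
frames_per_second = SAMPLE_RATE / FRAME_SIZE


def bitrate(bits_per_frame):
    """Bits per second for a given per-frame bit budget."""
    return frames_per_second * bits_per_frame


original = SAMPLE_RATE * 8         # 8-bit samples
naive = bitrate(8 * 10 + 8 + 8)    # 8-bit coefficients, pitch and gain
lpc10 = bitrate(42 + 7 + 5)        # LPC-10 bit allocation

print(round(naive))  # 4267
print(round(lpc10))  # 2400
print(f"{100 * lpc10 / original:.2f}% of the original bitrate")
```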

References

  1. Gibson, Jerry D. Lecture on LPC analysis and coding, MAT 201A. UC Santa Barbara, 05/09/2005.
  2. Morgan, Nelson. Lecture slides on feature extraction, EECS 225D. UC Berkeley. http://www.icsi.berkeley.edu/eecs225d/spr05/slides/frontend_arch.ppt
  3. Robinson, Tony. Speech Vision Robots group website. http://svr-www.eng.cam.ac.uk/~ajr/SA95/node87.html

MIDI 2 OpenGL Visualization

Tuesday, December 7th, 2004

Introduction

This program was designed as a demonstration of how to integrate PortMIDI with OpenGL 3D visualizations. It takes MIDI input from the default MIDI Input port, outputs that data to the default MIDI output, and translates the note and controller data into various OpenGL parameters. It serves as an example for implementing PortMIDI, and basic OpenGL concepts such as GLU Quadrics and lighting. The OpenGL code is heavily referenced from the great tutorials at NeHe Productions.

The APIs implemented are as follows:

Parameter Implementation

The program was designed to be used with an Oxygen8 keyboard controller, so the notes and parameters used are as follows:

MIDI input            OpenGL parameter
Note C2 .. C4         Individual size of corresponding sphere
Controller 1          X-coord of spotlight
Controller 7          Y-coord of spotlight
Controller 11         Diffuse light: Red value
Controller 12         Diffuse light: Green value
Controller 13         Diffuse light: Blue value
Controller 15         Ambient light: Red value
Controller 16         Ambient light: Green value
Controller 17         Ambient light: Blue value

For Oxygen8 users, these parameters correspond to the Modulation wheel, the data slider, and the 3 right-most rotary knobs in both rows.
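The controller mapping boils down to a small lookup table. The sketch below is not the program's C code. The parameter names and the scaling of the 0..127 MIDI value to a 0.0..1.0 range are assumptions for illustration. Note numbers are left out on purpose, since which MIDI note number "C2" refers to depends on the octave-naming convention in use.

```python
# Controller number -> parameter name (names are hypothetical).
CONTROLLER_MAP = {
    1: "spotlight_x",
    7: "spotlight_y",
    11: "diffuse_red",
    12: "diffuse_green",
    13: "diffuse_blue",
    15: "ambient_red",
    16: "ambient_green",
    17: "ambient_blue",
}


def apply_control_change(state, controller, value):
    """Return a new state dict with the mapped parameter scaled to 0.0..1.0."""
    name = CONTROLLER_MAP.get(controller)
    if name is None:
        return state  # unmapped controllers are ignored
    updated = dict(state)
    updated[name] = value / 127.0
    return updated


state = apply_control_change({}, 11, 127)
print(state)  # {'diffuse_red': 1.0}
```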

To Use

Run midi2opengl.exe. The program relies on your default MIDI input and output ports being the ones you intend for input and output, so other setups will require re-coding in the openMidi function. To trigger the spheres, press C2 .. C4 on your MIDI input controller. The lighting characteristics are modified by the controller parameters listed above.

As this is only a quick technology/interfacing demo, some re-coding might be necessary for this to run on your machine. As every MIDI I/O situation is different, you MAY need to change the IN and OUT parameters in openMidi to suit your specific setup. Run the test application packaged with PortMidi to determine the proper parameters to use with your system.

IMPORTANT NOTE! This program does not use threads for video and MIDI updates, and instead runs in an endless loop. Though this makes the code easy to read, it has the side effect of maxing your CPU at 100% usage. Take note of this fact when running the program – it won’t harm your computer, but it will heat up the processor quickly!
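For comparison, a thread-based design might look something like this. It is a language-neutral idea shown in Python rather than the program's C code, and it does not call PortMidi at all: `poll` stands for any hypothetical function that returns the next pending MIDI message or None. The key point is that the worker sleeps briefly when nothing is pending, so the CPU is not pinned at 100%.

```python
import queue
import threading
import time


def midi_worker(poll, events, stop, interval=0.001):
    """Poll for MIDI messages in the background, sleeping when idle."""
    while not stop.is_set():
        message = poll()
        if message is None:
            time.sleep(interval)  # yield the CPU instead of spinning
        else:
            events.put(message)


def drain(events):
    """Collect every message queued so far; call once per rendered frame."""
    out = []
    while True:
        try:
            out.append(events.get_nowait())
        except queue.Empty:
            return out
```

The render loop would start `midi_worker` in a `threading.Thread`, call `drain(events)` once per frame to pick up new notes and controller values, and set the `stop` event on exit.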

Requirements

A PC running Windows 98+, a sound card or external audio device with MIDI inputs and outputs

Download

Links

To Do

  • Change OpenGL and MIDI polling to thread-based, to prevent maxing out the processor
  • Add support for non-default MIDI devices
  • Add support for more MIDI parameters: OpenGL rotation control, others(?)
  • Change graphics to a time-based 3D “piano roll” that handles all note values