There was a great party at Google I/O last year, with lots of loud music, beer and, of course, lots of new gadgets and smartphones around. For us, it was also the day when the idea for an application turned into a set of simple requirements.
Everyone has probably seen a device with a simple but entertaining function: it transforms audio into color. It goes by many names, but most people know it as “disco lights”. It has no useful purpose except one cool thing – amazing people with a graphical representation of the music they listen to.
Thus, the only requirement was: let’s have those disco lights on an Android phone. It would record audio from the mic and draw something exciting on the screen, in sync with the music’s rhythm, notes and loudness.
A quick search of the market turned up nothing similar, so the idea made it into an actual application, which is now published.
If you are still reading this, here are a few technical details about the implementation.
As you probably know, there are basically two common approaches to audio-to-light converters:
1. A set of bandpass filters.
2. Signal-to-spectrum conversion via Fast Fourier Transform.
The initial implementation used the FFT approach, as the quickest way to get a working app. Later on we ran a few experiments with bandpass filters as well, but it turned out that good FIR or IIR filters require too much computational power, especially if you want many frequency bands. So we decided to stick with the FFT solution and optimize it as much as possible.
A few words about audio recording. Android supports it via the AudioRecord class, which gives PCM (wave) output at 44100 Hz (the only sampling rate guaranteed on every Android phone), more than enough for our goal. In fact, it is even too much: that sampling rate allows working with audio of up to about 22 kHz (the Nyquist limit), and we will never reach that level of quality through the microphone embedded in most smartphones anyway.
Besides, the usual music spectrum (from rap to classical) fits well within a much narrower band, roughly 50 to 5000 Hz, so a 44100 Hz sampling rate is not necessary. Fortunately, it can easily be decimated at runtime to a half or a quarter of that with almost no computation (just take every second or fourth sample and that’s it).
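The decimation step really can be that trivial. Here is a sketch (the function name and in-place layout are ours for illustration); note that, strictly speaking, dropping samples without a low-pass stage aliases content above the new Nyquist frequency back into the band, but the phone microphone delivers little energy up there anyway.

```c
#include <stdint.h>
#include <stddef.h>

/* Decimate 16-bit PCM in place by an integer factor (2 or 4 in our case):
   keep every `factor`-th sample, drop the rest. Returns the new length. */
static size_t decimate(int16_t *pcm, size_t n, unsigned factor) {
    size_t out = 0;
    for (size_t i = 0; i < n; i += factor)
        pcm[out++] = pcm[i];
    return out;
}
```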
The bad news is that audio recording requires the platform to collect a certain number of samples before passing them to the application, which, in simple words, means a delay between the music you hear and the visual effect you see. This delay varies among Android phone models, but keeping the recording buffer small helps reduce it to a reasonable minimum (100 ms is quite satisfactory).
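The relationship between buffer size and delay is simple arithmetic: the buffer must fill before we see any data, so its length in samples divided by the sampling rate is the delay it contributes. A sketch (real AudioRecord buffers are sized in bytes and bounded from below by getMinBufferSize(), so this is the idealized version):

```c
/* Delay in milliseconds contributed by a capture buffer of `frames`
   samples at `rate` Hz: the buffer must fill before we get any data. */
static int buffer_delay_ms(int frames, int rate) {
    return frames * 1000 / rate;
}
```

For example, a 4410-frame buffer at 44100 Hz contributes about 100 ms, the figure mentioned above.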
The most interesting part of this is performance and battery life. In order to make an application that works on virtually any Android phone, the weakest phone had to be considered. The Google Ion (HTC Sapphire) with a Qualcomm MSM 7201 @ 528 MHz is probably the best representative of that category at the moment, so it was taken as the target device for development.
To evaluate the CPU speed, we ran a few simple tests comparing floating-point against integer math, as well as Java against native calculations.
Long story short, here is the data we collected for 100 million multiplications. The speeds we observed were as follows:
The Sapphire’s CPU supports the ARMv6 instruction set, but the NDK compiles code for ARMv5, where floating-point math is implemented in software. Various sources on the Internet say different things about floating-point support in ARMv6 chipsets, but the most important takeaway from all of them is this: there are plenty of ARMv6 phones with no hardware acceleration for floating-point math. Therefore, the winner is native long long int.

Obviously, we had to use the NDK for native computations; it has been available since Cupcake. To make things more interesting, though, we chose Donut as the oldest supported platform version, because Donut also brings some native OpenGL ES support, which gave us a greater choice of visual effect implementations.
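To give an idea of what “native long long int” math means in practice, here is a minimal fixed-point sketch. The Q16.16 format is our choice for the example, not necessarily what the app uses; the key point is the 64-bit intermediate in the multiply.

```c
#include <stdint.h>

/* Q16.16 fixed point: a value v is stored as the integer v * 65536.
   Add/subtract work as plain integer ops; multiplication needs a 64-bit
   ("long long") intermediate so the 32x32-bit product does not overflow
   before shifting back down. */
typedef int32_t q16;
#define Q16_ONE (1 << 16)

static q16 q16_from_double(double v) { return (q16)(v * Q16_ONE); }
static double q16_to_double(q16 v)   { return (double)v / Q16_ONE; }

static q16 q16_mul(q16 a, q16 b) {
    return (q16)(((int64_t)a * b) >> 16);
}
```

This is the kind of arithmetic that beat software floating point in our multiplication benchmark.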
Another thing worth mentioning is the sqrt() function. Initially we wrote our own implementation (it was supposed to be quick as hell). It turned out that the standard implementation of this function is approximately 10x faster than a Newton’s-method algorithm written in C. It seems to be well optimized for ARM CPUs, which is great. So the lesson basically is: do not try to make something faster unless you understand how fast you can actually make it 🙂
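For reference, the Newton’s-method integer square root that lost the race looks roughly like this (a sketch of the classic algorithm, not our exact benchmarked code):

```c
#include <stdint.h>

/* Newton's (Heron's) method for the integer square root: iterate
   x' = (x + n/x) / 2 until the estimate stops decreasing.
   Returns floor(sqrt(n)). */
static uint32_t isqrt_newton(uint32_t n) {
    if (n < 2) return n;
    uint32_t x = n;
    uint32_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}
```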
By the way, here is a great article about computing sqrt() on ARM: http://www.finesse.demon.co.uk/steven/sqrt.html
All in all, here are the options chosen for the final version of the app:
1. FFT size of 512 samples, recorded at 44100 Hz and downsampled to 11025 Hz, at about 20 FFTs per second. This gives a signal spectrum of up to about 5 kHz.
2. Effects are rendered on a SurfaceView at the same ~20 frames per second.
3. The frequency band is split into visual channels on a musical-note basis (i.e. non-linearly). Each channel measures the overall audio power in its frequency band and renders a bar or circle with the corresponding color and brightness.
This setup gives a nice result from the entertainment point of view, and uses as little battery as possible.
That’s basically it. The only question that might still bother you is: why did we need OpenGL? Well, OpenGL is needed for more visual effects, which will be coming soon to Disco Lights. 🙂