07.09.10

Help Fight Patent Bullying From Shazam — Spread This Code!

Posted in Free/Libre Software, Patents at 4:51 pm by Dr. Roy Schestowitz


Summary: This post looks at patent bullying against Free software and it calls for the spreading of source code which Shazam unlawfully tries to remove from the Internet

EARLIER TODAY we wrote about NetApp's threats against ZFS distributors. As one blogger put it:

Enterprise Strategy Group senior analyst Terri McClure wonders why NetApp didn’t hit Nexenta with the same letter since Nexenta supplies its ZFS software to multiple storage vendors.

“If NetApp did it would make sense – stop a number of vendors instead of just one. It certainly makes you wonder why they would single out Coraid, people could read into this that NetApp sees Coraid as a threat. Coraid’s NAS product is pretty new but the underlying platform has been on the market a while and is solid, at a really aggressive price point,” said McClure.

“[NetApp] just spent a couple of hundred dollars in lawyer’s fees and took a competitor out of the market. Quick and easy, but a little disappointing, too. At the end of the day, ZFS is open source, and while there is no way to predict how the settlement talks between Oracle and NetApp will turn out, you can’t really un-open source ZFS,” she said.

There’s still no word from NetApp on the matter.

The “patent troll, NTP, is back, buoyed [by] dosh from RIM,” says Glyn Moody, who found this new article.

NTP, a patent-holding company best known for prying a settlement of more than $600 million from the maker of the BlackBerry, is now suing the other big names in the smartphone industry: Apple, Google, Microsoft, HTC, LG and Motorola, writes The New York Times’s Steve Lohr.

The suits, filed late Thursday afternoon in federal district court in Richmond, Va., charge that the cellphone e-mail systems of those companies are illegally using NTP’s patented technology.

We mentioned NTP before and so did Patent Troll Tracker. Speaking of trolls, earlier today we wrote about Shazam's patent bullying. That previous post gave just the gist of it and the discussion at Slashdot ought to say more. From the summary:

“The code wasn’t even released, and yet Roy van Rijn, a Music & Free Software enthusiast received a C&D from Landmark Digital Services, owners of Shazam, a music service that allows you to find a song, by listening to a part of it. And if that wasn’t enough, they want him to take down his blog post (Google Cache) explaining how he did it because it ‘may be viewed internationally. As a result, [it] may contribute to someone infringing our patents in any part of the world.’”

Jan Wildeboer calls it “Patent Infringement Madness”, and in another post Wildeboer asks: “is (a) a blog entry or (b) patent infringement? I say (a), Shazam says (b).”

Two readers urged us to make a mirror just in case (other people ought to mirror this too, in order to ensure that Shazam will lose hope of successfully censoring perfectly legal Dutch code).

Patents are supposed to encourage publication of ideas, not to suppress them. The following code is not in any way infringing Shazam’s copyrights.


Creating Shazam in Java

A couple of days ago I encountered this article: How Shazam Works

This got me interested in how a program like Shazam works… And more importantly, how hard is it to program something similar in Java?

About Shazam

Shazam is an application which you can use to analyse/match music. When you install it on your phone, and hold the microphone to some music for about 20 to 30 seconds, it will tell you which song it is.

When I first used it, it gave me a magical feeling: “How did it do that!?” And even today, after using it a lot, it still feels a bit magical.
Wouldn’t it be great if we could program something of our own that gives that same feeling? That was my goal for the past weekend.

Listen up..!

First things first: to get a music sample to analyse, we need to listen to the microphone in our Java application…! This is something I hadn’t done in Java before, so I had no idea how hard it was going to be.

But it turned out it was very easy:

final AudioFormat format = getFormat(); //Fill AudioFormat with the wanted settings
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
line.open(format);
line.start();

Now we can read the data from the TargetDataLine just like a normal InputStream:

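// In another thread I start:

// Declared as ByteArrayOutputStream (not just OutputStream) so toByteArray() can be called later
ByteArrayOutputStream out = new ByteArrayOutputStream();
// Read in pieces of a fifth of the line's internal buffer, as in the Java Sound tutorial
// (the original snippet doesn't show how its buffer was declared)
byte[] buffer = new byte[line.getBufferSize() / 5];
// running is a volatile boolean field; another thread sets it to false to stop recording
running = true;

try {
    while (running) {
        int count = line.read(buffer, 0, buffer.length);
        if (count > 0) {
            out.write(buffer, 0, count);
        }
    }
    out.close();
} catch (IOException e) {
    System.err.println("I/O problems: " + e);
    System.exit(-1);
}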

Using this method it is easy to open the microphone and record all the sounds! The AudioFormat I’m currently using is:

private AudioFormat getFormat() {
    float sampleRate = 44100;
    int sampleSizeInBits = 8;
    int channels = 1; //mono
    boolean signed = true;
    boolean bigEndian = true;
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}

So, now we have the recorded data in a ByteArrayOutputStream, great! Step 1 complete.

Microphone data

The next challenge is analysing the data. When I printed the contents of my byte array, I got a long list of numbers like this:

0
0
1
2
4
7
6
3
-1
-2
-4
-2
-5
-7
-8
(etc)

Erhm… yes? This is sound?

To see whether the data could be visualized, I took the output and made a line graph of it in OpenOffice:

Ah yes! This kind of looks like ‘sound’. It looks like what you see in, for example, Windows Sound Recorder.

This representation is known as the time domain. But as they are, these numbers are basically useless to us… If you read the above article on how Shazam works, you’ll see that they use a spectrum analysis instead of the raw time domain data.
So the next big question is: how do we transform the current data into a spectrum analysis?

Discrete Fourier transform

To turn our data into something usable, we need to apply the so-called Discrete Fourier Transform (DFT). This turns the data from the time domain into the frequency domain.
There is just one problem: if you transform the data into the frequency domain, you lose every bit of information regarding time. So you’ll know the magnitude of every frequency, but you have no idea when each one occurs.
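To make the loss of timing concrete, here is a small sketch (hypothetical class and method names, and a plain naive O(N²) DFT rather than the FFT used later). It builds a signal with one tone in its first half and another in its second half, then swaps the halves. The two signals are different sounds, yet their magnitude spectra match up to rounding error: swapping halves is a circular shift, and a circular shift only changes the phase of each DFT coefficient, never its magnitude.

```java
// Naive O(N^2) DFT, only for illustration; names are hypothetical.
final class DftTimeLossDemo {

    /** Returns |X[k]| for k = 0..N-1 of a real-valued signal x. */
    static double[] dftMagnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** A tone at bin kFirst in the first half, then a tone at bin kSecond. */
    static double[] twoTones(int n, int kFirst, int kSecond) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            int k = (t < n / 2) ? kFirst : kSecond;
            x[t] = Math.sin(2 * Math.PI * k * t / n);
        }
        return x;
    }

    /** Second half first, then first half: the same sounds in the opposite order. */
    static double[] swapHalves(double[] x) {
        int half = x.length / 2;
        double[] y = new double[x.length];
        for (int t = 0; t < x.length; t++) {
            y[t] = x[(t + half) % x.length];
        }
        return y;
    }

    public static void main(String[] args) {
        double[] aThenB = twoTones(128, 8, 20);
        double[] bThenA = swapHalves(aThenB);
        double[] m1 = dftMagnitudes(aThenB);
        double[] m2 = dftMagnitudes(bThenA);
        double maxDiff = 0;
        for (int k = 0; k < m1.length; k++) {
            maxDiff = Math.max(maxDiff, Math.abs(m1[k] - m2[k]));
        }
        // Only rounding error is left: the spectrum can't tell the two orders apart
        System.out.println("largest magnitude difference: " + maxDiff);
    }
}
```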

To solve this we need a sliding window. We take chunks of data (in my case 4096 bytes of data) and transform just this bit of information. Then we know the magnitude of all frequencies that occur during just these 4096 bytes.
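It can help to see what one 4096-byte chunk means for the format returned by getFormat(): at 8 bits and one channel, one byte is one sample, so a chunk covers 4096/44100 seconds of sound, and neighbouring FFT bins are 44100/4096 Hz apart. The sketch below computes this with javax.sound.sampled.AudioFormat and converts a few bin indices (the ones used later for the key points) to Hz; the class and method names are hypothetical, and reading those ranges as FFT bin indices is an assumption based on how the later code indexes results[freq].

```java
import javax.sound.sampled.AudioFormat;

// Time and frequency resolution of one chunk; class and method names are hypothetical.
final class ChunkResolution {

    static final int CHUNK_SIZE = 4096; // bytes per chunk, as in the post

    /** Samples per chunk: 8-bit mono means one byte per frame. */
    static int samplesPerChunk(AudioFormat format) {
        return CHUNK_SIZE / format.getFrameSize();
    }

    /** Duration of one chunk (one spectrogram line) in seconds. */
    static double chunkSeconds(AudioFormat format) {
        return samplesPerChunk(format) / (double) format.getSampleRate();
    }

    /** Distance between neighbouring FFT bins in Hz. */
    static double binWidthHz(AudioFormat format) {
        return (double) format.getSampleRate() / samplesPerChunk(format);
    }

    public static void main(String[] args) {
        // Same settings as getFormat(): 44100 Hz, 8 bits, mono, signed, big-endian
        AudioFormat format = new AudioFormat(44100f, 8, 1, true, true);
        System.out.printf("one chunk: %.1f ms%n", chunkSeconds(format) * 1000);
        System.out.printf("one bin:   %.2f Hz%n", binWidthHz(format));
        for (int bin : new int[] {40, 80, 120, 180, 300}) {
            System.out.printf("bin %3d is about %.0f Hz%n", bin, bin * binWidthHz(format));
        }
    }
}
```

With these settings a chunk lasts about 93 ms and the bins are about 10.8 Hz apart, so bin 40 sits near 431 Hz and bin 300 near 3.2 kHz. A bigger chunk would sharpen the frequency resolution at the cost of coarser timing.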

Implementing this

Instead of worrying about the Fourier transform myself, I googled a bit and found code for the so-called FFT (Fast Fourier Transform). I call this code with the chunks:

byte[] audio = out.toByteArray();

final int totalSize = audio.length;

int amountPossible = totalSize/Harvester.CHUNK_SIZE;

//When turning into frequency domain we'll need complex numbers:
Complex[][] results = new Complex[amountPossible][];

//For all the chunks:
for(int times = 0;times < amountPossible; times++) {
    Complex[] complex = new Complex[Harvester.CHUNK_SIZE];
    for(int i = 0;i < Harvester.CHUNK_SIZE;i++) {
        //Put the time domain data into a complex number with imaginary part as 0:
        complex[i] = new Complex(audio[(times*Harvester.CHUNK_SIZE)+i], 0);
    }
    //Perform FFT analysis on the chunk:
    results[times] = FFT.fft(complex);
}

//Done!

Now we have a two-dimensional array containing all chunks as Complex[]. This array contains data about all frequencies. To visualize this data I decided to implement a full spectrum analyzer (just to make sure I got the math right).
To show the data I hacked this together:

for(int i = 0; i < results.length; i++) {
    int freq = 1;
    for(int line = 1; line < size; line++) {
        // In log mode freq grows faster than line; stop before running off the chunk
        if (freq >= results[i].length) {
            break;
        }
        // To get the magnitude of the sound at a given frequency slice
        // get the abs() from the complex number.
        // In this case I use Math.log to get a more manageable number (used for color)
        double magnitude = Math.log(results[i][freq].abs()+1);

        // The more blue in the color the more intensity for a given frequency point.
        // Color components must stay within 0..255, so clamp them:
        int green = Math.min(255, (int) (magnitude * 10));
        int blue = Math.min(255, (int) (magnitude * 20));
        g2d.setColor(new Color(0, green, blue));
        // Fill:
        g2d.fillRect(i*blockSizeX, (size-line)*blockSizeY,blockSizeX,blockSizeY);

        // I used an improvised logarithmic scale and normal scale:
        if (logModeEnabled && (Math.log10(line) * Math.log10(line)) > 1) {
            freq += (int) (Math.log10(line) * Math.log10(line));
        } else {
            freq++;
        }
    }
}
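The chunking code relies on Complex and FFT classes that were found online and aren’t shown in the post. As a rough stand-in (not that code), here is a minimal iterative radix-2 Cooley-Tukey FFT working on separate real and imaginary double arrays. Like any radix-2 FFT it needs a power-of-two length, which the 4096-byte chunks satisfy. SimpleFft and chunkMagnitudes are hypothetical names.

```java
// In-place iterative radix-2 Cooley-Tukey FFT on separate real/imaginary arrays.
// A stand-in sketch, not the FFT class the post used; n must be a power of two.
final class SimpleFft {

    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Bit-reversal permutation so the butterflies can work in place
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterflies: combine transforms of length len/2 into transforms of length len
        for (int len = 2; len <= n; len <<= 1) {
            double angle = -2 * Math.PI / len;
            double wRe = Math.cos(angle), wIm = Math.sin(angle);
            for (int start = 0; start < n; start += len) {
                double curRe = 1, curIm = 0; // twiddle factor w^k, starting at w^0
                for (int k = 0; k < len / 2; k++) {
                    int a = start + k, b = start + k + len / 2;
                    double tRe = re[b] * curRe - im[b] * curIm;
                    double tIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - tRe; im[b] = im[a] - tIm;
                    re[a] += tRe;        im[a] += tIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
    }

    /** FFT magnitudes of one chunk of signed 8-bit samples. */
    static double[] chunkMagnitudes(byte[] audio, int offset, int chunkSize) {
        double[] re = new double[chunkSize];
        double[] im = new double[chunkSize]; // imaginary part starts at 0, as in the post
        for (int i = 0; i < chunkSize; i++) {
            re[i] = audio[offset + i]; // Java bytes are signed, like the recorded samples
        }
        fft(re, im);
        double[] mag = new double[chunkSize];
        for (int i = 0; i < chunkSize; i++) {
            mag[i] = Math.hypot(re[i], im[i]);
        }
        return mag;
    }

    public static void main(String[] args) {
        // A 64-sample test tone that completes 5 cycles, stored as signed bytes
        byte[] audio = new byte[64];
        for (int t = 0; t < audio.length; t++) {
            audio[t] = (byte) Math.round(100 * Math.sin(2 * Math.PI * 5 * t / 64));
        }
        double[] mag = chunkMagnitudes(audio, 0, audio.length);
        int best = 1;
        for (int k = 1; k < mag.length / 2; k++) {
            if (mag[k] > mag[best]) best = k;
        }
        System.out.println("strongest bin: " + best); // expected: 5
    }
}
```

Only the first half of the bins is worth looking at: for real input the second half mirrors the first.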

Introducing Aphex Twin

This may seem a bit OT (off-topic), but I’d like to tell you about an electronic musician called Aphex Twin (Richard David James). He makes crazy electronic music… but some songs have an interesting feature. His biggest hit, Windowlicker, for example, has an image hidden in its spectrogram.
If you look at the song as a spectral image, it shows a nice spiral. Another song, called ‘Mathematical Equation’, shows Twin’s face! More information can be found here: Bastwood – Aphex Twin’s face.

When running this song against my spectral analyzer I get the following result:

Not perfect, but it seems to be Twin’s face!

Determining the key music points

The next step in Shazam’s algorithm is to determine some key points in the song, save those points as hashes, and then try to match them against their database of over 8 million songs. This is done for speed: looking up a hash takes O(1) time on average. That explains a lot of Shazam’s awesome performance!

Because I wanted to have everything working in one weekend (sadly enough, that is my maximum attention span; after that I need a new project to work on), I kept my algorithm as simple as possible. And to my surprise it worked.

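For each line in the spectrum analysis, I take the point with the highest magnitude from each of several frequency ranges (FFT bin indices). In my case: 40-80, 80-120, 120-180 and 180-300, plus a lowest range up to 40, which is why each output line below has five numbers.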

//For every line of data (results here is the Complex[] of the current line;
//highscores and recordPoints start out empty for each line):

for (int freq = LOWER_LIMIT; freq < UPPER_LIMIT-1; freq++) {
    //Get the magnitude:
    double mag = Math.log(results[freq].abs() + 1);

    //Find out which range we are in:
    int index = getIndex(freq);

    //Save the highest magnitude and corresponding frequency:
    if (mag > highscores[index]) {
        highscores[index] = mag;
        recordPoints[index] = freq;
    }
}

//Write the points to a file:
for (int i = 0; i < AMOUNT_OF_POINTS; i++) {
    fw.append(recordPoints[i] + "\t");
}
fw.append("\n");

// ... snip ...

public static final int[] RANGE = new int[] {40,80,120,180, UPPER_LIMIT+1};

//Find out in which range
public static int getIndex(int freq) {
    int i = 0;
    while(RANGE[i] < freq) i++;
    return i;
}

When we record a song now, we get a list of numbers such as:

33  56  99  121 195
30  41  84  146 199
33  51  99  133 183
33  47  94  137 193
32  41  106 161 191
33  76  95  123 185
40  68  110 134 232
30  62  88  125 194
34  57  83  121 182
34  42  89  123 182
33  56  99  121 195
30  41  84  146 199
33  51  99  133 183
33  47  94  137 193
32  41  106 161 191
33  76  95  123 185

If I record a song and look at it visually it looks like this:


(all the red dots are ‘important points’)
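For reference, here is a self-contained version of the per-band peak picking, under stated assumptions: LOWER_LIMIT = 30 and UPPER_LIMIT = 300 are guesses that fit the ranges and the numbers shown above (the post doesn’t give their values), and the method takes an array of FFT magnitudes for one line instead of Complex[]. It creates fresh high scores for every line, so one line’s peaks cannot leak into the next. BandPeaks and peaks() are hypothetical names.

```java
import java.util.Arrays;

// Per-band peak picking for one spectrogram line.
// LOWER_LIMIT and UPPER_LIMIT are hypothetical values; the post doesn't state them.
final class BandPeaks {

    static final int LOWER_LIMIT = 30;   // hypothetical
    static final int UPPER_LIMIT = 300;  // hypothetical
    static final int[] RANGE = {40, 80, 120, 180, UPPER_LIMIT + 1};

    /** Index of the first band whose upper edge is >= freq (same rule as getIndex). */
    static int getIndex(int freq) {
        int i = 0;
        while (RANGE[i] < freq) i++;
        return i;
    }

    /** For one line of FFT magnitudes, returns the strongest bin in each band. */
    static int[] peaks(double[] magnitudes) {
        double[] highscores = new double[RANGE.length];
        Arrays.fill(highscores, Double.NEGATIVE_INFINITY);
        int[] recordPoints = new int[RANGE.length];
        int upper = Math.min(UPPER_LIMIT, magnitudes.length);
        for (int freq = LOWER_LIMIT; freq < upper; freq++) {
            double mag = Math.log(magnitudes[freq] + 1);
            int index = getIndex(freq);
            if (mag > highscores[index]) {
                highscores[index] = mag;
                recordPoints[index] = freq;
            }
        }
        return recordPoints;
    }

    public static void main(String[] args) {
        double[] line = new double[2048]; // one line of FFT magnitudes, mostly silent
        line[35] = 5; line[60] = 9; line[100] = 7; line[150] = 3; line[250] = 8;
        System.out.println(Arrays.toString(peaks(line))); // [35, 60, 100, 150, 250]
    }
}
```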

Indexing my own music

With this algorithm in place I decided to index all 3000 of my songs. Instead of using the microphone, you can simply open MP3 files, convert them to the correct format, and read them the same way we did with the microphone, using an AudioInputStream. Converting stereo music into mono-channel audio was a bit trickier than I had hoped, and the sampling has to be changed a bit as well. Examples can be found online (they require a bit too much code to paste here).
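The stereo-to-mono step can be sketched roughly like this, assuming the decoded audio arrives as interleaved 16-bit signed little-endian stereo PCM (a common decoder output, but an assumption here). It averages the two channels and keeps the top 8 bits to match the 8-bit signed format from getFormat(). MP3 decoding and sample-rate conversion are not covered, and the class name is hypothetical.

```java
// Downmixes interleaved 16-bit signed little-endian stereo PCM to 8-bit signed mono.
// The input layout is an assumption; decoding and resampling are not handled here.
final class Downmix {

    static byte[] stereo16LeToMono8(byte[] pcm) {
        int frames = pcm.length / 4; // 2 channels x 2 bytes per frame
        byte[] mono = new byte[frames];
        for (int f = 0; f < frames; f++) {
            int i = f * 4;
            // Little-endian: low byte first; the (sign-extended) high byte carries the sign
            int left = (pcm[i + 1] << 8) | (pcm[i] & 0xFF);
            int right = (pcm[i + 3] << 8) | (pcm[i + 2] & 0xFF);
            int avg = (left + right) / 2;   // still within the 16-bit range
            mono[f] = (byte) (avg >> 8);    // keep the top 8 bits
        }
        return mono;
    }

    public static void main(String[] args) {
        // One stereo frame: left = 0x1000, right = 0x3000 (little-endian bytes)
        byte[] pcm = {0x00, 0x10, 0x00, 0x30};
        System.out.println(stereo16LeToMono8(pcm)[0]); // average 0x2000, prints 32
    }
}
```

Averaging before dropping the low byte avoids clipping, since the average of two 16-bit samples always fits in 16 bits.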

Matching!

The most important part of the program is the matching process. According to Shazam’s paper, they use hashing to find matches and then decide which song is the best match.

Instead of using difficult point groupings in time, I decided to use one line of our data (for example “33, 47, 94, 137”) as one hash, roughly the four numbers packed together in reverse order: 1370944733
(In my tests using 3 or 4 points works best, but tweaking is difficult: I need to re-index my MP3s every time!)

Example hash-code using 4 points per line:

//Using a little bit of error-correction, damping
private static final int FUZ_FACTOR = 2;

private long hash(String line) {
    String[] p = line.split("\t");
    long p1 = Long.parseLong(p[0]);
    long p2 = Long.parseLong(p[1]);
    long p3 = Long.parseLong(p[2]);
    long p4 = Long.parseLong(p[3]);
    return  (p4-(p4%FUZ_FACTOR)) * 100000000 + (p3-(p3%FUZ_FACTOR)) * 100000 + (p2-(p2%FUZ_FACTOR)) * 100 + (p1-(p1%FUZ_FACTOR));
}
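To see what FUZ_FACTOR does, here is the same packing arithmetic on four ints (a hypothetical variant of hash() that skips the string parsing). Each point is rounded down to a multiple of 2, so 33 and 32 land in the same bucket, but 33 and 34 do not: the fuzz only absorbs errors that stay within one bucket. The packing stays unambiguous as long as p1 < 100 and p2, p3 < 1000, which holds for the ranges used here.

```java
// Same packing as hash(String) above, but on ints; the class name is hypothetical.
final class FuzzyHash {

    private static final int FUZ_FACTOR = 2;

    /** Rounds each point down to a multiple of FUZ_FACTOR, then packs them into one long. */
    static long hash(int p1, int p2, int p3, int p4) {
        return (p4 - (p4 % FUZ_FACTOR)) * 100000000L
             + (p3 - (p3 % FUZ_FACTOR)) * 100000L
             + (p2 - (p2 % FUZ_FACTOR)) * 100L
             + (p1 - (p1 % FUZ_FACTOR));
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 47, 94, 137));                              // 13609404632
        System.out.println(hash(33, 47, 94, 137) == hash(32, 46, 95, 136));     // true
        System.out.println(hash(33, 47, 94, 137) == hash(34, 47, 94, 137));     // false
    }
}
```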

Now I create two data sets:

- A list of songs, List<String> (List index is Song-ID, String is songname)
- Database of hashes: Map<Long, List<DataPoint>>

The Long key in the hash database is the hash itself, and it maps to a bucket of DataPoints.

A DataPoint looks like:

private class DataPoint {

    private int time;
    private int songId;

    public DataPoint(int songId, int time) {
        this.songId = songId;
        this.time = time;
    }

    public int getTime() {
        return time;
    }
    public int getSongId() {
        return songId;
    }
}

Now we already have everything in place to do a lookup. First I read all the songs and generate hashes for each point of data. This is put into the hash-database.
The second step is reading the data of the song we need to match. These hashes are retrieved and we look at the matching datapoints.
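Filling the hash database can be sketched as follows, assuming each song has already been turned into one hash per spectrogram line, with the line number used as the time. HashIndex, addSong and the use of computeIfAbsent are illustrative choices rather than the author’s code, and DataPoint is reduced to two final fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hash database: hash -> bucket of (songId, time). Names are hypothetical.
final class HashIndex {

    static final class DataPoint {
        final int songId;
        final int time; // line (chunk) number within the song
        DataPoint(int songId, int time) { this.songId = songId; this.time = time; }
    }

    final List<String> songs = new ArrayList<>();              // list index = song ID
    final Map<Long, List<DataPoint>> hashes = new HashMap<>();

    /** Adds one song, given one hash per spectrogram line; returns its song ID. */
    int addSong(String name, long[] lineHashes) {
        int songId = songs.size();
        songs.add(name);
        for (int time = 0; time < lineHashes.length; time++) {
            // Create the bucket on first use, then append this occurrence
            hashes.computeIfAbsent(lineHashes[time], h -> new ArrayList<>())
                  .add(new DataPoint(songId, time));
        }
        return songId;
    }

    public static void main(String[] args) {
        HashIndex index = new HashIndex();
        index.addSong("a.mp3", new long[] {10, 20, 10});
        index.addSong("b.mp3", new long[] {20});
        System.out.println(index.hashes.get(10L).size()); // 2 (both from song 0)
    }
}
```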

There is just one problem: each hash gets several hits, so how do we determine which song is the correct one? By counting the matches? No, that doesn’t work…
The most important thing is timing: the timing of the matches must line up…! But how can we do this if we don’t know where we are in the song? After all, we could just as easily have recorded the final chords of the song.

Looking at the data, I discovered something interesting. For every match we have:

- A hash of the recording
- A matching hash of the possible match
- A song ID of the possible match
- The current time in our own recording
- The time of the hash in the possible match

Now we can subtract the current time in our recording (for example, line 34) from the time of the hash match in the song (for example, line 1352), which gives an offset of 1318. This difference is stored together with the song ID, because the offset tells us where in the song our recording could have started.
When we have gone through all the hashes from our recording, we are left with a lot of song IDs and offsets. The cool thing is: if many hashes from one song share the same offset, you’ve found your song.
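A minimal sketch of this offset voting, assuming the database is a Map<Long, List<DataPoint>> as described above and the recording is one hash per line. For every hit it computes songTime - recordingTime and counts how often each (song, offset) pair occurs; the song with the highest single count wins. OffsetMatcher and bestMatch are hypothetical names, not the author’s code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Offset voting: each hash hit votes for (songId, songTime - recordingTime).
final class OffsetMatcher {

    static final class DataPoint {
        final int songId;
        final int time;
        DataPoint(int songId, int time) { this.songId = songId; this.time = time; }
    }

    /** Returns the song ID with the most hashes agreeing on one offset, or -1. */
    static int bestMatch(Map<Long, List<DataPoint>> db, long[] recording) {
        // songId -> (offset -> number of hashes agreeing on that offset)
        Map<Integer, Map<Integer, Integer>> votes = new HashMap<>();
        int bestSong = -1;
        int bestCount = 0;
        for (int recTime = 0; recTime < recording.length; recTime++) {
            List<DataPoint> hits = db.get(recording[recTime]);
            if (hits == null) continue;
            for (DataPoint p : hits) {
                int offset = p.time - recTime;
                int count = votes.computeIfAbsent(p.songId, id -> new HashMap<>())
                                 .merge(offset, 1, Integer::sum);
                if (count > bestCount) {
                    bestCount = count;
                    bestSong = p.songId;
                }
            }
        }
        return bestSong;
    }

    public static void main(String[] args) {
        // Song 0 has hashes 1..6 in order. Song 1 has 3..6 twice each, scattered,
        // so it gets more raw hits but no consistent offset.
        long[][] songs = {{1, 2, 3, 4, 5, 6}, {4, 4, 5, 5, 6, 6, 3, 3}};
        Map<Long, List<DataPoint>> db = new HashMap<>();
        for (int id = 0; id < songs.length; id++) {
            for (int t = 0; t < songs[id].length; t++) {
                db.computeIfAbsent(songs[id][t], h -> new ArrayList<>()).add(new DataPoint(id, t));
            }
        }
        long[] recording = {3, 4, 5, 6}; // song 0, starting at line 2
        System.out.println("best match: song " + bestMatch(db, recording)); // song 0
    }
}
```

In the example, song 1 collects eight raw hits against song 0’s four, but only song 0’s hits all agree on one offset (2), which is why counting matches alone doesn’t work.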

The results

For example, when listening to The Kooks – Match Box for just 20 seconds, this is the output of my program:

Done loading: 2921 songs

Start matching song...

Top 20 matches:

01: 08_the_kooks_-_match_box.mp3 with 16 matches.
02: 04 Racoon - Smoothly.mp3 with 8 matches.
03: 05 Röyksopp - Poor Leno.mp3 with 7 matches.
04: 07_athlete_-_yesterday_threw_everyting_a_me.mp3 with 7 matches.
05: Flogging Molly - WMH - Dont Let Me Dia Still Wonderin.mp3 with 7 matches.
06: coldplay - 04 - sparks.mp3 with 7 matches.
07: Coldplay - Help Is Round The Corner (yellow b-side).mp3 with 7 matches.
08: the arcade fire - 09 - rebellion (lies).mp3 with 7 matches.
09: 01-coldplay-_clocks.mp3 with 6 matches.
10: 02 Scared Tonight.mp3 with 6 matches.
11: 02-radiohead-pyramid_song-ksi.mp3 with 6 matches.
12: 03 Shadows Fall.mp3 with 6 matches.
13: 04 Röyksopp - In Space.mp3 with 6 matches.
14: 04 Track04.mp3 with 6 matches.
15: 05 - Dress Up In You.mp3 with 6 matches.
16: 05 Supergrass - Can't Get Up.mp3 with 6 matches.
17: 05 Track05.mp3 with 6 matches.
18: 05The Fox In The Snow.mp3 with 6 matches.
19: 05_athlete_-_wires.mp3 with 6 matches.
20: 06 Racoon - Feel Like Flying.mp3 with 6 matches.

Matching took: 259 ms

Final prediction: 08_the_kooks_-_match_box.mp3.song with 16 matches.

It works!!

Listening for 20 seconds, it can match almost all the songs I have. And even this live recording of the Editors could be matched to the correct song after listening for 40 seconds!

Again it feels like magic! :-)

Currently, the code isn’t in a releasable state and it doesn’t work perfectly. It has been a pure weekend-hack, more like a proof-of-concept / algorithm exploration.

Maybe, if enough people ask about it, I’ll clean it up and release it somewhere. Or turn it into a huge online empire like Shazam… who knows!


This post is also available in Gemini over at:

gemini://gemini.techrights.org/2010/07/09/speaking-with-code/


2 Comments

  1. twitter said,

    July 9, 2010 at 6:34 pm


    Shazam must have lost their minds to bully this guy. They picked the wrong victim for many reasons and have really harmed the reputation of software patents. He’s outside US law, so they can’t really hope to enforce things and everyone will side with the victim. If a newbie java coder can come up with a working implementation in a weekend, is there anyone who would consider the thing an invention? The blog post eliminates any fog people might have with the legal language of the patent itself to reveal a simple method of applying math. It’s hard to imagine a worse case for big dumb companies to use, but software patents were bound to reach this level eventually.

    The material is a great example of something that should not deserve a patent that might confuse people who don’t know better. The math used is complex (pun intended) and what is patented is a method of applying it. This is clever but it’s not an invention it’s a method that does not deserve patent protection any more than a bubble sort or a method to alphabetize paper files. Even people in the US understand this but software patents don’t come any better.

    Thanks for publishing this, it shines a bright light on what software patents are all about. Ownership of business methods and monopolies through judicial extortion. To make that kind of monopoly work, big dumb companies will have to be watching all of us all the time, much as publishers do to guard their media monopolies.

    Dr. Roy Schestowitz Reply:

    The EFF has just said that this guy is safe in Holland (another reason for programmers to leave the US post Bilski) and, just to clarify, as I received some confused feedback on identi.ca, “Landmark Digital Services, LLC” is BMI’s (think about RIAA) shell/troll for using Shazam “technology” (euphemism for software patents).

    I will write more about this tomorrow.
