Patch: Allow id3v2(1) to Include Colons in Comment Fields

After a disagreement with Rhythmbox about how ID3 comments on my music collection should be handled, I went looking for a more fine-grained tool to manipulate them. A quick browse through aptitude later, I chose a tool named ‘id3v2’, a frustratingly un-googleable name. The official distribution is available from SourceForge, and I’m using the Debian package locally. The tool worked well, but it was unable to properly handle comment text which contained a colon (:).

What’s the problem?

Here’s the syntax for adding a comment to an MP3 using id3v2:

id3v2 -c DESCRIPTION:COMMENT:LANGUAGE file.mp3

Here’s that command run on a real file, and the resulting comment inserted into the file. The “LANGUAGE” field is always truncated to 3 characters, so I’m switching to “LNG”.

$ id3v2 -c DESCRIPTION:COMMENT:LNG t.mp3
$ id3v2 -l t.mp3 | grep ^COMM
COMM (Comments): (DESCRIPTION)[LNG]: COMMENT

Now here’s a similar set of commands showing various attempts to use colon(s) in the tag, including an example that’s very close to my actual use case. Note that I am running id3v2 -r COMM t.mp3 between these commands to remove the existing comments, I’m just not showing it since it doesn’t add anything to the discussion.

$ id3v2 -c DESCRIPTION:'COM: MENT':LNG t.mp3 
$ id3v2 -l t.mp3 | grep ^COMM
COMM (Comments): (DESCRIPTION)[ ME]: COM


$ id3v2 -c DESCRIPTION:'COM\: MENT':LNG t.mp3 
$ id3v2 -l t.mp3 | grep ^COMM
COMM (Comments): (DESCRIPTION)[ ME]: COM\

$ id3v2 -c '':'From\: CD, 20120717; TAG-GOOD; ':'' t.mp3 
$ id3v2 -l t.mp3 | grep ^COMM
COMM (Comments): ()[ CD]: From\

What’s the history?

Unfortunately I couldn’t find any work around for this. The code is using strchr(3) to find colons in the argument to -c and there’s no checking for any sort of escape character. So the bad news is that there’s no toehold in the compiled binary to try and fudge it.

The good news is that it isn’t extremely hard to write some code that can check for escapes (\:) and allow them to be text rather than field delimiters.

The id3v2 package is not actively maintained. It has an owner, and I wouldn’t say the project is abandoned, but there’s not very much activity. This has been a bug for a while and been noted in several places, but it’s never had an official fix. Some bugs are logged against the lyrics field instead of the comment field, but they actually use the same code path internally, so it’s the same bug. There are three related bugs in the SourceForge project tracker (IDs 907469 (2004), 1588861 (2006), and 3033559 (2010)), and there’s a Debian bug also outstanding. The Debian bug references a patch on the SourceForge site, but the submitter never tried the patch or responded to the Debian maintainer’s request for feedback.

What’s the solution?

I did look at the SourceForge patch, but it’s not especially well written. It doesn’t handle colons in all fields, and it doesn’t remove the escaping-‘\’ from the text.

It’s a bit confusing figuring out which code base to start with when modifying id3v2. The SourceForge repo is presumably authoritative, but the project site itself is a ghost town. There are multiple id3v2 forks on github, including 2 that appear to be owned by the SourceForge project admins (myers and nagilo). However, all of those forks appear to be based off of the 0.1.11 release, and 0.1.12 is current. Also, the two repos owned by the SF admins were old compared to some other forks. When I was researching all of this last night the github network graph for id3v2 wouldn’t load, making this harder than it should have been.

I decided to go with the group that seemed to be responding to issues with id3v2, Debian, and patch their version and submit the patch to them. I did so, resulting in Debian bug 681847. Today while nosing around again I realized that github user cockroach is actually Stefan Ott, the Debian package maintainer. Therefore I also forked his repo, pushed my patch, and sent him a pull request. The Debian bug and the github pull request were probably redundant, but I like having the patch in guthub and I missed that cockroach was Stefan Ott last night.

Finally, I’m putting the patch inline here for (hopefully) easy Googling.

--- id3v2-0.1.12.dist/id3v2.cpp	2012-07-17 00:25:08.000000000 -0400
+++ id3v2-0.1.12/id3v2.cpp	2012-07-17 00:33:28.000000000 -0400
@@ -512,29 +512,35 @@
         case ID3FID_COMMENT:
         case ID3FID_UNSYNCEDLYRICS:
         {
-          // split the string at the ':' remember if no : then leave
-          // descrip/lang empty
-          char *text;
-          text = strchr(frameList[ii].data, ':');
-          if (text == NULL) 
-          {
-            myFrame->Field(ID3FN_TEXT) = frameList[ii].data;
-          } else {
-         	*text = '\0';
-          	text++;
-          	char *lang;
-          	lang = strchr(text, ':');
-          	if (lang == NULL) 
-          	{
-          	  myFrame->Field(ID3FN_DESCRIPTION) = frameList[ii].data;
-          	  myFrame->Field(ID3FN_TEXT) = text;
-          	} else {
-          	  *lang = '\0';
-          	  lang++;
-          	  myFrame->Field(ID3FN_DESCRIPTION) = frameList[ii].data;
-              myFrame->Field(ID3FN_TEXT) = text;
-              myFrame->Field(ID3FN_LANGUAGE) = lang;
+          // split the string at the ':' (ignoring "\:" and splitting into no more than 3 pieces
+          // remember if no ':' then leave descrip/lang empty
+          char *pieces[3] = { frameList[ii].data, NULL, NULL };
+          if (strchr(frameList[ii].data, ':')) {
+            size_t read_at = 0, inst_at = 0, piece = 0;
+            unsigned int olen = strlen(frameList[ii].data);
+            for (read_at = 0; read_at < olen; read_at++, inst_at++) {
+              if (frameList[ii].data[read_at] == '\\' && frameList[ii].data[read_at+1] == ':') {
+                read_at++;
+              }
+              else if (frameList[ii].data[read_at] == ':' && piece < 2) {
+                frameList[ii].data[read_at] = '\0';
+                pieces[++piece] = frameList[ii].data + inst_at + 1;
+              }
+              frameList[ii].data[inst_at] = frameList[ii].data[read_at];
             }
+            frameList[ii].data[inst_at] = '\0';
+          }
+          // fprintf(stdout, "1: %s, 2: %s, 3: %s\n", pieces[0], pieces[1], pieces[2]);
+          
+          if (pieces[1] == NULL) {
+            myFrame->Field(ID3FN_TEXT) = pieces[0];
+          } else if (pieces[2] == NULL) {
+            myFrame->Field(ID3FN_DESCRIPTION) = pieces[0];
+            myFrame->Field(ID3FN_TEXT) = pieces[1];
+          } else {
+            myFrame->Field(ID3FN_DESCRIPTION) = pieces[0];
+            myFrame->Field(ID3FN_TEXT) = pieces[1];
+            myFrame->Field(ID3FN_LANGUAGE) = pieces[2];
           }
           /* debug
           std::cout << ID3_GetString(myFrame, ID3FN_DESCRIPTION) << std::endl

1 thought on “Patch: Allow id3v2(1) to Include Colons in Comment Fields

Leave a Reply

Your email address will not be published. Required fields are marked *