April 5, 2007

Understanding Spotlight and Searching Python and Ruby Files

Filed under: Technology — Cory @ 12:11 am

Tonight I spent a little time learning more about how Mac OS X’s Spotlight system works. Specifically, I wanted to know how it indexes a drive, where it stores the index, and how to trigger a re-index.

Apple has a page of Spotlight tips, but it doesn’t really give much information on how to manage the indexing process. This page offered some interesting insight into the md* commands, which I had also read about in this macosxhints.com article.

It turns out that there are two common ways to tell Spotlight to update its index.

The first way, and the Apple-recommended way, is to drag your “Macintosh HD” icon into the box in the “Privacy” tab of the Spotlight preferences, and then drag it out again. Dragging the drive in tells Spotlight to ignore it and to remove the index, which is stored in a directory named .Spotlight-V100 at the root of the drive. When you drag the drive back out of the Privacy box, Spotlight begins to rebuild the index. Apparently this can take upwards of an hour on an average Mac with an 80-100 GB drive.

The second, and apparently less predictable, way of forcing a re-index is to use the Spotlight command line tools. The following can be used to initiate a re-index from the root of the main volume (mdutil must be run as root):

sudo mdutil -E /
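
Because a full rebuild can take a while, it can help to ask Spotlight what state the index is in before and after running the command above. The following is a minimal sketch, assuming your version of mdutil supports the -s (status) flag; the volume variable and the guard around the call are illustrative additions, not something from Apple's documentation:

```shell
# Sketch: print the Spotlight indexing status of a volume.
# Assumes Mac OS X with mdutil on the PATH; on other systems it only prints a notice.
volume="/"

if command -v mdutil >/dev/null 2>&1; then
    # -s reports whether indexing is enabled for the volume.
    # Prefix with sudo if your system requires root for mdutil.
    mdutil -s "$volume"
else
    echo "mdutil not found; this example needs Mac OS X"
fi
```

Running it again after the re-index has started should show whether indexing is enabled on that volume.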

If you just want Spotlight to re-index a certain folder, you can use the mdimport command (doesn’t need to be run as root):

mdimport ~/Documents

Another interesting thing about Spotlight is that there are “importers” for certain types of files. These importers are used to extract metadata from those file types. For example, peeking into a PDF file is different from looking at an MP3, so Spotlight is designed to load lots of different importers to help it index your data. You can see a list of these importers by running the following:

mdimport -L
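
To see what an importer actually extracted for a given file, the mdls command prints the metadata attributes Spotlight has stored for it. A small sketch, assuming a hypothetical file at ~/Documents/example.py (substitute any file you have); the guard is only there so the snippet is harmless where mdls or the file is missing:

```shell
# Sketch: list the metadata attributes Spotlight holds for one file.
# The path below is a hypothetical example, not a file from the original post.
file="$HOME/Documents/example.py"

if command -v mdls >/dev/null 2>&1 && [ -e "$file" ]; then
    # Prints attribute names (kMDItem...) and their values for the file.
    mdls "$file"
else
    echo "skipping: mdls or $file is not available"
fi
```

If an importer for the file type is installed and the file has been indexed, the output should include attributes beyond the basic file system ones.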

Also, you can see the default result categories, and rearrange the order in which they appear in search results, by going to the “Search Results” tab of the Spotlight preferences.

When I ran the above mdimport -L command I noticed that there were several custom importers on my system. I did a quick Google search (searching about searching, how meta!) to find out what other importers were out there and I found this page on Apple’s site. It turns out that people have written quite a few custom importers, including one to index Python files and another to index Ruby files.

The Python importer automatically begins indexing after you install it, so be aware that your processor may spike for 10 minutes or so afterwards. The Ruby importer does not trigger indexing on its own, but if you install it before the Python one, your Ruby files will be picked up when the Python importer kicks off its indexing run.

Apparently I had never set up a default application for Python and Ruby files, because when they showed up in search results there was no little TextMate icon beside them. To correct this I located and clicked on a Python file and a Ruby file, pressed Command-I, selected TextMate in the “Open with” section, and clicked the “Change All” button to apply this globally. Now when a Python or Ruby file shows up in search results, selecting it automatically launches TextMate.

There are a couple of other cool importers that I installed as well. I have a lot of zip, tar/gz and tar/bz2 files on my Mac (I keep everything I ever download), and I noticed that there are importers for zip files and tar files, so I went ahead and installed those too!

One other neat thing is that you can do Spotlight searches directly from the command line using the mdfind utility:

$ mdfind firefox
/Applications/Firefox-1.5.0.7.app
/Applications/Firefox.app
/Users/cwright/Library/Application Support/Firefox
...
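
mdfind can also be narrowed down. The sketch below assumes the -onlyin option and the kMDItemFSName attribute behave on your system as they do on mine; the search_dir variable is just an illustrative choice of folder:

```shell
# Sketch: two narrower Spotlight searches from the command line.
# Assumes Mac OS X with mdfind on the PATH; elsewhere it only prints a notice.
search_dir="$HOME/Documents"

if command -v mdfind >/dev/null 2>&1; then
    # Same plain-text search as above, limited to one folder.
    mdfind -onlyin "$search_dir" firefox

    # Attribute query: files in that folder whose name ends in .py.
    mdfind -onlyin "$search_dir" "kMDItemFSName == '*.py'"
else
    echo "mdfind not found; this example needs Mac OS X"
fi
```

The second query only returns files Spotlight has already indexed, so it is a quick way to check that a folder of Python files made it into the index.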

Apparently Apple is improving Spotlight in Leopard by making it faster, allowing it to index and search network shares, and allowing more specific searches. Woot!

• • •

1 Comment »

  1. Lots of stuff i never knew… thanks cory!

    Comment by Todd — April 7, 2007 @ 3:56 pm
