Mess of Metadata

There was a long series of comments on the article about mdfind that got very confused about OS X metadata. I thought I'd try to straighten some of that out in a separate post, though honestly I'm still easily confused myself!

First, what metadata are we talking about? For an old Unix hand, the metadata is information stored in the inode: file size, permissions, pointers to data blocks, link counts. That's traditional metadata.
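To make "traditional metadata" concrete, here is a minimal sketch that reads the inode-level fields through Python's standard os.stat call. The helper name inode_metadata and the throwaway temp file are my own choices for illustration; nothing here is OS X specific.

```python
import os
import stat
import tempfile


def inode_metadata(path):
    """Return the classic inode-level metadata for path as a dict."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                        # inode number
        "size": st.st_size,                        # file size in bytes
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "links": st.st_nlink,                      # hard link count
        "uid": st.st_uid,                          # owner user ID
        "gid": st.st_gid,                          # owner group ID
    }


# Demo on a throwaway file so the sketch does not depend on any real path.
with tempfile.NamedTemporaryFile(delete=False) as demo:
    demo.write(b"hello\n")
try:
    for key, value in inode_metadata(demo.name).items():
        print(f"{key}: {value}")
finally:
    os.unlink(demo.name)
```

Everything printed here comes from the inode. None of the Spotlight keys or extended attributes discussed below show up in a plain stat call.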

However, there's more metadata today, not just in Unix systems generally but especially in Mac OS X. There are extended permissions, ACLs, extended attributes (xattrs), and Spotlight-related metadata. It's very hard to ferret all this out of Google because similar terms are used for dissimilar features.

Macs had "resource forks" early on. OS X still has resource forks, but apparently Apple would like to move away from them. That's probably why things get so darn confusing: search for information on metadata and OS X and you'll find lots of pointers to pages that talk about resource forks, but that material is usually deprecated and mostly doesn't apply to OS X.

Let's take Spotlight metadata first. These are specific keys that Spotlight indexes. For example, you can do things like this:

mdfind 'kMDItemFSSize > 20000000'
mdfind 'kMDItemFinderComment == "script application wrapper"'
mdfind 'kMDItemTextContent == "*Seneca*" && kMDItemFSName != "*emlx"'
mdfind 'kMDItemTextContent == "*Seneca*" && kMDItemContentType != ""'

How does Spotlight get the info to index? It asks a Spotlight Importer. This BASICS OF SPOTLIGHT page explains:

Once the Mac OS does kick-off the extraction of metadata from a file, it does so through a Spotlight Importer. Spotlight Importers are plug-ins for the Mac OS that a developer provides specifically for helping files created by their applications to be searchable within Spotlight. Spotlight crawls through its list of changed files, handing each one to the appropriate importer. The importers then read the files, compile a list of metadata, and then hand the metadata back to Spotlight. At this point, the changed file is available for searching within Spotlight.

OK, great, but where does the metadata that the importer supplies come from? Apparently, that's up to the developer. Apple's Extracting Metadata from Documents says:

Avoid the use of external files to store metadata content. All critical metadata should be in the same file as the data. The system store of metadata should be considered volatile.

I want to quibble a little: if it's stored in the data file, it's really not metadata, is it? But never mind. Some apps do it that way: ID3 tags, for example. But other apps do not. For example, in my ~/Library/Caches/Metadata I found some interesting stuff. *Some* apps store Spotlight metadata there. I found:

$ ls  ~/Library/Caches/Metadata 
Billings		Microsoft		Safari
Camino			Precipitate		com.evernote.Evernote

If I look in Billings, I find this:

                <string>Extreme rework</string>
                <string>Extreme rework</string>

But obviously not all apps store their Spotlight related metadata there. Entourage does, as seen in this HOW DOES ENTOURAGE WORK WITH SPOTLIGHT? bit:

When you enable Spotlight indexing within Entourage, a "cache" file is created for each item within your Entourage database. If you have 100,000 e-mail messages in your Entourage database, 100,000 cache files will be created. If you want to see the cache files, you can find them within your Library/Caches/Metadata/Microsoft folder.

Each cache file contains all the metadata that will be needed for indexing by Spotlight. All changes within Entourage are reflected to the cache files. Create a new item and a new cache file will be created. Updated an item and its cache file will update. Delete an item and its cache file will be deleted. With all these changes, Spotlight receives file change notifications and eventually will ask the modified cache files to go through the import process using the Entourage Spotlight Importer.

But there's no iTunes folder there..

There are also defaults. If I create a text file with "date > file", an "mdls" will show Spotlight keys:

kMDItemContentCreationDate     = 2009-04-12 12:07:02 -0400
kMDItemContentModificationDate = 2009-04-12 12:07:02 -0400
kMDItemContentType             = ""
kMDItemContentTypeTree         = (
kMDItemDisplayName             = "file"
kMDItemFSContentChangeDate     = 2009-04-12 12:07:02 -0400
kMDItemFSCreationDate          = 2009-04-12 12:07:02 -0400
kMDItemFSCreatorCode           = ""
kMDItemFSFinderFlags           = 0
kMDItemFSHasCustomIcon         = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 0
kMDItemFSIsStationery          = 0
kMDItemFSLabel                 = 0
kMDItemFSName                  = "file"
kMDItemFSNodeCount             = 0
kMDItemFSOwnerGroupID          = 501
kMDItemFSOwnerUserID           = 501
kMDItemFSSize                  = 29
kMDItemFSTypeCode              = ""
kMDItemKind                    = "Plain text"
kMDItemLastUsedDate            = 2009-04-12 12:07:02 -0400
kMDItemUsedDates               = (
    2009-04-12 00:00:00 -0400

Obviously the "date" command didn't create those. Spotlight won't even index that file (no extension), but it has some default keys just the same! See Spotlight, mdfind (Mac OS X Tiger searching) for more on that.

You can add metadata yourself and can modify one item of Spotlight's domain.

$ xattr -w mystuff "hello there" file
$ xattr -l file
mystuff: hello there

The only Spotlight-related data you can modify is kMDItemFinderComment. You do that with Get Info in the Finder, and after adding a comment, xattr shows this:

$ xattr -l file
0000   62 70 6C 69 73 74 30 30 5A 4D 79 20 43 6F 6D 6D    bplist00ZMy Comm
0010   65 6E 74 08 00 00 00 00 00 00 01 01 00 00 00 00    ent.............
0020   00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00    ................
0030   00 00 00 13                                        ....

mystuff: hello there
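That hex dump starts with "bplist00", which marks it as a binary property list. As a rough sketch, the bytes can be decoded with Python's standard plistlib module. The hex string below is copied from the dump above with the offsets and ASCII column dropped; the function name decode_bplist is just a label I picked for this example.

```python
import plistlib

# Hex bytes copied from the xattr dump above (offset and ASCII columns removed).
HEX_DUMP = """
62 70 6C 69 73 74 30 30 5A 4D 79 20 43 6F 6D 6D
65 6E 74 08 00 00 00 00 00 00 01 01 00 00 00 00
00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 13
"""


def decode_bplist(hex_text):
    """Turn a whitespace-separated hex dump into bytes and parse it as a plist."""
    raw = bytes.fromhex("".join(hex_text.split()))
    return plistlib.loads(raw)  # format (binary vs XML) is auto-detected


comment = decode_bplist(HEX_DUMP)
print(comment)  # prints: My Comment
```

So the Finder comment is stored as a plist whose top-level object is a plain string rather than a dictionary.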

Note that this gives us the clue as to where the data was stored, but I don't find a file with that "" name. I do find:


But those aren't related.

So what do we know? Well, we know it's up to the application responsible for a file to provide importer code. It's up to the same app to decide where to store metadata. Obviously, that implies that for data that would be the same across all files of a given type, there's no need to store it anywhere: the importer could generate the response when Spotlight asks.

That's as far as I've gone.. maybe someone else can add more.

Got something to add? Send me email.


© Anthony Lawrence

Mon Jun 1 16:42:53 2009: 6431   TomAndersen

Spotlight importers import metadata for one file type (or a group of file types). But Apple has a system to import metadata on any file: if you set an xattr extended attribute with a plist that is not a dictionary, under a key like then in the Spotlight database you will find an entry under 'YourKeyNameHere'. (You can see them by doing an mdls on the file.) We have developed OpenMeta to come up with a standard set of keys and safe procedures for setting this data. Apple itself uses this feature all over the place: with kMDItemWhereFroms, with Time Machine, and in many other places.

By creating a Spotlight plugin, a search for 'tag:superman' in the Spotlight UI will find all files with an array that has 'Superman' as an array element.
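As a rough illustration of the kind of value being described (a plist whose top level is an array, not a dictionary), here is a sketch using Python's standard plistlib. It only builds and checks the bytes. The function name encode_tags is hypothetical, and the attribute key and the step of writing the bytes onto a file are deliberately left out, since neither is spelled out above.

```python
import plistlib


def encode_tags(tags):
    """Serialize a list of tag strings as a binary plist (returns bytes)."""
    return plistlib.dumps(list(tags), fmt=plistlib.FMT_BINARY)


value = encode_tags(["Superman"])
print(value[:8])              # binary plists begin with the magic b"bplist00"
print(plistlib.loads(value))  # round-trips back to the original list
```

The round trip is only a sanity check that the bytes are a well-formed plist with an array at the top level. Whether Spotlight picks them up depends on the key they are stored under.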

See (link)

--Tom Andersen

Mon Jun 1 19:29:04 2009: 6433   TonyLawrence

Thanks for the info and link!
