An Ode to Dropbox

One of my favorite new things in the world is Dropbox, an online syncing utility for all your stuff. If you use multiple computers (Mac, Windows, or Linux), Dropbox will automatically sync your documents, photos, music, etc. between those computers completely behind the scenes. The best thing about Dropbox, as noted two days ago by Rands in Repose, is that it is “dumb”. It does just what you would expect and doesn’t try to outsmart you. If you accidentally delete something, it offers the opportunity to restore it (or any earlier version) via the web interface. Lost data is a virtual impossibility. Even if you only use one computer, the opportunity to have access to all your data via the (secure) web interface is worth the price of admission: it keeps track of all of your changes, no matter how minor. And for me the $99-a-year upgrade to 50GB is a no-brainer. I’ve already filled up 75% and am hoping they will offer more space in the near future. I’ve been using it without a hiccup for a couple of months now. Rands’ insightful and laudatory post reminded me I’ve been meaning to praise Dropbox myself.

NB: Michael Tsai’s post yesterday about Dropbox’s lack of support for upper-level file system features in Mac OS X seems to be dealing with out-of-date information, as noted in the comments.

P.S. Dropbox is using Amazon’s S3 data storage service to host the files, and all the files are encrypted before the transfer. See the FAQ for more details.

Kindle Scholarship

David Weinberger, over at his Everything is Miscellaneous blog, points out three major problems with Amazon’s Kindle reader for scholarly work:

1) “note-taking and highlighting are jokes”
2) “doesn’t know the original page numbering”
3) “no bibliographical tool”

(From “Kindle is fun but sucks for scholars”)

One of the comments to that post suggests porting to the Kindle a version of Zotero, which could make use of your database portably. Presumably such a database could be loaded via the SD card or, perhaps, it could be shared via Zotero’s new database syncing (currently in preview). I think this is a useful suggestion and could potentially be stunning. However, it will only be stunning if the killer feature of Zotero — i.e. keeping pdfs/docs of items close to their bibliographical entries — is retained. After all, what sets the Kindle apart from any other device is not its organizational structure but its beautiful reading surface. So, combining the killer feature of Zotero with the killer feature of the Kindle sounds to me like a great idea. Of course, one major hurdle is that the Kindle software is proprietary. Unlike converting single books/pdfs into Kindle format, you would want to retain the entire structure and functionality of your Zotero database, which means converting the app. I wonder if a service like Feedbooks would work as a Zotero model: you put a list of hyperlinked out-of-copyright books on your Kindle which download immediately in Kindle format when you click on the link (via 43 Folders). If converted for use with Zotero, you could search your database locally but only call the pdf when needed. To save space the pdf could be set to self-destruct after a certain amount of time, or better, it would somehow ask the user if he/she was ready to destroy the pdf. Of course, your shared database in the cloud would be maintained intact.

To me the iPhone is an ideal device for an implementation of Zotero.[1] The screen on the Kindle is better for reading lots of text, but the iPhone is much the better size for quick searches of your Zotero database while working in the stacks of a research library. Plus, I’m young enough to feel that pulling out my Kindle in public is a lot dorkier than pulling out my iPhone. In terms of content, perhaps a preliminary version wouldn’t necessitate the associated documents, but only the bibliographical entries, maybe adding new entries through photos of the barcode (à la Delicious Library). If it could be implemented, the killer feature would still be the built-in relationships between entries and documents, which is why I use Zotero in the first place (though pdf support on the iPhone would have to improve). Adding to the database via MobileSafari would be a desideratum as well. Essentially, reproducing all the main features of Zotero on the iPhone is highly desirable to me, especially if my existing database is seamlessly integrated with the iPhone version.

All of this is well and good, but as John Gruber recently noted, developing for the iPhone requires adherence to the terms and conditions of the App Store, which seem to conflict with the GPL. Presumably the Kindle has proprietary agreements for developers as well, and obviously lacks even the semi-open playing field of the App Store. Altogether, this is probably a deal-breaker for any portable Zotero implementation, though I hope I’m wrong. If anyone is working on this and can talk about it, please let me know!

[1] Feedbooks has an API which has been implemented in an iPhone app called Stanza by the company Lexcycle. I haven’t had a chance to test this app yet but I’ve heard good things about it. This API definitely seems to be a possible starting point for any homebrew iPhone-Zotero implementation, especially if the Zotero development group itself is unable to produce an iPhone app for whatever reason.

Computational History

William Turkel, author of the excellent Programming Historian, has recently published a provocative post on his blog, Digital History Hacks. He writes there,

To some extent we’re all digital historians already, as it is quickly becoming impossible to imagine doing historical research without making use of e-mail, discussion lists, word processors, search engines, bibliographical databases and electronic publishing. Some day pretty soon, the “digital” in “digital history” is going to sound redundant…

Turkel’s post, well worth a read in toto, eloquently states what was decried last November by Anthony Grafton in the New Yorker:

The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention. (p.4)

Grafton and Turkel agree on the point that learning how to navigate the seas of digital information will be a crucial skill for any successful historian in the future (or even in the present). I wonder, however, if younger scholars raised on Google, IMDb, and Facebook will find it as challenging as Grafton suggests to combine the results from multiple sites and search engines. In fact, the disjointedness of the internet is one of its few universalizing aesthetic qualities. Few people involved in the digitization of major research libraries are in favor of a single search to rule them all. Google is trying that and has met with fierce opposition, both from publishers and from libraries unwilling to cede control of their freedom to act independently. Grafton is obviously aware of this and makes a number of important points about Google in his article. I wonder, however, if Turkel’s knowledge of the technologies behind the current transition to digital libraries isn’t a sign of things to come. Is it conceivable that fluency with Python and archival APIs could become a prerequisite of creative scholarship in the next decade? In other words, as pithy as Grafton’s criticisms may be, could he be vastly underestimating the scale of the humanistic revolution at hand?

Diogenes for TLG and LSJ

A colleague recently pointed me to the new version (3.1) of the Diogenes software for searching the TLG (Thesaurus Linguae Graecae) and PHI (Packard Humanities Institute) discs of Greek and Latin texts. The CD-ROM of the TLG (version E, last updated 2000) has long been surpassed by the web version — the latter includes a whole host of texts (mainly late antique and Byzantine) which are not on the disc (to see a list, click “Post-TLG E (web only)” on the left of the homepage). Impressively, the new Diogenes comes with both the revered Liddell, Scott, and Jones (LSJ) Greek Lexicon and the Lewis & Short Latin dictionary. These are indispensable resources for the classicist. (Lewis & Short is particularly helpful since the magisterial Oxford Latin Dictionary stops sometime in the early second century AD and is virtually useless for later Latin.) Both are locally searchable and also free, which is a huge bonus.

I’ve only been really playing with Diogenes for a day but I’m already impressed. The killer feature for me is the linking between the TLG Greek texts (a huge corpus) and the LSJ. If you’re reading a Greek text and want to look up a word, all you do is click on the word and the dictionary pops up on the right; further, every word in the dictionary is also tagged, so if you click on one of those, then you’re taken to another dictionary entry. This is all dependent on the Perseus morphological database, though not in real time (as discussed at the bottom of the FAQ page). All the words in the TLG and PHI databases have been run through Perseus’ Morpheus parser ahead of time.

So far so good. In fact, at this point in my brief investigation I was in heaven. I had barely used Diogenes before (a long while ago) and was not really taken with it. I have been using the Silver Mountain software “Workplace Pack” in combination with Logos/Libronix’s edition of LSJ since 2004 or so. In my first look Diogenes was surpassing my previous tools by a long shot. However, I’ve run into two snags which have dampened my enthusiasm somewhat:

1. The Diogenes/Perseus LSJ does not include the Supplement (1996). The Libronix version not only includes the Supplement but has integrated that material into the LSJ text itself (unlike any other version of the LSJ currently available). The Libronix edition has a number of other “search enhancements” that add value to the digital version.

2. The linking between text and dictionary, which is supposed to be bi-directional, is only really secure in the direction described above, that is, from the TLG text to the LSJ. If you try to go the other direction, that is, from a reference in the LSJ to the TLG, you are not likely to end up where you intended. In my brief (and unscientific) testing, only about 1 out of every 5 textual reference links will take you to the right spot in the given TLG text. Why is this? Well, here’s my theory (and I’m definitely willing to be corrected): the TLG has made it a point to include the most up-to-date Greek critical editions of its holdings, regularly replacing earlier editions. (As is well known, none of these editions includes critical apparatus, which is ostensibly how they avoid copyright infringement.) By contrast, the Perseus texts are all older, out-of-print editions (perhaps because of copyright? I’m not sure.). So why does the LSJ-to-TLG linking work at all? Well, many texts (Homer and the New Testament included) have had their verse-numbering structure set for a very long time, so the older texts and more recent texts share the same numbers. Click on any link to a Pausanias reference and you’ll be taken to a seemingly random place in the TLG text. By contrast, click on a reference to Sophocles and, most likely, you’ll find the spot you wanted. When it works, this is an incredible piece of software, but it is also infuriating to see the unrealized possibilities. To be fair, the actual text of many works has changed since their edition of the LSJ was published, so words cannot always be expected to appear where they once did. And, further, the LSJ reference may refer only to a chapter of a work and not to a specific paragraph or line, so some close reading will be necessary.

So what’s next? Well, the links need to be fixed, obviously, though I’m not sure whether that is Diogenes’ or Perseus’ responsibility. Presumably the latter, though Perseus doesn’t link directly to the TLG (even though the reverse is sometimes true for translations and dictionaries). Still, the Logos/Libronix has no TLG capability as far as I am aware, and linking (for all its patchiness) is still the killer feature of Diogenes. Another killer feature is the cross-platform capability (Windows, Mac, and Linux). Logos has recently released the alpha version of its Mac client, which is welcome news to many but which is still vastly under-powered. For instance, you cannot search the LSJ by entry word in Greek; you have to scroll through the alphabet, painfully. Finally, Diogenes, like Logos, is Unicode compliant — a no-brainer these days, but it’s indicative of the quality of this app that even the Coptic texts from the PHI disc are treated properly with Unicode. Overall, it is a really nice piece of software — I like the browser style interface, which is the same across the three platforms. There’s a lot of hand-coding that will have to be done to get the LSJ-to-TLG links to work correctly, though presumably some of that could be automated. One suggestion offered by my colleague Gregory Smith is that Diogenes could issue a search for each LSJ reference, when it is clicked, to ensure that the text is really there in the TLG. This would slow things down, but at least you could trust that you’re linking to the right bit of text.
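Gregory's suggestion could be sketched roughly as follows. This is only an illustration in Python and not how Diogenes actually works: the function names, the `window` parameter, and the idea of representing a work as a list of (reference, text) pairs are all my own assumptions, and a real version would need the full set of inflected forms from a morphological parser rather than a hand-typed list.

```python
import re


def contains_any(text, forms):
    """True if any of the given word forms occurs as a whole word in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(form.lower() in words for form in forms)


def verify_link(passages, cited_ref, forms, window=2):
    """Check a dictionary citation against the text before following it.

    passages  -- hypothetical list of (reference, text) pairs in reading order
    cited_ref -- the reference given in the dictionary entry
    forms     -- inflected forms of the headword to look for
    window    -- how many neighbouring passages to try on either side

    Returns the reference of the nearest passage that really contains one
    of the forms, or None if the citation cannot be confirmed.
    """
    index = {ref: i for i, (ref, _) in enumerate(passages)}
    if cited_ref not in index:
        return None
    start = index[cited_ref]
    # Look at the cited passage first, then move outward one step at a time.
    for distance in range(window + 1):
        for i in (start - distance, start + distance):
            if 0 <= i < len(passages) and contains_any(passages[i][1], forms):
                return passages[i][0]
    return None
```

With a made-up three-passage text, `verify_link([("1.1", "alpha beta"), ("1.2", "gamma delta"), ("1.3", "logos epsilon")], "1.2", ["logos"])` would point the reader to "1.3" instead of the cited "1.2". The cost is one small search per click, which is the slowdown mentioned above, but a link that cannot be confirmed could at least be flagged instead of silently landing in the wrong place.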

PS A final issue I should at least mention is that, as someone who works primarily with later Greek texts, I would love to see the web TLG corpus brought into the equation somehow. The E CDROM is still very valuable for local searching, but there’s much it does not include (as mentioned above). I’m not sure if I would prefer web queries or a downloadable package of all the TLG texts (surely that wouldn’t be that hard to produce). But in either case, access via Diogenes to the complete TLG is a desideratum.

Kottke on Clusterflock

Good interview with blogger/editor Jason Kottke over at Clusterflock. The part I can relate to, being a NetNewsWire junkie myself (though not half the blogger Jason is):

What’s your process, then, how do you go about your day at the site?

I read a lot. 99% of it doesn’t make the site, 1% does. Most of the stuff I read comes to me through a newsreader. I follow roughly 300 sites a day.

Jesus! Do you ever feel overwhelmed by the influx? What’s your process for dealing with that?

Very overwhelmed. At this point, I’m probably just used to it. I get the sense sometimes that reading/skimming so much information every day is not good for my brain. Sometimes I can’t remember any details from what I’ve read the previous day. Don’t know if that’s all the input or something related to getting older.

I hear you, Jason.

Music Recommendations

While not what I promised in my last post and not at all what I originally imagined I would be writing on this site, I was thinking today that perhaps some music recommendations would be welcome to you readers.

First, if you haven’t yet, go check out NPR’s All Songs Considered podcast. Things they’ve done recently which I liked are:

1) The listeners’ picks for 2007’s best CDs. A number of artists on the list I had heard of but never actually listened to. If you spend some time with this, I think you’ll agree that the list points equally to the fecundity of the “alternative music” scene (that’s what we called it in the 90’s) as well as the good taste of NPR’s listeners.

2) Interview/Guest-DJ sessions with some well known artists like Thom Yorke of Radiohead and Colin Meloy from the Decemberists. I found the following quote from Meloy especially insightful: “I am more interested in writing outside my realm of experience. To be honest, I don’t find my life that interesting or compelling.” He had been asked about why he didn’t write songs about himself (like everyone else, presumably). More and more I find that books and songs that are transparently narcissistic can’t sustain my interest. I heard Tom Wolfe give a talk about this one time: of course, when he writes outside himself it has its own name, “New Journalism”. I guess increasingly I just like stories and good third-person narrative; in that regard, I definitely appreciate Meloy’s comments on the subject. These interviews are valuable, precisely because there is other music involved which the artists can use to exemplify and contextualize their thoughts.

3) Finally, they’ve got an incredible back-catalogue of live concerts from the 9:30 Club in D.C. I really liked the recent Stephen Malkmus and the Jicks concert. The series is called Live in Concert From All Songs Considered.

Second, lately I’ve really been enjoying the following records:

1) The Raconteurs, “Consolers of the Lonely” — there’s something very unpredictable and bluesy about this record. I really enjoyed the first one, “Broken Boy Soldiers”. There’s a number of things to like: musicianship, songcraft, electric guitar bravado. The last is well known from the White Stripes records and I love it.

2) Sun Kil Moon’s “April” is gorgeous. I am a huge fan of their previous records and am really happy to have a third.

3) Speaking of thirds, one of my favorite bands from college, Portishead, has a new album, “Third”, coming out at the end of this month. This is their first studio album in eleven (11) years. That’s a long time, and I really hope it doesn’t disappoint. (In college we used to turn the lights out in our apartment and turn on Portishead really loud while we played GoldenEye multiplayer on the Nintendo 64. Really spooky, let me tell you. Those were good times.)

4) Gary Louris, the lead singer of the Jayhawks, recently put out a solo record which I like a lot. It’s called “Vagabonds”. If you like the Jayhawks, you’ll dig this.

Third, a few bands I listen to have their own podcasts. They Might Be Giants’ podcast is by far the best. Their Friday Night Family podcast is mainly songs taken from their recent album, “Here Come the 123s”, which my kids like. Their previous album “Here Come the ABCs” is part of our family lore: the song “C is for Conifers” is incredible. There’s also the Radiohead podcast, in which they release live versions of their songs, recorded in their Oxford studio. (FWIW, I used to see Thom Yorke walking his son in the Oxford University Park on Sunday mornings while I was in grad school. The Park is right across from my college, Keble. I never said hi, though I was sorely tempted. He seems like a really decent guy.) Wilco also had a podcast for a while, but it doesn’t seem to have been updated lately.

On the subject of podcasts, I think I’ll try to put a list together soon of things I listen to on a regular basis. I find podcasts and audiobooks really enjoyable and a useful way to “read” or catch up on the news while you’re doing something else with your hands, like writing blog posts.

A Little Behind

For those who are following this site, I realize that I’ve been rather quiet the past month. I’m currently trying to finish up a few longer posts amidst my regular academic business of attending conferences and constructing exams. I thought I might at least let you know the topics I’m working on and intending to discuss in the near future:

1. A representative of Project Muse has contacted me regarding my two posts on JSTOR. We’re trying to arrange a time to talk. I would very much like to add their perspective to the mix. Digitization is a subject that continues to fascinate me both for its practical difficulties and intellectual potential.

2. Zotero has become a mainstay in my research over the past month. I put it through the paces writing a recent conference paper, which will actually be a chapter of my current book project. Overall, I was thoroughly impressed, but there are some usability quirks that I find irksome, not least the integration with Word and OpenOffice/NeoOffice.

3. Finally, I’m trying to figure out how to put said chapter online in its draft form, in the hopes of getting some tangible feedback. Obviously, it won’t be interesting to everyone, and it’s definitely got some warts, but one of my intentions for this site is to make it productive for my own research. I just need to figure out how this will work best.

I have some other thoughts too, including how I might use Twitter or some kind of chat software in the classroom during my upcoming Spring Term course. Many of my colleagues think it might make an interesting experiment, but I’m a little sheepish at the moment. We’ll see.

Hot Air

An email I received today from Lenovo about the newly released ThinkPad X300 states the following:

“The no-compromise, ultraportable, 13.3″ widescreen notebook with an optional integrated DVD drive and 3 USB ports, starting at just 2.9lb. Everything else is just hot air.”

Let us compare this final, taunting sentence with the X300 review from CNET, which states that, in every benchmark test they performed, the MacBook Air beats not only the X300, but also the Toshiba R500 and the HP Compaq 2710p. In one way, this is not a surprising piece of news given that the Air uses a regular, desktop-class Core 2 Duo processor, running at 1.6-1.8GHz, and not a low-power version (e.g. the SL7100), running at 1.2GHz, as in the X300. However, the fact remains that Lenovo is marketing the X300 as superior to the MacBook Air, when in reality it is inferior in terms of performance. There are a number of reasons you may buy an X300 over an Air, especially if you need a DVD burner or are especially rough on your laptops (that hybrid roll cage looks impressive). I like ThinkPads. I use one daily as part of my work. If I’m going to buy a PC laptop it’s going to be a ThinkPad. But the Air is currently a better performing machine, which is a significant fact in the current explosion of subnotebooks.

JSTOR responds and previews

Last week I posted some positive thoughts on JSTOR and suggested that JSTOR might have a role in the new conversation about the distribution of scholarly knowledge. Harvard’s recent decision to self-distribute is very forward thinking and may have significant effects on academic publishing in the future.

The day after I published that post (in fact, about 12 hours later) I got a call in my office from Jason Phillips, associate director for library relations at JSTOR. We talked for about an hour, and Jason answered numerous questions I had about JSTOR’s business and their plans for the future. Needless to say I was seriously impressed by Jason’s eagerness to make contact and his forthrightness about JSTOR’s work. Here are some nuggets from that conversation. Much of this stuff is available on their website, if I had taken the time to look it up (which is all the more reason to be impressed by Jason’s call!).

1. The cost of the subscriptions to JSTOR which libraries have to pay (and my main point of criticism) is directly related to the cost of production (mainly scanning) and managing the content (archiving the data and preserving it). Hardware and software upgrades are part of this cost too: more on that below. JSTOR is technically a non-profit company, so all of its revenue goes back into the archival and maintenance process.

2. The publishers set the “moving wall” between the archive and the new volumes. No new publishers have a moving wall greater than seven years and most are at around three years. After first saying that the moving wall was a “joint decision”, Jason backed up and said that JSTOR “allows” for the gap in order to secure the publishers and archive their data. JSTOR does “revenue share” with publishers, but Jason claimed this was minimal. Essentially the publishers are not paid for having their content on JSTOR. That was probably the most enlightening piece of information for me and gives me more confidence about JSTOR’s role in the future model.

3. The moving wall discussion led to what I find most interesting, namely that JSTOR is largely a free-agent in all of this. They don’t really seem beholden to academic publishers, yet they do their best to be a responsible liaison between the publishers and libraries. This must be a crucial role because the publishers are willing to let their content be distributed without receiving much in the way of royalties. In other words, the archiving and distribution process must be worth more to them monetarily than the rights they have to all those back volumes. That fact is saying something really important about the “real money” value of archiving and distribution.

4. Now, this is where Harvard’s new model comes into the picture. What happens when a new method of distribution is taken up by a major university with lots of money and clout? Up to this point, of course, the university has been the consumer and has largely funded JSTOR’s archival process (in addition to donations from charitable institutions like the Mellon foundation, which founded JSTOR). But if the university gets into the archival and distribution business and, eventually, has no need for JSTOR — I’m assuming Harvard is going to digitize and self-distribute all the content in their library via someone other than or in addition to Google, which seems inevitable — then what is JSTOR’s role? And how are the publishers going to respond to this new economy of academic publishing? Especially given that it looks like Harvard will use its own weight to distribute as much as possible freely. (This is a huge copyright debate waiting to happen, and Harvard actually has the money and the will to fight it, which is exciting.) I asked Jason specifically what he thought JSTOR’s role in this new economy would be, and his answer was cautious and thoughtful. He said that JSTOR was essentially “a project” and “an archive” and does not necessarily need to exist in the future. However, JSTOR has 4,500 libraries “involved in” (i.e. subscribing to) the project and has a responsibility to them. More pointedly, he said that JSTOR’s mission statement had recently changed, and the organization is now calling itself a “platform for scholarship”. So, as I guessed, JSTOR certainly seems to be positioning itself to become a delivery mechanism for archived knowledge in the future, in whatever form that may occur, and very probably saw Harvard’s decision coming. 
Jason agreed with me that it was exciting and he pointed out that JSTOR had done a lot of work in the R&D of digitization technology, which is now being used by libraries all over the world, like Harvard, as well as by corporate partners/competitors like Google.

5. Speaking of redefinition and innovation, Jason encouraged me to take a look at the preview version of JSTOR’s new interface. I hope to write a review of this in the future — perhaps when it’s released in a month or so. Suffice it to say for the moment that this is an entirely new platform for them and the interface is attempting to be more individualized, with a “saved citations” section. It looks good so far — my only recommendation was that they put a thumbnail of the pdf next to the citation because, ultimately, that’s the direction all of this is heading (and is already there if you use Zotero) — you want the actual text of the article/book right next to (and incorporated with) the reference. This is something RefWorks (as I understand it) does not do, and EndNote in my experience handles this functionality rather poorly. In any case, you want a Cloud version of all your stuff anyway in case your computer gets lost or you are using a public terminal. So, it seems to me that JSTOR is moving in the right direction (viewing your pdfs is a two-click movement away from the citation list — that should be one click away and nested within the list). FWIW, you can also export selected citations from your library which is a nice feature, though I haven’t yet tested the fidelity to established bibliographical standards.

6. Finally, some criticisms. Or, rather, questions. The future is still very blurry to me, especially as it regards smaller universities and schools who cannot afford JSTOR’s (and the like’s) subscription rates. JSTOR is available in a huge number of countries but not all the universities in those countries have access to JSTOR. That’s a major issue that will need to be resolved. In some ways Harvard is going to bat for those universities while it goes to bat for itself. In terms of the new JSTOR platform, it is still a central issue that not everything is archived there. The knowledge is only as available as it’s available (duh). In other words, if they don’t have it archived (or if it’s behind the moving wall), many people will assume it doesn’t exist, including many of my students, who understandably struggle with this concept. A single archive, or better, a single network of interconnected archives (whether university or non-profit based) is going to emerge in the next several years (with no moving walls), and I assume that Harvard is already thinking about what this will look like. My point still stands, and perhaps is further emphasized by Jason’s helpful comments, that JSTOR has the ability to be involved in this if they so choose. The two keys, to me at least, are that 1) it needs to be free, and 2) it needs to be easy and clear. Neither key is currently the case across the board. JSTOR has largely achieved the second key and may be a viable option for those trying to achieve the first.

(Many thanks again to Jason Phillips for his enthusiasm and patience.)

Hiding in Leopard

Is it just me or has the “hide App” command changed in Leopard? In Tiger, as I remember, when I hide an application the app still shows up in my command-tab HUD of running apps. Now, in Leopard, when I hide an app with command-H that app no longer shows up in the command-tab HUD. I now have to go to the dock to select that app, or “re-launch” it with LaunchBar (I usually do the latter because my dock is always hidden). Anyway, if anyone else has noticed this behavior, let me know — is there an option for this in System Preferences or Finder options? If so I can’t find it. Overall, I don’t know how I feel about this change. I think I like it but I don’t know why.