Erin McKean and inclusive entrepreneurship

Today is Ada Lovelace Day, a day celebrating the achievements of women in science and technology.  There are all kinds of ways to be a scientist or a technologist, and just in the fields of computing and information technology I can think of a number of first-class inventors, investigators, developers, teachers, and integrators who are women.

When it comes to entrepreneurs, though, the list of women who come to my mind gets much shorter. And that’s unfortunate, because in computing and information technology, as in many other technical fields, entrepreneurs play a major role in bringing the fruits of technology to the world at large. If you mention Steve and Steve, or Bill and Paul, or Larry and Sergey, lots of people will know who you’re talking about, and will also know the stories of the companies they founded and their world-changing products. Women tech-company founders, though, have not been so visible. Jessica Livingston, one of the founders of the tech-startup catalyst Y Combinator, reports that nearly all of its applicants have Y chromosomes; only about 7 percent have been women. And in mid-2008, the San Jose Mercury News reported that there were no female CEOs in the top 150 Silicon Valley companies.

Despite these discouraging statistics, you can find women pioneering new technological businesses. One whom I find particularly notable (and not just because the woman technologist closest to me works for her) is Erin McKean, co-founder and CEO of Wordnik, a company that’s reinventing the dictionary for the Internet age. If you watch the 15-minute video of her 2007 TED talk, or just read the About and FAQ pages at Wordnik and play around with the site a bit, you’ll pick up the general ideas. There are a few aspects of her venture that I think are worth special note:

Rethinking the familiar. Dictionaries have been around in print for centuries, and even computerized dictionaries have existed for decades. But as Erin makes clear in the TED video, once you have the Internet, and lots of data and computing power, you can discard many of the limitations of prior conceptions of the dictionary, and do lots of interesting new things. You can include all the words in a language, not just the ones that pass some sort of notability test. You can show examples from a huge array of formal, informal, and ephemeral sources. You can do statistical analysis to track word usage over different times, places, and genres. In short, you can make a reference that meets a known need in new and useful ways.

Risk-taking. It’s a truism that startups are risky ventures. You spend years of your life working long hours, often with low pay and few benefits at the start, on a project that may make you rich, but statistically is more likely to come to nothing. And some believe that online dictionaries, in particular, may be obsolete, and vulnerable to the same sorts of disruptive market forces that blew the encyclopedia business to bits. But thus far, Erin’s risk-taking seems to be paying off; the site was named one of PC Magazine’s Top 100 products of 2009 (one of only a few websites included on its list), and it continues to attract funding even in the midst of a severe recession.

Inclusion. One of the key distinguishing features of Wordnik, alluded to above, is its inclusiveness. By collecting all the words and examples that it can, from a wide variety of sources, it stands out from other dictionaries. And along with the usual definitions, pronunciations, and example sentences for words, Wordnik also brings in many unconventional sources (Flickr photo streams, Scrabble scoring references, and user tags and comments, to name a few) to enhance understanding and enjoyment of words. It remains to be seen which of these sources will prove most useful or popular in the long run, but openness to new information and ideas is often a crucial part of inventing and improving technologies. It also helps ventures evaluate and adapt their technologies and their business models, so that they can thrive rather than perish under changing conditions.

Inviting collaboration. If inclusiveness is important to you, it’s not enough just to wait for things to come to you; you need to go out and invite people to work with you. The Wordnik site does this to a certain extent by design, encouraging people to contribute notes, tags, lists of favorite words, and other information. (Our 7-year-old daughter was an ardent contributor of Pokémon character names and pronunciations during the site’s alpha test.) More recently, Wordnik has partnered with the Internet Archive and a variety of publishers and publishing sites to develop Smartwords, a forthcoming open standard for querying and embedding word information on demand as people read online or communicate via social software. Erin and her collaborators hope that this will extend the reach of Wordnik’s services into many more contexts than just “going to look up a word in the dictionary.”
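To make the idea of statistically tracking word usage across times, places, and genres a little more concrete, here is a minimal Python sketch of the kind of counting involved. It assumes a corpus already tagged as hypothetical (period, text) pairs, and the function names are made up for illustration; it is not Wordnik’s implementation, just the general technique of comparing a word’s relative frequency across slices of a corpus.

```python
import re
from collections import Counter, defaultdict


def usage_by_period(examples):
    """Tally word counts per period.

    `examples` is an iterable of (period, text) pairs; a "period" could
    just as well be a place or genre label (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for period, text in examples:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        counts[period].update(re.findall(r"[a-z']+", text.lower()))
    return counts


def relative_frequency(counts, word):
    """Share of all tokens in each period that are `word`."""
    result = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        result[period] = counts[period][word] / total if total else 0.0
    return result


corpus = [
    (1990, "The web is new"),
    (2009, "Tweet about the web"),
    (2009, "I tweet and you tweet"),
]
freq = relative_frequency(usage_by_period(corpus), "tweet")
print(freq)
```

Using relative rather than raw counts keeps periods with very different amounts of text comparable; a real system would also need far better tokenization and much larger, carefully balanced sources.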

While the details above are specific to Wordnik, the four basic qualities they embody (rethinking the familiar, risk-taking, inclusion, and inviting collaboration) have general applicability, and are especially worth considering on Ada Lovelace Day. The low numbers quoted earlier for women tech founders and CEOs suggest to me that women may not always be seriously considered (by themselves or others) for those roles. Thinking in terms of the qualities I’ve mentioned makes it easier to envision women in those positions. These are also qualities that can help both men and women take a more entrepreneurial approach to the technologies (and the libraries) they develop, so that they can have a lasting, positive effect on the world.

Lots of conversation keeps stuff sustainable

Among the hats I wear at my place of work is that of LOCKSS cache administrator. LOCKSS is a useful distributed preservation system built around the principle “Lots of copies keep stuff safe” (whose initials give the system its name). The idea is that, with the cooperation of publishers, a number of libraries (mine among them) each harvest copies of selected online content and keep them on their own LOCKSS caches, which are hooked up to local library proxy services. Then, if the material ever becomes inaccessible from the publisher, each library’s users are automatically routed to its local copies. Each LOCKSS cache also periodically checks with other LOCKSS caches to ensure that its copies are still in good shape, and to repair or replace copies that have been lost or damaged. (Various security features protect against leaks of restricted content and unauthorized revisions of content.)
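For readers curious about what “checking with other caches” might involve, here is a deliberately simplified Python sketch: a cache compares a hash of its copy against hashes of its peers’ copies and, if it disagrees with a clear majority, repairs its copy from a peer holding the majority version. The real LOCKSS audit protocol is considerably more elaborate (among other things, it polls samples of peers and is designed to resist faulty or malicious voters), and the function names here are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so caches can compare without exchanging content."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(local: bytes, peer_copies: list[bytes]) -> tuple[bytes, bool]:
    """Simplified majority-vote audit (hypothetical; not the real LOCKSS protocol).

    Returns (copy to keep, whether a repair was made).
    """
    votes = Counter(digest(c) for c in peer_copies)
    local_digest = digest(local)
    votes[local_digest] += 1  # our own copy gets a vote too
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(peer_copies) + 1:
        # No strict majority: keep what we have and leave it to a human.
        return local, False
    if winner == local_digest:
        return local, False  # we agree with the majority
    # Repair from any peer holding the majority version.
    repair = next(c for c in peer_copies if digest(c) == winner)
    return repair, True


good, damaged = b"Vol. 1, Issue 1", b"Vol. 1, Issu? 1"
print(audit_and_repair(damaged, [good, good, good]))
```

Requiring a strict majority before repairing is a conservative choice: when the vote is split, overwriting a copy could just as easily spread damage as fix it.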

LOCKSS is open source software that runs on commodity hardware.  It was originally envisioned to run virtually automatically.  As Chris Dobson described the ideal in a 2003 Searcher article, “Take a computer a generation past its prime…. Hook it up to the Internet and put it in a closet. Stick in the LOCKSS CD-ROM and boot it up. Close the closet door.”  And then presumably walk away and forget about it.

Of course, it’s not that simple in practice, particularly if your library is proactive about its preservation strategy.  The thing about preservation at scale is there’s always something that needs attention.  It might be something technical, or content-related, or planning-related, but preserving a growing collection requires ongoing thought.  And if you want to think as clearly and sensibly as you can, you’ll want to collaborate.

Right now, for instance, I’m trying to get my cache to harvest the full run of a journal that has just been made available for LOCKSS harvesting, and through which we hope to provide post-cancellation access. Someone at Stanford just gave me a useful tip on how to give this journal priority over the other volumes I’ve got queued up for harvest. Unfortunately, I can’t try it out until I get my cache back up; it failed to reboot cleanly after a power failure. While I wait for instructions on how best to remedy this, I wonder whether switching to a new Linux-based version of LOCKSS might make such operating-system-level problems easier to deal with. It would be useful to hear from folks who are running that version about their experience.
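Giving one journal priority over other queued volumes is, conceptually, a priority queue. The Python sketch below is hypothetical: it is not a description of how the LOCKSS daemon actually schedules crawls, and the class and volume names are invented. It just shows one common way to let an urgent item jump ahead while everything else stays in first-come, first-served order.

```python
import heapq
import itertools


class HarvestQueue:
    """Toy crawl queue (hypothetical): lower priority numbers come out first;
    items with equal priority keep first-in, first-out order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker so equal priorities stay FIFO

    def add(self, volume: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), volume))

    def next_volume(self) -> str:
        _, _, volume = heapq.heappop(self._heap)
        return volume

    def __len__(self) -> int:
        return len(self._heap)


queue = HarvestQueue()
queue.add("Journal A, vol. 1")
queue.add("Journal A, vol. 2")
queue.add("New journal, vol. 1", priority=1)  # jump the line
print([queue.next_volume() for _ in range(len(queue))])
```

The insertion counter matters: without it, two volumes with the same priority would be ordered by their names rather than by when they were queued.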

Meanwhile, we’re wondering how best to approach new publishers who have content that our bibliographers would like to preserve via LOCKSS. Our special collections folks wonder whether we should preserve some of our own home-grown content via a private LOCKSS network.  I’m also doing some ongoing monitoring and testing of our LOCKSS cache’s behavior (some of which I’ve reported on earlier), and would be interested in knowing if others are seeing some of the same kinds of things that I see on the cache I administer.

In short, there are a lot of things to think about when LOCKSS plays a significant role in a preservation plan. And many of the issues I’ve mentioned above are ones that others may be thinking about as well. So let’s talk about them. As the LOCKSS group has said, “A vibrant, active, and engaged user community is key to the success of Open-Source efforts like LOCKSS.”

One thing you need for such an engaged community is a forum for them to talk to each other.  As it turns out, the LOCKSS group at Stanford tell me they created a LOCKSS Forum mailing list a while back, but I haven’t yet seen it publicized.   Its information page is at .  (Currently, archived email messages are not visible on the open web, though this may change in the future.)  If you’re interested in talking with others about how you use or might use LOCKSS to preserve access to digital content, I invite you to sign up and help get the conversation going.

Implementing interoperability between library discovery tools and the ILS

Last June I gave a presentation in a NISO webinar about the work that a number of colleagues and I did for the Digital Library Federation to recommend standard interfaces for Integrated Library Systems (the systems that keep track of our libraries’ acquisitions, catalogs, and circulation) to support a wide variety of tools and applications for discovery. Our “ILS-DI” recommendation was published in 2008, and encompassed a number of functions that some ILS’s supported. But it also included many functions that were not generally, or uniformly, supported by ILS’s of the time. That’s still the case today.

As I said in my presentation last June, “If we look at the ILS-DI process as a development spiral, we’ve moved from a specification stage to an implementation stage.” My hope has been that vendors and other library software implementers would implement the basics of what we recommended (as many agreed to do), and that the library community could progress from there. This often takes longer to achieve than one might hope.

But I’m happy to report that the Code4lib community is now picking up the ball.  At this month’s Code4lib conference, a group met to discuss “collaboratively develop[ing] a middleware infrastructure” to link together ILS’s and discovery tools, based on the work done by the DLF’s ILS-DI group and by the developers of systems like Jangle and XC.  The middleware would help power discovery applications like Blacklight, VuFind, Summon, WorldCat Local, and whatever else the digital library community might invent.
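One way to picture such middleware is as a set of per-ILS adapters behind a common interface, so that a discovery tool can ask about availability without knowing which ILS it’s talking to. The Python sketch below assumes a single operation modeled loosely on the ILS-DI recommendation’s GetAvailability function; the class and method names are hypothetical, and they are not the APIs of Jangle, XC, or any product named above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Availability:
    record_id: str
    status: str    # e.g. "available", "checked out", "unknown"
    location: str


class ILSAdapter(ABC):
    """Common interface (hypothetical); one subclass per ILS product."""

    @abstractmethod
    def get_availability(self, record_ids: list[str]) -> list[Availability]:
        """Loosely modeled on ILS-DI's GetAvailability function."""


class InMemoryILS(ILSAdapter):
    """Stand-in for a real ILS connector, backed by a dictionary."""

    def __init__(self, holdings: dict[str, tuple[str, str]]):
        self._holdings = holdings

    def get_availability(self, record_ids: list[str]) -> list[Availability]:
        # Unknown records get a placeholder status instead of raising.
        return [
            Availability(rid, *self._holdings.get(rid, ("unknown", "")))
            for rid in record_ids
        ]


def render_results(ils: ILSAdapter, record_ids: list[str]) -> list[str]:
    """What a discovery front end might do; it never touches ILS-specific code."""
    lines = []
    for a in ils.get_availability(record_ids):
        suffix = f" ({a.location})" if a.location else ""
        lines.append(f"{a.record_id}: {a.status}{suffix}")
    return lines


ils = InMemoryILS({"b1": ("available", "Main Library"), "b2": ("checked out", "Annex")})
print(render_results(ils, ["b1", "b2", "b9"]))
```

Supporting a new ILS then means writing one more adapter subclass, while discovery applications keep calling the same interface.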

I wasn’t at the Code4lib conference, but the group that met there to kick off the effort has impressive collective expertise and accomplishments. It includes several members of the DLF’s ILS-DI group, as well as the lead implementors of several relevant systems. Roy Tennant from OCLC Research is coordinating the initial activity, and Emily Lynema of the ILS-DI group has repurposed the Google Groups space that the ILS-DI group used, turning it over to the new effort.

And you’re welcome to join too, if you’d like to help out or learn more. “This is an open, collaborative effort” is how Roy put it in the announcement of the new initiative.  Due to some prior commitments, I’ll personally be watching more than actively participating, at least to begin with, but I’ll be watching with great interest.  To find out more, and to get involved, see the Google Group.