The Great Unanswered Questions: Can Google do it right?
Through all the hype and lawsuits and screaming and dreaming surrounding the Google Book Search/Library project, I have been trying to get people asking important qualitative questions.
• Can Google actually do what it says it will do?
• What will the principles of search and organization be?
• What metadata will guide the search process?
• Will there be an algorithm similar to PageRank that determines which books end up where on the search list?
• How will Google make these searches make any sense?
• Will such searches work for one of the major goals of searching: to identify a document that we know is out there but can't quite recall?
No one at Google will answer these questions. The answers are proprietary and secret.
No one at the University of Michigan will answer these questions because they can't. They have nothing to do with the quality of Google's work. For some reason, Michigan never insisted on qualitative standards (let alone open standards that would conform to the ethics and mission of academic libraries) when they agreed to give away their treasure to Google for the promise of a pile of electronic images.
Alarmingly, the current Google Book Search service offers stunningly bad results. If this is the model for the complete project, we are inviting some really bad research in the future. Don't throw away that card catalog just yet.
Now, the current index for Google Book Search is authorized. That is, each of these texts is licensed to Google under the terms it had been negotiating with publishers for some time. Only in the wake of the surprise announcement of the library project did publishers get understandably upset at Google.
Basically, publishers want their books in Google's index. They just want some contractual say over how they will be presented. Some publishers would not mind a cut of the revenue it produces. As I have said many times, publishers have nothing to worry about from Google. They will not lose revenue from book sales. Publishers just want a piece of the ancillary revenue. I don't think copyright should guarantee such revenue. As Cory Doctorow has said, "should publishers get a cut of revenue for bookshelves?" I am not sure the courts would agree with me. They seldom do.
But remember Cory Doctorow's essay about Google Library, "Why publishers should send fruit baskets to Google"?
The crux of his argument is that books are in danger of becoming less relevant as they remain distant from the Great Digital Environment. As we become more dependent on on-line searches via search engines like Google, we are less likely to open a book to find an answer. This is the strongest argument for a massive digital library project. But, as I have said many times, it is not an argument that says Google should be doing the scanning and indexing. In fact, I would argue it is a reason to exclude Google from the game. But that's another story.
Cory has been pushing a line of argument for a number of years as he has tried to liberate people in the book publishing world of their irrational fear of digitization, open content, and Creative Commons licenses. He has been absolutely justified in these efforts. His book sales prove it.
However, I think Cory goes too far when he surmises that having his books included in the Google Book Search index makes them any less obscured than having them on the stacks of a real library or the shelves of a real book store.
To test this proposition, today I pretended to be someone who is curious about a book Cory wrote, Down and Out in the Magic Kingdom.
Now, this book is available online. So one need not use Google Book Search to read it. But let's forget that for a moment. Let's pretend that, like so many books (mine, for instance), the whole text is not available via the open Web yet is accessible (but limited by DRM and proprietary software) via Google Book Search.
I pretended that I did not remember Cory's name. I just remembered that the phrase "Magic Kingdom" is in the title. This is not a far-fetched search method. When I tried to recall the title of the essay Cory wrote about Google Library, I tried several variations of what I misremembered until I picked one general enough to get the essay to the first page of search results. That's what we do when we can't remember exactly what we want, but we know it's in there somewhere. As John Battelle explains, we both search to discover and search to recover. Here, I am trying to recover something: a science fiction novel that has "Magic Kingdom" in the title.
Here are the results of my search for "the magic kingdom book":

As it turns out, Cory's book is mighty obscure. It's in the middle of page six of the results. It's just below -- get this -- "Real Magic: Creating Miracles in Everyday Life." Here are some books that outrank Down and Out ... in relevance:
• On page 5, Liberace: An American Boy.
• On page 5, The Paradoxical Kingdom: Saudi Arabia and the Momentum of Reform.
• On page 4, Hey Kidz, Buy This Book!: A Radical Primer on Corporate and Governmental Propaganda.
• On page 3, Lawrence Lessig's Free Culture (only because it cites Cory's book! Yep, a book that cites Cory's book outranks Cory's book. Talk about obscurity!).
Oh, despite the fact that both Free Culture and Down and Out have been released under Creative Commons licenses that are supposed to lock the content open, Google has apparently violated the terms of the license by limiting what users can do with the text. Will Cory and Larry sue Google?
What's the number one hit on Google Book Search for "magic kingdom"? It's called The Magic Kingdom of God: Christianity and Global Culture Industries. I am somewhat relieved that it is NOT a book about Disney. But think about it for a second. If you asked a librarian for books about "the magic kingdom," would not he direct you to books about Disney? Would not he at least ask you two or three more questions to figure out what you might be after?
Oh, the first book about Disney in the list is Disney Discourse: Producing the Magic Kingdom. It's third on the list but the only Disney-related book on the first page of results. The second Disney-related book on the list does not appear until the sixth page of the results. It is -- you guessed it -- Cory's book.
Try it at home for yourselves. Pretend you can't remember an author or exact title of a favourite book and see if Google Book Search helps you in any way. Or pick a famous phrase from a famous book and see how many other books Google thinks are more relevant than the one you are trying to find. My favourite is "the best of times, the worst of times." See how many computer manuals show up ahead of Dickens.
Just to be fair, here is the search for "copywrongs," a term I invented for the title of my first book. Fortunately, my book ends up first. But if you search for books about copyright, you won't even find it. Now, my book might not be one of the 208 most important books about copyright. But Lessig's Free Culture sure is. It does not make the list either.
And please allow that the search experiment I conducted here was not a skilled or well-designed search strategy. Better search strategies yield better results no matter how bad the index is. But remember, most users are not trained at search strategies. That's why we need librarians.
So for all those fans of Google Book Search and the Library Project, please remind me how this service is going to improve our lives again?
Comments
Here's the way I think about Google Book Search and the Library project:
Google is a business, period. It exists to make money, period. Google obtains most of its revenue from selling advertising. They use an auction market model where the highest bidder gets the ad space. In the advertising world, this is clever, unprecedented and highly profitable for Google. The closest print analog to Google's business model would be The Yellow Pages or, to a lesser extent, The National Enquirer and People Magazine. The only significant difference is that these businesses sell ad space at fixed rates on a first come, first served basis (more equitable, don't you think?).
How would we feel about this endeavor if The National Enquirer were bankrolling this project?
Yes Siva, you are right, please have someone explain how this service is going to improve any lives but the shareholders of Google.
Posted by: Jardinero1 | February 21, 2006 02:00 AM
I agree that the questions you raise are important, and deserve answering.
I think the value of the as-is-planned Google project will come from browsing behavior, not necessarily for focused searching for a specific book. It should, theoretically, provide another means of finding information. The problem is getting the relevance issue right. ^_^
While the chance of getting a specific book is probably low, I suspect the chance of getting something that you might not have otherwise found is pretty decent. Someone should do a study on this one of these days, if they haven't yet. ^_^
Posted by: cjovalle | February 21, 2006 11:00 AM