Thursday, April 23, 2009

metadata, taxonomies, and curation

read:

Wendy Duff, Evaluating Metadata on a Metalevel. Archival Science, Volume 1, Number 3 / September, 2001.

Lars Marius Garshol, Metadata? Thesauri! Taxonomies? Topic maps! Making Sense of it All. Journal of Information Science, 30(4) 2004.

Yan Han, A RDF-based Digital Library System, Library Hi Tech, 2006.

Lynne C. Howarth, Creating a Metadata-Enabled Framework for Resource Discovery in Knowledge Bases, University of Toronto eprint.

Adrian Cunningham, Digital Curation/Digital Archiving: A View from the National Archives of Australia, American Archivist, Volume 71, Number 2 / Fall/Winter 2008.

Sonia Yaco, It's Complicated: Barriers to EAD Implementation, American Archivist, Volume 71, Number 2 / Fall/Winter 2008.

Cornell Institute for Digital Collections, EAD/XML Finding Aids Project

I spent some time these past 2 weeks learning about the relationships between archival practice and other disciplines, and how finding aids and taxonomies reflect archival practice. Wendy Duff's article is a kind of reflection on philosophical underpinnings of archival practice and how variations in practice are reflected in the metadata chosen for 2 different schemes. Of course this is not directly relevant to finding aids. But ideally finding aids should represent the taxonomy of the collection, and allow the user to use metadata to home in on a particular object.

Duff identified the differences in the 2 schemas she looked at as an emphasis on providing metadata as evidence versus more traditional archival (as opposed to records management) concerns like custody and paper-based archival description. The evidence-based study (Pittsburgh) derived metadata automatically--a big contention of Dr Galloway's--but it is easy to imagine how this would not necessarily be the kind of metadata useful for discovery. In fact this is much more of a purely digital item-level description--why would you even bother describing the fonds? I wonder, is there a reason to try, if you can search the collection via metadata?

In information architecture we talk about representing taxonomies through navigation--would you not do something similar here? The archivist could still order the records, but since they are described more like items in a digital library, that is, as individually identifiable objects, alternative archival arrangements (i.e., for use) could be derived. Once again, when findability and access are NOT the only or even the primary concern of the organizational scheme, archives do not offer any easy solutions.

What Garshol talks about is the relationship between taxonomies and metadata: metadata describes the digital object, and is connected to the taxonomy. The most relevant idea in his article, which is mostly about topic maps, is that even most non-archival metadata is about item administration, not describing content. The Dublin Core title field, for example, gives you an indication of the subject, but only indirectly. For more specific description he recommends thesauri, ontologies, and faceted description: these tools extend the metadata, and/or the relevant taxonomy, and allow concepts around the object to be placed in relation to one another to better encompass it.

Cunningham's article on curation--written by an archivist--takes issue with the terms digital curation and digital archiving being used synonymously. He argues that the crucial difference is that the concept of curation does not permit the maintenance of the context necessary for digital records, nor does it address the "recordness" of the records, as evidence. To manage archival records requires "finely engineered metadata schemes" which represent the information contained AROUND the object: that of the event of creation. Like the finding aid, it is the job of archival metadata to represent the information which cannot be contained in the thing itself. He additionally criticizes both the curation model and the OAIS model for failing to account for problems with the records that date to before they cross the archival threshold. He says that this is therefore curation, of the records as objects, not archives, which begins while the record is still in use: hopefully generated automatically, as in the Pittsburgh model.

Sonia Yaco's study about barriers to EAD implementation is my last follow-up from last week's reading. She found that the middleware necessary to make the marked-up finding aid accessible is itself a big problem. A study done by the hilariously named Michael J. Fox in 1999 found that 56% of institutions that do EAD then don't put it online. It seems at this point that there are enough instruction manuals and toolkits out there describing how to mark up finding aids: the question is what, then, to do with them after that. The XML is not the problem, at this stage: it's everything after that!

I am thinking that for my paper, the last part of my project, I will work on this kind of issue. I will go back to LBJ's finding aids, knowing what I know now, and see if I can make any suggestions about where they should invest their resources. It is perhaps overly simple in the sense that they don't have any born-digital records, but I am also interested in the digitization possibilities of such a collection. In addition, what kinds of digital asset management systems could be brought to bear here? For even LBJ at this point has lots of digital representations of records. What are the standards, how should they be managed, how should they be stored?

Saturday, April 11, 2009

EAD, part 2

Read: Encoding Across Frontiers. Stockting and Queyroux, eds, 2004.

Finally, a book that gets at what I really want to know about finding aids on the Web! I have been reading this, slowly, all week--it is a collection of papers from the European Conference on EAD in 2004. Actually, both the conference and the book are about EAD and EAC, which is Encoded Archival Context. The projects are discussions of real-life implementations of EAD and EAC at various stages, with discussions of the constraints, decisions, and paths not taken for projects all over Europe.

One of the ideas that resonated most with me relates to something we've been talking about in IA class--that structure conveys knowledge. This is a big part of the reason I am so interested in finding aids in the first place: the challenge of presenting a big, diverse, complex body of material with clarity and in multiple dimensions. One of the essays discusses the need for EAD to be modular and discrete, and to keep the record, creator, and function descriptions separate from one another. At first this is confusing, because of course these are the meat of any archival description. But in EAD it is the relationships of these modular pieces to one another that convey the most information. This makes so much sense! In this way the descriptions cease to be bulky, wordy things and become elements that can be combined and recombined to describe multiple features of multiple series. And of course, provide multiple points of access, as well.

The paper from Angelika Menne-Haritz about Midosa XML addressed this well, I thought. She described the job of the archivist as one of describing for the user the information around the records which is not represented within the records, such as conversations, that comprise the bulk of the process: she says that records are only "traces of their material part" (pg 89). A good archivist, and a good finding aid, can convey the information around these traces. She specifically talks about a variety of tools that can be used in a digital environment to do this, such as digital representations of records. Midosa in fact provides for the inclusion of such tools within the finding aid. And so, it seems to me, once you include such interpretive elements finding aids become so much more than they are on paper. They move closer to digital libraries.

One of the concluding essays describes an unexpected effect of EAD implementation, that of greater collaboration between different types of institutions. I think this is because once the records become modular, and exist in the rootless environment of the Web, there is a much greater chance of connecting them in new and interesting ways. I think that's the beauty of EAD and especially of EAC: the promise at least of providing all the necessary context but at a much more granular level than you could get at in a non-digital environment. Instead of having to drill down, as in a traditional finding aid, you could theoretically locate the exact chunk you are looking for and then drill up, contextualizing as you go, adding in as much detail as you like.
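To make the "drill up" idea concrete for myself, here is a minimal sketch in Python. The finding aid is made up, and I am assuming EAD 2002 style element names (archdesc, dsc, c01, c02, did, unittitle) with no namespace; the function name context_path is my own invention, not part of any EAD tool.

```python
import xml.etree.ElementTree as ET

# A tiny, invented finding aid (assumption: EAD 2002 element names, no namespace).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1901-1905</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def context_path(root, component):
    """Return the titles from the collection level down to `component`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    titles = []
    node = component
    while node is not None:
        title = node.find("did/unittitle")
        if title is not None and title.text:
            titles.append(title.text.strip())
        node = parent_of.get(node)  # None once we pass the root element
    return list(reversed(titles))


root = ET.fromstring(SAMPLE_EAD)
# Jump straight to the most granular chunk, then drill up for context.
hit = root.find(".//c02")
print(" > ".join(context_path(root, hit)))
```

The point of the sketch is only that the hierarchy itself carries the context: starting from the file-level hit, each step upward adds one more layer of description.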

And you can mix and match in a way never before possible. Archives are certainly more connected to one another through EAD, especially in Britain. The A2A federated catalogue of 44 different archives seems particularly promising, although the search is pretty clunky. I played with it a bit and turned up lots of hits on a search of "Ormonde" (as in the Earl of) but most of the early ones were from the same collection. I like the list of institutions offered on the right hand side of the page, and the advanced search is probably much better, but there really should be a "see more from this collection" kind of function, like the one Google has. I think search is probably the biggest challenge for many of these systems. At least 2 articles in the book mention that their EAD catalogues have a "Google-style" search box--but that doesn't mean they retrieve as well!

Additionally, I don't know how well such a federated system would work with Google--do they have the same problem as OAI-PMH? I Googled "Ormonde" and got some of the same results from the National Archives. But this page was much cleaner, and did not have the same duplication. Perhaps the assumption is that if you are using the A2A you are looking for a more granular level. Frustratingly, though, the National Archives results have lots of see-also "record references" and "other references": but these are not hyperlinked. And "other references" is NOT a helpful term!

So I think EAD seems to offer huge strides in online display of finding aids, but users are still very constrained by the software used in its implementation. The focus in the books I've read so far has been on the markup of legacy finding aids, and there is no question that EAD offers the best and most realistic way to put functional finding aids online. But the markup alone doesn't provide access--and some of the other pieces seem to be lagging behind.

Wednesday, April 1, 2009

EAD, part 1

read:

Encoded Archival Description on the Internet, Daniel Pitti and Wendy Duff, eds. 2001.

"Using the open archives initiative protocols with EAD," Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 2002.

It took me a long time to work through Encoded Archival Description on the Internet, as it is very dense. Much of the book is an overview of the creation of the EAD standards and explanation of how EAD will facilitate resource discovery as compared to the traditional finding aid. There seems little to argue about here: clearly a finding aid marked-up and posted on the internet is more accessible than one that is sitting on a shelf in the archives.

There is also a certain amount of discussion in Encoded Archival Description on the Internet about MARC records for archival materials, which integrate them into the rest of the library catalog, albeit at the collection level. I know Dr Galloway, for one, is positively disdainful of the utility of this. Certainly, there is so little information on a MARC record, comparatively, it is difficult to see how it could be truly helpful. If you are looking for information related to the creator of the collection, chances are you already know where his/her records are likely to be. If not, a quick Google search will tell you. The heterogeneity of the collection is the thing that needs to be conveyed for the record to be useful, and that cannot be done in the MARC format.

As usual, Anne Gilliland-Swetland's piece at the end of the book--alas, I read it in order!--was the most illuminating. She talks about the three reasons for finding aids, and how each of them is related to EAD. Access, which I have been thinking of pretty much exclusively, is only one of them. One of the reasons her article is useful is that she discusses at length research that Marcia Bates has done as to how researchers use the finding aid. She includes techniques like berrypicking, browsing, name searching, etc. Gilliland-Swetland then talks about these in the context of EAD. After all the vague discussion in the rest of the book about how EAD will facilitate access, actually seeing research done on the subject was very useful!

A major recurring theme in all the articles in the book is what they call union discovery: standardized markup that lets a researcher search across EAD platforms and discover records regardless of the institution that houses them. It took me a while to figure out that they were not necessarily talking about Web search, but rather something like OAI-PMH, which facilitates resource discovery by harvesting exposed metadata from many institutions so a user can run one search and get results from all of them.
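Here is a rough sketch, in Python, of what that harvesting amounts to as I understand it. The ListRecords verb, the oai_dc metadata prefix, and the XML namespaces come from the OAI-PMH specification; the repository URLs, identifiers, and titles are invented, and the responses are hand-written stand-ins so the example runs without touching a real server. The helper names (list_records_url, sample_response, harvest) are mine.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester would send to one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def sample_response(identifier, title):
    """Hand-written, trimmed stand-in for a real ListRecords response."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(response_xml):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# One index built from two (hypothetical) institutions: the "union" part.
union_index = (
    harvest(sample_response("oai:archive-a.example:1", "Example Family Papers"))
    + harvest(sample_response("oai:archive-b.example:7", "Example Estate Records"))
)
print(list_records_url("https://archive-a.example/oai"))
print(union_index)
```

The user never sees any of this: they search the merged index, and the harvester has already done the visiting of each institution.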

I went on to read the 2002 article about OAI and EAD to learn more about the two in relation to one another. These authors found that integrating EAD records into OAI would require the loss of a significant amount of detail in the EAD record, or the creation of multiple OAI records from one EAD record. They determine that even with its inherent limitations, such a compromise would be useful as an alternative access point. It certainly seems to me that it would be more so than MARC records of archival collections, at any rate. However, since Yahoo! and Google are no longer searching through the OAI-PMH protocols, would anyone find these records anyway? A librarian I spoke with yesterday (I am at TLA) said that you could instead convert your OAI aggregation into XML and then Google could find it. I wonder if this is anything other than a stopgap? Will OAI fall by the wayside anyway? And I don't know much about EAD and Google--finding aids and search engines were not covered in this book, much to my chagrin, so I will look for more of these for next week. I also have a couple more EAD/Web resources I will check out in part or in whole to see if they have additional information.
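To see what "multiple OAI records from one EAD record" might look like, I tried a small sketch in Python. This is my own illustration, not the mapping the article's authors used: the finding aid is invented, I am assuming EAD 2002 element names without a namespace, and the choice of Dublin Core-style fields (identifier, title, relation) and the function name ead_to_flat_records are assumptions.

```python
import xml.etree.ElementTree as ET

# An invented finding aid (assumption: EAD 2002 element names, no namespace).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <unitid>MS-001</unitid>
      <unittitle>Example Family Papers</unittitle>
    </did>
    <dsc>
      <c01 level="series">
        <did><unitid>MS-001/1</unitid><unittitle>Correspondence</unittitle></did>
      </c01>
      <c01 level="series">
        <did><unitid>MS-001/2</unitid><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def ead_to_flat_records(ead_xml):
    """Split one hierarchical finding aid into flat, Dublin Core-like records."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    collection_title = archdesc.findtext("did/unittitle")

    # One record for the collection as a whole.
    records = [{
        "identifier": archdesc.findtext("did/unitid"),
        "title": collection_title,
        "relation": None,
    }]
    # One record per series; the hierarchy survives only as a "relation" string.
    for series in archdesc.iterfind("dsc/c01"):
        records.append({
            "identifier": series.findtext("did/unitid"),
            "title": series.findtext("did/unittitle"),
            "relation": collection_title,
        })
    return records


for record in ead_to_flat_records(SAMPLE_EAD):
    print(record)
```

Even in a toy like this the trade-off shows: each series becomes findable on its own, but everything the nesting conveyed has to be squeezed into a single pointer back to the collection.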