What archives can learn from digital libraries

Thursday, April 23, 2009

metadata, taxonomies, and curation

read:

Wendy Duff, Evaluating Metadata on a Metalevel. Archival Science, Volume 1, Number 3 / September, 2001.

Lars Marius Garshol, Metadata? Thesauri! Taxonomies? Topic maps! Making Sense of it All. Journal of Information Science, 30(4) 2004.

Yan Han, A RDF-based Digital Library System, Library Hi Tech, 2006.

Lynne C. Howarth, Creating a Metadata-Enabled Framework for Resource Discovery in Knowledge Bases, University of Toronto eprint.

Adrian Cunningham, Digital Curation/Digital Archiving: A View from the National Archives of Australia, American Archivist, Volume 71, Number 2 / Fall/Winter 2008.

Sonia Yaco, It's Complicated: Barriers to EAD Implementation, American Archivist, Volume 71, Number 2 / Fall/Winter 2008.

Cornell Institute for Digital Collections, EAD/XML Finding Aids Project

I spent some time these past 2 weeks learning about the relationships between archival practice and other disciplines, and how finding aids and taxonomies reflect archival practice. Wendy Duff's article is a kind of reflection on philosophical underpinnings of archival practice and how variations in practice are reflected in the metadata chosen for 2 different schemes. Of course this is not directly relevant to finding aids. But ideally finding aids should represent the taxonomy of the collection, and allow the user to use metadata to home in on a particular object.

Duff identified the differences in the 2 schemas she looked at as an emphasis on providing metadata as evidence versus more traditional archival (as opposed to records management) concerns like custody and paper-based archival description. The evidence-based study (Pittsburgh) derived metadata automatically--a big contention of Dr Galloway's--but it is easy to imagine how this would not necessarily be the kind of metadata useful for discovery. In fact this is much more of a purely digital item-level description--why would you even bother describing the fonds? I wonder, is there a reason to try, if you can search the collection via metadata?

In information architecture we talk about representing taxonomies through navigation--would you not do something similar here? The archivist could still order the records, but since they are described more like a digital library, that is individually identifiable, alternative archival arrangements (i.e., for use) could be derived. Once again, when findability and access are NOT the only or even the primary concern of the organizational scheme, archives do not offer any easy solutions.

Garshol talks about the relationship between taxonomies and metadata: metadata describes the digital object, and is connected to the taxonomy. The most relevant idea in his article, which is mostly about topic maps, is that even most non-archival metadata is about item administration, not describing content. The Dublin Core title field, for example, gives you an indication of the subject, but only indirectly. For more specific description he recommends thesauri, ontologies, and faceted description: these tools extend the metadata, and/or the relevant taxonomy, and allow concepts around the object to be placed in relation to one another to better encompass it.
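
To make the idea of metadata "connected to the taxonomy" concrete for myself, here is a minimal sketch in Python. Everything in it is invented for illustration (the terms, the item records, the function names); it is not Garshol's model or any real system, just one way a subject field could point into a hierarchy so that a search on a broad term also picks up items described with narrower ones.

```python
# Hypothetical taxonomy: each term maps to its narrower terms.
NARROWER = {
    "government records": ["presidential papers", "court records"],
    "presidential papers": ["speeches", "correspondence"],
}

# Hypothetical item-level metadata; "subject" points at a taxonomy term.
ITEMS = {
    "item-1": {"title": "Address to Congress", "subject": "speeches"},
    "item-2": {"title": "Letter to a senator", "subject": "correspondence"},
    "item-3": {"title": "Trial transcript", "subject": "court records"},
}


def expand(term):
    """Return the term plus every narrower term beneath it."""
    terms = {term}
    for child in NARROWER.get(term, []):
        terms |= expand(child)
    return terms


def find_by_subject(term):
    """Find items whose subject is the term or anything narrower."""
    wanted = expand(term)
    return sorted(i for i, md in ITEMS.items() if md["subject"] in wanted)


print(find_by_subject("presidential papers"))
```

The title field alone would never connect "Address to Congress" to "presidential papers"; the link comes entirely from the taxonomy, which I think is Garshol's point about titles indicating subject only indirectly.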

The article on curation--by an archivist--takes issue with the terms "digital curation" and "digital archiving" being used synonymously. He argues that the crucial difference is that the concept of curation does not permit the maintenance of the context necessary for digital records, nor does it address the "recordness" of the records, as evidence. To manage archival records requires "finely engineered metadata schemes" which represent the information contained AROUND the object: that of the event of creation. Like the finding aid, it is the job of archival metadata to represent the information which cannot be contained in the thing itself. He additionally criticizes both the curation model and the OAIS model for failing to account for problems with the records that date to before they cross the archival threshold. He says that this is therefore curation, of the records as objects, not archives, which begins while the record is still in use: hopefully generated automatically, as in the Pittsburgh model.

Sonia Yaco's study about barriers to EAD implementation is my last follow-up from last week's reading. She found that the middleware necessary to make the marked-up finding aid accessible is itself a big problem. A study done by the hilariously named Michael J. Fox in 1999 found that 56% of institutions that do EAD then don't put it online. It seems at this point that there are enough instruction manuals and toolkits out there describing how to mark up finding aids: the question is what, then, to do with them after that. The XML is not the problem, at this stage: it's everything after that!

I am thinking that for my paper, the last part of my project, I will work on this kind of issue. I will go back to LBJ's finding aids, knowing what I know now, and see if I can make any suggestions about where they should invest their resources. It is perhaps overly simple in the sense that they don't have any born-digital records, but I am also interested in the digitization possibilities of such a collection as well. In addition, what kinds of digital asset management systems could be brought to bear here? For even LBJ at this point has lots of digital representations of records. What are the standards, how should they be managed, how should they be stored?

Saturday, April 11, 2009

EAD, part 2

Read: Encoding Across Frontiers. Stockting and Queyroux, eds, 2004.

Finally, a book that gets at what I really want to know about finding aids on the Web! I have been reading this, slowly, all week--it is a collection of papers from the European Conference on EAD in 2004. Actually, both the conference and the book are about EAD and EAC, which is Encoded Archival Context. The projects are discussions of real-life implementations of EAD and EAC at various stages, with discussions of the constraints, decisions, and paths not taken for projects all over Europe.

One of the ideas that resonated most with me relates to something we've been talking about in IA class--that structure conveys knowledge. This is a big part of the reason I am so interested in finding aids in the first place: the challenge of presenting a big, diverse, complex body of material with clarity and in multiple dimensions. One of the essays discusses the need for EAD to be modular and discrete, and to keep the record, creator, and function descriptions separate from one another. At first this is confusing, because of course these are the meat of any archival description. But in EAD it is the relationships of these modular pieces to one another that convey the most information. This makes so much sense! In this way the descriptions cease to be bulky, wordy things, and become elements that can be combined and recombined to describe multiple features of multiple series. And of course, provide multiple points of access, as well.
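
Here is how I picture that modularity, as a rough sketch in Python. The identifiers, labels, and relationship verbs are all hypothetical (they come from me, not from EAD, EAC, or the essay): the point is only that each description is written once, and the relationships between them carry the context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    ident: str
    kind: str   # "record", "creator", or "function"
    label: str


@dataclass(frozen=True)
class Relationship:
    source: str  # ident of one description
    verb: str
    target: str  # ident of another description


# Hypothetical descriptions, each kept separate from the others.
DESCRIPTIONS = {
    d.ident: d
    for d in [
        Description("rec-1", "record", "Correspondence series"),
        Description("rec-2", "record", "Minutes series"),
        Description("cre-1", "creator", "Office of the Secretary"),
        Description("fun-1", "function", "Policy development"),
    ]
}

# The relationships are where most of the information lives.
RELATIONSHIPS = [
    Relationship("cre-1", "created", "rec-1"),
    Relationship("cre-1", "created", "rec-2"),
    Relationship("rec-1", "documents", "fun-1"),
]


def context_for(ident):
    """List every description linked to this one, in either direction."""
    linked = []
    for rel in RELATIONSHIPS:
        if rel.source == ident:
            linked.append((rel.verb, DESCRIPTIONS[rel.target].label))
        elif rel.target == ident:
            linked.append((rel.verb + " by", DESCRIPTIONS[rel.source].label))
    return linked


print(context_for("rec-1"))
```

The creator description is written once but contextualizes both series, and you can enter from any piece (record, creator, or function) and reach the others, which is the "multiple points of access" idea.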

The paper from Angelika Menne-Haritz about Midosa XML addressed this well, I thought. She described the job of the archivist as one of describing for the user the information around the records which is not represented within the records, such as conversations, that comprise the bulk of the process: she says that records are only "traces of their material part" (pg 89). A good archivist, and a good finding aid, can convey the information around these traces. She specifically talks about a variety of tools that can be used in a digital environment to do this, such as digital representations of records. Midosa in fact provides for the inclusion of such tools within the finding aid. And so, it seems to me, that once you include such interpretive elements finding aids become so much more than they are on paper. They move closer to digital libraries.

One of the concluding essays describes an unexpected effect of EAD implementation, that of greater collaboration between different types of institutions. I think this is because once the records become modular, and exist in the rootless environment of the Web, there is a much greater chance of connecting them in new and interesting ways. I think that's the beauty of EAD and especially of EAC: the promise at least of providing all the necessary context but at a much more granular level than you could get at in a non-digital environment. Instead of having to drill down, as in a traditional finding aid, you could theoretically locate the exact chunk you are looking for and then drill up, contextualizing as you go, adding in as much detail as you like.

And you can mix and match in a way never before possible. Archives are certainly more connected to one another through EAD, especially in Britain. The A2A federated catalogue of 44 different archives seems particularly promising, although the search is pretty clunky. I played with it a bit and turned up lots of hits on a search of "Ormonde" (as in the Earl of) but most of the early ones were from the same collection. I like the list of institutions offered on the right hand side of the page, and the advanced search is probably much better, but there really should be a "see more from this collection" kind of function, like the one Google has. I think search is probably the biggest challenge for many of these systems. At least 2 articles in the book mention that their EAD catalogues have a "Google-style" search box--but that doesn't mean they retrieve as well!

Additionally, I don't know how well such a federated system would work with Google--do they have the same problem as OAI-PMH? I Googled "Ormonde" and got some of the same results from the National Archives. But this page was much cleaner, and did not have the same duplication. Perhaps the assumption is that if you are using the A2A you are looking for a more granular level. Frustratingly, though, the National Archives results have lots of see-also "record references" and "other references": but these are not hyperlinked. And "other references" is NOT a helpful term!

So I think EAD seems to offer huge strides in online display of finding aids, but users are still very constrained by the software used in its implementation. The focus in the books I've read so far has been on the markup of legacy finding aids, and there is no question that EAD offers the best and most realistic way to put functional finding aids online. But the markup alone doesn't provide access--and some of the other pieces seem to be lagging behind.

Wednesday, April 1, 2009

EAD, part 1

read:

Encoded Archival Description on the Internet, Daniel Pitti and Wendy Duff, eds. 2001.

"Using the open archives initiative protocols with EAD," Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 2002.

It took me a long time to work through Encoded Archival Description on the Internet, as it is very dense. Much of the book is an overview of the creation of the EAD standards and explanation of how EAD will facilitate resource discovery as compared to the traditional finding aid. There seems little to argue about here: clearly a finding aid marked-up and posted on the internet is more accessible than one that is sitting on a shelf in the archives.

There is also a certain amount of discussion in Encoded Archival Description on the Internet about MARC records for archival materials, which integrate them into the rest of the library catalog, albeit at the collection level. I know Dr Galloway, for one, is positively disdainful of the utility of this. Certainly, there is so little information on a MARC record, comparatively, that it is difficult to see how it could be truly helpful. If you are looking for information related to the creator of the collection, chances are you already know where his/her records are likely to be. If not, a quick Google search will tell you. The heterogeneity of the collection is the thing that needs to be conveyed for the record to be useful, and that cannot be done in the MARC format.

As usual, Anne Gilliland-Swetland's piece at the end of the book--alas, I read it in order!--was the most illuminating. She talks about the three reasons for finding aids, and how each of them is related to EAD. Access, which I have been thinking of pretty much exclusively, is only one of them. One of the reasons her article is useful is that she discusses at length research that Marcia Bates has done as to how researchers use the finding aid. She includes techniques like berrypicking, browsing, name searching, etc. Gilliland-Swetland then talks about these in the context of EAD. After all the vague discussion in the rest of the book about how EAD will facilitate access, actually seeing research done on the subject was very useful!

A major recurring theme in all the articles in the book is that of what they call union discovery, of standardized markup so that a researcher can search across EAD platforms and discover records regardless of the institution that houses them. It took me a while to figure out that they were not necessarily talking about Web search, but rather something like OAI-PMH, which facilitates resource discovery by harvesting exposed metadata from a variety of institutions so a user can do one search and obtain results from a variety of different institutions.
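
For my own reference, this is roughly what the harvesting side looks like, sketched in Python. The repository addresses are made up, and the sketch only builds the request URLs rather than fetching anything; the `ListRecords` verb and the `oai_dc` metadata prefix are the standard OAI-PMH ones, but everything else is an assumption on my part.

```python
from urllib.parse import urlencode

# Hypothetical repository base URLs, for illustration only.
REPOSITORIES = {
    "Archive A": "https://archive-a.example.org/oai",
    "Archive B": "https://archive-b.example.org/oai",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set exposed by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def harvest_plan(repositories):
    """Return the request a harvester would send to each repository."""
    return {name: list_records_url(url) for name, url in repositories.items()}


for name, url in harvest_plan(REPOSITORIES).items():
    print(name, url)
```

The harvester collects the exposed metadata from every repository into one index ahead of time, so the user's single search runs against that index rather than against each institution in turn.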

I went on to read the 2002 article about OAI and EAD to learn more about the two in relation to one another. These authors found that integrating EAD records into OAI would require the loss of a significant amount of detail in the EAD record, or the creation of multiple OAI records from one EAD record. They determine that even with its inherent limitations, such a compromise would be useful as an alternative access point. It certainly seems to me that it would be more so than MARC records of archival collections, at any rate. However, since Yahoo! and Google are no longer searching through the OAI-PMH protocols, would anyone find these records anyway? A librarian I spoke with yesterday (I am at TLA) said that you could instead convert your OAI aggregation into XML and then Google could find it. I wonder if this is anything other than a stopgap? Will OAI fall by the wayside anyway? And I don't know much about EAD and Google--finding aids and search engines were not covered in this book, much to my chagrin, so I will look for more of these for next week. I also have a couple more EAD/Web resources I will check out in part or in whole to see if they have additional information.
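
To see what "multiple OAI records from one EAD record" might mean in practice, I tried a small sketch in Python. The sample finding aid is invented and heavily simplified, and the mapping (unittitle to title, unitdate to date, origination to creator, parent collection to relation) is my own guess at a plausible crosswalk, not the one the authors propose.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified finding aid using a few EAD 2002 element names.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1900-1950</unitdate>
      <origination>Example Family</origination>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle><unitdate>1900-1920</unitdate></did>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle><unitdate>1930-1950</unitdate></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def did_to_dc(did, parent_title=None):
    """Map one <did> block to a flat, Dublin Core-like dictionary."""
    record = {
        "title": did.findtext("unittitle"),
        "date": did.findtext("unitdate"),
        "creator": did.findtext("origination"),
    }
    if parent_title:
        # The hierarchy survives only as a pointer back to the collection.
        record["relation"] = parent_title
    return {key: value for key, value in record.items() if value}


def ead_to_dc_records(ead_xml):
    """Split one finding aid into a collection record plus one per series."""
    archdesc = ET.fromstring(ead_xml.strip()).find("archdesc")
    collection = did_to_dc(archdesc.find("did"))
    records = [collection]
    for series in archdesc.findall("dsc/c01"):
        records.append(did_to_dc(series.find("did"), collection["title"]))
    return records


for record in ead_to_dc_records(SAMPLE_EAD):
    print(record)
```

Even in this toy version the loss is visible: the nesting, the series order, and anything inherited from the collection level (here the creator) drop out of the series records unless you copy them down deliberately.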

Friday, March 20, 2009

interactivity

Read/viewed:

Bearman D, Trant J. Interactivity comes of age: museums and the World Wide Web. Museum International. October 1999;51(4):20-24.

"The Wiki and the digital library," Jeremy Frumkin, OCLC Systems & Services. 2005, Volume 21 Issue 1.

Online Finding Aids: Are They Practical? CJ Hostetter - Journal of Archival Organization, 2004 Vol. 2, p 107.

Abdication or empowerment? User involvement in library, archives and records services. By: Robinson, Leith. Australian Library Journal, Feb 2007, Vol. 56 Issue 1, p30-35, 6p.

Encoded Archival Description on the Internet (Wendy Duff and Daniel Pitti)

I have been looking at EAD and interactivity for the last couple of weeks (EAD discussion pending)... Many of the articles take EAD as a matter of course, as the best available finding aid format, although everyone seems dubious about MARC. The Bearman/Trant piece is more specifically about museums, following along from my last post, and at this point is 10 years old. But in addition to discussion of cool new (at the time) interactive features in museums--although many of them that I looked at have not progressed much beyond the initial (and probably NSF-grant funded, at this time?) efforts, which is disheartening--they also mention the opportunity for restructuring that Web representation provides. This does not ring true at the LBJ library, where the webmasters are a separate entity from the archives department, rather than integrated into it--I think they underestimate the intimidation factor for professionals who do not have the requisite skill sets. I think the dearth of good online finding aids is very often the result of this kind of disjuncture.

Frumkin's piece on digital libraries and wikis is also pretty dated, mostly about Ask a Librarian (and other virtual reference) and additions to content management systems. For my purposes the discussion of user-annotated finding aids is more interesting, although he glides over what I think is the fundamental issue with wikis in this context, that we are NOT talking about a Wikipedia-level worldwide community of users, but rather a few people, some of whom may be highly expert but the rest not. Would individual commentary, reviewed by archivists, not be more appropriate in the context? But how will researchers feel to be, god forbid, corrected, if necessary, by archivists? And will they jealously guard their research? At LBJ we have a surprising number of researchers who check No, they do not want to be contacted by other researchers interested in their topic. I think there is some question as to whether people will want to contribute, as useful as the product of these contributions would be.

In her report Robinson is dubious about user involvement in libraries and archives, on the grounds that it will worsen the digital divide, threaten staff, isolate patrons, and create security problems for digital documents (this is an eminently solvable problem!). Like many of the digital library readings I did for the Management of Digital Libraries class, she comes down on services as the way for libraries to make their presence felt and appreciated in the digital environment.

The Hostetter 2004 article, "Online Finding Aids: Are They Practical?" is one of the most on-point that I have come across in readings for this project, and it raises a number of interesting issues. She administered a survey to archivists in 20 institutions about finding aids and discusses the results. It is very helpful in illuminating how archives use online finding aids in the real world, and what they see as the opportunities and challenges therein. She found that EAD is the markup of choice (no surprise) and that the biggest challenges to improvement of finding aids are time, money and staff (also no surprise). She references the Tibbo/Meho article I read earlier on the difficulty of finding EAD finding aids with search engines, information she hints that archivists are unaware of. She does discuss lots of potential benefits to online finding aids--like increased donations--which are interesting, but the archivists she interviewed still seem to me to be inordinately attached to the notion that finding aids are more useful than digital collections, when I have seen lots of evidence to the contrary. Perhaps because archivists are thinking in terms of PhDs and other serious researchers?

I think the most surprising and controversial idea Robinson encounters is that access may not be an unmitigated good. Archivists said things like increased use increases demands on archivists' time (isn't that what they're there for?) and the Web leads neophyte researchers to believe that they will find good stuff in the archives relatively easily. This is the first time I have seen this issue laid bare. Archivists, knock it off! None of the archivists interviewed mention user studies; everything is discussed from an institutional effectiveness perspective, which seems to me to give us a good idea of what's wrong with archives.

For next week:
EAD
including Smith I. Preparing Locally Encoded Electronic Finding Aid Inventories for Union Environments: A Publishing Model for Encoded Archival Description. Information Technology & Libraries [serial online]. June 2008;27(2):26-30.

Saturday, March 7, 2009

curation: museums and archives

read/viewed:

Westbrooks, Elaine L.. "African-American documentary resources on the World Wide Web: a survey and analysis." Archival Issues 24.2 (1999): 145-73.

Karp C. Digital Heritage in Digital Museums. Museum International [serial online]. May 2004;56(1/2):45-51.

Anani N. Sustainable engagement in digital heritage– The challenges of learning environments for heritage institutions. Museum International [serial online]. May 2005;57(1/2):142-143

Wechsler H, Ledbetter E. The Nazi-Era Provenance Internet Portal. Museum International [serial online]. December 2004;56(4):53-62.

Bowen J. The virtual museum. Museum International [serial online]. January 2000;52(1):4-7.

The Westbrooks piece (1999) is interesting in part because she lambastes DLs for not sufficiently utilizing archival principles, and archivists for their insufficient understanding of the Web environment. Because of the lack of concern for archival principles like authenticity and reliability (e.g., DLs that do not include information about the original object, or the scanning environment, or the provenance) Westbrooks argues that most of the African-American heritage DLs she analyzed are inappropriate for use at an academic or scholarly level. She says that they are useful for access--for people who could not otherwise visit the collections--for K-12 and the like. But she rates them in many cases as not worthy of the money spent in their development! While I have certainly noted the K-12 bent of many of the collections, I had not related it to a lack of metadata. I tend to think of metadata in terms of IR, but of course there is much more to it than that. I did keep thinking of Cliff Lynch's effort to separate digital librarians from curators--Westbrooks seems to think that both librarians and archivists are not concerned enough with curation.

On that note I turned to the issues of Museum International that Ian Anderson had mentioned, to see what kinds of issues they were grappling with in digital collections. Karp examines the definition of "virtual museum" and how it differs from a physical museum--I am reminded of Michael Lesk talking about how the term "digital library" is doomed to go the way of "horseless carriage." Karp comes up with IP and ownership, and curation--long-term preservation and care of the digital objects. Migration, bitstream preservation etc, what I think of as archival concerns.

Anani talks about Web 2.0 features for museums, like pre- and post-visit forums, and how such features might foster the development of new groups of users. Bowen mentions the 24 Hour Museum (now Culture24), a collection of UK museum virtual collections and other related stuff in 1 place. I wonder how it works--I bet it's not autopopulating from the different Web sites (via an OAI-PMH-like interface) but that would be cool to find out. I will look around a bit. Finally, the piece about the Web portal for Nazi-era art turned out to not be particularly relevant, but it was interesting anyway. A DL with an entirely different purpose, where provenance was the most important facet of the objects.

Add to list (maybe) more Yakel:
Yakel, E., & Kim, J. (2003). Midwest State Archives on the Web: A Content and Impact Analysis. Archival Issues, 28(1), 47-62.

For sure, 1 last museum piece:

Bearman D, Trant J. Interactivity comes of age: museums and the World Wide Web. Museum International [serial online]. October 1999;51(4):20-24.

Saturday, February 28, 2009

implementation

Read/viewed:
Beth Yakel and Polly Reynolds, "The Next Generation Finding Aid..." Case study from New Skills for a Digital Era workshop, June 2006: http://rpm.lib.az.us/NewSkills/CaseStudies/8_Yakel_Reynolds.pdf
Helen Tibbo and Lokman Meho, "Finding Finding Aids on the World Wide Web," American Archivist Volume 64, Number 1 / Spring-Summer 2001
Creating the Next Generation of Archival Finding Aids http://www.dlib.org/dlib/may07/yakel/05yakel.html
Burt Altman and John Nemmers, "The Usability of On-line Archival Resources: The Polaris Finding Aid" American Archivist, vol 64 spring/summer 2001.

http://polarbears.si.umich.edu/
http://pepper.cpb.fsu.edu/collection/

(Note to Megan: I have not been posting as much as I would like--I've been moving across town--but I intend to pick up the pace from here on, and will keep working over spring break as well to make sure I cover everything I want to this semester.)

I decided this week to delve into what is actually being done in the world of online finding aids, and of course Michigan's Polar Bear Project is (was?) the benchmark for next generation finding aids. I know I've read one of the Yakel articles before, but I think it was early enough in my iSchool career that I didn't know what EAD was or understand the implementation challenges they faced. The reuse of metadata in this project from EAD, MARC, and a database of the soldiers seems particularly tricky--and useful. I intend to play with EAD more in the coming weeks, and do more research to see if, as it sounds, EAD can be extended as in this project, and will therefore continue to be used. It does seem like so many institutions have so much invested in it that it would really need to prove to be sorely lacking before being discarded altogether.

Meanwhile, the POLARIS project describes what would probably be a lot more feasible for a library like LBJ--because here the finding aids are online, but the collections are not. Several times in the Altman piece he mentions that users wanted and expected--that it was necessary to clarify for them--that the objects themselves be digitized, not just the finding aids. And one of the goals for the next stage (as of 2001) is the digitization of the collections, which has not yet been done--unsurprisingly given the size of the collection. For 2009 the Pepper collection is about where most archives are, in my experience. Unlike the Polar Bear Project, the online finding aid is still best used as a tool for research before visiting. It saves time onsite, but the outreach potential--Pepper uses archives-specific terms exclusively and provides few if any images--is very minimal.

Sunday, February 15, 2009

users and accessibility

Read:

Gustman, S., Soergel, D., Oard, D., Byrne, W., Picheny, M., Ramabhadran, B., and Greenberg, D. 2002. Supporting access to large digital oral history archives. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (Portland, Oregon, USA, July 14 - 18, 2002). JCDL '02. ACM, New York, NY, 18-27

Improving archives-library relations: User-centered ...By: Maher, William J.. Journal of Academic Librarianship, Jan1990, Vol. 15 Issue 6, p355, 9p;

"Primarily History: Historians and the Search for Primary Source Materials," Helen R. Tibbo

Lee, H. and Smeaton, A.F. (2002). "Designing the User Interface for the Físchlár Digital Video Library." J. Digital Info. 2(4),

I have been looking at articles about how researchers find their sources--primary and secondary--and whether the behaviors are specific to institutions. Tibbo's article was especially helpful: she found that (as of 2002) historians were using the Web to find the Web sites of repositories, and then contacting the location directly. She suggests that archivists are relying upon librarians to provide access to archival collections, and that they are failing to do so--I wonder if in fact archivists still believe this is someone else's job? Tibbo also found that researchers want finding aids online, but usually print them out--and a frequent request is for more materials to be digitized and made available.

The Gustman piece suggests both possibilities and problems with creating digital collections of archival material. The paper details the creation of a digital library containing 116,000 hours of digitized video interviews in 32 languages from 50,000 Holocaust survivors. Oral histories seem to be a good candidate for digitization: the LBJ library has a small number of them online and they are hugely popular. Perhaps because of the narrative format they are less reliant on context than most other records? They help provide context, and also make easy-to-digest primary sources for teachers' use, for example. Once they're online, the oral histories can be linked to other documents--perhaps even to a finding aid?--for context, and/or to illustrate points made in the interview. As Gustman et al. also point out, they provide an excellent opportunity for the development of Web 2.0 tools like user-created collections which can be visible to others. The Físchlár video browser suggests another possibility for Web 2.0 development, a recommender system--perhaps drawn from similar research done by other users of the collection.