Thursday 3 July 2008

Simplicity Over Elegancy?

I have had some useful feedback via the JISC Repositories mailing lists - some of it direct, and some picked up from reading threads.

Using a Standard Metadata Schema
Based on this research, using an existing standard metadata schema looks more helpful than trying to develop one myself. After reviewing the validation documents, I will still need to work out which schema offers the elements most useful to the project - for example, Dublin Core vs LOM.

An issue of harvesting
One issue that has come out of the comments and research relates to harvesting the metadata and presenting it in search results. This has a bearing on which elements are used in a schema (some may be redundant), because the choice of elements influences how searchable the documents are (insofar as users can search for the 'value' in the documents).

There is also an issue of elegancy: in terms of library standards, it is good practice to create or use elements that relate directly to the data you want to record. The alternative is to use more 'semantic' keywords in a text string (for example, within a single element), which lets the metadata inputter enter 'tags' that offer an insight into the 'valuable' content held in the document.

For example:
<--subject--> Geography;<--/subject-->
<--description-->Validation Report;<--/description-->
<--keywords-->staff development issue; handbook improvement issue;<--/keywords-->
<--relation-->http:/hiverepository/3347.doc<--/relation--> (a related document = the original handbook)

This alternative is not necessarily a poor one, as the harvester (a program) can search through the schema elements for the words entered in a search string. The harvester then uses the metadata schema elements only as mark-up to help present the data fields to the user, much as database fields would be in, for example, a PHP/Apache online database search. This would be useful when linking documents together.

The user then has the flexibility to search semantically - for example, "Geography validation report and handbook issues" when looking for a document that outlines a Geography validation process that ran into an issue with its handbook. The search would return the document and, hopefully, show a link to the related original handbook.
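
To make this concrete, here is a minimal sketch in Python of how a harvester-style search over free-text metadata might behave. Everything in it is an assumption made for illustration: the two sample records (the first modelled on the example record earlier in this post), the choice of searchable fields, the stop-word list and the crude plural stripping. It is not how any particular harvester actually works.

```python
import re

# Hypothetical metadata records; the first mirrors the example record in this post.
RECORDS = [
    {
        "subject": "Geography;",
        "description": "Validation Report;",
        "keywords": "staff development issue; handbook improvement issue;",
        "relation": "http:/hiverepository/3347.doc",
    },
    {
        "subject": "History;",
        "description": "Course Handbook;",
        "keywords": "assessment guidance;",
        "relation": "",
    },
]

# Assumed list of words to ignore in queries and metadata.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "with"}


def tokens(text):
    """Lower-case the text, split it into words, drop stop words,
    and crudely strip a trailing 's' so 'issues' matches 'issue'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        w[:-1] if len(w) > 3 and w.endswith("s") else w
        for w in words
        if w not in STOP_WORDS
    }


def search(query, records, fields=("subject", "description", "keywords")):
    """Return (score, record) pairs, best match first.
    The score is the number of query words found in the searched fields."""
    wanted = tokens(query)
    hits = []
    for record in records:
        found = tokens(" ".join(record.get(f, "") for f in fields))
        score = len(wanted & found)
        if score:
            hits.append((score, record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


for score, record in search("Geography validation report and handbook issues", RECORDS):
    link = record["relation"] or "(no related document)"
    print(score, record["description"], link)
```

The elements are only used here to decide which text gets searched and how a hit is displayed; the matching itself is plain word overlap, which is what lets a loosely worded query still find the Geography report and surface its related handbook.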

The only issue with this approach is that a 'tag' cloud needs to be available - that is, a list of the keywords already in use, so that users know what kinds of 'tags' exist (aiding searches).
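
A tag list of this kind could be built directly from the metadata. The Python sketch below assumes that tags sit in a keywords field and are separated by semicolons, as in the example record above; the sample records and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical records; only the keywords field matters here.
SAMPLE_RECORDS = [
    {"keywords": "staff development issue; handbook improvement issue;"},
    {"keywords": "handbook improvement issue; assessment guidance;"},
]


def tag_cloud(records, field="keywords"):
    """Count how often each semicolon-separated tag appears across all records,
    returning (tag, count) pairs with the most frequent tags first."""
    counts = Counter()
    for record in records:
        for tag in record.get(field, "").split(";"):
            tag = tag.strip().lower()
            if tag:  # skip the empty string left by a trailing semicolon
                counts[tag] += 1
    return counts.most_common()


for tag, count in tag_cloud(SAMPLE_RECORDS):
    print(count, tag)
```

The counts could then drive the display (more frequent tags shown larger), and the same list would help metadata inputters reuse existing tags rather than inventing near-duplicates.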
