Wednesday 23 July 2008

Post-Meeting Thoughts

Meeting with - Paul Johnson (Subject & Learning Support Librarian) / Ian Haydock (Information Systems Manager)

Overview of the meeting: a general open meeting about metadata and the concepts of the project. There was an element of scoping the project, with some input drawn from their own experience of working with metadata.

They agreed that the following approaches may prove useful:
  • Organising a database structure for the documents in HIVE (see image) - representing the collection (e.g. folder and sub-folder structure that reflects faculties and depts.)
  • Using the (simple) Dublin Core schema for top-level searching of documents
  • For 'value' and textual searches - developing a tag cloud for each document (entered in the 'Subject and Keywords' metadata element - see image below)
  • Using the 'Relation' metadata element to link 'process' documents (a rough sketch of how these ideas might fit together is given below)
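To make this a little more concrete, below is a minimal sketch of what a single HIVE record might look like if these four ideas were combined. The collection path, document details and related-document link are invented placeholders for illustration; only the (simple) Dublin Core element names are standard, and how they would map onto HIVE is still an open question.

# A minimal sketch (assumed, not agreed) of one HIVE record combining the ideas above.
# The collection path, document details and relation link are invented examples.
example_record = {
    "collection": "Faculty of Sciences/Geography",        # folder/sub-folder structure
    "dc": {
        "Title": "Geography Validation Report 2008",
        "Creator": "Validation Panel",
        "Subject": "geography; staff development issue; handbook improvement issue",
        "Description": "Report of the validation event for the Geography award.",
        "Date": "2008-06-12",
        "Relation": "http://hiverepository/3346.doc",      # link to the related 'process' document
    },
}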
Searchable Tag Clouds (see image left)
Tagging has a few issues, not least that tags can be person- and context-specific. It was suggested that it may be necessary to ask validation authors to enter keywords on the documents (aiding data entry for future documents). This solution is useful - especially if you ask users to select keywords from a library (similar to a category list). This addresses the problem of tagging synonyms - where two users may enter two different words to describe the same attribute - for example, Blue vs. Azure. One additional concern is that the recommended 'keyword' tags must be developed as extensively as possible at the start of the project - as retrospective tagging will be resource-intensive.
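As a rough illustration of the keyword-library idea, here is a small sketch (in Python) that normalises author-entered keywords against a controlled list, folding synonyms onto a single preferred term. The library contents and synonym pairs are invented placeholders - in practice the list would be built up from the validation documents themselves.

# Sketch only (assumed approach): check author-entered keywords against a
# controlled keyword library so that synonyms collapse onto one preferred term.
KEYWORD_LIBRARY = {                      # preferred terms the tag cloud would expose
    "staff development issue",
    "handbook improvement issue",
    "assessment issue",
}

SYNONYMS = {                             # invented examples of the Blue vs. Azure problem
    "staffing issue": "staff development issue",
    "handbook issue": "handbook improvement issue",
}

def normalise_keywords(entered):
    """Return (accepted, rejected) keywords, mapping known synonyms first."""
    accepted, rejected = [], []
    for word in (w.strip().lower() for w in entered):
        word = SYNONYMS.get(word, word)
        (accepted if word in KEYWORD_LIBRARY else rejected).append(word)
    return accepted, rejected

accepted, rejected = normalise_keywords(["Handbook issue", "timetabling"])
print(accepted)    # ['handbook improvement issue']
print(rejected)    # ['timetabling'] - the author would be prompted to pick from the library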

Findability
It was discussed that some work may be necessary to develop a search form that allows users to interrogate the HIVE database - for example, offering a series of 'keywords' to search against (the known tag cloud), or allowing them to 'drill down' using the faculty/school structure.
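Below is a small sketch of those two routes in: searching against a keyword chosen from the known tag cloud, and drilling down by faculty/school. The record layout, field names and sample data are illustrative assumptions only (shaped like the sketch earlier in this entry), not a design decision.

# Sketch only: two ways into the HIVE collection, assuming each record carries a
# faculty/school path and a semicolon-delimited keyword string (invented data).
records = [
    {"collection": "Faculty of Sciences/Geography",
     "title": "Geography Validation Report 2008",
     "keywords": "staff development issue; handbook improvement issue"},
    {"collection": "Faculty of Arts/History",
     "title": "History Validation Report 2007",
     "keywords": "assessment issue"},
]

def search_by_keyword(keyword):
    """Route 1: search against a keyword chosen from the known tag cloud."""
    return [r for r in records
            if keyword in [k.strip() for k in r["keywords"].split(";")]]

def drill_down(path_prefix):
    """Route 2: drill down the faculty/school structure by path prefix."""
    return [r for r in records if r["collection"].startswith(path_prefix)]

print([r["title"] for r in search_by_keyword("assessment issue")])   # ['History Validation Report 2007']
print([r["title"] for r in drill_down("Faculty of Sciences")])       # ['Geography Validation Report 2008']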

Additional comment
It was mentioned that documents could have an expiry date - so it may be worth investigating whether validation documents have a 'use-by' date in respect of them reflecting current validation policies and procedures.

What next?
  • Speak to current or experienced members of staff who have been involved in the validation process - asking them about their experiences of it.
  • Examine historic validation documents to build up a picture of categories and associated keywords that could be used to reflect the value of the documents (for example - common issues that validation panels encounter) - this has been started and is on-going

A question of drill-down

[Note: Back from leave]

I am going to have a meeting today with some library staff about the issues of metadata. One interesting point I will raise is whether metadata is relevant for the purposes of this project. This may be an odd thing to say - but in terms of what the project is trying to do, it is worth noting that we are trying to 'record' the value of documents and not just catalogue them. This means that the data will be largely 'unknown' and 'fuzzy', as it will be case-specific. Therefore, designing or using a strict metadata schema could prove difficult. However, we could still use some basic elements of a metadata schema to record key document attributes - with half a mind to capturing 'flexible' data in more string/text-based elements (such as 'description'). I already alluded to this as an approach in my previous entry - using tags (from a slowly built-up tag library), delimited by semicolons.

I will see what the library staff think about this.

In terms of drill-down - you could search on key metadata fields first and then, through the use of tags, run search query strings (text) such as contains "x" AND "y" NOT "z" to find specific documents. Again - as stated in the previous post - a simpler but not an elegant solution.
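Here is a very small sketch of what that second step could look like - splitting the semicolon-delimited tags out of a free-text element and applying a contains/AND/NOT check. The function names and the query form are my own assumptions rather than a settled design.

# Sketch only (assumed): evaluate a simple AND/NOT query over semicolon-delimited
# tags held in a free-text metadata element such as 'description'.
def split_tags(text):
    """Turn 'a; b; c' into a set of lowercase tags."""
    return {t.strip().lower() for t in text.split(";") if t.strip()}

def matches(tag_text, must_have, must_not_have=()):
    """Contains all of must_have AND none of must_not_have."""
    tags = split_tags(tag_text)
    return all(t in tags for t in must_have) and not any(t in tags for t in must_not_have)

keywords = "staff development issue; handbook improvement issue"
print(matches(keywords, ["handbook improvement issue"]))                          # True
print(matches(keywords, ["handbook improvement issue"], ["assessment issue"]))    # True
print(matches(keywords, ["assessment issue"]))                                    # False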

Thursday 3 July 2008

Simplicity Over Elegance?

Have had some useful feedback via the JISC Repositories mailing lists - some direct feedback and some information gleaned from reading threads.

Using Standard Metadata Schema
It seems (based on research) that using an existing standard metadata schema will prove more helpful than trying to develop anything myself. It will still be a task, after reviewing the validation documents, to work out which schema offers the elements most useful to the project - for example Dublin Core vs. LOM.

An issue of harvesting
One issue that has come out of comments and research relates to harvesting and presenting this data (from searches). This has a bearing on which elements are used in a schema (some elements may prove redundant), as this will influence how searchable the documents will be (insofar as users can search for the 'value' in the documents).

There is an issue related to elegance, in that (in terms of library standards) it is good practice to create or use elements that relate to the data you want to record. The alternative is to use more 'semantic' keywords in a text string (for example, in a free-text element such as 'description') that allows the metadata inputter to enter 'tags' offering an insight into the 'valuable' content held in the document.

For example:
<subject>Geography;</subject>
<description>Validation Report;</description>
<keywords>staff development issue; handbook improvement issue;</keywords>
<relation>http://hiverepository/3347.doc</relation> (a related document = the original handbook)

This is not necessarily a poor alternative, as the harvester (program) can search through the schema elements for the words entered into a search (text) string. The harvester itself then only uses the metadata schema elements as mark-up to assist in presenting the data fields to the user - much as database fields would be presented in, for example, a PHP/Apache online database search. This would be useful when linking documents together.

The user then has the flexibility to search semantically - for example, "Geography validation report and handbook issues" - if they were looking for a document that outlined a Geography validation process that encountered an issue with its handbook. The search would produce the document and 'hopefully' show a link to the related original handbook.

The only issue with this approach is that a 'tag' cloud needs to be available - that is a list of keywords that exist, which will help the user know what types of 'tags' exist (aiding searches).
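To illustrate, here is a rough sketch of a harvester treating the schema elements purely as mark-up: it scans each element's text for the words in the user's search string and presents the matching fields, with the relation element then available to display as a link. The record contents and the word-matching rule are assumptions for illustration only.

# Sketch only: scan each metadata element's text for words from a free-text query
# and present the matching fields, much like database fields. The element names
# follow the example above; the matching rule is an assumption.
import xml.etree.ElementTree as ET

RECORD = """<record>
  <subject>Geography;</subject>
  <description>Validation Report;</description>
  <keywords>staff development issue; handbook improvement issue;</keywords>
  <relation>http://hiverepository/3347.doc</relation>
</record>"""

def harvest(record_xml, query):
    """Return {element: text} for elements whose text contains any query word."""
    root = ET.fromstring(record_xml)
    words = query.lower().split()
    hits = {}
    for element in root:
        text = (element.text or "").lower()
        if any(w in text for w in words):
            hits[element.tag] = element.text
    return hits

print(harvest(RECORD, "Geography validation report and handbook issues"))
# -> subject, description and keywords match; the relation element can then be
#    shown alongside as a link to the related original handbook.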

Getting real: getting to grips with the problem domain

I have spent the last couple of weeks researching the following key areas:
  • HIVE - digital repository
  • Metadata schema for digital repositories
  • XML mark-up language for Metadata
All of these technologies allow documents to be held in a repository alongside key data that informs the user of their value.

Contacting external agencies for existing real-world case studies
I contacted, and had a reply back from, Prof. Balbir Barn (a leading researcher into metadata and business processes). I asked him whether he was aware of any schema that would assist in meeting the challenges of this project. Although he could not give me any specific information, he provided links to the following projects:
These links are interesting but somewhat 'over-my-head' at the moment - but I will keep referring back to them as the project progresses. For note, both Chris Gray and Mark Stiles (both Staffs Uni) are referenced in the COVARM project.

Getting to grips with the problem domain

I have requested some 'real' validation documents from Chris Gray. I will be reviewing these documents and assessing the 'value' that is contained within them - which will hopefully offer some guidance on the nature of the metadata that needs to be represented to the user.

Following this, there could be a case for using these findings and asking academics and participants in the validation process to give their input into what metadata should be recorded.

For note, I have joined the JISC repositories mail list.