Koraljka Golub's Personal Home Page

Projects and Interests

Major areas of interest

Knowledge organization
Information retrieval
Digital libraries

Major projects

2017: Linnaeus University as a Unique iSchool (Role: Principal co-investigator)
2017: Creative attractive information landscapes for cultural events (Role: Principal co-investigator)
2016–2017: Digital Humanities Initiative (Role: Co-leader)
2010–2009: EASTER (Evaluating Automated Subject Tools for Enhancing Retrieval), UKOLN (Role: Principal investigator; Co-applicants: University of Glamorgan, City University, Dagobert Soergel as consultant, and non-funding supporting partners; Amount: 280,000 GBP)
2008–2007: EnTag (Enhanced Tagging for Discovery), UKOLN (Role: Principal investigator; Co-applicants: University of Glamorgan, Intute, Science and Technology Facilities Council and non-funding supporting partners; Amount: 110,000 GBP)
2008–2007: TRSS (Terminology Registry Scoping Study), UKOLN (Role: Principal investigator; Co-applicants: University of Glamorgan and non-funding supporting partners; Amount: 45,000 GBP)
2008–2007: TILE (Towards Implementation of Library 2.0 and the E-framework), Sero Consulting Ltd. (Role: Project advisor)
2007–2003: European Delos Network of Excellence on Digital Libraries, Knowledge Extraction and Semantic Interoperability Cluster (Role: Research team member)
2006–2004: Alvis, Superpeer Semantic Search Engine, a European STREP project (Role: Research team member)
2005–2003: Intelligent Components of a Distributed Digital Library, Department of Electrical and Information Technology, Faculty of Engineering, Lund University, Sweden (Role: Research team member)
2005: The Humanities Subject Gateway, Royal School of Library and Information Science, Copenhagen, Denmark (Role: Project advisor)
2003–2002: Information and Knowledge Organization in the Electronic Environment, a Croatian government-funded project, Department of Information Sciences, Faculty of Philosophy, University of Zagreb, Croatia (Role: Research team member)
2002–2001: Curriculum Development in the Field of Information and Knowledge Organization, a TEMPUS project, Department of Information Sciences, Faculty of Philosophy, University of Zagreb, Croatia (Role: Research team member)

Projects

I have produced research consistently since 1999. While I have worked on a variety of interdisciplinary topics, my focus has been on subject information organization in three major areas: automated subject indexing and classification; social tagging; and methodology for evaluating automated subject indexing/classification and social tagging in the context of end-user retrieval.

My doctorate from Lund University, Sweden, completed in 2007, was on the topic of automated subject classification. I explored the benefits of using a combined classification scheme and thesaurus (the Engineering Information thesaurus and classification scheme) in the automated subject classification of textual Web pages, and the benefits of applying it for hierarchical browsing of Web pages. While automated methods for information organization have been around for several decades, the exponential growth of the World Wide Web has brought them to the forefront of research in different communities, in which several approaches can be identified: machine learning (algorithms that allow computers to improve their performance by learning from pre-existing data); document clustering (algorithms for unsupervised document organization and automated topic extraction); and string matching (algorithms that match given strings within larger texts). The string-matching approach was tested using the Engineering Information thesaurus and classification scheme, which contains pre-selected, pre-defined authorized terms, each corresponding to only one concept. The methodology included a log analysis, a comparison against a gold standard, and an end-user retrieval study. The data collections used comprised over 80,000 Web pages and 35,000 bibliographic records. The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Furthermore, if the same controlled vocabulary had an appropriate hierarchical structure, it would serve as a good browsing structure for the collection of automatically classified documents.
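To illustrate the general idea of string-matching classification with a controlled vocabulary, here is a minimal sketch in Python. It assumes a toy vocabulary in which each term (preferred or entry term) points to exactly one class; the terms and class codes are hypothetical placeholders, not taken from the Engineering Information thesaurus, and the simple hit-count scoring is an illustrative simplification rather than the weighting used in the doctoral work.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: each term maps to exactly one class code.
# Terms and codes are placeholders, NOT real Engineering Information data.
VOCABULARY = {
    "neural networks": "C1",
    "backpropagation": "C1",
    "signal processing": "C2",
    "digital filters": "C2",
}

def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of a term in the text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def classify(text, vocabulary, min_score=1):
    """Score each class by summing matches of its terms in the text.

    Returns (class_code, score) pairs at or above min_score, best first.
    Note: this naive version would double-count overlapping multi-word
    terms (e.g. "neural network" inside "artificial neural network").
    """
    scores = Counter()
    for term, class_code in vocabulary.items():
        hits = count_term(term, text)
        if hits:
            scores[class_code] += hits
    return [(code, score) for code, score in scores.most_common() if score >= min_score]

page = ("Digital filters are central to signal processing. "
        "Neural networks can learn digital filters.")
print(classify(page, VOCABULARY))
```

In this sketch, adding more entry terms per class widens the range of wordings that can trigger a class, which is the property the results above point to.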

From December 2007 to July 2013, I worked for UKOLN, University of Bath, UK, where I had the role of a principal investigator on three projects related to knowledge organization systems:

1. EnTag (Enhanced Tagging for Discovery) explored the potential of applying a controlled vocabulary to enhance social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Over 11,000 Intute metadata records in politics were used. Twenty-eight politics students were each given four tasks, in which a total of 60 resources were tagged in two different settings: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the Dewey Decimal Classification (DDC) with mappings from the Library of Congress Subject Headings (LCSH). The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: they help users come up with ideas for which tags to use, make it easier to find a focus for the tagging, ensure consistency, and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to depend on their quality, both in terms of conceptual relevance to the user and appropriateness of the terminology.

2. TRSS (Terminology Registry Scoping Study) dealt with terminology registries (TRs) as crucial parts of shared infrastructure services for resource discovery. The scope and options for a TR were derived from a review of the published literature and of existing TRs, and from a survey of 28 experts. Three main options for a TR are proposed: 1) the TR provides metadata for each vocabulary and links to the vocabulary provider; 2) the TR provides metadata on any available terminology services and links to those services; 3) the TR provides access to vocabulary content. Specific use cases for a TR, set in a lifecycle framework of controlled vocabularies, are given. Building on these, a core set of metadata elements is put forward. TRs can make their content available both for convenient human inspection and for machine-to-machine access. Underlying standards for the representation and identification of concepts, terms and vocabularies, as well as protocols, profiles and APIs, are also addressed.

3. EASTER (Evaluating Automated Subject Tools for Enhancing Retrieval) aimed to develop a methodology for evaluating different automated subject assignment tools. The rationale was that existing mainstream research on the evaluation of automated subject metadata has serious issues, such as adopting existing metadata as the gold standard. However, a gold standard should be thoroughly designed and built, rather than simply taken in the form found in an existing database such as a repository, or built ad hoc without detailed planning and quality control. Furthermore, whenever possible, more than just a gold standard should be used, in order to address different evaluation aspects and explore a variety of perspectives and contexts. A comprehensive evaluation would include the following: a) controlled gold standard creation, aligned with the purpose of the digital collection; b) a range of evaluation aspects and measures based on the gold standard; c) evaluation in the context of retrieval by end users; and d) evaluation in the context of integrating the tool for automated subject assignment into an existing document processing workflow.
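As one example of a measure that can be computed against a gold standard (point b above), the following Python sketch computes micro-averaged precision, recall and F1 for automatically assigned subjects. These are standard information retrieval measures chosen here for illustration; they are not necessarily the exact set of measures used in EASTER, and the document IDs and subject codes in the usage example are hypothetical.

```python
def evaluate(predicted, gold):
    """Micro-averaged precision, recall and F1 of automatically assigned
    subjects against a gold standard.

    Both arguments map document IDs to sets of subject codes. Only documents
    present in the gold standard are scored; documents without any automatic
    assignment count as having an empty set of subjects.
    """
    tp = fp = fn = 0
    for doc_id, gold_subjects in gold.items():
        assigned = predicted.get(doc_id, set())
        tp += len(assigned & gold_subjects)   # correctly assigned subjects
        fp += len(assigned - gold_subjects)   # assigned but not in gold standard
        fn += len(gold_subjects - assigned)   # in gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: two documents, subject codes are placeholders.
gold = {"d1": {"A", "B"}, "d2": {"C"}}
predicted = {"d1": {"A", "X"}, "d2": {"C"}}
print(evaluate(predicted, gold))
```

Micro-averaging pools all assignments before dividing, so documents with many subjects weigh more; a macro-averaged variant (computing the measures per document and then averaging) is a common alternative when each document should count equally.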

Contact Information

E-mail:

Skype:

LinkedIn: profile page

ResearchGate: profile page

Google Scholar: profile page

ORCID iD: 0000-0003-4169-4777