An ontology based fully automatic document classification system using an existing semi-automatic system

WIJEWICKREMA, Chaaminda Manjula and GAMAGE, Ruwan (2013) An ontology based fully automatic document classification system using an existing semi-automatic system. Paper presented at: IFLA WLIC 2013 - Singapore - Future Libraries: Infinite Possibilities in Session 112 - Classification and Indexing.

Bookmark or cite this item: https://library.ifla.org/id/eprint/159

PDF (356kB)

Language: English (Original)

Available under licence Creative Commons Attribution.

Bookmark or cite this item: https://library.ifla.org/id/eprint/159/1/112-wijewickrema-en.pdf

PDF (770kB)

Language: Chinese (Translation)

Available under licence Creative Commons Attribution.

Bookmark or cite this item: https://library.ifla.org/id/eprint/159/7/112-wijewickrema-zh.pdf

PDF (672kB)

Language: Spanish (Translation)

Available under licence Creative Commons Attribution.

Bookmark or cite this item: https://library.ifla.org/id/eprint/159/13/112-wijewickrema-es.pdf

Abstract

An ontology based fully automatic document classification system using an existing semi-automatic system

Automatic classification of documents has become an important research area because digital content is growing exponentially and manual or semi-automatic organization is not effective. On the one hand, manual and semi-automatic classification are painstaking and labor-intensive. On the other hand, misclassifications caused by the vagueness of documents and classification schemes are inevitable with these two methods. The current study therefore seeks to shed light on these issues. It proposes an automated system that classifies a given text document completely by minimizing vocabulary ambiguities. One of our previous studies developed a semi-automatic system for document classification; here we propose to extend it further to obtain a fully automatic document classification system.

利用现有的半自动分类系统开发基于本体的全自动文档分类系统

由于数字内容的指数增长和手动组织、半自动组织的非高效性，文档自动分类已经成为一个重要的研究领域。一方面，手动和半自动分类需耗费大量精力并且是劳动密集型，另一方面，这两种方法中由于文档的模糊性和分类表所带来的误分类不可避免。因此，本研究试图解决这些问题。本研究提出一个自动化系统，这个自动化系统完全可以通过最小化词汇歧义为一个给定的文本文档进行分类。我们前期已经开发了一个半自动文档分类系统，这里对其进一步优化以获得一个全自动的文档分类系统。

Una ontología basada en un sistema de clasificación de documentos totalmente automático utilizando un sistema semiautomático existente

La clasificación automática de documentos se ha convertido en un área de investigación muy importante, debido al crecimiento exponencial de los contenidos digitales y a que la organización manual o semiautomática no es especialmente eficaz. Por una parte, la clasificación manual y semiautomática es muy minuciosa y laboriosa. Por otro lado, son inevitables en estos dos métodos los errores de clasificación debidos a las imprecisiones de los documentos y de los esquemas de clasificación. Por tanto, el presente estudio trata de arrojar luz sobre estas cuestiones. Esta investigación propone un sistema automatizado que pueda realizar una clasificación completa de un documento de texto minimizando las ambigüedades del vocabulario. Uno de nuestros estudios anteriores ha desarrollado un sistema semiautomático para la clasificación de documentos y aquí proponemos extenderlo algo más, para obtener un sistema de clasificación de documentos totalmente automático.

Palabras clave: Clasificación automática, clasificación textual, Ontología, función de frecuencia de término -tf idf

Item Type:

Conference or Workshop Item (Paper)

Conference details:

IFLA WLIC 2013 - Singapore - Future Libraries: Infinite Possibilities

Session 112 - Subject access: Infinite possibilities - Classification and Indexing

Monday 19 August 2013 16:00 - 18:00 | Room: Exhibition Hall 404-405 | SI

Track 5: Ideas, innovations, anticipating the new

Divisions:

Division 3 Library Services > Subject Analysis and Access

Authors:

Name	Affiliation	Country
WIJEWICKREMA, Chaaminda Manjula	Main Library, Sabaragamuwa University of Sri Lanka, Belihuloya	Sri Lanka
GAMAGE, Ruwan	National Institute of Library and Information Sciences, University of Colombo, Colombo	Sri Lanka

Translators:

Chinese version
Translators	Affiliation	Country
ZHANG, Shinan	National Science Library, Chinese Academy of Sciences	China
Spanish version
Translators	Affiliation	Country
JIMÉNEZ HUERTA, Pascual	Biblioteca Nacional de España	Spain

Uncontrolled Keywords:

Automatic classification, Text classification, Ontology, tf-idf weight function

Date Deposited:

04 Jul 2013 09:13

Last Modified:

18 Feb 2015 16:38

URI:

https://library.ifla.org/id/eprint/159
