Lessons learned from Automatic Indexing Projects regarding to Persian Language Specifications

GHORBANI, Mahboubeh and TORKASHVAND, Fattaneh (2018) Lessons learned from Automatic Indexing Projects regarding to Persian Language Specifications. Paper presented at: IFLA WLIC 2018 – Kuala Lumpur, Malaysia – Transform Libraries, Transform Societies in Session 115 - Subject Analysis and Access.

Bookmark or cite this item: https://library.ifla.org/id/eprint/2215

PDF (201kB)

Language: English (Original)

Available under licence Creative Commons Attribution.

Bookmark or cite this item: https://library.ifla.org/id/eprint/2215/1/115-ghorbani-en.pdf

Abstract

Reading and writing Persian pose difficulties that stem from specific features of the language. This paper examines the experiences, lessons, and outcomes of automatic indexing projects for Persian-language documents in order to propose effective solutions for improving their indexing and retrieval. The most important problems that Persian language and script raise for automatic indexing include selecting appropriate keywords; building a vocabulary; semantic, verb, and word-sense ambiguities within sentences; spaces and pseudo-spaces in Persian script; isolated and cursive writing; the morphology of Persian; and typographical and spelling errors. The solutions proposed to facilitate automatic indexing of Persian texts are: removing stop words; pre-processing characters and script; identifying word boundaries; equalizing different spellings; automatic stemming; weighting and scoring words; detecting phrasal verbs and compound phrases; spellchecking through morphological or even syntactic spellcheckers and the design of a correction and suggestion system; and developing an infrastructure database for Persian language and script usage.
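
Several of the pre-processing steps named in the abstract (character pre-processing, handling pseudo-spaces, identifying word boundaries, removing stop words) can be illustrated with a short sketch. The code below is not taken from the paper: the function names, the character map, and the tiny stop-word list are assumptions chosen for illustration, and a real indexer would need far more complete resources.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, used in Persian script as a "pseudo-space"

# Assumption: only two common Arabic/Persian look-alike letters are unified here.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf (keheh)
})

# Assumption: a toy stop-word list, far from complete.
STOP_WORDS = {"و", "در", "به", "از"}

DIACRITICS = re.compile(r"[\u064b-\u0652]")  # fathatan through sukun
TOKEN = re.compile(r"[\w\u200c]+")           # keep ZWNJ inside a token


def normalize(text: str) -> str:
    """Unify look-alike letters, drop diacritics, collapse repeated ZWNJ."""
    text = text.translate(CHAR_MAP)
    text = DIACRITICS.sub("", text)
    return re.sub(ZWNJ + "+", ZWNJ, text)


def index_terms(text: str) -> list[str]:
    """Return candidate index terms: normalized tokens minus stop words."""
    terms = []
    for token in TOKEN.findall(normalize(text)):
        token = token.strip(ZWNJ)  # a ZWNJ at a word edge carries no meaning
        if token and token not in STOP_WORDS:
            terms.append(token)
    return terms
```

Treating the pseudo-space as part of the token keeps a form such as a prefixed verb together as one word, while ordinary spaces and punctuation still mark word boundaries. Stemming, weighting, and spelling correction, which the abstract also lists, are outside the scope of this sketch.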

Item Type:

Conference or Workshop Item (Paper)

Conference details:

IFLA WLIC 2018 – Kuala Lumpur, Malaysia – Transform Libraries, Transform Societies

Session 115 - Transforming Libraries via Automatic Indexing - Subject Analysis and Access

Sunday 26 August 2018

Divisions:

Division 3 Library Services > Subject Analysis and Access

Authors:

Name	Affiliation	Country
GHORBANI, Mahboubeh	National Library and Archives of Iran	Iran, Islamic Republic of
TORKASHVAND, Fattaneh	National Library and Archives of Iran	Iran, Islamic Republic of

Uncontrolled Keywords:

Persian language; automated indexing; storage & retrieval

Date Deposited:

19 Jul 2018 13:45

Last Modified:

19 Jul 2018 13:45

URI:

https://library.ifla.org/id/eprint/2215
