Date Available
8-5-2014
Year of Publication
2014
Degree Name
Master of Science (MS)
Document Type
Master's Thesis
College
Engineering
Department/School/Program
Computer Science
First Advisor
Dr. Jerzy W. Jaromczyk
Abstract
Information retrieval aims to extract, from a large collection of data, the subset of information that is relevant to a user’s needs. In this study, we are interested in information retrieval in Arabic-language text documents. We focus on the Arabic language, on its morphological features that potentially affect the implementation and performance of an information retrieval system, and on its unique characters, which are absent from the Latin alphabet and require specialized approaches. Specifically, we report on the design, implementation, and evaluation of search functionality using the Vector Space Model with several weighting schemes. Our implementation uses the ISRI stemming algorithm as the underlying stemming technique and a general Arabic stop word list for building inverted indices for Arabic-language documents. We evaluate our implementation on a corpus of selected technical papers published in Arabic-language journals. We use the Open Journal Systems (OJS) from the Public Knowledge Project as a repository for the corpus used in the evaluation. We evaluate the performance of our search implementation using a classic recall/precision approach and compare it to one of the default multilingual search functions supported in OJS. Our experimental analysis suggests that stemming is an effective technique for searching Arabic-language texts and that it improves the quality of the information retrieval system.
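The pipeline the abstract describes (stop word removal, stemming, an inverted index, Vector Space Model ranking, and recall/precision evaluation) can be illustrated with a minimal sketch. It is not the thesis implementation. The three sample documents, the one-word stop list, and the `toy_stem` function are assumptions made for illustration; `toy_stem` only strips the definite article and is a crude stand-in for the ISRI stemmer. The sketch also shows a single tf-idf weighting with cosine similarity, whereas the thesis evaluates several weighting schemes.

```python
import math
from collections import Counter, defaultdict


def toy_stem(token):
    # Hypothetical stand-in for a real stemmer such as ISRI:
    # it only strips the definite article "ال" from longer words.
    return token[2:] if token.startswith("ال") and len(token) > 4 else token


def identity(token):
    # No stemming at all, used as a baseline for comparison.
    return token


def tokenize(text, stop_words, stem):
    # Whitespace tokenization, stop word removal, then stemming.
    return [stem(t) for t in text.split() if t not in stop_words]


def build_index(docs, stop_words, stem):
    # Inverted index: term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text, stop_words, stem)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs, stop_words, stem):
    # Rank documents by cosine similarity of tf-idf vectors.
    idf = {t: math.log(n_docs / len(p)) for t, p in index.items()}
    doc_norm = defaultdict(float)
    for t, postings in index.items():
        for d, tf in postings.items():
            doc_norm[d] += (tf * idf[t]) ** 2
    scores = defaultdict(float)
    q_norm = 0.0
    for t, q_tf in Counter(tokenize(query, stop_words, stem)).items():
        if t not in index:
            continue
        w_q = q_tf * idf[t]
        q_norm += w_q ** 2
        for d, tf in index[t].items():
            scores[d] += w_q * tf * idf[t]
    ranked = sorted(
        ((s / math.sqrt(doc_norm[d] * q_norm), d)
         for d, s in scores.items() if s > 0),
        reverse=True,
    )
    return [d for _, d in ranked]


def precision_recall(retrieved, relevant):
    # Classic set-based precision and recall.
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data (assumed, not taken from the thesis corpus).
docs = {
    "d1": "استرجاع المعلومات في النصوص العربية",
    "d2": "معالجة النصوص العربية",
    "d3": "شبكات الحاسوب",
}
stop_words = {"في"}
relevant = {"d1"}
query = "معلومات"

for stem in (identity, toy_stem):
    index = build_index(docs, stop_words, stem)
    hits = search(query, index, len(docs), stop_words, stem)
    print(stem.__name__, hits, precision_recall(hits, relevant))
```

In this toy setting the query word appears in the corpus only with the definite article attached, so the unstemmed index returns nothing while the stemmed index retrieves `d1`. This is the kind of effect a recall/precision comparison between stemmed and unstemmed search is designed to measure.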
Recommended Citation
Albujasim, Zainab Majeed, "Search Queries in an Information Retrieval System for Arabic-Language Texts" (2014). Theses and Dissertations--Computer Science. 23.
https://uknowledge.uky.edu/cs_etds/23