Sentiment analysis in Arabic: A review of the literature

1. Introduction

Sentiment Analysis (also called Opinion Mining) refers to the use of Natural Language Processing and Machine Learning to identify and extract subjective information in a piece of writing. Sentiment Analysis is extremely useful as it enables us to gain an overview of the wider public opinions or attitudes towards certain topics, products or services.

Today, with the proliferation of reviews, ratings, recommendations and other forms of expression thanks to the emergence of the participative Web, Sentiment Analysis has become one of the essential research fields whose application is clearly visible in a variety of domains such as politics, commerce, tourism, education and health.

Given the importance of Sentiment Analysis, many research works have been devoted to this research area. However, most of these studies have focused on English and other Indo-European languages. Very few studies have, actually, addressed Sentiment Analysis in morphologically rich languages such as Arabic. Nevertheless, given the increasing number of Arabic internet users and the exponential growth of Arabic online content, Sentiment Analysis in this language has gained the attention of many researchers in the last decade.

The objective of this paper is, therefore, to review the major studies that have been conducted on Sentiment Analysis in Arabic. The rest of this paper is organized as follows. Section 2 gives background information about Sentiment Analysis. Section 3 outlines the linguistic characteristics of Standard Arabic. Section 4 describes the challenges of Sentiment Analysis in Arabic. Section 5 reviews the most important research studies carried out on Sentiment Analysis in Arabic. Section 6 discusses the findings, and Section 7 concludes.

2. Sentiment analysis: Background information

Sentiment Analysis is a research field that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [1].

In Sentiment Analysis [1], an opinion is defined by making reference to a quintuple (o; a; so; h; t) that consists of:

– Object ‘o’, which is the opinion target. It can be a product, service, topic, issue, person, organization, or an event.
– Aspect ‘a’, which is the targeted attribute of the object ‘o’.
– Sentiment orientation ‘so’, which indicates whether an opinion is positive, negative or neutral.
– Opinion holder ‘h’, which is the person or organization that expresses an opinion.
– Time ‘t’, which is the moment at which this opinion is expressed.
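
As an illustration, the quintuple can be represented as a small record type. The following is a minimal sketch only; the class and field names, and the example values, are our own assumptions and do not come from [1].

```python
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (o; a; so; h; t)."""
    target: str               # o: the object being evaluated
    aspect: str               # a: the targeted attribute of the object
    orientation: Orientation  # so: positive, negative or neutral
    holder: str               # h: the person or organization expressing the opinion
    time: str                 # t: when the opinion was expressed (free-form string here)


# Hypothetical example: a reviewer praising the battery of a phone.
example = Opinion("phone X", "battery", Orientation.POSITIVE, "reviewer_1", "2016-05-01")
```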

Sentiment Analysis is a highly challenging research area that involves different complex tasks. The most studied tasks are subjectivity classification, sentiment classification, lexicon creation, aspects extraction, aspect sentiment classification and opinion spam detection.

2.1. Levels of sentiment analysis

Sentiment Analysis can be investigated at three levels of granularity, namely document level, sentence level, and aspect level.

At the document level, the whole piece of writing is dealt with as one unit and is assigned to a positive, negative or neutral class. This level of analysis supposes that each document expresses an opinion on a single entity and has only one opinion holder. Therefore, it cannot be applied to documents that evaluate or compare multiple entities.

At the sentence level, Sentiment Analysis aims to identify whether the sentence holds an opinion or not, and to evaluate the sentiment orientation of subjective sentences. This level of granularity is more challenging because the sentiment orientation of words is highly context-dependent. Sentence-level sentiment classification also deals with comparative and sarcastic sentences.

Aspect-level analysis is finer-grained. Its purpose is to discover the quintuple (o; a; so; h; t) of the opinion. The two key tasks at this level are aspect extraction and aspect sentiment classification. In aspect extraction, the attributes are identified. In aspect sentiment classification, the sentiment orientation of each aspect is determined. An opinion that is positive about one attribute of an object is not necessarily positive about the object as a whole.

2.2. Approaches

Sentiment Analysis is usually carried out by using three typical approaches: supervised, unsupervised and hybrid.

The supervised approach is based on machine learning algorithms such as Support Vector Machine (SVM), Naïve Bayes (NB), Artificial Neural Networks (ANN) and K-Nearest Neighbour (KNN). This approach requires a large labelled dataset to train a classifier or a set of classifiers. The training process aims to build a classification model capable of predicting the sentiment of new texts. However, labelled data are often unavailable, or considerable effort is required to annotate the collected data.
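
To make the train-then-predict cycle concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. The training examples are invented toy data (transliterated tokens), and the function names are our own; real systems would rely on an established machine learning library and a much larger labelled corpus.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data; the tokens are illustrative only.
train = [
    (["jayyid", "mumtaz"], "pos"),
    (["raai", "jayyid"], "pos"),
    (["sayyi", "radi"], "neg"),
    (["sayyi", "mumill"], "neg"),
]
model = train_nb(train)
```

With this toy model, `predict_nb(model, ["jayyid"])` returns the positive label, since that token only occurs in positive training examples.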

By contrast, the unsupervised approach makes use of a lexicon of words and their corresponding sentiments. Each word in the lexicon has a score that represents its polarity, namely positive, negative or neutral. The lexicon can be created from existing dictionaries or from a corpus. Unlike the supervised approach, the unsupervised model does not require labelled datasets, but it needs a wide-coverage lexicon that covers as many sentiment words as possible.

Rather than using one of the above-mentioned approaches, some research studies have opted for a combination of both approaches. The new model, termed semi-supervised or weakly supervised approach, uses a great amount of unlabelled data and partially annotated data to build better classifiers. The system classifies unlabelled data using a supervised classifier trained with labelled data.

3. Arabic language

Given that Sentiment Analysis depends greatly on the morphology of the language being analysed, our objective in the present section is to give a brief description of Arabic and to detail the linguistic features that make this language one of the most challenging varieties for Sentiment Analysis researchers.

Arabic is one of the six official languages of the United Nations. It is the official language of 27 countries and is spoken by more than 422 million people in the Arab world. On the web, Arabic is ranked as the fourth most used language and has been the fastest growing during the last five years, with a growth rate of 6091.9% in the number of Internet users.

Arabic has three main varieties: Classical Arabic, which is the language of the Qur’an (Islam’s Holy Book); Modern Standard Arabic (MSA); and dialectal Arabic. MSA is the most eloquent variety of Arabic and is used in writing and in most formal speech. Dialectal or colloquial Arabic refers to all oral varieties spoken in daily communication. These vary from one Arab country to another and from one region of the same country to another.

3.1. Arabic orthography

Unlike languages written in the Latin script, Arabic is written from right to left and has no upper or lower case. Its alphabet includes 28 letters: 25 consonants and only 3 vowels. In addition to these letters, the Arabic script uses diacritical marks to represent short vowels. These are placed either above or below the letters to provide the correct pronunciation and clarify the meaning of the word. The majority of MSA texts are written without short vowels, because proficient speakers do not need diacritical marks in order to understand a given text. However, diacritical marks are often used in children’s books as well as books for Arabic learners. The absence of diacritical marks in the majority of texts creates a lexical ambiguity problem that challenges computational systems. For example, the undiacritized word شعر may mean (شِعْرٌ poetry), (شَعْرٌ hair) or (شَعَرَ to feel).
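
The ambiguity can be reproduced in a few lines of code. The sketch below removes diacritical marks by dropping Unicode combining marks (category Mn), which is one simple way to do it and is our own assumption rather than a method taken from the reviewed works. The three diacritized readings then collapse into the same surface form.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the Arabic short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# The three diacritized readings from the text, spelled with escapes to be explicit.
poetry = "\u0634\u0650\u0639\u0652\u0631\u064c"   # shin + kasra, ain + sukun, ra + dammatan
hair = "\u0634\u064e\u0639\u0652\u0631\u064c"     # shin + fatha, ain + sukun, ra + dammatan
to_feel = "\u0634\u064e\u0639\u064e\u0631\u064e"  # shin + fatha, ain + fatha, ra + fatha

# All three readings map to the same undiacritized string (shin, ain, ra).
bare_forms = {strip_diacritics(w) for w in (poetry, hair, to_feel)}
```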

3.2. Arabic morphology

Arabic has a very complex and rich morphology in which a single word may carry a great deal of information. As a space-delimited token, a word in Arabic reveals several morphological aspects: derivation, inflection, and agglutination.

3.2.1. Derivational morphology

Derivational morphology is the mechanism of creating a new word from an existing word, possibly with a different part of speech, e.g. in English the adjective “weekly” is derived from the noun “week”.

Like other Semitic languages, Arabic has a root-and-pattern morphology. All Arabic words are based on a “root”, which is a sequence of consonants that carries the base meaning of the word. Vowels and non-root consonants are added following specific patterns to create a variety of related words. For example, the three-letter sequence “ktb” is a root that means “write”. Putting it in the pattern “1a2a3a”, where the numbers correspond to the root letters, yields the verb (kataba, كَتَبَ), “write”. Adding the long vowel “a:” after the first letter gives a new pattern “1a:2a3a” and a new verb (ka:taba, كَاتَبَ), which means “correspond”. Table 1 lists some words derived from the root “ktb” together with their meanings.

Table 1. Words derived from the root “ktb”.

Word	Transliteration	Pattern	Meaning
كَتَبَ	kataba	1a2a3a	Write
كَاتَبَ	ka:taba	1a:2a3a	Correspond
مكتب	maktab	ma12a3	Desk
كُتُب	kutub	1u2u3	Books
كاتِب	ka:tib	1a:2i3	Writer
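
The root-and-pattern mechanism illustrated in Table 1 amounts to a simple substitution over transliterated forms. The following sketch, whose function name is our own, replaces each digit in a pattern with the corresponding root consonant. It works on the transliteration only and ignores the phonological adjustments that real derivation may require.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Replace the digits 1, 2, 3 in a pattern with the corresponding root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


# The five patterns of Table 1 applied to the root "ktb".
patterns = ["1a2a3a", "1a:2a3a", "ma12a3", "1u2u3", "1a:2i3"]
derived = [apply_pattern("ktb", p) for p in patterns]
```

Applied to the root “ktb”, the five patterns give back the transliterations listed in Table 1.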

3.2.2. Inflectional morphology

Inflectional morphology defines the variation of a word to describe the same meaning in different grammatical categories (e.g. in English: write, wrote, written). The set of these inflected word-forms is called a lexeme class. To represent the lexeme, a lemma, which is a particular form, is conventionally selected.

In Arabic, words inflect for seven categories: tense (past and present), person (1st, 2nd and 3rd), number (singular, dual and plural), gender (feminine and masculine), case (nominative, accusative and genitive), mood (indicative, imperative, subjunctive, jussive and energetic), and voice (active and passive). Table 2 presents the inflection of the verb “ktb” (write) depending on tense, person, number and gender.

Table 2. Inflection of the verb ‘ktb’ (write).

Person, number and gender	Past		Present
1st person Singular	Katabtu	كَتَبْتُ	Aktubu	أَكْتُبُ
1st person Plural	Katabna:	كَتَبْنَا	Naktubu	نَكْتُبُ
2nd person Masculine Singular	Katabta	كَتَبْتَ	Taktubu	تَكْتُبُ
2nd person Feminine Singular	Katabti	كَتَبْتِ	Taktubi:n	تَكْتُبِين
2nd person Dual	Katabtuma:	كَتَبْتُمَا	Taktuba:n	تَكْتُبَان
2nd person Masculine Plural	Katabtum	كَتَبْتُم	Taktubu:n	تَكْتُبُون
2nd person Feminine Plural	Katabtunna	كَتَبْتُنَّ	Taktubunna	تَكْتُبنَّ

3.2.3. Agglutinative morphology

Arabic is an agglutinative language, which means that a set of clitics (affixes) may be attached to a word. These clitics are divided into four classes (cf. Table 3) and apply to a word base in a strict order: $CONJ + PART + DET + BASE + PRON$

Table 3. Classes of Arabic clitics.

Class	Clitic	Transliteration	Meaning
Conjunction proclitics	+“و”	“w”	And
	+“ف”	“f”	Then
Particle proclitics	+“ل”	“l”	to/for
	+“ب”	“b”	by/with
	+“ك”	“k”	as/such
	+“س”	“s”	will/future
Definite article	+“ال”	“al”	The
Pronominal enclitics	“ه”+	“h”	His, its, him, it
	“ها”+	“ha:”	Her, its, it, him
	“هم”+	“hum”	Their, Them (Male, Plural more than 2)
	“هما”+	“huma:”	Their, Them (Double)
	“هن”+	“hunna”	Their, Them (Female, Plural more than 2)
	“ك”+	“k”	Your, you (single)
	“كم”+	“kum”	Your, you (Male, Plural more than 2)
	“كما”+	“kuma:”	Your, you (Double)
	“كن”+	“kunna”	Your, you (Female, Plural more than 2)
	“نا”+	“na:”	Our, us
	“ي”+	“y”	My, me

The English phrase “and with his work”, for example, corresponds to the Arabic form “وبعمله”. This word can be split into four parts (و + ب + عمل + ه): the conjunction “و” (“and”), the particle proclitic “ب” (“with”), the stem or word base “عمل” (“work”), and the possessive pronoun “ه” (“his”). Combining proclitics and enclitics in this way generates many different word forms from the same stem.
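
The strict clitic order can be turned into a naive segmenter. The sketch below is a greedy, dictionary-free illustration that uses only the clitics of Table 3; the function name and the minimum base length are our own assumptions. It strips at most one clitic per class and will over-segment any word whose base happens to begin or end with a clitic letter, which is why real analysers rely on lexicons and disambiguation.

```python
CONJ = ["و", "ف"]
PART = ["ل", "ب", "ك", "س"]
DET = ["ال"]
# Enclitics are tried longest first, so that "هما" is tested before "ه".
PRON = sorted(["ه", "ها", "هم", "هما", "هن", "ك", "كم", "كما", "كن", "نا", "ي"],
              key=len, reverse=True)
MIN_BASE = 2  # assumed minimum number of letters left after stripping a clitic


def segment(word: str) -> list[str]:
    """Greedy split of a token following the order CONJ + PART + DET + BASE + PRON."""
    prefix_parts = []
    for clitic_class in (CONJ, PART, DET):
        for clitic in clitic_class:
            if word.startswith(clitic) and len(word) - len(clitic) >= MIN_BASE:
                prefix_parts.append(clitic)
                word = word[len(clitic):]
                break  # at most one clitic per class
    suffix_parts = []
    for clitic in PRON:
        if word.endswith(clitic) and len(word) - len(clitic) >= MIN_BASE:
            suffix_parts.append(clitic)
            word = word[:-len(clitic)]
            break
    return prefix_parts + [word] + suffix_parts


parts = segment("وبعمله")  # the example word discussed above
```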

The complexity of the Arabic word structure is one of the main difficulties that researchers face when dealing with Arabic Sentiment Analysis. The next section examines some of the key challenges facing efforts to set up an accurate system for Arabic.

4. Sentiment analysis in Arabic: Challenges

4.1. Morphological analysis

Morphological analysis is an important phase in Sentiment Analysis. Its main purpose is to decompose words into morphemes and to associate each morpheme with morphological information such as stem, root, POS (Part Of Speech), and affix. As we saw in the previous section, Arabic is a morphologically complex language. This complexity requires the development of appropriate systems that are able to deal with tokenization, spell checking, stemming, lemmatization, pattern matching, and part-of-speech tagging. Many morphological analysers for Arabic have already been developed; some of these are freely available while the rest are commercial. Among those referred to in the literature are Xerox Arabic Morphological Analysis and Generation [2], and Buckwalter Arabic Morphological Analyser (BAMA) [3]. However, these systems suffer from significant limitations, especially in handling the ambiguity that can result from the omission of diacritics (short vowels), the free word order of Arabic sentences, or the presence of an elliptic personal pronoun.

4.2. Dialectal Arabic

For communication purposes, Arabic speakers usually use colloquial Arabic rather than MSA. There are around 30 major Arabic dialects that differ from MSA and from each other phonologically, morphologically, and lexically [4]. Furthermore, Arabic dialects have no standard orthographies and no language academies. Therefore, using tools and resources designed for MSA to process Arabic dialects yields considerably lower performance. Recently, researchers have started developing parsers for specific dialects, such as CALIMA [5] for the Egyptian dialect. However, these analysers still have low accuracy and are built only for particular dialects. Filling this gap in Arabic processing will improve information retrieval effectiveness, specifically for social media data.

4.3. Arabizi

Arabizi, Arabish, or Romanized Arabic refers to a system of writing Arabic using Latin characters. It is widely used to write MSA as well as Arabic dialects on social media platforms. So far, this form of writing has been addressed only by studies that aim at detecting Arabizi and converting it into Arabic script.

As far as Sentiment Analysis is concerned, the published works have not dealt with this problem, as texts are pre-processed to filter out all Latin letters. To the best of our knowledge, [6] is the only published work that has handled this task.

4.4. Named entity recognition

In Arabic, a large proportion of proper names are associated with positive adjectives. For example, the first name “سعيد” corresponds to the adjective “سعيد”, which means “happy”. In addition, Arabic proper nouns are not capitalized as they are in languages written in the Latin script, a fact which complicates the identification of named entities. For this reason, a Named Entity Recognition system is crucial for analysing Arabic texts and distinguishing between entity names and sentiment words.

5. Literature review

The objective of the present section is to review the most important research studies that have dealt with Sentiment Analysis in Arabic. Note, however, that the works that we shed light on below are not classified on the basis of the approaches or techniques that they opted for in their analyses but on the specific Sentiment Analysis task or tasks that every work is concerned with.

5.1. Subjectivity classification

Subjectivity Classification is the task of determining whether a text is subjective or objective. Such a classification of texts can be performed using machine learning as well as lexicon-based approaches.

In this respect, Abdul-Mageed et al. [7], [8] proposed a machine learning approach to perform subjectivity and sentiment classification at the sentence level. The first work dealt with a dataset of newswire documents extracted from PATB and manually annotated (1281 OBJ and 1574 SUBJ). In [8], the authors collected and annotated 11,918 sentences from different types of social media services. Classification is carried out in two stages. In the first stage, a distinction is drawn between subjective and objective text (i.e., Subjectivity Classification). In the second stage, a distinction is made between positive and negative sentiment (i.e., Sentiment Classification). In this work, Abdul-Mageed et al. used SVM as a learning algorithm together with language-specific and general features. Language-independent features include n-grams, domain, unique and polarity lexicon features. Arabic-specific features were added to investigate the impact of morphological information on performance. The findings indicated that using POS tagging and lemmas or lexemes to extract the base forms of words has a positive impact on subjectivity and sentiment classification.

Nabil et al. [9] presented a four-way sentiment classification that assigns texts to four classes: objective, subjective negative, subjective positive and subjective mixed. Their dataset has 10,006 Arabic Tweets manually annotated using the Amazon Mechanical Turk (AMT) service. They applied a wide range of machine learning algorithms (SVM, MNB, BNB, KNN, stochastic gradient descent) to the balanced and unbalanced datasets. However, using n-grams as the only features in multi-way classification did not give good results.

A lexicon-based approach was proposed in [10] to perform subjectivity classification of both MSA news articles and dialectal Arabic microblogs from Twitter. In order to build a large lexicon, the authors used two available lexicons: MPQA, which is an existing English subjectivity lexicon, and ArSenti, a manually created Arabic lexicon. The first was translated into Arabic using Machine Translation, and the second was automatically extended using a random graph walk method. All the words in Tweets and in the lexicon were tokenized and stemmed. Polarity stems, as indicated in the lexicon, were used as the input feature vector to the learning module.

5.2. Sentiment classification

Sentiment Classification aims at classifying subjective texts into two or more categories. Binary classification determines whether the text expresses a positive or a negative opinion. Multi-way classification labels texts according to the strength of the expressed sentiment, namely extremely negative, negative, neutral, positive or extremely positive.

Mountassir et al. [11] conducted a binary sentiment classification using three classifiers: NB, SVM and KNN. Two corpora were used: the first was developed by the authors and is composed of two domain-specific datasets (movies and sports). The second is OCA, a corpus of movie reviews developed by Rushdi-Saleh et al. [12]. Before the classification phase, the authors performed a pre-processing task by removing stop words, separating words from their clitics, eliminating terms used only once or twice in the dataset, and replacing words by their stems. The authors found that pre-processing, n-gram combination, and presence-based weighting improve classification performance.
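
Two of the steps mentioned above, rare-term elimination and presence-based weighting, can be sketched as follows. This is an illustration under our own assumptions, not the code of [11]: the function names are hypothetical, and clitic separation and stemming are deliberately left out because they require a morphological analyser.

```python
from collections import Counter


def preprocess(docs, stop_words, min_count=3):
    """Drop stop words, then drop terms that occur fewer than `min_count` times overall.

    min_count=3 removes the terms used only once or twice in the dataset.
    """
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    counts = Counter(t for doc in filtered for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in filtered]


def presence_vector(doc, vocabulary):
    """Presence-based weighting: 1 if the term occurs in the document, 0 otherwise."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocabulary]


# Toy English tokens, used only to keep the example readable.
docs = [["the", "film", "good", "good"], ["the", "film", "bad"], ["film", "plot"]]
cleaned = preprocess(docs, stop_words={"the"})
vector = presence_vector(["film", "good", "good"], ["film", "good", "bad"])
```

Presence-based weighting records only whether a term occurs, so the repeated token in the last call contributes a single 1 rather than a count of 2.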

In [19], two forms of Sentiment Classification were explored: polarity classification, which classifies reviews as having either a positive or a negative sentiment, and rating classification, whose objective is to predict the rating of the review on a scale of 1–5. Aly and Atiya [19] created the Large-scale Arabic Book Review (LABR) dataset, which contains 63,257 book reviews collected from www.goodreads.com. Reviews with a rating of 4 or 5 were labelled as positive. Negative reviews were those with a rating of 1 or 2, while reviews that were rated 3 were considered neutral. Since the number of positive reviews (42,832) was much larger than that of negative reviews (8224), they applied machine learning to both balanced and unbalanced data using SVM, MNB and BNB as algorithms and n-grams as features. For sentiment polarity classification, the evaluation on their dataset achieved quite good results (∼90% accuracy), but for rating classification there is much room for improvement (∼50% accuracy).
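
The labelling rule described above is straightforward to express in code. The sketch below, with a function name of our own choosing, maps a rating to a polarity label and computes the positive-to-negative ratio from the counts reported above.

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-5 rating to a polarity label: 4-5 positive, 1-2 negative, 3 neutral."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


labels = [rating_to_label(r) for r in (1, 2, 3, 4, 5)]

# Class imbalance reported for the dataset: positive vs. negative review counts.
imbalance_ratio = 42832 / 8224
```

The ratio comes to roughly five positive reviews for every negative one, which is why results on balanced and unbalanced versions of the data are reported separately.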

For their part, El-Beltagy et al. [25] proposed a lexicon-based approach to the sentiment classification of Egyptian Arabic texts. After building a lexicon of 4392 terms, the authors used two datasets (a Twitter dataset of 500 tweets and a Dostour dataset of 100 web comments) to evaluate two unsupervised classification algorithms. The first calculates one score for each document by adding up the weights of negative and positive terms. The second algorithm assigns a positive and a negative weight to each term in the lexicon and calculates positive and negative scores for each document. The authors achieved good results using the two algorithms on the Twitter dataset (83.8% accuracy).
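
The difference between the two scoring schemes can be sketched as follows. This is only an illustration of the general idea: the function names, the toy lexicon entries and weights, and the treatment of ties as neutral are our own assumptions and are not taken from [25].

```python
def single_score(tokens, lexicon):
    """Variant 1: one signed weight per term, summed into a single document score."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def dual_score(tokens, lexicon):
    """Variant 2: each term carries a (positive, negative) weight pair; totals are compared."""
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"


# Hypothetical transliterated entries with invented weights.
signed_lexicon = {"helw": 1.0, "wehesh": -1.0}
paired_lexicon = {"helw": (0.9, 0.1), "wehesh": (0.2, 0.8)}

doc = ["wehesh", "wehesh", "helw"]
result_single = single_score(doc, signed_lexicon)
result_dual = dual_score(doc, paired_lexicon)
```

In the second variant a term can contribute to both totals at once, which allows the lexicon to encode words that are mostly, but not exclusively, used with one polarity.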

In [48], El-Makky et al. combined Sentiment Orientation (SO) algorithms with a machine learning classifier to propose a hybrid approach. For each document in a Twitter dataset, they used the lexicon-based approach to compute Sentiment Orientation scores. These scores were integrated with different features such as unigrams, language-independent features, Tweet-specific features and stem polarity features so as to create an input feature vector for the SVM classifier. This combination of the machine learning approach and the lexicon-based approach led to slightly better results than either approach alone (accuracy 84%).

Determining the sentiment intensity of Arabic phrases was the goal of task 7 of SemEval 2016 [49]. The task aims to provide a score between 0 and 1 that indicates the Sentiment Intensity (SI) of a phrase. A score of 1 indicates the maximum of “positive strength” and a score of 0 indicates the maximum of “negative strength”. The SemEval organizers provided the participants with a development set of 200 terms commonly found in Arabic tweets and a set of 1166 terms for the evaluation period. Three systems were submitted from three teams: NileTMRG [50], iLab-Edinburgh [51] and LSIS [52].

Using a supervised approach, the NileTMRG team [50] collected 249 K tweets by querying Twitter using the test set terms. Then, they classified this collection using their own sentiment analyzer [42], developed with a Complement Naïve Bayes classifier [53] and trained on 11,242 Arabic tweets. To assign the Sentiment Intensity to each term, the normalized pointwise mutual information relative to the positive class was computed and re-scaled so that the values would range from 0 to 1.
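
One common definition of normalized pointwise mutual information divides the PMI by the negative logarithm of the joint probability, which gives a value between -1 and 1; a linear shift then maps it to the range from 0 to 1. The sketch below uses this definition and this re-scaling as assumptions, since the exact formulas used in [50] are not given here, and the counts in the example are invented.

```python
import math


def npmi(n_term_pos: int, n_term: int, n_pos: int, n_total: int) -> float:
    """Normalized PMI between a term and the positive class, from tweet counts.

    n_term_pos: tweets containing the term and classified as positive
    n_term:     tweets containing the term
    n_pos:      tweets classified as positive
    n_total:    all tweets
    """
    p_joint = n_term_pos / n_total
    if p_joint == 0.0:
        return -1.0  # conventional limit: the term never co-occurs with the class
    if p_joint == 1.0:
        return 1.0   # avoids a division by zero when every tweet matches
    p_term = n_term / n_total
    p_pos = n_pos / n_total
    pmi = math.log(p_joint / (p_term * p_pos))
    return pmi / -math.log(p_joint)


def intensity(n_term_pos: int, n_term: int, n_pos: int, n_total: int) -> float:
    """Shift NPMI from [-1, 1] to [0, 1] (assumed linear re-scaling)."""
    return (npmi(n_term_pos, n_term, n_pos, n_total) + 1) / 2


# Invented counts: a term that is statistically independent of the positive class.
independent = intensity(25, 50, 50, 100)
```

A term that is independent of the positive class lands at the midpoint of the scale, while a term that never appears in positive tweets lands at 0.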

The system proposed by the iLab-Edinburgh team [51] employs a hybrid approach of rule-based methods and supervised learning. The first phase of the system uses Linear Regression trained with the labMT1.0 Sentiment Lexicon [54], which is a publicly available list of 10k Arabic positive/negative words. Each entry of the lexicon is associated with a Sentiment Intensity score ranging between 1 and 9 (1 very negative, 9 very positive). After re-scaling the Sentiment Intensity scores to [0, 1], the Linear Regression model is used to calculate an initial Sentiment Intensity score. In the rule-based phase of the system, a set of hand-crafted rules and a combination of three publicly available sentiment lexica, namely ArabSenti [55], MPQA [56], and Dialect lexical [57], were used to adjust the initial SI scores.
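
Re-scaling a score from the 1 to 9 range to [0, 1] is typically done with min-max normalization. The sketch below assumes this linear mapping; the source does not state which re-scaling the team actually applied.

```python
def rescale(score: float, low: float = 1.0, high: float = 9.0) -> float:
    """Min-max re-scaling of a lexicon score from [low, high] to [0, 1]."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


# The two ends of the scale and its midpoint.
rescaled = [rescale(s) for s in (1.0, 5.0, 9.0)]
```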

The third team [52] proposed an unsupervised method. Their system computes the degree of dependence between a term and the positive class in sentiment lexica using pointwise mutual information. When the term is not found in the lexica, a web search engine is used to calculate its sentiment orientation based on its co-occurrence near positive or negative words.

Evaluation showed that the iLab-Edinburgh team presented the best-performing system and that the results obtained using supervised methods are distinctly higher than those obtained using unsupervised methods. Nevertheless, the results achieved on Arabic Twitter data were markedly lower than the results reached on a similar English Twitter data set [49].

5.3. Aspect based sentiment analysis

Compared to the above-mentioned tasks, Aspect Based Sentiment Analysis (ABSA) is the least studied in Sentiment Analysis in spite of its crucial importance. This challenging task aims to recognize entities (e.g., the brand of a mobile phone) and their aspects (e.g., ‘battery’, ‘screen’) and to estimate the sentiment expressed towards each aspect. As far as the Arabic language is concerned, only very few research papers have been published on ABSA.

One of the earliest research works that dealt with this task is [58]. In this work, Alhazmi and Salim introduced a supervised approach to extract the opinion target from Arabic Tweets. To build a training dataset, they manually tagged the opinion target in 500 collected Tweets. After pre-processing Tweets, each word was considered as a training vector defined by POS, named entities, English words and tweet hash tags features. Classification was carried out by specifying that a given word is either an opinion target or not. Experiments were undertaken using three classifiers: Naïve Bayes, Support Vector Machine and K-Nearest Neighbour. The best result was reached using the K-Nearest Neighbour classifier with an F-Measure of 91%.

Within the same task, Al-Smadi et al. [59] set up a benchmark dataset of Arabic reviews. Their Human Annotated Arabic Dataset (HAAD) contains 1513 reviews selected from LABR [19] and manually annotated. Following the SemEval2014 guidelines proposed in [60], annotators were asked, in the first phase, to identify aspect terms and provide the polarity for each aspect term. In the second phase, annotators recognized aspect categories and their polarities. To establish a benchmark, the authors conducted a baseline evaluation of four tasks: aspect term extraction, aspect term polarity, aspect category extraction, and aspect category polarity. The adopted approach uses a majority baseline that assigns the most frequent polarity in the training data to all aspect terms and categories.
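
A majority baseline of this kind can be sketched in a few lines. The function names and the toy labels below are our own; the sketch only illustrates the principle of predicting the most frequent training polarity for every item.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda _item: majority


def accuracy(predict, items, gold_labels):
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(predict(item) == gold for item, gold in zip(items, gold_labels))
    return correct / len(gold_labels)


# Toy labels, invented for the example.
predict = majority_baseline(["positive", "negative", "positive", "neutral"])
score = accuracy(predict,
                 ["term_a", "term_b", "term_c", "term_d"],
                 ["positive", "negative", "positive", "positive"])
```

Because such a predictor ignores its input entirely, its accuracy simply equals the share of the majority class in the test data, which makes it a useful lower bound for any trained system.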

Al-Smadi et al.’s work was improved by Obaidat et al. [61], who used a lexicon-based approach to deal with aspect category extraction and aspect category polarity. The proposed approach for aspect category polarity employed a polarity lexicon extracted from the training set. The lexicon assigns every word in the training set to the class in which it occurs most frequently. For words missing from the training set, the authors made use of machine translation and a POS tagger to get the polarity from SentiWordNet. The accuracy rate obtained with this approach (i.e. 71%) significantly outperforms the baseline results (42.6%). To deal with aspect category extraction, the authors created a category lexicon using seed words and Pointwise Mutual Information (PMI). The training words were assigned to the category with the most similar seed words. Each entry in the lexicon was weighted by its PMI value. The category of a test review was determined by summing the weights of its words and selecting the aspect category with the highest sum. This approach showed an improvement of 8.2% over the baseline results.
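
The final selection step, summing the weights of a review's words per category and keeping the category with the highest sum, can be sketched as follows. The lexicon entries, weights and category names are invented for the example, and the function name is our own.

```python
from collections import defaultdict


def predict_category(tokens, category_lexicon):
    """Sum the lexicon weights of a review's words per category and return the best one.

    category_lexicon maps a word to a (category, weight) pair, where the weight
    plays the role of the word's PMI value. Returns None if no word is in the lexicon.
    """
    totals = defaultdict(float)
    for tok in tokens:
        if tok in category_lexicon:
            category, weight = category_lexicon[tok]
            totals[category] += weight
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical lexicon with made-up weights.
lexicon = {
    "plot": ("story", 1.2),
    "characters": ("story", 0.8),
    "cover": ("design", 1.5),
}
category = predict_category(["the", "plot", "characters", "cover"], lexicon)
```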

5.4. Building resources

With the rising interest in Sentiment Analysis in Arabic, developing multi-genre corpora and large Arabic lexica for word-level sentiment evaluation has become of utmost importance. In this respect, several research studies have recently attempted to make large Arabic resources available to Sentiment Analysis researchers.

Abdul-Mageed and Diab [55], for instance, presented a large-scale multi-genre sentiment lexicon. This lexicon is made up of 224,564 entries covering MSA and multiple Arabic dialects. The authors collected and manually tagged two word lists from the Penn Arabic Treebank and Yahoo Maktoob. Lists were also developed automatically by translating three existing English lexica (SentiWordNet, YouTube Lexicon, and General Inquirer) with Google’s translation API. To expand the lexicon's coverage, they used a statistical method based on PMI to extract other polarized tokens from both Twitter and chat datasets. Despite the large size of the resulting resource, many of the entries are neither lemmatized nor diacritized, which limits the usability of the lexicon.

In their attempt to build Arabic multi-domain resources for Sentiment Analysis, ElSahar and El-Beltagy [37] proposed a semi-supervised approach to generate multi-domain lexica out of four multi-domain review datasets. This method makes use of the feature selection capabilities of SVM to select the most effective unigram and bigram features. Although the created lexicon covers a variety of domains, it was extracted only from reviews, which limits its usefulness for social media Sentiment Analysis.

Badaro et al. [66] set up ArSenL, a lexicon for Arabic sentiments using two approaches based on English SentiWordNet (ESWN). The first method links each term in ArabicWordNet, on the one hand, with ESWN to get sentiment scores, and on the other hand with SAMA (Standard Arabic Morphological Analyser) to find the correct lemma forms. In the second approach, English glosses associated with SAMA’s entry were explored automatically to find the most similar synset in ESWN. The union of the two resulting lexica has a good coverage but is limited to MSA.

In [62], AWATIF, a multi-genre corpus of MSA, was collected from three different sources: Penn Arabic Treebank (PATB), Wikipedia Talk Pages, and web forums. One part of the corpus was labelled through crowdsourcing on Amazon Mechanical Turk (AMT). Annotation of the second part was carried out by students who received specific guidelines. The last part was also labelled by students but with simple guidelines. Abdul-Mageed and Diab observed that inter-annotator agreement improves with specific guidelines. However, this corpus is not publicly available and has not been used in any other work.

In addition to these corpora, an Arabic Twitter corpus was collected in [57] using the Twitter API and cleaned in a pre-processing phase. Two native speakers of Arabic manually annotated 8868 Tweets using four labels: neutral, mixed, positive and negative. Morphological, syntactic, and semantic features were also added to the annotation.

5.5. Opinion holder extraction

An opinion holder is the person or the organization that expresses an opinion. In most social media platforms, opinion holders are usually the authors of reviews, so they do not need to be extracted. In some cases, as in news articles, the person or organization that holds an opinion is often explicitly stated in the actual text and needs to be extracted. Nevertheless, extracting opinion holders has drawn little attention from Arabic opinion mining researchers. To the best of our knowledge, there are only two research works that have been concerned with this issue, namely [73], [74].

In [73], the authors explored the problem in Arabic news articles using three different approaches. The first approach is semi-supervised and uses a set of handcrafted patterns. POS tags and key phrases were used to define 43 patterns, which were then run through a pattern matcher to identify opinion sources in the test data. The second is a supervised machine learning approach that uses the Conditional Random Field (CRF) classifier. To train this classifier, the authors used features such as surrounding words, POS, Named Entity and sentiment words. The third approach combines the two previous approaches by using the patterns as CRF features. Their experiments showed that the CRF outperforms the patterns in terms of recall and precision. Moreover, adding patterns as a CRF feature has an insignificant effect compared to other features such as the Named Entity feature.

5.6. Opinion spam detection

Opinion spam refers to false reviews written to promote a low quality product using positive opinions or to damage the reputation of a given product with negative opinions. The objective of this task is to detect automatically spam reviews using techniques that usually depend on three types of features: content of the review, review meta-data, and real-life knowledge about the product.

With the growing popularity of online reviews or comments and their major impact on business decision making, this illegal activity has become more developed in terms of both quality and quantity. The number of spam opinions has actually increased and the reviews are getting more and more sophisticated. The latter fact, actually, explains the scarcity of studies devoted to this issue in Arabic.

A preliminary study concerned with opinion spam detection was conducted by Wahsheh et al. in [75]. The study was based on 3090 Arabic opinions collected from the Yahoo-Maktoob social network. The authors employed ACLWSDS, which is an Arabic spam URL detection system developed in [76]. Opinions containing a URL were classified either as high-level spam if the URL was considered as spam by ACLWSDS, or as low-level spam if the URL was considered as non-spam. In the absence of URLs and some specific metrics, the opinion was categorized as non-spam. Evaluating this method with the SVM algorithm achieved favourable results. However, using only the URL filtering technique may not be efficient.
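
The URL-based rule described above can be sketched as a three-way decision. In this illustration the spam URL detector is passed in as a function, here called `is_spam_url`, which is a hypothetical stand-in for a system such as ACLWSDS; the regular expression and the example blocklist are also our own assumptions, and the additional metrics mentioned above are not modelled.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def classify_opinion(text: str, is_spam_url) -> str:
    """Label an opinion from its URLs only.

    is_spam_url: callable taking a URL string and returning True for spam URLs
    (a hypothetical stand-in for an external spam URL detection system).
    """
    urls = URL_RE.findall(text)
    if not urls:
        return "non-spam"
    if any(is_spam_url(url) for url in urls):
        return "high-level spam"
    return "low-level spam"


# Invented blocklist used as a toy URL detector.
blocklist = {"http://spam.example/win"}
toy_detector = lambda url: url in blocklist

label_high = classify_opinion("great deal http://spam.example/win", toy_detector)
label_low = classify_opinion("see http://news.example/article", toy_detector)
label_none = classify_opinion("a plain opinion without links", toy_detector)
```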

In [77], Abu Hammad and El-Halees proposed a supervised approach for detecting opinion spam in Arabic. This approach combines techniques from data and text mining. The authors collected 2848 Arabic reviews from online accommodation booking websites, namely booking.com, tripadvisor.com, and agoda.ae. In addition, they integrated their data into a coherent form and labelled each review with a spam or a non-spam label. For classification purposes, they used NB, K-NN and SVM classifiers with 10-fold cross-validation. Although their system was limited to hotel reviews, it was able to achieve high accuracy by combining data and text classification.
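
For readers unfamiliar with k-fold cross-validation, the sketch below shows how a dataset of 2848 items could be divided into 10 folds, each serving once as the test set while the remaining folds are used for training. The function name is our own, and the split is contiguous for simplicity; in practice the data would normally be shuffled (and often stratified by class) first.

```python
def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


folds = list(k_fold_indices(2848, k=10))
```

The reported performance is then the average of the k scores obtained on the held-out folds.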

6. Discussion

Although research on Sentiment Analysis in Arabic started in 2008, studies on this issue have increased rapidly in the last three years. This is mainly due to the exponential growth of Arabic text content on the web. The works reviewed in the section above have revealed that researchers have dealt with different Sentiment Analysis tasks: Subjectivity Classification, Sentiment Classification, Aspect Based Sentiment Analysis, building resources, extracting opinion holders, and detecting spam opinions.

Subjectivity Classification has been carried out in a very limited number of works. It is treated as an initial phase of sentiment classification to filter objective elements and not as a standalone issue.

Sentiment Classification is the most widely studied topic. As indicated in Table 4, Sentiment Classification at the document level has been the subject of many research studies. The limited number of studies at the sentence level may be due to the fact that it requires the detection of sentence boundaries, which is another Arabic NLP problem. For both levels, the supervised learning approach outperformed unsupervised and semi-supervised methods. Note also that some supervised classifiers such as SVM and NB have repeatedly been applied. The effectiveness of these classifiers is probably the reason behind this choice. The same can be said about the features used: word stems and n-grams are frequently chosen as features. Regarding datasets, researchers each built and used their own dataset. Thus, no common dataset was used for benchmarking results and evaluating experiments. Early research analysed texts collected from the web and focused on MSA. By contrast, researchers have recently concentrated on social media texts and dealt with Arabic dialects as well as MSA.