Named Entity Recognition and Context Aware Sentiment Analysis in Multiple Domains

Status: Awarded

Business environment: Tinval sistemes, S.L.

Academic environment: Universitat Pompeu Fabra

Municipality: Terrassa

Fields: PE6 Computer Science and Informatics - PE7 Systems and Communication Engineering

Required degree:

Project description

This doctoral project investigates and develops sentiment analysis for social media content written by users of Brandchats’ clients. Current industrial sentiment analysis systems provide a single value (usually positive or negative) per document. They take into account neither the different aspects on which people express opinions nor how opinion expressions vary across domains and contexts. This project aims to cover both gaps:
– to specify the aspects on which opinions are expressed (including the detection of Named Entities in documents)
– to create a system that takes into account the variability of opinion expressions across domains
1. Domain and languages:
The project will focus on the Financial and Pharmaceutical domains, as both are commercially interesting and Brandchats has a large amount of data available for experiments. The documents will mainly be tweets from Twitter, owing to its volume and popularity. We will concentrate on Spanish and English, because annotated Spanish data for the Financial Domain is already available and both languages are well resourced.
2. FIRST YEAR planning:
– Data Pre-processing and Normalization
– Named Entity Recognition (NER)
– Opinion Entity and Aspect Recognition
– Data Annotation for Evaluation
In the first year, the general objective is to build a Named Entity Recognizer (NER), i.e., a system that can detect the entities about which opinions are expressed, starting from the Financial Domain, while preparing data for experiments and evaluations. The NER system is expected to detect Named Entities and then categorize them into classes (e.g. Brands, Locations, Persons), using techniques such as Pattern Recognition, Machine Learning or a combination of both. Another task will be to find entities with opinions (e.g. Financial Product, Online Service) and the aspects of these entities, by first using a general Sentiment Lexicon to identify opinion words and then extracting target entities through linguistic patterns such as syntactic parses. This technique will play an important role in the later construction of a Contextualized Sentiment Lexicon.
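The lexicon-first idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the lexicon entries, the candidate target list and the function name are hypothetical, and token distance is used as a crude stand-in for the syntactic parses the project plans to use.

```python
import re

# Hypothetical general sentiment lexicon: word -> polarity (+1 / -1).
GENERAL_LEXICON = {"great": 1, "good": 1, "terrible": -1, "slow": -1}

# Hypothetical candidate targets (entities or aspects) in the Financial Domain.
CANDIDATE_TARGETS = {"app", "fees", "mortgage", "service"}


def extract_opinion_targets(text):
    """Pair each opinion word with the closest candidate target token.

    Token distance stands in for a syntactic relation here; a real system
    would follow dependency links from a parser instead.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target_positions = [i for i, tok in enumerate(tokens) if tok in CANDIDATE_TARGETS]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in GENERAL_LEXICON and target_positions:
            nearest = min(target_positions, key=lambda j: abs(j - i))
            pairs.append((tokens[nearest], tok, GENERAL_LEXICON[tok]))
    return pairs


print(extract_opinion_targets("The app is great but the fees are terrible"))
# [('app', 'great', 1), ('fees', 'terrible', -1)]
```

Each tuple holds the target, the opinion word and the polarity taken from the general lexicon.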
Other tasks for this year are the pre-processing and normalization of texts and the preparation of annotated data in different domains and languages for system evaluation. These tasks are important because Twitter, although a rich, high-volume source of opinions for both business intelligence and research, is usually noisy owing to its informal language.
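As a rough illustration of what such normalization might involve, the sketch below applies a few common regex-based steps to a tweet. The specific steps, the placeholder tokens `<url>` and `<user>` and the function name are assumptions made for this example, not a pipeline defined by the project.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character


def normalize_tweet(text):
    """Apply simple, lossy normalization to a single tweet."""
    text = URL_RE.sub("<url>", text)        # replace links with a placeholder
    text = MENTION_RE.sub("<user>", text)   # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)      # keep the hashtag word, drop '#'
    text = REPEAT_RE.sub(r"\1\1", text)     # "sooooo" -> "soo"
    text = text.lower()
    return " ".join(text.split())           # collapse runs of whitespace


print(normalize_tweet("@BankX the new app is sooooo SLOW!!! https://t.co/abc #fail"))
# <user> the new app is soo slow!! <url> fail
```

Character repetitions are shortened to two rather than one so that emphasis ("soo", "!!") remains detectable as a possible sentiment cue.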
3. SECOND YEAR planning:
– Domain Adaptation for Opinion Entity Recognition
– Domain Adaptation for General Sentiment Lexicon
– Detection and Categorization of Opinion Holder
For the second year, the main goal is to adapt existing systems to new domains, because the polarity of some sentiment words depends on the domain or aspect. The Opinion Entity Recognizer will therefore be expected to work in multiple domains (e.g. moving from the Financial Domain to the Pharmaceutical Domain), and the General Sentiment Lexicon will be enhanced with new Domain/Aspect Dependent Words (e.g. expanding the existing vocabulary for the Financial and Pharmaceutical Domains).
To construct a Domain/Aspect Dependent Sentiment Lexicon, we plan to use bootstrapping methods, which allow the dictionary to be expanded starting from a small amount of data. Another approach is to build a basic Machine Learning model as a General Sentiment Classifier and adapt it into a Domain Specific Classifier through methods such as ‘co-training’.
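One possible form of bootstrapping is sketched below: a small seed lexicon is grown by propagating polarity across conjunctions, where "A and B" suggests the same polarity and "A but B" the opposite. The choice of conjunction patterns, the function name and the example sentences are assumptions made for illustration; the project does not specify which bootstrapping method it will use. A real system would also restrict candidates to adjectives, for example with a part-of-speech tagger.

```python
import re

# Matches "<word> and <word>" or "<word> but <word>" in lower-cased text.
CONJ_RE = re.compile(r"\b([a-z]+) (and|but) ([a-z]+)\b")


def bootstrap_lexicon(seed, corpus, max_rounds=5):
    """Grow a polarity lexicon from seed words using conjunction patterns."""
    lexicon = dict(seed)
    for _ in range(max_rounds):
        added = {}
        for sentence in corpus:
            for left, conj, right in CONJ_RE.findall(sentence.lower()):
                sign = 1 if conj == "and" else -1
                if left in lexicon and right not in lexicon:
                    added.setdefault(right, sign * lexicon[left])
                elif right in lexicon and left not in lexicon:
                    added.setdefault(left, sign * lexicon[right])
        if not added:  # nothing new was learned in this round
            break
        lexicon.update(added)
    return lexicon


corpus = [
    "The service is good and reliable",
    "The loan was reliable but expensive",
    "expensive and opaque fees",
]
print(bootstrap_lexicon({"good": 1}, corpus))
# {'good': 1, 'reliable': 1, 'expensive': -1, 'opaque': -1}
```

Starting from a single seed word, each round labels only words linked to an already known word, so the lexicon expands one step further into the corpus per round.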
Within Sentiment Analysis, the Detection and Classification of Opinion Holders is an interesting yet less explored subtopic. Depending on the kind of data we obtain (structured data from websites or data generated in Social Media), and where there is no strong indication of the Opinion Holder’s profession, we will need to explore ways of finding useful clues in the user’s timeline (e.g. by following the URLs in their posts and reading the meta description of the linked pages).
4. THIRD YEAR planning:
– Multi Domain Aspect Based Lexicon Integration
– Context Aware Sentiment Analysis
– General Polarity Summarization
The main task in the third year will be to integrate a multi-domain, aspect-based sentiment lexicon, which will be used for Context Aware Sentiment Analysis.
The system will also be expected to summarize aspect-level polarities into a general polarity for a given entity, taking into account the importance of each aspect as a feature.
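One simple way to realize such a summary is a weighted average of the aspect polarities. The sketch below assumes polarities in [-1, 1] and hand-set importance weights; both the weighting scheme and the names are illustrative assumptions, not the method fixed by the project.

```python
def summarize_polarity(aspect_scores, aspect_weights):
    """Combine aspect-level polarities into one entity-level polarity.

    aspect_scores:  aspect -> polarity in [-1, 1]
    aspect_weights: aspect -> non-negative importance (missing aspects get 1.0)
    """
    total_weight = sum(aspect_weights.get(a, 1.0) for a in aspect_scores)
    if total_weight == 0:
        return 0.0  # no usable evidence: treat the entity as neutral
    weighted = sum(s * aspect_weights.get(a, 1.0) for a, s in aspect_scores.items())
    return weighted / total_weight


# Hypothetical example: fees matter three times as much as the app.
print(summarize_polarity({"fees": -1.0, "app": 1.0}, {"fees": 3.0, "app": 1.0}))
# -0.5
```

With equal weights the two opinions would cancel out; the higher weight on fees pulls the overall polarity towards negative.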
According to observation and experience, context-dependent ambiguity in Sentiment Analysis is generally caused by the domain and aspect information that co-occurs with the opinion word; the integrated Multi Domain Aspect Based Sentiment Lexicon will therefore be able to distinguish between these cases.
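To make the idea concrete, the sketch below stores polarities under (domain, aspect, word) keys and falls back from the most specific entry to the general one. The key layout, the entries and the function name are hypothetical and serve only to show how the same word can receive different polarities in different contexts.

```python
# Hypothetical entries keyed by (domain, aspect, word); None acts as a wildcard.
LEXICON = {
    (None, None, "high"): 0,               # no polarity without context
    ("financial", "fees", "high"): -1,     # high fees are bad
    ("financial", "returns", "high"): 1,   # high returns are good
    (None, None, "good"): 1,
}


def lookup_polarity(word, domain=None, aspect=None):
    """Return the most specific polarity available, falling back to general."""
    candidates = (
        (domain, aspect, word),  # domain- and aspect-specific entry
        (domain, None, word),    # domain-specific entry
        (None, None, word),      # general entry
    )
    for key in candidates:
        if key in LEXICON:
            return LEXICON[key]
    return 0  # unknown words are treated as neutral


print(lookup_polarity("high", "financial", "fees"))     # -1
print(lookup_polarity("high", "financial", "returns"))  # 1
print(lookup_polarity("high"))                          # 0
```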
Furthermore, factors such as the category of the opinion word (e.g. verb, adjective, adverb) and its latent relations within the context can also be important. To capture context information better, we can therefore also use additional techniques such as word embeddings, which represent words semantically in a vector space. This gives a better mapping of word relations and context information, yields more complex features and can further improve the classifier’s performance.
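The sketch below illustrates how a vector representation might be used: a word missing from the lexicon receives the polarity of its most similar lexicon word under cosine similarity. The three-dimensional vectors are invented for illustration only, and the nearest-neighbour rule is an assumption; in real embeddings, distributional similarity does not guarantee the same polarity, so this would be one feature among several rather than a decision rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_polarity(word, embeddings, lexicon):
    """Give an unseen word the polarity of its most similar lexicon word."""
    nearest = max(lexicon, key=lambda known: cosine(embeddings[word], embeddings[known]))
    return lexicon[nearest]


# Toy 3-dimensional vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "cheap": [0.9, 0.1, 0.0],
    "affordable": [0.8, 0.2, 0.1],
    "costly": [-0.7, 0.3, 0.1],
}

print(infer_polarity("affordable", TOY_EMBEDDINGS, {"cheap": 1, "costly": -1}))
# 1
```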
Note: All experimental results can be submitted to relevant conferences or journals in related research fields.

Copyright 2025 © Doctorats Industrials de la Generalitat
