Practical 5

AIM: Data Pre-Processing and Text analytics using Orange

Theory:

  • Text Analytics: Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns. Combined with data visualization tools, it helps organizations understand the story behind the numbers and make better decisions. The related terms text mining, text analysis, and text analytics are often used interchangeably, since all of them aim to extract insights from unstructured text. However, text mining (or text analysis) yields insights of a qualitative nature, whereas text analytics aggregates these results and turns them into something that can be quantified and visualized through charts and reports.
  • Sentiment Analysis: Sentiment analysis (also known as opinion mining) is a text analysis technique that detects polarity (e.g. a positive or negative opinion) within text, whether a whole document, paragraph, sentence, or clause. Understanding people’s emotions is essential for businesses, since customers express their thoughts and feelings more openly than ever before. Automatically analyzing customer feedback, such as opinions in survey responses and social media conversations, allows brands to listen to their customers and tailor products and services to their needs. For example, running sentiment analysis on 4,000+ reviews of a product could reveal whether customers are happy with its pricing plans and customer service.
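
To make the two ideas above concrete, here is a minimal sketch in Python, not the method Orange itself uses. It assigns a polarity label to each review by counting words from two tiny word lists, then aggregates the labels into counts that could be charted. The word lists, the function name polarity, and the sample reviews are all made up for illustration; real sentiment lexicons are far larger and handle negation and context.

```python
import re
from collections import Counter

# Tiny illustrative word lists (hypothetical; real lexicons are far larger).
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "poor", "slow", "hate", "terrible"}


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample reviews, one per expected label.
reviews = [
    "Great product and excellent customer service",
    "The pricing plans are terrible and support is slow",
    "It arrived on Tuesday",
]

# Sentiment analysis step: one qualitative label per document.
labels = [polarity(r) for r in reviews]

# Text analytics step: aggregate the labels into quantities for a chart or report.
summary = Counter(labels)
print(labels)
print(dict(summary))
```

The per-review labels correspond to the qualitative output of text analysis, while the aggregated counts are the kind of quantity that text analytics would visualize.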

Task 1: Data Preprocessing:

  • Discretization: It is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This is usually carried out as a first step toward making them suitable for numerical evaluation and for use by models. In data preprocessing, it typically means converting a continuous attribute into a small number of intervals (bins).

  • Continuization: Given a data table, return a new table in which the discrete attributes are replaced with continuous ones or removed.
    • Binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the argument zero_based.
    • Multinomial variables are treated according to the argument multinomial_treatment.
    • Discrete attributes with only one possible value are removed.
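  • Normalization: In data preprocessing, normalization rescales numeric attributes to a common scale (for example, to the range [0, 1], or to zero mean and unit variance) so that attributes with large ranges do not dominate the analysis. It should not be confused with database normalization, which is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics such as insertion, update, and deletion anomalies.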
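  • Randomization: A method based on chance alone. In data preprocessing, it usually means shuffling the values of a column (for example, the class labels) so that their association with the rest of the data is broken, which helps check whether a model has found a real pattern. The term comes from study design, where participants are assigned to treatment groups by chance; this minimizes the differences among groups by distributing people with particular characteristics evenly among all the trial arms.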
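
The four preprocessing steps above can be sketched in plain Python as follows. This is an illustrative sketch only, not Orange's implementation: the function names, the choice of equal-width bins, one-hot indicators, and min-max scaling, and the sample columns are all assumptions made for the example. Orange offers several alternatives for each step.

```python
import random


def discretize_equal_width(values, n_bins):
    """Map each continuous value to a bin index 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def continuize_one_hot(values):
    """Replace a discrete column with 0.0/1.0 indicator columns.

    One indicator per category is one possible multinomial treatment.
    A column with a single possible value carries no information and is
    removed (every row gets an empty list of indicators).
    """
    categories = sorted(set(values))
    if len(categories) <= 1:
        return [], [[] for _ in values]
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def normalize_min_max(values):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def randomize_column(values, seed=0):
    """Return a shuffled copy of a column; the seed makes it reproducible."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


# Made-up sample columns.
ages = [18, 25, 40, 62, 80]
colors = ["red", "green", "red", "blue"]

age_bins = discretize_equal_width(ages, 3)
color_names, color_rows = continuize_one_hot(colors)
ages_scaled = normalize_min_max(ages)
ages_shuffled = randomize_column(ages, seed=42)

print(age_bins)
print(color_names, color_rows)
print(ages_scaled)
print(ages_shuffled)
```

Shuffling keeps the same set of values but breaks their alignment with the other columns, which is why it only rearranges the list rather than generating new numbers.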




OUTPUT - Preprocessed Data
