element14 Community

Using AI to improve Natural Language Processing

Randy Scasny
18 Mar 2022

Natural language processing (NLP) is an interdisciplinary field that makes use of computers to comprehend or process human (or natural) languages. NLP uses many artificial intelligence (AI) techniques, including several types of machine learning (ML). Automatic speech recognition (ASR) is an application of NLP that also uses several ML techniques.


NLP Techniques

Many NLP systems are built on a handful of basic techniques. Six of the most common are described in this section.

  • Stemming: This family of algorithms strips common suffixes (and sometimes prefixes) from a word to reduce it to a base form, or stem. Because the rules are applied mechanically, the stem is not always a real word; for example, the Porter stemmer reduces “studies” to “studi”.
  • Lemmatization: These algorithms overcome flaws inherent in stemming. Lemmatization uses grammar and vocabulary knowledge to find a word's dictionary form (its lemma), leading to more accurate results; for example, “studies” becomes “study” and “better” becomes “good”.
  • Keyword extraction: Keyword extraction, also known as keyword analysis or keyword detection, is an NLP text analysis technique for automatically extracting the most frequent or most relevant words and expressions from a body of text.
  • Named Entity Recognition (NER): NER is a fundamental NLP technique that identifies and extracts entities, such as names, dates, and places, from a text body.
  • Topic Modelling: Topic models discover the themes that run through a collection of documents. Several algorithms exist, and Latent Dirichlet Allocation (LDA) is the most common. LDA treats each document as a mixture of topics and each topic as a distribution over words, and it infers the topics from which words tend to occur together across the documents.
  • Sentiment Analysis: This technique's core function is to extract the sentiment behind a body of text by analyzing the words it contains. Through this analysis, the text can be classified as having negative, positive, or neutral sentiment.
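
To make a few of these techniques concrete, here is a toy Python sketch of stemming, lemmatization, keyword extraction, and lexicon-based sentiment analysis. The suffix rules, lemma dictionary, stopword list, and sentiment lexicon are hypothetical stand-ins chosen for illustration; real systems use complete algorithms such as the Porter stemmer, large lemma dictionaries, and trained classifiers.

```python
import re
from collections import Counter

# Hypothetical rules and word lists, for illustration only.
SUFFIXES = ("ing", "ly", "ed", "es", "s")  # checked in this order
LEMMAS = {"studies": "study", "studied": "study", "better": "good", "ran": "run"}
STOPWORDS = {"the", "a", "an", "is", "was", "and", "but", "it", "of", "to", "not"}
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "hate", "broken"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Lemmatization: look up the dictionary form; unknown words are unchanged."""
    return LEMMAS.get(word, word)


def top_keywords(text, k=3):
    """Keyword extraction: the k most frequent non-stopword stems."""
    stems = [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]
    return [stem for stem, _ in Counter(stems).most_common(k)]


def sentiment(text):
    """Sentiment analysis: compare counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(crude_stem("studies"), lemmatize("studies"))  # studi study
    print(top_keywords("The model parses sentences. Parsing sentences helps the model."))
    print(sentiment("The screen is great but the battery is slow and broken"))  # negative
```

Counting lexicon words ignores word order and negation, so a phrase such as “not great” is still scored as positive. That limitation is one motivation for the context-aware neural models described next.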

NLP Building Blocks: Word Embeddings, RNN, and LSTM

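By combining and varying different neural network architectures, more advanced NLP algorithms can be created. Many machine learning-based NLP models (and much of AI more broadly) are built from feedforward neural networks, whose computations form a directed acyclic graph (DAG): artificial neurons are arranged in connected layers, and signals propagate from the input layer through to the output layer. Recurrent networks, discussed below, add loops on top of this basic design. These models are loosely inspired by how neuroscientists think our brains work.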
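
The forward pass of such a layered network can be sketched in plain Python. The weights below are arbitrary, hypothetical values rather than a trained model; the point is that each layer's output feeds only the next layer, so the computation never loops back on itself.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron sums its weighted inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for a 3-input -> 2-hidden -> 1-output network.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]


def forward(x):
    """Signals flow strictly input -> hidden -> output, with no cycles."""
    hidden = dense(x, W1, b1, relu)
    return dense(hidden, W2, b2, sigmoid)


if __name__ == "__main__":
    print(forward([1.0, 0.0, 1.0]))
```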

Word Embeddings
Neural networks operate on numerical vectors, but words are not numerical. Word embeddings are a set of language modeling techniques that convert vocabulary words into numerical vectors. Embedding methods typically process large amounts of text and assign similar vectors to words that often appear in similar contexts, so that related words end up close together in the vector space.
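
The idea that words seen in similar contexts should get similar vectors can be sketched with simple co-occurrence counts. This is a count-based stand-in, not word2vec or another learned embedding method, and the four-sentence corpus is made up; still, the cosine similarity places “cat” and “dog” closer to each other than to “stocks”.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Map each word to counts of the words appearing within `window` positions of it."""
    vectors = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell sharply today",
]

if __name__ == "__main__":
    vecs = cooccurrence_vectors(CORPUS)
    print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # 0.913
    print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # 0.0
```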

Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) takes past inputs into account. In an RNN, the hidden state produced at each step is looped back in alongside the next input, so the network can carry information from past states forward. In NLP, this makes an RNN useful for determining context. For example, the sentence “Bill thought the dog was huge, but it was not that big” contains the words “huge” and “big”, which might lead a computer to conclude that the dog was very large. An RNN can learn to pick up on the contrast word “but” and the negation “not”, and use them to better determine the meaning of the sentence.
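
The recurrence can be sketched as a vanilla (Elman) RNN, where each new hidden state is computed as h_t = tanh(W_x·x_t + W_h·h_(t-1) + b). The weights and two-dimensional inputs below are hypothetical stand-ins for learned parameters and word vectors.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla (Elman) RNN over a sequence and return every hidden state."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x in inputs:
        # The previous hidden state h is fed back in alongside the new input x.
        pre = [wx + wh + bias for wx, wh, bias in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(z) for z in pre]
        states.append(h)
    return states


# Hypothetical parameters: 2-dim inputs, 2-dim hidden state.
W_X = [[0.5, -0.3], [0.2, 0.4]]
W_H = [[0.1, 0.6], [-0.4, 0.2]]
B = [0.0, 0.0]

if __name__ == "__main__":
    seq_a = rnn_forward([[1.0, 0.0], [0.0, 1.0]], W_X, W_H, B)
    seq_b = rnn_forward([[0.0, 0.0], [0.0, 1.0]], W_X, W_H, B)
    # Same final input, different history, so the final hidden states differ.
    print(seq_a[-1], seq_b[-1])
```

Both sequences end with the same input, yet their final hidden states differ because the earlier input differs; this is the mechanism that lets an earlier “but” or “not” influence how later words are interpreted.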

Long Short-Term Memory (LSTM)
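Not every word in a sentence is equally important, and a basic RNN struggles to carry information across long sequences: the influence of early words fades as more words are processed. A Long Short-Term Memory (LSTM) network addresses this by adding a cell state, a separate memory that runs through the sequence, regulated by three gates, each with a unique purpose. The forget gate decides which information in the cell state to discard, the input (or update) gate decides which new information to store, and the output gate decides how much of the cell state to pass on as output. Working together, the gates determine whether information should be kept or forgotten.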
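
A single LSTM time step can be sketched with a scalar input and a scalar hidden state, which keeps the gate arithmetic visible. The parameter values are hypothetical; the two example settings show the forget and input gates either preserving the old cell state or overwriting it with new information.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell with input size 1 and hidden size 1.

    params maps each gate name to a hypothetical (w_x, w_h, bias) triple.
    """
    def affine(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(affine("forget"))       # forget gate: share of c_prev to keep
    i = sigmoid(affine("input"))        # input/update gate: share of new info to write
    g = math.tanh(affine("candidate"))  # candidate information to write
    o = sigmoid(affine("output"))       # output gate: share of the cell to expose
    c = f * c_prev + i * g              # updated cell state (the long-term memory)
    h = o * math.tanh(c)                # new hidden state passed to the next step
    return h, c


# Hypothetical settings: a large forget bias keeps the old memory,
# a large input bias overwrites it with the candidate value.
KEEP = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 0.0)}
WRITE = {"forget": (0.0, 0.0, -10.0), "input": (0.0, 0.0, 10.0),
         "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 0.0)}

if __name__ == "__main__":
    print(lstm_step(5.0, 0.0, 2.0, KEEP))   # cell state stays close to 2.0
    print(lstm_step(5.0, 0.0, 2.0, WRITE))  # cell state moves close to tanh(5.0)
```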

Language Understanding: Transformer and BERT

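A major architectural breakthrough came with the Transformer, an architecture that enabled a new generation of NLP models. Rather than combining recurrent and convolutional layers, the Transformer dispenses with both RNNs and CNNs (Convolutional Neural Networks) and relies on an attention mechanism, self-attention, that lets every word in a sequence weigh its relationship to every other word. The Transformer has the advantage of being non-sequential: it does not need to process the input one token at a time, because all positions are handled together, with word order supplied through positional encodings.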

This design allows Transformers to be parallelized and scaled far more readily than earlier NLP models. Because of their speed and performance advantages over traditional models, Transformers are the focus of considerable research and are finding new use cases in organizations. The original Transformer paper (“Attention Is All You Need”, Vaswani et al., 2017) and the BERT paper (Devlin et al., 2018) describe these models in detail.
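
The Transformer's core operation, scaled dot-product self-attention, computes softmax(QKᵀ/√d)·V, where Q, K, and V hold query, key, and value vectors for every position. The sketch below is a single attention head in plain Python; for brevity it reuses the input vectors directly as Q, K, and V (a real Transformer first multiplies them by learned projection matrices) and omits positional encodings and multi-head attention.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])
    outputs = []
    for q in Q:  # every query row is independent, so rows can be computed in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs


if __name__ == "__main__":
    # Hypothetical 2-dim vectors for a 3-token input, reused as Q, K and V.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(X, X, X):
        print([round(v, 3) for v in row])
```

Nothing in the loop depends on the result of a previous row, unlike the RNN's step-by-step recurrence, which is why attention maps well onto parallel hardware.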

Google AI Language researchers introduced and subsequently open-sourced Bidirectional Encoder Representations from Transformers (BERT) in 2018. BERT's key innovation is applying bidirectional training of the Transformer to language modeling.

In contrast to directional models, which read text input sequentially (left to right or right to left), the Transformer encoder reads the entire sequence at once, so each word's representation can draw on context from both its left and its right. The BERT results show that a bidirectionally trained language model can develop a deeper sense of language context and flow than single-direction language models. BERT and its variants have been highly successful in various NLP tasks, including document classification, textual entailment (natural language inference), and sentiment analysis.
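
The difference between directional and bidirectional reading can be sketched as an attention visibility mask, which records the positions each word is allowed to draw context from. This is a simplification: BERT makes bidirectional training possible by hiding some input tokens and predicting them (masked language modeling), which this sketch does not show.

```python
def attention_mask(n, bidirectional):
    """mask[i][j] is True when position i may draw context from position j."""
    # A left-to-right (causal) model sees only earlier positions and itself;
    # a bidirectional encoder sees every position in the sequence.
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]


def visible_context(tokens, index, bidirectional):
    """Return the other tokens that the token at `index` can attend to."""
    mask = attention_mask(len(tokens), bidirectional)
    return [tok for j, tok in enumerate(tokens) if mask[index][j] and j != index]


if __name__ == "__main__":
    tokens = "the bank of the river".split()
    print(visible_context(tokens, 1, bidirectional=False))  # ['the']
    print(visible_context(tokens, 1, bidirectional=True))   # ['the', 'of', 'the', 'river']
```

In this example, a left-to-right model reading “bank” has seen only “the”, while a bidirectional encoder can also use “river” to tell that it is a riverbank rather than a financial institution.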

NLP Applications

Typical NLP applications include speech recognition, understanding spoken language, and dialogue systems.

Fighting Online Bullying and Hate Speech: Facebook used BERT-style models, teaching them multiple languages simultaneously, so the algorithm builds a statistical model of bullying or hate speech across several languages. Facebook employs automatic content monitoring tools for numerous languages. These tools build on RoBERTa, Facebook AI's optimized variant of BERT, which enables the company to automatically block unwanted content and reduce such incidents by 70%.

NLP-Powered Chatbots: NLP allows chatbots to understand a wide range of questions. Because humans do not stick to fixed formulas or keywords when chatting, NLP algorithms are used to work out what a user is asking. AI-powered chatbots are widely used, and they are becoming more useful as the technology advances.
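
A minimal baseline for mapping a user's message to an intent is word overlap against example phrasings, sketched below. The intents, example phrases, and threshold are hypothetical. This kind of keyword matching breaks down when users phrase things freely; NLP-powered chatbots typically replace the overlap score with learned representations, such as sentence vectors from a BERT-style encoder, so that paraphrases with few shared words can still match.

```python
import re

# Hypothetical intents, each described by a few example phrasings.
INTENTS = {
    "order_status": ["where is my order", "track my package", "has my order shipped"],
    "returns": ["how do i return an item", "i want a refund", "send this back"],
    "hours": ["when are you open", "what are your opening hours"],
}


def words(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def classify_intent(message, threshold=0.2):
    """Return the intent whose examples best overlap the message, or None if none is close."""
    msg = words(message)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENTS.items():
        score = max(jaccard(msg, words(e)) for e in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


if __name__ == "__main__":
    print(classify_intent("Can you track my package?"))  # order_status
    print(classify_intent("What is the weather like?"))  # None
```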

  • DAB
    DAB over 3 years ago

    Nice overview.

    I have been watching this technology evolve over the last forty years or so.

    AI has come up with some interesting results.
