Theoretical aspects of Natural Language Processing, as a prelude to Python programming.
This course introduces several basic theoretical aspects of natural language processing (NLP). Text mining will be discussed, and it will be shown how this technique relates to NLP. The introduction will also discuss why NLP is crucial to our current technological world.
Three Python libraries that support NLP will be discussed:-
1. Natural language toolkit (NLTK)
2. spaCy
3. scikit-learn (sklearn)
NLTK has many functions that are relevant to NLP, including:-
1. Processing text data
2. Removing frequently used words
3. Sentence tokenisation
4. Word tokenisation
5. Blank line tokenisation
6. Frequency distribution
7. Stop words
8. Unigrams, bigrams, trigrams, and n-grams
9. Stemming
10. Lemmatisation
11. Part of speech tagging
12. Named entity recognition
13. Chunking
14. Chinking
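Several of the NLTK functions listed above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the sample text, the hand-picked stop word set, and the hand-tagged sentence are all made up for the demonstration, and only NLTK components that need no extra data downloads are used. Sentence tokenisation, lemmatisation, part of speech tagging, and named entity recognition are left out because they rely on NLTK data packages that must be downloaded separately.

```python
from nltk import FreqDist, RegexpParser
from nltk.stem import PorterStemmer
from nltk.tokenize import BlanklineTokenizer, wordpunct_tokenize
from nltk.util import bigrams, ngrams

# Made-up sample text containing one blank line.
text = "The cats are running.\n\nThe dogs are running too."

# Blank line tokenisation: split the text wherever a blank line occurs.
paragraphs = BlanklineTokenizer().tokenize(text)

# Word tokenisation with a regex-based tokenizer; keep lower-cased words only.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Frequency distribution: how often each token occurs.
freq = FreqDist(tokens)

# Stop word removal with a hand-picked set (an assumption for this sketch;
# NLTK's own stop word list requires a separate download).
stop_words = {"the", "are", "too"}
content_words = [t for t in tokens if t not in stop_words]

# Bigrams and trigrams (n-grams with n = 2 and n = 3).
bigram_list = list(bigrams(tokens))
trigram_list = list(ngrams(tokens, 3))

# Stemming: reduce each word to a crude root form.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in content_words]

# Chunking and chinking on a hand-tagged sentence (the tags are supplied
# manually here instead of coming from a part of speech tagger).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
# First rule chunks everything into NP; second rule chinks (removes)
# runs of VBD and IN tags from those chunks.
grammar = r"""
NP:
    {<.*>+}
    }<VBD|IN>+{
"""
tree = RegexpParser(grammar).parse(tagged)
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]

print(paragraphs)
print(freq.most_common(3))
print(stems)
print(noun_phrases)
```

The chinking rule shows the idea behind the last two list items: chunking groups tokens into a phrase, and chinking then cuts unwanted tokens back out of that phrase.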
spaCy is a newer library dedicated to NLP, and its features include:-
1. Lemmatisation
2. Part of speech tagging
3. Named entity recognition
4. Visualisation with displaCy
5. Pattern matching
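Of the spaCy features listed above, pattern matching is the easiest to sketch without extra downloads. The example below assumes spaCy version 3 and uses a blank English pipeline, which only tokenises text. The rule name "NLP_PHRASE" and the sample sentence are hypothetical. Lemmatisation, part of speech tagging, named entity recognition, and displaCy visualisations of those annotations need a trained pipeline (for example `en_core_web_sm`), which is not loaded here.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline: tokenisation only, no trained components.
nlp = spacy.blank("en")

# Match the three-word phrase regardless of capitalisation.
matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": "natural"},
    {"LOWER": "language"},
    {"LOWER": "processing"},
]
# "NLP_PHRASE" is a hypothetical rule name chosen for this sketch.
matcher.add("NLP_PHRASE", [pattern])

doc = nlp("Natural language processing is fun. I study natural language processing.")

# Each match is a (match_id, start, end) triple of token offsets.
matched_spans = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(matched_spans)
```

Each dictionary in the pattern describes one token, so the pattern as a whole describes a sequence of three consecutive tokens.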
Machine learning, deep learning, and neural networks are crucial to NLP because they are used to make predictions from the text data that is mined.
Sklearn (scikit-learn) is a Python library for machine learning, and it provides several tools that are useful for NLP, including:-
1. CountVectorizer
2. TfidfTransformer
3. Cosine similarity
4. TfidfVectorizer
5. HashingVectorizer
6. DictVectorizer
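The sklearn tools listed above all turn text, or features derived from text, into numbers. The sketch below is a minimal illustration with three made-up documents and a made-up feature dictionary; the parameter values (such as `n_features=16`) are arbitrary choices for the demonstration, not recommendations.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.metrics.pairwise import cosine_similarity

# Made-up documents: two similar sentences and one unrelated sentence.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock markets fell sharply",
]

# CountVectorizer: one row per document, one column per vocabulary word.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(docs)

# TfidfTransformer: reweight raw counts by term frequency and
# inverse document frequency.
tfidf_two_step = TfidfTransformer().fit_transform(counts)

# TfidfVectorizer: the two steps above combined into one.
tfidf_one_step = TfidfVectorizer().fit_transform(docs)

# Cosine similarity between every pair of documents.
similarity = cosine_similarity(tfidf_one_step)

# HashingVectorizer: fixed-width output with no stored vocabulary.
hashed = HashingVectorizer(n_features=16).transform(docs)

# DictVectorizer: turn feature dictionaries into a numeric matrix.
feature_dicts = [{"words": 6, "mentions_cat": 1}, {"words": 6}]
dict_vectorizer = DictVectorizer(sparse=False)
dict_matrix = dict_vectorizer.fit_transform(feature_dicts)

print(counts.shape)
print(np.round(similarity, 2))
print(hashed.shape)
print(dict_matrix)
```

With default settings, the two-step route (CountVectorizer followed by TfidfTransformer) and the one-step TfidfVectorizer produce the same matrix. The third document shares no words with the first, so their cosine similarity is zero.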
Classifiers will be discussed because they are necessary to carry out sentiment analysis. Although there is a wide range of classifiers that can be used in NLP, the ones that will be discussed in this course are:-
1. Sklearn’s LinearSVC
2. Naive Bayes
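A sentiment classifier built from these two classifiers might look like the sketch below. It makes several assumptions: the training sentences and labels are made up and far too few for real use, sklearn's `MultinomialNB` stands in for Naive Bayes (other Naive Bayes implementations exist, including one in NLTK), and `TfidfVectorizer` is chosen as the text representation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up training set; real sentiment analysis needs far more data.
train_texts = [
    "I loved this film",
    "great acting and a great story",
    "what a wonderful experience",
    "I hated this film",
    "terrible acting and a boring story",
    "what an awful experience",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Each pipeline vectorises the text and then fits a classifier on it.
svm_model = make_pipeline(TfidfVectorizer(), LinearSVC())
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())

svm_model.fit(train_texts, train_labels)
nb_model.fit(train_texts, train_labels)

# Unseen sentences to classify.
new_texts = ["a wonderful story", "boring and awful"]
svm_predictions = list(svm_model.predict(new_texts))
nb_predictions = list(nb_model.predict(new_texts))

print(svm_predictions)
print(nb_predictions)
```

Wrapping the vectoriser and the classifier in one pipeline means new text is transformed with the vocabulary learned during training, which avoids a common source of mismatched feature columns.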