Text Representation

Published by ForHHeart on 2024-05-03

1 Statistical Models

1.1 One-Hot
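
Each word maps to a |V|-dimensional vector that is all zeros except for a single 1 at the word's index, so vectors are sparse and encode no similarity between words. A minimal sketch, with a toy vocabulary made up for illustration:

```python
import numpy as np

# Toy vocabulary (made up for illustration); in practice it is built from a corpus.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def one_hot(word: str) -> np.ndarray:
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("cat"))  # [0. 1. 0. 0. 0.]
```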

1.2 Bag of Words (BoW)

https://web.stanford.edu/class/datasci112/lectures/lecture8.pdf
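
A document becomes a vector of word counts over the vocabulary, ignoring word order. A minimal sketch using scikit-learn's CountVectorizer (toy corpus made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse document-term count matrix

print(vectorizer.get_feature_names_out())
print(X.toarray())  # each row is one document's word counts
```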

1.3 N-grams
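
Instead of single words, the features are contiguous sequences of n tokens, which recovers some local word order. A sketch reusing CountVectorizer with ngram_range (toy sentence made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat"]

# ngram_range=(1, 2) keeps both unigrams and bigrams as features.
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit(corpus)
print(vectorizer.get_feature_names_out())
# ['cat' 'cat sat' 'mat' 'on' 'on the' 'sat' 'sat on' 'the' 'the cat' 'the mat']
```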

1.4 TF-IDF
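
Raw counts are reweighted by tf-idf(t, d) = tf(t, d) * idf(t), so terms frequent in one document but rare across the corpus score highest. A sketch with scikit-learn's TfidfVectorizer (toy corpus made up for illustration; note scikit-learn uses a smoothed idf and L2-normalizes each row):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# tf-idf(t, d) = tf(t, d) * idf(t); scikit-learn's smoothed
# idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 row normalization.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))  # at equal tf, rare words ("cat", "mat") outweigh shared ones ("sat", "on")
```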

2 Word Embedding (Neural Network Models)

2.1 Word2Vec

https://projector.tensorflow.org/

Continuous Bag of Words (CBOW)

Skip-Gram

The goal is to learn a vector for each word.
The trainable weights are the input weight matrix and the output weight matrix.
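
A minimal gensim sketch (toy corpus and hyperparameters made up for illustration); sg=0 trains CBOW, sg=1 trains Skip-Gram:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# sg=0 trains CBOW (predict the center word from its context);
# sg=1 would train Skip-Gram (predict context words from the center word).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

print(model.wv["cat"].shape)         # (50,)
print(model.wv.most_similar("cat"))  # nearest neighbours in the toy space
```

The word vectors read out afterwards (model.wv) are the rows of the input weight matrix; the output matrix is only used during training.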

2.2 GloVe
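
GloVe fits word vectors to global co-occurrence statistics rather than local context windows. Pretrained vectors can be loaded through gensim's downloader; "glove-wiki-gigaword-50" below is one of the published gensim-data sets and is fetched over the network on first use:

```python
import gensim.downloader as api

# Downloads the pretrained 50-d GloVe vectors on first use.
glove = api.load("glove-wiki-gigaword-50")

print(glove["king"].shape)  # (50,)
# Classic analogy: king - man + woman is close to queen.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```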

2.3 FastText
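
FastText represents a word as the sum of its character n-gram vectors, so it can compose a vector even for words unseen in training. A minimal gensim sketch (toy corpus and hyperparameters made up for illustration):

```python
from gensim.models import FastText

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# Character n-grams of length min_n..max_n let FastText compose vectors
# for words never seen during training, unlike Word2Vec.
model = FastText(sentences, vector_size=50, window=2, min_count=1, min_n=3, max_n=5, epochs=50)

print(model.wv["cat"].shape)    # (50,)
print(model.wv["catlike"][:5])  # OOV word still gets a vector from its n-grams
```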

3 BERT
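
Unlike the static embeddings above, BERT produces contextual vectors: the same word gets different vectors in different sentences. A sketch with the Hugging Face transformers library, using bert-base-uncased as an example checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised the interest rate.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per (sub)word token, so "bank" here differs
# from "bank" in "We sat on the river bank."
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```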

4 SBERT (Sentence Embedding)
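
SBERT pools BERT token vectors into one fixed-size vector per sentence, trained so that cosine similarity reflects semantic similarity. A sketch with the sentence-transformers library (all-MiniLM-L6-v2 as an example checkpoint; toy sentences made up for illustration):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A cat sits on the mat.",
    "A kitten is sitting on a rug.",
    "The stock market fell today.",
]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Cosine similarity: paraphrases score high, unrelated sentences low.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```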

References

https://deysusovan93.medium.com/from-traditional-to-modern-a-comprehensive-guide-to-text-representation-techniques-in-nlp-369946f67497
https://github.com/sawyerbutton/NLP-Funda-2023-Spring
https://github.com/sawyerbutton/LM-Funda-2024-Spring/blob/main/示例程式碼/Lesson3/LM_Lesson3_Embedding_demo.ipynb
