CS 589 Text Mining and Information Retrieval

This graduate-level course covers fundamental techniques in information retrieval (IR) and natural language processing (NLP). The first half introduces fundamental concepts of information retrieval, including the vector space model, TF-IDF, BM25, IR evaluation, the inverted index, and learning to rank. The second half introduces the fundamentals and frontiers of natural language processing, including word representation, word2vec, seq2seq models, neural machine translation, attention, Transformer-based models, generative language models, and other frontier topics in NLP. Students will also gain hands-on programming skills in IR and NLP through assignments that include implementing BM25 and TF-IDF models, working with ElasticSearch, building basic NLP pipelines using HuggingFace, and optimizing NLP models through model selection and hyperparameter optimization.

Credits

3

Prerequisite

((Grad Student) or (Junior or Senior and CS 115))

Distribution

Computer Science Program

Typically Offered Periods

Fall Semester, Spring Semester, Summer Semester