#transformers #research #documentsimilarity
⏩ Abstract: Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
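⏩ Quick sketch: to make the labeling idea from the abstract concrete, here is a minimal toy example in Python. It is not the authors' code. The function name `build_pairs`, the sample paper IDs, the lowercased section titles, and the simple "random pair with no citation link" negative sampling are all assumptions made for illustration; the paper's own dataset pipeline is in the linked repository.

```python
import random
from itertools import combinations


def build_pairs(citations, papers, seed=0):
    """Turn (citing, cited, section_title) records into labeled paper pairs.

    Hypothetical helper for illustration only, not the authors' pipeline.
    """
    # Positive pairs: the section title where the citation occurs is the label.
    # A pair cited in several sections collects several labels.
    labels = {}
    for citing, cited, section in citations:
        labels.setdefault((citing, cited), set()).add(section.strip().lower())
    positives = [(a, b, sorted(s)) for (a, b), s in sorted(labels.items())]

    # Negative pairs (simplified assumption): random pairs of papers that
    # have no citation link in either direction, given an empty label list.
    linked = {frozenset(pair) for pair in labels}
    candidates = [
        p for p in combinations(sorted(papers), 2) if frozenset(p) not in linked
    ]
    rng = random.Random(seed)
    k = min(len(positives), len(candidates))
    negatives = [(a, b, []) for a, b in rng.sample(candidates, k)]
    return positives, negatives


# Made-up toy data: three citation records over four papers.
citations = [
    ("p1", "p2", "Introduction"),
    ("p1", "p2", "Related Work"),
    ("p3", "p1", "Experiments"),
]
positives, negatives = build_pairs(citations, ["p1", "p2", "p3", "p4"])
```

Each resulting (paper A, paper B, labels) triple is what a pairwise classifier would take as input: the two documents together, with the section titles as the target.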
Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - @techvizthedatascienceguy
⏩ OUTLINE:
00:00 - Intro and Background
02:50 - Aspect-free similarity vs. Aspect-based similarity
03:51 - Dataset creation for aspect-based document similarity from research papers
05:12 - Negative sampling
05:48 - Document similarity as Document Pair Classification (modeling strategy)
06:41 - My thoughts and concerns on the approach
07:45 - Results
⏩ Paper Title: Aspect-based Document Similarity for Research Papers
⏩ Paper: arxiv.org/abs/2010.06395
⏩ Code: github.com/malteos/aspect-document-similarity
⏩ Authors: Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
⏩ Organisations: German Research Center for Artificial Intelligence, University of Konstanz, University of Wuppertal, University of Kiel
⏩ IMPORTANT LINKS
Full Playlist on Machine Learning with Graphs: • DEEPWALK: Online Learning of Social R...
Full Playlist on Evaluating NLG Systems: • Evaluation of Text Generation: A Surv...
Full Playlist on Query Expansion for Information Retrieval using NLP: • NQE: Neural Query Expansion for Code ...
Full Playlist on Text Generation Evaluation Techniques: • Evaluation of Text Generation: A Surv...
*********************************************
Supporting me financially is totally optional and voluntary :) ❤️
If you'd like to, you can consider buying me a chai (because I don't drink coffee :) ) at www.buymeacoffee.com/TechvizCoffee
*********************************************
⏩ YouTube - youtube.com/c/TechVizTheDataScienceGuy
⏩ Blog - prakhartechviz.blogspot.com/
⏩ LinkedIn - linkedin.com/in/prakhar21
⏩ Medium - medium.com/@prakhar.mishra
⏩ GitHub - github.com/prakhar21
⏩ Twitter - twitter.com/rattller
*********************************************
Tools I use for making videos :)
⏩ iPad - tinyurl.com/y39p6pwc
⏩ Apple Pencil - tinyurl.com/y5rk8txn
⏩ GoodNotes - tinyurl.com/y627cfsa
#techviz #datascienceguy #nlp #bert #xlnet #electra #Roberta