
Student Presentation Reading Materials

1. Digital libraries, repositories, and search engines

Major Readings:

* "Digital libraries and autonomous citation indexing" by Lawrence, Giles, and Bollacker, 1999

* "Big Scholarly Data: A Survey" by Xia et al. 2017 

* "Search Engine Technology and Digital Libraries" by Summann & Lossau, 2010

Additional Readings:

* Digital Library:

* List of digital library projects:

* ACM Digital Library:

* Institutional repository:

* Dspace:

* HathiTrust Digital Library:

* ROAR content search: Registry of Open Access Repositories (ROAR) Content Search

* OpenDOAR project:

* OAISTER project:

* The Digital Public Library of America:

* Google Scholar:

2. Architectures of digital library search engines

Major Readings:

* "CiteSeerX: AI in a Digital Library Search Engine" by Wu et al., AI Magazine, 2015

* "ArnetMiner: extraction and mining of academic social networks" by Tang et al. KDD 2008 

Additional Readings:

* "Towards Building a Scholarly Big Data Platform: Challenges, Lessons and Opportunities" by Wu et al. 2014

* ETL (extract, transform, load) model:

* RESTful API: 

* LAMP architecture model: 

* Apache Solr: 

* MySQL: 

* Apache Tomcat: 

* Apache UIMA: 

3. Textual metadata extraction: headers and citations 

Major Readings:

* "CERMINE: automatic extraction of structured metadata from scientific literature" by Tkaczyk et al. 2015

* "Neural ParsCit: a deep learning-based reference string parser" by Prasad, Kaur, and Kan 2018 

Additional Readings:

* "Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents" by Lipinski et al. 2014 

* "GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications" by Lopez 2009

* Dublin Core:

* Metadata Object Description Schema (MODS):

* "Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers" by Tkaczyk et al. 2018 

4. Non-textual metadata extraction: figures and tables 

Major Readings: 

* "Table Header Detection and Classification" by Fang et al. 2018

* "PDFFigures 2.0: Mining Figures from Research Papers" by Clark & Divvala 2016 

Additional Readings:

* "Extracting Scientific Figures with Distantly Supervised Neural Networks" by Siegel 2018 

* "Curve Separation for Line Graphs in Scholarly Documents" by Choudhury et al. 2016 

* Raster graphics: 

* Vector graphics: 

* pdffigures 

5. Semantic information extraction: entities and relations

Major Readings: 

* "Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions" by Shen, Wang, and Han (2015)

* "Construction of the Literature Graph in Semantic Scholar" by Ammar et al. (2018) 

Additional Readings:

* Wikipedia: 

* DBpedia:

* Freebase: 

* YAGO: 

* Wikidata: 

* "SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications" by Augenstein et al. (2017)

* Word2Vec: 

6. Document classification

Major Readings: 

* "Classifying Document Types to Enhance Search and Recommendations in Digital Libraries" by Charalampous & Knoth (2017) 

* "Document Type Classification in Online Digital Libraries" by Caragea et al. (2016)

Additional Readings:

* n-grams: 

* A Gentle Introduction to the Bag-of-Words Model: 

* "Web page classification: Features and Algorithms" by Qi and Davison (2009)

7. Near-duplicate and plagiarism detection 

Major Readings:

* "Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information" by Ehsan & Shakery (2016) 

* "Near Duplicate Detection in an Academic Digital Library" by Williams & Giles (2013)

Additional Readings:

* "An Introduction to Duplicate Detection" by Naumann & Herschel (2010)

* Plagiarism Detection: 

* HelioBLAST: 

* Simhash on a blog: 

* Near duplicates and Shingles on the IR textbook: 

8. Ranking and recommendation of research papers 

Major Readings: 

* "Research-paper recommender systems: a literature survey" by Beel et al. (2016) 

* "Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding" by Xiong, Power, & Callan (2017) 

Additional Readings: 

* Recommender System: 

* Collaborative Filtering: 

* PubMed: 

* Docear: 

* CiteULike: 

* Mendeley on Wikipedia: 

* Click-through rate:

* What is the difference between content-based filtering and collaborative filtering?:

9. Question answering systems based on scholarly big data (SBD)

Major Readings:

* "Novel knowledge-based system with relation detection and textual evidence for question answering research" by Zheng et al. (2018)

* "Open Domain Question Answering via Semantic Enrichment" by Sun et al. (2015)

Additional Readings:

* "Search needs a shake-up" by Oren Etzioni (2011)

* "Building Watson: An Overview of the DeepQA Project" by Ferrucci et al. (2010)









© 2018 by Jian Wu. Last updated on 12/17/2018.

