Dr. Wu is recruiting motivated PhD students who are interested in Applied Machine Learning and Natural Language Processing Systems. For details, please see this blog post.


Dr. Jian Wu is an assistant professor in the Computer Science Department at Old Dominion University (ODU), Norfolk, VA. Before joining ODU, he was an assistant teaching professor in the College of Information Sciences and Technology (IST) at the Pennsylvania State University.

Dr. Jian Wu received his bachelor's degree in Physics and Astronomy from the University of Science and Technology of China (USTC) in 2004. He graduated from the Department of Astronomy and Astrophysics at the Pennsylvania State University in August 2011. After that, he joined the CiteSeerX team led by Dr. C. Lee Giles. He is the technical lead of the CiteSeerX project and led a small team that scaled the CiteSeerX collection from 3 million to 10 million academic documents.

Dr. Jian Wu has published nearly 30 peer-reviewed papers at ACM, IEEE, and AAAI conferences, receiving one best application paper award and two best paper nominations. Earlier in his career, he also processed and analyzed astronomical big data and published 7 journal papers in the Astrophysical Journal (ApJ), the Astronomical Journal (AJ), and Monthly Notices of the Royal Astronomical Society (MNRAS). He has been a Co-PI on NASA and NSF proposals.

Dr. Jian Wu has mentored at least 20 students on their bachelor's or master's theses and teaches two undergraduate-level courses.

Selected Publications in Information Sciences and Technology (refer to my CV for a full list)

  • Athar Sefid, Jian Wu, Jing Zhao, Lu Liu, Allen C. Ge, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles. "Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets." In: Proceedings of the 31st Innovative Applications of Artificial Intelligence Conference (IAAI 2019), January 29-31, 2019, Honolulu, Hawaii, USA. [pdf]


  • Jian Wu, Bharath Kandimalla, Shaurya Rohatgi, Athar Sefid, Jianyu Mao, C. Lee Giles. "CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset." In: Proceedings of the 2018 IEEE International Conference on Big Data (BigData 2018), December 10-13, 2018, Seattle, WA, USA. [pdf]


  • Jian Wu, Athar Sefid, Allen C. Ge, C. Lee Giles. "A Supervised Learning Approach To Entity Matching Between Scholarly Big Datasets." In: Proceedings of the 9th International Conference on Knowledge Capture (K-CAP 2017), December 4-6, 2017, Austin, Texas, USA. [pdf] [bibtex]


  • Jian Wu, Sagnik Ray Choudhury, Agnese Chiatti, Chen Liang, and C. Lee Giles. "HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities." In: Proceedings of ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2017), Toronto, Canada. [pdf] [bibtex]


  • Jian Wu, Chen Liang, Huaiyu Yang, and C. Lee Giles. "CiteSeerX data: semanticizing scholarly papers." In: Proceedings of the International Workshop on Semantic Big Data (SIGMOD-SBD 2016), San Francisco, CA, USA. [pdf] [bibtex]








  • Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Cornelia Caragea, Alexander Ororbia, Douglas Jordan and C. Lee Giles. "CiteSeerX: AI in a Digital Library Search Engine". In: The 26th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI 2014), Quebec City, Quebec, Canada. [Best Application Paper]. [pdf] [bibtex]


  • Jian Wu, Pradeep Teregowda, Kyle Williams, Madian Khabsa, Douglas Jordan, Eric Treece, Zhaohui Wu and C. Lee Giles. "Migrating A Digital Library to A Private Cloud". In: Proceedings of the IEEE International Conference on Cloud Engineering 2014 (IC2E 2014), Boston, MA, USA. [Best Paper Nomination] [pdf] [bibtex]



  • Cornelia Caragea, Jian Wu, Alina Ciobanu, Kyle Williams, Juan Fernández-Ramírez, Hung-Hsuan Chen, Zhaohui Wu and C. Lee Giles. "CiteSeerX: A Scholarly Big Dataset". In: Proceedings of the 36th European Conference on Information Retrieval (ECIR 2014), Amsterdam, Netherlands. [pdf] [bibtex]



Publications in Astronomy and Astrophysics


  • My Ph.D. Thesis (public since September 2013). Chapter 2 includes my work on the evolution of the Baldwin Effect using composite spectra generated from the SDSS. This work has not been published outside my thesis because we have not yet had time to finish the full paper, but the result is interesting and inspiring.




Digital Libraries and Search Engines

The Seer Family

Semantic Entity Extraction

Digital Library

Knowledge Bases

Glossary and Dictionaries



Ruby on Rails

Web Crawling

PDF Text and Metadata Extractors








Course Notes

Data Repositories and Data Sets


Software Packages



Cloud Resources

Teaching Courses

Job Finder

Conferences and Journals



Online Tools
