Jian Wu

Why People Prefer Google Scholar rather than CiteSeer

I think people prefer Google Scholar over CiteSeer for three main reasons:

  1. From the searchers' perspective, CiteSeer does not have nearly as many papers as Google Scholar. Google Scholar claims to have 100 million, while CiteSeer has only 4 million. Of course, not all papers in Google Scholar have full-text copies, but we can still get as many as 27 million, according to the estimate by Khabsa and Giles (2014).

  2. Also from the searchers' perspective, CiteSeer contains too many non-academic documents, or contaminants. Users often get crappy documents in their search results. This is due to the overly simple rule-based algorithm CiteSeer uses to filter out non-academic documents.

  3. From the authors' perspective, CiteSeer's metadata extraction is not of high quality. The extracted metadata is too noisy to build a complete and reliable citation network. For many papers, the extracted titles and authors are incorrect, so citations to those papers cannot be linked to them, which results in a very fragmented citation graph. Authors prefer Google Scholar because it gives them a complete set of their papers and citation counts, so they can track their own academic achievements and compare themselves with their peers.
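
To see why wrong titles fragment the citation graph, here is a toy sketch in Python. It assumes, purely for illustration, that citations are linked to papers by exact match on a normalized title; this is not CiteSeer's actual linking algorithm, and the function names, paper IDs and titles are all hypothetical. In the example, paper p2's real title is "Scalable Author Disambiguation", but the extractor picked up a header line instead.

```python
import re

def normalize(title):
    """Lowercase a title, turn punctuation into spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())

def link_citations(papers, references):
    """Link reference strings to papers by exact normalized-title match (a toy rule).

    papers: dict of paper ID -> title as extracted from the PDF
    references: list of (citing paper ID, cited title as written in the reference)
    Returns (citation count per paper ID, list of references that matched nothing).
    """
    # Index papers by their normalized extracted title.
    index = {normalize(title): pid for pid, title in papers.items()}
    counts = {pid: 0 for pid in papers}
    unresolved = []
    for citing_id, cited_title in references:
        pid = index.get(normalize(cited_title))
        if pid is None:
            unresolved.append((citing_id, cited_title))  # a dangling citation edge
        else:
            counts[pid] += 1
    return counts, unresolved

# Hypothetical extracted metadata. p2's real title is "Scalable Author
# Disambiguation", but the extractor grabbed a header line instead.
papers = {
    "p1": "Deep Learning for Citation Parsing",
    "p2": "Proceedings of the Workshop",
    "p3": "A Survey of Academic Search Engines",
}
references = [
    ("p3", "Deep learning for citation parsing."),
    ("p3", "Scalable Author Disambiguation"),
    ("p1", "Scalable author disambiguation"),
]

counts, unresolved = link_citations(papers, references)
print(counts)           # {'p1': 1, 'p2': 0, 'p3': 0}
print(len(unresolved))  # 2
```

Both references to p2 stay unresolved, so p2 shows zero citations even though two papers cite it, and the edges that should connect p1 and p3 to p2 are missing from the graph. A real linker would probably match titles more loosely, but any matching step still depends on the extracted title being roughly right.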

There are other shortcomings, such as a poor ranking function, but in my opinion, the three aspects above are the biggest ones CiteSeer needs to work on to stay competitive.

Maybe we can try a different model. Instead of building a digital library search engine for all kinds of papers, we should go back to our roots and focus on computer science and information science papers. CiteSeer just does not have the resources to compete with Google Scholar across the board, but we can do much better in specialized search and attract a specific community of users.

The justification for this model rests on a strong bias in the CiteSeer collection: we only process PDF documents we can download, i.e., all papers on CiteSeer are freely downloadable. However, not all academic papers have a free full-text version, such as a manuscript or a pre-print. For example, in contrast to many computer and information scientists, most astronomers (and physicists) do not post free full-text versions of their papers on their home pages. As a result, CiteSeer's collection naturally leans toward fields like computer science, where authors commonly post their papers online.
