Stamp Community Family of Web Sites








Stamp Community Forum
 

Google Books Ngram Viewer For Stamp Collecting

 
Replies: 7 / Views: 1,809
Pillar Of The Community
United States
837 Posts
landoquakes, posted 10/30/2015 11:10 pm
My son had a homework assignment that used Google Ngram Viewer, which shows how often certain phrases appear in books over time. Check out the Stamp Collecting Ngram: the peak in the 1940s makes sense, as does the slow decline in recent decades...


http://google.about.com/od/n/a/Goog...m-Viewer.htm


Edited by landoquakes - 10/30/2015 11:14 pm

Pillar Of The Community
United States
772 Posts
chris2015, posted 11/05/2015 3:46 pm
Interesting. But wasn't stamp collecting quite popular in the 1920s and 1930s as well?
Pillar Of The Community
United States
1624 Posts
sdtom, posted 11/05/2015 7:38 pm
There is certainly a decline.
Moderator
1589 Posts
blcjr, posted 11/06/2015 09:21 am

Quote:
Interesting. But wasn't stamp collecting quite popular in the 1920s and 1930s as well?
Well, that's more or less what it shows. The top line rises rapidly through the '20s and '30s, and for most of the '30s it is higher than it is now. WWII obviously affected the hobby, though it is a bit curious that the decline that begins in 1940 did not level off or recover after the war. But the war brought about a lot of cultural and social change, and this appears to have affected stamp collecting for years to come.

I've used NGRAM on SCF before (but don't recall the specific topics offhand). It is a pretty interesting tool.
Moderator
United States
12330 Posts
51studebaker, posted 11/06/2015 6:04 pm
I am unsure how to interpret NGRAM results. Google scans and indexes books in public libraries. Which libraries? Which books? All books? Some books? Only books in the public domain, I assume. Without knowing which books were selected, and they are clearly a subset of all books, I am unsure how many solid conclusions can be drawn.
Don
Moderator
1589 Posts
blcjr, posted 11/06/2015 8:42 pm
I cannot answer all of Don's questions, but based on the lawsuit they just won, I don't think Google limits itself to public domain works.

https://en.wikipedia.org/wiki/Autho...Google,_Inc.
Moderator
United States
12330 Posts
51studebaker, posted 11/07/2015 02:39 am
blcjr,
Good day.
Yes, that was an interesting judgment and appeal. To this day the scope of the Google index is still unclear to me; they scan the entire book but can only display 'snippets'. So if I search for the word 'dog' they cannot return every instance of the word dog in a copyrighted book. But perhaps this doesn't matter for NGRAM since it is only counting the total number of books that contain the word 'dog'.

But I am still curious about the scope of the scanned books. Not all libraries are created equal; some might be heavier on research or fiction, or hold older vs. newer publications. How many libraries have they included? How do they choose the libraries? We are also unsure how complete the scanning is: do they scan 10%, 50% or 100% of all books in any given library? If they aren't scanning 100% of all books, what criteria are being used to select which books are scanned? For example, do they scan and include reference books like catalogs in the same percentage as fiction books?

I am also curious to know more about the date ranges that are included. In general, has the total number of books being published (and selected for inclusion in a library) grown over the years? It seems to me that any NGRAM chart screams for a baseline based upon this: compare the number of books containing the key words to the total number of books covered for that timeframe. Without this info, conclusions can become dicey. Was there a general surge in the total number of books included during the 1930s and 1940s, and is the chart above simply reflecting that?
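
To make the baseline idea concrete, here is a minimal Python sketch of that comparison. The function name and every number in it are made up for illustration; none of this is real Ngram data or Google's actual method. It simply divides the count of books mentioning a phrase by the total number of books covered for the same year.

```python
def relative_frequency(phrase_counts, total_counts):
    """Return {year: phrase_count / total_count} for years that have a baseline."""
    freqs = {}
    for year, count in phrase_counts.items():
        total = total_counts.get(year)
        if total:  # skip years whose baseline is missing or zero
            freqs[year] = count / total
    return freqs


# Hypothetical numbers, purely illustrative (NOT real Ngram data).
phrase_counts = {1930: 50, 1940: 120, 1950: 150}            # books mentioning the phrase
total_counts = {1930: 100_000, 1940: 200_000, 1950: 400_000}  # all books covered that year

freqs = relative_frequency(phrase_counts, total_counts)
for year in sorted(freqs):
    print(year, f"{freqs[year]:.6f}")
```

With these invented figures the raw count keeps rising every decade, yet the share of all books peaks in 1940 and falls by 1950, which is the kind of difference a baseline would expose.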

If they have a big enough database then some of these questions might not be important; statistically they may be able to overcome the scope of the scanning. But sadly, we simply do not know. And when we do not know, we have to question the value. Perhaps this is why this tool is considered 'obscure'?
Don
Edited by 51studebaker - 11/07/2015 02:43 am
Moderator
1589 Posts
blcjr, posted 11/07/2015 07:08 am

Quote:
If they have a big enough database then some of these questions might not be important; statistically they may be able to overcome the scope of the scanning. But sadly, we simply do not know. And when we do not know, we have to question the value. Perhaps this is why this tool is considered 'obscure'?

Don,

I don't think "we simply do not know" is entirely accurate. Google developers have published academic descriptions of the database (a couple of links below), and its strengths and weaknesses have been covered in the popular press and in the trade press for librarians (if I recall correctly). So I think you could find answers to some of your questions if you looked hard enough. The few times I've used it, I've always found the results intuitively reasonable. Thinking about it, I recalled a search I once did on the usage of "airmail" vs. "air mail" and found I still had the image of the search on my computer. Here it is:



For casual inquiries like the ones I've made with it, I'm not too concerned about the questions you've raised. I think they are valid questions, but I also think the database and methodology, while no doubt open to some criticism, would probably hold up pretty well against them.

If interested, here are a couple of links to technical papers (PDF):

http://aclweb.org/anthology/P/P12/P12-3029.pdf

http://www.dipanjandas.com/files/acl2014ngrams.pdf

Basil
