Quote:
If they have a big enough database then some of these questions might not be important; statistically they may be able to overcome the scope of the scanning. But sadly, we simply do not know. And when we do not know, we have to question the value. Perhaps this is why this tool is considered 'obscure'?
Don,
I don't think "we simply do not know" is entirely accurate. Google developers have published academic descriptions of the database (a couple of links below), and its strengths and weaknesses have been covered in the popular press and, if I recall correctly, in the trade press for librarians. So I think you could find answers to some of your questions if you looked hard enough. The few times I've used it, I've always found the results intuitively reasonable. Thinking about it, I recalled a search I once did on the usage of "airmail" vs. "air mail" and found I still had the image of the search on my computer. Here it is:

For casual inquiries like the ones I've made with it, I'm not too concerned about the questions you've raised. I think they are valid, but I also think the database and methodology, while no doubt open to some criticism, would probably hold up pretty well against them.
If interested, here are a couple of links to technical papers (PDF):
http://aclweb.org/anthology/P/P12/P12-3029.pdf
http://www.dipanjandas.com/files/acl2014ngrams.pdf
Basil