linux-users-group-of-davis
L U G O D
 
Page last updated:
2005 Jan 13 14:51

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] [help@google.com: Re: [#19464334] Searching for dotfiles]



On Thu, 13 Jan 2005, Peter Jay Salzman wrote:

> On Thu 13 Jan 05,  1:25 PM, Mark K. Kim <lugod@cbreak.org> said:
[snip for brevity]
> > The syntax for /etc/vimrc (or more accurately, /usr/share/vim/.../vimrc)
>
> Why more accurately?

Debian testing (my desktop):

  $ls /etc/vimrc
  ls: /etc/vimrc: No such file or directory
  $ls /usr/share/vim/vimrc
  /usr/share/vim/vimrc

Redhat (bolt.sonic.net):

  $ls /etc/vimrc
  ls: /etc/vimrc: No such file or directory
  $vim --version
  ...
  system vimrc file: "$VIM/vimrc"
  ...
  fall-back for $VIM: "/usr/local/share/vim"

SunOS (aludra.usc.edu):

  $ls /etc/vimrc
  /etc/vimrc: No such file or directory
  $vim --version
  ...
  system vimrc file: "$VIM/vimrc"
  ...
  fall-back for $VIM: "/usr/usc/vim/6.1/share/vim"

And so on.  The "local" part is dropped on distribution-packaged installs
(Debian's /usr/share/vim above), and USC uses a custom installation directory.
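The lookup that `vim --version` reports ("$VIM/vimrc", with a compile-time fall-back for $VIM) can be modeled in a few lines. This is a simplified sketch: `system_vimrc` is a hypothetical name, and it assumes only the two pieces shown in the transcripts. Vim's real rules for setting $VIM (see `:help $VIM`) involve more steps.

```python
import posixpath

def system_vimrc(env, fallback):
    """Return the system vimrc path as "$VIM/vimrc".

    Simplified model (assumption): use $VIM if it is set and non-empty,
    otherwise the compile-time "fall-back for $VIM" directory.
    """
    vim_dir = env.get("VIM") or fallback
    return posixpath.join(vim_dir, "vimrc")

if __name__ == "__main__":
    # Fall-back directories copied from the two transcripts above.
    print(system_vimrc({}, "/usr/local/share/vim"))        # /usr/local/share/vim/vimrc
    print(system_vimrc({}, "/usr/usc/vim/6.1/share/vim"))  # /usr/usc/vim/6.1/share/vim/vimrc
```

Passing `os.environ` as `env` checks the current shell's $VIM first.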

> > is the same as that of ~/.vimrc.  So why would one be motivated to search
> > for .vimrc when one simply wants to find out the syntax of vimrc files in
> > general?
>
> Because most people are users who edit .vimrc.  Not administrators who edit
> /etc/vimrc.

If you're looking for the vimrc file format and a search for ".vimrc"
matched only the dotted name, you'd rule out all the pages that may have
useful information but mention only "vimrc", not ".vimrc".  It also
targets a more specific group of users (UNIX users without admin
privileges), which means less benefit for the bigger group (all UNIX
users).  But the reality is, most UNIX users know that most dotfiles have
a global counterpart, so a search for either one suits them.  ".forward"
is the exception.

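The ".vimrc" versus "vimrc" trade-off comes down to tokenization. The toy model below is an assumption for illustration, not how Google actually indexes pages: it compares a tokenizer that drops all punctuation with one that keeps a leading dot, using two made-up pages. The function names are hypothetical.

```python
import re

def tokenize_strip(text):
    # Drop all punctuation: ".vimrc" and "vimrc" become the same token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def tokenize_keep_dot(text):
    # Keep one leading dot: ".vimrc" and "vimrc" are different tokens.
    return [t.lower() for t in re.findall(r"\.?[A-Za-z0-9]+", text)]

def matches(query, page, tokenize):
    """True if every query token occurs in the page."""
    page_tokens = set(tokenize(page))
    return all(t in page_tokens for t in tokenize(query))

# Two made-up pages: one says ".vimrc", the other only "vimrc".
PAGES = [
    "Put your settings in ~/.vimrc",
    "The system vimrc file lives in $VIM/vimrc",
]

if __name__ == "__main__":
    for name, tok in (("strip", tokenize_strip), ("keep-dot", tokenize_keep_dot)):
        for query in (".vimrc", "vimrc"):
            print(name, query, [matches(query, p, tok) for p in PAGES])
```

With the dot kept, a search for ".vimrc" misses the second page even though it describes the same file format; with punctuation stripped, both queries find both pages.
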
> > In such a scenario, the search engine dropping the dot is the
> > right behavior.
>
> That, my friend, is called axiomizing an unproven principle.  :)

????

Engrish are me 2th langage...

No, I'm saying that the *optimal behavior* for the search engine is also
the *generic behavior*, the one that *can* benefit a larger group of
people, so there's no compelling argument to add such a feature.
".forward", once again, is an exception, but it can be handled within the
bounds of the system by adding extra search terms.  Again, there's no
compelling argument to add the extra feature.

> > The only scenario I can think of where searching
> > specifically for ".vimrc" is more desirable than "vimrc" is if "vimrc"
> > isn't searchable, making ".vimrc" the more appealing search term, which is
> > the case for ".forward" because "forward" is a generic English term.
>
> Good example!  Can you think of any others?

It's a continuation of an argument.  Please read all three paragraphs.

> > But let's look at the statistics.  How many dotfiles do you know that
> > aren't searchable besides .forward because they conflict with another, more
> > popular or broader, term?  Just go ahead and check your home directory.
>
> .acrobat
> .adobe
> .audacity
> .Audacity
> .bidwatcher
> .bittornado
> .bittorrent
> .calendar
> .cvs
> .ddd
> .gnome
> .gtk
> .hugo
> .loki

.acrobat - you're not meant to edit it. it's also a directory.
.adobe - you're not meant to edit it. it's also a directory.
.audacity - graphically editable.
.Audacity - graphically editable.
.bidwatcher - what is this? bet it's not meant to be edited by hand.
.bittornado - you're not meant to edit it. it's also a directory.
.bittorrent - you're not meant to edit it. it's also a directory.
.calendar - what is this? bet it's not meant to be edited by hand.
.cvs - you're not meant to edit it. it's also a directory.
.ddd - it's a directory. internals are editable graphically.
.gnome - it's a directory. internals are editable graphically.
.gtk - it's a directory. search for gtkrc if you want to edit the rc.
.hugo - what is this?
.loki - you're not meant to edit it. it's also a directory.
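The first pass over a list like this (plain file or directory?) is easy to automate. A minimal sketch using only the standard library; `list_dotfiles` is a hypothetical helper, and whether a file is meant to be edited by hand still takes a human. On your own machine, call `list_dotfiles(os.path.expanduser("~"))`; the demo uses a throwaway directory so its output is predictable.

```python
import os
import tempfile

def list_dotfiles(home):
    """Return sorted (name, kind) pairs for entries in `home` whose names start with "."."""
    result = []
    with os.scandir(home) as entries:
        for entry in entries:
            if not entry.name.startswith("."):
                continue
            # Report symlinks separately instead of following them.
            if entry.is_symlink():
                kind = "link"
            elif entry.is_dir():
                kind = "dir"
            else:
                kind = "file"
            result.append((entry.name, kind))
    return sorted(result)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        os.mkdir(os.path.join(home, ".gnome"))
        open(os.path.join(home, ".forward"), "w").close()
        open(os.path.join(home, "notes.txt"), "w").close()
        for name, kind in list_dotfiles(home):
            print(f"{kind:4}  {name}")
    # prints:
    # file  .forward
    # dir   .gnome
```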

> Many others.

Names like "GTK" are unique enough that all searches will be GTK-related.
If you can't refine your search enough to find what you need, it's
probably not on an indexed page.  In that case, many of these programs have
homepages and mailing lists you can search easily.  If it's not in their
docs, it's not meant to be edited by hand, but if you must, you can check
their FAQ or ask on their mailing list.

> > Knowing these statistics, can a gigantic search engine like Google (or any
> > generic search engine) justify making an exception for such a class of
> > files?
>
> You already know my answer.  I assume that's supposed to be rhetorical?

-.-'

I guess.

> > If Google started making exceptions for everything, it really is going to
> > slow it down for everyone.
>
> Will it?
>
>    if (searchterm does not have dot as first character)
>       go about my normal business

Nope.  If they make an exception for everything with "if" statements,
then they're gonna end up with code like this:

   if (not dotfile) then        # what you want
     if (not c++) then          # what Google already does
       if (not in France) then  # French law requires filtering of Nazi pages
         ...
            then go about normal business

Then there are cases for all the "else"es.

All these exceptions will add more and more delay to every single search
you make, whether you're searching for dotfiles, not dotfiles, whatever.
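To make the cost argument concrete, here is a toy model (an assumption, not Google's actual design) in which each exception is a rule checked in order before the normal search path. `make_pipeline` and the rules are hypothetical; the point is that an ordinary query pays for every rule it fails to match.

```python
def make_pipeline(rules, default):
    """Build a search function that checks each (predicate, handler) rule in order.

    Returns (result, checks) so the number of rule checks is visible.
    """
    def search(query):
        checks = 0
        for predicate, handler in rules:
            checks += 1
            if predicate(query):
                return handler(query), checks
        return default(query), checks
    return search

# Hypothetical exceptions, echoing the nested "if"s in the pseudocode.
RULES = [
    (lambda q: q.startswith("."), lambda q: "dotfile search"),
    (lambda q: q.lower() == "c++", lambda q: "literal c++ search"),
]

search = make_pipeline(RULES, lambda q: "normal search")

if __name__ == "__main__":
    print(search(".vimrc"))  # ('dotfile search', 1)
    print(search("vimrc"))   # ('normal search', 2): pays for every rule
```

A linear chain is the simplest case and mirrors the nested "if"s; each rule added to `RULES` adds one more check to every non-matching query.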

Now as a Google engineer, can you justify the addition of this specific
exception for dotfiles when everyone wants Google customized for their own
circles?

> > And remember Google has to make these
> > exceptions for the googols of searches it really does every day.  Each
> > exception should be justified, and in my opinion, an exception for
> > dotfiles isn't justifiable.
>
> Spoil sport!!!

=P

> > > > I don't think a generic search engine like Google should be that specific.
> > >
> > > I'm simply advocating returning searches for "foo" if you ask for "foo".
> > > How is that specific?
> >
> > Then it should make that exception for every word, not just for dotfiles.
>
> Please justify this statement.

Making an exception for dotfiles benefits only a small group of people --
the UNIX users.  I call this "specific" since it's specific to a small
percentage of people that use Google.  To be more general, and benefit a
greater group of people, you'd have to allow ALL searches with symbols in
them.  But that's just waaaaaaay too much data to index if you think about
all the unique symbol combinations that exist (just think of all the pages
about regex -- pretty much every single regex example is unique on the
Internet. It's better to not index any of those, at least from Google's
perspective.)

> > But as they explained, doing that is too much for Google.
>
> And as Rick explained, that was a boiler plate response.  They fooled you
> with a canned response.   :(

No no no, Rick was saying that the e-mail was a boiler-plate response and
I agree.  And the original e-mail won't end up anywhere.

But the statistical information about what the users as a whole want will
reach the engineers at some point, and just as they made an
exception for the "c++" search term because it's so common, I think
they'll make an exception for dotfiles IF there is enough demand for it.

> > And as I mentioned in an earlier e-mail, making an exception for all words
> > with punctuation in front of the word has some problems as well.
>
> And as I explained, I don't care about all words with just any punctuation
> in front of them.

Google doesn't care what any small group of people want.  They want to
benefit a greater group of people.  And I'm simply stating why it's not
beneficial for Google to add support for dotfiles and why they're making
the right call on this.

> > > > It's up to the person doing the search to use the right keywords:
> > > >
> > > >    forward unix email
> > >
> > > Perhaps.  Good point.  But "forward unix email" isn't quite what a person
> > > searching for information on ".forward" is looking for.  Results can range
> > > anywhere from exactly what the person wants to configuring sendmail.
> >
> > Yes, but the right query will bring the right page to the top.
>
> Maybe.  Maybe not.

Let's analyze that when we end up with another search term people struggle
with.  But as I demonstrated, the .forward file can be brought up to the top
with a few extra words, and, as I stated, when a term can be searched
within the supported structure of the search engine, there's no incentive
for the search engine to modify itself.  So if there's a need to modify
the engine, let's find that unsearchable term first before we criticize
the engine that has been so good to us so far.
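The claim that a few extra words bring the .forward page to the top can be illustrated with a deliberately naive ranking: score each page by how many distinct query terms it contains. This is a sketch under that assumption (real engines weigh many more signals), with made-up pages and hypothetical function names.

```python
import re

def tokens(text):
    # Toy tokenizer: lowercase alphanumeric runs, punctuation dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank(query, pages):
    """Sort pages by how many distinct query terms they contain.

    sorted() is stable even with reverse=True, so ties keep input order.
    """
    q = tokens(query)
    return sorted(pages, key=lambda page: len(q & tokens(page)), reverse=True)

# Made-up pages for illustration.
PAGES = [
    "Forward this article to a friend",
    "Fast forward through the news",
    "Unix email: create a .forward file to forward your mail",
]

if __name__ == "__main__":
    print(rank("forward", PAGES)[0])             # Forward this article to a friend
    print(rank("forward unix email", PAGES)[0])  # Unix email: create a .forward file to forward your mail
```

With "forward" alone every page ties and the .forward page stays last; adding "unix email" lifts it to first.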

-Mark

PS: Remember when all you could do with Google was just type in words?  No
boolean, no spaces, no symbols, no nothing...


-- 
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.php?uid=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech




Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.

Sponsored in part by:
EDGE Tech Corp.
For donating some give-aways for our meetings.