Topic: standardizing a compatibility metric


Author: allan_w@my-dejanews.com (Allan W)
Date: Wed, 12 Feb 2003 20:08:13 +0000 (UTC)
Raw View
danielgutson@hotmail.com (danielgutson@hotmail.com) wrote
> What I propose is to standardize a list of opensource applications, and
> measure how many times a [new candidate] keyword appears there.
> I know how inaccurate this might be, but I think it could be
> a starting point rather than 'nothing', just as a metric.

As you recognize, the biggest problem with a new keyword is that it
could conflict with existing code. At some point, each shop would
have to get someone to go through every module of source code
looking for the keyword. This could be done en masse before
adopting a new compiler that implements the keyword, or it could
be done on a per-project basis the first time that the project
would be recompiled with that new compiler. Either way, the process
is slow, painful, and error-prone.

What you're suggesting is doing a limited version of this process
in advance. Rather than using our own proprietary code (the code that
we have a vested interest in maintaining!), you suggest that we use
opensource code -- presumably because it's both neutral and verifiable.

The trouble is, doing this is still slow, painful, and error-prone.

What might make more sense is for one interested party (i.e. you) to
do this in advance, for ALL possible keywords:
   1. Collect as much opensource source code as you can. Record the
      date and version number of each source for reference purposes.
   2. Write a preprocessor that strips out comments and literals,
      and tokenizes what's left, discarding non-alphanumeric tokens.
   3. Combine, sort, and count.
   4. Make the database (and the preprocessor) available publicly.
Slightly more work than your original proposal. But this database
need only be generated once (well, maybe once a year). It would be
invaluable for all new proposals that might introduce new keywords.
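
A minimal sketch of steps 2 and 3, in Python for brevity. The names
(strip_comments_and_literals, count_words) are hypothetical, and the
regular expression is an assumption that only approximates C++ lexing:
it handles block comments, line comments, and string and character
literals with escapes, but not raw strings, line continuations, or
preprocessor subtleties.

```python
import re
from collections import Counter

# Matches, at the leftmost position, a block comment, a line comment,
# a string literal, or a character literal (escape sequences allowed).
_SKIP = re.compile(
    r'/\*.*?\*/'                # /* block comment */
    r'|//[^\n]*'                # // line comment
    r'|"(?:\\.|[^"\\\n])*"'     # "string literal"
    r"|'(?:\\.|[^'\\\n])*'",    # 'c' character literal
    re.DOTALL,
)

# A word starts with a letter or underscore; \b keeps numeric suffixes
# such as the "x1F" in "0x1F" from being counted as words.
_WORD = re.compile(r'\b[A-Za-z_]\w*', re.ASCII)


def strip_comments_and_literals(source: str) -> str:
    """Replace every comment and literal with a single space."""
    return _SKIP.sub(' ', source)


def count_words(sources) -> Counter:
    """Combine and count the words left in an iterable of source texts."""
    counts = Counter()
    for text in sources:
        counts.update(_WORD.findall(strip_comments_and_literals(text)))
    return counts
```

Calling count_words on the text of every collected file yields one
table covering both keywords and identifiers; sorting is then just
counts.most_common().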

Hypothetical example:
    "I looked it up -- you're right, the word 'with' was used 3
    times. We should use 'withobject' instead, which wasn't found
    at all."
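
A lookup like the one in that hypothetical exchange could be sketched
as below. The database contents are made up to match the example (3
hits for 'with', none for 'withobject'), and rank_candidates is a
hypothetical helper, not part of any existing tool.

```python
from collections import Counter

# Hypothetical word counts, mirroring the made-up example above.
database = Counter({"with": 3})


def rank_candidates(database, candidates):
    """Return (count, word) pairs, least-used candidate first."""
    return sorted((database[word], word) for word in candidates)


for count, word in rank_candidates(database, ["with", "withobject"]):
    print(f"{word}: found {count} time(s)")
```

A Counter returns 0 for words it has never seen, so a candidate that
is absent from the database needs no special handling.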

It might also be interesting to examine the usage statistics of
existing keywords.
    "The word 'using' was found 15,200 times, but the word
    'mutable' was found only 4,218 times..."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: danielgutson@hotmail.com (danielgutson@hotmail.com)
Date: Fri, 14 Feb 2003 06:38:15 +0000 (UTC)
Raw View
allan_w@my-dejanews.com (Allan W) wrote in message news:<7f2735a5.0302121107.166733a7@posting.google.com>...
> What might make more sense is for one interested party (i.e. you) to
> do this in advance, for ALL possible keywords.

Wow, it's never too late for surprises :) I thought that nobody had
read the message :)

OK, I agree with your suggestion, and I will do it (hopefully not
uselessly).

I will add an entry to my personal page for suggesting opensource
projects by submitting the URL and the 'domain' (application type:
user interface, scientific, etc.).

I will make a simple preprocessor (I already have one) that strips
comments and splits 'keywords' from 'identifiers'.
The statistical information I propose to gather is:
 [word, ident/keyword] ---< [application domain, # occurrences]

 ( ---< means one-to-many )

and also the following:
   total_occurrences / total_LOC
   total_occurrences / total_#_sources

Finally, a ranking of identifiers sorted by these two coefficients
will be available.
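
The statistics above could be computed along these lines. This is only
a sketch under stated assumptions: SourceStats, per_domain,
coefficients, and ranking are hypothetical names, each source is
assumed to arrive with its word counts and line count already
measured, and the keyword/identifier split is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceStats:
    domain: str   # application domain, e.g. "gui" or "scientific"
    loc: int      # lines of code in this source
    counts: dict  # word -> occurrences in this source


def per_domain(sources):
    """word -> {domain -> occurrences}: the one-to-many relation."""
    table = defaultdict(lambda: defaultdict(int))
    for src in sources:
        for word, n in src.counts.items():
            table[word][src.domain] += n
    return table


def coefficients(sources):
    """word -> (occurrences per LOC, occurrences per source)."""
    total_loc = sum(src.loc for src in sources)
    if total_loc <= 0:
        raise ValueError("need at least one source with lines of code")
    totals = defaultdict(int)
    for src in sources:
        for word, n in src.counts.items():
            totals[word] += n
    return {w: (n / total_loc, n / len(sources)) for w, n in totals.items()}


def ranking(sources):
    """Words ordered from most to least frequent by both coefficients."""
    coeffs = coefficients(sources)
    return sorted(coeffs, key=lambda w: coeffs[w], reverse=True)
```

Because both coefficients share the same numerator, they rank words
identically within one corpus; they differ only when corpora of
different sizes are compared.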

A simple query form will also be available on my page.

Any additional suggestions?

      Daniel.

ps: no promises about 'when'  :)
ps2: one experiment would be to look for words with '0' occurrences,
taken from a dictionary (never used; I guess 'acquitance' would be an
example ;) )






Author: danielgutson@hotmail.com (danielgutson@hotmail.com)
Date: Wed, 29 Jan 2003 14:05:36 +0000 (UTC)
Raw View
Hi there,
  I suggest some 'compatibility' metric for proposals.
This can be very complex, but to start with we could focus on new
keywords.
What I propose is to standardize a list of opensource applications, and
measure how many times a [new candidate] keyword appears there.
I know how inaccurate this might be, but I think it could be
a starting point rather than 'nothing', just as a metric.

1 - I guess everybody agrees that some kind of metric should be used.
2 - Given that, I propose that we vote on a reference list of opensource
applications.
This list could involve different 'kinds' of applications (GUIs,
compilers, etc.) and libraries.

What do you think about this?

PS: dreaming here, but ideally, if this works, somebody could provide
a site for this.

  Daniel.
