
Re: Tale and the SRH-reorg (Was: Re: Charter changes?)



In article <ghenDuGMCI.B7w@netcom.com>,
Ajay Shah  <ajay@mercury.aichem.arizona.edu> wrote:
>Anotyher untruthful claim based on untruthful and perhaps intentionally skewed
>statistics, compiled during the first 2-3 months of the operation of
>soc.religion.hindu.  

Look at http://www-ece.rice.edu/~vijaypai/srh-meth.html

SRH-Stats: an elaboration

About the soc.religion.hindu stats page...

Since I was the one who compiled most of the data on the SRH
Statistics page, http://www-ece.rice.edu/~vijaypai/srh-stats.html, I
feel that I should be the one to explain how the data was compiled,
what the data means, and how one can independently verify its
correctness.  Hopefully, this article will put to rest the baseless
"skewed stats" claims people have been making.

How the data was compiled: 

I downloaded the entire contents of the SRH archive one night, and
stored it on my local machine. From this, I extracted all of the
posting date lines in the files. The archive is built with a program
called MHonArc, so it has a predictable structure. MHonArc can be used
to build either a mail archive or a news archive. When it is used as a
mail archive, the submissions to SRH get archived. When it is used as
a news archive, the articles themselves, as they would appear in the
newsgroup, get stored. If it is used as a mail archive, it cannot
generate thread indexes, because the messageIDs for the mail
submissions are not the same as the messageIDs for the posts sent on
the newsgroup. If you look, all of the posts in the archive have
messageIDs from uc.edu, whereas the mail messageIDs would originate
from your host. Therefore, a quick check of the SRH archive will show
that it is a news archive, not a mail archive.

The datestamps in the archive show the time the articles appeared on
the newsgroup itself, and from this, we can determine how frequently
Ajay cleared messages. Since the messages are cleared in "batches",
the datestamps will also appear clustered. From a list of the
datestamps, it's quite easy to see when a new "batch" was cleared,
since there's a long gap between two consecutive messages. So, I took
all of the articles in the archives, extracted their datestamps, and
worked from there.
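
For anyone who wants to repeat the exercise, here is a rough sketch in
Python of the batch-detection idea described above. It is not the
script used for the stats page. The function names, the 6-hour
threshold for a "long gap", and the sample datestamps are all
assumptions made up for illustration, and the gap is measured from the
last article of one batch to the first article of the next, which is
only one reasonable definition.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def find_batches(date_lines, min_gap=timedelta(hours=6)):
    """Group datestamps into approval batches.

    A new batch starts whenever the time since the previous article
    exceeds min_gap (an assumed threshold, not from the stats page).
    """
    stamps = sorted(parsedate_to_datetime(d) for d in date_lines)
    batches = []
    for ts in stamps:
        if batches and ts - batches[-1][-1] <= min_gap:
            batches[-1].append(ts)   # close to the previous article: same batch
        else:
            batches.append([ts])     # long gap (or first article): new batch
    return batches


def approval_gaps(batches):
    """Time from the end of each batch to the start of the next one."""
    return [b[0] - a[-1] for a, b in zip(batches, batches[1:])]


# Made-up sample datestamps, NOT taken from the SRH archive.
sample = [
    "Mon, 01 Jan 1996 10:00:00 GMT",
    "Mon, 01 Jan 1996 10:05:00 GMT",
    "Thu, 11 Jan 1996 09:00:00 GMT",
    "Thu, 11 Jan 1996 09:02:00 GMT",
    "Thu, 11 Jan 1996 09:10:00 GMT",
]
batches = find_batches(sample)
gaps = approval_gaps(batches)
print(len(batches), "batches; gaps:", [str(g) for g in gaps])
```

With the sample input, the two clusters of datestamps come out as two
batches with a single gap between them. Changing min_gap changes what
counts as a separate approval event, so the threshold should be stated
along with any numbers derived this way.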


What the data means:

The SRH Stats page goes to a fair amount of trouble to explain what
the data means, but it's not surprising that some people have not read
it, or give explanations which try to avoid it. The main part of the
data is a graph which shows when posts were approved, and the time
since the last approval event. The time between approvals is called
the approval gap, and is a reflection of how often the moderator
clears posts. It is not based on anything other than the articles
which appear on SRH, since the only timestamp data available from the
archive are the times when the articles appeared on SRH.

Now, someone might claim that the timestamps are not tied to the times
the articles were cleared, but instead are the times when the articles
were archived, but that is clearly incorrect, and the next section
will explain how to verify that the claim is false.

It's been claimed that the 10-day delay is "mostly hoax", whatever
that means. It has been suggested that some of those articles required
bouncing from one address to another, or consultation with the author,
or something else. However, what this claim misses is that the delay
measured isn't the delay from the time the article was submitted to
the time it was posted - it is the delay between approval events. So,
the 10-day gap is not a maximum at all. In fact, if the moderator
claims that posts can wait in some queue before they get considered,
and if this happened to any of the posts that were posted after that
10-day gap, then these posts could have encountered _more_ than 10
days between their submission and their approval. However, I have made
no attempt to measure this, because that data is not available from
the archives.
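
To make the distinction concrete, here is a small Python illustration
with invented timestamps (none of them come from the archive, and
wait_time is a hypothetical helper). It only shows the arithmetic: a
post submitted after the previous approval event waits less than the
gap, while a post that was already in the queue before that event
waits longer than the gap.

```python
from datetime import datetime

# Invented approval events, 10 days apart (illustration only).
prev_approval = datetime(1996, 1, 1, 10, 0)
next_approval = datetime(1996, 1, 11, 10, 0)
approval_gap = next_approval - prev_approval


def wait_time(submitted, approved=next_approval):
    """Delay between a (hypothetical) submission and its approval."""
    return approved - submitted


# Submitted two hours after the previous approval event.
after_prev = wait_time(datetime(1996, 1, 1, 12, 0))

# Already queued two days before the previous approval event,
# but not cleared until the next one.
before_prev = wait_time(datetime(1995, 12, 30, 10, 0))

print("gap:", approval_gap)
print("submitted after previous approval:", after_prev)
print("queued before previous approval:", before_prev)
```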

Further evidence which suggests that the "mostly hoax" explanation is
itself a hoax is the chart of posts cleared versus date
(http://www-ece.rice.edu/~vijaypai/plot4.gif). Note that on the day
after the 10-day gap, 22 posts were cleared. I wouldn't suggest that
all 22 of them were submitted immediately after the previous approval
event, but the data on the graph of # posts versus delay
(http://www-ece.rice.edu/~vijaypai/plot5.gif) seems to suggest that
around 10 of them were probably submitted immediately. So, the
explanations offered for the delay clearly don't explain away the
10-day delay. On one hand, the "explanation" provided by the moderator
doesn't factor into the measurement of the approval gap, and on the
other hand, the gap affects more than a mere handful of posts.


Independent verification:

The crux of verification is that the datestamps reflect the time the
post appeared on the newsgroup, not the time the post was sent to the
archive. Independent verification is easy, since there are tools which
archive UseNet. One of these is called Alta Vista, and is available at
http://altavista.digital.com. Another is DejaNews, and it is available
at http://www.dejanews.com. Using either of these tools, you can find
articles which were posted to soc.religion.hindu. Once you find an
article, you can compare its timestamp with the timestamp of the same
article in the archives. By doing so, you can see that the timestamps
in the archives reflect the approval time, not the archive time.
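
If you want to do the comparison mechanically, something like the
following Python sketch would work. It assumes you have copied the
Date line of the same article from the archive and from one of the
search tools by hand. The function name, the 5-minute tolerance, and
the example datestamps are my own assumptions for illustration, and it
does not query Alta Vista or DejaNews itself.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def timestamps_agree(archive_date, usenet_date,
                     tolerance=timedelta(minutes=5)):
    """Return True if two Date lines refer to (nearly) the same moment.

    Both arguments are RFC 822 style date strings with a time zone;
    parsing them as zone-aware datetimes lets different zones compare
    correctly.
    """
    a = parsedate_to_datetime(archive_date)
    u = parsedate_to_datetime(usenet_date)
    return abs(a - u) <= tolerance


# Invented examples: same instant written in two time zones,
# and a pair that is three days apart.
same = timestamps_agree("Thu, 11 Jan 1996 09:00:00 GMT",
                        "Thu, 11 Jan 1996 04:00:00 -0500")
different = timestamps_agree("Thu, 11 Jan 1996 09:00:00 GMT",
                             "Sun, 14 Jan 1996 09:00:00 GMT")
print(same, different)
```

If the archive datestamps were really archiving times rather than
posting times, a check like this would fail for most articles.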


In conclusion, I stand by the correctness of the SRH Stats page, and
if anyone still wishes to make the claim that the stats are "skewed",
the burden of proof is now upon them.

-Vivek

