Are we wasting our time?/Some thoughts...
carolynwhite2
carolynwhite2 at aol.com
Fri Sep 24 14:41:59 UTC 2004
There seem to be two issues here, or possibly three if you include
the legal POV.
Firstly, what people are likely to want from an improved search
function.
Secondly, whether the way we are working is the best/most efficient
way of creating the improved search.
(1) What do people want?
IMO, people want the following:
(a) to search for threads on topics/characters
(b) to search for posts by a particular author
(c) to be offered content to browse
(d) to have irrelevant or repetitive content filtered out
(e) to have their attention drawn to outstanding posts
I think it is important to have functions that enable you to look for
what you want, but also menus to browse, which may prompt a whole
string of new ideas. People really have no idea what is in the
archives, and alongside their quest to find everything ever written
on, e.g., Snape, they should also be offered the chance to find
collections of posts on a myriad of other topics that they'd never
even thought of.
Once people find the type of content they are looking for, I cannot
see why anyone would want to plough through endlessly repetitive,
incorrect or trivial/mainly OT posts in their attempt to read up on
past ideas. By the same token, surely anyone would like to have what
are considered to be FPs flagged up, if only for the opportunity of
disagreeing with the assessment?
I am stating the blindingly obvious perhaps, but I think it is
necessary in order to point up the disadvantages of simply text-
searching the complete body of past posts. Anyone who uses Internet
search engines regularly will be painfully aware of the difference
between knowing exactly what you are looking for and locating it, and
searching for relevant hits on questions you are still formulating;
the rubbish quotient is formidable. As Debbie says (648):
>>>Text searching on a CD will bring up thousands of posts that will be
eliminated from the catalogue. Thus, the catalogue should be a much
better tool for directing readers to posts that are worth reading.
As a result, if we ever want FPs to actually be written, we'll want
a catalogue.<<<
This issue has come up before. David (526) said:
>>>I confess part of my thinking here is that, given we have a database
of the entire list here, why not use it? It would be great to be able
to call up all posts that, say, include the text
string 'Hermione' and are categorised as 'Snape'. This would give a
different (but overlapping) set to those categorised as 'Hermione'
and including the string 'Snape'. What about all posts
categorised 'Snape' but *not* including the text string 'vampire'?
Or all such posts whose author was Pippin, posted between GOF
release and OOP release. Really, I think you are sitting on
something of a goldmine here, and to focus only on the categories is
to miss out.<<<
My point is that it is important to have both the category approach,
to prompt investigation, plus other types of search functions to
locate content when you are more sure of what you are looking for.
However, I am saying that it is not worth our while, or IMO, the
members' time, to include all past posts in the content to be
searched. Currently we are rejecting over 60%, and I think that is a
good thing.
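
A rough sketch of how David's combined queries might look once posts carry category codes. Everything here is an assumption made for illustration: the `Post` fields, the `search` function, the `accepted` flag standing in for our weeding, and the sample posts and dates are all invented and are not taken from the real database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical record layout; the real catalogue fields may differ.
    number: int
    author: str
    posted: date
    categories: frozenset
    text: str
    accepted: bool = True  # False = weeded out by the cataloguers


def search(posts, category=None, contains=None, excludes=None,
           author=None, start=None, end=None):
    """Return accepted posts matching every filter that is supplied."""
    results = []
    for p in posts:
        if not p.accepted:
            continue  # rejected posts never reach the searcher
        if category is not None and category not in p.categories:
            continue
        body = p.text.lower()
        if contains is not None and contains.lower() not in body:
            continue
        if excludes is not None and excludes.lower() in body:
            continue
        if author is not None and p.author != author:
            continue
        if start is not None and p.posted < start:
            continue
        if end is not None and p.posted > end:
            continue
        results.append(p)
    return results


# Invented sample posts, for illustration only.
posts = [
    Post(1, "Pippin", date(2002, 1, 5), frozenset({"Snape"}),
         "Hermione suspects Snape is a vampire."),
    Post(2, "Pippin", date(2002, 3, 9), frozenset({"Snape"}),
         "Hermione defends Snape in the Shack."),
    Post(3, "David", date(2002, 4, 1), frozenset({"Hermione"}),
         "Snape marks Hermione unfairly."),
    Post(4, "Dan", date(2002, 5, 2), frozenset({"Snape"}),
         "Me too!", accepted=False),
]

snape_mentioning_hermione = search(posts, category="Snape", contains="Hermione")
hermione_mentioning_snape = search(posts, category="Hermione", contains="Snape")
snape_without_vampire = search(posts, category="Snape", excludes="vampire")
```

The first two queries return different sets (posts 1 and 2 versus post 3), which is the overlap-but-not-identity David describes. The rejected post never appears in any result.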
Paul has recently put forward some new ideas on how to build this
part of the catalogue which we are going to discuss with Tim shortly.
Just as soon as something clear emerges, I'll post it here for
comment from you all. It is perhaps the second most important part of
the catalogue, after the initial weeding, sorting and
categorising of the posts themselves.
Finally, Paul suggests (650) that the practicality of the CD approach
is severely limited by people's computing power:
>>>Once they get the files, their PC must be zippy enough to search
the files. I imagine only a tiny percentage of people have the tools
to speedily and conveniently search these files. I know I don't.
(Example I know would be X1 or Lotus Magellan.)<<<
The eventual catalogue presented to the members will be easily
searchable by anyone who can access the HPfGU website. To aim for
anything less defeats the object - which is to try and get everyone
to read, think and add to what has gone before, rather than keep re-
inventing the wheel.
I also really like Tim's idea of having a 'what's hot' section
eventually, which keeps tabs on the areas attracting most posts at
any particular time. Ahem, this assumes a catalogue team able to
continue on and on into the far distant future...
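
As a minimal sketch of what a 'what's hot' listing could be, assuming each coded post is available as a (date, list of categories) pair: count the category codes on posts from the last few weeks and show the most frequent. The function name `whats_hot`, the 30-day window and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def whats_hot(coded_posts, today, days=30, top=3):
    """Rank categories by number of posts in the last `days` days."""
    cutoff = today - timedelta(days=days)
    counts = Counter()
    for posted, categories in coded_posts:
        if cutoff <= posted <= today:
            counts.update(categories)  # one count per category on the post
    return counts.most_common(top)


# Invented sample data, for illustration only.
coded_posts = [
    (date(2004, 9, 20), ["Snape", "Occlumency"]),
    (date(2004, 9, 21), ["Snape"]),
    (date(2004, 9, 1), ["Prophecy"]),
    (date(2004, 6, 1), ["Snape"]),  # outside the window, ignored
]

hottest = whats_hot(coded_posts, today=date(2004, 9, 24), top=1)
```

This only works if new posts keep being coded, which is exactly the ongoing-team assumption noted above.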
(2) Our approach - right/wrong?
Dan comments (647):
>>>Carolyn - oh, absolutely, if it's possible. This is what I was
doing on my own as well. I always felt that the catalogue project, if
possible, should take this kind of thing and provide "approaches" to
the text, ways to look at it, suggestions of catagories or kwic
(keyword in context) ways to see it, results from some
people's "regular expression" searches on the material.
What it doesn't do, however, is deal with cataloguing FPs.<<<
From a past YM conversation with Dan, what I think he is referring to
here is an approach which takes the existing body of 100,000+ posts,
and first of all sets out to allocate them to a number of main
categories. He then suggests that we go over those five main
categories again using ever finer definitions.
The five main categories he suggested we might use were: reject,
meta, star (funny, cute etc.), plot (incl characters, WW), outcome
theory. He felt that reading posts initially to reject or accept
would be very fast, and would quickly separate out the wheat from the
chaff. The task of refining the coding on the accepted posts would
then be a lot quicker.
The kwic (key word in context) approach would be used (I think) to
determine both the initial main categories and the subsequent sub-
categories. Kwic describes how people quickly decide what gets their
real attention, and what is just scanned (for example in assessing
posts to read on the main list). It equates to a list of personal
buzz words/phrases, in effect.
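
In the usual indexing sense, a kwic listing shows every occurrence of a chosen word with a few words of context either side, so that a reader can scan down the column and judge quickly which posts deserve a full read. A minimal sketch, assuming posts are available as (number, text) pairs; the function name `kwic`, the `width` parameter and the sample posts are invented for illustration.

```python
import re


def kwic(posts, keyword, width=3):
    """Return (post number, left context, keyword, right context) rows."""
    rows = []
    for number, text in posts:
        words = re.findall(r"[\w']+", text)  # crude word splitter
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append((number, left, word, right))
    return rows


# Invented sample posts, for illustration only.
sample = [
    (101, "I still think Snape is loyal to Dumbledore"),
    (102, "Nobody trusts Snape"),
]

for number, left, word, right in kwic(sample, "snape", width=2):
    print(f"{number:>5}  {left:>15} [{word}] {right}")
```

A listing like this could support either route: a quick accept/reject pass, or checking whether a proposed category word really turns up in the contexts we expect.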
Essentially, he is arguing against building up/refining the list of
categories from scratch as we have done, based on what we find post-
by-post, but proposing a top-down approach, based on people's
collective search preferences. [Correct me if I have misunderstood,
Dan; I am also not sure how far you are suggesting that any of this
is automated].
My reservations about this approach are as follows:
(a) It is just not possible to run through 100,000+ posts any quicker
than we are doing. If we are reading them once, and deciding whether
to accept or reject, in my view it takes very little extra time to
add the appropriate coding. The ones that take a long time to read
and code would take just as long either route.
(b) The resulting main categories and sub-categories from the kwic
approach are unlikely to be vastly different from those we are
working with anyway.
(c) The organic, bottom-up approach reflects the way the list
developed and evolves with the subjects. As we continue, we will be
able to see which subjects attract the most posts in a highly
scientific way, and make decisions on which topics to drop through
lack of interest, and which to fragment further to reflect their
growing complexity as ideas build on ideas. I think this reflects the
membership's interests over time as accurately as Dan's proposed
method.
However, if there is a consensus that we have set ourselves an un-
doable task and we would be better to stop and re-think, and consider
a different approach, then do make some constructive suggestions, all
of you.
Carolyn
Masochistically pleased to be able to get back into the catalogue
again after a few days' absence.