Welcome and questions on Stage One
a_reader2003
carolynwhite2 at aol.com
Mon Feb 23 18:57:42 UTC 2004
Welcome everyone to the new HP catalogue group, and thanks again to
Kelley for setting this up so quickly for us; I hope it helps with
communication.
As we have all got the various background documents, I'll launch
straightaway into a series of topics that we probably need to discuss
and agree before we go on. I've divided them up into different posts,
as it will probably help to deal with them separately. The topics
are:
1 (this post) stage one of the plan and some technical stuff
2 stage two reject codes
3 stage two subject categories
Looking forward to your responses and further questions of your own.
STAGE ONE
We are looking to index approximately 100 000 posts on the main,
current HPfGU list, plus approx 7000 posts on a pre-Aug 2000 archive
list. (The early list is the beginning of the group before it
transferred to Yahoo.)
It's a lot of posts, and the first time-consuming task is to get the
complete message index in one place for people to work from. In my
plan, under 'Stage One', I simply suggested cutting and pasting the
message indexes from the HPfGU site onto Excel worksheets. There are
two possible ways this can be speeded up.
Firstly, Paul Kippes, the Admin team's technowizard (who maintains
the back-up archives), could probably cut and paste the whole index in
one go onto Excel for us. Kelley has just sent him an email to see if
he would be prepared to give us some help on this project, and this
is one of the first questions I'd like to put to him. My other
questions, apart from whether he can do it, are:
- What is the best size of spreadsheet to work with? (Units of
500, 1000 posts or whatever? Threads are no respecters of arbitrary
boundaries, so it has to be easy to follow on from one sheet to the
next when tracing an argument.)
- Where should the Excel sheets be based: on our individual
computers, or on a server somewhere? They certainly need to end up
in one central place for obvious reasons.
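
To get a feel for the sheet-size question, here is a minimal Python
sketch that works out which message numbers would land on each sheet.
The function name sheet_ranges and the assumption that posts are
numbered consecutively from 1 are illustrative only, not part of the
plan.

```python
def sheet_ranges(first_msg, last_msg, sheet_size=1000):
    """Split a run of consecutive message numbers into per-sheet ranges.

    Returns a list of (start, end) pairs, both ends inclusive.
    Assumes message numbers are consecutive (hypothetical).
    """
    ranges = []
    start = first_msg
    while start <= last_msg:
        end = min(start + sheet_size - 1, last_msg)
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Roughly 100 000 posts on the main list, at two candidate sheet sizes.
    for size in (500, 1000):
        print(size, "posts per sheet:", len(sheet_ranges(1, 100000, size)), "sheets")
```

On these assumptions, 100 000 posts come to 200 sheets at 500 per
sheet, or 100 sheets at 1000 per sheet. Fixed ranges like this will
still cut across threads, which is why following on from one sheet to
the next matters.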
Secondly, there is a rather more complex solution, which would take
longer to set up but would save a lot of time later. The gist is that
all the old posts, with their existing headings, could be put into a
database (not Excel), and Carolina could write a link programme
between this database and the website we use to access it. This would
enable us not only to work on it simultaneously, but also to overcome
any computer incompatibility problems that might crop up, especially
as more people join the project as we go on. My questions relating to
this solution are:
- Can the archive posts just be dumped into such a database,
and would you do it all at once, or in orderly chunks of e.g. 1000 at
a time?
- How long would it take?
- Where would the database be located (on what server, as with the
Excel sheets)?
- Which website would we use to access it, and would there be
any major problems with multiple people accessing it at any one time?
- Will it still be possible to speed-read the posts one after
another if we do this? (This is central to rejecting and coding them
up.)
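
As a rough illustration of the database idea, here is a minimal
sketch using Python's built-in sqlite3 module. The table layout
(msg_no, subject, author, code) and the function names are
assumptions made up for the example; the real database, server and
link programme are still to be decided. It loads the index in chunks
and then hands back the next uncoded post in message order, which is
the "read one after another" behaviour asked about above.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) a database with one hypothetical 'posts' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " msg_no INTEGER PRIMARY KEY,"
        " subject TEXT NOT NULL,"
        " author TEXT,"
        " code TEXT)"  # reject code or subject category, empty until coded
    )
    return conn


def load_in_chunks(conn, rows, chunk_size=1000):
    """Insert (msg_no, subject, author) rows, committing one chunk at a time."""
    rows = list(rows)
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        with conn:  # commits the chunk, or rolls it back on error
            conn.executemany(
                "INSERT INTO posts (msg_no, subject, author) VALUES (?, ?, ?)",
                chunk,
            )


def next_uncoded(conn, after=0):
    """Return the next post with no code yet, in message-number order."""
    return conn.execute(
        "SELECT msg_no, subject, author FROM posts"
        " WHERE code IS NULL AND msg_no > ?"
        " ORDER BY msg_no LIMIT 1",
        (after,),
    ).fetchone()
```

SQLite is used here only because it needs no server to try out; with
several people working at once, a server-based database may well be
the better fit.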
The overall advantage of this second solution, although it sounds
rather complex, is that once you have coded up the posts, you can
then do much more sophisticated searches for topics than you would
be able to do in Excel. This will help later, when you want to group
sets of posts together in threads for further editing (if we ever get
that far...). It would also become a great resource for HPfGU members
to search, if it were put up on the main site in a read-only form.
However, if we stick with Excel and solution one, it is possible to
write search routines that can find strings of words and characters,
to enable us to group together individual posts belonging to threads.
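
One simple form such a search routine could take is matching on the
subject line. The sketch below, in Python rather than Excel, strips
reply prefixes such as "Re:" and groups message numbers by what is
left. The function names and sample subjects are invented for
illustration, and it will miss threads whose subject line was changed
part-way through.

```python
import re
from collections import defaultdict

# One or more leading "Re:", "Fw:" or "Fwd:" prefixes, in any letter case.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalise_subject(subject):
    """Drop reply/forward prefixes, lower-case, and collapse whitespace."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def group_by_thread(posts):
    """Group (msg_no, subject) pairs by normalised subject line."""
    threads = defaultdict(list)
    for msg_no, subject in posts:
        threads[normalise_subject(subject)].append(msg_no)
    return dict(threads)


if __name__ == "__main__":
    sample = [
        (1, "Snape's motives"),
        (2, "Re: Snape's  motives"),
        (3, "RE: Re: snape's motives"),
        (4, "Ships"),
    ]
    for subject, numbers in group_by_thread(sample).items():
        print(subject, numbers)
```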
Thoughts please (and Carolina, apologies if I have not explained
correctly).
Carolyn