Glory be! The ratio! (OoP)

Wed Jan 15 16:12:50 UTC 2003

> Tabouli:
> >The length of these books seems to be going up exponentially. In fact... this
> >sounds like a mission for David. There must be some mathematical way to check
> >whether the page count best fits an exponential curve, mustn't there?

Ksnidget:
> Well at the risk of displaying my inner geek in public.
> 
> Using the following data (page numbers for earlier books taken
> from the Bloomsbury shop online web site)
> 
> 1    190
> 2    256
> 3    317
> 4    636
> 5    768

Oh, a *rival*.

OK, let's quarrel about the data first.

You have used Bloomsbury's pagination: the trouble with this is that 
GOF was produced in a larger font (but same page size) than the 
first three.  And we don't know what the font will be for OOP.

Because of this, I have used the mid ranges of Devin's data on 
Scholastic, who have been consistent: see message 9849 here.

You can then take Bloomsbury's announced figures of 191,000 words for 
GOF and 255,000 words for OOP and scale GOF's Scholastic page count 
by their ratio.  That gives a Scholastic-equivalent page count of 981 
for OOP, assuming that JKR's average word length is unchanged.
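
For anyone checking the arithmetic, here is a minimal sketch of that 
scaling.  The function name is made up for illustration.  The GOF 
page count from Devin's data is not quoted in this message, so the 
sketch runs the ratio backwards from 981 to see what GOF figure it 
implies, rather than assuming one.

```python
def scale_pages(known_pages: float, known_words: int, target_words: int) -> float:
    """Scale a page count by a word-count ratio (same words per page assumed)."""
    return known_pages * target_words / known_words


# Word counts from Bloomsbury's announcement, as quoted in the post.
GOF_WORDS = 191_000
OOP_WORDS = 255_000

# The post's Scholastic-equivalent estimate for OOP.
OOP_PAGES_ESTIMATE = 981

# Run the ratio backwards: which GOF page count is consistent with 981?
implied_gof_pages = scale_pages(OOP_PAGES_ESTIMATE, OOP_WORDS, GOF_WORDS)
print(round(implied_gof_pages, 1))  # 734.8
```

So the estimate is consistent with a Scholastic GOF figure of about 
735 pages.  That number is inferred from the ratio, not taken from 
message 9849.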

That gives the following fits, where B6 and B7 are the predicted page 
counts for books 6 and 7:

Linear: R2 0.90, B6 1100, B7 1250
Quadratic: R2 0.99, B6 1400, B7 1900
Exponential: R2 0.94, B6 1300, B7 1700

all in Scholastic-edition pages.

Quadratic is bound to give a better fit (to the existing data) than 
linear since it has an extra degree of freedom.
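
A minimal sketch of how such fits can be run, in plain Python.  Two 
assumptions to flag loudly: the Scholastic figures are not listed in 
this message, so the sketch uses the Bloomsbury numbers quoted from 
Ksnidget above and will NOT reproduce the R2 and B6/B7 values given 
here; and the exponential is fitted as a straight line through 
ln(pages), with R2 measured in pages, which may not match what a 
spreadsheet trendline reports.  All function names are illustrative.

```python
import math

# Bloomsbury page counts as quoted from Ksnidget (not the Scholastic data).
books = [1, 2, 3, 4, 5]
pages = [190, 256, 317, 636, 768]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients, lowest power first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(ys, fitted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def make_models(xs, ys):
    """Return the three fitted curves as callables of the book number."""
    lin = polyfit(xs, ys, 1)
    quad = polyfit(xs, ys, 2)
    log_fit = polyfit(xs, [math.log(y) for y in ys], 1)  # ln y = a + b x
    return {
        "linear": lambda x: polyval(lin, x),
        "quadratic": lambda x: polyval(quad, x),
        "exponential": lambda x: math.exp(polyval(log_fit, x)),
    }


models = make_models(books, pages)
for name, model in models.items():
    r2 = r_squared(pages, [model(x) for x in books])
    print(f"{name}: R2 {r2:.2f}, B6 {model(6):.0f}, B7 {model(7):.0f}")
```

On the Bloomsbury numbers the straight line comes out at an R2 of 
about 0.92, with roughly 894 and 1048 pages for books 6 and 7.  The 
quadratic can never score a lower R2 than the line on the same data, 
because the line is just the special case with a zero squared term.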

The exponential fit is pretty good, though it's impossible to answer 
Tabouli's question about 'best'.  We only have 5 data points, so it 
is easy to find a curve (e.g. a quartic) that fits perfectly (i.e. 
R2 = 1) but gives virtually no confidence about the length of future 
books.  Indeed the best-fit quartic predicts 500 pages for book 6 
and *minus* 1800 for book 7.  So much for curve fitting.
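
The quartic effect is easy to see for yourself.  This is a minimal 
sketch using exact Lagrange interpolation, which is one way of 
getting the unique quartic through 5 points.  As an assumption, it 
uses the Bloomsbury figures quoted from Ksnidget above, since the 
Scholastic data is not listed in this message, so the predictions 
differ from the 500 and minus 1800 given here.

```python
from fractions import Fraction

# Bloomsbury page counts as quoted from Ksnidget (not the Scholastic data).
books = [1, 2, 3, 4, 5]
pages = [190, 256, 317, 636, 768]


def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial through the points (xs, ys)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# The curve passes through all five books exactly, then falls off a cliff.
for book in (6, 7):
    print(book, lagrange(books, pages, book))
# 6 -440
# 7 -4849
```

A perfect fit to the five existing books, and negative page counts 
for both of the next two.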

I would view any simple extrapolation based on curve fitting alone 
with extreme suspicion without any model of the actual writing 
process.  Why should the books be getting longer?  If we can explain 
that we might have a basis for prediction.  Most series I can think 
of have tended to maintain a fairly consistent book length.

After all, a decent model should be able to say something about FB 
and QTTA, too.

In message 9881 I suggested bases for calculation - which shows how 
wrong one can be, as I plumped for the *average* of the previous 
books as the best estimator (this would now lead to 560 pages for 6 
and 7).

David