The number of dies used to produce a particular coin issue is of considerable interest to both numismatists and those interested in the economics of the ancient world in general. For the benefit of interested readers I summarise here some formulae which can be used for estimating the number of dies from the data now available to us*.
Basic estimators – Good’s Formula.
For die estimation we would ideally have a large sample of the type in question, with full details of the make up of the sample, i.e, we would know the total number of coins in the sample (n), the number of “singletons”, i.e, dies represented by only one coin (d1), the number of dies represented by two coins (d2), and so on, together with the total number of different dies represented in the sample (d).
In practice we will often have only some of these figures, but fortunately we don’t need all of them to obtain a reasonable estimate of the total number of dies originally used, which we will denote by D.
The basic factor we have to estimate is the “coverage” of the sample, meaning, in the current context, the ratio of the number of coins originally produced by the dies in the sample to the total number of coins produced by all dies. If we assume, for the moment, that all dies produced equal numbers of coins, then the coverage will also equal the ratio d/D, so that the coverage, C, will be given by the ratio:
(1) C = d/D = (D – d0)/D = 1 – d0/D
where d0 is the (unknown) number of dies not present in the sample.
Now you might think that the factor C depends in a complicated way on the detailed distribution of the coins in the sample, but in fact, following on the work of Good in 1953, it turns out that C can be estimated quite generally by the simple formula**:
(2) Ce = 1 – d1/n
so that our first approximation to the total number of dies D, which we can write as D0, is just
(3) D0 = d/Ce = d/(1 – d1/n)
which can also be written as D0 = nd/(n – d1)#.
To get a feel for this formula we note that for (relatively) small samples, (i.e, samples where n << D, so that the sample includes only a small fraction of the original dies), then there will be few die duplicates, and most of the coins in the sample will be singletons. In this case the coverage as defined above will be small; mathematically, d1 won’t be much less than n, and hence Ce will be considerably less than one, so that D0 in formula (3) will be >> d, the number of different dies in the sample.
For the simple case where we have n coins with only one die match, then d1 will equal n-2, so that D0 = ½ nd (or ½ n2 close enough) – with two die matches, D0 = 1/4 nd, and so on (for not too many matches).
On the other hand, if most of the dies in the sample are represented more than once, then the sample will probably include most of the original dies, so that in this case there will be few singletons, d1 will be << n, and hence Ce will be not much less than 1. As a result D0 will be not much greater than d, and in fact we can then write D0 = d(1 + d1/n), approximately. The above formula (3) is a good start, but it makes a key simplifying assumption, namely that each die produced exactly the same number of coins. This is of course quite unrealistic, and probably tends to underestimate the actual number of dies because low output dies will tend not to show up in the sample; we therefore need to improve the formula by effectively including a “spread” factor (S) to allow for the spread in the number of coins produced by the different dies, so that our basic formula now becomes: (4) D = D0 x S = d/(1 – d1/n) x S So how to estimate the spread factor S? Well, this is difficult to do from first principles, so instead let us bypass the problem for the moment by going directly to the die frequency distribution, i.e, the values of the dr in the sample as a function of r. We assume that this distribution can be described, at least approximately, by what are known as Negative Binomial functions. We needn’t go into these in detail at this point, but generally they produce a distribution something like a Gaussian bell curve, but skewed towards the higher outputs. The relative width of the distribution is described by a “shape parameter” k – a lower value of k means a broad distribution, while larger values give narrower curves##. Now for this type of distribution it is not too hard to show that D will be given by%: (5) Dk = d(1 + d1/kd)/(1 – d1/n) Here we see what is probably the main justification for using the Negative Binomial model, namely the fact that Eqn 5 includes the basic Good coverage factor (1 – d1/n). But as well, this equation includes a modifying factor recalling that in Eqn 4, namely (6) S = 1 + d1/kd Now it is tempting to conclude that this modifying factor can be interpreted as the factor S in Eqn 4 due to the spread in die outputs. However, this is not strictly justified at this point since S in Eqn 6 derives not from any assumptions about the underlying die outputs but rather from the choice of the Negative Binary functions used to model the sample frequency distribution. Essentially Eqn 5 is really just an heuristic equation, i.e, it calculates D by assuming that the observed die frequency distributions can be modelled by Negative Binomial functions and then simply extrapolating these functions to give the value of d0, and hence of D = d + d0. Hopefully though this approach will represent a reasonable approximation to reality, so that Eqn 5 will yield a worthwhile estimate of D, in which case Eqn 6 can be taken as a reasonable estimate of the spread factor in Eqn 4%%. Note that the higher the value of k, the closer S tends to 1, and hence the closer Dk tends towards Good’s basic estimate D0. Also, irrespective of the value of k, we see that as the size of the sample increases, the ratio d1/kd decreases so that S tends towards 1 anyway, and hence for large samples (with the sample size factor R = n/d >> 1), Dk will not be too much greater than D0. In practice this means that for reasonably large samples S will generally not make a great difference to the final estimate of D, which will then be determined mainly by Good’s basic coverage factor, so that the accuracy (or even the theoretical validity) of the modifying factor, and the precise value of k, are not so important.
In the past it has often been assumed, for reasons that we will come to later, that the value of k in Eqn 6 can be taken as 2. However recently it has become evident that in actual samples k values closer to 1 give a better fit to the data in many cases, so putting k =1 we finally get:
(7) D1 = d(1 + d1/d)/(1 – d1/n) (k = 1).
The Die Output Distribution in the real world.
As we have seen, Eqn 5 as stated is basically just a convenient heuristic formula. So can we provide a more solid theoretical foundation for this equation, i.e, can we show that a Negative Binomial die frequency distribution, or something like it, is what we would actually expect in practice, given some plausible assumptions about the underlying die output frequencies? Or, to turn the problem around, given the observed die frequency distribution can we work backwards and infer something about the original die output distribution?
Well, in general this is not a simple problem. However, according to Esty where the die output frequency distribution is negative binomial, then the resulting frequency distribution in a random sample should also be negative binomial. In particular, again according to Esty. where the die output frequency distribution is exponential (i.e, geometric, with shape parameter of k = 1), then the resulting frequency distribution should also be exponential, so that it can be also described by a Negative Binomial function with k = 1.
Now, as noted above, actual frequency distributions are in fact often roughly exponential, so if Esty is right then it is tempting to conclude in these cases that so are the underlying die output distributions. This is important, since an exponential die output distribution implies that die life is determined solely by random breakage at a constant average rate, which yields a die survival curve that declines exponentially with the number of coins struck, and hence also an exponentially declining die output distribution. (In practical terms this means an exponential die output density function, i.e, the number of dies producing 1000 to 1999 coins is less than the number producing 0-999 coins by some constant factor, and so on).
However, I have my doubts about this – in my experience the roll-off of the frequency distribution in smaller samples is usually faster than exponential, suggesting perhaps that an exponential die output may lead to a value of k greater than 1 for the resulting frequency distribution (at least while the sample is still small), and hence we probably shouldn’t press conclusions about die output distributions too far. On the other hand if the value of k is going to be close to 1 anyway, then this also means that Eqn 7 will usually give a reasonable estimate of D more or less irrespective of the actual die output distribution.
(Note incidentally that in the past it was commonly assumed that the die output distribution (strictly, the die output density function) could be approximated by a Gamma distribution (essentially a continuous version of the Negative Binomial distribution) with a shape parameter p of 2, and this value was then simply carried over to the value of k in Eqn 5).
While the above formulae may seem to be based on some fairly specific assumptions they seem to give good results in practice. In particular, computer simulations have been run on model populations of coins with varying compositions (i.e, numbers of dies, and spread factors) to produce test samples of various size, and from these simulations we find that estimations of D made using the formulae listed here generally match the original die numbers quite well, or at least, within the expected margin of error, with the important proviso that we have a reasonably random sample to work from.
Also, these formulae (with k values between roughly 1 and 2) seem to give generally correct answers in real world cases where the dies were individually numbered and hence we actually have a good idea of their total numbers anyway (e.g, the Norbanus and, particularly, the Crepusius denarius issues)$.
Alternatively, if you don’t want to make assumptions about die output spreads, you can simply take D0 in the basic Good Formulae 3 above (or my Formula 9 below, or in fact any equal output formula) as giving the number of “efuds”, i.e, equivalent full use dies, or the number of dies that would have been used if all dies produced the same standard “full use” output. Such a figure may not be strictly realistic, but in most cases it can be taken as a useful standard estimator of the relative sizes of different issues.
In any case it should be realised that these estimations of die numbers are only approximate, and hence the expected margin of error can be quite large. We can also produce estimates of the possible error range, but I will not do that in detail here, except to say that for samples with R = n/d less than about 3, and n-d at least 10, a good estimate of the 95% error range in D is given by +/- 2D/sqrt(n-d) – this means that unless we have at least a dozen or so die matches the possible error range in the die estimate can easily be plus or minus a factor of 2. This may seem a lot, but it is still quite sufficient to tell us whether the issue we are dealing with involved only a dozen or so dies, or scores of dies, or many hundreds. (For more detailed error formulae see Esty’s article in Num. Chron. 166, 2007*).
Formulae involving only R = n/d.
Next I come to a formula of my own devising, useful for quick estimates of D, which uses only the number of different dies d in a sample of size n. It is based on a very simple formula which gives a good approximation to Good’s Formula 2 for the coverage, at least for smaller samples (i.e, where there are not too many die matches, so that R = n/d is not too much more than 1). Specifically, I approximate d1 by d/R$$, so that from Formula 2 the coverage is now estimated by:
(8) Ce = 1 – 1/R2
Applying this as in Equation 3, we get:
(9) D0(R) = d/(1 – 1/R2)
This last formula assumes equal die outputs, so add a total spread factor of your own choice. Formula 6 above becomes here:
(10) S = 1 + 1/kR
(11) Dk(R) = d(1 + 1/kR)/(1 – 1/R2) (any k, low R).
Eqn 11 applies for any value of k for low values of R, but for the particular case of k = 1 the formula reduces to the simple form:
(12a) D1(R) = d/(1 – 1/R), (k = 1, any R).
which is in fact valid for any R value, i.e, any sample size.
This last formula can be conveniently written as D1 = nd/(n-d), a form used in recent publications by Esty for exponential frequency distributions. (Note that Eqn 11 is generally valid only for smaller samples, with R less than, say, 1.5, but Eqn 12a, where k = 1, is valid for any R value).
For the specific case of k = 2, another formula for D(R), also due to Esty, is:
(12b) D2(R) = 2n/[R-4+sqrt(8R+R2)], (k =2, any R).
This formula is derived directly from Eqn 5 and is valid for any R value.
Note that where we have only one die match n-d = 1, in which case D1 = nd (= n2 close enough), twice the value for the equal die output case (see earlier), essentially because in this case k =1 and so S in Eqn 10 is roughly 2 here, as opposed to S =1 with k = infinity for the equal output case.
Note also that we can take Eqn 12a and rearrange it to give R = 1+n/D, so that for exponential outputs a sample total equal to the number of original dies yields an expected R value of 2, and so on.
More Small Sample Formulae.
If we write R as 1+x then for small samples x << 1 and Eqn 12a can be written as:
(13a) D1 = n/x (k = 1).
a form which is particularly instructive for low values of R, where we can also write D1 = d/x approximately.
For k = 2 we can show that (for x << 1):
(13b) D2 = 3n/4x (k = 2).
And similarly for the ideal case of equal die outputs (and small x):
(13c) D0 = n/2x (k = infinity).
which once again is half the value of D1 in the same circumstances.
(This should be contrasted with the case of large samples considered earlier, where we saw from Eqn 5 that for high values of R, where d1 << d, Dk will not be much larger than D0 for any realistic value of k).
Yet another formula – the “Truncated Exponential” Estimate.
As we have seen, it is now often assumed that die output distributions are generally exponential, resulting in an exponential frequency distribution. In practice however we often find that for small samples, where R = n/d is significantly less than 2, the frequency curves often start out with a more or less geometric decline, they often ultimately roll off at an increasing rate for higher r values, more in line with an ordinary binomial distribution (for examples see now the Numbered Die Issues to produce coin).
Nonetheless in these cases we can still model at least the beginning of the frequency distribution curve with an exponential function. This is convenient, since it allows us to develop a simple estimate of the original die numbers which is particularly appropriate for small samples (although it is also valid for larger samples).
For the ideal exponential frequency distribution we have d0/d1 = d1/d2 = d2/d3 etc. so that we can estimate d0 simply by d0 = d12/d2, giving ultimately the following formula for the number of original dies:
(14) Dte = d + d0 = d + d12/d2
We use Dte here to signify the “truncated exponential”, or perhaps “asymptotic exponential”, estimate of D, as it extrapolates D from d using only the first two values of dr. For a strictly exponential distribution (with k = 1) this formula is equivalent to Eqns 7 and 12 for D1 above, although for real sample distributions the various equations will give somewhat different results for D. For distributions with higher k values (say around 2) Dte gives results which are lower than D1 and closer to, but generally greater than, D2, as we might hope.
For smaller samples the main contribution to Dte will be from d0, and the relative error in this term comes from d1 and particularly the smallest factor, namely d2. We won’t go into a detailed error calculation here, but in most cases a reasonable estimate of the 95% error range in Dte will be given by +/- 2d0 x (2/sqrt(d1) + 1/sqrt (d2)), which can easily be +/- 50% or more. Ideally to minimise possible errors we would prefer d0 to be rather less than d, but this will often not the case for small samples.
In my experience this formula seems to give better estimates of D for the uniquely numbered republican denarii than the standard exponential estimators for D1 such as Eqns 7 and 12, which generally seem to overestimate D (at least for lower values of R where the actual samples are generally not exponential).
However it should be remembered that Eqn 14 is generally not very accurate as it is particularly susceptible to variations in the value of d2, so overall the best approach is to crosscheck the results of as many different formulae as possible against each, and use what seems to be the most appropriate in all the circumstances.
Testing the Frequency Distribution.
In much of our discussion we have been assuming that the frequency distribution in real world samples is more or less exponential, but how do we check whether this is actually the case?
We can do this by comparing the sample data with the theoretical values of the dr. For the exponential distribution these can be calculated from the basic formula dr = d x b(1-b)r-1 for r = 0 to n, where b is a scale parameter appropriate to the sample&. The value of b can be approximated here by b = d/n = 1/R (which effectively fits the exponential distribution to the values of n and d, as in Eqn 12), giving d1 = d/R, d2 = d1(1-1/R) and so on. The fit between the actual and theoretical distributions can then be assessed by the usual methods (Chi squared, K-S maximum distance etc. – the degrees of freedom will be n-2 since n and d in the formula are determined by their sample values, thus putting two constraints on the possible values of the calculated dr, as d = sum(dr) and n = sum(rdr)).
Alternatively it may be more convenient in some cases to express the basic formula for the frequency distribution in strict Negative Binomial terms, with D as the basic parameter rather than b; doing this we get dr = D x b(1-b)r , where here b = 1-d/D (see note below for details). This form is useful as it enables us to set D to some expected value in order to test the fit with the sample.
In theory, this approach can be extended by fitting higher order Negative Binomial functions to the frequency data, so as to find the combination of shape parameter k and scale parameter b (or die number estimate D) which best fits the data (while matching the sample values of n and d). In practice this is probably worth only doing if the frequency data are clearly statistically inconsistent with an exponential distribution to begin with. This will normally require a sample with R = n/d at least equal to 2, as if both k and D are variable, then with smaller samples a wide range of k values can give a reasonable match to the sample data.
In fact with small samples, where only the first few dr values are non-zero, and the distribution function is not very sensitive to the value of k, the mathematics means that a value of D given by Eqn 5, or thereabouts, will normally give a good fit to the data for just about any value of k from 1 to infinity (although some k values will give a better fit than others). In particular, a simple Binomial distribution (effectively, k = infinity) with the probabilities calculated using D values close to Good’s estimate in Eqn 3 will usually give an acceptable fit (with small samples) no matter what the real distribution is. Obviously this will normally underestimate the real value of D, which shows that small samples generally can’t be used to put meaningful limits on both k and D simultaneously.
This means that for smaller samples the best approach is probably to choose a suitable approximate value of k (say k =1 for basically exponential samples, or 2 for others), and then calculate D values on that basis. As a check, use Eqn 14 to calculate a third estimate, and then assess the results against each other.
For larger samples, with R significantly greater than 2, the value of k is not so important, as we have seen, and the various estimates will tend to converge towards the value of D0 anyway.
Now that you have the basic formulae go to an archive site (CNG, Coin Archives or the like), search on your favourite coin, save off the results and get to work on a die number estimate. Note that for large issues you don’t need to check every single coin that comes up – just select a manageable number (say not more than 30 to begin with) of coins in reasonable condition and go to it. (To make life easier for yourself set the formulae up in a small spreadsheet). Note also that for scyphate types you need to check both the left and right sides of the “obverse” (the convex side in this case), as these types are usually double struck, and not infrequently different obverse dies are used for the two strikes – it seems the die holders often dropped their dies and had to grab another (cooler) one.
But be warned – die estimating can become addictive, and is rather time consuming.
* The basic formulae in this note are mostly taken from an article by Warren Esty in Numismatic Chronicle Vol. 166, 2006, p. 359-364. They are discussed in more detail in an article by the same author in Numismatic Chronicle 1986, p. 185-216.
** The essence of the proof is to show that, while the proportion in the sample (the frequency) of coins from all dies appearing r times in the sample is rdr/n, the proportion of coins from these dies in the population is best estimated by (r+1)dr+1/n, the difference being due to the coins in the population from dies not in the sample. The sum of these latter frequencies from r = 1 upwards is the coverage C, and is evidently 1 – d1/n (since the sum of all the sample frequencies rdr/n from r = 1 upwards will necessarily total 1).
We therefore see that the proportion in the population of coins from dies not appearing in the sample can be estimated by d1/n, which, assuming equal die outputs, equals d0/D, the proportion in the population of dies not present in the sample. In this case, therefore, Ce = 1 – d1/n also estimates 1 – d0/D, the coverage of dies in the sample.
Informally, we can argue like this – given that we have a sample that includes d different dies, the probability that the next coin examined will be from a known die is d/D, i.e, it equals the coverage C. Now the relative fraction of singletons in the sample is d1/n, so that we might intuitively expect that the probability that the next coin will be another singleton, and hence from a new die, is also d1/n, so that the probability that next coin will be from a known die is 1 – d1/n. Hence Ce = 1 – d1/n. However, in the end intuition is not formally rigorous proof.
Note incidentally that the coverage in terms of coins issued, as Good originally showed, is represented quite generally by Eqn 2 irrespective of the outputs of the various dies.
# For the particular case of equal die outputs we can also derive Eqn 3 directly, without invoking Good’s work. Here the statistics will be Binomial, i.e, for a sample of n coins from an issue with a total of D dies the probability that a particular die will appear r times is nCrpr(1-p)n-r, where p = 1/D is the probability that a given coin comes from the die in question. Hence the average number of dies that do not appear in the sample will be d0 = D x (1-p)n, while the number that appear only once will be d1 = D x np(1-p)n-1. Thus d0 = d1 x (1-p)/np = d1 x (D-1)/n = d1 x D/n close enough in most cases, and so we have D = d + d0 = d + D(d1/n), leading to D = d/(1-d1/n). (Strictly D = (d-d1/n)/(1-d1/n)).
## Strictly, here we use “zero-truncated” Negative Binomial distributions, as d0 is assumed to be zero to match the sample. The theoretical value of d0 in the full distribution is then taken to estimate the unknown value for the sample.
% Assuming that the frequency distribution can be described by a Negative Binomial function with a shape parameter k, we have dr = D x k+r-1Ck-1pkqr, where q = 1-p = n/(n+kD). Hence d0 = Dpk and d1 = Dkpkq, so that d0 = d1/kq, and hence we have D = d + d0 = d + d1(n+kD)/kn, leading to D = d(1+d1/kd)/(1-d1/n) as in Eqn 5.
%% In an actual sample the spread in the frequency distribution will be determined as much, or more, by the sampling process as by the spread in the die outputs. As an extreme example, in the case of equal die outputs there is no spread at all in the original output frequencies, but the sampling process means that the frequency curve of the sample will follow a Binomial distribution whose spread depends on the ratio of the sample size n to the number of original dies D (in fact the spread in the distribution, as measured by its standard deviation, will be sqrt(n/D) close enough, since for a Binomial distribution the variance is npq, where p = 1/D here, and q=1-p). For our purposes we can approximate this distribution with a similar Negative Binomial function which will have an arbitrarily large value of the shape parameter k, but much the same mean and spread as the Binomial ditribution. In this case Eqn 6 will give S = 1, and hence Eqn 5 will give the correct result for D.
$ Note that Esty now thinks (Dec. 2010) that in most cases k = 1 gives the best fit to the data in most cases, i.e, implying that die life is mostly determined by simple breakage at a constant rate. However, I have found that k values higher than 1 seem to give a better fit to reality with some of the uniquely numbered Republican denarii where we know the actual number of dies. On the other hand, the samples I have been using are relatively small and the statistical significance as yet low, and in any case what this may actually indicate is that for smaller samples the frequency distribution is not strictly exponential for random breakage anyway.
$$ This approximation is derived from the fact that if d = n – a, then for a << n, (i.e, for small samples, where R = n/d is not too much more than 1), d1 will usually equal d – a, or not much more, so that d1 = n – 2a = d2/n, approximately (as d = n – a, and a << n). Thus we have d1 = d/R, close enough, just as d = n/R. For low R values, this approximation is evidently generally correct for any value of k, and in fact, as shown in more detail below, it is also correct for all values of R in the case of exponential frequency distributions (k = 1).
& This formula can be easily derived from first principles. We note that for an ideal exponential (geometric) distribution dr+1 = dr x q, where q is some constant less than 1, so that dr = d1 x qr-1. Evidently d = d1+d2+d3… = d1 x (1+q+q2+…) = d1/(1-q), giving d1 = d(1-q), with d2 = d1 x q etc. Putting b = 1-q then gives our basic formula dr = d x b(1-b)r-1. Further, we note n = d1+2d2+3d3… = d1 x (1+2q+3q2+4q3…), so that n-d = d1 x (q+2q2+3q3…) = qn, i.e, q = (n-d)/n and hence finally b = d/n = 1/R. (This means incidentally that the calculated frequency distributions for all samples with the same level of coverage have the same roll-off factor q = 1-b = 1-1/R. The roll-off is thus steep for small samples with R values close to 1, and flattens out as R increases).
Note also that since d0 = d1/q = d(1-q)/q we have D = d + d0 = d + d(1-q)/q = d/q = d/(1-b) = nb/(1-b). For the ideal distribution b = d/n, as we have just shown, in which case we have D = nd/(n-d), i.e, Eqn 12. For an actual sample Eqn 12 gives a value of D which best fits an exponential distribution to the sample data while matching the observed values of n and d.
The basic formula for the dr above is not quite in conventional Negative Binomial format, but we can easily remedy this, and at the same time make D the basic parameter rather than b. From the the general formula for D above we have d = Dq = D(1-b), so putting this in the basic formula we get dr = D(1-b) x b(1-b)r-1 = D x b(1-b)r, where b = 1-q = 1-d/D.
20 Dec. ’10: Esty’s preference for k = 1 noted.
22 May ’11: Separate Frequency Distribution section added.
4 July ’11: Nomenclature tidied up.
11 Aug. ’11: “Yet another formula” added.
24 Aug. ’11: Direct derivation of Eqn 3 added (see subnote #).
11 Oct. ’11: Discussion of frequency distributions and Formula 14 revised.
16 Nov. ’11: Discussion of Eqn 6 revised (yet again) clarifying relation between the shape parameters p and k.
18 Dec. ’11: Error estimate for Eqn 14 revised.
30 Dec. ’12: Separate Small Samples section added.
20 Oct. ’15: Esty’s formula for D2(R) added.