We maintain a search service that serves data from
MongoDB. Our Mongo production instance is arranged in a 4 node replica set across four
physical servers.
The database is
comprised of several small collections and one large collection. The large collection
has the following
characteristics:
- number of
documents: 35 million - average document size: ~4.2
kB - collection size: 151
GB - storageSize: 157
GB
Over the
next year we anticipate that the number of documents in this collection will double to
~70 million and a doubling in the size of the
collection.
I am conscious that the "Sharding
Existing Collection Data Size" section of the href="http://docs.mongodb.org/manual/reference/limits/" rel="nofollow
noreferrer">Mongo Reference Limits document, it's specified that
"For existing collections that hold documents, MongoDB supports enabling
sharding on any collections that contains less than 256 gigabytes of data. MongoDB may
be able to shard collections with as many as 400 gigabytes depending on the distribution
of document sizes". Consequently, we would like to shard well before we reach
the 256 gigabytes of data.
We are have some
constraints on resourcing and we are not (yet) in a position to virtualise. However, we
are in a position where I can purchase two new servers, bringing the total to six
production machines.
My question is, is it
possible to split Mongo into two shards where each one is a 3-server replica set with
only six physical servers? I am conscious that in addition to the replica sets we
require three config
servers and a
mongos
server?
Should
we even be sharding? Our current RAM usage and the number of connections are currently
well within acceptable levels. Is there other strategies we might adopt to enable our
database to grow that doesn't involve sharding?
Comments
Post a Comment