Aug 4: Full Volume Database Refresh - Part 1
This weekend will be the final practice run (there's been one already) for a database refresh procedure I'll be working on for a further four weekends. It's a full copy of one of our production databases that forms part of a user acceptance test environment for a significant application upgrade due in the next couple of months.
Companies should run more development and test databases than production ones, to support multiple development streams and different phases of testing, such as system or user acceptance testing. Many of those databases will contain the same schema objects as production, but a tiny subset of the data. They pose an interesting challenge that I might write about one day.
But nothing is as reassuring as testing on the same database that's running in production already. It's an expensive business, of course, because you need sufficient disk space to support it and most businesses don't have 300Gb USB drives as an option. What I mean to say is that disk space isn't quite as cheap as I keep hearing because of the surrounding infrastructure and support costs. You might also want to use an identical server to run it on if you're interested in testing absolute performance. It depends on what your testing is supposed to achieve.
In this case, the main concerns relate to the data. It's a complex system and the testers need to be able to see the same data as production, which will then go through a week long flow of changes and checks and the same overnight batch runs as production. Essentially, the testers will simulate 'a week in the life' of this system.
I said earlier that this test database will form 'part of' a UAT environment, and that's adds a little to the complexity. Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible? I strongly suspect the hands of architects here
OK, down to business and the point of this blog. I'm not going to give you all of the details of a client's process, but let me higlight some of the main stages, challenges and possible solutions. I think this is fairly typical of other refresh procedures I've seen at other sites.
Copying the Data
We need the production data to appear in our full-volume test database. This can be the most time-consuming process (4 hours in our case), but it depends on what approach you take.
- Export the production data and import it into test. This is the most prone to failure and probably the slowest solution, even with Data Pump (which we can't use - this is a 9i database). It does have certain benefits, that will appear later in this discussion, but isn't a sensible option for large databases in my view. The database we're working on is around 140GB. Not too large by today's standards, but a little tricky to handle with exports. However, exports have been used for this refresh in the past, and I pity the poor DBA who had to do them!
- Copy the database files from one set of filesystems to another, possibly on a different server. This is essentially the approach we're taking because we're going to restore one of the hot backups of production to the test server and then perform recovery on it.
- Have an extra set of disk mirrors that can be temporarily broken away from the main production disks and mounted elsewhere. This can be a super-fast option but it's not available to us in this particular situation.
We're going to restore files to a new location, in other words 'Cloning a Database'.
Actually, I'll leave it there for now and post more later.
#1 - Pete_S said:
2007-08-04 10:04 - (Reply)
Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible? - it's just you ![]()
Some of this possibly comes from the "single truth" notion - if you haven't copied it from another database it can't be out of sync, and in part it SoX and other governance ideas segregating data, then of course it could be data architects .
Having big enough data volumes to test on is no joke for sad people like me - in data warehouses a lot of our test is about query performance; to test year-to-date 2007 vs year-to-date 2006 we actually need a lot of data. Even getting query plans to match can be a struggle, just copying statistics does not always hack it - there are quirks with export table stats that annoy me greatly
#1.1 - Doug Burns said:
2007-08-04 10:08 - (Reply)
Some of this possibly comes from the "single truth" notion
Good point. In fact, these are called 'SDS refreshes' where SDS stands for Single Data Store (I think).
But I wonder what all those *other* databases are for. Seriously, every new app needs a new database these days ![]()
#2 - Marco Gralike said:
2007-08-04 10:46 - (Reply)
Three years ago I introduced the phenomena virtual machines in my company. Despite the first ROI regarding consolidating servers, it afterwards exploded into more environments (VM's that is). After discovering that it was relatively easy to clone complete environments, every team wanted more environments.
VMware can be a solution to these work intensive cloning, at least the moment Oracle also is goning to certify this. As far as I know, at least RAC will not be certified.
VMware is becoming very smart and even high volume environments are easy to clone. Of course you can ask yourself if testing is still done on the exact same environment and/or if you introduced other variable which cause you to compare apples against apples.
I started off with ~ 30 fysical servers voor development with almost 1 to 1 oracle environments (databases or Oracle application servers). Nowadays we have 6 servers running with approx. 8 oracle vmware dedicated environments on it. Regarding hardware costs we are cheaper, regarding work (maintaining it all)... Pfff
#3 - Dominic Delmolino said:
2007-08-04 20:39 - (Reply)
Good luck, Doug! We did this all the time from Prod to QA. We reduced the number of databases by consolidating the schemas -- new app = new schema, not new database. Challenges we encountered were related to encryption (can't have production keys outside of production) and localization of data values (stuff like production URLs being stored in table columns which needed to be updated to point at test URLs). Fun stuff -- but very useful if you can script it and do it repetitively. Try to make the filesystems the same -- makes things a lot easier. And really, why would they need to be different?
#3.1 - Doug Burns said:
2007-08-04 23:57 - (Reply)
Hey, you're stealing material from part 2!
Only kidding ![]()
#3.2 - Pete_S said:
2007-08-05 08:52 - (Reply)
... and there is the need to obscure production data yet maintain integrity - some of my customers are quite secretive. Mangling names and addresses can be a headache and sometimes pointless if we are then comparing test results against live as some volume test where the unmangled is plain to view ![]()
#3.2.1 - Doug Burns said:
2007-08-05 09:05 - (Reply)
Sheesh! You think people would let me *finish* before supplying all the material via comments!
Again, only kidding ![]()
#4 - Dominic Brooks said:
2007-08-04 23:30 - (Reply)
>Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible?
Yep, they're all mad out there. I find myself fighting this concept all the time. And most of the time, these different applications end up wanting to share their data around so you have to do so much extra work...
Also, many companies are focused on reducing their costs. So they're considering SE over EE etc, but they could make a killing by just briefly stopping when a new application comes along and asking themselves whether it could fit in new schemas on an existing database rather than needing a new database on new hardware, etc, etc.
The people with common sense lack influence...
#5 - Eric said:
2007-08-05 09:32 - (Reply)
Every application it's own database is a concept I know from the Ingres and Sybase servers I have worked with. And since the architecture of SQLServer is derived from Sybase 4.7 or thereabouts, the same probably goes for that environment. The concept of a database is somewhat different and the footprint of a new database is definitely much smaller than that of an extra Oracle database. Hence it is more logical to introduce a new database for a new application.
When I encounter architectures like this I try to convince everybody the concept of a new database in Oracle most of the time can be implemented by a separate schema. And no separate database is necessary. Especially if I include arguments about backup schedules and other maintenance tasks that can be shared when using separate schema's instead of separate databases, people seem to agree more easily with me.
#5.1 - Doug Burns said:
2007-08-05 09:41 - (Reply)
Eric,
I think you're right because I often see applications that were originally designed for Sybase and ported to Oracle that the vendor recommends should still use multiple databases. I can think of one recent example of this that caused no end of problems.
Then again, I see enough applications that have been developed in-house, specifically for Oracle, where there is still a tendency for everyone to want their 'own' database. You can't always make multiple schemas work, but I'm surprised how rarely I see people attempt it. I've worked at a few sites recently, though, where we've started to encourage designers and developers to move in that direction.
#6 - Mathew Butler 2007-08-05 21:00 - (Reply)
I've used clones for similar reasons in the past. I'm actually surprised that I've come across more than a few DBA that are not that familiar with them - in particular their restrictions. For example, the requirement to assign a new SID if the clone is of a database that will run on the same physical system as the clone.
As an aside, I seem to remember you mentioned in your blog some time ago about reading one of The Practical Performance Analyst OR The Art of Computer Systems Performance Analysis. I just wondered how you got on and what you thought...
Mat
#6.1 - Doug Burns said:
2007-08-07 05:56 - (Reply)
Mat,
As an aside, I seem to remember you mentioned in your blog some time ago about reading one of The Practical Performance Analyst OR The Art of Computer Systems Performance Analysis. I just wondered how you got on and what you thought...
[Slightly embarassed mumbling ...]
It was The Practical Performance Analyst. I started to read it, didn't get on too well with the writing style although I'm not sure why, skim-read some other bits but will revisit it properly. (I'm still not sure it's my cup of tea, though, even though everyone else seems to swear by it.)
The problem is that I have the following in a reading pile.
Jonathan's CBO book (a proper read, performing the examples).
Applied Mathematics for Database Professionals.
Relational Database Index Design and the Optimisers.
And I haven't even bought Tom Kyte's most recent book.
Etc, etc, etc.
I'm struggling to find the time to concentrate properly on these and the list just keeps growing, but I suspect I won't get back to Gunther until I've read those.
I suppose a bit less time blogging might help!
#6.1.1 - Mathew Butler 2007-08-07 14:20 - (Reply)
Funnily, these are on my todo list as well, along with a bundle of J2EE stuff that I keep trying to get my head into.
Unfortunately, there is never any time at work, and doing too much of this stuff at home is a no-no.
I really don't know how you manage to find the time to maintain a blog ( and now two!)
M.
#6.1.1.1 - Doug Burns said:
2007-08-07 18:28 - (Reply)
I really don't know how you manage to find the time to maintain a blog ( and now two!)
Oh, it's easy! I just don't read any useful and interesting books any more.
If things go according to schedule, I should have regressed to a state of total ignorance by Christmas ![]()
Really, it's not that much of a commitment. I spend much more time on other things.
#7 - David Aldridge said:
2007-08-06 12:26 - (Reply)
There's a lot to be said for having that ability to run fast copies of databases. Here our major systems (Siebel, Portal Infranet etc) are cloned overnight using SAN-based magic to be used as reporting instances the next day, and we have separate and full-sized performance testing environments for every major system plus uaer acceptance, integration and development databases.
Filling in the db access forms to get a user id actually takes longer than making a copy of the database, it seems.
#7.1 - Doug Burns said:
2007-08-06 20:45 - (Reply)
Filling in the db access forms to get a user id actually takes longer than making a copy of the database, it seems.
Well it would be no fun creating all of these environments if we actually let you access them!
#7.1.1 - David Aldridge said:
2007-08-06 20:54 - (Reply)
Oh, you are _so_ hired. When can you start? ![]()

