Full Volume Database Refresh - Part 1

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Aug 4: Full Volume Database Refresh - Part 1

(... or the Refresh from Hell)

This weekend will be the final practice run (there's been one already) for a database refresh procedure I'll be working on for a further four weekends. It's a full copy of one of our production databases that forms part of a user acceptance test environment for a significant application upgrade due in the next couple of months.

Companies should run more development and test databases than production ones, to support multiple development streams and different phases of testing, such as system or user acceptance testing. Many of those databases will contain the same schema objects as production, but a tiny subset of the data. They pose an interesting challenge that I might write about one day.

But nothing is as reassuring as testing on the same database that's running in production already. It's an expensive business, of course, because you need sufficient disk space to support it and most businesses don't have 300Gb USB drives as an option. What I mean to say is that disk space isn't quite as cheap as I keep hearing because of the surrounding infrastructure and support costs. You might also want to use an identical server to run it on if you're interested in testing absolute performance. It depends on what your testing is supposed to achieve.

In this case, the main concerns relate to the data. It's a complex system and the testers need to be able to see the same data as production, which will then go through a week long flow of changes and checks and the same overnight batch runs as production. Essentially, the testers will simulate 'a week in the life' of this system.

I said earlier that this test database will form 'part of' a UAT environment, and that's adds a little to the complexity. Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible? I strongly suspect the hands of architects here ;-) Might it be for performance reasons? Probably not when all of the databases sit on the same server! I don't know why, but I'm seeing this approach used more often. In this case, the test database is going to be co-operating with at least three other databases. However, we don't want most of those other databases to be refreshed, just the main one and another SQL Server database that the new application uses!

OK, down to business and the point of this blog. I'm not going to give you all of the details of a client's process, but let me higlight some of the main stages, challenges and possible solutions. I think this is fairly typical of other refresh procedures I've seen at other sites.

Copying the Data
We need the production data to appear in our full-volume test database. This can be the most time-consuming process (4 hours in our case), but it depends on what approach you take.

- Export the production data and import it into test. This is the most prone to failure and probably the slowest solution, even with Data Pump (which we can't use - this is a 9i database). It does have certain benefits, that will appear later in this discussion, but isn't a sensible option for large databases in my view. The database we're working on is around 140GB. Not too large by today's standards, but a little tricky to handle with exports. However, exports have been used for this refresh in the past, and I pity the poor DBA who had to do them!

- Copy the database files from one set of filesystems to another, possibly on a different server. This is essentially the approach we're taking because we're going to restore one of the hot backups of production to the test server and then perform recovery on it.

- Have an extra set of disk mirrors that can be temporarily broken away from the main production disks and mounted elsewhere. This can be a super-fast option but it's not available to us in this particular situation.

We're going to restore files to a new location, in other words 'Cloning a Database'.

Actually, I'll leave it there for now and post more later.
Posted by Doug Burns Comments: (17) Trackbacks: (0)
Defined tags for this entry: Database Refresh
Related entries by tags:
Full Volume Database Refresh - Part 4
Full Volume Database Refresh - Part 3
Full Volume Database Refresh - Part 2

Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)

#1 - Pete_S said:
2007-08-04 10:04 - (Reply)

Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible? - it's just you :-)
Some of this possibly comes from the "single truth" notion - if you haven't copied it from another database it can't be out of sync, and in part it SoX and other governance ideas segregating data, then of course it could be data architects .
Having big enough data volumes to test on is no joke for sad people like me - in data warehouses a lot of our test is about query performance; to test year-to-date 2007 vs year-to-date 2006 we actually need a lot of data. Even getting query plans to match can be a struggle, just copying statistics does not always hack it - there are quirks with export table stats that annoy me greatly

#1.1 - Doug Burns said:
2007-08-04 10:08 - (Reply)

Some of this possibly comes from the "single truth" notion

Good point. In fact, these are called 'SDS refreshes' where SDS stands for Single Data Store (I think).

But I wonder what all those *other* databases are for. Seriously, every new app needs a new database these days :-(

#2 - Marco Gralike said:
2007-08-04 10:46 - (Reply)

Three years ago I introduced the phenomena virtual machines in my company. Despite the first ROI regarding consolidating servers, it afterwards exploded into more environments (VM's that is). After discovering that it was relatively easy to clone complete environments, every team wanted more environments.

VMware can be a solution to these work intensive cloning, at least the moment Oracle also is goning to certify this. As far as I know, at least RAC will not be certified.

VMware is becoming very smart and even high volume environments are easy to clone. Of course you can ask yourself if testing is still done on the exact same environment and/or if you introduced other variable which cause you to compare apples against apples.

I started off with ~ 30 fysical servers voor development with almost 1 to 1 oracle environments (databases or Oracle application servers). Nowadays we have 6 servers running with approx. 8 oracle vmware dedicated environments on it. Regarding hardware costs we are cheaper, regarding work (maintaining it all)... Pfff

#3 - Dominic Delmolino said:
2007-08-04 20:39 - (Reply)

Good luck, Doug! We did this all the time from Prod to QA. We reduced the number of databases by consolidating the schemas -- new app = new schema, not new database. Challenges we encountered were related to encryption (can't have production keys outside of production) and localization of data values (stuff like production URLs being stored in table columns which needed to be updated to point at test URLs). Fun stuff -- but very useful if you can script it and do it repetitively. Try to make the filesystems the same -- makes things a lot easier. And really, why would they need to be different?

#3.1 - Doug Burns said:
2007-08-04 23:57 - (Reply)

Hey, you're stealing material from part 2!

Only kidding ;-)

#3.2 - Pete_S said:
2007-08-05 08:52 - (Reply)

... and there is the need to obscure production data yet maintain integrity - some of my customers are quite secretive. Mangling names and addresses can be a headache and sometimes pointless if we are then comparing test results against live as some volume test where the unmangled is plain to view :-)

#3.2.1 - Doug Burns said:
2007-08-05 09:05 - (Reply)

Sheesh! You think people would let me *finish* before supplying all the material via comments!

Again, only kidding ;-)

#4 - Dominic Brooks said:
2007-08-04 23:30 - (Reply)

>Is it just me or does everyone want to design applications these days to use as many different databases, all talking to each other as possible?
Yep, they're all mad out there. I find myself fighting this concept all the time. And most of the time, these different applications end up wanting to share their data around so you have to do so much extra work...

Also, many companies are focused on reducing their costs. So they're considering SE over EE etc, but they could make a killing by just briefly stopping when a new application comes along and asking themselves whether it could fit in new schemas on an existing database rather than needing a new database on new hardware, etc, etc.

The people with common sense lack influence...

#5 - Eric said:
2007-08-05 09:32 - (Reply)

Every application it's own database is a concept I know from the Ingres and Sybase servers I have worked with. And since the architecture of SQLServer is derived from Sybase 4.7 or thereabouts, the same probably goes for that environment. The concept of a database is somewhat different and the footprint of a new database is definitely much smaller than that of an extra Oracle database. Hence it is more logical to introduce a new database for a new application.
When I encounter architectures like this I try to convince everybody the concept of a new database in Oracle most of the time can be implemented by a separate schema. And no separate database is necessary. Especially if I include arguments about backup schedules and other maintenance tasks that can be shared when using separate schema's instead of separate databases, people seem to agree more easily with me.

#5.1 - Doug Burns said:
2007-08-05 09:41 - (Reply)

Eric,

I think you're right because I often see applications that were originally designed for Sybase and ported to Oracle that the vendor recommends should still use multiple databases. I can think of one recent example of this that caused no end of problems.

Then again, I see enough applications that have been developed in-house, specifically for Oracle, where there is still a tendency for everyone to want their 'own' database. You can't always make multiple schemas work, but I'm surprised how rarely I see people attempt it. I've worked at a few sites recently, though, where we've started to encourage designers and developers to move in that direction.

#6 - Mathew Butler 2007-08-05 21:00 - (Reply)

I've used clones for similar reasons in the past. I'm actually surprised that I've come across more than a few DBA that are not that familiar with them - in particular their restrictions. For example, the requirement to assign a new SID if the clone is of a database that will run on the same physical system as the clone.

As an aside, I seem to remember you mentioned in your blog some time ago about reading one of The Practical Performance Analyst OR The Art of Computer Systems Performance Analysis. I just wondered how you got on and what you thought...

Mat

#6.1 - Doug Burns said:
2007-08-07 05:56 - (Reply)

Mat,

As an aside, I seem to remember you mentioned in your blog some time ago about reading one of The Practical Performance Analyst OR The Art of Computer Systems Performance Analysis. I just wondered how you got on and what you thought...

[Slightly embarassed mumbling ...]

It was The Practical Performance Analyst. I started to read it, didn't get on too well with the writing style although I'm not sure why, skim-read some other bits but will revisit it properly. (I'm still not sure it's my cup of tea, though, even though everyone else seems to swear by it.)

The problem is that I have the following in a reading pile.

Jonathan's CBO book (a proper read, performing the examples).

Applied Mathematics for Database Professionals.

Relational Database Index Design and the Optimisers.

And I haven't even bought Tom Kyte's most recent book.

Etc, etc, etc.

I'm struggling to find the time to concentrate properly on these and the list just keeps growing, but I suspect I won't get back to Gunther until I've read those.

I suppose a bit less time blogging might help!

#6.1.1 - Mathew Butler 2007-08-07 14:20 - (Reply)

Funnily, these are on my todo list as well, along with a bundle of J2EE stuff that I keep trying to get my head into.

Unfortunately, there is never any time at work, and doing too much of this stuff at home is a no-no.

I really don't know how you manage to find the time to maintain a blog ( and now two!)

M.

#6.1.1.1 - Doug Burns said:
2007-08-07 18:28 - (Reply)

I really don't know how you manage to find the time to maintain a blog ( and now two!)

Oh, it's easy! I just don't read any useful and interesting books any more.

If things go according to schedule, I should have regressed to a state of total ignorance by Christmas ;-)

Really, it's not that much of a commitment. I spend much more time on other things.

#7 - David Aldridge said:
2007-08-06 12:26 - (Reply)

There's a lot to be said for having that ability to run fast copies of databases. Here our major systems (Siebel, Portal Infranet etc) are cloned overnight using SAN-based magic to be used as reporting instances the next day, and we have separate and full-sized performance testing environments for every major system plus uaer acceptance, integration and development databases.

Filling in the db access forms to get a user id actually takes longer than making a copy of the database, it seems.

#7.1 - Doug Burns said:
2007-08-06 20:45 - (Reply)

Filling in the db access forms to get a user id actually takes longer than making a copy of the database, it seems.

Well it would be no fun creating all of these environments if we actually let you access them!

#7.1.1 - David Aldridge said:
2007-08-06 20:54 - (Reply)

Oh, you are _so_ hired. When can you start? :-D


Your comment was successfully added.

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl