Full-stack Philosophies

James Morle's Blog

RSS Feed

Spotting the Red Flags (Part 1 of n)

Posted on 2:15 pm September 24, 2009 by James Morle

As a consultant I get to see many different systems, managed by different people. A large proportion of these systems are broken in exactly the same ways to others, which makes spotting the problems pretty straightforward! It occurred to me at some point that this is a bit like a crime scene - somebody had murdered the system, and it was my job to find out 'whodunnit', or at least 'whatdunnit'. The 'what' is quite frequently one (or few) actions performed by the administrator, normally with good intention, that result in system performance or availability carnage. I call these actions 'Red Flags', and spotting them can save a lot of time.

A couple of years ago at the Hotsos conference, I did a small impromptu survey. It's not all that often that you have 500 Oracle geeks in a room that you can solicit opinion from, so it seemed like a good chance to cast the net for Red Flags. I seeded the process with a few of my personal favourites and then started writing as the suggestions came in. Here are the 'seed' entries:

  • Any non-default setting for optimizer_index_cost_adj
  • Carefully configured KEEP and RECYCLE pools
  • Multiple block sizes
  • Some kind of spin_count configuration

Remember, these are not necessarily the wrong thing to do to your system, but they probably are. That's why they attract my attention in the first instance.

I got a whole load of responses. Of those responses, some had missed the point and just identified something that was broken. This is more subtle than that - these are things that are forensic evidence of system homicide. Some of the decent ones I got back follow:

  • /*+ RULE */ hint in abundance (I know, lot's of apps do this, and it sometimes makes sense)
  • Numeric data in VARCHAR2 columns
  • Indexes and tables on 'separate disks'
  • All columns in a table NULLable

Of course, there were more cynical responses too, such as  (he he): cluster_database=true

I was going to write some kind of presentation about it, but I think this might work nicely as a blog entry, potentially multiple entries! If anyone has some good suggestions, please let me know and I'll start adding them into part 2, along with any more I think of... 🙂

The amazing truth about the Red Flags, though, is that they are incredibly hard to dislodge. Once optimizer_index_cost_adj is set to anything but default, it is nigh on impossible to convince anyone that the system is probably now tuned on its head. Same with a multiple buffer pool configuration, and same with multiple block sizes.


25 comments on “Spotting the Red Flags (Part 1 of n)

  1. Numeric data in VARCHAR2 columns

    And its cousin: date data in VARCHAR2 or NUMBER columns (I will never get why people choose to store dates in anything other than a DATE datatype!).

  2. Hi James,

    I beg to differ on OICA and multiple blocksizes.

    - In 9i and into 10g, OICA MUST be changed in some cases to force the optimizer to use indexes properly (even after histograms and system_stats).

    - Multiple blocksizes are used in many large OLTP shops to segregate I/O with great success.

    http://www.dba-oracle.com/oracle_tips_multiple_blocksizes.htm

    Don't forget, the Exadata benchmark used a 32K blocksize:

    http://tpc.org/results/FDR/tpch/HP-1TB-Superdome-64c-FDR.pdf

    db_block_size = 32768

    • Hi Don,

      I'm afraid that you have missed the point of the blog. I am not saying that any of the Red Flags are 100% wrong, and so they hardly need to be defended. My point is that, when I see these kind of things, alarm bells go off. I would say that 80-90% of the sites I've seen with OICA set non-default have not understood what they are doing. They use it as a silver bullet to make Oracle use nested loop plans for every statement and neutered the optimiser's chances of ever doing the right thing. OICA=10, for example.
      Same deal with multiple blocksizes, though I'm not sure you and I are referring to the same thing. I am referring to multiple blocksizes in the same database. Your article about multiple blocksizes appears to only address changing the default blocksize for the whole database. May I suggest an improvement for that article, by the way? When reporting observations from tests, such as "The final result was a 270 times improvement over the original, changing only the db_block_size," it would be more useful to also demonstrate your research into why this was for the specific system you were testing, and what the actual test was. What was taking the additional time? Additional redo? Something else? I think the reader would benefit from seeing why there is such a big difference rather than taking away the inevitable silver bullet of "use a 4k blocksize"!
      FYI - the TPC link you posted was for HP Superdome not for Exadata, and used a single block size - 32KB.

  3. A system that is non-trivial (though not "big"), with a "next-next-next" setup (ie all defaults, noone has bothered to think about anything)

    • Don't think it fits into the "actions performed by the administrator, normally with good intention, that result in system performance or availability carnage" category. Its either ignorance, fear or both. Maybe we need a 'yellow flag' category.
      But as an extension to your idea, if you've got a production database with objects in a USERS tablespace that would be another 'yellow flag'

  4. RI - either disabling it on live systems "for performance" or disabling it on dev "to help develop more easily"

    Views based on a dozen tables.

    Mentioning certain web sites as useful.

    🙂

  5. Nice one James, rings very true.

    Possible other flags:
    - Extensive, long term use of outlines
    - Logon-triggers that set session-level parameters based on user-names or time-of-day.
    - Shifting of SGA-size parameters based on time-of-day (or state of the moon...)
    - and my fave: Cursor-sharing=force, often combined with flush-shared-pool (because the "dynamic" vendor said so ...)

  6. Pingback: James Morle : Spotting the Red Flags (Part 1 of n) « RNM

  7. Hi James,

    >> My point is that, when I see these kind of things, alarm bells go off.

    OK, fair enough, sorry . . .

    I see your point now, and yes, OICA can easily be abused!

    >> They use it as a silver bullet

    In the sense that changing it changes the entire SQL execution landscape, I agree. But, that's why Oracle has given it to us . . . Plus, the new 11g RAT and SPA are designed specifically for testing of "silver bullet" changes (optimizer_mode, new stats, patches, init.ora changes, &c)

    >> I am referring to multiple blocksizes in the same database.

    Me too. I don't quite undrstand why it would be a red flag, since multiple blocksizes have been used for decades. I first used them in IDMS back in the 1980's, and I've also seen them used in DB2.

    >> it would be more useful to also demonstrate your research into why this was for the specific system you were testing, and what the actual test was

    I'd love to, but I can't. These days, I have virtually zero clients who don't insist on an NDA, and I must assume that your have the same issue.

    >> May I suggest an improvement for that article, by the way?

    I agree! It's too easily misunderstood by beginners, and I will re-work it.

    >> FYI – the TPC link you posted was for HP Superdome not for Exadata,

    That's the URL that was cited by Oracle for Exadata, unless I goofed: See here, it says Exadata:

    http://www.tpc.org/results/individual_results/HP/HP_BladeSystem128P_090603_TPCH_ES.pdf

    Anyway, in the sense that Multiple blocksizes can be misunderstood by beginners, sure I agree.

    The potential for abuse applies to lots of other techniques too.

    I might suggest more red flags including:

    3NF design - Somebody who read a book on database design

    Over-indexing - I once saw a DBA make an multiple indexes on every column of a 100 column table!

    Under-indexing - It's a red flag for me whenever I see a OLTP database with more data tablespace than index tablespace.

    Object-oriented SQL - Huge potential for abuse there:

    http://www.dba-oracle.com/t_sql_patterns_exotic_products.htm

  8. I'll add one. How about purposefully configuring buffered file system files because really poorly tuned SQL statements happen to be doing thousands of short-table scans and stepping out from the SGA for a handful of blocks a few thousand times a second.

      • Hmmmmm.... If the poorly tuned SQL is locked away in application code and you are running a release that does not allow profiling, then the only solution in this case is a KEEP/RECYCLE cache partition. Or more memory. Take your pick.

        Of course the problem is: once things get upgraded to later releases, *no one* remembers to get rid of the specifics of the old one.
        And the poor sod stuck with the old release gets blamed for not having "fixed things" for the new one, regardless of not having been in that site for decades...

    • Ha!!!

      This sounds very familiar! 🙂

      Also, there have been people who say no to enabling unbuffered (concurrent) IO as they still want to "use the benefit" of OS caching. And in order to work around the CPU overhead due all this double buffering,VM pagein ops, syscall handling and buffer cache LRU management, they "scale out", go with RAC instead ... 🙂

  9. Red flag for me if no-one can explain to me why there was chosen for some setting or action, e.g. hidden parameters, rebuilding indexes, the existence of abundant indexes.
    Any question about any choice that was made could do to test the depth of knowledge by the quality of the answer.
    Procedures ( we cant afford an expensive testenvironment) or wrong idea's ( we don't use database features, we program for database independency) are main contributors of errors. More than wrong settings alone; as they say :not the absent of knowledge is dangerous, but the perceived possession of it. So ignorance about a setting is less harmful than thinking that database independence is great and easy to implement without penalty.

    Disclaimer: i wrote this with a red wine intoxicated brain.

  10. @Niall:

    Indeed! One of my pet peeves, that one. Just saved a bundle on licensing costs by doing precisely that: move schemas and apps to the right license setup on the right hardware.
    Wish I had a buck for every time I've heard "we have an enterprise setup here, we should be using EE"...
    @Ilmar:
    Amen! Taken me ages to get these folks to consider using partitioning. Sure: initially we were not licensed for it. Now we are, thanks to the savings we made on other licenses. And I finally got the first bunch of tables partitioned. A lot more to come!

  11. Hey, I have a full set of O7 hardcopy in my basement! (5 and 6 too) 🙂

    How about:
    many scripts with names like *fix*
    "odd" backups:
    o running copies of data files without backup mode
    o depending entirely on snaps or SAN
    o no RMAN
    o no disaster procedures
    ODBC connections for analytics "dashboards"
    All DBA work dependent on EM
    No DBA
    All DBA work dependent on db-blind vendor
    All DBA work done by MS-SQL DBA
    SA is DBA (sometimes)
    All DBA work by developers

    I hope these are examples that are beyond mere "something is borked"

  12. Pingback: vSwitch Load Balancing – Stop Getting it Wrong! « Graham Morley's Blog

  13. Pingback: vSwitch Load Balancing – Stop Getting it Wrong! « Graham Morley's Blog

Leave a Reply