2014/09/23

OED 12c

Yeah...    Aka: Oracle Enterprise Damager...

For those who might not know - yes indeed, a few of the so-called "cognoscenti" are not all-knowing! - we've been using Grid Control to monitor all our db servers for quite a while.

That's both MSSQL and Oracle RDBMS servers.  For 4 years now!

In a nutshell and to cut a very long story short:

- we started with 10g.  The less said about that one, the better. It didn't work, period!

- then went up to 11g grid control in the hope of finding something in the product that actually worked. That was mostly it: just hope.  Slightly better for the basics but a total disaster in any advanced functionality including host and MSSQL monitoring.

- finally, about a year ago, we got 12c and, after the usual bunfight about where to run it and the licensing issues, ended up running it on RH in a VM.

Pete Sharman - the product manager for this product - planted enough earworms in my ear about how much better this particular release was. And he was absolutely RIGHT!

OEM 12c compared to what preceded it is like a breath of fresh air. True "night and day" stuff!
(yes, I used the word "stuff"- why, got a problem? Tough! :) )

Far from perfect, mind you.  But sooo much better than the nonsense that GC was before, there is simply no comparison.

All this to introduce the main subject of this post.

As part of the effort to monitor and manage our dbs with this product, we have been investigating and actively using its features.

A lot of them. Both for monitoring and for managing dbs. 

As in: "real life" work.  Not just "powerpointing" and "blogging" about how great it is and how it's gonna replace boiled water and all dbas while producing heaps of vapor clouds.  Like others do...

 And of course, as such, we've hit problems.  Which mostly we've tried to work around as much as possible rather than just get stumped or wait for patches that may never arrive.

Why?

You see: we actually have to deliver finished projects, measured MONTHLY by agile sprints.  Not just powerpoint presentations and travel and chinwag time.

Treating every obstacle as insurmountable is not an option in such a framework.

So, we have to think and apply past experience to resolve these problems or find workarounds.
And quickly!

One happened just a few days ago and it got us stumped.

It's easy as pie to reproduce, if you have access to OEM 12c:
  1. Create a super-admin login, and add to it a group with a chart.
  2. Logged in as that user, go to the chart display page and, using the top-right menu bearing the login name, pick option 2 to make that screen the default startup screen for that user.
  3. Now, logout and login again to confirm that indeed you end up in that screen. 
  4. After this, go to the main menus and get rid of that group altogether, then logout.
    Note that you did not change the default startup screen for that user.
  5. So now, login again and watch what happens. 
Yeah!  Blank screen, no menus anywhere!

One would think that in a well thought out - and well tested - product there would be an admin option somewhere to assign a default screen to any login, via SYSMAN.  Or at the very least to stop us from deleting something that is used somewhere else as a default!

Nup, no way. Too easy! Gotta make life hard for all those "bad dbas" out there, ain't it?

So, in came Support.  And one of the most exemplary and stellar lateral-thinking workarounds I've had the luck of witnessing!


Indeed!  A perfect example of lateral thinking by someone genuinely interested in helping resolve a problem, rather than pontificating or worse yet - patronizing others.

It's simple.  Goes like this:
  1. Login with the affected user and get the blank screen in one browser.
  2. Login as SYSMAN in ANOTHER browser (not another tab or page - another browser! I use both Firefox and Chrome, so this is dirt easy).
  3. In the second (working) browser, go to a screen that you know the first login can access.  Click on the browser's url area, copy the url to the clipboard.
  4. Then simply go back to the first browser with the blank page, paste the url and hit Enter.
And Bob's a close relative of your mother!  The screen now displays properly in the first login's browser. Then it's simply a matter of picking a suitable screen and assigning it as a new default via the now existing and operational top-right menu.

It is indeed a brilliant workaround, making full use of the web-based nature of the OEM UI.

I've left it to the support person to decide if the root cause of the problem should be passed on as a bug or as an enhancement.


To my way of thinking, it should be reported as a bug, as it can leave a login non-operational, with possible dire consequences if that is a super-admin.

But adding an option to the security section to add any given page as the default one for any given login is most definitely an enhancement. Not a bug fix.

Couldn't make up my mind on which way to go, so I left it to the good judgement of the support person to decide.

Here's hoping that someone with a brain somewhere in the OEM dev team actually gets this, understands its importance and finds a solution. We lost a full day of work fiddling, trying to find a solution or workaround ourselves, until I decided to go straight to Support and get them to resolve the issue or provide us with a workaround.


Still: here it is with all its gory details, hoping it might be of use to anyone at some stage.

All's well that ends well!


But indeed there is still a lot to be done to make OEM a truly reliable and usable product.

For EVERYONE.

Not just for those who pretend to know a lot about it but rely instead entirely on direct access to developers to resolve any problems. Dirt easy that way...





Anyways, here's some fun stuff from the last few weeks. Yes, I do have a life outside work and blogs and such. That's why I refuse to participate in so-called "community activities" that stop it from happening.



José Ramos-Horta is an old friend from our high school days in Timor Leste.  He is, as some might be aware, a Nobel Peace Prize recipient.  He recently received a Companion of the Order of Australia, perhaps the highest civil award possible in this country.  I attended the reception and we had some good times there, reminiscing.  Here are some photos of the event:


The introduction of the host, His Excellency Dr. José Ramos-Horta.



Sister Susan Connely who is the nearest thing I can think of to an angel for East Timor.  As usual she brought the house down with her words.  May your God bless you always, sister.  Respect!




This young girl brought tears to everyone's eyes when she sang both the Australian and the Timor-Leste anthems with her beautiful voice and young grace, accompanied by an adult player and singer. 




José and his sister, Rosa.  She was married to another independence activist - João Carrascalão - who passed away early last year.  I visited his resting place in September 2013.  All of them, my friends of the last 46 years.  It's a lifetime!




But we STILL can get a simple laugh and an honest smile.  Means a lot.




And here with another good friend, Fatinha Pires, who currently lives in Australia.



And that's about it for the time being, folks.
Catchyalata!

2014/03/06

Unintended, but interesting consequences

It's interesting how from time to time something happens that makes sense and seems logical afterwards, but at the time it causes a bit of a surprise.  Part of the fun of working with this type of software!

A few days ago we had an incident in an Oracle DW database when a developer tried to load an infinitely big file from a very large source.  Yeah, you got it: big-data-ish! 

Suffice it to say: 180GB and still going before it hit the limits I'd set for that particular tablespace.
Yes, I do use auto-extend. It's a very effective way to ensure things don't get out of control while still allowing for the odd mishap without any major consequences.  Particularly when the storage is a SAN and any considerations about block clustering and disk use distribution are mostly irrelevant.

And no: if carefully set up, it really does not impact performance that much.  Yes, I know it CAN be proven in certain circumstances to cause problems.  "can" != "must", last time I looked!

Anyways, the developer did the right thing in that the table was loaded without indexes, no-logging, etcetc.  It just ran out of space, due to my tight control of that resource.

So the developer did the next best thing: he killed the session and waited for the rollback.
And waited...
And waited...
...waited...

You got it: one of those "never-ending" single-threaded rollbacks that Oracle goes into in some circumstances. The tablespace is locally managed with a uniform extent size of 40MB, so there was quite a lot pending for the rollback to work through.

No biggie, seen a LOT worse!  But of course, it needed attention. And I didn't really want to crash the instance and do a startup parallel rollback - read about that here .

Upon investigation, I noticed that indeed the session was marked "KILLED", and checking v$transaction (yes, I know about v$session_longops: it doesn't work in this case!) it looked like we were up for at least another 4 hours of single-threaded rollback!
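
For the record, here's a minimal sketch of the kind of check I mean - sample USED_UREC a few minutes apart and the difference gives the rate at which undo records are being applied, hence a rough ETA:

-- rollback progress for the killed session: used_urec counts down to zero
select s.sid, s.serial#, s.status,
       t.used_ublk, t.used_urec, t.start_time
  from v$session     s,
       v$transaction t
 where s.taddr  = t.addr
   and s.status = 'KILLED';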

Pain!  And of course we needed the rollback to finish and return the tablespace to nearly empty, so other DW tasks could run.

I also noticed that the session's server process (Unix, single-threaded, dedicated) was still there.

Now, when a session gets killed, that server process eventually goes away once the rollback is done.
In this case it of course didn't - the rollback was still going.

But it wasn't clocking up any CPU time.  If it was doing anything, it'd have been totally I/O bound.  PMON and DBWR on the other hand were quite busy.  To be expected - that is the correct behavior for these conditions.

Trouble is: PMON and DBWR do this in single-threaded fashion.  Which takes FOREVER!

Now, here comes the interesting bit.

I decided, based on past experience, to actually kill that server process.  As in "kill -9", after getting its pid through a suitable v$session/v$process query.  This, because in theory SMON should then take over the transaction and roll it back, freeing up any contention around PMON and DBWR.
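
For reference, a minimal sketch of that lookup - spid is the OS pid of the dedicated server process serving the killed session:

select p.spid, s.sid, s.serial#, s.status, s.username
  from v$session s,
       v$process p
 where s.paddr  = p.addr
   and s.status = 'KILLED';

-- then, at the OS prompt and only after due consideration:
--   kill -9 <spid>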

Now, SMON is ALSO the process that does this if the database crashes and a startup rollback takes place. The thing is: it does it in parallel to make the startup faster, if one has FAST_START_PARALLEL_ROLLBACK set to LOW or HIGH. We have it set to LOW. My hope was that it would also do a parallel recovery here.

And indeed it did!  While watching the proceedings via the OEM monitors, I saw the same session number show up in a totally different colour (pink) in the Top Activity screen, this time associated with the SMON process.  After clicking on the session, I saw that a number of PQ processes were actively working on it!
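
Incidentally, the same thing can be watched from SQL*Plus: once SMON and its slaves pick up a dead transaction, it shows up in the V$FAST_START_* views. A minimal sketch:

-- the dead transaction(s) being recovered, and how far along they are
select usn, state, undoblocksdone, undoblockstotal
  from v$fast_start_transactions;

-- the parallel recovery slaves doing the work
select state, undoblocksdone, pid
  from v$fast_start_servers;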

And guess what?  The rollback finished in around 30 minutes.  Instead of 4 hours!

Now, THAT is what I call a somewhat interesting outcome.  I did not know for sure SMON would do a "startup recovery" with full parallelism.  My hope was it would take over DBWR work and do a partial parallel rollback.  As it turns out, it did a LOT better than that!

And that is indeed something to be happy about!

Now, before anyone transforms this into another "silver bullet" to be applied every time a rollback of a killed session takes too long:


  1. This worked for Aix 7.1 and 11.2.0.3 patched up. Don't go around trying this in your 7.3.4 old db!!!
  2. It was a decision taken not lightly, after consulting the manuals and checking with the developer the exact conditions of what the program had done and how and for how long.
  3. This database is under full recovery control with archive logging.  Worst came to worst, I could just lose the tablespace, recover it online from backup and roll it forward to get things back into shape without major outages of the whole lot.

As such, think before taking off on tangents based on this. If I get some free time, I'll try and investigate a bit more about it and how far it can be taken.  But given the current workload, fat chance!  So if anyone else wants to take over the checking, be my guest: there is enough detail above to construct a few test cases and it doesn't need to be 180GB! A third of that should be more than enough to cause a few delays!

Anyways, here it is for future reference and who knows: one day it might actually help someone?

Speaking of which: our Sydney Oracle Meetup is about to restart its operations for this year.  We'll be trying as hard as usual to make it interesting and fun to attend.  And we need someone to replace Yury, who is leaving us for the Californian shores.  So, if you like to contribute to the community and want to be part of a really active techno-geek group, STEP UP!






Now, for the lighter side of my posts.  There is such a thing as a lighter side of life!



Almost every Summer we get these beautiful guests in our garden.  They are St Andrew's Cross spiders, around 3 inches in size.  Mostly they build their web in protected areas of the garden where wind and birds can't easily get to them.  And like this one, they pick up a partner and lay heaps of eggs. The partner unfortunately ends up as fodder...
They are also some of the best weather predictors I've ever seen: want to know if it's going to rain tomorrow?  Check the spider late in the evening.  If it is building a new web, it won't rain. Otherwise, it's virtually guaranteed it will!  100% hit rate, better than Connor's "Custom BCHR"!



 

Folks who follow me on FB know that we have a little pooch.  Above is Tiago, our now fully grown Great Dane.  All 75Kgs of him, against a young Irish Wolfhound who particularly loves to play with him at the local dog park.  He is indeed one huge dog, and quite docile to boot!



Perhaps the best way of conveying his size is to picture him next to something folks can relate to.  That is my daughter's Toyota 4Runner.  For those who know these cars, they are BIG.  Check Tiago against it!



He's also a major sook. Gentle as gentle can be, thank the Gods! Otherwise, something this big would be completely unmanageable.  Lovely animal, we love it to pieces. Came from Manning Danes and his dad is Ghostwalker, an absolute champion.  We don't show him - he's family.




Catchyalata folks, and keep smiling!

2014/03/04

Latest for the folks who have to deal with Peoplesoft

Dang, been a while since the last posts!  A lot of water under the bridge since then.

We've ditched a few people that were not really helping anything, and are now actively looking at cloud solutions, "big data" use, etcetc.

Meanwhile, there is the small detail that business as usual has to continue: it's very easy to parrot about the latest gimmick/feature/funtastic technology that will revolutionize how to boil water.
But if the monthly payroll and invoicing and purchasing fail, I can promise you a few HARD and concrete financial surprises!

Speaking of payroll...

A few years ago I made some posts here on the problem with PAYCALC runs in Peoplesoft and how hard it is to have correct CBO statistics for the multitude of scratchpad tables this software uses to store intermediate results during a pay calculation run - be it daily, weekly or monthly.  Those posts are still regularly visited and commented on by folks experiencing the same problem.

(Fancy that!  A problem that is common to others and not caused by yours truly  - the "bad dba", according to marketeers from so many companies hellbent on selling us con-sultancy!)

This lack of statistics is mostly caused by the fact that the software truncates the tables at the end of the pay calc process. And does not calculate statistics after re-populating them with intermediate results during the next pay cycle!

With the obvious result that no matter how often and regularly a dba recalculates statistics on these tables, they cannot possibly ever be correct at the most important time!

One way around this that I mentioned at the time was to turn on debugging for the modules that use those tables - assuming one can find them; not all folks know how to do that!

The end result is that the Peoplesoft code will detect the debug flag and fire off an ANALYZE just after re-populating those tables.  Not perfect (ANALYZE is, if anything, a deprecated command!) but better than no stats!  At the end of the run, the output of the debug is simply thrown away.  Slow, but much better than before!

Another way was to force the CBO to take its own estimates for those tables.  Again, not perfect.  But still better than pay calc runtimes in the tens of hours!
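
For those stuck on older releases, here is a minimal sketch of what those two workarounds could look like with DBMS_STATS. SYSADM is the usual Peoplesoft schema owner, but PS_PAY_CALC_WRK is a made-up table name - substitute your own scratchpad tables.  Option A is a modern stand-in for the debug-mode ANALYZE; Option B leaves the tables with no (and locked) stats, so the CBO samples them dynamically at parse time:

-- Option A: gather stats (histograms and all) while the table actually
--           holds pay calc data, e.g. from a job scheduled mid-run
begin
  dbms_stats.gather_table_stats(
    ownname    => 'SYSADM',
    tabname    => 'PS_PAY_CALC_WRK',
    method_opt => 'for all columns size skewonly',
    cascade    => true);
end;
/

-- Option B: delete and lock the stats, so the CBO falls back to dynamic
--           sampling at parse time instead of using "empty table" numbers
begin
  dbms_stats.delete_table_stats(ownname => 'SYSADM', tabname => 'PS_PAY_CALC_WRK');
  dbms_stats.lock_table_stats(ownname => 'SYSADM', tabname => 'PS_PAY_CALC_WRK');
end;
/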

Enter our upgrade last year to Peoplesoft HCM and EPM 9.2!  Apart from using the "fusion" architecture for screen handling (slow as molasses...) this release introduces a CAPITAL improvement to all pay calc runs!

What Oracle has done is very simple: from 9.2 onwards, they do NOT truncate the scratchpad tables at the end of each pay calc!

So instead of guessing/analyzing stats for empty tables, the dbas can now analyze them with significant, relevant and timely data contents, do histograms if that is their penchant, etcetcetc!  With the immediate result that instead of the previous pot luck runtime for the paycalcs, now they are reliably constant in their duration and much, much faster! In simple terms: the CBO finally has SOMETHING accurate to base its "cost" on!


Man!  You mean to tell me it took Peoplesoft becoming available in the Oracle cloud for them to realize that what was being done before was eminently STUPID, and made it impossible for any dba to have a predictable paycalc execution time?


Apparently when it became their cloud responsibility, the solution was quickly found and implemented...


Of course, if they had actually LISTENED to the frequent and insistent complaints by dbas who actually deal with their software day to day for years on end rather than playing "ace" cards, things would have perhaps progressed a lot faster!... 



But let's ignore stupidity and rejoice: there is a light at the end of the tunnel and it's not incoming!

Get those upgrade juices going, folks, and try to convince your management that 9.2 is THE release of Peoplesoft to aim at.

It'll help everyone. (Including Oracle - but I don't think their marketing is smart enough to detect that...)





Anyways, on to funnier and hopefully nicer things.



As some know, I go to East Timor for holidays every time I can afford it.  Mostly because I grew up for a while there, many of my school time friends are still there and we are still very close.  But also because I get a chance to view things like the ones below, and actually visit these places.






Hope you enjoy them as much as I did!




The above was taken from Fazenda Algarve.  It belongs to the well known Carrascalão family and it is likely one of the largest coffee farms in the world.  Some of my good friends already departed are buried nearby. 
The flat empty concrete floor you see on the bottom is where the coffee beans are sun dried and lose their skin and pulp (coffee beans are the seed of a small, very sweet fruit).  Once dry, they are put in the tanks on the left and shaken until the dry husks fall off and the beans can be packed and sent for sale.
I don't know about you but with a view like this I wouldn't be doing much work!  The far mountains are around 40Kms away already in Indonesian territory and the dry river bed is the largest "river" of East Timor.  It's called "Lois".  I first saw this as a young teenager.  To now come back and see this magnificent, stunning  view once again is indeed a rare, truly cherished privilege!



And looking directly back from the previous image, is the above view.  That peak in the distance (around 20Ks) is Tata-Mai-Lau, aka Mt Ramelau, the highest point in the whole island: >3000m
A truly magnificent mountain from where the whole island can be seen at sunrise on a good, clear day.
I have not yet been back there.  It's a very long climb done at night and I'm no spring chicken anymore, so I have to pass on climbing it again without mechanical aid.  But recently a friend of mine climbed it for the first time and had an epiphany experience at the top during sunrise.  I don't think she'll ever forget it.  It is one of the most impressive views available anywhere in the world!




 Dunno about you, but I'd definitely like to have an office with a view like this!  But don't be deceived - that guy is not having fun, he's very likely collecting dinner for his family.  Otherwise the only thing they'll eat on that night will be boiled rice and coconut.  It's a hard life for the population over there, and why I always leave the place with a sharp pain in my heart...


Speaking of heart pain, that's my Yggdrasil.  Look it up, Norse mythology.  Part of my paternal family is descended from a Norse crusader who joined the first crusade, fought in Jerusalem and afterwards settled in Portugal.  I lived for 3 years right behind this tree and there was not one morning I did not marvel at its size.  If anything it is even bigger now.
For the curious, it's a Banyan tree. Banyans are the national tree of the Republic of India, and for good reason: they are venerated over there as almost sacred.  The name was originally a Portuguese word.  It was later adopted into English and there are quite a few members of this family of trees around the world, including at least 3 varieties in Australia, one specifically from Sydney.  A really large Indian Banyan tree can grow to a hectare in width, by spreading across other trees as a parasite.
In Timor, they are known as "gondoeiro" and are thought to harbor good spirits from folks dead a long time ago.  Dunno, but every time I visit Díli I go to this tree, hug it and talk to it.  It always feels warm and soft to the touch and against it I can only hear the breeze in its foliage.  Hallucination?  Maybe.  Don't care. It's my Yggdrasil and it's the closest thing in my life to true religion.



Catchyalata, folks. Do me a favour: smile and have fun!

2013/12/16

Back to shell scripting basics

Some quick and basic Unix and ksh stuff that crossed my path recently.

As part of the move of our Oracle dbs to our new P7+ hardware (more on that later...), I'm taking the opportunity to review and improve a few of the preventive maintenance and monitoring scripts I have floating around.

Some were written a few years ago by other dbas and have been added to by myself as needed.  Others were written from scratch by yours truly.

One of the things I do in most if not all of my ksh scripts is to add upfront a function called "alert".
This function is very short, basically just this:
function alert {
  # email all the dbas, then note it in the script's log
  mhmail ${MAILTO} -subject "${ORACLE_SID} online" -body "fullbackup:${*}"
  logit "email alert sent:${*}"
}
where MAILTO is set to:
export MAILTO='db_admin'
and of course db_admin is an email alias with all dba email addresses - both internal and external standby ones.


"logit" is another  internal function that just adds a message to whatever has been defined as the script's log file.  That can be a strict logfile or just simply file unit 2.

In a nutshell: whenever I want to be notified by email of a particular step or stage or error in any script, all I have to do is stick in a

alert "alert message"

and bang! a new msg comes up in my email and that of all the other dbas.

OK, all nice and dandy.

Been working for ages.

But this time in one of the new P7+ servers, all of a sudden, one of the scripts stopped sending me an email with its end status!

When I tested email in that server from the command line, all was well.  It was only from INSIDE my scripts that the mhmail command was not working.

WTH?  Could this be some P7+ weirdness?

A little bit of debugging with a test script confirmed that indeed the mhmail command in the alert function was not producing anything inside ANY script, but worked perfectly otherwise.

Time for some serious thinking-cap stuff.  After pulling the debug concepts book off the shelf and doing a lot of log-scanning, I found out that the script was indeed executing an "alert" command; it was just that the command and the function weren't, apparently, the same!

Weird and weirder...

Time to go back to basics and find out exactly what the "alert" command line was producing inside the script.  It turned out it was simply doing a "cd" to the trace directory in 11g: $ORACLE_BASE/diag/rdbms/$ORACLE_SID/$ORACLE_SID/trace!

It was then that the proverbial little light went on: as part of the review of the scripts, I had added a ksh command alias to my .profile file that did exactly that - change to the trace directory - and it was named "alert"!!!

Bingo, problem identified: 
an aliased command takes precedence over any function of the same name in a ksh shell script.

Now, that is something I had never seen before, nor was I aware of that particular trait of the Korn shell - I don't know if bash does the same. Ah well, learn until I die...
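
Here's a minimal sketch of the gotcha, with the alias defined in the same file just to keep it self-contained (mine came in via .profile):

#!/bin/ksh
# stand-in for my "cd to the trace directory" alias
alias alert='cd /tmp'

function alert {
  print "function alert called - this is where the email would go out"
}

# the alias is expanded when this line is read, before any function lookup,
# so the function above never runs and no email is ever sent
alert
print "current directory is now: $(pwd)"    # shows /tmp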

Time to fix the root problem, once and for all:
  1. All alias names that do "cd" to somewhere are now prefixed with "go".
    or
  2. All script function names are now prefixed with  "f".
I did 1, but am thinking seriously of going with 2 as well. You know: tobesure-tobesure, just the Irish-dba in me coming to the surface! 

Dang, it's been a while since I had to do this much script debugging!  It definitely pays to stay up to date on shell script debugging tips and tricks.

 Anyways, nuff with the techo stuff!



This was made using the Microsoft Ice stitch software and a few of my recent photos from Jamison Valley in NSW.  AKA, the "Blue Mountains" national park on the outskirts of Sydney.




(I think blogspot shrinks even the small version.  See it here in full "small" glory)


That's the small version.  The full size version is >28000 pixels wide by >3000 high!

Basically, one plonks a normal or slightly telephoto lens on a camera, and proceeds to take multiple photos while rotating over as much of an angle as needed.  This one was nearly 160 degrees across and was made up of 6 separate images.  The Ice software then takes care of equalizing distortions and slight differences in luminosity and colour balance between all the images, and out comes a really huge pano.

Printed and spread on a wall, it makes for a very nice decoration for those who cannot afford daVinci paintings!


The photo was taken from the lookout behind and over the Three Sisters, which you can see here:


The top of the first of those 3 knolls can be seen in the foreground, around the middle of the panorama shot.

I can see myself liking this stitched panorama stuff!


Catchyalata, folks!

2013/07/27

On SQL Developer

First of all, apologies for the long delay between posts.  We've been evaluating various avenues for this cycle of hardware and software refresh in our data centres and for obvious reasons I could not make any public postings or comments that might be misinterpreted by the various suppliers.
    
In these situations there is a confidentiality protocol that must be followed and simply cannot be broken. Hence the absence. We've now reached the end of the first phase. More on that front soon.

Followers of this blog would have noticed I started using Twitter a few months ago - @wizofoz2k.  Mostly as a result of Paul Vallée and Yury Velikanov from Pythian shaming me into joining that social site.  For nearly 10 years I could not be part of the emerging "social media" phenomenon, due to my military clearance mandate between 2001 and 2004.

But that is now all gone and I can participate in these sites like any normal person - and also present at conferences and such.  Which I'll be doing this year at the INSYNC13 AUSOUG event here in Sydney!

But, I digress. Back to the core of this post.

It is a result of a recent interchange between yours truly and Jeff Smith, of SQLDeveloper fame.

In a nutshell:

A few years ago I asked our folks to start using SQLDeveloper as an alternative to more Toad licensing.  SQLDeveloper is of course free - in fact, one of the very few pieces of excellent software unca Lazza hasn't yet managed to license to insanely expensive levels!  (I likely should not have mentioned that...)

As a result, they started to use it more and more.  Now it's one of the essential pieces of software in their arsenal.  Note: these are internal developers - not contractors or temps.  As such, they have slightly different - and higher - status than the "duhvelopers" one hears about around the place.

They are charged with fixing problems in our inhouse apps - mostly DW/BI but also some bespoke ones. Which from time to time involves connecting to production databases as the schema owner, to fix problems that cannot be fixed any other way in a usable timeframe.

Now, the problem with using SQLDeveloper for that is: it is essentially a development tool
(Duh, name says "Developer", doesn't it?).
In that sense, it does have optimizations for development work that are not necessarily compatible with the workflow of a production environment.  In particular, it does things like keeping cursors open for as long as possible, placing locks on tables and other objects to avoid them changing under the developer's feet.  And so on.

All legitimate. And very, very desirable.  In a development environment.

Do I need to explain what sort of an impact this type of approach might have in a production database? Imagine a long running overnight PL/SQL process being locked out of DDL or change access to a table because someone has left a SQLDeveloper window open with a SELECT FOR UPDATE cursor in it?

Yeah.  That sort of thing.  Dumb thing to do, yes.  But, hey: s**t happens! And in the best families.  A big part of my job is to ensure bad effects of those are minimized or eliminated.

So, it's not a great idea to let developers use SQLDeveloper in a production environment.  Sure, sometimes they might absolutely need to - emergencies.  Otherwise, no way Josay!

Now, how to get around the issue that one thing is what is desirable, the other is what folks do?  Yes, of course: like any other place, our developers from time to time go into our production DW db and leave some query window open overnight on their desktops.  With the result that we get some major delays in overnight processing, or even major lockouts.

6 years ago I implemented a crontab job that killed all SQLDeveloper and Toad sessions in production after 6pm.  Our folks quickly learned to not do silly things like leave uncommitted windows in Toad and SQLDeveloper.  And in the interest of good relations, I removed the crontab job 2.5 years ago.

But we are still getting the odd overnight processing overruns.  Nothing dramatic, simply unexplained.  After a long exercise in monitoring and checks by one of the app experts in our team, it became apparent the issue was caused by SQLDeveloper windows with PQ SQL "grabbing" a bunch of the "pnnn" background processes Oracle uses to perform parallel queries. And of course not letting the overnight processes grab as many of those as needed for smooth, fast execution.

Those "pnnn" processes don't come in infinite numbers, so we had to manage that somehow. I didn't want to go back to the 6pm "kill them all" approach: some legitimate instances of their work would be affected by that.  Locking them out of access to the main schema(s) wouldn't be possible either: they are not external people and their use of those schemas *is* necessary.

Enter Twitter and Jeff!

A few days ago, I had the idea of getting in touch with him and asking what his advice would be to handle the problem.  Interestingly, Vishal Gupta jumped in on the conversation as well with the same PQ problem: it seems to be more widespread than I initially thought.

Now, Jeff heads one of the few Java developer teams I trust.  Mostly because I've seen how good SQLDeveloper is and how often requests for improvements get heard and acted upon - by contrast with ConFusion and the imbecile dystopia that is J2EE.

And Jeff is very open to comments and interaction in Twitter - which is indeed a breath of fresh air!

So what did we talk about, then?  To cut a long story short - the whole conversation is visible in our twitter accounts - Jeff suggested I use a login trigger to control what folks can do with PQ.  And that was indeed the way to handle this problem!

We already use a login trigger to set current_schema by session, thereby avoiding the need to code everything with <owner>.<object> syntax or the need for public synonyms.  Child's play to query the PROGRAM column in v$session at login time and add in more session control if it's "SQL Developer" or "Toad.exe".

And of course: if one absolutely needs to use PQ, one can always re-enable it in the SQLDeveloper window sessions!

Thanks heaps, Jeff!

So: we wanted to make sure no one accidentally used extensive parallel query in production through SQLDeveloper.  While still allowing legitimate and controlled use of the feature and not interfering in any way with the existing processes and their access.

How to actually do that turned out to be interesting!

There is no easy way to turn off parallel query at session level, even in Oracle 11gr2!  This basically adds to my long running gripe with Oracle development in not providing enhancements to the CONTEXT functionality to allow full session parameter control - not just half a dozen session info hooks.

Like, I'm not paying enough maintenance fees already so I can have one of my "desirables" considered for inclusion? Hey, I'm not even an "Ace travel points" expense - and I refuse to be one!

But, that is another story and I digress...

The only avenue open was apparently to use the db resource manager and create a special profile that would turn off parallel processing.  But, here was the problem: resource profiles work by becoming attached to a login.  And of course both developers and our overnight processing use the same schema login!  So if we turned PQ off for one group, we'd turn it off for everyone!

Of course, that was a no go!

Alternatives?  Well, all the parameters that control parallel query up to 11gr2 are essentially SYSTEM-wide ones: they cannot be changed at session level!

Clearly, we needed something more subtle. And it turns out the automatic parallel query tuning in 11gr2 was just the ticket!

Three SESSION-level parameters control that. 

PARALLEL_ADAPTIVE_MULTI_USER turns on the adaptive parallel query tuning. 
PARALLEL_DEGREE_POLICY controls whether adaptive query tuning takes place manually - essentially controlled by the "DEGREE" of the table, same as in previous releases - or automatically, with a couple of subtleties thrown in.
PARALLEL_DEGREE_LIMIT throttles the amount of parallel query processing: it can be set to a number linked to the CPU cores assigned to the db (CPU), to an I/O-balanced value (IO), or to a fixed number > 1.

One of my long-term tasks for this year was precisely to start using the adaptive parallel query tuning in 11gr2: one of the reasons we upgraded everything to that release!  And of course adaptive query tuning can be set by default to "same as always", for those less adventurous.

There are heaps of posts out there on how to control auto DOP, the adaptive parallel query tuning environment and how to avoid potential problems/bugs/quirks/oddities.  So I won't repeat all that jazz here, now.  Google it, if it's unknown to you.

What was the end result, then?

Well, I turned on the adaptive parallel query tuning option in our acceptance test db.
Then I set PARALLEL_DEGREE_POLICY to AUTO system-wide.
PARALLEL_DEGREE_LIMIT was set to CPU system-wide.
And the login trigger now looks up the PROGRAM column of v$session - for the row corresponding to the login session - and sets PARALLEL_DEGREE_LIMIT to 2 if it is SQLDeveloper or Toad.  Parallel 2 is OK for reasonably fast emergency queries, without causing any undue grabbing of parallel query server processes.
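
For reference, a minimal sketch of what that setup amounts to - the trigger name and program strings are illustrative, the DOP limit of 2 is our choice, and the trigger owner needs a direct SELECT grant on v_$session:

-- system-wide defaults, as described above
alter system set parallel_degree_policy = auto scope=both;
alter system set parallel_degree_limit = 'CPU' scope=both;

-- throttle the known GUI tools at login time
create or replace trigger throttle_gui_tools
after logon on database
declare
  v_program v$session.program%type;
begin
  select program
    into v_program
    from v$session
   where sid = sys_context('userenv', 'sid');

  if v_program like 'SQL Developer%' or lower(v_program) like '%toad%' then
    execute immediate 'alter session set parallel_degree_limit = 2';
  end if;
exception
  when others then
    null;   -- never block a login because of the throttle
end;
/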

And it all worked peachy!   I'm going to leave it on for a month or so and follow the performance impact on overnight processing.  So far, it's been exactly what the doc ordered!  No more "strange" overruns!

If it all comes together as I expect, by September we'll include this in our normal production DW database. And hopefully the problem will be gone once and for all!

I guess joining Twitter paid off after all.  And thanks heaps once again to Jeff Smith for his prompt follow-up on the problem and genuine willingness to help find a solution.

If only more people at Oracle development descended from the high stratosphere they imagine they live in and did what Jeff does - *listen* to customers! Not "aces"! - things would be a lot easier for Oracle sales...


Anyways, nuff shop!  Some imagery to lighten up the mood.

This one I call "Swift Argument", although folks at work insist it's me having a DBAReaction:

:)

We had an office "Giveability" day recently.  Which involved lots of balloons.
So, what happens when someone turns a bunch of balloons loose in an office full of geeks?
I'll let the following images speak for themselves...




"I can fly, I can fly, I'm a beautiful butterfly!"
:)


Catchyalata, folks!

2013/04/02

more trouble with details...

I've said it before here and will say it again: the Oracle implementation of proxy logins is flawed from the start.

I've given at least one demonstration of why.  And why using ALTER SESSION SET CURRENT_SCHEMA is a much more manageable and secure alternative.

Despite that, folks insist and persist on using proxies...

Consider Tom Kyte's article in the latest Oracle Magazine, with the examples of login A and login B.  I won't repeat the article here.  Just go read it - or better yet, open the mag next to the screen and look here and there. Yeah, like that. Works for me! ;)

In the article our technologist claims that this is the correct way to get a login to impersonate another.  Nothing wrong with the article, mind you!
It might be so on paper and in the manuals, but day-to-day practicalities say it isn't - hence my disagreement.

For example, here are just a few glaring problems with the proxy login approach:

  1. If we use the construct shown by Tom in the article, we must connect to the database as
    connect a[b]/a.
    I.e., a completely non-standard, non-intuitive connect command, with the login name given as a[b] - how many tools/apps do you know that simply won't accept that sort of syntax for a login name?  I know a few: try any odbc connection and watch what happens.  Worse yet: try to explain to an end user that their login name is a[b] from now on?  Good luck...
  2. Once we are connected, access to system privileges - and ONLY those! - is controlled by a role that MUST be assigned to login B.  Not login A!  I.e., we want to restrict what login A can do - or indeed any other login that might also be a proxy - yet we define that role and attach it to login B?  It should be associated with A - and with any other users that may need to connect through to login B.  Not with login B.
  3. The role restriction is incomplete.  It works fine for system privileges, but it does NOT stop login A for example from being able to see all rows in ALL tables of login B!
    Try it - repeat the example from the Oracle magazine in your system, only this time add only SELECT privilege on a suitable table owned by login B.  Then login as A[B].
    You'll notice that regardless of what you put in the role, you'll be able to see ALL tables of login B, and SELECT/INSERT/UPDATE/DELETE from them all!
  4. Worse: now go into v$session and v$process and see if you can find login A anywhere.  It's simply not there!  What you see is an additional login B.  So much for making it easy to identify who is actually logged on to the database, eh?
By comparison, were we to use what I showed here a while ago and create login A as a simple user, then create a role, assign whatever we want to it - yes, system privileges, table access privileges, whatever! - then grant the role to login A only, we'd be able to simply create a login trigger for login A that does a

ALTER SESSION SET CURRENT_SCHEMA = B;

We would then be able to log in to A - no need for fancy syntax - and we would be able to see all tables from login B that were granted to the role - and ONLY those - as well as having whatever system privileges were granted to the role.  And anyone using v$session/v$process to monitor who is logged in would see a login A - not login B!
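
For reference, a minimal sketch of that approach - the names and the granted table are made up for illustration, with B being the schema owner:

-- login A owns nothing; it only gets what the role gives it
create user a identified by "some_password";
grant create session to a;

create role a_access;
grant select, insert, update, delete on b.some_table to a_access;
grant a_access to a;

-- every new A session is pointed at schema B - no proxy syntax needed
create or replace trigger a_set_schema
after logon on a.schema
begin
  execute immediate 'alter session set current_schema = B';
end;
/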

I still think that if Oracle made CURRENT_SCHEMA one of the context-controlled variables, the whole process would be even simpler: just give the correct schema to a custom context - together with any other variables, roles, etc - then simply make a login use that context as a default, like it is already possible to do.

But I'll settle for a connect trigger for the time being, in the absence of a better solution.

There are likely a lot of other examples of why a login trigger doing an ALTER SESSION SET CURRENT_SCHEMA is a much better solution than the Machiavellian proxy login à la Oracle.
But to me the above are more than enough to shoot down the whole concept and use instead my approach.

And I couldn't care less if "triggers are evil" - or whatever the latest crusade is.

Sorry, but like I said before: the campaign to eradicate trigger use is ill-guided and ill-directed!
Yes, without a shadow of a doubt: triggers can be dangerous.
That is NOT a synonym for always dangerous!

On another note:

A coupla years ago I posted a performance screenshot from one of our prod db servers, at a supposedly idle time - Xmas eve, lunch-time.  It clearly showed the system under quite a lot of I/O load, although not under any real stress.

That was enough to set off, in the intervening time, a lot of comments about how "some DBAs never strive to improve their systems" and "always resist any change".  From the usual FUD sources.
Coincidental of course with Oracle trying to flog a bunch of Exa* whatever boxes on us, against my recommendations...  ("You're a *bad* dba, Nuno!" was the least that was invoked back then...).

Well, this was taken a few weeks ago during a restore from a backup into our development DW server, as part of a regular monthly refresh from the production backup.  Bear in mind it's a dev server, with no particular emphasis on its overall performance.



OK, so in two years we went from 300MB/s overall aggregate in production to 800MB/s in a non-critical, non-production server, with 670MB/s of those being in WRITES - not READS! - and with an overall busy time for disk devices of 21% while the CPU is nearly flat out.
You should see the production box now...

Kevin Closson would - rightly so! - say we're CPU-starved.  Of course.  But then again this is a restore from a compressed backup, where CPU in user mode is being pushed to expand the data.  And it is only run once a month, so I'm not really that worried about what it uses then - for the 3/4hour it lasts. Not worth any battles to fix that!

This was achieved with NO upgrade to hardware whatsoever.  The OS was upgraded to Aix 7.1 and Oracle upgraded to 11.2.0.3, that's all.  Plus a lot of work on how we split our Aix box into lpars and what is assigned to each in terms of memory, CPU, disk controllers, SAN LUNs, etcetc.
Cost?  3/4 of NULL == NULL.  Other than my time of course, which is part of my normal salary - I'm not a contractor here.

We do not strive for improvements, eh?  Sure...  Any other pearls of idiocy?


Anyways, on to some fun.  This fella showed up in a local park a while ago:


Lovely animals.  And of course, Victoria the Littoria is still around:


Catchyalata, folks!