2011/04/13

How to Kill session in Oracle

The command 'alter system kill session' is not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it! Normally the target sessions are nice and check that bit often enough in their code, act on it and die. But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits. This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.

Steps to kill session in Oracle:
1. Use 'alter system kill session' to kill session.
2. If that doesn't work immediately then check whether the target session has acknowleged the kill and rolling back its transaction.

SQL>select used_urec from v$transaction where addr = (select taddr from v$session where sid= and serial#=);
3. If there is no rolling back happening an session just seems to be stuck, then it's time to kill that session's process from OS level.
4. If couple of minutes after killing the process from OS level that session is still not released (no rollback), then attaching to the OS process with oradebug and run 'ORADEBUG WAKEUP 2' couple of times and check is the session is gone. The "2" means Oracle PID of PMON process which is usually 2, but you should check it from your v$process view.

SQL>oradebug setospid ;
SQL>oradebug wakeup 2;
5. If step 4 does not work, try to kill the oracle process using oradebug.

SQL>oradebug setospid ;
SQL>oradebug event immediate crash;

2011/04/08

Logging Attributes in Create Table Statement

Tables created with logging or nologging clause will stay in that way. If the logging clause is not used when creating table (other than LOBs), it defaults to the logging attribute of the tablespace in which it resides.

With respect to the nologging option, three benefits listed in the Administrator's Guide are:

  • Space is saved in the redo log files
  • The time it takes to create the table is decreased
  • Performance improves for parallel creation of large tables

The logging clause lets you specify whether creation of a database object will be logged in the redo log file (LOGGING) or not (NOLOGGING). If you create a table with NOLOGGING, but cannot afford to lose the data (which implicitly means you need the table it is stored in), the first step after the data load is complete is to take a backup.

The NOLOGGING clause also specifies that subsequent direct loads using SQL*Loader and direct load INSERT operations are not logged. Subsequent DML statements (UPDATE, DELETE, and conventional path insert) are unaffected by the NOLOGGING attribute of the table and generate redo.

2011/02/04

Why (on Linux) am I seeing so much RAM usage?

This is a guest post by Dayid http://dayid.org

There are better options to see your memory usage; however it seems `free` is more attune to creating the confusion I’m attempting to quell here. That said, see the Redhat docs about /proc/meminfo

Other commands to use to see memory usage

$ vmstat -aS M #see the "inactive" column for a rough "free" idea.
The real answer
There’s no reason to clear what’s in RAM until you need more space to write to it.


The short answer analogy
Buffers and cache in RAM being cleared is silly. Imagine a professor, who rather than writing all the way across the chalkboard, finishes a sentence and immediately erases and starts writing in the upper left corner AGAIN and AGAIN and AGAIN.

OR imagine you like a song. You record it to the beginning of a cassette tape. When you want a new song, do you re-record over the first song or record after it?

AKA: The horrible House/Barn analogy
Many people new to Linux or computers in general have a poor understanding of how RAM works. On Linux systems, most users will look at `top` or use `free` to see the amount of memory installed and/or free. Below is an example:

dayid@emiline ~ $ free -m
                   total  used  free   shared  buffers   cached
Mem:                2024  1970    53        0       19     1669
-/+ buffers/cache:         281  1742
Swap:               1953     4  1948
At first glance, they may look at their machine with 2GB of RAM and wonder how they only have 53MB free! While this is true, the surprise, fear, or angst about this comes from a misunderstanding.

We could take a trip to a million places for this horrible analogy, but let’s pretend we’re on a country farm.
Rather than working with 2024MB of RAM and 1953MB of SWAP, we’ll say we’ve got 20 beds in the house, and 20 beds in the barn.
Rather than programs we’ll have people occupying the space.
For our purposes, ignore costs of cleaning the bedding, water, etc.
The house can hold active workers or non-active workers.

Due to its distance and the time to get to/from it, the barn can only hold non-active workers. When a worker is called from the barn they will have to pass through the house and stay in the house while they work.

10 laborers show up to a job. Since the house is closer to the food, showers, and work they’ll be doing, we let them stay in the house.
10 of our 20 beds are used by active workers.
Our farm in `free -m`:

                   total   used   free   shared   buffers   cached
House:                20     10     10        0         0        0
-/+ buffers/cache:           10     10
Barn:                 20      0     20
8 more people show up for another job. They also stay in the house since we have the space for them.
18 of our 20 beds are used by active workers.
Our farm in `free -m`:

                   total   used   free   shared   buffers   cached
House:                20     18      2        0         0        0
-/+ buffers/cache:           18      2
Barn:                 20      0     20
The first job is over, we no longer need to keep around the first 10 laborers; however, letting them stay doesn’t cost us anything, as if they weren’t there the beds would just be empty (i.e., go to waste).
18 of 20 beds are used. 8 by active workers, 10 by non-active workers.
Our farm in `free -m`:

                   total   used   free   shared   buffers   cached
House:                20     18      2        0         0       10
-/+ buffers/cache:            8     12
Barn:                 20      0     20
Let’s take a timeout and review the above output. Right now we have 20 rooms. 18 are being used so only 2 are free. However, since 10 workers aren’t being used, they are in cache – kept around because we have no reason to kick them out – so, we actually have an operating space of 12 workers we could hire. 2 to stay in the unused rooms, and 10 to replace those that are already here.
We have a new job on the farm, so we have 4 new people show up. We do not have enough beds for them. 2 of the 10 who are not active leave. We move in those 4 new people.
20 of 20 beds are used. 12 by active workers, 8 by non-active workers.
Our farm:

                   total   used   free   shared   buffers   cached
House:                20     20      0        0         0        8
-/+ buffers/cache:           12      8
Barn:                 20      0     20
Right now we have 20 rooms filled. 8 are filled by people who aren’t working though, so technically we have 8 beds we can use if we need to. Now let’s get crazy.
It’s production season and we have a lot to do around the farm. We setup another program and need to hire 14 new workers for it. We’ll have to kick out the 8 non-active workers and move in 8 of the new workers. However, because we run out of rooms in the house, our least important workers will have to stay in the barn. The barn is still good storing area, but it will take them longer to get to and from the job each time they are required to.
20 of 20 beds are used by active workers. 6 rooms in the barn are used.
Our farm:

                   total   used   free   shared   buffers   cached
House:                20     20      0        0         0        0
-/+ buffers/cache:           20      0
Barn:                 20      6     14
Now, things calm down again and only 4 workers are going to remain active. We’re not going to toss out the rest though as they’re not harming anything just taking up space (at least not until we need the space again)
Our farm:

                   total   used   free   shared   buffers   cached
House:                20     20      0        0         0       16
-/+ buffers/cache:            4     16
Barn:                 20      6     14
That’s right, our “free” stays 0, as we still have no space available. The important thing to look at here is how much do we have available if we clean out the buffers and cache – which are not necessary to keep, but we generally keep until it needs to be discard.

2011/01/06

Resolving Shutdown Immediate Hang Situations

Copied from here:
http://askdba.org/weblog/2008/05/shutdown-immediate-hang-2/

Shutdown immediate can take long time to complete (appear to be hung) because of three reasons:

1. Uncommitted transactions are being rolled back.
2. SMON is cleaning temp segments or performing delayed block cleanouts.
3. Processes still continue to be connected to the database and do not terminate.

1. Uncommitted transactions are being rolled back:

This is the case when the message ‘Waiting for smon to disable tx recovery’ is posted in the alert log after we issue shutdown immediate.

There are two reasons for this:
- A large query was running at the time of shutdown immediate.
-A large transaction was running at the time of shutdown immediate.

For large queries:

SQL > select count(*) from v$session_longops where time_remaining>0;
If it returns a value > 0 then we can do a shutdown abort and then startup restrict and then again shutdown immediate.

For large transactions:

SQL > select sum(used_ublk) from v$transaction;
If it returns a large value then we have to wait for a long time for shutdowm to get completed.
If the large transaction is aborted and then shutdown is issued then we have to query v$fast_start_transactions and v$fast_start_server, we will not see anything in v$transaction at this time.

At this particular moment transaction recovery is going on and the count(*) will keep on decreasing:

SQL > select count(*) from v$fast_start_transaction;
Decreasing count will show that recovery is going on and when the recovery is completed the database will be shutdown.

But it is not desirable under some circumstances such as, when we have very short maintance window and we need to perform a shutdown immediate to do some work, in those cases we can use the following event and set in the init.ora file TEMPERORARLY To disable transaction recovery:

event=”10513 trace name context forever, level 2″
and bounce the instance and issue shutdown immediate to get complete without transaction recovery.SMON will not do a transaction recovery untill this event is set in the init.ora file so it is necessary to remove this event whenever you get a chance to shutdown the database again, this time shutdown immediate can even take 3-5 hours(Just remove this event from pfile).

2. SMON is cleaning temp segments or performing delayed block cleanouts:

During a SHUTDOWN IMMEDIATE and SHUTDOWN NORMAL, SMON cleans up extents which are no longer needed and marking them as freed. It means that count from uet$ will decrease and count in fet$ will increase.

To verify that the temporary segments are decreasing have an active session available in SQL during the SHUTDOWN IMMEDIATE. Run query to ensure the database is not hanging, but is actually perform extent cleanup:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
115

SQL> select count(block#) from uet$;
COUNT(BLOCK)
----------
713
After some time, issue the query again and check the results:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
210

SQL > select count(block#) from uet$;
COUNT(BLOCK)
----------
512
If you do not have sufficient time to wait for this cleanup then you can set the following event and bounce the database and reissue shutdown immediate to skip this cleanup:

event=”10061 trace name context forever, level 10″
It allows you to prevent SMON from cleaning up temporary segments. Again it is not recommended to set this event event forever. Whenever you have large downtime remove this event and allow SMON to do its work.

3. Processes still continue to be connected to the database and do not terminate:

After issuing shutdown immediate, If we see entries in alert log file as:

Tue Jan  8 12:00:27 2008
Active call for process 10071 user 'oracle' program 'oracle@server.domain.abc (J001)'
SHUTDOWN: waiting for active calls to complete.
Tue Jan  8 12:00:57 2008

SHUTDOWN: Active sessions prevent database close operation
It shows that there are some active calls at program ‘oracle@server.domain.abc (J001)’ which pmon is not able to clear up.This message is due to the fact that database is waiting for pmon to clean up processes, but pmon is unable to clean them. The client connections to the server are causing the shutdown immediate or normal to hang. Do the following in this case:

1. Before shutdown immediate, shutdown the listener:

$ lsnrctl stop
2. Now check if there are any connection present at the database as:

$ ps -eaf | grep LOCAL
It will give you the OSPIDs of the client connected to database.

3 Manually kill them as:

# Kill -9 
4. Issue shutdown immediate now.

Do not forget to bring up the listener after startup

In addition to this you can set 10046 event in the session used to shutdown the instance. This will help to tell the event on which session is waiting

SQL>alter session set events '10046 trace name context forever, level 12'

SQL>Shutdown immediate;
Look for the trace file in user_dump_dest location. Also look at the alert.log for any other messages. They might be helpful in case the shutdown is experiencing hang situation.