Posts from February 2006

small is beautiful

I had a beautifully crafted draft that said...

So MySpace has a staggering 55 million users (well 54 million angst ridden, teen blogs with garish colours, dodgy photos and flash animations galore) while the more recently launched WordPress.com has a mere 110,000 blogs but a far more discerning, perceptive, technically minded and intelligent user base. Quality not quantity.

...but now I see the most popular WordPress tag is 'LECTURES' so maybe WordPress also has 109,998 students, a Microsoft blogging evangelist and me.

'Fancy a drink after this lecture ? No thanks. I simply must go back to the library and blog about it.'

to whom it may concern

To the good people who were searching for answers and somehow blundered onto my blog, here is my considered response.

excuse for no homework

Sorry but please stay behind after class and write out 100 times 'There is simply no excuse'. However, I sincerely hope you did indeed try your luck with my suggestion and would dearly love to know the outcome.

robbie fowler four times fingers

Hmm. Don't tell me you turned off at half-time too ?

tom kyte london 2006

Unconfirmed but I would love it, absolutely love it.

benfica street map

Ask at the hotel desk. Tell them you have an important football match to watch.

can you convert existing car aerial to w

Sorry but WordPress tantalisingly truncated your question. See also 'statspack truncated SQL query text'.

PAUL SCHOLES WIFE

Naughty. And stop shouting.

krispy kreme doughnuts london waterloo

Eat yourself fitter.

what to know about baa security

IP address divulged to UK security services. Sorry but I am not a member of the 'We are not afraid' cult.

bbc boris chicken liverpool

Eh ? Seek help.

Best marketing campaigns ever

Well - did you agree with my verdict ?

new MOT test joke

Come on. I am holding my sides in anticipation.

persian kittens teething

You see. There are people out there who blog about Persian cats.

quickie at work

Tsk. Tsk.

accumulator principle

I hate to disappoint you but the only person getting rich from this program is the mentor.

avoid paying friends reunited

Oh come on, Janet. Surely you can afford 10 GBP to get my email address.

write query to produce statspack report

Don't bother. Use 'spreport.sql'.

Getting Things Don - spelled correctly

gtd

When you find typos in Dave Allens Getting Thing Done book, its a little deflating.

Your mind which was starting to feel like smooth, unrippled water, becomes more like a dripping tap.

Why, I haven't felt like this since I discovered Paul McKenna is losing all his hair despite the power of mind over matter.

  • p46. 'At at' -> 'And at'
  • p49. 'intution' -> 'intuition'

Any may your God go with you.

introduction to ETL for Oracle

There are a variety of methods to extract, transform and load (ETL) data in and out of an Oracle database. This short article provides an overview of some of the basic options that are supplied with Oracle.

L is for Load

Firstly, we need a table to load data into. This SQL script creates a table called PERSON.

    create table person
    (person_id number(12),
    forename varchar2(200),
    surname varchar2(200),
    title varchar2(10),
    email varchar2(100),
    gender varchar2(1),
    ni_number varchar(30),
    address1 varchar2(100),
    address2 varchar2(100),
    town varchar2(100),
    county varchar2(100),
    postcode varchar2(10),
    country varchar2(100),
    phone varchar2(100),
    mobile varchar2(100),
    salary number(12),
    date_of_birth date);

The following PL/SQL procedure may be used to populate this table with an arbitrary amount of data.

    create or replace
    procedure load_person(p_records in pls_integer)
    is
    v_person_id person.person_id%type;
    begin
    for i in 1 .. p_records
    loop
    v_person_id := to_char(i);
    insert into person
    (person_id,
    forename,
    surname,
    title,
    email,
    gender,
    ni_number,
    address1,
    address2,
    town,
    county,
    postcode,
    country,
    phone,
    mobile,
    salary,
    date_of_birth)
    values
    (v_person_id,   /* person_id */
    'Norman',     /* forename */
    'Brightside',    /* surname */
    'Mr',    /* title */
    'norman@brightside.com',  /* email */
    'M',    /* gender */
    'AA123456B',   /* ni_number */
    'Leafy Cottage',    /* address1 */
    '1 New Street',   /* address2 */
    'Anytown',    /* town */
    'Anycounty',    /* county */
    'NN12 9ZZ',    /* postcode */
    'UK',     /* country */
    '0208444555',   /* phone */
    '07789123321',   /* mobile */
    20000,    /* salary */
    '01-JAN-1970'   /* DOB */
    );
    end loop;
    commit;
    end;

Loading 1,000,000 records took 1 min 54 secs running Oracle 10g (10.2.1) on my home PC.

One obvious deficiency in the PL/SQL is that all records (and the SQL statement) are identical. In practice, this will not be the case. Therefore, it may be beneficial to replace the literal SQL with bind variables using using native, dynamic SQL.

    create or replace
    procedure load_bind(p_records in pls_integer)
    is
    sql_stmt varchar2(2000);
    v_person_id person.person_id%type;
    v_forename person.forename%type := 'Norman';
    v_surname person.surname%type := 'Brightside';
    v_title person.title%type := 'Mr';
    v_email person.email%type := 'norman@brightside.com';
    v_gender person.gender%type := 'M';
    v_ni_number person.ni_number%type := 'AA123456B';
    v_address1 person.address1%type := 'Leafy Cottage';
    v_address2 person.address2%type := '1 New Street';
    v_town person.town%type := 'Anytown';
    v_county person.county%type := 'Anycounty';
    v_postcode person.postcode%type := 'NN12 9ZZ';
    v_country person.country%type := 'UK';
    v_phone person.phone%type := '0208444555';
    v_mobile person.mobile%type := '07789123321';
    v_salary person.salary%type := 20000;
    v_date_of_birth person.date_of_birth%type := to_date('01-JAN-1970', 'DD-MON-YYYY');
    begin
    sql_stmt :=
    'insert into person
    (person_id,
    forename,
    surname,
    title,
    email,
    gender,
    ni_number,
    address1,
    address2,
    town,
    county,
    postcode,
    country,
    phone,
    mobile,
    salary,
    date_of_birth)
    values (:v_person_id,
    :v_forename,
    :v_surname,
    :v_title,
    :v_email,
    :v_gender,
    :v_ni_number,
    :v_address1,
    :v_address2,
    :v_town,
    :v_county,
    :v_postcode,
    :v_country,
    :v_phone,
    :v_mobile,
    :v_salary,
    :v_date_of_birth)';
    for i in 1 .. p_records
    loop
    v_person_id := to_char(i);
    execute immediate sql_stmt using
    v_person_id,
    v_forename,
    v_surname,
    v_title,
    v_email,
    v_gender,
    v_ni_number,
    v_address1,
    v_address2,
    v_town,
    v_county,
    v_postcode,
    v_country,
    v_phone,
    v_mobile,
    v_salary,
    v_date_of_birth;
    end loop;
    commit;
    end;
    /
    show errors

The modified PL/SQL procedure took 2 mins 1 sec for 1 million records. This doesn't imply that bind variables are worse than literals. More likely, that parsing overhead is relatively small for insert statements.

E is for Extract

People often ask why Oracle do not provide an equivalent SQL*Unloader to dump tables to delimited files. Well, the standard SQL*Plus utility may be used for such a purpose. Tom Kyte created some generic wrapper scripts (for both Unix and Windows) to dump an Oracle table to a delimited text file.

The 'flat' utility was used to dump the contents of the PERSON table into a comma separated value (CSV) file containing 1 million (identical) records.

SQL*Loader was then used to immediately load the data back in to the PERSON table again using this control file.

    LOAD DATA
    TRUNCATE
    INTO TABLE person
    FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
    (person_id,
    forename,
    surname,
    title,
    email,
    gender,
    ni_number,
    address1,
    address2,
    town,
    county,
    postcode,
    country,
    phone,
    mobile,
    salary,
    date_of_birth
    )

The time taken was 1 min 25 secs for conventional mode and 28 seconds when specifying DIRECT=TRUE for a direct path load.

Another interesting and often overlooked method for loading data is the use of external tables. This allows the creation of a virtual read-only table which is mapped directly to the host file using a CREATE TABLE syntax (similar to that for SQL*Loader control file).

This SQL script creates an external table which can then be analysed using standard SQL query tools without having to actually load a database table.

    create table person_ext
    (person_id number(12),
    forename varchar2(200),
    surname varchar2(200),
    title varchar2(10),
    email varchar2(100),
    gender varchar2(1),
    ni_number varchar(30),
    address1 varchar2(100),
    address2 varchar2(100),
    town varchar2(100),
    county varchar2(100),
    postcode varchar2(10),
    country varchar2(100),
    phone varchar2(100),
    mobile varchar2(100),
    salary number(12),
    date_of_birth date)
    ORGANIZATION EXTERNAL
    (TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (FIELDS TERMINATED BY ',')
    LOCATION ('person.dat')
    );

It is then trivial to load data from the external table into a conventional Oracle table as follows:

SQL> insert into person select * from person_ext;

This method of data loading took 39 secs.

T is for Transform

And deliberately left until last. Extracting and loading data is relatively easy. Often the time consuming and complicated task in ETL is to identify, transform and potentially clean the data items from the source system and map them to tables in the target database.

poor mans satellite navigation

uk

Buy one of these for 10 GBP. Give to partner as early Valentine present. Jot down following notes as contingency in case navigator happens to be asleep during crucial section around Antwerp ring road.

Folkestone. Eurotunnel to Calais
E40 Ostende
E40 Bruges
E40 Gent
E17 Antwerp
E34 Eindhoven

If you note the time and zero the trip-o-meter, you can even get statistics on your journey.

breath of fresh air

Its easy to get carried away with all this Web 2.0 nonsense. So while you are trawling the blogosphere, desperately looking for invites to CoComment and seeing whether 30 Boxes does indeed live up to all the hype, pause for a moment and look at two of my favourite blogs that offer a slightly different perspective on Web 2.0.

Go Flock Yourself - Witty, incisive, opinioniated comment. Always worth reading. The blog title alone still brings a smile.

Squash - Phil Sim also has some interesting, thought provoking views from Down Under, albeit with slightly less profanity. I have to say I find myself agreeing with Phil's recent assessment of 30 Boxes.

the black stuff

uk

Had a drink in The Plough in Hemel Hempstead last night.

Conveniently located for the Buncefield Oil Refinery, The Plough offers fine ales and traditional English fare.

Bad news: Guinness was off.

Good news. Bitter and lager looked and tasted just like Guinness

Food: Sausages and mash with black pudding. Gravy was greasy, a little oily in fact.

email harassment (from Microsoft)

The BBC reports that 1 in 8 people have been harassed by email. But what is amusing about this story is the screenshot that shows an Inbox displaying emails originating from 'support@microsoft.com'.

Careful. Microsoft is a large, powerful American company. I wouldn't be surprised if their lawyers are not initiating legal action as we speak.

The story itself reminds me about of the old joke about sexual harassment in the workplace where the employee goes to complain to Human Resources about not getting any.

heard it on the Newsvine

Yesterday, I stumbled across Newsvine which is a very impressive news aggregator. Until now, I didnt much care for these type of sites. Digg left me a little cold with too many irrelevant, repeated articles (not to mention flame wars) and Slashdot hurt my eyes.

I currently use My Yahoo! with various news feeds configured and was starting to experiment with a similar, personalized Google portal with feeds from Google News as an alternative.

Newsvine is more focussed on news with feeds from Associated Press, tagging, watchlists, personalisation, discussions and your own news column (articles, links). The interface is well thought out, fast, easy to use and I quickly found myself exploring many different topics of interest.

Newsvine is yet another of those Web 2.0 applications currently in beta and needs an invitation to sign up. I anticipate Newsvine supplementing rather than replacing GreatNews as my RSS reader (for personal and technical blogs) but it is certainly a very impressive and interesting application.

I stumbled across Newsvine on this blog and the author was kind enough to issue me with an invite. If he has none left, I also have some Newsvine invites available if you are interested in trying it out.