Search Oracle Blogs

List of Blogs maintained by Paweł Barut.


Saturday, December 30, 2006

Security: getting DBA rights quite easy

I've stepped today at ORASEC blog by Paul Wright. I've found this issue a big problem. Anybody that is using Oracle client on windows can quite easy get DBA rights by editing client9.dll file. I've to check if this is still problem with latest Oracle 10.2.0.3 release.

Cheers, Paweł

Thursday, December 21, 2006

Applying Oracle 10.2.0.3 patch

Today I've installed patch 10.2.0.3 for Oracle on Windows on top of 10.2.0.2 database. I've noticed that my spool file generated by upgrade script was 33 MB big. It occurred that source of all oracle build-in packages were spooled. It was terrible experience to search for errors in this log file, as searching for 'ORA-' give hundreds results and I had to figure out if it is in comment of package, or real error.
Fortunately patch was installed successfully and everything runs fine.

Cheers, Paweł

Thursday, December 07, 2006

UTL_MATCH - String Similarity in Oracle

You can compare string using equality operator (=), or using similarity operator (LIKE). But there are cases where it is not enough. You Can use UTL_MATCH package to calculate string similarity index. This packages offers 4 functions that take two strings as parameters:
  • edit_distance - algorithm by Levenshtein - returns number of edits that must be done to change one string into second,
  • edit_distance_similarity - normalized results of edit_distance in percents - integer values,
  • jaro_winkler - returns similarity based on Jaro-Winkler distance algorithm,
  • jaro_winkler_similarity - same as above but presented as integer in range 0-100.
Lets teke a look how it works. First we can create sample table and insert some data:
create table countries(name varchar2(15) not null);
insert into countries values ('Poland');
insert into countries values ('Germany');
insert into countries values ('United States');
insert into countries values ('Portugal');
insert into countries values ('Czech Republic');
insert into countries values ('China');
insert into countries values ('Slovakia');
insert into countries values ('Slovenia');
commit;
Now we can take a look at results. Lets compare to miss-spelled country name: 'Slovnia'
select name
,to_char(utl_match.edit_distance(name, 'Slovnia'),'999') edit_dist
,to_char(utl_match.edit_distance_similarity(name, 'Slovnia'),'999') edit_dist_sim
,to_char(utl_match.jaro_winkler(name, 'Slovnia'),'999d9999') jaro_winkler
,to_char(utl_match.jaro_winkler_similarity(name, 'Slovnia'),'999') jaro_winkler_sim
from countries
order by jaro_winkler_sim desc;

NAME EDIT EDIT JARO_WINK JARO
-------------------- ---- ---- --------- ----
Slovenia 1 88 .9750 97
Slovakia 2 75 .8881 88
China 5 29 .5619 56
United States 12 8 .5531 55
Poland 6 15 .5317 53
Portugal 7 13 .5119 51
Germany 7 0 .3571 35
Czech Republic 13 8 .0000 0
Above we can observe differences in algorithms.
Lets test it on NULLs:
SQL> select to_char(utl_match.edit_distance('test', NULL),'999')
2 edit_dist
3 ,to_char(utl_match.edit_distance_similarity('test', NULL),'999')
4 edit_dist_sim
5 ,to_char(utl_match.jaro_winkler('test', NULL),'999d9999')
6 jaro_winkler
7 ,to_char(utl_match.jaro_winkler_similarity('test', NULL),'999')
8 jaro_winkler_sim
9 from dual;

EDIT EDIT JARO_WINK JARO
---- ---- --------- ----
-1 125 .0000 0
We can see that using edit_distance on NULLs migth be dengerous.
All samples were run on Oracle 10g XE

Hope You find it usefull.
Paweł
 

Copyright © Paweł Barut
Printing from DOS to USB Printer