― I know my name is dave, Friday, 2 March 2007 23:05 (eighteen years ago)
― Moodles, Friday, 2 March 2007 23:19 (eighteen years ago)
― I know my name is dave, Friday, 2 March 2007 23:20 (eighteen years ago)
― libcrypt, Friday, 2 March 2007 23:25 (eighteen years ago)
― libcrypt, Friday, 2 March 2007 23:27 (eighteen years ago)
― Jeff Wright, Saturday, 3 March 2007 04:09 (eighteen years ago)
― mh, Saturday, 3 March 2007 04:14 (eighteen years ago)
― libcrypt, Saturday, 3 March 2007 04:37 (eighteen years ago)
― Jaq, Saturday, 3 March 2007 04:47 (eighteen years ago)
― onimo, Saturday, 3 March 2007 10:17 (eighteen years ago)
― I know my name is dave, Saturday, 3 March 2007 12:54 (eighteen years ago)
― libcrypt, Saturday, 3 March 2007 18:25 (eighteen years ago)
― libcrypt, Saturday, 3 March 2007 19:14 (eighteen years ago)
― I know my name is dave, Saturday, 3 March 2007 21:28 (eighteen years ago)
― Aimless, Saturday, 3 March 2007 21:50 (eighteen years ago)
― libcrypt, Saturday, 3 March 2007 22:01 (eighteen years ago)
― I know my name is dave, Sunday, 4 March 2007 15:56 (eighteen years ago)
― libcrypt, Sunday, 4 March 2007 22:23 (eighteen years ago)
― JW, Sunday, 4 March 2007 22:42 (eighteen years ago)
lol. thread title not promising.
i have a query that searches for text that may contain foreign characters, so i need to use unistr like this:
select * from table where text = unistr('word');
now, 'word' is generally something in english like 'bowie' but sometimes it's something in chinese like '\7b2c\4e00\5929' but using unistr seems to stop it from utilising the index on text column and makes it run 500 times slower (33s vs 62ms). is there anything i can do to stop this?
(caveat: i can't change the query without changing the code (which i will also do, just can't yet because it'll need another code release). so changes to database schema / indexes are preferred)
― koogs, Tuesday, 11 September 2007 17:34 (eighteen years ago)
What's the charset of the database? If all yr ducks are in a row there, you might try precalculating unistr(column) and indexing it, depending on how it'll be used.
― libcrypt, Tuesday, 11 September 2007 19:15 (eighteen years ago)
See also function-based indices:
CREATE INDEX idx ON table (unistr(column))
― libcrypt, Tuesday, 11 September 2007 19:24 (eighteen years ago)
yeah, thanks, but i tried that, no dice.
in the above 'text' is the column name, 'word' is the user input and procedural programmers like myself figure that unistr('word') would be calculated once then looked up, via index and, therefore, lightning quick.
this, i guess, is why dba's exist.
database is utf8 iirc. not entirely sure how to find this out.
select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
AL32UTF8
select value from nls_database_parameters where parameter = 'NLS_NCHAR_CHARACTERSET';
AL16UTF16
― koogs, Wednesday, 12 September 2007 08:31 (eighteen years ago)
Is the text column NCHAR or NVARCHAR? Is \7b2c\4e00\5929 a valid UCS-2 representation? Have you compared EXPLAIN PLANs of good and bad queries?
― libcrypt, Wednesday, 12 September 2007 13:13 (eighteen years ago)
FUCK YOU, CEEFAX
― *rumpie*, Wednesday, 12 September 2007 14:53 (eighteen years ago)
text is a varchar2 (256 char) column.
\7b2c etc is valid whatever, in that you can insert it and retrieve it and display it and it looks like it should, don't know about ucs-2.
explain plan just shows, as mentioned above, that it stops using the index when unistr() is used and does a full table lookup of millions of rows instead.
have just stumbled onto the fact that changing the '=' to a 'LIKE' means it finds the same results in 1 second rather than 33. which is counterintuitive.
― koogs, Thursday, 13 September 2007 12:01 (eighteen years ago)
Unistr converts UCS-2 into the national character set, which is used in columns of type NCHAR, NVARCHAR, and so on. Your national character set is an Oracle variation on UTF16, whereas your database character set is the same, but UTF8. The text column, being VARCHAR, is UTF8, so you'd think that converting a UCS-2 string into UTF16 via unistr() would usually fail.
If it's not failing, then I don't get how unistr() works. In any case, it might be worth doing the conversion differently or changing the NLS in the next version of the software.
― libcrypt, Thursday, 13 September 2007 17:45 (eighteen years ago)
it's flying now with the LIKE instead of the =s, although nobody can sensibly explain why it should. have slapped a few missing indices onto it too (hey, don't look at me, i didn't design the thing). today's plan is to test whether replacing a left join to a 90% empty table with a join to the same table fully populated with 0s is going to help (i say no, but that may just be reticence at having to fill the table with 3.6m rows full of nothing useful to improve a search that's now only taking 63ms anyway)
cheers libcrypt. i'm not sure i understood what you said but i appreciate the help. 8)
― koogs, Friday, 14 September 2007 08:58 (eighteen years ago)