Tuesday, 22 January 2013

ORA-12899 and AS/400 to Oracle conversions

AS/400 data conversion problems are killing me at the moment. Well, not literally, but they are very annoying. If EBCDIC were more like ASCII, I think that many of my problems would go away.

I have a site that is moving from AS/400 to Oracle. I have no idea how some of their data made it into the base AS/400 tables, but it's there. There are a lot of crazy characters (tab, 'ç¢éDz', etc.) in the tables, so when I do a copy to Oracle I get the old ORA-12899. This is quite a pain: it only occurs when the field on the 400 is full (when there are 30 characters) and one of them is a special character. It seems that a single-byte code does not exist for the character, so when it gets banged into a CHAR field in Oracle it spills into two or more bytes.

When you do a lengthb(field) on the column you see 31, yet when you do a length(field) you get 30. So it takes more bytes than characters to represent the string. That is a pain! Of course I've tried various combinations of setting blank trimming, but alas none of this works for my situation.
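The byte-versus-character mismatch is easy to reproduce outside Oracle. A minimal Python sketch (not Oracle's length/lengthb, just the same idea applied to a UTF-8 string):

```python
# A 30-character string containing one accented character.
# In a single-byte encoding it would be 30 bytes; in UTF-8 the
# 'ç' takes 2 bytes, so the byte length spills to 31 - one byte
# too many for a CHAR(30) column measured in bytes.
s = "ç" + "x" * 29

char_len = len(s)                  # what Oracle's length() counts
byte_len = len(s.encode("utf-8"))  # what Oracle's lengthb() counts

print(char_len)  # 30
print(byte_len)  # 31
```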


SQL> sho parameter blank_trimming ;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
blank_trimming                       boolean     TRUE

Now, I'm thinking that there might be something with the CHARACTERSET values, but then again... I'm moving to Unicode eventually - I'm just not there yet.

SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET' ;
SELECT value$ FROM sys.props$ WHERE name = 'NLS_NCHAR_CHARACTERSET' ;

AL32UTF8  *this is not supported
UTF8  *this is not supported

Great!  Thanks for the top database provisioning!  So this is my problem, but I'm looking for a different solution than a complete database rebuild.

I appreciate that I should have UTF16 (or WTF16!) for the NLS_NCHAR_CHARACTERSET.  This would probably sort everything out - and why, you ask?

AL32UTF8 is a varying width characterset, which means that the code for a character can be 1, 2, 3, or 4 bytes long. 
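To make the "varying width" point concrete, here is a quick Python check of UTF-8 byte widths (Python purely as an illustration; the same widths apply to Oracle's AL32UTF8):

```python
# UTF-8 (the encoding behind AL32UTF8) uses 1 to 4 bytes per character.
for ch in ["a", "é", "€", "😀"]:
    print(repr(ch), len(ch.encode("utf-8")), "byte(s)")
# 'a' is 1 byte, 'é' is 2, '€' is 3, '😀' is 4
```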

So, some of my characters are being represented by multiple bytes.  What can I do...

This is a temporary problem for me, as I want to convert to unicode asap, so I don't really feel the need for changing the NLS_CHARACTERSET.

I'm going to cleanse the data in Oracle - how, you might ask?  Easy, I tell you...  but it's quite a bit of work.

Firstly, I used R98403 to copy all my data from AS/400 to Oracle.  I was then able to read the reports and reconcile the row counts to know which tables had issues with ORA-12899 - about 12 of them.

Take F0006 for example; this is how to cleanse out the crappy characters:

--segregate the problem data only
create table TESTDTA.F0006broke as (
select *
 from crpdta.f0006@cms t1 
WHERE not exists
(select 1
from testdta.f0006 t2
where t1.mcmcu = t2.mcmcu
));
--re-trying the insert at this point just reproduces the error (ORA-12899)
insert into testdta.f0006 select * from testdta.f0006broke;
--review data
select lengthb(mcdl01), length(mcdl01), translate(mcdl01,'ç¢éDz'||chr(5)||chr(13)||chr(10), 'cceC2   '),
lengthb(translate(mcdl01,'ç¢éDz'||chr(5)||chr(13)||chr(10), 'cceC2   ')) FROM testdta.f0006broke ;
--fix data
update testdta.f0006broke set mcdl01 = translate(mcdl01,'ç¢éDz'||chr(5)||chr(13)||chr(10), 'cceC2   ') ;
commit;
--insert data
insert into testdta.f0006 select * from testdta.f0006broke;
commit ;
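The cleansing step can be sketched in Python with str.translate - a hypothetical stand-in for the SQL translate() above, mapping each multi-byte or control character to a single-byte replacement (I've limited it to ç, ¢, é and the control characters, as the mappings are clear for those):

```python
# Build a translation table mirroring the SQL translate() above:
# ç -> c, ¢ -> c, é -> e, and chr(5)/CR/LF -> space.
bad = "ç¢é" + chr(5) + "\r\n"
good = "cce   "
fixes = str.maketrans(bad, good)

dirty = "Mç Donalds¢é" + chr(13)
clean = dirty.translate(fixes)
print(clean)                       # 'Mc Donaldsce '
# After cleansing, one byte per character again - no more spill.
print(len(clean) == len(clean.encode("utf-8")))  # True
```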

That was not too hard now, was it?

You will also note that the NLS_NCHAR_CHARACTERSET is also wrong: EnterpriseOne Unicode data must be stored in UTF16, not UTF8.

Doh, might be back to the drawing board for this one!


