[SvarDOS] UTF8TOCP usage
Roland White
2022-04-20 15:54:57 UTC
Hey, I found UTF8TOCP, a program which looks exactly like what I'm
chasing!

Err... ooops:

UTF8TOCP.COM 437 ANYTEXT.TXT

only /shows/ ANYTEXT.TXT converted, but does not write it (as I, starry-eyed,
supposed #-).

However, my ultimate goal is to convert e-mail messages like this...
=======================================================================
Return-Path: <***@web.de>
Received: from mout.web.de (212.227.15.4 [212.227.15.4])
by firemail.de (b1gMailServer) with ESMTPS id 18B701E9
for <***@firemail.de>; Wed, 20 Apr 2022 09:53:18 +0200 (CEST)
Received-SPF: Pass
identity=***@web.de; client-ip=212.227.15.4;
helo=mout.web.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de;
s=dbaedf251592; t=1650441198;

[...]

X-UI-Sender-Class: c148c8c5-30a9-4db5-a2e7-cb6cc037b8f9
Received: from localhost ([88.69.145.141]) by smtp.web.de (mrweb005
[213.165.67.108]) with ESMTPSA (Nemesis) id 1MtPre-1nursa0DJ0-00uidC for
<***@firemail.de>; Wed, 20 Apr 2022 09:53:18 +0200
From: Roland White <***@web.de>
To: <***@firemail.de>
Subject: TEST
Date: Wed, 20 Apr 2022 09:53:17 +0200
MIME-Version: 1.0
Message-ID: <04fc8792-9963-43e4-4037-***@web.de>
User-Agent: Trojita/0.7; Qt/5.12.7; xcb; Linux;
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Provags-ID: V03:K1:Zxl7Z47RsPdte/pXDZaHnvBb/ZvVzICdHm73VeihrPfIxawC5UK

[...]

=C3=84hnlich =C3=BCberflie=C3=9Fenden =C3=96lfa=C3=9F=C3=BCberl=C3=A4ufen?
R=C3=
=B6tlich?
=======================================================================

... to make them readable with PMAIL 3.50, which, AFAIS, is suitable only for
7-bit US-ASCII. To a clueless chap like me, the line:

Content-Type: text/plain; charset=utf-8; format=flowed

appears obviously wrong in declaring UTF-8 for the whole message, so
UTF8TOCP.COM does no conversion at all.

Whew! - I've no idea how to make this work...

TIA for hints,

R-
--
Why, what for, how come?
Whoever asks me stays dumb.
Mateusz Viste
2022-04-20 16:32:57 UTC
Post by Roland White
> UTF8TOCP.COM 437 ANYTEXT.TXT
> only /shows/ ANYTEXT.TXT converted, but does not write it (as
> starry-eyed supposed #-).

How about using the standard redirector?

utf8tocp 437 anytext.txt > anytext2.txt

Post by Roland White
> to make it readable with PMAIL 3.50, which is suitable only for
> 7bit US-ASCII,

utf8tocp won't help there, as it converts UTF-8 to 8-bit codepages. Are
you sure that PMAIL handles only 7-bit ("low ASCII") characters?

Post by Roland White
> Content-Type: text/plain; charset=utf-8; format=flowed
> appears obviously wrong to be UTF-8 for the whole message, so
> UTF8TOCP.COM does no conversion at all.

utf8tocp converts UTF-8 text to a codepage (and vice versa), it knows
nothing about header declarations. That's for you to fix manually. Or
leave it as-is, perhaps PMAIL will ignore the declaration if it assumes
a specific codepage anyway.
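
One thing utf8tocp cannot know about: the sample message is also marked
Content-Transfer-Encoding: quoted-printable, so its umlauts are spelled out as
ASCII sequences such as =C3=84. Until that layer is undone, there are no UTF-8
bytes to convert. A minimal Python sketch of the two steps, assuming the body
has already been cut out of the message; the function name
qp_utf8_to_codepage is made up for illustration and is not part of utf8tocp:

```python
import quopri


def qp_utf8_to_codepage(qp_body: bytes, codepage: str = "cp437") -> bytes:
    """Undo quoted-printable, then re-encode the UTF-8 text for a DOS codepage."""
    # Step 1: "=C3=84" becomes the raw bytes C3 84; a "=" at the end of a
    # line is a soft line break and is joined with the next line.
    raw_utf8 = quopri.decodestring(qp_body)
    # Step 2: interpret the bytes as UTF-8 and re-encode them for the target
    # codepage. Characters the codepage lacks become "?" instead of raising.
    text = raw_utf8.decode("utf-8")
    return text.encode(codepage, errors="replace")


# Body lines taken from the sample message in this thread.
sample = (
    b"=C3=84hnlich =C3=BCberflie=C3=9Fenden =C3=96lfa=C3=9F=C3=BCberl=C3=A4ufen?\n"
    b"R=C3=\n"
    b"=B6tlich?\n"
)
converted = qp_utf8_to_codepage(sample)
print(converted.decode("cp437"))
```

On the DOS side the same effect would need a separate quoted-printable decoder
run before utf8tocp; which tool to use for that is left open here.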

Mateusz
Roland White
2022-04-22 06:19:02 UTC
Post by Mateusz Viste
> Post by Roland White
> > UTF8TOCP.COM 437 ANYTEXT.TXT
> > only /shows/ ANYTEXT.TXT converted, but does not write it (as
> > starry-eyed supposed #-).
> How about using the standard redirector?
> utf8tocp 437 anytext.txt > anytext2.txt

Of course, that was the first step. But...

anytext.txt:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.

anytext2.txt:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.

The changes appear only on screen; they are not written.

Post by Mateusz Viste
> Post by Roland White
> > to make it readable with PMAIL 3.50, which is suitable only for
> > 7bit US-ASCII,
> utf8tocp won't help there, as it converts UTF-8 to 8-bit codepages. Are
> you sure that PMAIL handles only 7-bit ("low ASCII") characters?

Mmh. No. However, incoming e-mails with umlauts are not really readable.
"Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS there is no
option available to change that.

It looks like a file such as PEGASUS.DE is required in that case; to quote:
"
Pegasus Mail v3.x is fully internationalizable: all text, strings
and data structures are stored in PEGASUS.RSC and loaded at runtime.
At the time of release for 3.x, translations are under way for the
following languages: Dutch, German, Czech, French, Spanish, Finnish
and Portuguese. [...]
When international versions of Pegasus Mail are made available, they
will appear as replacements for PEGASUS.RSC; these files should be
copied into the same directory as PMAIL.EXE
"
...but it probably does not exist. According to

http://www.dendarii.co.uk/FAQs/pmail-addons.html#dos

things get easier when downgrading to 3.20, wherever that version may be
found in the abyss.

R-
Mateusz Viste
2022-04-22 11:40:05 UTC
Post by Roland White
> Of course the first step. But...
> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
> anytext2.txt
> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
> Changes only on screen, not written.

You mean that utf8tocp does no data processing at all, i.e. data is
left unchanged?

But anytext.txt was supposed to be UTF-8, while what you show above is
not UTF-8... What is it exactly that you are trying to achieve?

Post by Roland White
> Mmh. No. However: Incoming E-Mails with umlauts are not really
> readable. "Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS
> there is no option available to change.

Sounds simply like a codepage mismatch. I.e., you receive emails in one
codepage, and your DOS system uses a different codepage.

utf8tocp can possibly help converting the codepages, but you need to
know exactly what the source and target codepages are.
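
To picture such a mismatch: if UTF-8 bytes are put on screen unchanged while
the display uses a DOS codepage, every umlaut turns into two unrelated glyphs,
the first of which is a box-drawing character. A small Python sketch, assuming
the mail program shows the bytes as-is (how PMAIL actually treats them is not
established in this thread); the helper name show_as_codepage is invented for
the example:

```python
def show_as_codepage(text: str,
                     source_encoding: str = "utf-8",
                     display_codepage: str = "cp858") -> str:
    """Return what the bytes of `text` look like when shown in another codepage."""
    raw = text.encode(source_encoding)
    # Every byte is looked up in the display codepage on its own, so a
    # two-byte UTF-8 sequence turns into two separate glyphs.
    return raw.decode(display_codepage, errors="replace")


garbled = show_as_codepage("Ölfaß")
print(garbled)
for ch in garbled:
    print(f"U+{ord(ch):04X}")
```

The lead byte 0xC3 of the UTF-8 umlauts falls on a box-drawing glyph in both
CP437 and CP858, which would fit the "chunk of box drawings" description.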

Mateusz
Roland White
2022-04-23 07:09:32 UTC
Post by Mateusz Viste
> Post by Roland White
> > Of course the first step. But...
> > <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
> > anytext2.txt
> > <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
> > Changes only on screen, not written.
> You mean that utf8tocp does no data processing at all, ie. data is
> left unchanged?

I mean that in this example UTF8TOCP does its data processing only on screen.

Detailed Elegy:

"type c:\texte\anytext.txt" shows:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.

"c:\trullala\utf8tocp.com 437 anytext.txt >anytext2.txt" shows on screen:
Überflüssige Ölfaßüberläufe, Ähren, verölte, ätzende Faßdauben

"type c:\texte\anytext2.txt" shows afterwards:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren
Post by Mateusz Viste
> But anytext.txt was supposed to be UTF-8, while what you show above is
> not UTF-8...

Therefore I wonder why it appears correctly converted on screen.

Post by Mateusz Viste
> What is it exactly that you are trying to achieve?

Readable e-mails.

Post by Mateusz Viste
> Post by Roland White
> > Mmh. No. However: Incoming E-Mails with umlauts are not really
> > readable. "Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS
> > there is no option available to change.
> Sounds simply like a codepage mismatch. Ie. you receive emails in one
> codepage, and your DOS system uses a different codepage.

My DOS system is SvarDOS. According to autoexec.bat:
MODE CON CP PREPARE=((858) %DOSDIR%\CPI\EGA.CPX)
MODE CON CP SELECT=858

Post by Mateusz Viste
> utf8tocp can possibly help converting the codepages, but you need to
> know exactly what the source and target codepages are.

I expected that a file automatically declared as UTF-8 /is/ UTF-8,
regardless of whether it's an e-mail body, a rubbish text file or Schiller's
Lied von der Glocke.

R-
--
Du hälst Wiederspruch im vorraus für Standart? - Also mir Macht daß
keiner weiß, ich habe Rückrat!
Mateusz Viste
2022-04-23 07:28:21 UTC
Post by Roland White
> I mean in this example UTF8TOCP does data processing only on screen.

That would be very strange, but maybe I am misunderstanding something.
Can you please send me by email the original file ("anytext.txt") so I
can look at it and try to reproduce the exact issue on my side?
mateusz - at - viste - punkt - fr

Mateusz
Mateusz Viste
2022-04-23 14:20:06 UTC
Hello Roland,

Thanks for the file. What you have sent me is a simple UTF-8 encoded
phrase. I passed it through utf8tocp and it did work as expected.

See here: https://imgpile.com/i/5OTVhF

I'm using utf8tocp v0.9.4 under SvarDOS. My system runs under codepage
437, while yours is set to CP 858, but this shouldn't matter since
German glyphs are at the same positions in both these pages.
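
The claim about glyph positions can be checked with the codepage tables that
ship with Python. This is only a sketch using Python's cp437 and cp858 codecs
as a stand-in for the DOS tables; the helper name positions is made up for the
example:

```python
GERMAN_LETTERS = "ÄÖÜäöüß"


def positions(codepage: str) -> dict:
    """Map each German special letter to its byte value in the given codepage."""
    return {ch: ch.encode(codepage)[0] for ch in GERMAN_LETTERS}


cp437 = positions("cp437")
cp858 = positions("cp858")

for ch in GERMAN_LETTERS:
    print(f"{ch}: cp437=0x{cp437[ch]:02X}  cp858=0x{cp858[ch]:02X}")

print("identical:", cp437 == cp858)
```

The two pages do differ elsewhere (CP858 gives up many box-drawing glyphs for
accented letters, for instance), so the result only covers these seven
characters.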

So I can only wonder what happens on your side...

Mateusz
Roland White
2022-04-24 06:24:11 UTC
Post by Mateusz Viste
> Hello Roland,
> Thanks for the file. What you have sent me is a simple UTF-8 encoded
> phrase. I passed it through utf8tocp and it did work as expected.
> See here: https://imgpile.com/i/5OTVhF
> I'm using utf8tocp v0.9.4 under SvarDOS. My system runs under codepage
> 437, while yours is set to CP 858, but this shouldn't matter since
> German glyphs are at the same positions in both these pages.
> So I can only wonder what happens on your side...

...waddadamf^H^HMea culpa!

As an old oafish Linux user, I used nothing but LESS.EXE for viewing.
But, being unsure how widely known it is, I wrote TYPE in my posting
instead of LESS. And yes, the effects are different.

I'm really sorry!

R-
--
Du hälst Wiederspruch im vorraus für Standart? - Also mir Macht daß
keiner weiß, ich habe Rückrat!
Mateusz Viste
2022-04-24 07:55:40 UTC
Post by Roland White
> As an old oafish Linux User I used nothing else but LESS.EXE for
> viewing. On the other hand being unsure concerning common knowledge I
> wrote TYPE in my posting instead of LESS. And yes, effects are
> different.

Ah, so that's where those weird <9A> <81> etc. strings were coming from.

LESS.EXE seems indeed to be a poor viewer, as it does not display 8-bit
characters, only their hex values. I guess the lesson here is that it's
better to use native DOS tools instead of ports from some exotic
systems. ;-)
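
For what it's worth, the strings quoted earlier in the thread are exactly what
correctly converted CP437 text looks like when every byte above 0x7F is
printed as a hex value. A short Python sketch of such a display; it is not the
actual code of LESS.EXE, only a guess at its behaviour that matches the output
shown, and less_style is a made-up name:

```python
def less_style(data: bytes) -> str:
    """Render bytes the way the viewer in this thread did: 8-bit bytes as <XX>."""
    out = []
    for b in data:
        if b >= 0x80:
            out.append(f"<{b:02X}>")   # high byte: show its hex value
        else:
            out.append(chr(b))         # plain 7-bit ASCII: show as-is
    return "".join(out)


converted = "Überflüssige Ölfaßüberläufe".encode("cp437")
print(less_style(converted))       # hex-escaped view
print(converted.decode("cp437"))   # what a codepage-aware viewer shows
```

So output in that <9A>/<81> style points to an already converted file, not to
an unconverted one.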

You might also give FoxType a try if you'd like to read UTF-8 files
directly.

Mateusz
