[pygtk] utf8 validating string
Yann Leboulanger
asterix at lagaule.org
Tue Dec 4 05:53:56 WST 2007
John Ehresman wrote:
> Yann Leboulanger wrote:
>> I'd like not to have it. But I get this string by gpg-decoding a message
>> sent by Miranda IM. I think it's a bug in their GnuPG implementation,
>> but anyway I'd like my client to detect those bad strings and a) print
>> the message correctly if I can or b) not traceback and print a warning
>> message. But for that I need a function that tells me that
>> g_utf8_validate will fail ...
>
> You probably should explicitly decide how to handle \0. If it's always
> at the end, it's probably just a simple bug and can be chopped off but
> it may be something more if valid text follows the \0.
>
> But in general, I think this'll work:
>
> def valid_glib_utf8(s):
>     try:
>         unicode(s, 'utf-8')
>     except Exception:
>         return False
>     else:
>         return '\x00' not in s
>
> In case you need it, s.replace('\x00', '') will remove the \0's.
>
> Cheers,
>
> John
>
That doesn't work:
>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> t = "test\x00"
>>> u = unicode(t, 'utf-8')
>>> b.set_text(t)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed
It's the same if I try with the unicode object:
>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> t = "test\x00"
>>> u = unicode(t, 'utf-8')
>>> b.set_text(u)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed
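Both sessions show that decoding "test\x00" as UTF-8 succeeds, so a decode check alone cannot predict the g_utf8_validate failure; the embedded NUL has to be tested for separately. A minimal sketch of such a check, assuming the input is a byte string (the b'' literals need Python 2.6 or later). The helper name strip_nuls is hypothetical, and the check only approximates what g_utf8_validate accepts:

```python
def valid_glib_utf8(data):
    """Return True if data (a byte string) decodes as UTF-8 and has no NUL byte.

    This is an approximation of g_utf8_validate, not a binding to it.
    """
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    # Decoding accepts an embedded NUL, so check for it explicitly.
    return b'\x00' not in data


def strip_nuls(data):
    """Hypothetical helper: drop every NUL byte from a byte string."""
    return data.replace(b'\x00', b'')


print(valid_glib_utf8(b'test'))                    # True
print(valid_glib_utf8(b'test\x00'))                # False
print(valid_glib_utf8(b'\xff'))                    # False
print(valid_glib_utf8(strip_nuls(b'test\x00')))    # True
```

With this, the client could call valid_glib_utf8 before b.set_text(t) and either strip the NULs or print a warning instead of passing the string on.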
--
Yann