[pygtk] Problem in fetching Unicode from URL and displaying it in PyGTK widget

Bertrand Kintanar b3rxkintanar at gmail.com
Sat Jul 18 18:08:51 WST 2009


On 7/17/09 11:19 PM, Walter Leibbrandt wrote:
> Bertrand Kintanar wrote:
>> On 7/17/09 9:19 PM, saeed wrote:
>>> s1 = 'Guz&#xE1;n'
>>> s2 = ''
>>> n = len(s1)
>>> i = 0
>>> while i<n:
>>>    if i<=n-6:
>>>      if s1[i:i+3]=='&#x' and s1[i+5]==';':
>>>        s2 += unichr(int(s1[i+3:i+5], 16)).encode('utf-8')
>>>        i += 6
>>>        continue
>>>    s2 += s1[i]
>>>    i += 1
>>> print s2
>> Now this fixes it all. Thanks a lot. I hope there is some sexier way
>> to do this, though, but this will work. Thanks again.
> import re
> htmluni = re.compile(r'&#x([\dA-Fa-f]+);')
> data = 'Guz&#xE1;n   Guz&#xE1;n'
>
> match = htmluni.search(data)
> while match:
>     data = data[:match.start()] + unichr(int(match.group(1), 16)) + data[match.end():]
>     match = htmluni.search(data)
>
Thanks for this, Walter. I'm also using regex for my search, but I never
thought of using it the way you have here.
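
For what it's worth, the search-and-splice loop above can probably be
collapsed into a single re.sub() call with a replacement function. This is
only a sketch, not something from the thread: it assumes Python 3, where
chr() takes the place of unichr(), and the names HTMLUNI and
decode_hex_entities are made up for illustration.

```python
import re

# Matches hexadecimal numeric character references such as &#xE1;
HTMLUNI = re.compile(r'&#x([0-9A-Fa-f]+);')

def decode_hex_entities(text):
    """Replace every &#xHH; reference in text with the character it names."""
    # re.sub() calls the function once per match and splices in its return
    # value, so no manual slicing or re-searching is needed.
    return HTMLUNI.sub(lambda m: chr(int(m.group(1), 16)), text)

data = 'Guz&#xE1;n   Guz&#xE1;n'
print(decode_hex_entities(data))
```

On Python 2 the same idea should work with unichr() in place of chr() and
a unicode input string. A reference whose value falls outside the Unicode
range would raise ValueError, which this sketch does not handle.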