[pygtk] Problem in fetching Unicode from URL and displaying it in PyGTK widget

Walter Leibbrandt walter at translate.org.za
Fri Jul 17 23:19:53 WST 2009


Bertrand Kintanar wrote:
> On 7/17/09 9:19 PM, saeed wrote:
>> s1 = 'Guz&#xE1;n'
>> s2 = ''
>> n = len(s1)
>> i = 0
>> while i<n:
>>    if i<=n-6:
>>      if s1[i:i+3]=='&#x' and s1[i+5]==';':
>>        s2 += unichr(int(s1[i+3:i+5], 16)).encode('utf-8')
>>        i += 6
>>        continue
>>    s2 += s1[i]
>>    i += 1
>> print s2
> Now this fixes it all. Thanks a lot. I hope there is some sexier way to
> do this, though, but this will work. Thanks again.
import re

# Matches hexadecimal character references such as &#xE1;
htmluni = re.compile(r'&#x([\dA-Fa-f]+);')
data = 'Guz&#xE1;n   Guz&#xE1;n'

# Replace one reference at a time until no match is left.
match = htmluni.search(data)
while match:
    data = (data[:match.start()] + unichr(int(match.group(1), 16)) +
            data[match.end():])
    match = htmluni.search(data)
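
The same substitution can probably be done in a single pass by handing re.sub a replacement function instead of looping over search. What follows is only a minimal sketch: the names decode_hex_refs and _replace are made up for illustration and are not from the original post, and the small shim assumes that falling back to chr is acceptable where unichr does not exist (Python 3).

```python
import re

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3: chr already returns a Unicode character

# Same pattern as in the loop version: hex references like &#xE1;
htmluni = re.compile(r'&#x([\dA-Fa-f]+);')

def _replace(match):
    # Hypothetical helper: turn the captured hex digits into a character.
    return unichr(int(match.group(1), 16))

def decode_hex_refs(text):
    # re.sub calls _replace once per match, so no explicit loop is needed.
    return htmluni.sub(_replace, text)

data = decode_hex_refs(u'Guz&#xE1;n   Guz&#xE1;n')
```

If a UTF-8 byte string is wanted, as in the quoted loop, the result can be encoded afterwards with data.encode('utf-8').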

-- 
Walter Leibbrandt                  Software Developer
Recent blogs:
* Conquering the CellRendererWidget
http://www.translate.org.za/blogs/walter/en/content/conquering-cellrendererwidget


