Python: Checking Whether a String Contains Chinese, and unicode

Sample code:

    # Method of the YGCommon class (Python 2); needs `import re` at the top of the module.
    # Classify a domain-name prefix: default (0), letters (1), digits (2),
    # letters and digits (3), Chinese (4), contains a hyphen (7), error (9)
    @classmethod
    def check_domain_name_type(cls, domain_name_prefix):
        domain_name_type = 0
        try:
            # Verbose check: report which string type was passed in
            if isinstance(domain_name_prefix, unicode):
                print domain_name_prefix, "\t\t   unicode"                  #  u'abc中国123'
            else:
                print domain_name_prefix, "\t\t   no unicode"
                domain_name_prefix = unicode(domain_name_prefix, 'utf-8')   #  'abc中国123'

            # Concise check: the same conversion without the debug output
            if not isinstance(domain_name_prefix, unicode):
                domain_name_prefix = unicode(domain_name_prefix, 'utf-8')   #  'abc中国123'

            # Match at least one character in the common Chinese range
            zhPattern = re.compile(u'[\u4e00-\u9fa5]+')
            zhMatch = zhPattern.search(domain_name_prefix)

            if re.match('^[a-z]+$', domain_name_prefix):            # 1 - letters
                domain_name_type = 1
            elif re.match('^[0-9]+$', domain_name_prefix):          # 2 - digits
                domain_name_type = 2
            elif re.match('^[0-9a-z]+$', domain_name_prefix):       # 3 - letters and digits
                domain_name_type = 3
            elif zhMatch:                                           # 4 - Chinese
                domain_name_type = 4
            elif "-" in domain_name_prefix:                         # 7 - hyphen (-)
                domain_name_type = 7
        except Exception:
            # Any failure (for example, bytes that are not valid UTF-8) is reported as 9
            domain_name_type = 9
        return domain_name_type
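
The method above depends on the Python 2 `unicode` type. As a rough sketch only, the same classification might look like this in Python 3, where `str` is already Unicode and only `bytes` needs decoding. The function name `classify_prefix` and the constant `ZH_PATTERN` are made up for illustration and are not part of the original code; `re.fullmatch` is swapped in for the `^...$` patterns.

```python
import re

# Hypothetical Python 3 port of check_domain_name_type; names are illustrative.
ZH_PATTERN = re.compile('[\u4e00-\u9fa5]+')  # same range as the original pattern


def classify_prefix(prefix):
    """Return 1 letters, 2 digits, 3 letters+digits, 4 Chinese, 7 hyphen, 9 error, else 0."""
    try:
        # bytes play the role of the Python 2 byte string: decode before matching
        if isinstance(prefix, bytes):
            prefix = prefix.decode('utf-8')
        if re.fullmatch('[a-z]+', prefix):
            return 1
        if re.fullmatch('[0-9]+', prefix):
            return 2
        if re.fullmatch('[0-9a-z]+', prefix):
            return 3
        if ZH_PATTERN.search(prefix):
            return 4
        if '-' in prefix:
            return 7
        return 0
    except (UnicodeDecodeError, TypeError):
        # Invalid UTF-8 bytes or a non-string argument
        return 9


if __name__ == '__main__':
    for sample in ['abcxyz', '098', 'abc123', 'abc中国123', 'abc-123']:
        print(sample, classify_prefix(sample))
```

Returning early from each branch keeps the order of the checks explicit: the Chinese test only runs once the three ASCII-only patterns have failed, as in the original `elif` chain.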

 

Test example:

    if __name__ == '__main__':
        testStr = 'abcxyz'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

        testStr = '098'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

        testStr = 'abc123'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

        # unicode string
        testStr = u'abc中国123'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

        # byte string with the same content
        testStr = 'abc中国123'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

        testStr = 'abc-123'
        print testStr, "\t\t", YGCommon.check_domain_name_type(testStr)

 

Output:

    abcxyz         abcxyz            no unicode
    1
    098         098            no unicode
    2
    abc123         abc123            no unicode
    3
    abc中国123         abc中国123            unicode
    4
    abc中国123         abc中国123            no unicode
    4
    abc-123         abc-123            no unicode
    7
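
Both the unicode input and the byte-string input are classified as 4 because the byte string is decoded to Unicode before the pattern is applied. The sketch below illustrates that step in Python 3 (an assumption, since the code in this post is Python 2), with `bytes` standing in for the Python 2 byte string. The variable names are illustrative.

```python
import re

zh_pattern = re.compile('[\u4e00-\u9fa5]+')

text = 'abc中国123'
raw = text.encode('utf-8')   # UTF-8 bytes, comparable to a Python 2 byte string

# Each of the two Chinese characters takes three bytes in UTF-8
char_count, byte_count = len(text), len(raw)

# In Python 3 a str pattern cannot be applied to bytes at all
try:
    zh_pattern.search(raw)
    matched_bytes = True
except TypeError:
    matched_bytes = False

# After decoding, the pattern finds the run of Chinese characters
match = zh_pattern.search(raw.decode('utf-8'))
found = match.group(0) if match else None

print(char_count, byte_count, matched_bytes, found)
```

Decoding first means the character class compares whole code points rather than individual bytes, which is what the range `\u4e00-\u9fa5` is written for.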

 

Real-world use

Mimvp Domains (米扑域名): https://domain.mimvp.com

 

 
