May 27, 2011



I’m trying to extract the hashtags from tweets. So, which characters can a hashtag contain?

What I found is that only English letters (lower case or upper case), digits, and the underscore can be part of a hashtag. So no accented letters (such as accented vowels), no CJK characters, no dashes, no commas, etc.

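The regular expression to match a hashtag in a tweet could be: #[\w]+ (equivalently #\w+). Note that in Unicode-aware regex engines \w also matches non-English letters, so #[A-Za-z0-9_]+ states the rule above explicitly.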
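
Here is a minimal Python sketch of that extraction, assuming the rule I found above (English letters, digits, and underscore only). The names HASHTAG_RE and extract_hashtags are just illustrative. Since Python 3's \w is Unicode-aware for text patterns, the sketch spells the character class out; passing re.ASCII with \w should behave the same way.

```python
import re

# Assumption: a hashtag is '#' followed by one or more ASCII letters,
# digits, or underscores (the rule observed in this post).
HASHTAG_RE = re.compile(r"#[A-Za-z0-9_]+")

# Same rule written with \w; re.ASCII restricts \w to [A-Za-z0-9_].
HASHTAG_RE_ASCII_W = re.compile(r"#\w+", re.ASCII)


def extract_hashtags(tweet):
    """Return all hashtags in the tweet text, including the leading '#'."""
    return HASHTAG_RE.findall(tweet)


if __name__ == "__main__":
    print(extract_hashtags("Loving #Python3 and #data_science, not #re-gex"))
```

With this rule a dash or comma ends the hashtag, so "#re-gex" yields only "#re", and a tag that starts with a CJK character is not matched at all.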

What have you found?



TU Delft: maybe an interview on the 6th. Hmm, so far, the best place.



© 2012 TAO Ke's Blog