Improve chat history search for Asian languages such as Chinese and Japanese. Telegram's chat history search is word-based, which suits languages such as English and Russian, where words are separated by spaces. But for languages that do not use spaces to segment words, you have to match the entire sentence to get any results, which makes search almost unusable in those languages. I hope Telegram can improve this feature so that it works for languages that do not separate words with spaces.
I am not entirely sure this problem is related to the use of spaces. I've had issues searching in Korean (which uses spaces), as well as in Chinese and Japanese. The problem might be related to double-byte character sets.
In fact, all languages have this problem, even numbers (see here: https://github.com/telegramdesktop/tdesktop/issues/7096), because Telegram does not support partial searches. In English the problem is less obvious because there are spaces between words (though if you search for part of a word, it still cannot be found). In Chinese, Korean, and Japanese, however, each paragraph is treated as a single unit, which makes search almost unusable.
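To make the difference concrete, here is a minimal sketch in Python. It assumes the search indexes whole, space-separated words, which is only what this thread infers from the observed behavior; Telegram's real implementation is not known here, and the function names are made up for illustration.

```python
def tokenize_by_spaces(text: str) -> set[str]:
    """Split on whitespace only and lowercase (assumed whole-word index)."""
    return set(text.lower().split())


def word_search(query: str, message: str) -> bool:
    """Match only if the query equals one whole space-separated token."""
    return query.lower() in tokenize_by_spaces(message)


def partial_search(query: str, message: str) -> bool:
    """Match if the query appears anywhere in the message (substring)."""
    return query.lower() in message.lower()


english = "the search works"
japanese = "今日は天気がいいですね"

print(word_search("search", english))     # whole word: found
print(word_search("sear", english))       # part of a word: not found
print(word_search("天気", japanese))       # whole sentence is one token: not found
print(partial_search("天気", japanese))    # substring match: found
```

With whole-word matching, the Japanese sentence is a single token, so only the full sentence would match. A substring match finds the word in both languages.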
Eana Hufwe
Basic word segmentation based on spaces and mere stemming for only English is definitely not enough for a globalizing platform like Telegram! We need to see this happen!
Deleted Account
Even though we can use a bot like @Findinchannelbot to do this, it is still very difficult.
Ah, you are right... Lack of partial search is the problem. This explains why the search experience was a little better for Korean (which uses spaces, but the BE verb is expressed through "conjugation" of the noun involved) than it was for Chinese or Japanese (which do not use spaces). It also explains why I didn't notice this problem with English, as I usually need to look up nouns.
Languages with delimiters other than spaces need better search support! Currently unrecognized delimiters include the Arabic comma, and the full-width comma and full-width period used in CJK languages, where spaces are NEVER used to divide words. When delimiters are unrecognized, searching for the whole sentence is the only way to find anything useful.
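A rough Python sketch of what recognizing these delimiters could look like. The delimiter set and the `split_tokens` name are assumptions for illustration, not anything Telegram is known to use.

```python
import re

# Assumed extra delimiters: Arabic comma, full-width comma, ideographic comma,
# ideographic full stop, full-width full stop.
EXTRA_DELIMITERS = "\u060C\uFF0C\u3001\u3002\uFF0E"
_SPLIT = re.compile(rf"[\s{EXTRA_DELIMITERS}]+")


def split_tokens(text: str) -> list[str]:
    """Split on whitespace and the extra delimiters, dropping empty pieces."""
    return [token for token in _SPLIT.split(text) if token]


sentence = "今天天气很好，我们去公园吧。"
print(sentence.split())        # whitespace only: the whole sentence is one token
print(split_tokens(sentence))  # extra delimiters: one token per clause
```

This only breaks text into clauses. Finding a single word inside a clause would still need partial matching or real word segmentation.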
An alternative is to set up an ELK instance, pipe all messages into it, and do CJK keyword search in Elasticsearch, but hey, that's a mouthful.
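For a sense of why an external index helps, here is a small Python sketch of character-bigram indexing, the general idea behind CJK analysis in engines like Elasticsearch. It is not Elasticsearch's actual code, and `BigramIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


def bigrams(text: str) -> list[str]:
    """Return all overlapping two-character slices of the text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramIndex:
    """Toy inverted index from character bigrams to message ids."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._messages: dict[int, str] = {}

    def add(self, msg_id: int, text: str) -> None:
        self._messages[msg_id] = text
        for gram in bigrams(text):
            self._postings[gram].add(msg_id)

    def search(self, query: str) -> list[int]:
        grams = bigrams(query)
        if not grams:
            # Query shorter than two characters: fall back to a linear scan.
            return sorted(m for m, t in self._messages.items() if query and query in t)
        # Candidates must contain every bigram of the query.
        candidates = set.intersection(*(self._postings.get(g, set()) for g in grams))
        # Verify the substring to drop messages where the bigrams are not contiguous.
        return sorted(m for m in candidates if query in self._messages[m])


index = BigramIndex()
index.add(1, "今日は天気がいいですね")
index.add(2, "明日の天気予報")
index.add(3, "hello world")
print(index.search("天気"))
print(index.search("天気予報"))
```

Because every pair of adjacent characters is indexed, a keyword can be found without knowing where the word boundaries are.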
𝓢𝓮𝓴𝓲𝓑𝓮𝓽𝓾
yes, i've been using # to tag the content for years, searching the history is so useless for me
Thanks for this insight.