Improve the ability to search chat history in Asian languages such as Chinese and Japanese. Telegram's chat history search is word-based, which suits languages such as English and Russian, where words are separated by spaces. But for languages that do not use spaces to separate words, you have to match the entire sentence to get any results, which makes search almost unusable for those languages. I hope Telegram can improve this feature so that it can search text in languages that do not separate words with spaces.
I like how your Twitter account teases WhatsApp, but even WhatsApp has working search for CJK content. I suggest your Twitter account think twice before making another meme about WhatsApp. Or, @durov, can you show some progress on this, just like in your new post about Stories?
динар курбанов
Maybe it is possible, and worthwhile, to index and search using character-based (Unicode character) n-grams. I am not an algorithms expert; there must be many alternative algorithms.
It depends on server-side behaviour. Third-party clients cannot index all the messages; they only display the search results fetched from the server.
𝐂𝐥𝐨𝐯𝐞𝐫 𝐘𝐚𝐧 ⋆
I don't know how long this situation will last, even though this card is one of the most upvoted, but I'd like to summarise what everyone is asking for.
Cause: Telegram's server-side search only splits messages by spaces or punctuation marks.
Solution:
- Rebuild the index on the server, split by Unicode characters => message.split("")
- Or switch to client-side indexing and search: let the client itself build an index for dozens of messages using the device's local indexing strategy.
It's up to Telegram: you should act on this to justify the money we pay for Premium.
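As a rough illustration of the first option in the summary above, here is a minimal sketch of a character-level inverted index. The names `buildCharIndex` and `searchSubstring` are hypothetical, and nothing here reflects how Telegram's servers actually work. It uses `Array.from(text)` rather than `message.split("")`, because in JavaScript `split("")` cuts by UTF-16 code units and would break characters outside the Basic Multilingual Plane (some rare CJK characters, emoji).

```javascript
// Hypothetical sketch: character -> list of [messageId, position] postings.
function buildCharIndex(messages) {
  const index = new Map();
  messages.forEach((text, id) => {
    // Array.from iterates by Unicode code point, not by UTF-16 code unit.
    Array.from(text).forEach((ch, pos) => {
      if (!index.has(ch)) index.set(ch, []);
      index.get(ch).push([id, pos]);
    });
  });
  return index;
}

// Return ids of messages that contain `query` as a contiguous run of characters.
function searchSubstring(index, query) {
  const chars = Array.from(query);
  if (chars.length === 0) return [];
  // Candidate start positions come from the postings of the first character.
  let candidates = (index.get(chars[0]) || []).map(([id, pos]) => `${id}:${pos}`);
  for (let i = 1; i < chars.length; i++) {
    // Shift each posting back by i so it lines up with the candidate start.
    const postings = new Set(
      (index.get(chars[i]) || []).map(([id, pos]) => `${id}:${pos - i}`)
    );
    candidates = candidates.filter((key) => postings.has(key));
  }
  const ids = new Set(candidates.map((key) => Number(key.split(":")[0])));
  return [...ids].sort((a, b) => a - b);
}

const messages = ["今天天气很好", "明天去东京旅行", "hello world"];
const index = buildCharIndex(messages);
console.log(searchSubstring(index, "东京")); // finds message 1 only
console.log(searchSubstring(index, "天气")); // finds message 0 only
```

With an index like this, a two-character query such as "东京" matches inside a longer sentence without the user having to type the whole sentence. A real server would also need ranking, normalisation and a far more compact posting format, none of which is shown here.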
PokeGuy
Agreed, I'll cancel my Premium subscription this month. There is no hope for this. Compared to emojis, this is much more important, yet no one cares.
Mike
For English, I don't see a need to split words further for indexing or search. This topic is mainly about Asian languages. We are not talking about an improvement here; it is about implementing something that is missing.
By letter-based n-grams I mean this: qwertyuiop becomes qwert, werty, ertyu, rtyui, tyuio, yuiop. This is universal and works for any language. But it seems to require nearly 6 times more resources (RAM and CPU power) for English, for example, compared to the usual space-split word indexing. With the usual method, English text such as "qwerty uiop" becomes qwerty, uiop, which is 3 times fewer items in the index in this example. With letter 5-grams it would become qwert, werty, erty_, rty_u, and so on; or maybe spaces should be removed before indexing. (I deleted the previous version of this message in order to edit it; I forgot to say that I am comparing with the usual space-split word indexing.)
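The comparison above can be checked with a small sketch. The helper names `charNgrams` and `wordTokens` are made up for illustration, and the counts only measure the number of index entries for these example strings, not real RAM or CPU cost.

```javascript
// Sliding window of n characters, counted by Unicode code point.
function charNgrams(text, n) {
  const chars = Array.from(text);
  const grams = [];
  for (let i = 0; i + n <= chars.length; i++) {
    grams.push(chars.slice(i, i + n).join(""));
  }
  return grams;
}

// Conventional tokenisation: split on whitespace and drop empty tokens.
function wordTokens(text) {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// The example from the comment: six 5-grams for a 10-letter string.
console.log(charNgrams("qwertyuiop", 5).join(", "));
// prints: qwert, werty, ertyu, rtyui, tyuio, yuiop

// Word split vs. 5-grams for the same text with a space in it.
console.log(wordTokens("qwerty uiop").length);    // 2 entries
console.log(charNgrams("qwerty uiop", 5).length); // 7 entries (space kept)

// The same function works unchanged on text without spaces.
console.log(charNgrams("明天去东京旅行", 2).join(", "));
```

For text without spaces, a smaller window is usually more practical than 5: with `n = 2` the last call yields six bigrams, including 东京, so a two-character query can be looked up directly. The choice of `n` here is an assumption for the example, not a recommendation from the thread.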
We are not asking for improvements for English…
Take a look at Asian languages.