SearchString.matchWord to UTF16(Chinese characters)


Post by samsam598

Some time ago I wrote a small practice program around SearchString, and it worked fine. Recently, however, I found that it does not behave as expected with Chinese characters, depending on whether matchWord is set.

In the program below, matchWord works fine when both the source string and the search string consist of English characters. But when I tested with Chinese characters, say searching for "文" in "中文的文字是象形字", `matchWord` (the "匹配整个单词" checkbox, i.e. "match whole word") makes a big difference, and the result is not what I expected.

I don't know whether this behavior is by design.

Code:

 
import "ecere"
//import "StringsBox"
//namespace gui::controls;
class gui::controls::Form1 : Window
{
   caption = "Form1";
   background = activeBorder;
   borderStyle = sizable;
   hasMaximize = true;
   hasMinimize = true;
   hasClose = true;
   size = { 496, 324 };
   anchor = { horz = -35, vert = 31 };
 
   bool matchWord;
   bool matchCase;
 
   Button chkMatchWord 
   {
      this, caption = "匹配整个单词", background = white, size = { 144, 15 }, position = { 24, 200 }, isCheckbox = true; // "Match whole word"
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchWord=button.checked;   
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
         //
         return true;
      }
   };
   Button rdxIcmp 
   {
      this, caption = "忽略大小写", size = { 88, 15 }, position = { 24, 144 }, isRadio = true; // "Ignore case"
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchCase=!button.checked;  
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
         //
         return true;
      }
   };
   Button rdxCmp 
   {
      this, caption = "区分大小写", size = { 86, 15 }, position = { 120, 144 }, isRadio = true; // "Case sensitive"
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchCase=button.checked; 
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
 
         return true;
      }
   };
   Label label3 { this, caption = "待查找字符串", size = { 148, 13 }, position = { 16, 80 } }; // "String to find"
   Label label2 { this, caption = "源字符串:", size = { 84, 13 }, position = { 16, 24 } }; // "Source string:"
 
   //bool matchWord;//=false;
   //bool ignoreCase;//=true;
 
 
   //StringsBox box{};
   EditBox editBox1 { this, caption = "editBox1", size = { 254, 219 }, position = { 208, 16 } };
   Button button1 
   {
      this, caption = "(O)操作字符串", altO, size = { 154, 21 }, position = { 40, 256 }; // "(O) Process string"
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
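         String source = this.editBox3.contents; // text to search in
         String word = this.editBox2.contents;   // text to search for
         int index = -1;
         // SearchString returns a pointer to the match inside source, or null if there is none
         char * result = SearchString(source, 0, word, matchCase, matchWord);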
 
 
         if(result)
         {
 
            char idxStr[256];
            // Pointer difference: a byte offset into the string, not a character index
            index = (int)(result - source);
            sprintf(idxStr, "Found at index of %d", index);
            label1.text = idxStr;
 
         }
         else
         {
            label1.text="Not found!";
         }
 
         return true;
      }
   };
   EditBox editBox3 { this, caption = "editBox3", size = { 174, 19 }, position = { 16, 48 }, contents = "This is a very long string" };
   Label label1 { this, caption = "<=单击开始查找:", size = { 180, 21 }, position = { 208, 256 } }; // "<= Click to start searching:"
   EditBox editBox2 { this, caption = "editBox2", size = { 166, 19 }, position = { 16, 104 }, contents = "is" };
   Button button2
   {
      this, caption = "(X)退出", altX, isDefault = true, position = { 400, 256 }; // "(X) Exit"
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         Destroy(0);
         return true;
      }
   };
 
   bool OnCreate(void)
   {
 
      return true;
   }
 
   bool OnPostCreate(void)
   {
 
      this.rdxCmp.checked=true;
      this.chkMatchWord.checked=true;
      matchWord=this.chkMatchWord.checked;
      matchCase=this.rdxCmp.checked; 
      return true;
   }
}
 
Form1 form1 {};
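
A note on the reported index: the program computes it as `result - source`, which is a byte offset. With UTF-8 text, each Chinese character occupies several bytes, so that number is not a character position. A minimal sketch in plain C of converting one to the other, where `utf8_char_index` is a hypothetical helper and not part of the Ecere API:

```c
#include <stddef.h>

/* Hypothetical helper: counts the UTF-8 characters that start before byteOffset.
   Continuation bytes have the bit pattern 10xxxxxx and are not counted. */
static size_t utf8_char_index(const char * s, size_t byteOffset)
{
   size_t i, count = 0;
   for(i = 0; i < byteOffset && s[i]; i++)
      if(((unsigned char)s[i] & 0xC0) != 0x80)
         count++;
   return count;
}
```

For "中文的文字是象形字", the first "文" starts at byte offset 3, and `utf8_char_index(source, 3)` maps that to character index 1.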
 
 

Post by jerome

Hi Sam,

SearchString expects ASCII characters. I'm guessing you mean UTF-8 here, as that is the standard encoding in eC source files and Ecere APIs. SearchString should probably handle UTF-8, so could you please file a Mantis issue for it?
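
To illustrate the ASCII assumption, here is a minimal sketch in plain C of a whole-word search whose boundary test only knows ASCII letters, digits and underscore. It is not the actual String.ec code; `is_word_byte` and `find_whole_word` are hypothetical names. Every byte of a UTF-8 encoded Chinese character is 0x80 or above, so a test like this never treats it as part of a word:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ASCII-only word test: letters, digits and underscore. */
static int is_word_byte(unsigned char b)
{
   return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
          (b >= '0' && b <= '9') || b == '_';
}

/* Returns the byte offset of the first case-sensitive whole-word match
   of word in text, or -1 if there is none. */
static long find_whole_word(const char * text, const char * word)
{
   size_t len = strlen(word);
   const char * p;
   if(!len) return -1;
   for(p = strstr(text, word); p; p = strstr(p + 1, word))
   {
      /* The bytes just outside the match decide whether it is a whole word */
      unsigned char before = (p > text) ? (unsigned char)p[-1] : 0;
      unsigned char after = (unsigned char)p[len];
      if(!is_word_byte(before) && !is_word_byte(after))
         return (long)(p - text);
   }
   return -1;
}
```

With this sketch, searching for "is" in "This is a very long string" skips the "is" inside "This" and returns 5. Searching for "文" in "中文的文字是象形字" returns byte offset 3, because the neighbouring bytes of "中" and "的" do not count as word characters.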

I think all that needs to be done is to replace the definition of the IS_ALUNDER macro in String.ec with:

Code:

#define IS_ALUNDER(ch) (CharMatchCategories((ch), letters|numbers|marks|connector))
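
The idea is to classify the whole Unicode character next to the match rather than a single byte. A rough sketch of that in plain C follows; `utf8_decode`, `is_word_codepoint` and `followed_by_word_char` are hypothetical stand-ins, and the hard-coded CJK range is only an assumption standing in for a real category lookup such as CharMatchCategories:

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the UTF-8 sequence starting at s and stores its length in *len.
   Assumes well-formed input; returns 0 at the terminating NUL. */
static uint32_t utf8_decode(const char * s, int * len)
{
   const unsigned char * u = (const unsigned char *)s;
   if(u[0] < 0x80)           { *len = 1; return u[0]; }
   if((u[0] & 0xE0) == 0xC0) { *len = 2; return ((uint32_t)(u[0] & 0x1F) << 6) | (u[1] & 0x3F); }
   if((u[0] & 0xF0) == 0xE0) { *len = 3; return ((uint32_t)(u[0] & 0x0F) << 12) | ((uint32_t)(u[1] & 0x3F) << 6) | (u[2] & 0x3F); }
   *len = 4;
   return ((uint32_t)(u[0] & 0x07) << 18) | ((uint32_t)(u[1] & 0x3F) << 12) |
          ((uint32_t)(u[2] & 0x3F) << 6) | (u[3] & 0x3F);
}

/* Stand-in for a Unicode category lookup (assumption): ASCII alphanumerics,
   underscore, and the CJK Unified Ideographs block U+4E00..U+9FFF only. */
static int is_word_codepoint(uint32_t cp)
{
   return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') ||
          (cp >= '0' && cp <= '9') || cp == '_' ||
          (cp >= 0x4E00 && cp <= 0x9FFF);
}

/* True if the character starting at 'after' (the first byte past a match)
   is a word character, meaning the match does not end on a word boundary. */
static int followed_by_word_char(const char * after)
{
   int len;
   return is_word_codepoint(utf8_decode(after, &len));
}
```

Under this classification, "文" in "中文的文字是象形字" would no longer count as a whole word, since its neighbours are ideographs too.
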
Regards,

Jerome

Post by samsam598

Sorry, I can't reproduce the issue with the current SDK. I will check again to see whether the issue has really disappeared.

Thanks for the help.