
SearchString.matchWord to UTF16(Chinese characters)

Posted: Fri Sep 20, 2013 6:23 am
by samsam598
Some time ago I wrote a small practice program around SearchString, and it worked fine. Recently, though, I found that it does not behave as expected with Chinese characters, depending on whether matchWord is set.

In the program below, matchWord works fine when both the source string and the string being searched for are English. But when I test with Chinese characters, say searching for "文" in "中文的文字是象形字", the `matchWord` option (the "匹配整个单词" checkbox, i.e. "match whole word") makes a big difference, and the result is not what I expected.

I don't know whether this behavior is by design.

Code:
 
import "ecere"
//import "StringsBox"
//namespace gui::controls;
class gui::controls::Form1 : Window
{
   caption = "Form1";
   background = activeBorder;
   borderStyle = sizable;
   hasMaximize = true;
   hasMinimize = true;
   hasClose = true;
   size = { 496, 324 };
   anchor = { horz = -35, vert = 31 };
 
   bool matchWord;
   bool matchCase;
 
   // Checkbox captioned "match whole word"
   Button chkMatchWord
   {
      this, caption = "匹配整个单词", background = white, size = { 144, 15 }, position = { 24, 200 }, isCheckbox = true;
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchWord=button.checked;   
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
         //
         return true;
      }
   };
   // Radio button captioned "ignore case"
   Button rdxIcmp
   {
      this, caption = "忽略大小写", size = { 88, 15 }, position = { 24, 144 }, isRadio = true;
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchCase=!button.checked; 
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
         //
         return true;
      }
   };
   // Radio button captioned "case sensitive"
   Button rdxCmp
   {
      this, caption = "区分大小写", size = { 86, 15 }, position = { 120, 144 }, isRadio = true;
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         matchCase=button.checked;
         return true;
      }
 
      bool OnStateChange(WindowState state, Modifiers mods)
      {
 
         return true;
      }
   };
   // Label captions: "string to search for" and "source string:"
   Label label3 { this, caption = "待查找字符串", size = { 148, 13 }, position = { 16, 80 } };
   Label label2 { this, caption = "源字符串:", size = { 84, 13 }, position = { 16, 24 } };
 
   //bool matchWord;//=false;
   //bool ignoreCase;//=true;
 
 
   //StringsBox box{};
   EditBox editBox1 { this, caption = "editBox1", size = { 254, 219 }, position = { 208, 16 } };
   // Button captioned "(O) process string": runs the search
   Button button1
   {
      this, caption = "(O)操作字符串", altO, size = { 154, 21 }, position = { 40, 256 };
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         String source=this.editBox3.contents;//CopyString("This is a long string");
         String word=this.editBox2.contents;
         int index=-1;
         char* result=SearchString(source,0,word,matchCase,matchWord) ;
 
 
         if(result)
         {
 
            char idxStr[256];
            // Pointer difference: a byte offset into the string, not a character index
            index=result-source;
            sprintf(idxStr,"Found at index of %d",index);
            label1.text=idxStr;//itoa(index);
 
         }
         else
         {
            label1.text="Not found!";
         }
 
         return true;
      }
   };
   EditBox editBox3 { this, caption = "editBox3", size = { 174, 19 }, position = { 16, 48 }, contents = "This is a very long string" };
   Label label1 { this, caption = "<=单击开始查找:", size = { 180, 21 }, position = { 208, 256 } };
   EditBox editBox2 { this, caption = "editBox2", size = { 166, 19 }, position = { 16, 104 }, contents = "is" };
   // Button captioned "(X) exit"
   Button button2
   {
      this, caption = "(X)退出", altX, isDefault = true, position = { 400, 256 };
 
      bool NotifyClicked(Button button, int x, int y, Modifiers mods)
      {
         Destroy(0);
         return true;
      }
   };
 
   bool OnCreate(void)
   {
 
      return true;
   }
 
   bool OnPostCreate(void)
   {
 
      this.rdxCmp.checked=true;
      this.chkMatchWord.checked=true;
      matchWord=this.chkMatchWord.checked;
      matchCase=this.rdxCmp.checked;
      return true;
   }
}
 
Form1 form1 {};
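
A side note on the index the program above reports: `result - source` is a pointer difference, so it counts bytes rather than characters. Assuming the edit box contents are UTF-8 encoded, the Chinese characters in the example each occupy three bytes, so the reported number will not match the character position. The sketch below, in plain C, converts a byte offset into a character (code point) index. `utf8_char_index` is a hypothetical helper written for this illustration, not an Ecere API.

```c
#include <stddef.h>

/* Hypothetical helper: counts the code points that start within the first
   byteOffset bytes of a UTF-8 string. Continuation bytes have the bit
   pattern 10xxxxxx and are skipped, so only lead bytes are counted. */
int utf8_char_index(const char *s, size_t byteOffset)
{
   int count = 0;
   size_t i;
   for(i = 0; i < byteOffset && s[i]; i++)
   {
      if(((unsigned char)s[i] & 0xC0) != 0x80)
         count++;
   }
   return count;
}
```

For "中文的文字", a match at byte offset 3 corresponds to character index 1, and byte offset 9 corresponds to character index 3. For pure ASCII text the two values are the same.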
 
 

Re: SearchString.matchWord to UTF16(Chinese characters)

Posted: Fri Sep 20, 2013 11:34 am
by jerome
Hi Sam,

SearchString is expecting ASCII characters. I'm guessing you mean UTF-8 here, as that is the standard encoding in eC source files and Ecere APIs. SearchString should probably handle UTF-8, so could you please file a Mantis issue for it?

I think all that needs to be done is to replace the definition of the IS_ALUNDER macro in String.ec with:

Code:
#define IS_ALUNDER(ch) (CharMatchCategories((ch), letters|numbers|marks|connector))
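
To illustrate why the character test matters for whole-word matching, here is a minimal sketch in plain C. It is not the actual String.ec code: `find_whole_word`, `is_word_ascii` and `is_word_utf8` are hypothetical names, and treating every non-ASCII byte as a word character is a crude stand-in for a real Unicode category lookup such as the CharMatchCategories call above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only rule: letters, digits and underscore. Every byte >= 0x80
   (i.e. every byte of a multi-byte UTF-8 sequence) is a non-word byte. */
bool is_word_ascii(unsigned char b)
{
   return b == '_' || (b >= '0' && b <= '9') ||
          (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
}

/* Simplified UTF-8-aware rule (an assumption for this sketch): bytes of
   multi-byte sequences also count as word bytes. */
bool is_word_utf8(unsigned char b)
{
   return b >= 0x80 || is_word_ascii(b);
}

/* Returns the byte offset of the first occurrence of word in source that
   is not adjacent to a word byte on either side, or -1 if there is none. */
int find_whole_word(const char *source, const char *word,
                    bool (*is_word)(unsigned char))
{
   size_t len = strlen(word);
   const char *p = source;
   if(!len) return -1;
   while((p = strstr(p, word)))
   {
      bool before = p > source && is_word((unsigned char)p[-1]);
      bool after = p[len] && is_word((unsigned char)p[len]);
      if(!before && !after)
         return (int)(p - source);
      p++;   /* keep searching after this candidate */
   }
   return -1;
}
```

With "中文的文字" and "文", the ASCII-only rule reports a whole-word match at byte offset 3, because the neighbouring bytes are not ASCII letters. The UTF-8-aware rule returns -1, because every occurrence of "文" sits next to another character. For "This is a very long string" and "is", both rules return 5.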


Regards,

Jerome

Re: SearchString.matchWord to UTF16(Chinese characters)

Posted: Sat Sep 21, 2013 12:09 am
by samsam598
Sorry, I can't reproduce the issue with the current SDK. I will check again to see whether the issue has really gone away.

Thanks for the help.