Strings in eC

General help with the eC language.
fedor_bel
Posts: 21
Joined: Sun Mar 14, 2010 4:46 pm

Strings in eC

Post by fedor_bel »

Dear Jerome,

Thank you very much for your explanations.
I know the question I asked was hard to answer, but the summary you gave was pretty clear. Now I have to learn to live (i.e. to program) with this new reality.
It is new to me because in recent years I have mainly been a Java programmer, and I have almost totally forgotten the memory handling habits and rules of thumb from my early C/C++ programming days (I probably never had any good ones anyway :)
I was a pretty bad C/C++ programmer, spending days searching for memory leaks in my programs and long hours thinking about how to write my code so that I would not have to search for memory leaks afterwards, instead of focusing on the problem domain.
Java relieved me of that burden and I could just program, program, program, focusing on the task to solve.
Now I find myself again tinkering with memory more than thinking in an object-oriented way about the main problem. That is why I opened this thread.
I want to learn to handle memory your way, or the Ecere way, if there is one. I studied the samples and they look beautiful: simple, effective. When I compare my code with the samples, my code looks awfully heavy and completely different in style, and I don't like it. Maybe it is because I am still thinking the Java way, which does not carry over to Ecere. Ecere offers a slightly different, maybe better style, which I still cannot figure out or grasp well enough to use in my programs.
For example:
1. I totally miss the String class from Java, and I am astonished that such an important and the most, most, most used class is not yet implemented in Ecere. After some thinking and looking through the samples, I have an idea that maybe there are reasons for that. My suspicion is that, when programming the Ecere way, there really is not much need for it. Are direct access to a char buffer, some string handling functions and memory handling rules of thumb all a good Ecere programmer needs? The char* access and the string handling functions are totally clear to me, but the memory handling rules cause me a problem: I lack them and have to reinvent the wheel every time instead of learning them.
For example:
1.1 When a function returns a string (a char*), the string should be allocated on the heap. How do I make sure it is freed when it is no longer needed?
1.2 I really hate using char[] arrays of fixed length. You never know what size of string may pass through them, and it is not safe to assume that no buffer overflow will occur. That is why it is a bit awkward to use the C string functions: almost none of them will increase the buffer size if needed.
I feel really nervous at this point, remembering the long hours I spent tinkering with the simplest string handling and comparing that with the lightning fast and baby-easy use of strings in Java.
I think I will stop now. It should be enough to begin with. Sorry for the long and whining question. I just used my chance to complain :=)

Cheers,
Fedor
jerome
Site Admin
Posts: 608
Joined: Sat Jan 16, 2010 11:16 pm

Re: Strings in eC

Post by jerome »

The short answer to 'Where is the eC string class?' is 'Somewhere in the future'.

We just haven't gotten there yet. eC was developed to provide a better framework for organizing the Ecere SDK API, which was originally written in C, so a String class was not part of the original goals.

However, we do realize how important a String class is in making the language accessible to non-C programmers, and there are a lot of advantages for long-time C programmers as well in using a String class, such as the prevention of the buffer overflows that you brought up.

One of the reasons it is taking so long to get around to it is that (1) there are a LOT of outstanding things to do and time is (very) limited, and (2) we want a String solution that stays compatible and fits in nicely with C strings and char buffers, while adding the safety and ease of use of easier languages. Sort of the perfect solution. We're still thinking it through in the back of our minds. You can take a look at a very rough draft of ideas for the String class here; the 'SuperString' class is at the end of that wiki entry. You can also take a look at joeyadams's take at making a string class.

One day I will sit down and implement that perfect String class. In fact, it would sort of have to be a special eC datatype, hence the added difficulty.

One thing you can use for a string class is an Array<char>, a generic dynamic array container. You can set the 'minAllocSize' property to the normal case maximum width, and then expand it if you go past the initial size. It's far from a perfect solution however.
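To make the "start at the usual size, expand when you go past it" idea concrete, here is a minimal sketch in plain C (which eC accepts as a superset). It is not the Array<char> implementation: the GrowBuf type and the growbuf_* functions are hypothetical names invented for this illustration, and the doubling strategy is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely analogous to a dynamic char array
   whose initial capacity covers the common case. */
typedef struct {
    char  *data;     /* always NUL-terminated once initialized */
    size_t len;      /* characters in use, excluding the terminator */
    size_t capacity; /* allocated bytes, including room for the terminator */
} GrowBuf;

/* Returns 1 on success, 0 if allocation fails. */
int growbuf_init(GrowBuf *b, size_t initial_capacity) {
    if (initial_capacity == 0) initial_capacity = 1;
    b->data = malloc(initial_capacity);
    if (!b->data) return 0;
    b->data[0] = '\0';
    b->len = 0;
    b->capacity = initial_capacity;
    return 1;
}

/* Appends text, doubling the capacity until the result fits.
   Returns 1 on success, 0 if allocation fails (buffer left unchanged). */
int growbuf_append(GrowBuf *b, const char *text) {
    size_t add = strlen(text);
    size_t needed = b->len + add + 1;
    if (needed > b->capacity) {
        size_t cap = b->capacity;
        char *p;
        while (cap < needed) cap *= 2;
        p = realloc(b->data, cap);
        if (!p) return 0;
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->len, text, add + 1); /* copies the terminator too */
    b->len += add;
    return 1;
}

void growbuf_free(GrowBuf *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->capacity = 0;
}
```

A caller would initialize the buffer with the expected common-case size, append freely, and call growbuf_free at the end of the block that owns it.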

The method I often use is to work in a buffer sized for the expected maximum (although there are probably many cases where it can still overflow in exceptional situations), and then call CopyString(buffer) to produce the final string, which will be exactly the right length.
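The scratch-buffer approach can be sketched in plain C as follows. copy_string is a hypothetical stand-in for Ecere's CopyString, and make_label and the 256-byte scratch size are made up for the example. Using snprintf is an addition here: it truncates instead of overflowing when the scratch buffer turns out to be too small.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for CopyString: returns a heap copy of exactly the right size,
   or NULL if allocation fails. */
char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;
}

/* Builds "name (count)" in a scratch buffer, then returns a right-sized copy.
   The caller owns the result and must free it. */
char *make_label(const char *name, int count) {
    char scratch[256];
    snprintf(scratch, sizeof(scratch), "%s (%d)", name, count);
    return copy_string(scratch);
}
```

The oversized buffer lives only on the stack for the duration of the call; what outlives the function is the exact-length copy.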

Some other useful string functions included in the Ecere library are the file path manipulation functions such as PathCat and GetExtension, and SearchString, which is like a more flexible version of strstr. Of course you can also make extensive use of all the C string manipulation functions.

Generally, if a String is 'returned' by a function, it should be a newly allocated string (which needs to be freed). Functions that do not allocate new memory for a string should take an output string buffer parameter, and should ideally be accompanied by another parameter for the maximum buffer size.
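The two conventions might look like this in plain C. Both functions are hypothetical examples written for this illustration (they are not Ecere's PathCat), and the choice to leave the output buffer empty on failure is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Convention 1: the result is newly allocated; the caller must free it. */
char *join_path_alloc(const char *dir, const char *file) {
    size_t dlen = strlen(dir), flen = strlen(file);
    char *out = malloc(dlen + 1 + flen + 1);
    if (!out) return NULL;
    memcpy(out, dir, dlen);
    out[dlen] = '/';
    memcpy(out + dlen + 1, file, flen + 1); /* includes the terminator */
    return out;
}

/* Convention 2: the caller supplies the buffer and its maximum size.
   Returns 1 on success, 0 if the result would not fit (buffer left empty). */
int join_path_buf(char *out, size_t out_size, const char *dir, const char *file) {
    size_t dlen = strlen(dir), flen = strlen(file);
    if (out_size == 0) return 0;
    if (dlen + 1 + flen + 1 > out_size) {
        out[0] = '\0';
        return 0;
    }
    memcpy(out, dir, dlen);
    out[dlen] = '/';
    memcpy(out + dlen + 1, file, flen + 1);
    return 1;
}
```

With the first form, ownership passes to the caller through the return value. With the second, no allocation happens at all, and the size parameter lets the function refuse to overflow.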

Right now the 'String' data type maps directly to a C 'char *'.

As for how to ensure these strings get deleted, just follow the 'block' model as I outlined in the thread about exception handling. This is one of the reasons why I like having separate declarations and statements: everything that is allocated can clearly be seen at the top of a block, and that's what you need to free at the end of the block. Strings can also be stored in classes, where they are freed in the destructor. You can also use string properties this way:

Code:

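class MyClass
{
   String name;   // owned by the instance
public:
   property String name
   {
      // Free the previous value, then keep a private copy of the new one
      set { delete name; name = CopyString(value); }
      get { return name; }
   }
   ~MyClass()
   {
      delete name;   // release the copy when the object is destroyed
   }
}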

Generally speaking, functions taking an input string (or property setters) should never hold on to that string; they should make their own copy if it is needed beyond the life of the function/method call.
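For readers coming from plain C, the same ownership rules can be sketched without classes. Person, person_set_name, person_destroy and demo are hypothetical names for this illustration only: the setter keeps a private copy, the destroy function plays the role of the destructor, and demo follows the block model by freeing at the bottom what was allocated at the top.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C counterpart of a class that owns its name string. */
typedef struct {
    char *name;
} Person;

/* Stores a private copy, so the caller's string may be freed or reused.
   Returns 1 on success, 0 if allocation fails (the old name is kept). */
int person_set_name(Person *p, const char *value) {
    char *copy = NULL;
    if (value) {
        size_t n = strlen(value) + 1;
        copy = malloc(n);
        if (!copy) return 0;
        memcpy(copy, value, n);
    }
    free(p->name);   /* release the previous copy only after the new one exists */
    p->name = copy;
    return 1;
}

/* Plays the role of the destructor. */
void person_destroy(Person *p) {
    free(p->name);
    p->name = NULL;
}

/* Block model: everything allocated at the top is freed at the bottom. */
int demo(void) {
    Person p = { NULL };
    char *temp = malloc(16);
    int ok = 0;
    if (temp) {
        strcpy(temp, "Fedor");
        ok = person_set_name(&p, temp);
        temp[0] = 'X';   /* changing the caller's buffer does not affect p.name */
        ok = ok && strcmp(p.name, "Fedor") == 0;
    }
    person_destroy(&p);
    free(temp);
    return ok;
}
```

Making the copy before releasing the old value is a design choice in this sketch: it keeps the setter safe even when it is handed the object's own current string.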

I know eC currently does very little in terms of making your life easier when dealing with strings. You're pretty much back in C programming in this regard. It is one of the roughest edges of eC right now, and probably the main reason why eC would not be suited to most beginning programmers. However, the upside (if you're willing to call it such :P) is that you're kind of forced to learn how to use C-style strings. Hopefully the future will bring some very developer-friendly String class to eC :D

Cheers,

Jerome
fedor_bel
Posts: 21
Joined: Sun Mar 14, 2010 4:46 pm

Re: Strings in eC

Post by fedor_bel »

Dear Jerome,

Thank you for the sincere answer. That clarifies things a bit and brings me down to earth. OK, until that perfect string class is available I will have to write a simple utility class and use it together with the other memory and string handling advice you gave me. That should not be much of a problem once I get used to the new style.

Cheers,
Fedor
sacrebleu
Posts: 27
Joined: Sun Jan 17, 2010 12:37 pm

Re: Strings in eC

Post by sacrebleu »

I wrote an advanced String class about 12 months ago for this project. :lol:
samsam598
Posts: 212
Joined: Thu Apr 14, 2011 9:44 pm

Re: Strings in eC

Post by samsam598 »

I tried to compile the JString you listed above, but I encountered an issue. I will paste the source code below. Given the code below, the compiler complains that n is undeclared. Why?

Code:

 
class JString : struct {
private:
   struct {
      uint len;
      char *data;
      uint size;
   } n;
public: 
...
...
property char* {
      set { n.data = value; } //this invokes the data property's set function
      get { return n.data; }
   }    
 
Below is the complete source and error message during compilation.
Error Message:

Code:

Default Compiler
Building project jstring using the Debug configuration...
Compiling...
jstring.c
   jstring.ec: In function 'findfirst':
   jstring.ec:73: warning: return discards qualifiers from pointer target type
   jstring.ec: In function 'findfirstnon':
   jstring.ec:81: warning: return discards qualifiers from pointer target type
   jstring.ec: In function 'findlast':
   jstring.ec:89: warning: return discards qualifiers from pointer target type
   jstring.ec: In function 'findlastnon':
   jstring.ec:97: warning: return discards qualifiers from pointer target type
   jstring.ec: In function '__ecereConstructor_JString':
   jstring.ec:161: warning: implicit declaration of function '__ecereNameSpace__ecere__com__eSystem_New'
   jstring.ec:161: warning: assignment makes pointer from integer without a cast
   jstring.ec: In function '__ecereDestructor_JString':
   jstring.ec:165: warning: implicit declaration of function '__ecereNameSpace__ecere__com__eSystem_Delete'
   jstring.ec: In function '__ecereProp_JString_Set_literal':
   jstring.ec:154: warning: implicit declaration of function '__ecereNameSpace__ecere__com__eSystem_Renew'
   jstring.ec:154: warning: assignment makes pointer from integer without a cast
   jstring.ec: In function '__ecereProp_JString_Set_size':
   jstring.ec:176: warning: assignment makes pointer from integer without a cast
   jstring.ec: In function '__ecereProp_JString_Set_char__PTR_':
   jstring.ec:189: error: 'n' undeclared (first use in this function)
   jstring.ec:189: error: (Each undeclared identifier is reported only once
   jstring.ec:189: error: for each function it appears in.)
   jstring.ec: In function '__ecereProp_JString_Set_StringLiteral':
   jstring.ec:193: error: 'literal' undeclared (first use in this function)
   jstring.ec: In function '__ecereMethod_JString_Resize':
   jstring.ec:201: warning: assignment makes pointer from integer without a cast
   jstring.ec: In function '__ecereRegisterModule_jstring':
   jstring.ec:278: warning: passing argument 6 of '__ecereNameSpace__ecere__com__eSystem_RegisterClass' from incompatible pointer type
   jstring.ec:278: warning: passing argument 7 of '__ecereNameSpace__ecere__com__eSystem_RegisterClass' from incompatible pointer type
   jstring.ec:278: warning: passing argument 6 of '__ecereNameSpace__ecere__com__eSystem_RegisterClass' from incompatible pointer type

jstring (Debug) - 4 errors, 14 warnings
                                                   

Code:

 
#include <stdio.h>
#include <string.h>
 
/* Issues:
If a program accidentally inserts at an insane offset (e.g. 2000000000), it will resize
the buffer that large and fault all the memory up to there.  This could allow for very annoying
bugs and even be a security problem.  Other padding places (e.g. property len) have the same
problem.
*/
 
//NOTE:  glibc has this exact same function as a GNU extension, but it is only stable and
//  correct in glibc 2.1 or newer (I wrote my own implementation rather than copying this)
//If _GNU_SOURCE is defined, that memmem will conflict with this one.
void *memmem(const void *haystack, uint haystacklen, const void *needle, uint needlelen) {
   const byte *s=(const byte*)haystack, *p, *e;
   const byte *n;
   uint d;
   byte fbn;
   if (!needlelen--)
      return (void*)haystack;
   if (haystacklen <= needlelen)
      return null;
 
   e = (byte*)haystack+haystacklen-needlelen;
   fbn = *(byte*)needle++;
   do {
      if (*s++ != fbn)
         continue;
      p = s;
      n = needle;
      d = needlelen;
      while (d--) {
         if (*p++ == *n++)
            continue;
         goto keepgoing;
      }
      return (void*)(s-1);
   keepgoing:
      ;
   } while (s<e);
   return null;
}
 
//CharFlags is a collection of characters that are to be returned (or not returned) by the find* functions
struct CharFlags {
   property char *charsSet {
      set {
         const byte *n = (const byte*)value;
         byte val = 0;
         while (*n)
            flags[(uint)*n++] = ++val;
      }
   }
   property char * {
      set { charsSet = value; }
   }
 
   byte flags[256]; //flags[c] indicates whether character c is selected
   void Clear(void) {
      memset(flags, 0, sizeof(flags));
   }
   void Set(const char *needles, byte val) {
      const byte *n = (const byte*)needles;
      while (*n)
         flags[(uint)*n++] = val;
   }
};
 
char *findfirst(const char *haystack, uint haystacklen, CharFlags cf) {
   const byte *h = (const byte*)haystack;
   while (haystacklen--) {
      if (cf.flags[(uint)*h++])
         return h-1;
   }
   return null;
}
char *findfirstnon(const char *haystack, uint haystacklen, CharFlags cf) {
   const byte *h = (const byte*)haystack;
   while (haystacklen--) {
      if (!cf.flags[(uint)*h++])
         return h-1;
   }
   return null;
}
char *findlast(const char *haystack, uint haystacklen, CharFlags cf) {
   const byte *h = (const byte*)haystack + haystacklen;
   while (haystacklen--) {
      if (cf.flags[(uint)*--h])
         return h;
   }
   return null;
}
char *findlastnon(const char *haystack, uint haystacklen, CharFlags cf) {
   const byte *h = (const byte*)haystack + haystacklen;
   while (haystacklen--) {
      if (!cf.flags[(uint)*--h])
         return h;
   }
   return null;
}
 
struct StringLiteral {
   property char* str {
      set {
         len = strlen(value);
         data = value;
      }
      get { return data; }
   }
   property JString {
      set {
         len = value.len;
         data = value.data;
      }
      get { return JString {literal = this}; }
   }
   property char* {
      set {
         len = strlen(value);
         data = value;
      }
      get { return data; }
   }
   uint len;
   char *data;
};
 
class JString : struct {
private:
   struct {
      uint len;
      char *data;
      uint size;
   } n;
public:
//I would like property StringLiteral literal to come first so all JStrings created with
//instantiation syntax will know the size immediately rather than have to calculate it with
//strlen.  Saying JString str {"Hello"} didn't work because of the problem using
//property char* set with that makes JString str = "Hello"; not work either.
   property char* data {
      get { return n.data; }
      set {
         literal = StringLiteral {len=strlen(value), data=value};
      }
   }
   property StringLiteral literal {
      get {
         value = StringLiteral {len=n.len, data=n.data};
      }
      set {
         n.len = value.len;
         if (n.len > n.size) {
            n.size = n.len;
            n.data = renew n.data char[n.size+1];
         }
         memcpy(n.data, value.data, n.len);
         n.data[n.len] = 0;
      }
   }
   JString() {
      n.data = new char[1];
      n.data[0] = 0;
   }
   ~JString() {
      delete data;
   }
 
   property uint size {
      get { return n.size; }
      set {
         if (n.len > value) {
            n.len = value;
            n.data[value] = 0;
         }
         n.size = value;
         n.data = renew n.data char[n.size+1];
      }
   }
   property uint len {
      get { return n.len; }
      set { //changes len with padding
         uint oldlen = n.len;
         Resize(value);
         if (value > oldlen)
            memset(n.data+oldlen, ' ', value-oldlen);
      }
   }
   property char* {
      set { n.data = value; } //this invokes the data property's set function
      get { return n.data; }
   }
   property StringLiteral {
      set { literal = value; }  //this invokes the literal property's set function
      get { value = StringLiteral {len = n.len, data = n.data}; }
   }
 
   //changes len without padding
   void Resize(uint newlen) {
      if (newlen > n.size) {
         n.size = newlen;
         n.data = renew n.data char[newlen+1];
      }
      n.data[newlen] = 0;
      n.len = newlen;
   }
   char *PrependGap(uint alen) {
      uint oldlen = n.len;
      Resize(n.len + alen);
      memmove(n.data+alen, n.data, oldlen+1);
      return n.data;
   }
   char *InsertGap(uint offset, uint alen) {
      uint oldlen = n.len;
      if (offset >= oldlen) {
         Resize(offset+alen);
         memset(n.data+oldlen, ' ', offset-oldlen);
      } else {
         Resize(oldlen+alen);
         memmove(n.data+offset+alen, n.data+offset, oldlen-offset);
      }
      return n.data+offset;
   }
 
   void Append(StringLiteral a) {
      uint oldlen = n.len;
      Resize(n.len + a.len);
      memcpy(n.data+oldlen, a.data, a.len);
   }
   void Prepend(StringLiteral a) {
      memcpy(PrependGap(a.len), a.data, a.len);
   }
   void Insert(StringLiteral a, uint offset) {
      memcpy(InsertGap(offset, a.len), a.data, a.len);
   }
   void Delete(uint offset, uint dlen) {
      if (offset >= n.len)
         return;
      if (dlen > n.len-offset)
         dlen = n.len-offset;
      memmove(n.data+offset, n.data+offset+dlen, n.len-offset-dlen);
      Resize(n.len - dlen);
   }
 
   char *FindStr(uint start, StringLiteral needle) {
      if (start > n.len)
         return null;
      return (char*)memmem(n.data+start, n.len-start, needle.data, needle.len);
   }
 
   //the n.len+1 in the following character-finding functions allows the zero terminator to be found also
   char *FindChar(uint start, char c) {
      if (start > n.len+1)
         return null;
      return (char*)memchr(n.data+start, c, n.len+1-start);
   }
   char *FindFirst(uint start, CharFlags cf) {
      if (start > n.len+1)
         return null;
      return findfirst(n.data+start, n.len+1-start, cf);
   }
   char *FindFirstNon(uint start, CharFlags cf) {
      if (start > n.len+1)
         return null;
      return findfirstnon(n.data+start, n.len+1-start, cf);
   }
   char *FindLast(uint end, CharFlags cf) {
      if (end > n.len+1)
         end = n.len+1;
      return findlast(n.data, end, cf);
   }
   char *FindLastNon(uint end, CharFlags cf) {
      if (end > n.len+1)
         end = n.len+1;
      return findlastnon(n.data, end, cf);
   }
};
 
class Main : Application {
   void Main() {
      //JString str = "Hello deleteme";
      //JString awesome = "awesome ";
         //the char* set property isn't working correctly here, so I had to do what is below instead
 
      JString str {"Hello deleteme"};
      JString awesome {"awesome "};
         //Moreover, I had to type this in an external editor because the IDE
         //  crashes if you type these lines or if you try to save after carefully
         //  typing around the problem.
 
      //puts(str);
         //this generates puts(__ecereProp_JString_Get_char__PTR_(&str)); instead of puts(__ecereProp_JString_Get_char__PTR_(str));
      puts(str.data);
 
      //str.Append(" world!");
      //str.Insert(" friggin", 5);
      str.Append({" world!"});
      str.Insert({" friggin"}, 5);
      str.Delete(14, 9);
      //str.Insert(awesome, 14);
      printf("\"%s\"\n", str.data);
      {
         //const char *loc = str.FindStr(0, "frig");
         //    This tries to convert "frig" to JString before converting it to StringLiteral.  Moreover, it runs into the char* set property problem.
         const char *loc = str.FindStr(0, {"frig"});
         if (loc)
            printf("\"frig\" found at %u\n", (uint)(loc-str));
      }
      {
         //const char *v = str.FindFirst(0, "AEIOUaeiou");
         //    This does no conversion at all.  It should invoke the property char* set function of struct CharFlags instead of passing the pointer directly
         const char *v = str.FindFirst(0, {"AEIOUaeiou"});
         if (v)
            printf("First vowel:  %c at %u\n", *v, (uint)(v-str));
         //v = str.FindLast(str.len+1, "AEIOUaeiou");
         v = str.FindLast(str.len+1, {"AEIOUaeiou"});
         if (v)
            printf("Last vowel:  %c at %u\n", *v, (uint)(v-str));
      }
   }
}
 
jerome
Site Admin
Posts: 608
Joined: Sat Jan 16, 2010 11:16 pm

Re: Strings in eC

Post by jerome »

Hi Sam,

I'm not sure whether that code worked at the time or not...
But Joey uses conversion properties there, and to convert a char * into a class, a new JString object actually needs to be allocated, like this:

Code:

 
   property char* {
      set { return { n.data = value }; } //this invokes the data property's set function
      get { return n.data; }
   }
   property StringLiteral {
      set { return { literal = value }; }  //this invokes the literal property's set function
      get { value = StringLiteral {len = n.len, data = n.data}; }
   }

I'm not sure whether this will all work, or whether the memory will be properly kept track of, but it compiles :)

Regards,

Jerome