Why is an initializer list with a dangling comma allowed in C++11? [duplicate] - c++

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; // extra comma at the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust the MSVC compiler as far as C++ rules are concerned, so I checked the standard, and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid, useless question, but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand: they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, a redundant comma is not allowed at the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?

It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
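The pseudo-code above could be turned into a small C++ generator along these lines. This is only a sketch: the helper name emit_int_array and the four-space indentation are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders `items` as the definition of an int array
// called `name`. Every element line ends with a comma, the last one included.
std::string emit_int_array(const std::string& name, const std::vector<int>& items) {
    // An empty list would produce `int a[] = {};`, which is not valid for an
    // array of unknown bound, so this sketch assumes at least one item.
    assert(!items.empty());

    std::ostringstream out;
    out << "int " << name << "[] = {\n";
    for (int item : items) {
        out << "    " << item << ",\n";  // same form for first, middle and last
    }
    out << "};\n";
    return out.str();
}
```

Calling emit_int_array("a", {1, 2, 3}) yields a definition whose last element line is "3," followed by the closing "};", with no branch in the loop for the first or last item.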

It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};

Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
};
Additionally, if for whatever reason you had a tool that generated code for you, the tool wouldn't have to care about whether it's the last item in the initializer or not.

I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.
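A quick compile-time check can confirm that the trailing comma only affects how the list is written, not what it contains. This is a minimal sketch assuming a standard-conforming C++11 compiler; the array names are made up for illustration.

```cpp
// Two arrays that differ only in the trailing comma (hypothetical names).
int without_comma[] = {5, 6};
int with_comma[]    = {5, 6,};

// Both arrays are deduced to hold exactly two elements.
static_assert(sizeof(without_comma) / sizeof(without_comma[0]) == 2,
              "two initializers give two elements");
static_assert(sizeof(with_comma) == sizeof(without_comma),
              "the trailing comma does not add an element");
```

If either assertion failed, the translation unit would not compile, so the check costs nothing at run time.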

Trailing comma I believe is allowed for backward compatibility reasons. There is a lot of existing code, primarily auto-generated, which puts a trailing comma. It makes it easier to write a loop without special condition at the end.
e.g.
for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I actually always took care not to put the trailing comma, the efforts are minimal, readability is improved, and that's more important. You write code once, you read it many times.

Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had had a trailing comma after the 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.

One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.

It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end()); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid spitting the trailing comma by using the str.join() function:
print("enum Items {")
print(",\n".join(items))
print("};")

The only language where it's - in practice* - not allowed is JavaScript, and it causes an innumerable amount of problems. For example, if you copy a line from the middle of the array, paste it at the end, and forget to remove the comma, then your site will be totally broken for your IE visitors.
*In theory it is allowed, but Internet Explorer doesn't follow the standard and treats it as an error.

The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
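For comparison, this is roughly what that special-casing looks like in a generator. It is a sketch only: the helper name and the fixed array name a are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: emits the same array definition, but writes a comma
// only between elements, never after the last one.
std::string emit_without_trailing_comma(const std::vector<int>& items) {
    assert(!items.empty());  // `int a[] = {};` would not be valid

    std::ostringstream out;
    out << "int a[] = {\n";
    for (std::size_t i = 0; i < items.size(); ++i) {
        out << "    " << items[i];
        if (i + 1 != items.size()) {
            out << ',';  // the one extra branch the trailing comma would avoid
        }
        out << '\n';
    }
    out << "};\n";
    return out.str();
}
```

The extra work is a single comparison per element, which supports the point that generation alone is a weak justification.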

It's easier for machines, i.e. parsing and generation of code.
It's also easier for humans, i.e. modification, commenting-out, and visual-elegance via consistency.
Assuming C, would you write the following?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
puts("Line 1");
puts("Line 2");
puts("Line 3");
return EXIT_SUCCESS
}
No. Not only because the final statement is an error, but also because it's inconsistent. So why do the same to collections? Even in languages that allow you to omit last semicolons and commas, the community usually doesn't like it. The Perl community, for example, doesn't seem to like omitting semicolons, bar one-liners. They apply that to commas too.
Don't omit commas in multiline collections for the same reason you don't omit semicolons for multiline blocks of code. I mean, you wouldn't do it even if the language allowed it, right? Right?

It allows every line to follow the same form. Firstly this makes it easier to add new rows and have a version control system track the change meaningfully and it also allows you to analyze the code more easily. I can't think of a technical reason.

This is allowed to protect from mistakes caused by moving elements around in a long list.
For example, let's assume we have code that looks like this.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Super User",
"Server Fault"
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
And it's great, as it shows the original trilogy of Stack Exchange sites.
Stack Overflow
Super User
Server Fault
But there is one problem with it. You see, the footer on this website shows Server Fault before Super User. Better fix that before anyone notices.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Server Fault"
"Super User",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
After all, moving lines around couldn't be that hard, could it be?
Stack Overflow
Server FaultSuper User
I know, there is no website called "Server FaultSuper User", but our compiler claims it exists. The issue is that C has a string concatenation feature: two adjacent double-quoted string literals with nothing between them are joined into one (a similar issue can also happen with integers, as the - sign has multiple meanings).
Now what if the original array had a useless comma at the end? Well, the lines would be moved around, but such a bug wouldn't have happened. It's easy to miss something as small as a comma. If you remember to put a comma after every array element, such a bug just cannot happen. You wouldn't want to waste four hours debugging something, only to find that the comma is the cause of your problems.
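The effect can also be checked at compile time by counting elements. The sketch below uses plain const char* arrays instead of std::string to keep the check simple; the array names are assumptions for illustration.

```cpp
// Reordered list where the moved line lost its comma (hypothetical names).
const char* const buggy[] = {
    "Stack Overflow",
    "Server Fault"   // no comma: this literal merges with the next one
    "Super User",
};

// Same reordering, but every element line ends with a comma.
const char* const fixed[] = {
    "Stack Overflow",
    "Server Fault",
    "Super User",
};

// The missing comma silently turns three entries into two.
static_assert(sizeof(buggy) / sizeof(buggy[0]) == 2, "two literals were merged");
static_assert(sizeof(fixed) / sizeof(fixed[0]) == 3, "all three entries survive");
```

A check like the last line, placed next to a list whose length is known, is one way to catch this kind of mistake before it reaches run time.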

I am surprised that after all this time no one has quoted the Annotated C++ Reference Manual (ARM). It says the following about [dcl.init] (emphasis mine):
There are clearly too many notations for initializations, but each seems to serve a particular style of use well. The ={initializer_list,opt} notation was inherited from C and serves well for the initialization of data structures and arrays. [...]
Although the grammar has evolved since the ARM was written, the origin remains.
We can go to the C99 rationale to see why this was allowed in C. It says:
K&R allows a trailing comma in an initializer at the end of an
initializer-list. The Standard has retained this syntax, since it
provides flexibility in adding or deleting members from an initializer
list, and simplifies machine generation of such lists.

Like many things, the trailing comma in an array initializer is one of the things C++ inherited from C (and will have to support forever). A view totally different from those given here is mentioned in the book "Deep C Secrets".
There, after an example with more than one "comma paradox":
char *available_resources[] = {
"color monitor" ,
"big disk" ,
"Cray" /* whoa! no comma! */
"on-line drawing routines",
"mouse" ,
"keyboard" ,
"power cables" , /* and what's this extra comma? */
};
we read :
...that trailing comma after the final initializer is not a typo, but a blip in the syntax carried over from aboriginal C. Its presence or absence is allowed but has no significance. The justification claimed in the ANSI C rationale is that it makes automated generation of C easier. The claim would be more credible if trailing commas were permitted in every comma-separated list, such as in enum declarations, or multiple variable declarators in a single declaration. They are not.
... to me this makes more sense

In addition to code generation and editing ease, if you want to implement a parser, this type of grammar is simpler and easier to implement. C# follows this rule in several places where there's a list of comma-separated items, like the items in an enum definition.

If you use an array without a specified length, VC++ 6.0 can automatically identify its length, so if you use "int a[]={1,2,};" the length of a is 3, but the last one hasn't been initialized, you can use "cout<

Related


c++ curly braces and comments [closed]

C++ beginner here.
I'm struggling a bit to work out the best practice for curly braces + // comments.
I see that, for functions, comments above the definition are picked up by Visual Studio and shown when hovering the mouse over the function anywhere.
But when it comes to if statements and the like, I can't figure out which style will be most helpful in upcoming projects.
So, between
if (condition) { // comment
do_something();
}
or
// comment
if (condition) {
do_something();
}
or
if (condition) // comment
{
do_something();
}
or even the following one (to make use of that usually useless newline)
if (condition)
{ // comment
do_something();
}
there is no clear "Ah, this one is better because of xyz" to me yet.
Thank you for any foresight!
Cheers
This example may be handled differently depending on whether the comment applies to the condition or to the block. The first case typically indicates that the condition may be rather complex, and it would make sense to refactor it into a separate variable or a separate method with a proper name, so that the comment (if it is still necessary) applies to this variable or method. The second case typically indicates that you are doing something complex in that block, and it would make sense to refactor the block into a separate method with a proper name, so that the comment (if it is still necessary) applies to this method. Notice that introducing separate entities with proper names often completely removes the need for a comment.
As for curly braces, there is no common approach; you can probably encounter all kinds of crazy brace placement. Some people will even defend such pluralism. I prefer to place matching braces aligned, either horizontally (that is, on the same line) or vertically (that is, with the same indentation) when the content does not fit on one line. And this rule applies to all braces, not just curly ones.
If you're working on individual/personal projects then any of the above ways will work fine. However, I personally think an if statement looks more readable if it's written out like:-
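As a rough illustration of the first case, the sketch below moves a commented condition into a named function. Everything in it (the Order type, the thresholds, the discount rule) is invented for the example and is not taken from the question.

```cpp
// Hypothetical data type used only for this illustration.
struct Order {
    int item_count;
    double total;
};

// Before: the condition needs a comment to explain what it means.
double discount_with_comment(const Order& order) {
    if (order.item_count >= 10 && order.total > 100.0) {  // bulk order
        return order.total * 0.1;
    }
    return 0.0;
}

// After: the name carries the meaning, so the comment is no longer needed.
// constexpr is optional here; it just allows compile-time checks.
constexpr bool is_bulk_order(const Order& order) {
    return order.item_count >= 10 && order.total > 100.0;
}

double discount(const Order& order) {
    if (is_bulk_order(order)) {
        return order.total * 0.1;
    }
    return 0.0;
}
```

With the condition named, the question of where to put the comment relative to the braces mostly disappears for this if statement.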
if ( a > 10 )
{
std::cout << "a is above 10" << std::endl;
}
Or for a single line:-
if ( a > 10 )
std::cout << "a is above 10" << std::endl;
When working in groups or organizations, however, there may be rules, standards, or conventions that need to be followed.

how is an “incomplete” initializer list parsed? [duplicate]

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; //extra comma in the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid useless question but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand - they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, it isn't allowed to have a redundant comma in the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?
It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};
Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
}
Additionally, if for whatever reason you had a tool that generated code for you; the tool doesn't have to care about whether it's the last item in the initialize or not.
I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.
Trailing comma I believe is allowed for backward compatibility reasons. There is a lot of existing code, primarily auto-generated, which puts a trailing comma. It makes it easier to write a loop without special condition at the end.
e.g.
for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I actually always took care not to put the trailing comma, the efforts are minimal, readability is improved, and that's more important. You write code once, you read it many times.
Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had the trailing 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.
One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.
It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid spitting the trailing comma by using str.join() function:
print("enum Items {")
print(",\n".join(items))
print("}")
The only language where it's - in practice* - not allowed is Javascript, and it causes an innumerable amount of problems. For example if you copy & paste a line from the middle of the array, paste it at the end, and forgot to remove the comma then your site will be totally broken for your IE visitors.
*In theory it is allowed but Internet Explorer doesn't follow the standard and treats it as an error
The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
It's easier for machines, i.e. parsing and generation of code.
It's also easier for humans, i.e. modification, commenting-out, and visual-elegance via consistency.
Assuming C, would you write the following?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
puts("Line 1");
puts("Line 2");
puts("Line 3");
return EXIT_SUCCESS
}
No. Not only because the final statement is an error, but also because it's inconsistent. So why do the same to collections? Even in languages that allow you to omit last semicolons and commas, the community usually doesn't like it. The Perl community, for example, doesn't seem to like omitting semicolons, bar one-liners. They apply that to commas too.
Don't omit commas in multiline collections for the same reason you don't ommit semicolons for multiline blocks of code. I mean, you wouldn't do it even if the language allowed it, right? Right?
It allows every line to follow the same form. Firstly this makes it easier to add new rows and have a version control system track the change meaningfully and it also allows you to analyze the code more easily. I can't think of a technical reason.
This is allowed to protect from mistakes caused by moving elements around in a long list.
For example, let's assume we have a code looking like this.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Super User",
"Server Fault"
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
And it's great, as it shows the original trilogy of Stack Exchange sites.
Stack Overflow
Super User
Server Fault
But there is one problem with it. You see, the footer on this website shows Server Fault before Super User. Better fix that before anyone notices.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Server Fault"
"Super User",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
After all, moving lines around couldn't be that hard, could it be?
Stack Overflow
Server FaultSuper User
I know, there is no website called "Server FaultSuper User", but our compiler claims it exists. The issue is that C and C++ concatenate adjacent string literals: two double-quoted strings with nothing between them are joined into one. (A similar issue can also happen with integers, because the - sign can be either unary or binary.)
Now what if the original array had a useless comma at the end? The lines would be moved around, but the bug wouldn't have happened. It's easy to miss something as small as a comma. If you put a comma after every array element, this bug simply cannot happen. You wouldn't want to waste four hours debugging before finding out that a comma is the cause of your problems.
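To make the failure mode concrete, here is a small sketch (assuming C++11; the helper array_length and the array names are made up for illustration) that shows the missing comma changing the element count, and how pinning the expected count with static_assert would turn the silent bug into a compile error:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of elements in a built-in array,
// computed at compile time.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }

// Every element ends with a comma, so lines can be reordered freely.
const char* const kSites[] = {
    "Stack Overflow",
    "Server Fault",
    "Super User",
};

// The comma after "Server Fault" is missing: the two adjacent literals
// are concatenated into a single element.
const char* const kBrokenSites[] = {
    "Stack Overflow",
    "Server Fault"
    "Super User",
};

// Stating the expected count makes the difference visible at compile time.
static_assert(array_length(kSites) == 3, "expected three sites");
static_assert(array_length(kBrokenSites) == 2,
              "the missing comma merged two entries");
```

If the second static_assert expected 3 instead, the build would fail right at the array rather than four hours into a debugging session.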
I am surprised that after all this time no one has quoted the Annotated C++ Reference Manual (ARM), which says the following about [dcl.init], with emphasis mine:
There are clearly too many notations for initializations, but each seems to serve a particular style of use well. The ={initializer_list,opt} notation was inherited from C and serves well for the initialization of data structures and arrays. [...]
Although the grammar has evolved since the ARM was written, the origin remains.
We can go to the C99 rationale to see why this was allowed in C. It says:
K&R allows a trailing comma in an initializer at the end of an
initializer-list. The Standard has retained this syntax, since it
provides flexibility in adding or deleting members from an initializer
list, and simplifies machine generation of such lists.
Like many things, the trailing comma in an array initializer is one of the things C++ inherited from C (and will have to support forever). A view totally different from those given here appears in the book "Deep C Secrets".
There, after an example with more than one "comma paradox":
char *available_resources[] = {
"color monitor" ,
"big disk" ,
"Cray" /* whoa! no comma! */
"on-line drawing routines",
"mouse" ,
"keyboard" ,
"power cables" , /* and what's this extra comma? */
};
we read :
...that trailing comma after the final initializer is not a typo, but a blip in the syntax carried over from aboriginal C. Its presence or absence is allowed but has no significance. The justification claimed in the ANSI C rationale is that it makes automated generation of C easier. The claim would be more credible if trailing commas were permitted in every comma-separated list, such as in enum declarations, or multiple variable declarators in a single declaration. They are not.
... to me this makes more sense
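Worth keeping in mind that the book predates C99 and C++11, and both of those later allowed a trailing comma in enumerator lists as well (declarator lists and function-call arguments still reject it). A quick sketch of where it is accepted in C++11; the names Color, kValues and make_values are just for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trailing comma in an enumerator list: accepted since C99 and C++11.
enum Color {
    Red,
    Green,
    Blue,
};

// Trailing comma in an array initializer: the array still has two elements.
const int kValues[] = {1, 2,};

// Braced initializer lists for containers accept it as well (C++11).
inline std::vector<int> make_values() {
    return {1, 2, 3,};
}

static_assert(sizeof(kValues) / sizeof(kValues[0]) == 2,
              "the trailing comma does not add an element");
```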
In addition to code generation and editing ease, if you want to implement a parser, this type of grammar is simpler and easier to implement. C# follows this rule in several places where there's a list of comma-separated items, like items in an enum definition.
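As a rough illustration of the parser side, here is a minimal hand-written sketch (assuming C++11; parse_int_list is a made-up name, it only handles non-negative integers and does not check for text after the closing brace). The loop reads "item, item, ..." and simply stops when it sees the closing brace, so a trailing comma needs no extra rule:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Parses text such as "{1, 2, 3}" or "{1, 2, 3,}" into out.
// Returns false on malformed input.
bool parse_int_list(const std::string& text, std::vector<int>& out) {
    std::size_t pos = 0;
    // Skips whitespace, then consumes c if it is the next character.
    auto eat = [&](char c) -> bool {
        while (pos < text.size() &&
               std::isspace(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        if (pos < text.size() && text[pos] == c) {
            ++pos;
            return true;
        }
        return false;
    };

    out.clear();
    if (!eat('{')) return false;
    // Grammar handled here: '{' (item ',')* item? '}'
    while (!eat('}')) {
        // eat() has already skipped leading whitespace.
        if (pos >= text.size() ||
            !std::isdigit(static_cast<unsigned char>(text[pos]))) {
            return false;
        }
        int value = 0;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos]))) {
            value = value * 10 + (text[pos++] - '0');
        }
        out.push_back(value);
        if (!eat(',')) {
            return eat('}');  // no comma: the list has to end here
        }
    }
    return true;
}
```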
If you use an array without a specified length, VC++6.0 can automatically identify its length, so if you use "int a[]={1,2,};" the length of a is 3, but the last one hasn't been initialized, you can use "cout<

int a[] = {1,2,}; Weird comma allowed. Any particular reason?

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; //extra comma in the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid useless question but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand - they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, it isn't allowed to have a redundant comma in the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?
It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
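A runnable C++ version of that pseudo-code might look like the sketch below (emit_array is a made-up name, and the element type is assumed to be int). Every item is followed by ", ", so the loop has no first/last special case:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the text of an array definition, e.g. "int a[] = {1, 2, 3, };".
std::string emit_array(const std::vector<int>& items) {
    std::ostringstream out;
    out << "int a[] = {";
    for (std::size_t i = 0; i < items.size(); ++i) {
        out << items[i] << ", ";  // same output shape for every item
    }
    out << "};";
    return out.str();
}
```

The generated text relies on the compiler accepting the trailing comma, which is exactly what the grammar allows.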
It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};
Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
};
Additionally, if for whatever reason you had a tool that generated code for you, the tool doesn't have to care about whether it's the last item in the initializer list or not.
I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.
I believe the trailing comma is allowed for backward compatibility reasons. There is a lot of existing code, primarily auto-generated, which puts a trailing comma. It makes it easier to write a loop without a special condition at the end.
e.g.
std::for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I have always taken care not to put the trailing comma: the effort is minimal, readability is improved, and that's more important. You write code once, you read it many times.
Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had a trailing comma after the 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.
One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.
It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end()); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid emitting the trailing comma by using the str.join() function:
print("enum Items {")
print(",\n".join(items))
print("};")
The only language where it's - in practice* - not allowed is JavaScript, and it causes countless problems. For example, if you copy a line from the middle of the array, paste it at the end, and forget to remove the comma, then your site will be totally broken for your IE visitors.
*In theory it is allowed but Internet Explorer doesn't follow the standard and treats it as an error
The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
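For what it's worth, the special-casing really is small. A sketch of a generator helper that never emits a trailing comma (join_lines is a made-up name):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins the items with ",\n" between them and nothing after the last one.
std::string join_lines(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) {
            out += ",\n";  // separator goes before every item except the first
        }
        out += items[i];
    }
    return out;
}
```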

How to find the length of an array of std::strings?

I have a few questions related to portions of my code.
The first has to do with how I find the length of an array of arrays of strings. I'm using the following as a map for a Calculus tool I'm using.
std::string dMap[][10] = {{"x", "1"}, {"log(x)", "1/x"}, {"e^x", "e^x"}};
I'm wondering how to do the equivalent of
int arr[] = {1, 69, 2};
int arrlen = sizeof(arr)/sizeof(int);
with an array of elements of type std::string. Also, is there a better way of storing symbolic representations of (f(x), f'(x)) pairs? I'm trying to not use C++11.
My next question has to do with a procedure I wrote that isn't working. Here it is:
std::string CalculusWizard::composeFunction(const std::string & fx, const char & x, const std::string & gx)
{
/* Return fx compose gx, i.e. return a string that is fx with every instance of the character x replaced
by the equation gx.
E.g. fx="x^2", x='x', gx="sin(x)" ---> composeFunction(fx, x, gx) = "(sin(x))^2"
*/
std::string hx(""); // equation to return
std::string lastString("");
for (std::string::const_iterator it(fx.begin()), offend(fx.end()); it != offend; ++it)
{
if (*it == x)
{
hx += "(" + gx + ")";
lastString.erase(lastString.begin(), lastString.end());
}
else
{
lastString.push_back(*it);
}
}
return hx;
}
First of all, where's the bug in the procedure? It's not working when I test it out.
Second of all, when trying to make a string empty again, is it faster to do
lastString.erase(lastString.begin(), lastString.end());
or
lastString = "";
???
Thank you for your time.
Question 1) Understand that you can't, and really don't need to, calculate the size of a std::string this way. Just ask it how big it is and it will tell you.
// comparing size, length, capacity and max_size
#include <iostream>
#include <string>
int main ()
{
std::string str ("Test string");
std::cout << "size: " << str.size() << "\n";
std::cout << "length: " << str.length() << "\n";
std::cout << "capacity: " << str.capacity() << "\n";
std::cout << "max_size: " << str.max_size() << "\n";
return 0;
}
http://www.cplusplus.com/reference/string/string/capacity/
As for an array of strings, well go read this:
How to determine the size of an array of strings in C++?
Check out David Rodríguez's answer.
Question 2) The better way might be to define a FunctionPair class, depending on what you're doing with them. A std::vector<FunctionPair> might come in handy.
If FunctionPair doesn't end up with any behavior (functions) associated with it, then a struct might be enough. A std::pair<std::string, std::string> could also be shoved into a vector.
You don't need a map unless you're going to use one function string to look up the other.
http://www.cplusplus.com/reference/map/map/
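A minimal sketch of the FunctionPair idea, written to compile without C++11 (the member names fx and dfx and the function make_derivative_table are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: a function and its derivative, both as text.
struct FunctionPair {
    std::string fx;
    std::string dfx;
    FunctionPair(const std::string& f, const std::string& d) : fx(f), dfx(d) {}
};

// Builds the table from the question; push_back keeps this valid C++03.
std::vector<FunctionPair> make_derivative_table() {
    std::vector<FunctionPair> table;
    table.push_back(FunctionPair("x", "1"));
    table.push_back(FunctionPair("log(x)", "1/x"));
    table.push_back(FunctionPair("e^x", "e^x"));
    return table;
}
```

The vector knows its own length, so table.size() replaces the sizeof arithmetic.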
Question 3) A little better description of what's not working would help. I notice lastString doesn't impact hx at all.
Question 4) "Second of all" Fastest is nothing to worry about at this point. Write what is easiest to look at until all the bugs are gone. "Premature optimization is the root of all evil", Donald Knuth.
Tip: Look into how the replace function might help you do the composition replacements:
http://www.cplusplus.com/reference/string/string/replace/
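A sketch of what the replace-based approach could look like, using std::string::find and std::string::replace (compose_with_replace is a made-up name). Like the original, it replaces every occurrence of the character, so an x inside a name such as "exp" would be replaced too:

```cpp
#include <cassert>
#include <string>

// Returns fx with every occurrence of the character x replaced by "(gx)".
std::string compose_with_replace(std::string fx, char x, const std::string& gx) {
    const std::string wrapped = "(" + gx + ")";
    std::string::size_type pos = 0;
    while ((pos = fx.find(x, pos)) != std::string::npos) {
        fx.replace(pos, 1, wrapped);
        // Skip the inserted text so an x inside gx is not replaced again.
        pos += wrapped.size();
    }
    return fx;
}
```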
As the above commenter said, you shouldn't use C-style arrays even if you just want to make things 'easy'.
In reality, doing things like that makes things harder.
C-style arrays aren't bounds-checked. That makes them a source of memory-safety bugs, which can lead to all kinds of issues, from segfaults to corrupted data, as you read random data from unrelated blocks of memory or, even worse, write to them.
#include <iostream>
int main() {
int nums[] = {1, 2, 3};
std::cout << nums[3] << std::endl;
}
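Reading nums[3] is undefined behavior; on one run it printed whatever happened to be in memory past the end of the array: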
# ./a.out
4196544
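For contrast, std::vector::at() checks the index and throws std::out_of_range instead of reading past the end. A small sketch (read_is_rejected is a made-up helper, just to show the behavior):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading nums[index] through at() is rejected.
bool read_is_rejected(const std::vector<int>& nums, std::size_t index) {
    try {
        (void)nums.at(index);  // at() throws std::out_of_range for a bad index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Note that operator[] on a vector is still unchecked; only at() gives you the exception.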
No programmer is perfect: every time you implement something like that, there is some chance you will be off by one in your bounds or something. Even if you are some programming god, most people have to work on a team with people who aren't. In many cases no one will even notice, since not every out-of-bounds access causes anything obvious. Memory can be randomly corrupted without causing anything to crash horribly, until you make a totally unrelated change that causes the memory to be laid out in a different order.
But when you do notice, it will often affect something totally unrelated that you coded some time later. Given that you will likely implement many such arrays in your programming lifetime, you will likely make things much worse for yourself: you save yourself 10 minutes on each project but end up spending hours tracking down a bug in one.
If you really don't want C++11, then use std::vector<std::vector<std::string> > (before C++11 the space between the closing angle brackets is required). It will use a little more memory, so you might lose some performance, but most of the time when people are worried about performance they shouldn't be. Are you calling this function 10,000 times a second? Even then you could gain more performance from threading the code or preallocating memory. Most of the time people think something has bad performance, but in reality the compiler is optimizing it away, or the CPU is. Is the performance from the memory allocation going to be worse than trying to find the array size every run?
This is also the case with raw pointers vs std::unique_ptr, std::shared_ptr.
If typing all those names looks like a pain, use a typedef to make it nice.
You can also look at using Boost's Array type, boost::array. Or whip up your own custom class.
That's not to say that you should never use that stuff, but you should only use it when you can justify it. The default should be 'pure' C++ style code. Possible justifications:
Performance (only when you have measured and see that you need it there).
C compatibility (but most of the time you can just wrap that stuff in the std classes anyway).
If you do feel you need it, make sure you unit-test your code, and look at using the address and memory sanitizers that ship in current versions of gcc and clang. Also quarantine the code as much as possible (i.e. in classes).
That all sounds like a lot of work, but once you have learned to do it and built it into your build system, it becomes a habit and is just part of the development process, as easy as make test. And once you have it in one build system, you can cut and paste it into everything else you do forever. You have expanded your programmer's toolkit. Those are all good habits to form even if you don't do that.
But here's the actual answer to your array size question:
std::string arr[][10] = {
{"xxx", "111"},
{"y", "222"},
{"hello", "goodbye"},
{"I like candy", "mmmm"},
{"Math goes here", "this is math"},
{"More random stuff", "adsfdsfasf"},
};
int size = sizeof(arr) / 10 / sizeof(std::string);
std::cout << size << std::endl; // Prints 6, as in 6 pairs of strings
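Since the semantics are similar to a map (you are mapping a function to its derivative), I guess the most suitable data structure would be std::map, where you can easily get the derivative using the function as the key.
About the function: you never append lastString to hx. Append it before each replacement (hx += lastString + "(" + gx + ")";), since the erase otherwise throws away the text collected so far, and once more at the end: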
return hx+lastString;
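Note that in the original loop lastString is also erased every time x is found, so text that comes before an x (the "2*" in "2*x+1") is lost unless it is appended first. A sketch that sidesteps the problem by writing straight into hx; it is shown as a free function because the CalculusWizard class isn't in the question:

```cpp
#include <cassert>
#include <string>

// Returns fx with every occurrence of the character x replaced by "(gx)".
std::string composeFunction(const std::string& fx, char x, const std::string& gx) {
    std::string hx;  // equation to return
    for (std::string::const_iterator it = fx.begin(); it != fx.end(); ++it) {
        if (*it == x) {
            hx += "(" + gx + ")";  // substitute the inner function
        } else {
            hx.push_back(*it);     // copy every other character unchanged
        }
    }
    return hx;
}
```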
Question 1 is actually quite straightforward:
std::string dMap[][10] = {{"x", "1"}, {"log(x)", "1/x"}, {"e^x", "e^x"}};
size_t tupleCount = sizeof(dMap)/sizeof(dMap[0]);
size_t maxTupleSize = sizeof(dMap[0])/sizeof(dMap[0][0]);
assert(tupleCount == 3);
assert(maxTupleSize == 10);
Note that you won't get the actual count of strings in a tuple this way. You only get the number of std::strings that can fit into each tuple. Of course, you can search each tuple for the first default-constructed std::string it contains. But the entire setup is an invitation for bugs, so you don't want to use it anyway (see below).
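The same sizes can also be obtained without any sizeof arithmetic by letting a function template deduce the array bounds, which works without C++11. A sketch (row_count and used_in_row are made-up helper names) that also does the "first default-constructed string" search mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of rows in a two-dimensional built-in array.
template <typename T, std::size_t Rows, std::size_t Cols>
std::size_t row_count(T (&)[Rows][Cols]) { return Rows; }

// Number of leading non-empty strings in one row; unused slots are
// default-constructed (empty) strings.
template <std::size_t Cols>
std::size_t used_in_row(const std::string (&row)[Cols]) {
    std::size_t n = 0;
    while (n < Cols && !row[n].empty()) {
        ++n;
    }
    return n;
}

// The table from the question.
std::string dMap[][10] = {{"x", "1"}, {"log(x)", "1/x"}, {"e^x", "e^x"}};
```

Passing a pointer instead of an array fails to compile, which is safer than sizeof silently giving the wrong answer.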
Question 2 can also be answered quite clearly. You should be using a std::unordered_map<>. Why?
Your use case is to map strings of one class to another. Those are the semantics of either std::map<> or std::unordered_map<>.
From your question I gather that you don't need a notion of a next or previous mapping; your mapping pairs are essentially unrelated. In this case, std::unordered_map<> is typically faster than std::map<> because it uses a hash table internally. No matter how big your std::unordered_map<> gets, looking up an element takes constant time on average. This is not true for std::map<>, where lookup time grows logarithmically with the number of elements. Note that std::unordered_map<> was added in C++11, so if you are avoiding C++11, std::map<> is the standard alternative.
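Since the question asks to avoid C++11, here is a sketch of the lookup using std::map, which is available before C++11 (DerivativeMap, make_derivatives and derivative_of are made-up names). With C++11 available, changing the typedef to std::unordered_map should be the only edit needed:

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> DerivativeMap;

// Builds the table from the question, keyed by the function text.
DerivativeMap make_derivatives() {
    DerivativeMap d;
    d["x"] = "1";
    d["log(x)"] = "1/x";
    d["e^x"] = "e^x";
    return d;
}

// Returns the stored derivative, or an empty string if fx is unknown.
std::string derivative_of(const DerivativeMap& d, const std::string& fx) {
    DerivativeMap::const_iterator it = d.find(fx);
    return it == d.end() ? std::string() : it->second;
}
```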

Resources