I’ve spent a bit of time over the last couple of days trying to figure out exactly how regular expressions work. I have to use one in an application I’m working on, and I’m just not getting how the syntax is organized at all.
Although you may think it’s gibberish, the text above is the regular expression I’m using as the solution to my earlier problem. Doesn’t look very "regular", does it? Aside from the fact that it uses characters I’m familiar with, the order and meaning behind them might as well be hieroglyphics to me. So, what is a "regular expression"?
The Wiktionary website defines it as follows:
A concise description of a regular formal language with notations for concatenation, alternation, and iteration (repetition) of subexpressions.
My task for my application was to keep .zip files from being uploaded to the server. I never did find a regular expression that excluded .zip files, but I did find the one above, which instead allows a list of other file types.
Here’s the expression above again:
^.+\.(([jJ][pP][eE]?[gG])|([gG][iI][fF])|([pP][dD][fF])|([dD][oO][cC])|([dD][oO][cC][xX])|([bB][mM][pP])|([tT][xX][tT]))$
The "regular expression", or "regex", above looks for any file name ending in .jpg, .jpeg, .gif, .pdf, .doc, .docx, .bmp, or .txt and allows it to be uploaded. Each pair of brackets holds the upper- and lower-case versions of one letter, so the file extension can be typed either way.
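To see the expression at work, here is a minimal sketch that runs a few file names through it. It assumes Python and its standard re module, which may not be what the application itself uses, and the helper name is_allowed is made up for illustration. The pattern itself is copied unchanged from above.

```python
import re

# The allow-list pattern from the post, split across two lines only for
# readability; Python joins adjacent string literals into one pattern.
ALLOWED_UPLOAD = re.compile(
    r"^.+\.(([jJ][pP][eE]?[gG])|([gG][iI][fF])|([pP][dD][fF])|([dD][oO][cC])"
    r"|([dD][oO][cC][xX])|([bB][mM][pP])|([tT][xX][tT]))$"
)


def is_allowed(filename: str) -> bool:
    """Hypothetical helper: True if the name ends in an allowed extension."""
    return ALLOWED_UPLOAD.match(filename) is not None


if __name__ == "__main__":
    for name in ["photo.JPG", "scan.jpeg", "report.docx", "archive.zip"]:
        print(name, is_allowed(name))
```

Note that the check only looks at the name, so a .zip file that has been renamed to end in .txt would still pass.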
I’m still not really sure what all the other symbols in there are specifying. I’ve got more to learn, for sure.
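Coming back to the original goal of keeping .zip files out: one possible approach, sketched here under the same assumption of Python's re module, is to test for the unwanted extension and reject the file on a match, rather than hunting for a single pattern that excludes it. The names is_zip and may_upload are hypothetical, and this is only an illustration, not what the application actually does.

```python
import re

# Hypothetical deny check: matches names that end in ".zip".
# re.IGNORECASE stands in for the [zZ][iI][pP] style of bracket pairs.
ZIP_EXTENSION = re.compile(r"\.zip$", re.IGNORECASE)


def is_zip(filename: str) -> bool:
    # search() scans the whole name; "$" anchors the match at its end.
    return ZIP_EXTENSION.search(filename) is not None


def may_upload(filename: str) -> bool:
    # Reject on a match instead of listing every allowed extension.
    return not is_zip(filename)
```

The trade-off is that a deny check like this lets through every other file type, while the allow list above only accepts the extensions it names.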