Generate Sequences 
      
      
source code:  getvalidsequences.pl 
      
Each ~1000 bp sequence is scanned for ATG positions that have 100 bp upstream
      and 100 bp downstream. For each such position, the 203-nucleotide ATG window
      (100 upstream, ATG, 100 downstream) is written to the output file.
      In addition, the map file is written to STDOUT.
      Position 500 (0-based array index) in each full-length sequence is the positive TIS.
 
      
Usage: perl getvalidsequences.pl inputfile outputfile > mapfile
      
inputfile: nucleotide sequences of ~1000 bp
      
outputfile: 203-nucleotide ATG window sequences
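
The scan described above can be sketched in Java, the language used for mySMO.java. This is only an illustration under stated assumptions, not the code of getvalidsequences.pl: the class AtgWindows, the Hit type and the scan method are hypothetical names, and the layout of the map file is not reproduced because it is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; not taken from getvalidsequences.pl. */
final class AtgWindows {
    static final int FLANK = 100;             // bases required on each side of the ATG
    static final int WINDOW = 2 * FLANK + 3;  // 203 nucleotides in total

    /** One extracted window plus the 0-based ATG start in the full sequence. */
    static final class Hit {
        final int position;
        final String window;

        Hit(int position, String window) {
            this.position = position;
            this.window = window;
        }
    }

    /** Returns every ATG that has FLANK bases on both sides, left to right. */
    static List<Hit> scan(String sequence) {
        String seq = sequence.toUpperCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        // Last ATG start that still leaves a full downstream flank.
        int last = seq.length() - 3 - FLANK;
        for (int i = FLANK; i <= last; i++) {
            if (seq.startsWith("ATG", i)) {
                hits.add(new Hit(i, seq.substring(i - FLANK, i + 3 + FLANK)));
            }
        }
        return hits;
    }
}
```

In this sketch, an ATG closer than 100 bp to either end of the sequence is skipped, and a window whose position is 500 would be the positive TIS.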
      
      
      
Generate Features
      
      
source code:  genfeatures.pl 
      
      
      This code generates features (ARFF format) from the 203-nucleotide
      ATG window sequences (100 upstream, ATG, 100 downstream).
      Total number of features
      upstream: monomers, dimers, trimers, tetramers, pentamers, codons
      
downstream: monomers, dimers, trimers, tetramers, pentamers, codons
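
As a rough illustration of these feature groups, the Java sketch below counts k-mers in the upstream and downstream halves of a window. It is not the code of genfeatures.pl. The class KmerCounts and its methods are hypothetical, and two points are assumptions: that features are raw counts (the script may use frequencies instead), and that codons are read as non-overlapping triplets starting at the first base of the region.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; not taken from genfeatures.pl. */
final class KmerCounts {
    /**
     * Counts substrings of length k, moving forward by step each time.
     * step = 1 gives overlapping k-mers (monomers to pentamers for k = 1..5);
     * k = 3 with step = 3 gives triplets read in frame from the region start.
     */
    static Map<String, Integer> count(String region, int k, int step) {
        if (k < 1 || step < 1) {
            throw new IllegalArgumentException("k and step must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + k <= region.length(); i += step) {
            counts.merge(region.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    /** First 100 bases of a 203-nucleotide window. */
    static String upstream(String window) {
        return window.substring(0, 100);
    }

    /** Last 100 bases of a 203-nucleotide window (after the central ATG). */
    static String downstream(String window) {
        return window.substring(103);
    }
}
```

For example, KmerCounts.count(KmerCounts.upstream(window), 2, 1) would give the upstream dimer counts for one window.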
      
A counter shows the number of the sequence currently being processed.
      
By default, the class type of every sequence is set to positive.
      When generating a training file, take care to keep only the window at
      position 500 positive and to set all other windows to negative for each
      full-length sequence. The positions can be obtained from the map file
      produced by getvalidsequences.pl. Because array indices start at 0, the
      position is 500; counted from 1 in the original full-length sequence, it is
      position 501.
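
The relabeling rule can be written down as a small Java sketch. The layout of the map file is not described here, so the sketch assumes the 0-based ATG position of a window has already been read from it as an integer. The class TisLabel and its methods are hypothetical names.

```java
/** Hypothetical helper; the map file parsing is deliberately left out. */
final class TisLabel {
    /** 0-based index of the true TIS; position 501 when counting from 1. */
    static final int TIS_INDEX = 500;

    /** Only the window whose ATG starts at index 500 stays positive. */
    static String forPosition(int zeroBasedAtgPosition) {
        return zeroBasedAtgPosition == TIS_INDEX ? "positive" : "negative";
    }

    /** Converts a 0-based array index to a 1-based sequence position. */
    static int toOneBased(int zeroBasedPosition) {
        return zeroBasedPosition + 1;
    }
}
```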
      
Usage: perl genfeatures.pl inputfile outputfile
      
inputfile: ATG sequences file
      
outputfile: arff file
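
For readers unfamiliar with ARFF, the Java sketch below renders a minimal file with numeric attributes and a nominal class attribute. It shows the general file layout only. The attribute names, the relation name and the class ArffSketch are hypothetical, and the real output of genfeatures.pl may name and order its attributes differently.

```java
import java.util.List;

/** Hypothetical helper; shows the ARFF layout, not the script's exact output. */
final class ArffSketch {
    /** Renders one numeric attribute per name, then a positive/negative class. */
    static String render(String relation, List<String> attributes,
                         List<int[]> rows, List<String> labels) {
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (String name : attributes) {
            out.append("@attribute ").append(name).append(" numeric\n");
        }
        out.append("@attribute class {positive,negative}\n\n");
        out.append("@data\n");
        for (int r = 0; r < rows.size(); r++) {
            for (int value : rows.get(r)) {
                out.append(value).append(',');
            }
            out.append(labels.get(r)).append('\n'); // class label is the last column
        }
        return out.toString();
    }
}
```

Each data row holds the feature values of one 203-nucleotide window followed by its class label.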
      
Score Calculation
      
      
 source code:  mySMO.java 
      
      
 compilation: javac mySMO.java
      
 Usage: java mySMO > outputfile
      
 Options are set inside the Java code:
      
 complexity constant = 1.0
      
 cachesize = 250007
      
 epsilon = 1.0E-12
      
 tolerance parameter = 0.001
      
 t = training file
      
 T = test file
      
 outputfile: contains the classification result and the score (the ATG score) for each 203-nucleotide window sequence.