Discussion:
[galaxy-user] bedtools error
Tamara Simakova
2014-04-15 06:53:58 UTC
Hello!

I'm using BEDTools on the main Galaxy server. I am trying to use the
multiinter function to compare two BED files, but there is an error in
the resulting file: some coordinates that are present in both files are
marked as if they were present in only one file. The BED file format is
correct. What could be the problem?

Thanks,
Tamara Simakova
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bedtools_error.jpg
Type: image/jpeg
Size: 555554 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: IAD39777_one-based_half-opened.bed
Type: application/octet-stream
Size: 1357 bytes
Desc: not available
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20140415/5459e2b5/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_bad_regions.bed
Type: application/octet-stream
Size: 442 bytes
Desc: not available
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20140415/5459e2b5/attachment-0003.obj>
Jennifer Jackson
2014-04-16 19:07:05 UTC
Hi Tamara,

The results are correct. Let me explain how to interpret the output.

When a line has multiple input file names in the 5th column ("tag"),
the set of overlapping intervals between the input files represented by
that line was completely contained within one of those intervals. Let's
call this case a "common set". When there is just a single file in
column 5, the common intervals did not have a spanning interval in the
input. Let's call this case a "common item".

In the output, both are reported:
- the largest interval is reported in columns 1, 2, and 3
- the number of intervals in the set represented by this largest
interval (and this line) is reported in column 4
- each file is listed as a tag (a form option) in column 5
- how many intervals were in the set is reported in the last columns,
one column per input file

This is your header, the result of the "BEDTools -> Intersect multiple
sorted BED files" function:

chrom start end num list IAD39777_one-based_half-opened.bed new_bad_regions.bed
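
As a rough illustration of the column layout described above, the following Python sketch splits one result row into named fields using the header. The names `MultiinterRow` and `parse_row` are hypothetical helpers made up for this example, and the sketch assumes whitespace-delimited columns and file names that contain no spaces. The row is one from the result dataset discussed in this thread.

```python
from typing import Dict, List, NamedTuple


class MultiinterRow(NamedTuple):
    """One result row (hypothetical container for this example)."""
    chrom: str
    start: int
    end: int
    num: int                  # column 4
    files: List[str]          # column 5, the comma-separated "tag" list
    per_file: Dict[str, int]  # trailing columns, one per input file


def parse_row(header: str, row: str) -> MultiinterRow:
    """Split a row on whitespace and name the trailing columns from the header.

    Assumption: file names contain no whitespace.
    """
    names = header.split()
    fields = row.split()
    if len(names) != len(fields):
        raise ValueError("header and row have different column counts")
    file_names = names[5:]
    return MultiinterRow(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        num=int(fields[3]),
        files=fields[4].split(","),
        per_file={n: int(v) for n, v in zip(file_names, fields[5:])},
    )


HEADER = ("chrom start end num list "
          "IAD39777_one-based_half-opened.bed new_bad_regions.bed")
ROW = ("chr7 117180358 117180364 2 "
       "IAD39777_one-based_half-opened.bed,new_bad_regions.bed 1 1")

parsed = parse_row(HEADER, ROW)
print(parsed.files)
print(parsed.per_file)
```

Reading rows this way makes it easy to filter for lines where column 5 lists both input files.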


_Example in your data of a "common set":_

See this line in the result dataset:

chr7 117180358 117180364 2 IAD39777_one-based_half-opened.bed,new_bad_regions.bed 1 1


The interval above is found exactly in the input dataset
"new_bad_regions.bed". In the other input,
"IAD39777_one-based_half-opened.bed", you will find an interval that
completely contains the reported interval:

chr7 117180176 117180459



_Example in your data of a "common item":_

These are the lines you highlighted in the attachment you sent.
Because no interval in either input dataset spans all of the
overlapping intervals in the set, each is reported individually. BUT
all are reported in the "Common intervals output", so this is correct.
Each row is filled out according to the same rules as above, as if it
were a set of one, with the overarching knowledge that it overlaps
others in the output.
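
The distinction between the two cases can be sketched in Python as a check for whether any single interval in a group of overlapping intervals contains all the others. This is only an illustration of the description in this reply, not the tool's actual algorithm. The function name `spanning_interval` is hypothetical, coordinates are treated as BED-style (start, end) pairs, and the second group uses made-up coordinates.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), BED-style


def spanning_interval(intervals: List[Interval]) -> Optional[Interval]:
    """Return an interval that contains every interval in the group, or None."""
    if not intervals:
        return None
    lowest_start = min(start for start, _ in intervals)
    highest_end = max(end for _, end in intervals)
    for start, end in intervals:
        # A spanning interval must reach both extremes of the group.
        if start == lowest_start and end == highest_end:
            return (start, end)
    return None


# The chr7 intervals from the "common set" example in this thread.
common_set = [(117180176, 117180459), (117180358, 117180364)]

# Hypothetical coordinates: the intervals overlap, but neither contains the other.
partial_overlap = [(100, 200), (150, 250)]

print(spanning_interval(common_set))
print(spanning_interval(partial_overlap))
```

The first group has a spanning interval (the "common set" case); the second does not (the "common item" case).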

Try a tool in the tool group "Operate on Genomic Intervals" for more
options.

Hopefully this helps,

Jen
Galaxy team
--
Jennifer Hillman-Jackson
http://galaxyproject.org
